I am currently working on a REST service to support a new game that Electronic Arts is going to launch. Our mandate is to support 50K-100K concurrent users at launch. After several months of work we had all the features finished and fairly stable, and the time came to measure and optimize performance. I like to think that I am a half-decent Java programmer and server engineer, so I can read some code and figure out where the performance bottlenecks are. However, in any large project there is too much code to optimize by inspection, especially when there are no structural problems that offer easy wins. How do you find those small bugs that are so easy to miss but make all the difference? There are many articles telling you which structures and architectures work best, but very few that tell you what to do if your architecture is fine and the system is still too slow. This article covers how to go about finding what is actually wrong with your server.
The first thing you need to do is identify the metric you will use to quantify the problem and to measure how much difference each change makes. Profilers and other tools will give you execution times and percentages for specific parts of the code, but you also need an external metric that captures the bigger picture. A good choice for servers is client response time or the number of concurrent users the server can sustain (given an expected user session). For a desktop app it can be the running time of a specific operation or set of operations.
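As a rough illustration of such an external metric, the sketch below times repeated calls to one operation and reports the median and 95th percentile latency. The class name LatencyMetric and the fake request are assumptions made up for this example; in practice the operation would be a real client call against your server, and a proper load-testing tool will give more trustworthy numbers.

```java
import java.util.Arrays;

// Hypothetical helper: times repeated calls to one operation and reports
// latency statistics that can serve as an external metric.
class LatencyMetric {

    // Runs the operation `runs` times and returns each duration in nanoseconds.
    static long[] measure(Runnable operation, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // Nearest-rank percentile (p in (0, 100]) computed over a sorted copy.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Stand-in for a real request; replace with a call to your service.
        Runnable fakeRequest = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
        };
        long[] samples = measure(fakeRequest, 1_000);
        System.out.printf("median=%d ns, p95=%d ns%n",
                percentile(samples, 50), percentile(samples, 95));
    }
}
```

Record the same numbers before and after every change so that each optimization is judged against the metric rather than against intuition.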
A first-cut solution is to look at your threads and what they are doing, which you can easily do with the jstack utility. It takes a thread dump of any running Java process on your system. Just get the process id of your application using the ps command and run jstack on that process id (jstack <pid>).
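If attaching jstack from outside is not convenient, a similar view can be produced from inside the process. The sketch below is not jstack itself; it is an approximation built on the standard Thread.getAllStackTraces() call, and the class name ThreadSummary is invented for this example.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: an in-process approximation of what a thread dump shows.
class ThreadSummary {

    // Counts the live threads of this JVM by Thread.State.
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    // Prints every live thread with its state and current stack trace.
    static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.println("\"" + t.getName() + "\" " + t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }

    public static void main(String[] args) {
        dump();
        System.out.println(countByState());
    }
}
```

A large WAITING or BLOCKED count relative to RUNNABLE is the same signal you would look for in a jstack dump.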
This lists every thread in the process along with its state and stack trace, so you can see which threads are waiting and what they are waiting for. Most modern machines should not be CPU bound, so if your app is slow, the CPU(s) and therefore the threads are usually waiting for something. jstack quickly tells you how many threads are active and what the rest are waiting on. If there are a lot of waiting threads and the wait is unavoidable, consider using asynchronous code such as an ExecutorService or continuations.
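A minimal sketch of the ExecutorService approach follows, assuming the unavoidable wait is a slow external call. The slowLookup method and its 50 ms sleep are placeholders invented for this example; the point is only that the waits overlap in a pool instead of happening one after another on the request thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncCalls {

    // Hypothetical stand-in for a slow external call (database, remote service).
    static String slowLookup(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + id;
    }

    // Submits all lookups to a fixed pool so their waits overlap.
    static List<String> lookupAll(int count, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                tasks.add(() -> slowLookup(id));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps task order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lookup failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = lookupAll(8, 8);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " lookups in " + millis + " ms");
    }
}
```

In a real server the pool would be created once and shared, not built per call as it is here for brevity.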
Let's go a little deeper into the app using the built-in HPROF tool that ships with the Sun/Oracle JVM (up to Java 8). HPROF can collect several kinds of profiling data, but first let's look at CPU times. Run your application with HPROF enabled and configured to capture times, using the JVM option -agentlib:hprof=cpu=times.
Once your program runs (and exits), the hprof.txt file will contain a table of CPU time details like the one shown below. This shows the methods where the CPU spends most of its execution time. The self column gives the percentage of time used by the method itself, and the accum column gives the running total accounted for so far, starting from the top. So reading here, thread.start accounts for 0.98% of the time, and all methods up to and including it account for 23.96% of execution time. The trace column lets you look up the stack trace for that particular call. This table is important because it tells you which methods to optimize first and what kind of gains to expect. My trace is fairly innocuous because it's from a Jetty server coming up, but it's not uncommon for poorly performing applications to spend 90% of their time in just a handful of methods. One application I was working on had badly configured loggers and was spending 90% of its time just logging; fixing that doubled our concurrent user cap.
Count the Calls
In addition to CPU execution times, HPROF can also give us counts of how many times each method is executed. In the example you can see that the first 10 methods account for almost 80% of all method calls, so if you want to reduce calls, those are the targets. If the same method shows up near the top of both the times and the samples tables, then you have a definite problem: a slow method is being called very often.
Count the Objects
Memory is an important component of optimization, and if you are getting out-of-memory errors then you know that your application is in trouble. However, there are also more subtle performance implications of memory management. One thing to always check is whether your singletons are actually singletons. In a recent project we used Google Guice to inject singleton objects using annotations. However, in some cases we were using javax.inject.Singleton instead of com.google.inject.Singleton. This small error meant that our singletons weren't actually singletons. This had all sorts of implications, including the fact that one of our singleton classes, which handled calls to external servers, had several thousand instances and was therefore holding several thousand connections open.
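One cheap way to catch this kind of mistake early is to count constructions. The sketch below is a hand-rolled check, not a Guice feature, and the class name ExternalServiceClient is a hypothetical stand-in for whichever class is supposed to be a singleton.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class that is supposed to exist exactly once per JVM.
class ExternalServiceClient {

    // Incremented on every construction, whoever performs it.
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    ExternalServiceClient() {
        int n = INSTANCES.incrementAndGet();
        if (n > 1) {
            // A second construction means the singleton binding is not working.
            System.err.println(
                    "WARNING: ExternalServiceClient instance #" + n + " created");
        }
    }

    // Number of instances constructed so far in this JVM.
    static int instanceCount() {
        return INSTANCES.get();
    }

    public static void main(String[] args) {
        new ExternalServiceClient();
        new ExternalServiceClient(); // triggers the warning
        System.out.println("instances: " + instanceCount());
    }
}
```

Exposing instanceCount() through a health or metrics endpoint makes the problem visible without attaching a profiler.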
Also, if your application's memory profile is choppy, i.e. it uses very little memory for long periods and then allocates a lot of memory very quickly, you can get OutOfMemoryError failures and long garbage collection pauses. Therefore try to minimize the number of objects that need to be created repeatedly (i.e. for each request, in the case of a server). Any object that can be a singleton should be one, and any object that can be reused should be.
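As one possible shape for reuse, the sketch below keeps a single StringBuilder per thread and resets it for each request instead of allocating a new one. The class ReusableBuffer and the response format are assumptions for illustration only. Short-lived allocations are often cheap on current JVMs, so treat this as something to confirm with your metric, not as a guaranteed win.

```java
class ReusableBuffer {

    // One StringBuilder per thread, created lazily and reused across requests.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    // Hypothetical per-request formatting step that reuses the thread's buffer.
    static String formatResponse(String user, int score) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // clear the contents but keep the allocated capacity
        sb.append("{\"user\":\"").append(user)
          .append("\",\"score\":").append(score).append('}');
        return sb.toString();
    }

    // Exposed so callers can confirm the same buffer is handed back each time.
    static StringBuilder currentBuffer() {
        return BUFFER.get();
    }

    public static void main(String[] args) {
        System.out.println(formatResponse("ann", 7));
        System.out.println(formatResponse("bob", 12));
    }
}
```

Thread-local buffers live as long as their thread, which suits a fixed pool of request threads but can hold on to memory if a single request ever grows the buffer very large.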
We can also get object allocation information out of HPROF profiling, as shown below. The table shows the top object types created and the amount of memory they take up. This list will help you find any singletons that aren't. The profile also gives you the number of live objects of each type and the total number of objects of that type allocated. This gives you two key pieces of information. If the number of objects allocated is equal to the number of live objects, then these objects were never released and garbage collected. Is this expected? And if the number of allocated objects is much higher than the live object count, then your application is churning a lot of memory. Maybe you should reuse more objects. Also time the constructor of those objects, as this will give you a good idea of the performance overhead of creating them.
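Timing a constructor can be sketched as below. The Session class is a made-up example of an object with non-trivial construction work, and a loop around System.nanoTime() gives only a rough estimate because of JIT warm-up and timer resolution; a benchmarking harness would be more reliable for small differences.

```java
import java.util.Arrays;

class ConstructorTiming {

    // Hypothetical class whose constructor does non-trivial work.
    static class Session {
        final int[] slots;

        Session(int size) {
            slots = new int[size];
            Arrays.fill(slots, -1);
        }
    }

    // Average nanoseconds per construction over `iterations` runs.
    static double averageConstructionNanos(int iterations) {
        long checksum = 0; // uses each object so the allocation is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Session s = new Session(1024);
            checksum += s.slots.length;
        }
        long elapsed = System.nanoTime() - start;
        if (checksum != 1024L * iterations) {
            throw new IllegalStateException("unexpected checksum");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        averageConstructionNanos(100_000); // warm-up pass, result discarded
        System.out.printf("avg construction: %.1f ns%n",
                averageConstructionNanos(100_000));
    }
}
```

Multiplying the average by the allocation count from the HPROF table gives a ballpark for how much time goes into creating that type.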