The Performance Overview screen consists of three main elements:
- Response Times graph
- Requests Per Minute graph
- Endpoint list
The Response Times graph shows the average, 95th percentile, and 99th percentile response times for your app over the selected time period.
The default time period is one hour.
Assuming you’re looking at the past hour, you can see your app’s average response time per minute for the past 60 minutes.
Hover over the graph to see the specifics of a particular minute. If your app’s average response time in that minute is 40 milliseconds, that means that across all endpoints requested in that minute, the average response time was 40 milliseconds.
If you have 100 requests to your app in that minute, the 95th percentile number will be the response time of the 95th fastest request. In other words, 95% of requests were served faster than the 95th percentile number, and 5% were slower.
The average response time can often hide the fact that there was a host of problematic requests. The percentiles can help you surface those problematic requests.
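As a rough sketch of why percentiles surface problems the average hides, consider one hypothetical minute of traffic. The numbers and the nearest-rank percentile method below are illustrative, not Opbeat’s exact calculation:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the value below which
    roughly `pct` percent of the samples fall."""
    ordered = sorted(samples)
    rank = max(int(round(pct / 100 * len(ordered))) - 1, 0)
    return ordered[rank]

# 95 fast requests at 40 ms, plus 5 slow outliers (hypothetical data).
response_times = [40] * 95 + [900, 950, 1000, 1100, 1200]

avg = sum(response_times) / len(response_times)
p95 = percentile(response_times, 95)
p99 = percentile(response_times, 99)

print(avg)  # 89.5: the outliers more than double the average
print(p95)  # 40: most requests are still fast
print(p99)  # 1100: the slow tail becomes visible
```

Here the average jumps well above the typical 40 ms request, while the 99th percentile pinpoints the slow tail that the average smooths over.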
Sometimes, a few long-running requests stretch the graph’s scale so much that the average is hard to read, so you can disable the 99th and 95th percentile lines. Click any of the legends in the upper right of the graph area to hide that response time from the graph.
The Requests Per Minute graph shows you the number of requests your app received in the chosen time period.
Every request returns a response with a status code. In the upper right corner of the Requests Per Minute graph, you can see the various response codes that were returned per minute. This is what they mean:
- 2XX: HTTP status codes in the 200 range mean the request was successful. “200 OK” is by far the most common one.
- 3XX: These are requests that were redirected, e.g. “301 Moved Permanently”.
- 4XX: Client error requests, meaning requests that were unsuccessful, but not because of an error on the server side. The most common is “404 Not Found”, which might get returned when a link points to something that no longer exists, or a user typed the wrong address.
- 5XX: Requests that received a “Server error” response. Examples you often encounter are “500 Internal Server Error” or “503 Service Unavailable”. Most of these requests will be logged to Opbeat as Error occurrences.
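The class a response falls into is simply the first digit of its status code. A small sketch of that grouping (the status codes here are made up, and this is not an Opbeat API):

```python
from collections import Counter

# Hypothetical status codes returned during one minute.
status_codes = [200, 200, 301, 404, 200, 500, 204, 503, 200]

# Map each code to its class: 200 -> "2XX", 404 -> "4XX", and so on.
classes = Counter(f"{code // 100}XX" for code in status_codes)

print(classes["2XX"])  # 5
print(classes["5XX"])  # 2
```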
On the graph, a light blue line indicates the “2XX” responses. If your app is having trouble, the red line will stand out to show you when the bad requests occurred.
What you will typically see is a correlation between requests per minute and response times; a high number of requests per minute will usually cause an increase in response times. However, if response times are high while the number of requests stays constant, it might indicate a performance problem with specific requests.
A dip in requests and an increase in response times is another indicator something is not working as well as it should.
Use the list of Endpoints to drill down and find which ones to spend time on.
The Endpoints list will show you all the endpoints that have been requested in the selected time period. In the list you’ll see:
- Search: Filter the list of endpoints using the input field, e.g. “GET” will show you all the endpoints that were requested using the “GET” method.
- Avg. response time: The average response time for that endpoint only.
- 95th response time: Just like the average response time, but showing the 95th percentile response time for that particular endpoint.
- RPM (Requests Per Minute): The average number of requests per minute the endpoint received during the selected time period.
- Impact: A visualization of average response time multiplied by requests per minute.
The Impact indicator is a great tool to figure out where you might want to concentrate your efforts when trying to optimize performance. By default, your endpoints are sorted by Impact.
The Impact indicator is a way to see the collective time spent on a particular endpoint. When you have an endpoint that is only a bit slow but is used constantly, it might be better to spend time improving the response time for that endpoint, as opposed to one that is 10 times slower but used more rarely.
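To illustrate that trade-off, here is a sketch that computes Impact as average response time multiplied by requests per minute. The endpoint names and numbers are made up, and Opbeat’s exact scaling of the indicator may differ:

```python
# Hypothetical endpoints: one slightly slow but busy, one 10x slower but rare.
endpoints = [
    {"name": "GET /search", "avg_ms": 50, "rpm": 600},
    {"name": "GET /report", "avg_ms": 500, "rpm": 5},
]

for ep in endpoints:
    # Impact: total milliseconds spent serving this endpoint per minute.
    ep["impact"] = ep["avg_ms"] * ep["rpm"]

# Sort by Impact, highest first, like the Endpoints list does by default.
endpoints.sort(key=lambda ep: ep["impact"], reverse=True)
for ep in endpoints:
    print(ep["name"], ep["impact"])
# GET /search 30000
# GET /report 2500
```

Even though /report is ten times slower per request, /search accounts for twelve times more total time, so optimizing it is likely to pay off more.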
Clicking on an endpoint will take you to the detail page for that endpoint.
For each endpoint in your app, you can see the details on the Response Time graph and the Requests Per Minute graph. The graphs are the same as on the Performance Overview page, but limited to the particular endpoint.
The Response Time Distribution graph gives you an overview of the distribution of the response times for the requests to this endpoint.
The bars in the distribution graph indicate how many requests fall into a range of response times. E.g. you might have 16 requests that took between 45 and 90 milliseconds to complete (you can see the number of requests and the exact range for the selected bucket in the upper right corner of the distribution graph). The distribution of requests should (depending on a number of factors) form a long tail, with most requests on the left side of the graph and some outliers to the right.
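Bucketing like this works by slicing the response-time axis into fixed-width ranges and counting the requests in each. A minimal sketch, where the 45 ms bucket width and the sample times are assumptions:

```python
from collections import Counter

BUCKET_MS = 45  # hypothetical bucket width
times = [12, 30, 47, 52, 60, 88, 130, 400]  # response times in ms

# Map each response time to the start of its bucket.
buckets = Counter((t // BUCKET_MS) * BUCKET_MS for t in times)

for start in sorted(buckets):
    print(f"{start}-{start + BUCKET_MS} ms: {buckets[start]} requests")
# 0-45 ms: 2 requests
# 45-90 ms: 4 requests
# 90-135 ms: 1 requests
# 360-405 ms: 1 requests
```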
When you click a blue bar in the graph (a “bucket”), the Timeline Breakdown below shows you a sample request from that bucket.
The Response Time Distribution graph is particularly useful for investigating the difference between the fast requests and the slow requests. You can drag the cursor across the bars to see how the Timeline Breakdown below changes as you go from fast requests to slow requests.
The Timeline Breakdown of a request shows you what is going on inside the request from initiation to completion.
Each horizontal bar is called a trace. They represent an operation that was going on during the request. These include database calls (orange), rendering templates (green), or calls to an external API (purple).
The width of the colored bars in the Timeline Breakdown indicates how long a particular operation took, relative to the total time of the request. The wider the bar, the more time the operation took.
In the upper right corner above the timeline, you can see how much time each type of operation took. For example, it might say a request spent 60% of its time on database calls, and 20% on template rendering. That means database operations were ongoing for 60% of the request and template rendering operations were ongoing for 20% of the request.
The numbers will most likely not add up to 100%, as the operations might overlap. As an example, your code might start a database operation and a call to an external API almost simultaneously. In that case, the database operation could take 70% of the total time of the request to complete, but the external call took 85% of the total time to complete.
Another example is when your code triggers multiple database calls in quick succession. If they overlap, we calculate the time from when the first call starts until the last one ends.
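One way to read that rule is as merging overlapping intervals: overlapping traces of the same kind are counted once, while gaps between non-overlapping ones are excluded. A sketch under that interpretation (not Opbeat’s actual code, and the call timings are hypothetical):

```python
def merged_duration(spans):
    """Total time covered by (start_ms, end_ms) spans,
    counting overlapping spans only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(spans):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it out.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the running interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# Three overlapping database calls in quick succession.
db_calls = [(10, 40), (30, 70), (60, 90)]
print(merged_duration(db_calls))  # 80: from 10 ms until 90 ms
```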