How to scale performance of a .NET API to handle high number of requests per second

October 12, 2024

I was recently assigned a task at my job to optimize an API endpoint so it could handle around 500 requests per second without failing. Initially, the API handled around 100 requests per second before requests began timing out against the gateway's 9-second timeout. Here are the steps I followed to improve performance:

1. Measure!

The first step of any performance optimization exercise is to measure the existing performance. You can use any load testing tool; I used Apache JMeter to bombard the API endpoint with 500 requests per second for 10 seconds. Save the results to a file so you can compare them after each code change.
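As a rough illustration of what "measure first" means (a dedicated tool like JMeter is the better choice for real tests), here is a minimal C# sketch that fires a batch of concurrent requests and reports success count and worst-case latency. The URL and request count are placeholders, not part of the original setup:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoadTest
{
    // Fires `requests` concurrent GETs at `url` and prints a tiny summary.
    public static async Task RunAsync(string url, int requests)
    {
        using var http = new HttpClient();
        var timings = await Task.WhenAll(
            Enumerable.Range(0, requests).Select(async _ =>
            {
                var sw = Stopwatch.StartNew();
                using var resp = await http.GetAsync(url);
                return (sw.ElapsedMilliseconds, resp.IsSuccessStatusCode);
            }));

        var ok = timings.Count(t => t.IsSuccessStatusCode);
        Console.WriteLine($"{ok}/{requests} succeeded, " +
            $"max latency {timings.Max(t => t.ElapsedMilliseconds)} ms");
    }
}
```

A real baseline should also capture percentiles (p95/p99), not just the max, since averages hide tail latency.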

2. The low-hanging fruit: Reduce the number of calls to the database

This one is fairly straightforward. A modern CPU can execute on the order of billions of simple operations per second; the genuinely time-consuming tasks are I/O (file) and network (database) operations. If you can batch the calls that the code makes to the DB, performance will improve significantly. Even a simple select statement can take 300 ms if the database is under heavy load. So if your code makes multiple select queries, group them in a stored procedure or in a single batched statement.
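As a sketch of what batching looks like with ADO.NET (the connection string, table names, and columns here are hypothetical), two SELECTs can travel to the server in a single round trip and be read back as separate result sets:

```csharp
using Microsoft.Data.SqlClient;
using System.Threading.Tasks;

public static class Dashboard
{
    public static async Task LoadAsync(string connectionString)
    {
        // One round trip instead of two: both SELECTs go in a single batch.
        const string sql = @"
            SELECT Id, Name  FROM dbo.Customers WHERE IsActive = 1;
            SELECT Id, Total FROM dbo.Orders
            WHERE CreatedAt > DATEADD(day, -1, GETUTCDATE());";

        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();
        await using var cmd = new SqlCommand(sql, conn);
        await using var reader = await cmd.ExecuteReaderAsync();

        while (await reader.ReadAsync()) { /* read customer rows */ }
        await reader.NextResultAsync();   // advance to the second result set
        while (await reader.ReadAsync()) { /* read order rows */ }
    }
}
```

The same idea applies to a stored procedure that returns multiple result sets; either way the network cost is paid once instead of per query.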

3. Asynchronous is the way to go

What is asynchronous code? Code normally executes sequentially: when a thread makes a blocking call, it sits idle until that call completes. With async code, when an awaited operation (typically I/O) is in flight, the current thread is released to do other work instead of waiting. Imagine you own a restaurant with 3 chefs, and you get 3 orders for eggs and toast. All 3 chefs start frying eggs in their own pans, but then you get a fourth order for eggs and toast! Since all the chefs are busy watching (sync) their pans, nobody can take another order.

The same restaurant with an async approach: all 3 chefs put their eggs on to fry, and while the eggs cook they are free to put bread in the toaster. They set timers to notify them when the egg is ready and when the toast is ready. This way the chefs never sit waiting on a single time-consuming task, and they can also handle the 4th order.

When you write async code, a thread is returned to the thread pool whenever an async operation is in progress, so it can serve other requests in the meantime.
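The chef analogy can be sketched with `Task.WhenAll` (the timings below are made up): the two waits overlap, so an order takes roughly as long as its slowest item rather than the sum of both:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Kitchen
{
    // Stand-ins for time-consuming work: frying takes ~300 ms, toasting ~200 ms.
    static Task FryEggAsync() => Task.Delay(300);
    static Task ToastAsync()  => Task.Delay(200);

    public static async Task<TimeSpan> CookOrderAsync()
    {
        var sw = Stopwatch.StartNew();
        // Start both, then await both: the waits overlap.
        await Task.WhenAll(FryEggAsync(), ToastAsync());
        return sw.Elapsed; // roughly 300 ms, not 500 ms
    }
}
```

Done sequentially (`await FryEggAsync(); await ToastAsync();`) the same order would take about 500 ms.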

3.1 Log asynchronously

If you are logging to the database or to a log file, do it asynchronously so request threads are not blocked on slow writes.
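One common shape for this is a background writer fed by a channel. This is a minimal sketch, not a full logging framework; the `writeAsync` delegate stands in for whatever slow sink (file, database) you log to:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class AsyncLogger : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Task _consumer;

    public AsyncLogger(Func<string, Task> writeAsync)
    {
        // A single consumer drains the channel and performs the slow write.
        _consumer = Task.Run(async () =>
        {
            await foreach (var line in _channel.Reader.ReadAllAsync())
                await writeAsync(line);
        });
    }

    // Callers enqueue and return immediately; no thread blocks on the sink.
    public void Log(string message) => _channel.Writer.TryWrite(message);

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete();
        await _consumer; // flush any remaining entries
    }
}
```

In production you would likely use a bounded channel (to cap memory under load) or an established library that already buffers this way.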

3.2 Call SQL methods asynchronously

You can use the async versions of the ADO.NET methods to call SQL code asynchronously. This ensures the DB call is issued and the thread is freed to do other work until the response arrives.
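A minimal sketch using Microsoft.Data.SqlClient (the query, table, and connection string are placeholders): each `await` on the async ADO.NET methods releases the thread while SQL Server does its work:

```csharp
using Microsoft.Data.SqlClient;
using System.Threading.Tasks;

public static class UserQueries
{
    public static async Task<int> GetActiveUserCountAsync(string connectionString)
    {
        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();          // async connect, thread is released
        await using var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM dbo.Users WHERE IsActive = 1", conn);
        // ExecuteScalarAsync frees the thread until the server responds.
        var result = await cmd.ExecuteScalarAsync();
        return (int)result!;
    }
}
```

The synchronous counterparts (`Open`, `ExecuteScalar`) would hold the thread hostage for the full duration of the network round trip.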

3.3 Don't mix synchronous and asynchronous DB calls

If you call the DB asynchronously, do it for all DB calls. A single synchronous (blocking) DB call in an otherwise async path can drastically reduce performance; in that case you would be better off with fully synchronous code. Asynchrony is viral: it should spread all the way up to the controller action method. And that's OK.
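Sketching "async all the way" in an ASP.NET Core controller (`IOrderService` is a hypothetical service interface introduced for illustration): note there is no `.Result` or `.Wait()` anywhere on the path:

```csharp
using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

public interface IOrderService
{
    Task<object?> GetByIdAsync(int id); // async DB call underneath
}

[ApiController]
[Route("api/orders")]
public class OrdersController : ControllerBase
{
    private readonly IOrderService _orders;
    public OrdersController(IOrderService orders) => _orders = orders;

    // async Task<IActionResult> from the action down to the DB call.
    [HttpGet("{id}")]
    public async Task<IActionResult> Get(int id)
    {
        var order = await _orders.GetByIdAsync(id);
        return order is null ? NotFound() : Ok(order);
    }
}
```

Blocking on a task mid-chain (e.g. `GetByIdAsync(id).Result`) is the anti-pattern to avoid: it ties up a thread and can even deadlock in some synchronization contexts.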

4. Don't throw exceptions for known scenarios

There is additional overhead attached to throwing and handling exceptions, so don't throw exceptions for non-exceptional scenarios. If you can instead return a result object with an error property, performance will be much better.
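One way to model this is a small result type. This is a generic sketch, not a prescribed pattern, and `Inventory.Reserve` is a made-up example of an expected business failure that should not be an exception:

```csharp
// Represent expected failures as data instead of thrown exceptions,
// avoiding exception overhead on hot paths.
public readonly struct Result<T>
{
    public T? Value { get; }
    public string? Error { get; }
    public bool IsSuccess => Error is null;

    private Result(T? value, string? error) { Value = value; Error = error; }
    public static Result<T> Ok(T value) => new(value, null);
    public static Result<T> Fail(string error) => new(default, error);
}

public static class Inventory
{
    // "Out of stock" is an expected outcome, not an exceptional one.
    public static Result<int> Reserve(int stock, int requested) =>
        requested <= stock
            ? Result<int>.Ok(stock - requested)
            : Result<int>.Fail("insufficient stock");
}
```

Callers branch on `IsSuccess` instead of wrapping the call in try/catch; exceptions stay reserved for truly unexpected failures (lost connections, corrupted state).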

5. Infrastructural Scaling

Do this once you have exhausted all your other options and still cannot reach the required requests per second. It is expensive, which is why I list it last. There are two types:

Horizontal scaling: Add another server that serves the same API endpoint, plus a load balancer to distribute the load between the two servers.

Vertical scaling: This is adding resources (more RAM, more storage, better CPU) to the existing server.

Conclusion

After applying the above steps and running another JMeter load test, I was able to hit the required 500 requests per second on the POST API endpoint.