Question:
During TechDays here in the Netherlands, Steve Sanderson gave a presentation about C# 5, ASP.NET MVC 4, and the asynchronous web.
He explained that when requests take a long time to finish, all the threads in the thread pool become busy and new requests have to wait. The server can't handle the load and everything slows down.
He then showed how using async web requests improves performance, because the work is then delegated to another thread and the thread pool can respond quickly to new incoming requests. He even demoed this and showed that 50 concurrent requests first took 50 × 1 s, but with the async behavior in place only 1.2 s in total.
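The demo can be approximated with a scaled-down sketch. It uses Python's `asyncio` and `concurrent.futures` as an analogy for ASP.NET, not ASP.NET itself. The single-thread pool, the 50 requests, and the 0.02 s stand-in for 1 s of IO are assumptions chosen to mirror the "50 × 1 s" figure. Blocking handlers hold their thread for the whole wait, so they run one after another. The coroutines all wait at the same time on a single thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

REQUESTS = 50     # concurrent requests, as in the demo
IO_DELAY = 0.02   # assumed, scaled-down stand-in for 1 s of IO per request


def blocking_handler(_request_id):
    # Synchronous IO: the worker thread is stuck here for the whole delay.
    time.sleep(IO_DELAY)


async def async_handler():
    # Asynchronous IO: the coroutine is suspended and no thread is held.
    await asyncio.sleep(IO_DELAY)


def run_blocking(pool_size=1):
    """Serve all requests with blocking handlers on a pool of `pool_size` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(blocking_handler, range(REQUESTS)))
    return time.perf_counter() - start


async def _serve_all():
    await asyncio.gather(*(async_handler() for _ in range(REQUESTS)))


def run_async():
    """Serve all requests with coroutines on a single event-loop thread."""
    start = time.perf_counter()
    asyncio.run(_serve_all())
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking, 1 pool thread: {run_blocking():.2f} s")  # about REQUESTS * IO_DELAY
    print(f"async, 1 loop thread:    {run_async():.2f} s")     # about IO_DELAY plus overhead
```

Raising `pool_size` in `run_blocking` shortens the blocking run, which is exactly the "bigger thread pool" alternative the question asks about.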
But after seeing this I still have some questions.
- Why can't we just use a bigger thread pool? Isn't using async/await to bring up another thread slower than just increasing the thread pool from the start? It's not as if the server we run on suddenly gets more threads.
- The request from the user is still waiting for the async thread to finish. If the thread from the pool is doing something else, how is the 'UI' thread kept busy? Steve mentioned something about 'a smart kernel that knows when something is finished'. How does this work?
Answer 1:
This is a very good question, and understanding it is key to understanding why asynchronous IO is so important. The new async/await feature was added to C# 5.0 to simplify writing asynchronous code. Support for asynchronous processing on the server is not new, however: it has existed since ASP.NET 2.0.
As Steve showed you, with synchronous processing each request in ASP.NET (and WCF) takes one thread from the thread pool. The issue he demoed is a well-known problem called "thread pool starvation". If you make synchronous IO calls on your server, the thread pool thread remains blocked (doing nothing) for the duration of the IO. Since the number of threads in the thread pool is limited, under load this can lead to a situation where all the thread pool threads are blocked waiting for IO and requests start being queued, which increases response time. Since all the threads are waiting for IO to complete, you will see CPU utilization close to 0%, even though response times go through the roof.
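A small simulation makes both symptoms visible: requests pile up in the queue while the CPU stays almost idle. This is a Python sketch rather than ASP.NET code. It assumes a deliberately tiny pool of 4 threads (a hypothetical number, far smaller than a real pool), 20 requests arriving at once, and `time.sleep` as a stand-in for a blocking IO call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4    # hypothetical, deliberately tiny thread pool
REQUESTS = 20    # requests that all arrive at the same moment
IO_DELAY = 0.05  # seconds each request spends blocked on synchronous IO


def handle(arrived_at):
    # How long the request sat in the queue before a pool thread picked it up.
    queued_for = time.perf_counter() - arrived_at
    time.sleep(IO_DELAY)  # synchronous IO: thread blocked, CPU idle
    return queued_for


def simulate():
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of all threads in this process
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        arrived_at = time.perf_counter()
        queue_delays = list(pool.map(handle, [arrived_at] * REQUESTS))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return queue_delays, wall, cpu


if __name__ == "__main__":
    delays, wall, cpu = simulate()
    print(f"worst queueing delay: {max(delays):.2f} s")
    print(f"wall time {wall:.2f} s, CPU time {cpu:.3f} s")
```

With 20 requests and 4 threads, the last requests wait roughly four IO durations before they even start, while CPU time stays a small fraction of wall time.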
What you are asking (why can't we just use a bigger thread pool?) is a very good question. As a matter of fact, this is how most people have solved the problem of thread pool starvation until now: just add more threads to the thread pool. Some Microsoft documentation even recommends this as a fix for situations where thread pool starvation may occur. It is an acceptable solution, and until C# 5.0 it was much easier than rewriting your code to be fully asynchronous.
There are a few problems with the approach though:
- There is no value that works in all situations: the number of thread pool threads you need grows linearly with both the duration of the IO and the load on your server, and IO latency is largely unpredictable. Here is an example: say your ASP.NET application makes HTTP requests to a third-party web service, which take about 2 seconds to complete. You encounter thread pool starvation, so you increase the thread pool size to, say, 200 threads, and everything works fine again. The problem is that next week the web service may have technical problems that increase its response time to 10 seconds. All of a sudden, thread pool starvation is back: threads are blocked 5 times longer, so you now need 5 times as many threads, 1,000 in total.
- Scalability and performance: the second problem is that you still use one thread per request. Threads are an expensive resource. By default, each managed thread in .NET reserves 1 MB of memory for its stack. For a web page making IO calls that last 5 seconds, under a load of 500 requests per second, you need 2,500 threads in your thread pool, which means 2.5 GB of memory for the stacks of threads that sit doing nothing. Then there is context switching, which takes a heavy toll on the performance of your machine (affecting every service on the machine, not just your web application). Even though Windows does a fairly good job of ignoring waiting threads, it is not designed to handle such a large number of threads. Remember that the highest efficiency is obtained when the number of running threads equals the number of logical CPUs on the machine (usually no more than 16).
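Both sets of figures follow from Little's law (the name is our addition, not the answer's). The average number of requests in flight, which with synchronous IO is also the number of blocked threads $N$, equals the arrival rate $\lambda$ times the time $W$ each request holds its thread. The first example does not state the load, so $\lambda = 100$ requests per second is inferred from its 200 threads at 2 s.

$$
\begin{aligned}
N &= \lambda \, W \\
\lambda = 100~\text{req/s},\; W = 2~\text{s} &\;\Rightarrow\; N = 200 \\
\lambda = 100~\text{req/s},\; W = 10~\text{s} &\;\Rightarrow\; N = 1{,}000 \\
\lambda = 500~\text{req/s},\; W = 5~\text{s} &\;\Rightarrow\; N = 2{,}500,\quad 2{,}500 \times 1~\text{MB} \approx 2.5~\text{GB of stack}
\end{aligned}
$$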
So increasing the size of the thread pool is a solution, and people have been doing it for a decade (even in Microsoft's own products). It is simply less scalable and less efficient in terms of memory and CPU usage, and you are always at the mercy of a sudden increase in IO latency that would cause starvation. Up until C# 5.0, the complexity of asynchronous code wasn't worth the trouble for many people. async/await changes that: you can now get the scalability of asynchronous IO and write simple code at the same time.
More details: https://docs.microsoft.com/en-us/previous-versions/msp-n-p/ff647787(v=pandp.10) "Use asynchronous calls to invoke Web services or remote objects when there is an opportunity to perform additional parallel processing while the Web service call proceeds. Where possible, avoid synchronous (blocking) calls to Web services because outgoing Web service calls are made by using threads from the ASP.NET thread pool. Blocking calls reduce the number of available threads for processing other incoming requests."
Answer 2:
- Async/await is not based on threads; it is based on asynchronous processing. When you do an asynchronous wait in ASP.NET, the request thread is returned to the thread pool, so no thread services that request until the async operation completes. Since the overhead of a pending request is lower than the overhead of a thread, async/await can scale better than simply enlarging the thread pool.
- The request has a count of outstanding asynchronous operations. This count is managed by the ASP.NET implementation of SynchronizationContext. You can read more about SynchronizationContext in my MSDN article - it covers how ASP.NET's SynchronizationContext works and how await uses SynchronizationContext.
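The idea of a per-request count of outstanding operations can be illustrated with a toy model. This is a hypothetical Python class written for illustration, not ASP.NET's `SynchronizationContext` code. Its `operation_started` and `operation_completed` methods loosely mirror the `OperationStarted` and `OperationCompleted` methods of .NET's `SynchronizationContext`. The request is finished only when the count returns to zero, regardless of which thread reports the last completion.

```python
import threading


class RequestContext:
    """Hypothetical model of a request that stays open while async work is pending.

    Not ASP.NET's implementation; it only illustrates the counting idea.
    """

    def __init__(self, name):
        self.name = name
        self.completed = False
        self._outstanding = 0
        self._lock = threading.Lock()  # completions may arrive on any thread

    def operation_started(self):
        with self._lock:
            self._outstanding += 1

    def operation_completed(self):
        with self._lock:
            self._outstanding -= 1
            finished = self._outstanding == 0
        if finished:
            self._send_response()

    def _send_response(self):
        # In a real server this is where the response would be flushed to the client.
        self.completed = True


if __name__ == "__main__":
    ctx = RequestContext("GET /orders")
    ctx.operation_started()   # e.g. a database query
    ctx.operation_started()   # e.g. a call to a web service
    ctx.operation_completed()
    print(ctx.completed)      # False: one operation is still pending
    worker = threading.Thread(target=ctx.operation_completed)
    worker.start()
    worker.join()
    print(ctx.completed)      # True: the last completion came from another thread
```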
ASP.NET asynchronous processing was possible before async/await: you could use asynchronous pages and EAP components such as WebClient (the Event-based Asynchronous Pattern is a style of asynchronous programming based on SynchronizationContext). Async/await also uses SynchronizationContext, but with a much simpler syntax.
--------------------------------------------------
@Wouter Asynchronous processing doesn't require threads. In ASP.NET, if you await an operation that isn't complete, then the await will schedule the remainder of the method as a continuation, and return. The thread is returned to the thread pool, leaving no threads servicing the request. Later, when the await operation completes, it will take a thread from the thread pool and continue servicing the request on that thread. So, asynchronous programming doesn't depend on threads. Though it does work well with threads if you need it: you can await a thread pool operation using Task.Run. – Stephen Cleary
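The phrase "schedule the remainder of the method as a continuation, and return" can be written out by hand. This Python sketch is an analogy, not C#. Neither Python's `await` nor the C# compiler literally produces this code, but the effect is similar. The handler registers its "rest of the method" as a callback on a future and returns at once. `loop.call_later` stands in for an IO operation that completes later without any thread waiting on it. One difference from ASP.NET: here the continuation runs on the event-loop thread, whereas ASP.NET takes any free thread-pool thread.

```python
import asyncio

events = []


def handle_request(loop, response_sent):
    """Hand-written equivalent of:  data = await read_io();  send(data)"""
    io_result = loop.create_future()
    # Stand-in for IO completing 10 ms from now; no thread is blocked meanwhile.
    loop.call_later(0.01, io_result.set_result, "payload")

    def continuation(fut):
        # "The remainder of the method": runs only once the IO has completed.
        events.append(("continuation ran", fut.result()))
        response_sent.set_result(None)

    io_result.add_done_callback(continuation)
    events.append(("handler returned", None))  # the thread is free from here on


async def main():
    loop = asyncio.get_running_loop()
    response_sent = loop.create_future()
    handle_request(loop, response_sent)
    await response_sent


if __name__ == "__main__":
    asyncio.run(main())
    print(events)  # [('handler returned', None), ('continuation ran', 'payload')]
```

The order of `events` shows the handler returning before its continuation runs.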
--------------------------------------------------
@StephenCleary I think the main problem people have is this: "The thread is returned to the thread pool, leaving no threads servicing the request. Later, when the await operation completes,..." how does the await operation complete if no thread is used to handle the request? What executes that code? It doesn't complete 'spontaneously', something must run it. That's the vague part. – Frans Bouma
--------------------------------------------------
@FransBouma: This troubled me too when I first encountered the term "asynchronous IO" (while studying Node.js). After some research, I found that some operations can be performed asynchronously at the hardware level by some devices, such as the hard disk. The OS issues a read request to the disk and goes back to doing other work. The disk fetches the data by itself, fills its (physical) buffers, and then signals the processor that the read is done. The OS detects this and then grabs a thread from a pool to continue processing the fetched data. – Raphael
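The "signal, then grab a thread" pattern can be imitated from user code with an OS notification API. This Python sketch uses `selectors`, which wraps readiness notification (epoll, kqueue or select, depending on the platform). .NET on Windows relies on I/O completion ports, a completion-based model, so treat this as an analogy rather than a description of .NET. Connected socket pairs stand in for devices, and writing into one end stands in for a device finishing its work. A single thread then asks the kernel which operations are done and handles each one, instead of dedicating a blocked thread to every pending read.

```python
import selectors
import socket
import threading


def serve_completions(n=50):
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (reader, _writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)  # "tell me when this is ready"

    # Stand-in for devices finishing: the data lands in kernel buffers.
    for i, (_reader, writer) in enumerate(pairs):
        writer.sendall(f"block {i}".encode())

    results, handler_threads = {}, set()
    while len(results) < n:
        ready = sel.select(timeout=1.0)  # one thread waits on all n operations
        if not ready:
            break  # nothing more will arrive; avoid looping forever
        for key, _mask in ready:
            results[key.data] = key.fileobj.recv(64).decode()
            handler_threads.add(threading.get_ident())
            sel.unregister(key.fileobj)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return results, handler_threads


if __name__ == "__main__":
    results, handler_threads = serve_completions()
    print(len(results), "reads handled by", len(handler_threads), "thread(s)")
```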
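From Raphael's last comment above, we can see that some time-consuming operations (such as IO) really do not need a .NET thread to execute them. The hardware can carry them out on its own and, once they finish, notifies the .NET program, which then uses a thread to continue with the follow-up work. So we should not tie up a .NET thread waiting for such operations to complete. Instead, we should use the async/await pattern so that the thread can do other work and the .NET program requests and creates as few new threads as possible.
After reading this, I finally understood why async/await should be used wherever possible in .NET and why it improves the performance of .NET programs. The key point is that the async/await pattern mitigates "thread pool starvation".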
References: