Understanding the Node.js Event Loop - Node.js at Scale_how to understand javascript is single-threaded?-CSDN博客

转载：https://blog.risingstack.com/node-js-at-scale-understanding-node-js-event-loop/

This article helps you to understand how the Node.js event loop works, and how you can leverage it to build fast applications. We’ll also discuss the most common problems you might encounter, and the solutions for them.

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.

Upcoming chapters for the Node.js at Scale series:

Using npm
Node.js Internals Deep Dive
- The Node.js Event Loop ( You are reading it now. )
- Node.js Garbage Collection Explained
- Writing Native Node.js Modules
Building
- Advanced Node.js Project Structuring
- Clean Code
- Handling Async
- Event sourcing
- Command Query Responsibility Segregation
Testing
- Unit testing
- End-to-end testing
Node.js in Production
- Monitoring Node.js Applications
- Debugging Node.js Applications
- Profiling Node.js Applications
Microservices
- Request Signing
- Distributed Tracing
- API Gateways

The problem

Most of the backends behind websites don’t need to do complicated computations. Our programs spend most of their time waiting for the disk to read & write , or waiting for the wire to transmit our message and send back the answer.

IO operations can be orders of magnitude slower than data processing. Take this for example: SSD-s can have a read speed of 200-730 MB/s - at least a high-end one. Reading just one kilobyte of data would take 1.4 microseconds, but during this time a CPU clocked at 2GHz could have performed 28 000 of instruction-processing cycles.

For network communications it can be even worse, just try and ping google.com

$ ping google.com

64 bytes from 172.217.16.174: icmp_seq=0 ttl=52 time=33.017 ms

64 bytes from 172.217.16.174: icmp_seq=1 ttl=52 time=83.376 ms

64 bytes from 172.217.16.174: icmp_seq=2 ttl=52 time=26.552 ms

64 bytes from 172.217.16.174: icmp_seq=3 ttl=52 time=40.153 ms

64 bytes from 172.217.16.174: icmp_seq=4 ttl=52 time=37.291 ms

64 bytes from 172.217.16.174: icmp_seq=5 ttl=52 time=58.692 ms

64 bytes from 172.217.16.174: icmp_seq=6 ttl=52 time=45.245 ms

64 bytes from 172.217.16.174: icmp_seq=7 ttl=52 time=27.846 ms

The average latency is about 44 milliseconds. Just while waiting for a packet to make a round-trip on the wire, the previously mentioned processor can perform 88 millions of cycles.

The solution

Most operational systems provide some kind of an Asynchronous IO interface, which allows you to start processing data that does not require the result of the communication, meanwhile the communication still goes on..

This can be achieved in several ways. Nowadays it is mostly done by leveraging the possibilities of multithreading at the cost of extra software complexity. For example reading a file in Java or Python is a blocking operation. Your program cannot do anything else while it is waiting for the network / disk communication to finish. All you can do - at least in Java - is to fire up a different thread then notify your main thread when the operation has finished.

It is tedious, complicated, but gets the job done. But what about Node? Well, we are surely facing some problems as Node.js - or more like V8 - is single-threaded. Our code can only run in one thread.

EDIT: This is not entirely true. Both Java and Python have async interfaces, but using them is definitely more difficult than in Node.js. Thanks to Shahar and Dirk Harrington for pointing this out.

You might have heard that in a browser, setting setTimeout(someFunction, 0) can sometimes fix things magically. But why does setting a timeout to 0, deferring execution by 0 milliseconds fix anything? Isn’t it the same as simply calling someFunction immediately? Not really.

First of all, let's take a look at the call stack, or simply, “stack”. I am going to make things simple, as we only need to understand the very basics of the call stack. In case you are familiar how it works, feel free to jump to the next section.

Stack

Whenever you call a functions return address, parameters and local variables will be pushed to the stack. If you call another function from the currently running function, its contents will be pushed on top in the same manner as the previous one - with its return address.

For the sake of simplicity I will say that 'a function is pushed' to the top of the stack from now on, even though it is not exactly correct.

Let's take a look!

1 function main () {

2 const hypotenuse = getLengthOfHypotenuse(3, 4)

3 console.log(hypotenuse)

4 }

6 function getLengthOfHypotenuse(a, b) {

7 const squareA = square(a)

8 const squareB = square(b)

9 const sumOfSquares = squareA + squareB

10 return Math.sqrt(sumOfSquares)

11 }

13 function square(number) {

14 return number * number

15 }

17 main()

main is called first:

then main calls getLengthOfHypotenuse with 3 and 4 as arguments

afterwards square is with the value of a

when square returns, it is popped from the stack, and its return value is assigned to squareA. squareA is added to the stack frame of getLengthOfHypotenuse

same goes for the next call to square

in the next line the expression squareA + squareB is evaluated

then Math.sqrt is called with sumOfSquares

now all is left for getLengthOfHypotenuse is to return the final value of its calculation

the returned value gets assigned to hypotenuse in main

the value of hypotenuse is logged to console

finally, main returns without any value, gets popped from the stack leaving it empty

SIDE NOTE: You saw that local variables are popped from the stack when the functions execution finishes. It happens only when you work with simple values such as numbers, strings and booleans. Values of objects, arrays and such are stored in the heap and your variable is merely a pointer to them. If you pass on this variable, you will only pass the said pointer, making these values mutable in different stack frames. When the function is popped from the stack, only the pointer to the Object gets popped with leaving the actual value in the heap. The garbage collector is the guy who takes care of freeing up space once the objects outlived their usefulness.

Enter Node.js Event Loop

No, not this loop. :)

So what happens when we call something like setTimeout, http.get, process.nextTick, or fs.readFile? Neither of these things can be found in V8's code, but they are available in the Chrome WebApi and the C++ API in case of Node.js. To understand this, we will have to understand the order of execution a little bit better.

Let's take a look at a more common Node.js application - a server listening on localhost:3000/. Upon getting a request, the server will call wttr.in/<city> to get the weather, print some kind messages to the console, and it forwards responses to the caller after recieving them.

'use strict'

const express = require('express')

const superagent = require('superagent')

const app = express()

app.get('/', sendWeatherOfRandomCity)

function sendWeatherOfRandomCity (request, response) {

getWeatherOfRandomCity(request, response)

sayHi()

}

const CITIES = [

'london',

'newyork',

'paris',

'budapest',

'warsaw',

'rome',

'madrid',

'moscow',

'beijing',

'capetown',

]

function getWeatherOfRandomCity (request, response) {

const city = CITIES[Math.floor(Math.random() * CITIES.length)]

superagent.get(`wttr.in/${city}`)

.end((err, res) => {

if (err) {

console.log('O snap')

return response.status(500).send('There was an error getting the weather, try looking out the window')

}

const responseText = res.text

response.send(responseText)

console.log('Got the weather')

})

console.log('Fetching the weather, please be patient')

}

function sayHi () {

console.log('Hi')

}

app.listen(3000)

What will be printed out aside from getting the weather when a request is sent to localhost:3000?

If you have some experience with Node, you shouldn't be surprised that even though console.log('Fetching the weather, please be patient') is called after console.log('Got the weather') in the code, the former will print first resulting in:

Fetching the weather, please be patient

Got the weather

What happened? Even though V8 is single-threaded, the underlying C++ API of Node isn't. It means that whenever we call something that is a non-blocking operation, Node will call some code that will run concurrently with our javascript code under the hood. Once this hiding thread receives the value it awaits for or throws an error, the provided callback will be called with the necessary parameters.

SIDE NOTE: The ‘some code’ we mentioned is actually part of libuv. libuv is the open source library that handles the thread-pool, doing signaling and all other magic that is needed to make the asynchronous tasks work. It was originally developed for Node.js but a lot of other projects use of it by now.

Monitor how the event loop works in your application - check out Trace by RisingStack!

To peek under the hood, we need to introduce two new concepts: the event loop and the task queue.

Task queue

Javascript is a single-threaded, event-driven language. This means that we can attach listeners to events, and when a said event fires, the listener executes the callback we provided.

Whenever you call setTimeout, http.get or fs.readFile, Node.js sends these operations to a different thread allowing V8 to keep executing our code. Node also calls the callback when the counter has run down or the IO / http operation has finished.

These callbacks can enqueue other tasks and those functions can enqueue others and so on. This way you can read a file while processing a request in your server, and then make an http call based on the read contents without blocking other requests from being handled.

"#nodejs sends IO operations to different threads so #v8 can keep executing our code" via @RisingStack #javascript

CLICK TO TWEET

However, we only have one main thread and one call-stack, so in case there is another request being served when the said file is read, its callback will need to wait for the stack to become empty. The limbo where callbacks are waiting for their turn to be executed is called the task queue (or event queue, or message queue). Callbacks are being called in an infinite loop whenever the main thread has finished its previous task, hence the name 'event loop'.

In our previous example it would look something like this:

express registers a handler for the 'request' event that will be called when request arrives to '/'
skips the functions and starts listening on port 3000
the stack is empty, waiting for 'request' event to fire
upon incoming request, the long awaited event fires, express calls the provided handler sendWeatherOfRandomCity
sendWeatherOfRandomCity is pushed to the stack
getWeatherOfRandomCity is called and pushed to the stack
Math.floor and Math.random are called, pushed to the stack and popped, a from cities is assigned to city
superagent.get is called with 'wttr.in/${city}', the handler is set for the end event.
the http request to http://wttr.in/${city} is send to a background thread, and the execution continues
'Fetching the weather, please be patient' is logged to the console, getWeatherOfRandomCity returns
sayHi is called, 'Hi' is printed to the console
sendWeatherOfRandomCity returns, gets popped from the stack leaving it empty
waiting for http://wttr.in/${city} to send it's response
once the response has arrived, the end event is fired.
the anonymous handler we passed to .end() is called, gets pushed to the stack with all variables in its closure, meaning it can see and modify the values of express, superagent, app, CITIES, request, response, city and all the functions we have defined
response.send() gets called either with 200 or 500 statusCode, but again it is sent to a background thread, so the response stream is not blocking our execution, anonymous handler is popped from the stack.

So now we can understand why the previously mentioned setTimeout hack works. Even though we set the counter to zero, it defers the execution until the current stack and the task queue is empty, allowing the browser to redraw the UI, or Node to serve other requests.

Microtasks and Macrotasks

If this wasn't enough, we actually have more then one task queue. One for microtasks and another for macrotasks.

examples of microtasks:

process.nextTick
promises
Object.observe

examples of macrotasks:

setTimeout
setInterval
setImmediate
I/O

Let's take a look at the following code:

console.log('script start')

const interval = setInterval(() => {

console.log('setInterval')

}, 0)

setTimeout(() => {

console.log('setTimeout 1')

Promise.resolve().then(() => {

console.log('promise 3')

}).then(() => {

console.log('promise 4')

}).then(() => {

setTimeout(() => {

console.log('setTimeout 2')

Promise.resolve().then(() => {

console.log('promise 5')

}).then(() => {

console.log('promise 6')

}).then(() => {

clearInterval(interval)

})

}, 0)

})

}, 0)

Promise.resolve().then(() => {

console.log('promise 1')

}).then(() => {

console.log('promise 2')

})

this will log to the console:

script start

promise1

promise2

setInterval

setTimeout1

promise3

promise4

setInterval

setTimeout2

setInterval

promise5

promise6

According to the WHATVG specification, exactly one (macro)task should get processed from the macrotask queue in one cycle of the event loop. After said macrotask has finished, all of the available microtasks will be processed within the same cycle. While these microtasks are being processed, they can queue more microtasks, which will all be run one by one, until the microtask queue is exhausted.

This diagram tries to make the picture a bit clearer:

In our case:

Cycle 1:

`setInterval` is scheduled as task
`setTimeout 1` is scheduled as task
in `Promise.resolve 1` both `then`s are scheduled as microtasks
the stack is empty, microtasks are run

Task queue: setInterval, setTimeout 1

Cycle 2:

the microtask queue is empty, `setInteval`'s handler can be run, another `setInterval` is scheduled as a task, right behind `setTimeout 1`

Task queue: setTimeout 1, setInterval

Cycle 3:

the microtask queue is empty, `setTimeout 1`'s handler can be run, `promise 3` and `promise 4` are scheduled as microtasks,
handlers of `promise 3` and `promise 4` are run `setTimeout 2` is scheduled as task

Task queue: setInterval, setTimeout 2

Cycle 4:

the microtask queue is empty, `setInteval`'s handler can be run, another `setInterval` is scheduled as a task, right behind `setTimeout`

Task queue: setTimeout 2, setInteval

`setTimeout 2`'s handler run, `promise 5` and `promise 6` are scheduled as microtasks

Now handlers of promise 5 and promise 6 should be run clearing our interval, but for some strange reason setInterval is run again. However, if you run this code in Chrome, you will get the expected behavior.

We can fix this in Node too with process.nextTick and some mind-boggling callback hell.

console.log('script start')

const interval = setInterval(() => {

console.log('setInterval')

}, 0)

setTimeout(() => {

console.log('setTimeout 1')

process.nextTick(() => {

console.log('nextTick 3')

process.nextTick(() => {

console.log('nextTick 4')

setTimeout(() => {

console.log('setTimeout 2')

process.nextTick(() => {

console.log('nextTick 5')

process.nextTick(() => {

console.log('nextTick 6')

clearInterval(interval)

})

}, 0)

})

process.nextTick(() => {

console.log('nextTick 1')

process.nextTick(() => {

console.log('nextTick 2')

})

This is the exact same logic as our beloved promises use, only a little bit more hideous. At least it gets the job done the way we expected.

Tame the async beast!

As we saw, we need to manage and pay attention to both task queues, and to the event loop when we write an app in Node.js - in case we wish to leverage all its power, and if we want to keep our long running tasks from blocking the main thread.

The event loop might be a slippery concept to grasp at first, but once you get the hang of it, you won't be able to imagine that there is life without it. The continuation passing style that can lead to a callback hell might look ugly, but we have Promises, and soon we will have async-await in our hands... and while we are (a)waiting, you can simulate async-await using co and/or koa.

One last parting advice:

Knowing how Node.js and V8 handles long running executions, you can start using it for your own good. You might have heard before that you should send your long running loops to the task queue. You can do it by hand or make use of async.js.

Happy coding!

If you have any questions or thoughts, share them in the comments, I’ll be there! The next part of the Node.js at Scale series is discussing the Garbage Collection in Node.js, I recommend to check it out!

Tamas Kadlecsik

Full-Stack Developer at RisingStack.