I've heard tell of some confusion lately around what the term "non-blocking" means. This isn't the first time I've tried to explain it, and it certainly won't be the last, but blogging is easier than the job Sisyphus got, so I can't complain.
A thread is blocking when it is performing an input or output operation that may take an unknown amount of time. Crucially, a blocking thread is doing no useful work. It is stuck, consuming resources - in particular, its thread stack, and its process table entry. It is sucking up resources and getting nothing done. These are resources that one can most definitely run out of, and are in fact artificially limited on most operating systems, because if one has too many of them, the system bogs down and becomes unusable.
A thread may also be "stuck" doing some computationally intensive work; performing a complex computation, and sucking up CPU cycles. There is a very important distinction here, though. If that thread is burning up CPU, it is getting work done. It is computing. This is why we have computers: to compute things.
It is of course possible for a program to have a bug that sends it into an infinite loop, or that otherwise burns CPU without getting anything useful done for the user; in that case the program is just buggy or inefficient. Such a program is not blocking, though: it might be "thrashing" or "stuck" or "broken", but "blocking" means something more specific: that the program is sitting around, doing none of its own work, while it waits for some other thing to get work done.
A program written in an event-driven style may be busy as long as it needs to be, but that does not mean it is blocking. Hence, event-driven and non-blocking are synonyms.
Furthermore, non-blocking doesn't necessarily mean single-process. Twisted is non-blocking, for example, but it has a sophisticated facility for starting, controlling and stopping other processes. Information about changes to those processes is represented as plain old events, making it reasonably easy to fold the results of computation in another process back into the main one.
If you need to perform a lengthy computation in an event-driven program, that does not mean you need to stop the world in order to do it. It doesn't mean that you need to give up on the relatively simple execution model of an event loop for a mess of threads, either. Just ask another process to do the work, and handle the result of that work as just another event.
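As a rough illustration of that last point, here is a minimal sketch using Python's standard asyncio module rather than Twisted, purely so it runs without extra dependencies. The names CHILD_CODE, compute_in_child and heartbeat are made up for this example, and the "lengthy computation" is just a stand-in sum. The child process does the work; its finished output arrives in the event loop as one more event, and the loop keeps servicing a timer in the meantime.

```python
import asyncio
import sys

# Hypothetical stand-in for a lengthy computation, run in a child interpreter.
CHILD_CODE = "print(sum(i * i for i in range(10**6)))"


async def compute_in_child() -> int:
    # Start another process; the event loop is free while the child works.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE,
    )
    # The child's output and exit are delivered to the loop as ordinary events.
    out, _ = await proc.communicate()
    return int(out.decode().strip())


async def heartbeat(ticks: list) -> None:
    # Something else for the loop to do while the child is computing.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    result = await compute_in_child()
    beat.cancel()
    return result, len(ticks)


def run_demo():
    return asyncio.run(main())
```

Calling run_demo() returns the computed sum along with the number of heartbeat ticks the loop managed to service while the child was busy.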
9 comments:
You pose an interesting question.
There are many people who use the term blocking for CPU-using processes too, and for anything that blocks execution. Words take on multiple meanings, and I believe running can also mean blocking.
Memory IO through a VM is another interesting case, in that big memory moves are not instant and can in fact block for a significant amount of time. Memory could be a disk drive, data coming in over a network, it could be compressed, or encrypted, it could be a kilometer away over a PCIe bus. It could be in L1, L2, or L3 cache. It could be addressed on the same CPU/core, or on a different core.
Since memory can be backed by network or file I/O in a modern VM (one from the 90s onwards), any memory read or write can block.
Isn't doing a long-running calculation one of the legitimate use cases for a worker thread in an event-driven application?
The event loop really shouldn't care if it was another process or a worker thread that did the calculation...
Minor quibble: sleep(60) is blocking, but does not take an "unknown amount of time".
Your definitions are quite wrong and honestly pretty problematic.
Blocking, as it pertains to I/O operations, means that the operating system stops a thread of execution (TOE: a process, thread, fiber, etc.) when it cannot fulfill an I/O request immediately[1], resuming it after the I/O operation is complete. Non-blocking means that the OS:
* Indicates the I/O operation cannot be completed at this time and the TOE should try again later, or
* Internally queues the operation for future execution and then notifies completion in some fashion, or
* Some combination of the two.
This allows the TOE to busy itself with other operations while it waits for the I/O operation to complete or to become possible.
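The first of those behaviours ("try again later") can be seen with a pair of connected sockets. This is only a small sketch using Python's standard socket and select modules; the function name demo_try_again_later is invented for the example.

```python
import select
import socket


def demo_try_again_later():
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        try:
            # Nothing has been sent yet, so the OS refuses instead of sleeping.
            a.recv(1024)
            would_block = False
        except BlockingIOError:
            would_block = True

        b.sendall(b"ping")
        # Ask the OS when the read has become possible, waiting at most 1 second.
        readable, _, _ = select.select([a], [], [], 1.0)
        data = a.recv(1024) if readable else b""
        return would_block, data
    finally:
        a.close()
        b.close()
```

The first recv() fails immediately rather than stopping the caller. The read only succeeds once the OS reports the socket as readable.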
In both blocking and non-blocking I/O, the operation may take an indeterminate amount of time. The amount of time it takes to complete an I/O operation has little to do with the definition. Nor does the fact that the TOE is sucking up resources while blocking: it sucks up those resources merely by existing. The only way to get it to not consume resources is to destroy it, at which point blocking v. non-blocking becomes moot.
The blocking vs. non-blocking concept occurs in other contexts as well. Most locking constructs, like mutexes and semaphores, support both blocking waits and timed (non-blocking) waits. As a result, UNIX developers (in particular) tend to consider blocking synonymous with 'sleeping': scheduler states where the TOE is not running.
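For the locking case, Python's threading.Lock happens to expose the variants side by side, so a short sketch may help. The function name try_lock_variants is made up, and the lock is deliberately held by the same thread so that the later attempts cannot succeed.

```python
import threading


def try_lock_variants():
    lock = threading.Lock()
    lock.acquire()  # plain blocking acquire; succeeds at once because the lock is free

    # Non-blocking attempt: returns False immediately instead of sleeping.
    got_immediately = lock.acquire(blocking=False)

    # Timed attempt: sleeps for at most 50 ms, then gives up and returns False.
    got_with_timeout = lock.acquire(timeout=0.05)

    lock.release()
    # Now that the lock is free, the non-blocking attempt succeeds.
    got_when_free = lock.acquire(blocking=False)
    lock.release()
    return got_immediately, got_with_timeout, got_when_free
```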
Based on all of this, we can generalize blocking in this sense to something like: "When a piece of code waits for a resource to become available instead of continuing onward with other processing".
In practical terms, blocking means: "when your code waits for a resource to become available" and non-blocking means: "some other code waits for a resource to become available and then notifies your code".
In particular, this means that event-driven and non-blocking are not synonymous in the least. Event-driven programs block all of the time, on purpose, typically to wait for the event loop. Despite your statement, this is very desirable[2]: we'd much rather have the CPU resources made available for other programs than endlessly wasted by an application asking, "Are there any events ready yet?" This is, in fact, why blocking operations are the default on most operating systems.
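A small sketch of that deliberate blocking, using Python's standard selectors module as a stand-in for an event loop's wait step. The function name wait_for_events is invented here, and the socket pair exists only so the selector has something to watch. The call sleeps in the OS until an event or the timeout arrives, using almost no CPU while it waits.

```python
import selectors
import socket
import time


def wait_for_events(timeout: float):
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    try:
        wall_start = time.monotonic()
        cpu_start = time.process_time()
        # Blocks here: the thread sleeps until `a` is readable or the timeout expires.
        events = sel.select(timeout)
        wall = time.monotonic() - wall_start
        cpu = time.process_time() - cpu_start
        return events, wall, cpu
    finally:
        sel.unregister(a)
        sel.close()
        a.close()
        b.close()
```

With nothing ever written to the socket, wait_for_events(0.2) returns an empty event list after roughly the full timeout of wall-clock time, with only a sliver of CPU time spent.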
Frequently, a second type of blocking is performed as well: outstanding I/O operations are serviced with limited concurrency. If there are N I/O operations and M workers (N > M) to service them, then some of the I/O operations will be blocked waiting for a worker to become available[3]. This is common because it may only be logically possible to service one operation at a time (e.g., GUI events) or due to other resource constraints (e.g., available CPU or memory).
Win32 and X11 both are event-driven in nature, but I/O under both is still blocking by default. This is how GUI applications get the "grey screen of death": they perform a lengthy computation or a different blocking operation (e.g., I/O to a file) on whatever thread of execution is responsible for servicing the event loop, preventing events from being serviced. In this case, we can say the code responsible for accepting events is blocked by the code responsible for responding to events, regardless of program structure.
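The effect described here, a handler occupying the thread that services the event loop, can be reproduced without any GUI. The following is a sketch with Python's asyncio, not Win32 or X11; measure_lateness and the busy loop are illustrative stand-ins for a real event handler.

```python
import asyncio
import time


async def measure_lateness(handler_seconds: float) -> float:
    loop = asyncio.get_running_loop()
    fired = loop.create_future()
    due = 0.01
    start = loop.time()
    # An "event" that should be serviced 10 ms from now.
    loop.call_later(due, lambda: fired.set_result(loop.time()))

    # Stand-in for a lengthy computation running on the event loop's own thread.
    end = time.monotonic() + handler_seconds
    while time.monotonic() < end:
        pass

    # Only now does control return to the loop, so only now can the event fire.
    fired_at = await fired
    return fired_at - start - due


def run_lateness_demo(handler_seconds: float) -> float:
    return asyncio.run(measure_lateness(handler_seconds))
```

With a 0.2 second handler the timer fires about 0.19 seconds late, even though no I/O call was involved. That delay is what a user sees as an unresponsive window.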
In fact, such behavior is the default in virtually all event-driven APIs, because asynchronous computation is extremely difficult. You are correct that these APIs provide mechanisms for performing asynchronous computation, but the programmer must explicitly leverage such functionality.
Making I/O asynchronous by default does nothing to help with the asynchronous computation problem. If you attempt to schedule multiple I/O operations onto the same TOE, then your program will not service I/O in a timely fashion if computing a response takes too long. At the end of the day, this is what we care about.
[1] "Immediate" has widely varying definitions depending on the situation and may include blocking temporarily.
[2] The notion that waiting is somehow not useful work is absurd.
[3] It's worth noting that people have solved C10K using one thread per connection in Java on Linux. One thread per connection is a fine model these days provided your programming language supports true concurrent execution of multiple threads.
@Nick, absolutely. What I was trying to say was "use worker threads for doing work, not a thread for each I/O operation".
@Andrew, you've got me there! I'm going to have to think about how to modify that statement, especially since other blocking calls (select() comes to mind) can specify a maximum amount of time...
@Adam, my definition of "blocking" is derived from the Open Group standard, which is a good place for definitions of terms, especially in a UNIX-ish context. I realize that this short post doesn't explain all the nuances of event-driven systems and non-blocking I/O, but your criticism has a fair number of problems as well. For example, it's not really meaningful to say that the I/O under X11 is "blocking by default". That's also not why applications get the grey screen of death: they can either be stalled on computation or blocking on I/O, and any failure to respond promptly to events causes that behavior.
Perhaps you should write your own post, laying out your own definitions? A comment here just linking to that would give you more room to express yourself.
Your definition of blocking does not match the Open Group's definition in the least, but you shouldn't use their definition anyway as they do not use it consistently nor correctly throughout the SUS.
How exactly is it not fair to say that I/O under X11 is blocking by default? It's the only I/O mode supported by XLib. Requests are synchronous in nature and communication does not proceed until the X server replies. XCB admittedly goes a long way to changing this, but most (all?) high-level toolkits still inherit XLib's limitations. All other I/O on UNIX is blocking by default, so I fail to see how what I said is unreasonable.
Also, how can I be wrong about why applications see a gray screen of death when your reason is exactly the same as mine[1]? Plus, you claim I need to provide definitions when I actually did so, so I'm not very confident you read the entirety of my post. Please go back and read it carefully.
[1] Though I don't understand your obsession with computational "stalls" whatever those might be.
@Adam,
> How exactly is it not fair to say that I/O under X11 is blocking by default?
Because X11 is not a thing you have "I/O" under, really. You do have I/O to the X server, but that's a bit of a special case, and not the cause of the "grey screen of death", unless you're talking to a remote X application, which is highly unusual. Using a library like glib, for example, with GTK+, I/O to systems other than the X server itself is non-blocking by convention, except in buggy applications. And, in fact, most calls to Xlib by high-level toolkits are carefully invoked when the XDisplay file descriptor is accepting of events (by checking first with a blocking multiplexor like select() or poll()), so although it's dangerous to make the file descriptor itself nonblocking, the I/O operations to it usually don't block anyway.
> Also, how can I be wrong about why applications see a gray screen of death when your reason is exactly the same as mine
Because I actually gave a different reason than you did: "failing to respond promptly to events" isn't "blocking". (And in fact, a process may block frequently but for little time, and thereby respond to events promptly anyway.) That is rather the point of the post here :). If you want to use your own definitions of terms, you're welcome to get your own blog to explain them!
The X server delivers events. In return, I send requests to the X server. How can that possibly be anything other than I/O? Again, combined with the fact that all other I/O is blocking by default how is my claim invalid?
Talking to X server is not a special case (how can it be?), and failure to service the event loop is the cause of gray screens of death. Local or remote has nothing to do with it.
I/O channels in GLib are not asynchronous by default, you must request the behavior somehow: either by setting the flag or implicitly by watching in an event loop.
> And, in fact, most calls to Xlib by high-level toolkits are carefully invoked when the XDisplay file descriptor is accepting of events (by checking first with a blocking multiplexor like select() or poll()),
No, many high-level toolkits and applications dedicate a thread for GUI event handling, avoiding the issue entirely. Such a technique is infinitely more portable and considerably safer anyway. And no, the synchronous nature of XLib means there's still plenty of blocking involved.
> Because I actually gave a different reason than you did: "failing to respond promptly to events" isn't "blocking".
Yes it is. The event is waiting to be serviced, ergo it is blocked. Your process is blocking the event from being serviced. The fact that it's running in some computational loop instead of sleeping in an I/O call doesn't change the fact it's blocking the event from being serviced. A process sleeping in read() is still blocked on I/O regardless of what the kernel is doing. How this situation is any different is beyond me.
> (And in fact, a process may block frequently but for little time, and thereby respond to events promptly anyway.) That is rather the point of the post here :).
I find that rather hard to believe since you quite clearly said that blocking is bad--that blocking is useless.
> If you want to use your own definitions of terms, you're welcome to get your own blog to explain them!
My definitions do not become more valid by suddenly appearing on my own blog, and I don't know why you would persist in believing such absurd fiction.