What Is Async Anyway


While going through my article backlog, “A tale of event loops” dragged me once again into the world of asynchronous programming.

There are already many great resources explaining asynchronous programming out there. Lukasz Langa’s wonderful video series covers both the basics and the internals of the implementation in Python. Complement it with the Trio tutorial and you are practically ready to go. Afterwards, you might enjoy the coloring parable to come full circle and declare threads to be superior.

My goal is to gradually build a simplified mental model to make asynchronous thinking more intuitive. I hope it helps you develop an even firmer grasp of the concept you are already familiar with. Although the examples are in Python, some of these ideas should carry over to other contexts.


Necessary Building Block

The essential piece for asynchronicity in programming is the ability to suspend and resume code execution. In the Python world, meet coroutines:

async def coro(name):
    print(f"Hi, {name}, from coroutine!")

Being familiar with asyncio, you might be tempted to bring in run_until_complete to execute it. But given the simplicity of our coro, the following is more than enough to give us the same result:

def run(coro):
    try:
        # here we resume coroutine to be executed
        return coro.send(None)
    except StopIteration as e:
        return e.value

# note that the coroutine is created suspended
c = coro("you")
run(c)

How come we can run an asynchronous function without an event loop? Is the event loop doing more than just calling .send on the coroutine?1

Take a look at a slightly more involved asynchronous addition:

async def add(numbers):
    async def _dissolve(number):
        return [1 for _ in range(number)]

    async def _cardinality(elements):
        return len(elements)

    count = 0

    for n in numbers:
        dissolved = await _dissolve(n)
        count += await _cardinality(dissolved)

    print(f"Sum of {numbers} is {count}")

Will this work with run as defined above? Or do we already need to invoke the loop? Why? What if we rewrite the loop part like this:

    for n in numbers:
        dissolved = run(_dissolve(n))
        count += run(_cardinality(dissolved))

Does the code look more familiar with await gone? The above snippet essentially inlines the syntax sugar. We are able to run the computation declared with async def, yet we completely ignore asyncio or anything vaguely related to it. We are basically writing good ol’ fashioned synchronous code, just in a very convoluted way. Ouch.
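To convince yourself, here is a minimal self-contained check that run alone drives the whole computation, with add slightly modified (my change) to return the count so we can assert on it:

```python
async def add(numbers):
    # same as above, but returning the count instead of printing it
    async def _dissolve(number):
        return [1 for _ in range(number)]

    async def _cardinality(elements):
        return len(elements)

    count = 0
    for n in numbers:
        dissolved = await _dissolve(n)
        count += await _cardinality(dissolved)
    return count


def run(coro):
    try:
        return coro.send(None)
    except StopIteration as e:
        return e.value


# a single .send drives the whole chain of awaits to completion,
# because none of the inner coroutines ever yields control
assert run(add([1, 2, 3])) == 6
```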

To observe consecutive resumptions, we need to bring yield into play:

from types import coroutine


@coroutine
def really_just_coro():
    yield 1
    yield 1
    return 2

Luckily, it behaves similarly to the coroutines defined using async def:

c = really_just_coro()
r1 = run(c)
r2 = run(c)
r3 = run(c)

assert r1 + r2 == r3

However, our naive run implementation quickly reaches its limits, since fully exhausting the coroutine requires manually invoking it multiple times.
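A tiny helper (the name run_to_completion is mine) fixes this by resuming the coroutine until it is exhausted, discarding whatever it yields along the way:

```python
from types import coroutine


def run_to_completion(coro):
    # keep calling .send until the coroutine raises StopIteration,
    # ignoring the values it yields at each suspension point
    while True:
        try:
            coro.send(None)
        except StopIteration as e:
            return e.value


@coroutine
def really_just_coro():
    yield 1
    yield 1
    return 2


assert run_to_completion(really_just_coro()) == 2
```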

While coroutines are a necessary building block, they do not give us any asynchronicity out of the box.

Coroutine in a nutshell

All You Need Is Runtime

Assuming you’ve already wrapped your head around the asynchronous paradigm, there’s a good chance someone’s asked you to roll your own naive event loop. If not, let’s fix that now—building one from scratch is a great way to understand the internals.

Our goal is to implement a function:

def loop_run_until_complete(coro):
    ???

which takes a coroutine and runs it to completion. To test that the implementation is working, we will use this program:

async def coro():
    await spawn(say("Hi"))
    await spawn(say("you"))
    print("!")


async def say(word):
    print(word)


@coroutine
def spawn(coro):
    yield coro

Once it is running, we should get:

❯ python loop_run_until_complete.py
Hi
you
!

Unless you’re reading this on your phone, stop right now and just try to write it down. Set a timer for 5 minutes, then give it your all to implement loop_run_until_complete.

If you get stuck, try these prompts:

  • the original run function is not sufficient to inline await spawn(say("Hi")), as spawn’s return value is a new say coroutine. Thus we need something to .send on the newly spawned coroutines too
  • until now, our coroutines returned a value only when exhausted (i.e. StopIteration raised). spawn is different, as it can yield multiple values before it’s done
  • think in terms of two distinct APIs: one between you, the developer, and loop_run_until_complete, and a second between loop_run_until_complete and the coroutines

Let’s take a look at a working example:

from inspect import iscoroutine


def loop_run_until_complete(coro):
    coros = [coro]
    while coros:
        queue, coros = coros, []
        for c in queue:
            try:
                # advance the coro by one step
                d = c.send(None)
            except StopIteration:
                # coro is exhausted, nothing more to do with `c`
                pass
            else:
                # schedule child first for the next loop round
                # as such an implementation gives us a warm but
                # false feeling of determinism
                if iscoroutine(d):
                    coros.append(d)
                # as c.send did not raise StopIteration,
                # current task is not exhausted and
                # has to be rescheduled
                coros.append(c)

Not counting comments, it’s only a dozen lines of code—yet there’s a ton to unpack.

One issue you might have encountered is consistently getting this output:

❯ python loop_run_until_complete.py
Hi
!
you

Our toy implementation doesn’t yet support anything that can shuffle the execution order of coroutines. Such a mess will only arrive once we introduce a mechanism, like sleep, with the ability to reorder coroutines in the queue. Then you’ll see that expecting a predictable output when using spawn is, to put it mildly, a bit hopeful.

Another takeaway is that yield is the only place where a coroutine gives up control and lets the loop decide what happens next. That’s where the coroutine cooperation happens. await statements themselves do not interact with the loop. Consider the following example:

async def coro():
    await spawn(say("you"))
    await say("Hi")
    await say("!")

You might try shuffling the lines in loop_run_until_complete up and down, but you won’t be able to squeeze anything between the Hi and ! prints—they’re just good ol’ synchronous code running back-to-back.

In our current implementation, reading an async def full of await is actually the same as going over plain sequential synchronous code. Unfortunately, we will later see that the validity of such an assumption depends on the runtime we use.

The tricky part of async is keeping track of what’s happening at any given moment, and it only gets tougher when every await becomes a potential forking point.

Lastly, the point you’ve probably already guessed: there’s no real asynchronicity without a runtime framework like loop_run_until_complete driving it. You need both cooperative coroutines and a runtime to orchestrate them.

Language designers can make our life easier with features like async/await but it’s the runtime—and the yield-based machinery—that actually delivers the async magic. And it’s the runtime designers who decide which APIs we, the developers, get to use.

Next, we extend our runtime API with a sleep feature to illustrate the source of non-determinism in async code.

Async runtime in a nutshell

Sleep

As before, give it a shot and implement it on your own. Being in charge of our runtime design, we can decide how sleep is implemented internally. The goal is to expose an async def sleep which behaves like the standard time.sleep, but while a coroutine is sleeping, pending coroutines in the queue can continue spinning.

Test your implementation against the following code:

async def coro():
    # note the order in which the prints are declared
    await spawn(say_after("you", 2))
    await spawn(say_after("!", 3))
    await spawn(say_after("Hi", 1))


async def say_after(word, seconds):
    await sleep(seconds)
    print(word)

again expecting:

❯ python loop_with_sleep.py
Hi
you
!

Sadly, even a naive implementation already gets fairly involved:

from types import coroutine
import time


def loop_run_until_complete(coro):
    coros = [coro]
    # bench for coros waiting for their time to run
    waiting = []
    # yes, you are right - the CPU can be unnecessarily spinning
    # if there are `waiting` coros but none of them are ready
    # try to implement a more eco-friendly version as an exercise
    while coros or waiting:
        queue, coros = coros, []
        ready, waiting = _get_ready_coros(waiting, time.time())
        # prepend or append is just an implementation detail,
        # not a proper coordination mechanism; it's best to
        # consider coroutines to be independent of each other
        queue = ready + queue
        for c in queue:
            try:
                d = c.send(None)
            except StopIteration:
                pass
            else:
                # the runtime internal API is extended:
                # we now support a multitude of "yield commands",
                # each with their own payload
                match d:
                    case None:
                        coros.append(c)
                    case ("spawn", child_coro):
                        coros.append(child_coro)
                        coros.append(c)
                    case ("sleep", seconds):
                        scheduled_time = time.time() + seconds
                        waiting.append((c, scheduled_time))


def _get_ready_coros(waiting, now):
    # figure out which waiting coros are ready to run
    ready, still_waiting = [], []
    for coro, scheduled_time in waiting:
        # this tells the story of why `await sleep(seconds)`
        # does not guarantee the coroutine is resumed
        # after exactly `seconds`
        if scheduled_time < now:
            ready.append(coro)
        else:
            still_waiting.append((coro, scheduled_time))

    return ready, still_waiting


@coroutine
def spawn(coro):
    yield ("spawn", coro)


@coroutine
def sleep(seconds):
    yield ("sleep", seconds)

A few takeaways:

  • the runtime’s internal yield API is extended. Coroutines now yield structured objects communicating to the runtime what should happen at the point of cooperation
  • the cooperating coroutines must be compatible with the runtime API. The previous spawn version would not work with the current implementation 2
  • sleep rips apart our previous ability to read and predict the order of spawned coroutine execution. The link between the order in which code is declared and the order in which it actually runs is gone
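As an aside, the busy spin flagged in the comments has a straightforward remedy: when no coroutine is runnable, sleep until the earliest deadline. A minimal sketch (the function name is mine), operating on the coros list and the waiting list of (coro, scheduled_time) pairs from above:

```python
import time


def idle_until_next_deadline(coros, waiting):
    # meant to be called at the top of the while-loop: if nothing is
    # runnable, block until the earliest scheduled wake-up instead of
    # spinning through the loop burning CPU
    if not coros and waiting:
        next_deadline = min(t for _, t in waiting)
        pause = next_deadline - time.time()
        if pause > 0:
            time.sleep(pause)
```

The waiting list could also be kept as a heap (heapq) ordered by scheduled_time, which is roughly what production event loops do with their timer queues.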

The point about lost ordering is best understood with the following snippet:

async def coro():
    await spawn(ping_db())
    await spawn(ping_url())


async def ping_db():
    await sleep(db_ping_duration)
    print("DB pinged")


async def ping_url():
    await sleep(url_ping_duration)
    print("URL pinged")

The actual execution order hinges entirely on db_ping_duration and url_ping_duration. Without peeking at how the coroutines scheduled with spawn are implemented, there is suddenly no way to predict what will happen at runtime. They become independent execution units.

See if you can predict what this snippet will print:

async def coro():
    await ping_db()
    await ping_url()

Hopefully the hammering of the same point paid off—no matter the values of db_ping_duration and url_ping_duration, ping_db always runs first. That’s because, from our runtime’s perspective, there is just a single coro coroutine, as there is no spawn call. Real asynchronicity only shows up once you schedule multiple coroutines and let the runtime interleave them.
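You can check the answer against real asyncio as well. A small self-contained sketch with made-up sleep durations:

```python
import asyncio

order = []


async def ping_db():
    await asyncio.sleep(0.05)  # deliberately the longer sleep
    order.append("DB")


async def ping_url():
    await asyncio.sleep(0.01)
    order.append("URL")


async def coro():
    # no spawn/create_task anywhere: plain awaits run strictly in order
    await ping_db()
    await ping_url()


asyncio.run(coro())
assert order == ["DB", "URL"]  # DB first, despite sleeping longer
```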

Which finally brings us to the real culprit behind async pain—the scheduling API exposed by the runtime.

The runtime API surface can be quite rich

Managing The Forking Points

While the runtime we’ve built so far has its limitations3, it nicely reveals the potential hazards of the design. When you look closer at the spawn implementation, it becomes clear that every call to it effectively modifies the global queue variable. So the runtime state can be modified from anywhere in the code. Let me assume I don’t have to sell you on why this might cause trouble.

Thus every await potentially represents a fork in the execution tree, as spawn might be called internally. Up until that call, you can usually read the code as a straightforward, sequential flow. After it, though, you have no idea what’s in the queue or how many coroutines are running.

Don’t get me wrong, we obviously need a way to run multiple coroutines at once, or else the whole async premise goes out of the window. But with global access, it’s all on the developer not to mess up and to keep the app well-structured.

Consider the following program:

async def main():
    t1 = asyncio.create_task(fetch_and_store())
    # inform the server we are alive
    t2 = asyncio.create_task(ping_pong_client())

    # manage the running tasks
    await t1
    await t2


async def fetch_and_store():
    while True:
        data = await read_data()
        t = asyncio.create_task(store_data(data))
        # what to do with `t`?

Common answers are:

  • await t
  • do not handle t at all

Which one do you prefer? Can you think about what might be the problem?

First, a few assumptions about its intent. Presumably, the read_data function’s job is to fetch incoming data—say, from an HTTP request or a WebSocket—and then persist it, perhaps into a database. And since we’re using async here, we are likely trying to squeeze out every drop of throughput.

When we await t, we are essentially giving up on the concurrency benefits. As the examples in previous sections showed us, the call to create_task is useless and the implementation is basically the same as:

async def fetch_and_store():
    while True:
        data = await read_data()
        await store_data(data)

Calling store_data simply blocks, and fetch_and_store executes sequentially.

Well, surely you are now convinced that we just should not handle t. By calling create_task, we append a new coroutine for the runtime to execute; the call itself is non-blocking, so we can right away call read_data again. The execution scales.

Can you think of what can go wrong?


Arguably, the second approach is worse than the first one. In the first case, the program is at least doing what we would expect, even though it might be slow. There is so much that can go wrong in the second version. Let’s go over some of the failure modes one by one. Guess the runtime output before executing the code.

Losing data and not knowing about it is easy

async def main():
    t = asyncio.create_task(store_data())
    await t


async def store_data():
    async def read_data():
        return 1

    while True:
        data = await read_data()
        asyncio.create_task(buggy_store(data))
        await asyncio.sleep(1)


async def buggy_store(data):
    raise ValueError("There is something wrong with me")

The program will just keep running without saving any data. If you’re lucky enough to have error-log monitoring, you’ll get an alert. If not, you’ll blissfully cruise along until that mysterious “missing client data on production” bug lands in your inbox.
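One partial mitigation (a sketch, not the only option) is to attach a done callback so the exception at least surfaces somewhere instead of vanishing with the orphaned task:

```python
import asyncio

crashes = []


def report_crash(task):
    # surface swallowed exceptions instead of losing them silently;
    # in production you would log or re-raise here
    if not task.cancelled() and task.exception() is not None:
        crashes.append(task.exception())


async def buggy_store(data):
    raise ValueError("There is something wrong with me")


async def main():
    t = asyncio.create_task(buggy_store(42))
    t.add_done_callback(report_crash)
    await asyncio.sleep(0.01)  # give the orphaned task a chance to run


asyncio.run(main())
assert isinstance(crashes[0], ValueError)
```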

Or an even easier way to lose the data:

async def main():
    t = asyncio.create_task(store_data())
    await t
    print("Successfully finished! :crossed-fingers:")
    print(f"Psst: {asyncio.all_tasks()}")


async def store_data():
    async def read_data(count):
        return count + 1

    for count in range(5):
        data = await read_data(count)
        asyncio.create_task(store(data))


async def store(data):
    await asyncio.sleep(1)

Because the store calls run independently of main, the await t returns almost immediately. With nothing else holding the entry point open, the interpreter just shuts down. It doesn’t even know about those scheduled coroutines—they’re just memory chunks to it.

Only the runtime framework tracks them, and we never told it to manage them. It’d be great if we could surface every spawned coroutine up the stack so they can be dealt with in one place.

It’s also a good example of how the responsibilities are split between the developer, the runtime framework, and the language.

Or chasing the mysterious production bugs

Coming back to the original snippet—imagine a traffic spike, so that new store_data coroutines are created non-stop.

Suddenly your database connection pool maxes out and writes start failing. Or there is a sudden spike in storage duration even though the database looks healthy, because the coroutines are waiting for execution in an ever-growing queue as the runtime overhead balloons.

Such bugs are a nightmare to debug. To reproduce them, you’d have to both mimic production traffic patterns and watch the queue size at the same time. And they’re really hard to spot, as you have to follow every await all the way through.


I’m actually not trying to slam asyncio itself. These issues can usually be fixed by adding communication primitives like queues, or by ditching global spawning in favor of TaskGroup.
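For instance, the fetch_and_store loop can be restructured around a bounded asyncio.Queue and a fixed pool of consumers. A sketch under assumed interfaces (read_data as an async-iterator factory, store_data as a coroutine function—both hypothetical stand-ins for the original snippet’s helpers):

```python
import asyncio

STOP = object()  # sentinel telling consumers to shut down


async def fetch_and_store(read_data, store_data, workers=3, queue_size=10):
    # the bounded queue applies backpressure: the producer pauses when
    # storage cannot keep up, instead of spawning unbounded tasks
    queue = asyncio.Queue(maxsize=queue_size)

    async def producer():
        async for data in read_data():
            await queue.put(data)  # blocks while the queue is full
        for _ in range(workers):
            await queue.put(STOP)

    async def consumer():
        while (data := await queue.get()) is not STOP:
            await store_data(data)

    # every coroutine is awaited in one place, so crashes propagate here
    await asyncio.gather(producer(), *(consumer() for _ in range(workers)))
```

Nothing is fire-and-forget anymore: the number of in-flight store operations is capped by the worker count, and a failing store_data surfaces in fetch_and_store instead of being silently dropped.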

Using TaskGroup, originally introduced as the Trio nursery, gives you one more huge benefit:

async def main():
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(tricky_coro(tg, ...))
        task2 = tg.create_task(plain_coro(...))

By explicitly passing the tg factory to the coroutine, we can easily separate the shady ones spawning new coroutines on the fly from the boring rest that execute sequentially internally. trio just goes a step further by deliberately offering a more constrained API compared to asyncio.

Lastly, keep in mind this isn’t an async/await-only problem. You’ll inevitably encounter the same pitfalls in implementations using effect systems that let you tweak global state.

Pretend it is almost comprehensible

Wrapping Up

Coroutines give us cooperative multitasking by letting us pause and resume execution at will. Features like async and await make writing and reading asynchronous code a lot smoother, but they don’t actually do the concurrency themselves—that’s the job of the runtime.

In other words, the runtime’s API defines how concurrency works in your program. And exposing global scheduling controls makes writing and understanding your code much harder.

Footnotes

  1. There’s a lot we’ll deliberately omit for the sake of exposition. And this post already offers an exceptional, in-depth explanation of the more complex reality

  2. That’s why different runtimes require specialized tools to implement the same concept

  3. You may have noticed that we are not able to retrieve the return value of spawned coroutines