Update the logcontext doc (#10353)

By referring to awaitables instead of deferreds.
2024-11-28 15:08:49 +03:00 · 2021-07-15 12:38:05 +02:00 · 2021-07-15 12:38:05 +02:00 · 5ecad4e7a5
commit 5ecad4e7a5
parent bf72d10dbf
2 changed files with 101 additions and 229 deletions
--- a/changelog.d/10353.doc
+++ b/changelog.d/10353.doc
@ -0,0 +1 @@
 Refresh the logcontext dev documentation.
--- a/docs/log_contexts.md
+++ b/docs/log_contexts.md
@ -14,12 +14,16 @@ The `synapse.logging.context` module provides a facilities for managing
 the current log context (as well as providing the `LoggingContextFilter`
 class).
-Deferreds make the whole thing complicated, so this document describes
+Asynchronous functions make the whole thing complicated, so this document describes
 how it all works, and how to write code which follows the rules.
-## Logcontexts without Deferreds
+In this document, "awaitable" refers to any object which can be `await`ed. In the context of
 Synapse, that normally means either a coroutine or a Twisted 
 [`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html).
-In the absence of any Deferred voodoo, things are simple enough. As with
+## Logcontexts without asynchronous code
 In the absence of any asynchronous voodoo, things are simple enough. As with
 any code of this nature, the rule is that our function should leave
 things as it found them:
@ -55,126 +59,109 @@ def do_request_handling():
    logger.debug("phew")
 ```
-## Using logcontexts with Deferreds
+## Using logcontexts with awaitables
-Deferreds --- and in particular, `defer.inlineCallbacks` --- break the
+Awaitables break the linear flow of code so that there is no longer a single entry point
-linear flow of code so that there is no longer a single entry point
+where we should set the logcontext and a single exit point where we should remove it.
 where we should set the logcontext and a single exit point where we
 should remove it.
 Consider the example above, where `do_request_handling` needs to do some
-blocking operation, and returns a deferred:
+blocking operation, and returns an awaitable:
 ```python
-@defer.inlineCallbacks
+async def handle_request(request_id):
 def handle_request(request_id):
    with context.LoggingContext() as request_context:
        request_context.request = request_id
-        yield do_request_handling()
+        await do_request_handling()
        logger.debug("finished")
 ```
 In the above flow:
 -   The logcontext is set
-   `do_request_handling` is called, and returns a deferred
+-   `do_request_handling` is called, and returns an awaitable
-   `handle_request` yields the deferred
+-   `handle_request` awaits the awaitable
-   The `inlineCallbacks` wrapper of `handle_request` returns a deferred
+-   Execution of `handle_request` is suspended
 So we have stopped processing the request (and will probably go on to
 start processing the next), without clearing the logcontext.
 To circumvent this problem, synapse code assumes that, wherever you have
-a deferred, you will want to yield on it. To that end, whereever
+an awaitable, you will want to `await` it. To that end, whereever
-functions return a deferred, we adopt the following conventions:
+functions return awaitables, we adopt the following conventions:
-**Rules for functions returning deferreds:**
+**Rules for functions returning awaitables:**
-> -   If the deferred is already complete, the function returns with the
+> -   If the awaitable is already complete, the function returns with the
 >     same logcontext it started with.
-> -   If the deferred is incomplete, the function clears the logcontext
+> -   If the awaitable is incomplete, the function clears the logcontext
->     before returning; when the deferred completes, it restores the
+>     before returning; when the awaitable completes, it restores the
 >     logcontext before running any callbacks.
 That sounds complicated, but actually it means a lot of code (including
 the example above) "just works". There are two cases:
-   If `do_request_handling` returns a completed deferred, then the
+-   If `do_request_handling` returns a completed awaitable, then the
    logcontext will still be in place. In this case, execution will
-    continue immediately after the `yield`; the "finished" line will
+    continue immediately after the `await`; the "finished" line will
    be logged against the right context, and the `with` block restores
    the original context before we return to the caller.
-   If the returned deferred is incomplete, `do_request_handling` clears
+-   If the returned awaitable is incomplete, `do_request_handling` clears
    the logcontext before returning. The logcontext is therefore clear
-    when `handle_request` yields the deferred. At that point, the
+    when `handle_request` `await`s the awaitable.
    `inlineCallbacks` wrapper adds a callback to the deferred, and
    returns another (incomplete) deferred to the caller, and it is safe
    to begin processing the next request.
-    Once `do_request_handling`'s deferred completes, it will reinstate
+    Once `do_request_handling`'s awaitable completes, it will reinstate
-    the logcontext, before running the callback added by the
+    the logcontext, before running the second half of `handle_request`,
-    `inlineCallbacks` wrapper. That callback runs the second half of
+    so again the "finished" line will be logged against the right context,
-    `handle_request`, so again the "finished" line will be logged
+    and the `with` block restores the original context.
    against the right context, and the `with` block restores the
    original context.
 As an aside, it's worth noting that `handle_request` follows our rules
-though that only matters if the caller has its own logcontext which it
+- though that only matters if the caller has its own logcontext which it
 cares about.
 The following sections describe pitfalls and helpful patterns when
 implementing these rules.
-Always yield your deferreds
+Always await your awaitables
---------------------------
+----------------------------
-Whenever you get a deferred back from a function, you should `yield` on
+Whenever you get an awaitable back from a function, you should `await` on
-it as soon as possible. (Returning it directly to your caller is ok too,
+it as soon as possible. Do not pass go; do not do any logging; do not
-if you're not doing `inlineCallbacks`.) Do not pass go; do not do any
+call any other functions.
 logging; do not call any other functions.
 ```python
-@defer.inlineCallbacks
+async def fun():
 def fun():
    logger.debug("starting")
-    yield do_some_stuff()       # just like this
+    await do_some_stuff()       # just like this
-    d = more_stuff()
+    coro = more_stuff()
-    result = yield d            # also fine, of course
+    result = await coro         # also fine, of course
    return result
 def nonInlineCallbacksFun():
    logger.debug("just a wrapper really")
    return do_some_stuff()      # this is ok too - the caller will yield on
                                # it anyway.
 ```
 Provided this pattern is followed all the way back up to the callchain
 to where the logcontext was set, this will make things work out ok:
 provided `do_some_stuff` and `more_stuff` follow the rules above, then
-so will `fun` (as wrapped by `inlineCallbacks`) and
+so will `fun`.
 `nonInlineCallbacksFun`.
-It's all too easy to forget to `yield`: for instance if we forgot that
+It's all too easy to forget to `await`: for instance if we forgot that
-`do_some_stuff` returned a deferred, we might plough on regardless. This
+`do_some_stuff` returned an awaitable, we might plough on regardless. This
 leads to a mess; it will probably work itself out eventually, but not
 before a load of stuff has been logged against the wrong context.
 (Normally, other things will break, more obviously, if you forget to
-`yield`, so this tends not to be a major problem in practice.)
+`await`, so this tends not to be a major problem in practice.)
 Of course sometimes you need to do something a bit fancier with your
-Deferreds - not all code follows the linear A-then-B-then-C pattern.
+awaitable - not all code follows the linear A-then-B-then-C pattern.
 Notes on implementing more complex patterns are in later sections.
-## Where you create a new Deferred, make it follow the rules
+## Where you create a new awaitable, make it follow the rules
-Most of the time, a Deferred comes from another synapse function.
+Most of the time, an awaitable comes from another synapse function.
-Sometimes, though, we need to make up a new Deferred, or we get a
+Sometimes, though, we need to make up a new awaitable, or we get an awaitable
-Deferred back from external code. We need to make it follow our rules.
+back from external code. We need to make it follow our rules.
-The easy way to do it is with a combination of `defer.inlineCallbacks`,
+The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement
 and `context.PreserveLoggingContext`. Suppose we want to implement
 `sleep`, which returns a deferred which will run its callbacks after a
 given number of seconds. That might look like:
@ -186,25 +173,12 @@ def get_sleep_deferred(seconds):
    return d
 ```
-That doesn't follow the rules, but we can fix it by wrapping it with
+That doesn't follow the rules, but we can fix it by calling it through
-`PreserveLoggingContext` and `yield` ing on it:
+`context.make_deferred_yieldable`:
 ```python
-@defer.inlineCallbacks
+async def sleep(seconds):
-def sleep(seconds):
+    return await context.make_deferred_yieldable(get_sleep_deferred(seconds))
    with PreserveLoggingContext():
        yield get_sleep_deferred(seconds)
 ```
 This technique works equally for external functions which return
 deferreds, or deferreds we have made ourselves.
 You can also use `context.make_deferred_yieldable`, which just does the
 boilerplate for you, so the above could be written:
 ```python
 def sleep(seconds):
    return context.make_deferred_yieldable(get_sleep_deferred(seconds))
 ```
 ## Fire-and-forget
@ -213,20 +187,18 @@ Sometimes you want to fire off a chain of execution, but not wait for
 its result. That might look a bit like this:
 ```python
-@defer.inlineCallbacks
+async def do_request_handling():
-def do_request_handling():
+    await foreground_operation()
    yield foreground_operation()
    # *don't* do this
    background_operation()
    logger.debug("Request handling complete")
-@defer.inlineCallbacks
+async def background_operation():
-def background_operation():
+    await first_background_step()
    yield first_background_step()
    logger.debug("Completed first step")
-    yield second_background_step()
+    await second_background_step()
    logger.debug("Completed second step")
 ```
@ -235,13 +207,13 @@ The above code does a couple of steps in the background after
 against the `request_context` logcontext, which may or may not be
 desirable. There are two big problems with the above, however. The first
 problem is that, if `background_operation` returns an incomplete
-Deferred, it will expect its caller to `yield` immediately, so will have
+awaitable, it will expect its caller to `await` immediately, so will have
 cleared the logcontext. In this example, that means that 'Request
 handling complete' will be logged without any context.
 The second problem, which is potentially even worse, is that when the
-Deferred returned by `background_operation` completes, it will restore
+awaitable returned by `background_operation` completes, it will restore
-the original logcontext. There is nothing waiting on that Deferred, so
+the original logcontext. There is nothing waiting on that awaitable, so
 the logcontext will leak into the reactor and possibly get attached to
 some arbitrary future operation.
@ -254,9 +226,8 @@ deferred completes will be the empty logcontext), and will restore the
 current logcontext before continuing the foreground process:
 ```python
-@defer.inlineCallbacks
+async def do_request_handling():
-def do_request_handling():
+    await foreground_operation()
    yield foreground_operation()
    # start background_operation off in the empty logcontext, to
    # avoid leaking the current context into the reactor.
@ -274,16 +245,15 @@ Obviously that option means that the operations done in
 The second option is to use `context.run_in_background`, which wraps a
 function so that it doesn't reset the logcontext even when it returns
-an incomplete deferred, and adds a callback to the returned deferred to
+an incomplete awaitable, and adds a callback to the returned awaitable to
 reset the logcontext. In other words, it turns a function that follows
-the Synapse rules about logcontexts and Deferreds into one which behaves
+the Synapse rules about logcontexts and awaitables into one which behaves
 more like an external function --- the opposite operation to that
 described in the previous section. It can be used like this:
 ```python
-@defer.inlineCallbacks
+async def do_request_handling():
-def do_request_handling():
+    await foreground_operation()
    yield foreground_operation()
    context.run_in_background(background_operation)
@ -294,152 +264,53 @@ def do_request_handling():
 ## Passing synapse deferreds into third-party functions
 A typical example of this is where we want to collect together two or
-more deferred via `defer.gatherResults`:
+more awaitables via `defer.gatherResults`:
 ```python
-d1 = operation1()
+a1 = operation1()
-d2 = operation2()
+a2 = operation2()
-d3 = defer.gatherResults([d1, d2])
+a3 = defer.gatherResults([a1, a2])
 ```
 This is really a variation of the fire-and-forget problem above, in that
-we are firing off `d1` and `d2` without yielding on them. The difference
+we are firing off `a1` and `a2` without awaiting on them. The difference
 is that we now have third-party code attached to their callbacks. Anyway
 either technique given in the [Fire-and-forget](#fire-and-forget)
 section will work.
-Of course, the new Deferred returned by `gatherResults` needs to be
+Of course, the new awaitable returned by `gather` needs to be
 wrapped in order to make it follow the logcontext rules before we can
-yield it, as described in [Where you create a new Deferred, make it
+yield it, as described in [Where you create a new awaitable, make it
 follow the
-rules](#where-you-create-a-new-deferred-make-it-follow-the-rules).
+rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules).
 So, option one: reset the logcontext before starting the operations to
 be gathered:
 ```python
-@defer.inlineCallbacks
+async def do_request_handling():
 def do_request_handling():
    with PreserveLoggingContext():
-        d1 = operation1()
+        a1 = operation1()
-        d2 = operation2()
+        a2 = operation2()
-        result = yield defer.gatherResults([d1, d2])
+        result = await defer.gatherResults([a1, a2])
 ```
 In this case particularly, though, option two, of using
-`context.preserve_fn` almost certainly makes more sense, so that
+`context.run_in_background` almost certainly makes more sense, so that
 `operation1` and `operation2` are both logged against the original
 logcontext. This looks like:
 ```python
-@defer.inlineCallbacks
+async def do_request_handling():
-def do_request_handling():
+    a1 = context.run_in_background(operation1)
-    d1 = context.preserve_fn(operation1)()
+    a2 = context.run_in_background(operation2)
    d2 = context.preserve_fn(operation2)()
-    with PreserveLoggingContext():
+    result = await make_deferred_yieldable(defer.gatherResults([a1, a2]))
        result = yield defer.gatherResults([d1, d2])
 ```
-## Was all this really necessary?
+## A note on garbage-collection of awaitable chains
-The conventions used work fine for a linear flow where everything
+It turns out that our logcontext rules do not play nicely with awaitable
 happens in series via `defer.inlineCallbacks` and `yield`, but are
 certainly tricky to follow for any more exotic flows. It's hard not to
 wonder if we could have done something else.
 We're not going to rewrite Synapse now, so the following is entirely of
 academic interest, but I'd like to record some thoughts on an
 alternative approach.
 I briefly prototyped some code following an alternative set of rules. I
 think it would work, but I certainly didn't get as far as thinking how
 it would interact with concepts as complicated as the cache descriptors.
 My alternative rules were:
 -   functions always preserve the logcontext of their caller, whether or
    not they are returning a Deferred.
 -   Deferreds returned by synapse functions run their callbacks in the
    same context as the function was orignally called in.
 The main point of this scheme is that everywhere that sets the
 logcontext is responsible for clearing it before returning control to
 the reactor.
 So, for example, if you were the function which started a
 `with LoggingContext` block, you wouldn't `yield` within it --- instead
 you'd start off the background process, and then leave the `with` block
 to wait for it:
 ```python
 def handle_request(request_id):
    with context.LoggingContext() as request_context:
        request_context.request = request_id
        d = do_request_handling()
    def cb(r):
        logger.debug("finished")
    d.addCallback(cb)
    return d
 ```
 (in general, mixing `with LoggingContext` blocks and
 `defer.inlineCallbacks` in the same function leads to slighly
 counter-intuitive code, under this scheme).
 Because we leave the original `with` block as soon as the Deferred is
 returned (as opposed to waiting for it to be resolved, as we do today),
 the logcontext is cleared before control passes back to the reactor; so
 if there is some code within `do_request_handling` which needs to wait
 for a Deferred to complete, there is no need for it to worry about
 clearing the logcontext before doing so:
 ```python
 def handle_request():
    r = do_some_stuff()
    r.addCallback(do_some_more_stuff)
    return r
 ```
 --- and provided `do_some_stuff` follows the rules of returning a
 Deferred which runs its callbacks in the original logcontext, all is
 happy.
 The business of a Deferred which runs its callbacks in the original
 logcontext isn't hard to achieve --- we have it today, in the shape of
 `context._PreservingContextDeferred`:
 ```python
 def do_some_stuff():
    deferred = do_some_io()
    pcd = _PreservingContextDeferred(LoggingContext.current_context())
    deferred.chainDeferred(pcd)
    return pcd
 ```
 It turns out that, thanks to the way that Deferreds chain together, we
 automatically get the property of a context-preserving deferred with
 `defer.inlineCallbacks`, provided the final Defered the function
 `yields` on has that property. So we can just write:
 ```python
@defer.inlineCallbacks
 def handle_request():
    yield do_some_stuff()
    yield do_some_more_stuff()
 ```
 To conclude: I think this scheme would have worked equally well, with
 less danger of messing it up, and probably made some more esoteric code
 easier to write. But again --- changing the conventions of the entire
 Synapse codebase is not a sensible option for the marginal improvement
 offered.
 ## A note on garbage-collection of Deferred chains
 It turns out that our logcontext rules do not play nicely with Deferred
 chains which get orphaned and garbage-collected.
 Imagine we have some code that looks like this:
@ -451,13 +322,12 @@ def on_something_interesting():
    for d in listener_queue:
        d.callback("foo")
-@defer.inlineCallbacks
+async def await_something_interesting():
-def await_something_interesting():
+    new_awaitable = defer.Deferred()
-    new_deferred = defer.Deferred()
+    listener_queue.append(new_awaitable)
    listener_queue.append(new_deferred)
    with PreserveLoggingContext():
-        yield new_deferred
+        await new_awaitable
 ```
 Obviously, the idea here is that we have a bunch of things which are
@ -476,18 +346,19 @@ def reset_listener_queue():
    listener_queue.clear()
 ```
-So, both ends of the deferred chain have now dropped their references,
+So, both ends of the awaitable chain have now dropped their references,
-and the deferred chain is now orphaned, and will be garbage-collected at
+and the awaitable chain is now orphaned, and will be garbage-collected at
-some point. Note that `await_something_interesting` is a generator
+some point. Note that `await_something_interesting` is a coroutine, 
-function, and when Python garbage-collects generator functions, it gives
+which Python implements as a generator function.  When Python
-them a chance to clean up by making the `yield` raise a `GeneratorExit`
+garbage-collects generator functions, it gives them a chance to 
 clean up by making the `async` (or `yield`) raise a `GeneratorExit`
 exception. In our case, that means that the `__exit__` handler of
 `PreserveLoggingContext` will carefully restore the request context, but
 there is now nothing waiting for its return, so the request context is
 never cleared.
-To reiterate, this problem only arises when *both* ends of a deferred
+To reiterate, this problem only arises when *both* ends of a awaitable
-chain are dropped. Dropping the the reference to a deferred you're
+chain are dropped. Dropping the the reference to an awaitable you're
-supposed to be calling is probably bad practice, so this doesn't
+supposed to be awaiting is bad practice, so this doesn't
 actually happen too much. Unfortunately, when it does happen, it will
 lead to leaked logcontexts which are incredibly hard to track down.