Pctx and logging

I realized I had not blogged in a while, primarily because I was in Qatar for a bit more than 2 weeks and then I’ve been sick with some type of sinus problem since I got back. But let’s return to our regularly scheduled transmission…

On the flight back to the US from the World Cup I completed milestone two: I have a program compiled to wasm that can be run as a single process or as multiple processes over a network. For now, I’m going to table milestone 3, the ability to choose that some services run as part of a “local bundle” while the others run out in the network. This “mixed mode” is clearly doable, but it is going to be a lot of configuration work for not much business bang. I’m going to start on milestone 4 shortly, probably tomorrow: support for the Python language as a first-class citizen for writing services.

I am taking a small diversion right now into logging. One of the things that makes our communication protocols interesting is that a “p-context” is pushed around with the remote procedure calls (local or over the network). This context is the “cousin”, perhaps “distant cousin”, of the context.Context that is omnipresent in Go. The “p-context”, or Pctx, is passed to every implementation function of a service, and one is created on the client side for each service the client uses.

As of this writing, the Pctx has three main purposes (a small code sketch of all three follows the example below):

  • To be a collector of random bits of data that different services want to pass to each other.
    This functionality is why it is the “cousin” of Go’s context.Context. The Pctx, however, does not create the deeply nested structure that context.Context does; it is a simple hashtable with two levels of naming, like “subsystem.key=value”.

  • To allow user code to access the time, or “wall clock” time. The point is to force all uses of Now() and the like to go through the Pctx. Although not implemented yet, there is some scaffolding in place for eventually allowing test runners to call SetNowAndFreeze(time.Time). The purpose of this is to allow easy testing of processing that needs to occur, say, every night at 2am or every weekday at noon. So long as the only access to the clock is through the Pctx, this test feature should “just work”.

  • To allow user code to log things and have that log collected “properly”. By properly I mean that a single log is generated for the whole interaction between the client and the server, with the messages from both sides interleaved in the correct order (so you can see causality).

    A simplified example may help: Suppose a program you wrote, a client, needs to access some service that has two methods, fleazil() and quux().

    Program started at 809am Dec 10
    I am about to call the fleazil service.
      --> Call to svc.fleazil(), Arguments: None at 809am <--
      I usually have the weekends off!
      --> Return value: None <--
    I am about to call the quux service... here we go!
      --> Call to svc.quux(), Arguments: None at 809am <--
      How 'bout them cowboys? :-)
      --> Return value: None <--
    Program terminated at 809am
    

So in this example, the two log messages that say “I am about to call” were generated by client-side logging on the Pctx, and the two messages inside the calls are from the server side. The server-side function fleazil() is complaining about working weekends, and quux() is apparently a fan of the Dallas Cowboys. I hope this simple example shows the value of having unified logs via the Pctx. It should also make you mildly sick to your stomach to think of all the hours of your life spent trying to “coordinate” multiple log files to see in what order things happened.
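
To make those three duties concrete, here is a minimal Go sketch of what a Pctx could look like. The shape is my own guess: only SetNowAndFreeze(time.Time) is named above, so the other method names, the flat map, and the string log are all assumptions, not the real API.

    package pctx

    import (
        "fmt"
        "time"
    )

    // Pctx sketch covering the three purposes above. Only SetNowAndFreeze
    // appears in the post; every other name is a hypothetical stand-in.
    type Pctx struct {
        values map[string]string // flat, two-level keys like "subsystem.key"
        frozen *time.Time        // non-nil once a test runner freezes the clock
        log    []string          // the single unified log carried across calls
    }

    func NewPctx() *Pctx { return &Pctx{values: map[string]string{}} }

    // Set and Get use "subsystem.key" naming instead of nested contexts.
    func (p *Pctx) Set(subsystem, key, value string) {
        p.values[subsystem+"."+key] = value
    }

    func (p *Pctx) Get(subsystem, key string) (string, bool) {
        v, ok := p.values[subsystem+"."+key]
        return v, ok
    }

    // Now is the only clock that user code should ever touch.
    func (p *Pctx) Now() time.Time {
        if p.frozen != nil {
            return *p.frozen
        }
        return time.Now()
    }

    // SetNowAndFreeze pins the clock so a test can pretend it is 2am.
    func (p *Pctx) SetNowAndFreeze(t time.Time) {
        p.frozen = &t
    }

    // Log appends to the unified log; the RPC layer ships this both ways.
    func (p *Pctx) Log(message string) {
        p.log = append(p.log, fmt.Sprintf("%s %s", p.Now().Format("304pm"), message))
    }

A test that needs the “every night at 2am” behavior would call SetNowAndFreeze with the desired time and then invoke the job directly.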

Bigly

The problem with this strategy should be pretty obvious: a naive implementation has to pass the complete contents of the log on both the RPC call and the RPC response. And frequently people have lots of log messages that they aren’t really using, but that are just there temporarily while developing. I’m trying not to optimize much yet.

Premature optimization is the root of all evil. – Donald Knuth

The only optimization I have done so far is one hack to make the call/return not so expensive. This hack relies on the protobuf type protosupport.PCtx (note the capital C, which distinguishes it from Pctx). That protobuf type is how the data is marshaled and unmarshaled for transmission. While the Pctx is in use in memory on either the client or server side, I use wrappers to provide the Pctx functionality but keep all the data in the protobuf structure, adding to or updating it as needed. So when it’s time to marshal, the code never does any “transform” on the log data; it just marshals the top element of the log, which has pointers to all the following parts, so they get marshaled as well.
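
Here is a rough sketch of that wrapper idea, to show why marshaling becomes cheap. The WirePCtx stand-in plays the role of the generated protosupport.PCtx; its fields are my invention, since the real schema isn’t shown here.

    package pctx

    import "time"

    // Stand-ins for the generated protobuf types. The real protosupport.PCtx
    // fields are not shown in the post, so these are assumptions.
    type LogLine struct {
        Stamp   time.Time
        Message string
    }

    type WirePCtx struct { // plays the role of protosupport.PCtx
        Lines []*LogLine
    }

    // PctxWrapper provides the in-memory Pctx behavior while keeping all of
    // the state inside the wire-format struct.
    type PctxWrapper struct {
        impl *WirePCtx
    }

    // Log writes straight into the wire struct, so there is nothing to copy
    // or transform later. When it is time to send, the RPC layer just hands
    // impl to the protobuf runtime (proto.Marshal(p.impl) in the real code)
    // and the top element drags the rest of the log along with it.
    func (p *PctxWrapper) Log(message string) {
        p.impl.Lines = append(p.impl.Lines, &LogLine{Stamp: time.Now(), Message: message})
    }

The design point is that there is only one representation: the “working” form and the “wire” form are the same object.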

I have no idea if this is going to be workable, but I hope it can be made so. One could also imagine logging being a “service” that is smart enough to create the output above from the many different parts of a microservices program reporting to it. I guess we shall see.