Envoy event dispatcher

Andreas Hohmann March 22, 2024 #envoy #proxy

Like most high-performance servers, Envoy performs IO asynchronously using event loops on a relatively small number of worker threads. How does Envoy's implementation work? Where does this functionality live?

Libevent

Envoy's event dispatcher uses the libevent library that defines an abstraction layer on top of the platform-dependent asynchronous IO APIs such as epoll on Linux and kqueue on FreeBSD/Mac, similar to the libuv library used by node.js. Libevent also offers a buffer API and even an HTTP library on top of that, but Envoy uses only the low level event dispatching.
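To get a feel for what such an abstraction layer hides, here is a minimal sketch of waiting for readiness by hand. It uses the portable POSIX poll call rather than epoll or kqueue, and it is not how libevent is implemented. The helper names wait_and_read and poll_demo are made up for this illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for fd to become readable and
 * reads once. Returns the bytes read (0 = end of file), -1 on error, or -2 if
 * the timeout expired first. */
long wait_and_read(int fd, char *out, size_t capacity, int timeout_ms) {
  assert(out != NULL);
  struct pollfd pfd = {.fd = fd, .events = POLLIN, .revents = 0};
  int ready = poll(&pfd, 1, timeout_ms);
  if (ready < 0) {
    return -1; /* poll itself failed */
  }
  if (ready == 0) {
    return -2; /* timeout: nothing became readable */
  }
  if (pfd.revents & (POLLIN | POLLHUP)) {
    return (long)read(fd, out, capacity); /* will not block now */
  }
  return -1; /* POLLERR or POLLNVAL */
}

/* Demo with a pipe standing in for a socket; returns the bytes received. */
long poll_demo(void) {
  int fds[2];
  if (pipe(fds) == -1) {
    return -1;
  }
  long received = -1;
  if (write(fds[1], "ping", 4) == 4) {
    char buffer[16];
    received = wait_and_read(fds[0], buffer, sizeof(buffer), 1000);
  }
  close(fds[0]);
  close(fds[1]);
  return received;
}
```

An event library generalizes this to many file descriptors, picks the most efficient readiness API of the platform, and replaces the inline read with registered callbacks.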

The main purpose of an event library is to provide an event loop and some mechanism that allows us to react to IO events. We also need timer events (for example, for timeouts) and a way to schedule general asynchronous tasks. Another important aspect is the ability to cancel these operations. This is not only needed for timeouts but also more generally for concurrent tasks that depend on each other. Let's see how libevent provides this functionality, starting with the event loop.

The state of a libevent dispatcher is kept in an event_base structure. Libevent is a typical C library that hides its structures behind opaque pointers and exposes a set of functions for creating, manipulating, and destroying these structures in an object-oriented fashion. The functions start with the name of the structure followed by _new, _free, or some custom function name.
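The opaque-pointer pattern itself can be sketched in a few lines. The counter type and its functions below are hypothetical and have nothing to do with libevent. They only mirror the naming scheme. In a real library, the first part would live in the public header and the struct definition in the implementation file.

```c
#include <assert.h>
#include <stdlib.h>

/* What a public header would expose: an incomplete type plus functions. */
struct counter;
struct counter *counter_new(int start);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* What stays private in the implementation file. */
struct counter {
  int value;
};

struct counter *counter_new(int start) {
  struct counter *c = malloc(sizeof(*c));
  if (c != NULL) {
    c->value = start;
  }
  return c; /* NULL signals an allocation failure */
}

void counter_increment(struct counter *c) {
  assert(c != NULL);
  c->value++;
}

int counter_value(const struct counter *c) {
  assert(c != NULL);
  return c->value;
}

void counter_free(struct counter *c) { free(c); }

/* Demo: create, manipulate, read, destroy. */
int counter_demo(void) {
  struct counter *c = counter_new(40);
  if (c == NULL) {
    return -1;
  }
  counter_increment(c);
  counter_increment(c);
  int result = counter_value(c);
  counter_free(c);
  return result;
}
```

Callers only ever hold a struct counter pointer, so the library can change the layout of the structure without breaking them.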

To run a libevent event loop, we have to create an event_base with event_base_new or event_base_new_with_config, run it with event_base_loop, and eventually destroy it with event_base_free.

struct event_base* base = event_base_new();
// ... register initial events
event_base_loop(base, 0); // 0 = default flags
event_base_free(base);

The event loop can be stopped by calling event_base_loopexit or event_base_loopbreak, for example, from a task on the event loop or from another thread. event_base_loopexit stops the event loop once an optional timeout has elapsed and the callbacks of all currently active events have run, whereas event_base_loopbreak stops it as soon as the callback that is currently executing returns.

An "event" in the libevent library is the combination of the specification of the event we are interested in and the callback function that will be called by libevent when the event occurs. An event specification could be "READ on file descriptor 5" or "timer in 10 seconds". libevent callbacks are specified, again in typical C fashion, using a function pointer and a void* pointer to arbitrary data that is passed back to the callback function. An event callback, for example, takes the file descriptor, the events that occurred (as bits), and the void* context object that was given to libevent when registering the callback function.

typedef void (*event_callback_fn)(evutil_socket_t fd, short events, void *ctx);
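The function pointer plus void* idiom can be tried out without libevent. In the following sketch, demo_callback_fn has the same shape as event_callback_fn (with int standing in for evutil_socket_t), and the registration structure and the fire function are hypothetical stand-ins for what an event library does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as libevent's callback type; int replaces evutil_socket_t. */
typedef void (*demo_callback_fn)(int fd, short events, void *ctx);

/* Hypothetical record of what the caller registered. */
struct registration {
  demo_callback_fn callback;
  void *ctx; /* passed back to the callback untouched */
  int fd;
};

/* Application state that the callback wants to update. */
struct stats {
  int calls;
  short last_events;
};

static void count_calls(int fd, short events, void *ctx) {
  (void)fd;               /* not needed in this callback */
  struct stats *s = ctx;  /* recover the typed context from the void* */
  s->calls++;
  s->last_events = events;
}

/* What the library does once the event occurs. */
void fire(const struct registration *r, short events) {
  assert(r != NULL && r->callback != NULL);
  r->callback(r->fd, events, r->ctx);
}

/* Demo: the same context object sees both invocations. */
int callback_demo(void) {
  struct stats s = {0, 0};
  struct registration r = {.callback = count_calls, .ctx = &s, .fd = 5};
  fire(&r, 0x02); /* arbitrary bit pattern for the demo */
  fire(&r, 0x02);
  return s.calls;
}
```

The library never looks inside ctx. Making sure that the pointer is still valid when the callback runs is the caller's job.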

An event (that is, an event specification with callback) is represented as the event structure that is created with event_new and destroyed with event_free following libevent's object-oriented pattern. The event_new function handles the creation of all event types (IO, timers, tasks). libevent defines macros that look like event functions for different event types, but they all boil down to calls of the event API with different parameters.

struct event *event_new(
  struct event_base *base,
  evutil_socket_t fd,
  short events,
  event_callback_fn callback,
  void *callback_arg
);

event_new creates the event on the heap. Alternatively, one can initialize an existing event structure with event_assign. That's useful for wrapper libraries (like the one defined in Envoy) that embed the event in a wrapper structure whose memory is managed by the application. The event_del function removes an event from the dispatcher (making it non-pending) without freeing its memory.

Creating an event starts the event's lifecycle in the initialized state. It does not schedule it yet. To schedule it, one has to add the event to the event dispatcher (event_base) by calling event_add, which moves the event to the pending state. Once the conditions of the event occur, it becomes active. While an event is pending, we can remove it from the dispatcher with event_del and put it back by calling event_add again.

Looking at the event_new function, you may have missed a timeout duration. libevent does not consider the timeout part of the event specification and instead takes the timeout as a second argument to the event_add function.

int event_add(struct event *ev, const struct timeval *timeout);
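Since the timeout is a struct timeval (seconds plus microseconds), a small conversion helper comes in handy. The following is a sketch with a made-up name, timeval_from_ms, that assumes a non-negative duration in milliseconds.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: converts a non-negative duration in milliseconds to
 * the struct timeval that event_add takes as its second argument. */
struct timeval timeval_from_ms(long ms) {
  assert(ms >= 0);
  struct timeval tv;
  tv.tv_sec = ms / 1000;           /* whole seconds */
  tv.tv_usec = (ms % 1000) * 1000; /* remainder in microseconds */
  return tv;
}
```

With this helper, a 1.5 second timeout could be passed as struct timeval tv = timeval_from_ms(1500); followed by event_add(ev, &tv).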

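As mentioned above, libevent uses this single event API for all the supported kinds of events. Besides the callback, we have three parameters at our disposal: the file descriptor, the events flags, and the timeout.

The events parameter is a bit mask. Besides the event types themselves (EV_READ, EV_WRITE, EV_SIGNAL), it can carry two modifiers. The EV_ET flag stands for "edge triggered" and controls whether we want to react to state changes (edge triggered) or to the current state (level triggered, the default). EV_PERSIST keeps an event pending after it triggers (saving us a call to event_add).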
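The lifecycle and the effect of EV_PERSIST can be captured in a toy state machine. This is only a model of the behavior described above, not libevent code. All names (demo_event, demo_add, demo_del, demo_trigger) are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_INITIALIZED, DEMO_PENDING, DEMO_ACTIVE };

struct demo_event {
  enum demo_state state;
  bool persist; /* models the EV_PERSIST flag */
  int fired;    /* how often the "callback" ran */
};

/* Models event_add: an initialized event becomes pending. */
void demo_add(struct demo_event *ev) {
  assert(ev != NULL);
  if (ev->state == DEMO_INITIALIZED) {
    ev->state = DEMO_PENDING;
  }
}

/* Models event_del: the event is no longer watched. */
void demo_del(struct demo_event *ev) {
  assert(ev != NULL);
  ev->state = DEMO_INITIALIZED;
}

/* Models the loop noticing the condition and running the callback. */
void demo_trigger(struct demo_event *ev) {
  assert(ev != NULL);
  if (ev->state != DEMO_PENDING) {
    return; /* only pending events can become active */
  }
  ev->state = DEMO_ACTIVE;
  ev->fired++; /* the callback would run here */
  /* A persistent event stays pending, any other event must be re-added. */
  ev->state = ev->persist ? DEMO_PENDING : DEMO_INITIALIZED;
}

/* Demo: add once, let the condition occur three times. */
int lifecycle_demo(bool persist) {
  struct demo_event ev = {DEMO_INITIALIZED, persist, 0};
  demo_add(&ev);
  demo_trigger(&ev);
  demo_trigger(&ev);
  demo_trigger(&ev);
  return ev.fired;
}
```

Without the persist flag the callback runs once for the three occurrences, with the flag it runs three times.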

The following table shows the meaning of the event parameters for the different event types. The prefix is combined with the new, add, and pending function names, for example, evtimer_new to create a timer event.

| event type | file descriptor | events | timeout | prefix |
|---|---|---|---|---|
| file (socket) | socket file descriptor | READ, WRITE, PERSIST, ET | timeout (NULL = forever) | event_ |
| task | -1 | (none) | delay (NULL = now) | evtimer_ |
| signal | signal number | SIGNAL, PERSIST | timeout (NULL = forever) | evsignal_ |

Here is a small C program reading from a UNIX socket and intercepting the SIGINT signal for a graceful shutdown. We use the cleanup attribute supported by gcc and clang to keep the error handling under control without RAII or a defer operator.

#include <signal.h>
#include <stdio.h>
#include <string.h>

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h> // read, close

#include "event2/event.h"

#define BUFFER_SIZE 1024

typedef struct Buffer {
  char data[BUFFER_SIZE];
  size_t length;
} Buffer;

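void on_read(evutil_socket_t fd, short events, void *ctx) {
  Buffer *buffer = (Buffer *)ctx;
  if (!(events & EV_READ)) {
    // The callback was triggered by the timeout, so there is nothing to read.
    printf("read timed out\n");
    return;
  }
  ssize_t bytes_read_count = read(fd, buffer->data, BUFFER_SIZE);
  printf("Read %zd bytes\n", bytes_read_count);
  if (bytes_read_count >= 0) {
    buffer->length = bytes_read_count;
  }
}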

void on_signal(evutil_socket_t fd, short events, void *ctx) {
  printf("stopping event loop\n");
  struct event_base *base = (struct event_base *)ctx;
  event_base_loopbreak(base);
}

#define defer(func) __attribute__((__cleanup__(func)))

#define RETURN_ON_ERROR(result, message)                                       \
  if ((result) == -1) {                                                        \
    perror(message);                                                           \
    return 1;                                                                  \
  }
#define RETURN_ON_NULL(result, message)                                        \
  if (!(result)) {                                                             \
    perror(message);                                                           \
    return 1;                                                                  \
  }

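// The cleanup handlers also run on early returns, so skip failed resources.
void clean_up_fd(int *fd) { if (*fd != -1) close(*fd); }
void clean_up_event(struct event **ev) { if (*ev) event_free(*ev); }
void clean_up_event_base(struct event_base **base) { if (*base) event_base_free(*base); }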

int main() {
  int fd defer(clean_up_fd) = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
  RETURN_ON_ERROR(fd, "could not open socket");

  struct sockaddr_un addr;
  addr.sun_family = AF_UNIX;
  strcpy(addr.sun_path, "/tmp/foo");

  int result = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
  RETURN_ON_ERROR(result, "could not connect to socket");

  Buffer buffer = {.length = 0};
  struct event_base *base defer(clean_up_event_base) = event_base_new();
  RETURN_ON_NULL(base, "could not create libevent event base");

  struct event *file_event defer(clean_up_event) =
      event_new(base, fd, EV_READ | EV_PERSIST | EV_TIMEOUT, on_read, &buffer);
  RETURN_ON_NULL(file_event, "could not create read event");
  struct timeval timeout = {1, 0};
  RETURN_ON_ERROR(event_add(file_event, &timeout), "could not add read event");

  struct event *signal_event defer(clean_up_event) =
      event_new(base, SIGINT, EV_SIGNAL, on_signal, base);
  RETURN_ON_NULL(signal_event, "could not create signal event");
  RETURN_ON_ERROR(evsignal_add(signal_event, NULL),
                  "could not add signal event");

  printf("starting event loop\n");
  event_base_loop(base, 0);
  printf("event loop stopped\n");

  if (buffer.length > 0) {
    // The data is not NUL-terminated, so print exactly buffer.length bytes.
    printf("last data read: %.*s\n", (int)buffer.length, buffer.data);
  }
}

The program connects to an existing UNIX socket (one can be created with nc -l -U /tmp/foo), creates the libevent dispatcher, and registers an event reading from this socket with a 1 second timeout. It also registers a signal event for SIGINT (Ctrl-C in a terminal) that exits the event loop with event_base_loopbreak. The two callbacks demonstrate how we can pass arbitrary data such as the buffer or the event base to the callback functions.

Using libevent's buffer functionality, we could also write a TCP server or client with little effort. However, libevent does not offer asynchronous file IO. If we need file IO, we have to resort to multi-threading or jump to the newer asynchronous Linux APIs such as io_uring to keep the blocking functions out of the event loop.

Envoy event dispatching

Now that we have seen how libevent works (at least on the event level), we can turn to Envoy's event dispatcher implementation. The following diagram shows some of the key classes.

[Diagram: key classes of Envoy's event dispatcher]

Envoy's Dispatcher interface encapsulates an event loop. Besides the lifecycle methods such as run and exit, this interface allows for registering callbacks for the various event types (files, timers, signals) and offers higher-level methods for managing server and client connections. There are two related interfaces: The Scheduler, which provides a single method to register a timer, and the CallbackScheduler, which allows for scheduling a callback for immediate (within the current event dispatch cycle) or asynchronous execution.

LibeventScheduler implements the two scheduler interfaces by wrapping a libevent event_base, and DispatcherImpl, the implementation of the Dispatcher interface, owns an instance of the LibeventScheduler.

Let's take a closer look at Envoy's use of the libevent API. The event_base_loop is called from the LibeventScheduler implementation.

void LibeventScheduler::run(Dispatcher::RunType mode) {
  int flag = 0;
  switch (mode) {
  case Dispatcher::RunType::NonBlock:
    flag = LibeventScheduler::flagsBasedOnEventType();
  case Dispatcher::RunType::Block:
    break;
  case Dispatcher::RunType::RunUntilExit:
    flag = EVLOOP_NO_EXIT_ON_EMPTY;
    break;
  }
  event_base_loop(libevent_.get(), flag);
}

We can find the creation of the file (socket) events in the assignEvents method of FileEventImpl. FileEventImpl holds the libevent event structure as a plain member (not a pointer) in its ImplBase base class and therefore uses event_assign rather than event_new. The callback passed to event_assign is a capture-less lambda that calls the mergeInjectedEventsAndRunCb method, and the callback argument is the FileEventImpl itself.

void FileEventImpl::assignEvents(uint32_t events, event_base* base) {
  ASSERT(dispatcher_.isThreadSafe());
  ASSERT(base != nullptr);

  enabled_events_ = events;
  event_assign(
      &raw_event_, base, fd_,
      EV_PERSIST | (trigger_ == FileTriggerType::Edge ? EV_ET : 0) |
          (events & FileReadyType::Read ? EV_READ : 0) |
          (events & FileReadyType::Write ? EV_WRITE : 0) |
          (events & FileReadyType::Closed ? EV_CLOSED : 0),
      [](evutil_socket_t, short what, void* arg) -> void {
        auto* event = static_cast<FileEventImpl*>(arg);
        uint32_t events = 0;
        if (what & EV_READ) {
          events |= FileReadyType::Read;
        }
        if (what & EV_WRITE) {
          events |= FileReadyType::Write;
        }
        if (what & EV_CLOSED) {
          events |= FileReadyType::Closed;
        }
        ASSERT(events != 0);
        event->mergeInjectedEventsAndRunCb(events);
      },
      this);
}

Most of the code handles the conversion between Envoy's FileReadyType and FileTriggerType and libevent's event type constants. Note that Envoy always sets the EV_PERSIST flag, that is, all file events stay in the event loop after becoming active and have to be explicitly removed (using event_del) at some point. This is done in the destructor of Envoy's event base class ImplBase.

FileEventImpl's constructor first creates the event with assignEvents and then adds it to the event loop with event_add.

Envoy uses the libevent API in a similar fashion for timers and scheduled tasks. The main difference is that these events are not added automatically to the event loop. Instead, the Timer and SchedulableCallback interfaces offer methods to enable and disable the events explicitly. These methods call libevent's event_add and event_del functions, respectively, to add and remove the event from the event loop.

As we can see, Envoy's event management is a thin layer on top of libevent, shielding the rest of the application from the libevent API and taking care of the event lifecycles. The Envoy team considered moving to a newer event library during the early years, but for now libevent is working just fine, and any change to such a fundamental aspect of Envoy is not worth the effort and risk.