The Em* Event System
Em* has been designed to build complex, reactive systems. We chose an event-driven
structure to handle concurrency. This page describes some of the details
on how to use this event system. Most of the Em* libraries are built to integrate
with the event system, enabling easy integration of different libraries.
A Word About Threads
Before getting to the usage details, we should address the main competitor
to event-drive structures: threads. Many systems use threads to manage
concurrency. Threads bring two advantages:
- Threads are truly concurrent; a slow acting thread will not impede
other independent threads from operating in a timely way.
- Threads enable a more direct programming style, such as a sequence
of blocking calls, that in an event model would be a sequence of chained callbacks.
However, there is a hidden cost to threads: synchronization and deadlock
avoidance. Because threads run concurrently, they must negotiate use of shared
data structures. For simple uses of threads, that don't need to coordinate
shared data structures, threads are easier to use, but when inter-thread communication
is needed, the complexity skyrockets.
Reactivity, one of the key design requirements of Em*, essentially boils
down to inter-thread communication as the common case. For example, when a
lengthy operation is undertaken, reactivity means that if arriving data makes
that operation irrelevant, it should be interrupted. Implementing this correctly
with threads is much more difficult and error-prone than the corresponding
event-based implementation. For a more complete and lucid discussion
of the difficulties of threads, see Ousterhout 96 and ESR 03.
The Big Picture of Em* Design
Em* follows the UNIX design philosophy: combat complexity with many small
modules connected by clean interfaces. Em* modules are implemented in separate
processes with separate process spaces. Clients of Em* modules use thin, lightweight
client libraries to connect to other modules. This structure provides fault
isolation between clients and servers, and simplifies the client libraries,
reducing the possibility that bugs or unforseen interactions in library code
cause problems for applications. More information on the big picture can
be found in Girod
03 and Elson 03.
Individual modules form a dependency graph that is expressed in the EmRun
config file. This dependency graph must be free of cycles. That is, a module
can't be a client of some other module that is a client of the original module.
Such a loop in the dependency graph would potentially cause deadlock,
because while a process is making a client call, it can't be responsive as
a server. FUSD will detect this case and prevent it by refusing the client
connection. While this restriction appears at first to be severe, in fact
it is always possible to get around this limitation by structuring your system
differently. If the dependency graph disallows module A being a client of
module B, the transaction can usually be reversed by module A instead providing
a service to module B. Notification and callback can be exchanged for
an active call. In practice, we have rarely found this issue to be a problem.
While separate modules run concurrently, the modules themselves are typically
implemented using a single-threaded, event-driven structure. An event-driven
implementation simplifies the implementation of reactive modules by eliminating
the need to worry about synchronization and deadlock: no event handler will
ever be interrupted to run another event handler, and no event handlers run
"concurrently". The underlying sources of events (e.g. IPC from other modules,
events from devices, etc.) are serialized at the kernel interface. Thus,
so long as every event handler leaves shared data structures in a valid state
when it terminates, there is never a need to worry about whether a data structure
will be valid when it is accessed.
The main caveat to this structure is that in order to remain responsive,
no event handler can run for a long time. Direct system calls that are likely
to block should be replaced by a notification-based scheme. For example, in
place of a blocking read() call, use an event to trigger when the file is
readable and then call read(). In practice, the key phrase above is
"likely to block". If domain knowledge suggests that blocking is unlikely
(or would only happen because of a bug), the system call is generally used
directly.
The other caveat is processes that are by nature long-lived. For instance,
a module might need to perform a lengthy computation for which segmentation
into bite-sized chunks is impractical. In these cases, where concurrency really
becomes handy, a thread is used. A message queue event can be used to enable
the computation thread to communicate with the "main" thread that is running
the event system. Even in this reduced case, working out the synchronization
can be tricky -- but not nearly as bad as using threads pervasively.
Relationship to GLib
The Em* event system is a wrapper around GLib, that provides a slightly
different interface. However, because it is a wrapper, it is 100% compatible
with GLib events, and GLib-based code can be used together with Em* libraries.
The purpose of the interface changes is to simplify them and tune it for the
common cases of embedded networked systems. GLib was designed to run GTK
GUI apps, and the native simplified interface they provide didn't have quite
the right API for building network protocols and reactive systems. The other
change relative to GLib has to do with memory management. GLib uses a cumbersome
reference-counting scheme to track memory usage. Rather than support a full
reference counting scheme, we implement a much simpler scheme that handles
a subset of the problem that represents the common case for our applications.
Em* Initialization and the Main Event Loop
An Em* main() function takes a routine form, that is typically cut-and-pasted
from one module to the next. The main() function is typically composed
of the following steps:
- Initialize the module's state variables
- Call misc_init(). This will initialize certain Em*
libraries and facilities.
- Process command-line arguments
- Initialize the events to be handled
- Call emrun_init(). This will connect to EmRun and enable
graceful shutdown and enable EmRun to start other modules that depend on this
one.
- Call g_main(). This will start up the event loop. This
function never returns; all activity from now on occurs when events are triggered.
An example of this can be found in the skeletons
directory.
Event Initialization
Events are created and installed by event constructors. These constructors
typically take an options structure that defines the configuration of the
event, and fill in an "event reference", which is a handle on the event. This
handle can be used to reconfigure the event, or perform other operations
on it. The nature of these event references will be discussed in detail
in the next section.
Examples of events are:
- A timer that fires once per second.
- A timer that fires 4.25 seconds from now.
- An event that triggers when a particuar file descriptor becomes readable.
- An event that encapsulates an interface to a neighbor discovery service,
and reports a new list of neighbors whenever a change occurs in the neighborhood.
As you can see, these events encapsulate varying amounts of complexity and
domain knowledge; whereas timers are very simple, the neighbor discovery client
event encapsulates the knowledge of how to connect to the neighbor discovery
service, how to receive notification of new data, and how to read the new
data. This capability to encapsulate arbitrary mechanism within an
"event" enables very simple APIs to be constructed that encapsulate complex
access protocols.
The other advantage of events is that independent events can be integrated
together very easily. Individual Em* events rarely need to know anything about
the other events in the system in order to integrate properly. Through the
GLib event system, all events from timers to notification events configure
hooks in GLib's central select() loop, and then are dispatched accordingly.
Event References
By convention, all event constructors take the address of an "event reference"
as their final argument. This reference is a handle that enables the event
to be asynchronously reconfigured, accessed, or destroyed. For example,
the g_timer_add() function creates a timer event. Using the reference, the
timer may be asynchronously canceled using the g_event_destroy() function.
For another example, the g_status_dev() function creates a status device.
A reference to that status device can be used to cause notification on that
device, or to destroy it. In the sense that it is passed to "method" functions,
the event reference is similar to an "instance" of an object class.
GLib internally uses reference counting to address the variety of conditions
that can arise when GLib applications need asynchronous destruction or multiple
independent references. However, reference counting can be very tricky to
use, leading to subtle memory bugs. Automated forms of reference counting
always fall short as well, most notably when confronted with cyclic linked
structures.
After gaining some experience with early versions of Em*, we found that
we never used the "multiple reference" capability of reference counting,
but that we did need to support asynchronous destruction. That is, we found
that all of our applications could get by storing only one copy of a pointer
to the event, but that our software often needed to destroy an event from
several different places in the code. While events are never run concurrently,
it's possible that an event might be destroyed down inside a call chain,
and then be referenced later by the caller.
Given this insight, we implemented event references to support asynchronous
destruction, but not multiple reference. Thus, each event has a single
"owner" who holds the pointer to it. Anyone may obtain that pointer
from the owner's structure, and use it, even to destroy the event. If the
pointer obtained is NULL, that means the event does not exist. Once obtained,
the pointer should only be used while it is known to be valid; if you call
out to unknown library code, you should re-acquire the pointer on your return.
When the event is destroyed, the pointer in the owner's structure will automatically
be zeroed, so that it can't be used again.
This sounds complicated, but it is actually very simple, as long as a few
simple rules are followed.
Rules for event "owners":
- Allocate the event reference in a fixed location (typically
in the owner's state structure).
- Initialize the event reference to NULL.
- Pass the (fixed) address of the event reference to the event constructor.
- Never "move" or reallocate the structure containing
the reference.
- Before deallocating the structure containing the reference, call the
appropriate destroy() function to destroy the event.
Rules for using event references:
- Don't store a copy of an event reference, except temporarily. If you
call out to code that might destroy the event, acquire a fresh copy before
using it again. When possible, use the reference directly from its fixed location.
- If the reference is NULL, that means the event has been destroyed.
However, you need not worry about this in general, because methods and destructors
silently ignore NULL references. (For example, it is always safe to
destroy a destroyed reference.)
These rules place slight restrictions on the memory that holds event references,
and disallow multiple independent copies of the event reference (i.e. requires
a single "owner"). However, compared with the potential for reference-counting
related bugs in a full-blown reference counted framework, this seemed a small
price to pay.
NOTE: If you will never need to destroy or access the event, you can pass
NULL in to the event reference argument of the constructor. In that case,
no reference will be saved.
NOTE: Some data structure libraries, such as STL, may move memory "under
the covers". If stored event references are moved this will cause problems.
Most Em* modules use dynamically allocated structures kept in linked lists
to avoid this problem.
The Main Event Loop
After initialization, all activity in an Em* system originates with the
event loop. This is similar in structure to the scheduler in a multitasking
OS. The implementation of the main loop is a part of GLib; it is documented
in the GLib
API reference. Applications that need to control the details of the scheduling
can use optional arguments provided throughout the Em* API. Event-based programming
can take a bit to get used to for programmers accustomed to thread-based programming,
but once the control flow is understood, it should become natural.
Installing a Timer Event
The simplest form of event is the timer event. This event has a single callback
that is called when the specified time expires. The callback may set the timer
to retrigger, or terminate the event, depending on its return value. In addition,
a timer can be canceled at any time by calling g_event_destroy() with the
timer's event reference.
Setting a Timer
To set a timer, call the g_timer_add() function. For example:
g_event_t *timer_ref = NULL;
int cb_data = 5;
int status =
g_timer_add(1000, /* 1 second (1000 ms)
*/
cb_func,
/* our callback function */
&cb_data,
/* a data pointer */
NULL,
/* some rarely-used options */
&timer_ref); /* an
event reference */
This call will set a one-second timer. If for some reason it cannot be set,
it will return -1. If it is not canceled before 1 second, the callback will
be called. The following call would cancel the timer:
g_event_destroy(timer_ref);
An example of a callback function is:
int cb_func(void *data, int interval, g_event_t *event)
{
int *count = (int *)data;
if (*count == 0) {
elog(LOG_INFO, "No more timeouts");
return TIMER_DONE;
}
(*count)--;
elog(LOG_INFO, "%d more timeouts", *count);
return TIMER_RENEW;
}
Given that cb_data starts out 5, after 1 second, this callback will print
'4 more timeouts', then wait one second, print '3 more timeouts', then wait
one seconds, etc. until it prints 'No more timeouts', and destroys the event.
You can see this behavior in action by running the program in skeletons/tutorial/timer.c
(obj.i686-linux/skeletons/timer). You may notice that after the timer finishes,
the program hangs. This is because, having destroyed the timer, there are
no further events registered, so the event loop is idle, and will stay that
way forever.
There are just a few additional details about timers. First, in addition
to renewing the timer for the same time interval, a timer callback can also
return TIMER_RENEW_MS(x) to reset the timer for a different interval.
Second, if no callback is specified when setting the timer, the timer event
will still be created, and will exist for the specified time, and then be
destroyed. This can actually be useful, because the function g_timer_isset(timer_ref)
will tell you if the timer is still set. Thus, this effectively sets a variable
"true" for the specified length of time.
Watching a File Descriptor
The second-simplest event to construct is one that watches for notification
on a file descriptor.