Neighbor discovery is a critical piece of any truly ad-hoc, dynamic system. If we imagine a huge bag full of nodes being dumped on a field from an airplane, it's probably clear that pre-configuring them with a particular network topology or neighbor list isn't going to work. In addition, dynamics makes the problem much more difficult: nodes can run out of batteries, fall into water, be carried away by wind, or get eaten by a wild boar. These changes are hard to predict in advance.
Even if we could predict exactly where nodes will be geographically, the vagaries of RF propagation would still prevent us from predicting the resulting topology. When writing emstar applications, it's important to keep this in mind.
Wrong mindset: I need to design a network topology makes solving the problem easy. Then, I can put a configuration file on each node, telling it what my pre-computed topology is.Right mindset: The nodes we deploy will end up in some arbitrary arrangement. When a node activates, it needs to figure out who's around, and try to do useful work with whatever resources are available.
(The so-called ``wrong mindset'' can often be the right solution in Internet applications, where it's possible to explicitly set up a particular topology. But in sensor networks we can't assume the ability to engineer such controlled deployments. This is one aspect of design for sensor network software that can make it more challenging and interesting than Internet design.)
Neighbor discovery is an important part of the ``right mindset.'' When a node is activated, it must use neighbor discovery to figure out what other nodes are around. With a neighborlist, additional logical structures can be formed -- for example, routing tables can start to form if nodes transmit their neighborlists to their neighbors.
Recall from our earlier discussion what we mean by hard state: once we learn a fact about the state of the network, we assume that it remains true until we are told otherwise. For example, when a neighbor tells us ``I am here!'', we assume the neighbor is there until it explicitly tells us ``I am leaving now.'' In contrast, a soft state approach is one where a neighbor transmits ``I am here!'' messages periodically. If several periods pass without hearing this message, we assume the neighbor is gone.
Hard state is much more efficient, of course: a hypothetical hard-state neighbor discovery uses only a few packets on startup, and then rarely sends packets again. However, it is not robust. What if a node fails without being able to send a signoff message? What if a node moves out of range? What if the signoff message is corrupted? In all these cases, a node will think the neighbor is still there, even though it is not. And what if the receiver node fails -- for example, if it reboots, losing its neighbor list? Neighbors won't necessarily know that a node has rebooted, and therefore won't send new ``I am here'' messages. What if the sign-on message is corrupted? What if the message is valid, but the Node ID is incorrect? Other, more subtle problems can arise as well: what happens if there is a bug in the neighborlist code that causes entries to be lost off a linked list? In these cases, the node will not be aware of neighbors that it has.
Many of these failure modes can be fixed using some scheme or other -- checksums, request-response protocols, explicit reboot-recovery modes, sign-on and sign-off messages, three-phase commits, and so forth. Each of these special cases adds complexity; it adds up to what can be an unmanageable system.
In contrast, soft-state protocols make no distinction between normal operation and failure recovery. A neighbor will periodically transmit ``I am here!'', and receivers update their neighbor lists accordingly. A neighbor on the list who hasn't been heard from in some amount of time is removed from the list. Despite its simplicity, every failure mode we described above is handled gracefully---including the scenario where a bug in the code causes entries on the neighborlist to be deleted. In other words, the protocol can even be robust to bugs in its implementation! This ``self-healing'' property can vastly decrease the system's complexity; it's a benefit that is often (although not always) worth the cost in efficiency.
A side benefit we get from soft state is that it automatically handles dynamics. Imagine that two nodes are on the border of their radio ranges, and initially can't communicate with each other. Then, due to some environmental change, the RF propagation improves and the two nodes are able to communicate. In a soft-state protocol, this is automatically handled the right way---the new neighbor appears, without an explicit mode for handling new neighbors. This ability to handle new configurations as conditions change is critical.
The emstar neighbor discovery service is designed in a soft-state manner. However, it's important that your applications---things that use neighbor discovery---are designed with a similar mindset.
Wrong mindset: When a node boots, it first runs a neighbor discovery algorithm. When neighbor discovery is finished, the node runs (for example) a routing algorithm.Right mindset:The neighbor discovery algorithm is always running. Every time it updates the neighbor-list, the layers above should update their state (e.g., recompute routes).
One-time configuration leads to brittle applications.
One objection to the ``right mindset'' is that it can be wasteful of CPU time. For example, when a node first boots, it is likely to experience a flurry of neighborlists updates as it learns which other nodes are in the area. This can lead to a large number of, say, spurious recomputations of the routing table that immediately get thrown away in favor of newer data.
A nice solution to this problem is using a holdoff timer, or a delay between receiving new state and acting on it. For example, an application can set a 5-second timer when it receives a new neighborlist. When the holdoff timer expires, the node can do its expensive recomputation on the most recent neighborlist received -- which might be a neighborlist that was subsequently received, after the timer was set. (Neighborlists that are received during holdoff should be saved, but should not reset the holdoff timer, or start a new one.) This way, route recomputation is done at most only once per holdoff period.
An application opens a connection to the neighbor discovery service by calling g_neighbors, passing an options structure that specifies which link we're interested in, and a callback to be used when new neighborlists are available. When the neighborlist changes, the user's callback is activated, and is passed an array of neighbor_t structures, and a count which gives the size of the array. Each entry in the array represents one neighbor reachable on the selected link. The callback is responsible for freeing the neighbor list (using free()) when it is done using it.
Each time the user's callback is activated, it gets the complete list of neighbors---not a ``diff'' to the previous list. There are no ``new neighbor arrived'' or ``neighbor left'' messages. This fits into the soft-state philosophy: by providing the complete list each time, we can be sure that any problems in processing the previous list are corrected by the newer one.
The neighbor_t structure has members that give you information about the neighbor, including:
In the previous examples (for example, when learning how to use ping and pingd), we saw that EmRun's configuration file describes the modules of the system we wish to use, and their interdependencies. neighbortab is a configuration file similar to the one in the basic link example, but which also tells emrun we want to use the neighbor discovery service. It looks like this:
udpd {
type = daemon;
cmd = "link/udpd";
no-sim; # the simulator provides its own link-layer driver
}
neighbord {
type = daemon;
waitfor = udpd;
cmd = "link/neighbord";
# uncomment the line below if you want to see more detail
# about what the neighbor discovery app is doing
# loglevel = LOG_DEBUG(1);
}
neighbor-app {
type = daemon;
waitfor = neighbord;
cmd = "link/examples/neighbor-app";
}
As in the pingd program's emruntab, the first block runs UDPd, which
is the link layer driver for running over a ``real'' Ethernet
network. The next block runs neighbord---the neighbor
discovery service itself. The line "waitfor = udpd", as in the ping example,
means that the neighbor discovery service should not be started until
udpd is running. Finally, the neighbor-app program is run after the
neighbord service is running (due to the "waitfor = neighbord" line).
Now, as with the ping example, we can log into two machines (e.g., dragon and tobiko, or two IPAQs), and run the following command on each computer (from the obj.i686-linux directory):
BLUE:/home/jelson/projects/emstar/obj.i686-linux(5044) emrun/emrun ../link/examples/neighbors/neighbortab Tue Feb 4 17:32:39.416: emrun: emrun/emrun: starting (node id 1400) Tue Feb 4 17:32:39.420: emrun: Boot complete... starting normal processes Tue Feb 4 17:32:39.422: emrun: udpd launched, pid 23832 Tue Feb 4 17:32:39.427: udpd: link/udpd: starting (node id 1400) Tue Feb 4 17:32:39.430: emrun: neighbord launched, pid 23833 Tue Feb 4 17:32:39.433: udpd: connecting /dev/link/0/data to eth0 port 6950 Tue Feb 4 17:32:39.437: neighbord: link/neighbord: starting (node id 1400) Tue Feb 4 17:32:39.441: emrun: neighbor-app launched, pid 23834 Tue Feb 4 17:32:39.444: neighbord: running, sending with period 6648 to /dev/link/0/data Tue Feb 4 17:32:39.448: neighbor-app: link/examples/neighbor-app: starting (node id 1400)As in previous examples, we can go to another shell and cat the emrun status device (/dev/emrun/status) to check that the daemons are running:
BLUE:/home/jelson/projects/emstar(4946) cat /dev/emrun/status Process Name Type Current Status Command ------------ -------- -------------- ------- udpd daemon pid 23906 link/udpd neighbord daemon pid 23907 link/neighbord neighbor-app daemon pid 23908 link/examples/neighbor-appIn this example, we have an additional status device available, too---the neighbor discovery service creates a file called /dev/link/0/neighbors that shows the neighbors discovered so far on that link. (Systems with multiple network interfaces will have /dev/link/1/neighbors, /dev/link/2/neighbors, etc.). We can check that device to see who's around:
BLUE:/home/jelson/projects/emstar(4947) cat /dev/link/0/neighbors Node Interface State Last HB Last Ack ---- ---------------- ---------- ------- -------- 1400 0.0.0.0 self -- -- 1003 131.179.144.103 active 1.97 3.75In this case, the columns indicate the node ID, the node's interface ID, the state of the neighbor, and the time since various transmissions were received from the neighbor. Neighborlists always have a "self" entry; this is convenient for some applications instead of having the node consider itself to be a special case.
The legal states are:
The "dead" state can be demonstrated by killing one of the emruns (on one computer). Within about 20 seconds, the other neighbor status device should reflect the change:
BLUE:/home/jelson(4990) cat /dev/link/0/neighbors Node Interface State Last HB Last Ack ---- ---------------- ---------- ------- -------- 1400 0.0.0.0 self -- -- 1003 131.179.144.103 dead 67.25 67.32This output shows it has been 67 seconds since we last heard from node 1003, and it is now considered "dead".
An example of such a configuration file is in emrun/examples/sim-generic, which looks like this:
# Number of nodes is arg 2
num-nodes = $2;
# Physical size of the simulated field is 20m^2
field-size = (20, 20);
# We want to simulate having MoteNIC available
sim-component = "sim/sim_mote -m circle";
# The default for all nodes is for them to be at a random position in
# the field, power on
node default {
position = (random(0, 20), random(0, 20));
power = on;
# "emruntab" is the emrun config file that you want the nodes
#to run. First argument.
emruntab = $1;
}
This is just like the simulator configuration we used for ping, but
the emruntab filename and the number of nodes in the simulation are
now both command-line arguments. So, we can run a 10-node
neighbor discovery simulation like this:
GAMA:/home/jelson/projects/emstar/obj.i686-linux(4950) setenv SIM_GROUP 5 GAMA:/home/jelson/projects/emstar/obj.i686-linux(4951) emrun/emsim ../emrun/examples/sim-generic ../link/examples/neighbors/neighbortab 10 Tue Feb 18 21:09:03.919: emrun/emsim: starting (simulator component) Tue Feb 18 21:09:03.947: main: 10-node simulation starting (loglevel NOTICE) Tue Feb 18 21:09:03.947: sim-component-0 launched, pid 2534 Tue Feb 18 21:09:03.950: libchannel_config: using circle channel model Tue Feb 18 21:09:03.952: motesim_new_config: now simulating 10 motes Tue Feb 18 21:09:04.460: simulator state 0 (t=0) now current Tue Feb 18 21:09:04.461: node 1 powering ON Tue Feb 18 21:09:04.461: node 2 powering ON Tue Feb 18 21:09:04.461: node 3 powering ON ...As with the ping example, we can look at /dev/sim/group5/config to see which random node positions were selected by the simulator:
GAMA:/home/jelson/projects/emstar/obj.i686-linux(5077) cat /dev/sim/group5/config Simulator Configuration State 0 Valid from beginning of simulation to end of simulation *** ACTIVE NOW *** Node On/Off Position (m) EmRun Configuration ---- ------ ---------------- ------------------------ 1 On ( 18.92, 19.14) ../link/examples/neighbors/neighbortab 2 On ( 13.16, 6.31) ../link/examples/neighbors/neighbortab 3 On ( 9.11, 19.88) ../link/examples/neighbors/neighbortab 4 On ( 8.71, 8.97) ../link/examples/neighbors/neighbortab 5 On ( 0.77, 17.89) ../link/examples/neighbors/neighbortab 6 On ( 6.05, 3.83) ../link/examples/neighbors/neighbortab 7 On ( 19.16, 15.66) ../link/examples/neighbors/neighbortab 8 On ( 1.59, 17.67) ../link/examples/neighbors/neighbortab 9 On ( 7.61, 11.44) ../link/examples/neighbors/neighbortab 10 On ( 10.69, 6.64) ../link/examples/neighbors/neighbortab ****And, as before, all the device files from simulated nodes are moved from their normal locations into the /dev/sim/groupXX/nodeYY directory, where XX is the simulator group and YY is the Node ID. So, for example, to see the neighbor discovery status of simulated node 4, we can do this:
GAMA:/home/jelson/projects/emstar/obj.i686-linux(5078) cat /dev/sim/group5/node4/link/0/neighbors Node Interface State Location Last HB Last Ack ---- ---------------- ---------- -------------- ------- -------- 4 0.0.0.0 self ( 8.71, 8.97) -- -- 2 142.116.0.2 active (13.15, 6.31) 2.35 5.04 9 199.148.0.9 active ( 7.60, 11.44) 2.52 5.04 10 191.89.0.10 active (10.68, 6.64) 2.20 5.04 6 69.3.0.6 active ( 6.04, 3.82) 5.31 8.44Of course, this human-readable output is purely for your own convenience in debugging and understanding the operation of the system. If your application needs a neighbor list, it should use the programmatic interface, as demonstrated in the neighbor-app.c source code -- the g_neighbors() function call we talked about earlier. (Don't parse this text output!)
Finally, it's useful to remember that we can also check on the overall status of simulated node 4, as we did in the ping example:
GAMA:/home/jelson(4934) cat /dev/sim/group5/node4/emrun/status Process Name Type Current Status Command ------------ -------- -------------- ------- udpd dead not simulated link/udpd neighbord daemon pid 26533 link/neighbord neighbor-app daemon pid 26543 link/examples/neighbor-appSimply press Control-C to terminate the simulation.