CS 213: Advanced Topics in
Distributed Embedded Systems
Assignment 3: Node Counts and Aggregation in Em*
Due Date: February 2nd, 11:59pm PST
This assignment will continue to introduce Em*, continuing our work
from Assignment 1. By now you should be familiar enough with the setup
of Em*, so this assignment is more focused on implementation.
As we have discussed in class, Em* requires Linux 2.4 with DevFS
installed. If your home machine runs Linux with DevFS, you can use it
for this assignment. This would have its advantages given the load on
the class simulation servers, but beware that
the installation might be time-consuming. If you want to give it a try,
see this page.
IMPORTANT: We are using a
simulation environment in this assignment. However, the object of
Em* development (and of this assignment) is to develop code that works
outside of simulation. While the simulation and the visualizer
both have an omnicient view of the system, and the nodes are physically
on the same machine, you must NOT leverage any of those artefacts of
simulation to solve your problem. When you are writing your code,
think of it as
running on a real node, separate from the others.. the point of Em* is
that
it could be deployed unchanged.
1. Recap from Assignment 1
The first assignment led us to write a simple application to count the
nodes in our network. The object of this effort was to produce, using
only the Em* "flooding" service, a scalable solution that would work
well for a wide range of network sizes, densities, and topologies. This
was an interesting problem because while the idea is simple, and the
interface to flooding is simple, there are a great many different
approaches to solve the problem, and finding a good solution turns out
to be quite challenging.
However, it is also obvious that flooding back and forth is not really
a very good way to count nodes. In this assignment, we will first
devise a more efficient solution to this problem, based on an
aggregation tree. Then, we will ask you to try it with collisions
turned on, and fix it, similar to last time. Then, we will analyse
your new algorithm, comparing it with the flooding algorithm from
assignment 1. The last parts of the assignment implement a timeout
scheme to "un-count" nodes that have disappeared, and finally some
discussion of some of the ideas we have learned.
We are not giving out any skeleton code for this assignment. However,
you are free to use the solutions from assignment 1
as a starting point, or any other publically available Em* code
for that matter.
2. An Aggregation Tree
Many sensor network protocols are based on the concept of a "sink tree"
(e.g. Directed Diffusion, TinyDB, and TASK). In this assignment, we
will implement a sink tree that aggregates a count of the nodes in each
subtree.
The algorithm we suggest is for the sink to send a flooded request,
just as in assignment 1. A node that receives that request, rather than
flood back, will send messages back one hop to its parent in the tree.
This is done by directing the message to address in the "prev_hop"
field of the incoming flooded message, sending at the link layer. The
message sent from child to parent contains a count of the child's
complete subtree. At the parent, messages from children will be
received and recorded. Based on these messages, the parent can compute
an aggregated count of the nodes in the entire subtree. This aggregate
is then passed on to its parent, and reported through status
information.
2.1 Using the Link Layer
In Assignment 1, we had two connections open to flooding, one for
requests and the other for replies. In this assignment, we will still
have the connection for requests, but for replies we will connect to
the link layer rather than the flooding layer. To do this, we must call
link_open() with the LINK_DEV_DATA link type, rather than
LINK_DEV_FLOOD.
Otherwise, sending and receiving messages works the same as before. The
only other difference in interface is that the link layer uses
"interface addresses" for src and dst, rather than node IDs. You may
find it helpful to include the node ID of the peer in your message, so
that it is easier to figure out the ID of the node that sent you a
message.
2.2 Some Hints
Unlike last time, we aren't providing you with a skeleton. However, you
are free to start with the solutions from Assignment 1 as a skeleton,
and work from there. To help you out, here are some hints and snippets
from my implementation. Not all implementations will follow the same
approach, but maybe these will help you figure out whether you're on
the right track.
I used the following structure for my reply header:
struct count_msg_reply {
node_id_t sending_node; /*
The node ID of the sender */
node_id_t
sink; /* the node ID of
the sink */
uint16_t seqno;
uint16_t
total; /* Total for
sender's tree */
uint16_t
expires; /* Lifetime in seconds */
};
For each neighbor, I stored the following information:
struct neighbor {
if_id_t
id; /* neighbor's
interface ID (link layer ID) */
node_id_t node_id;
/* neighbor's node ID */
int subtree_count;
/* neighbor's reported tree count */
int
expires; /* absolute
expiration time in seconds */
};
As in the case of assignment 1, there are several different approaches
to this problem. For example, some methods attempt optimal
aggreagation by sending a single packet chain from the leaves of the
network, but pay a cost in latency. Other methods might be simpler,
possibly with lower latency, but pay a cost in additional traffic.
2.3 Debugging and Testing Your Code
As always, it's best to test with small numbers of nodes. Try using the
count-test.conf file from Assignment 1 to test your new code without
collisions.
Note: if you are having problems with crashes, you can keep and analyse
core files. To enable cores, issue the command "ulimit -c 100000000" in
the shell that you run the simulation. Then, use GDB to examine the
core.
Submit a tree
aggregation implementation that works without collisions. (10 points)
3 Collisions
Once again, turning on collisions may cause problems for your
algorithm. Implement a fix. How efficient can you make your scheme?
Can you trade off latency for generated traffic?
Hint: There are several ways to solve this problem, depending on the
solution you implemented. The "optimal aggregation" solution may not
have too many problems with collisions in the first place. Other
solutions that generate more traffic may be subject to collisions.
Possible schemes to increase robustness include simple periodic
retransmissions, explicit acknowledgements, and passive
acknowledgements. You may need to apply similar desynchronization
schemes to those used in Assignment 1.
You may be best starting with something simple and seeing how it
works. Whichever algorithm you use, be sure to include a definite
stopping condition. Otherwise it will be difficult to compare your
solution with the flooding algorithm (part 4). For example, for a periodic
retransmission scheme, the stopping condition could simply limit the
number of retransmissions.
Submit a tree aggregation implementation that works with collisions.
(10 points)
4 Analysis
In this part, we will analyse the performance of the new scheme, and
compare it to the performance of the flooding implementation from the
solutions. Test your algorithm with 25 node networks, for field sizes
15, 20, and 25. Measure the latency to achieve an accurate count, as
well as the traffic. Then, compare these results with those in the
solutions.
Be sure to count floods correctly (i.e. deduce the number of
packets transmitted).
If you are running on your own machine, or if the simulation servers
are not loaded, you can also try larger networks.
To facilitate this measurement, we are providing an analysis tool, in
skeletons/sim/analyser.c . This tool monitors all of the running count
applications and reports whether your sink is getting the right answer.
To get and build this code, do a cvs update and rebuild. To run
analyser, give it an argument -n <number of nodes>, e.g. 25.
Submit graphs of latency and traffic load, comparing your algorithm to
the flood-based algorithm from the solutions. Include raw data and any
relevant tuning parameters. (10 points)
Extra Credit:
If your algorithm has tunable knobs, experiment with them and show how
they impact latency and traffic. (2 points)
5 Node Death
When a node dies, it will not necessarily be removed from the count.
Devise a way to expire neighbors after a certain interval, so that they
are removed from the count in a timely manner.
When your modification works, measure the detection latency for a 25
node net with a field size of 20.
If you are interested in scripting the simulator to kill nodes for you,
see this page
for information on how this can be done.
Submit a graph
of latency. Include raw data and
any relevant tuning parameters. (5 points)
Explain how
would you implement an efficient mechanism that could introduce new
nodes (or returned nodes) to the count? (5 points)
6 Discussion
Flooding is not
the ideal building block for this application. Why? What limitations
does it impose that are cumbersome for this application? (3 points)
How could you
customize the algorithmic behavior of flooding to enable our
application to perform better? (3 points)
What would be a
simple interface to this new version of flooding? (2 point)
What algorithms proposed in the reading might be appropriate here, and
how could they be applied? (2 points)
7 Extra Credit: Sink Election
In a real distributed system, several nodes may need to know the
current count. Describe how a sink election scheme might be
implemented. Can you make your implementation correctly handle multiple
competing sinks and sink refresh? (3 points)
Submission Instructions
You should submit ONE copy of the code. If you have multiple behaviors,
please invoke them via command line options and document them.
Please submit your homework as a .tgz file using the following scheme:
- Submit your written answers as a .doc, .txt, or .pdf file.
Be sure to include your name and student ID at the top of the file.
- Name your file in the source tree, as
emstar/cs213-W04/ps3-answers.XXX
- Tar up your portion of the source tree using these commands:
cd emstar
tar -czvf
Last_First_213-W04-3.tgz cs213-W04
- Place the file in a web-accessible place and mail me the URL, or
mail me for alternative arrangements.