CS 213: Advanced Topics in 

Distributed Embedded Systems

Assignment 3: Node Counts and Aggregation in Em*

Due Date: February 2nd, 11:59pm PST

This assignment will continue to introduce Em*, continuing our work from Assignment 1. By now you should be familiar enough with the setup of Em*, so this assignment is more focused on implementation.

As we have discussed in class, Em* requires Linux 2.4 with DevFS installed. If your home machine runs Linux with DevFS, you can use it for this assignment. This would have its advantages given the load on the class simulation servers, but beware that the installation might be time-consuming. If you want to give it a try, see this page

IMPORTANT: We are using a simulation environment in this assignment.  However, the object of Em* development (and of this assignment) is to develop code that works outside of simulation.  While the simulation and the visualizer both have an omnicient view of the system, and the nodes are physically on the same machine, you must NOT leverage any of those artefacts of simulation to solve your problem.  When you are writing your code, think of it as running on a real node, separate from the others.. the point of Em* is that it could be deployed unchanged.

1. Recap from Assignment 1

The first assignment led us to write a simple application to count the nodes in our network. The object of this effort was to produce, using only the Em* "flooding" service, a scalable solution that would work well for a wide range of network sizes, densities, and topologies. This was an interesting problem because while the idea is simple, and the interface to flooding is simple, there are a great many different approaches to solve the problem, and finding a good solution turns out to be quite challenging.

However, it is also obvious that flooding back and forth is not really a very good way to count nodes. In this assignment, we will first devise a more efficient solution to this problem, based on an aggregation tree. Then, we will ask you to try it with collisions turned on, and fix it, similar to last time. Then, we will analyse your new algorithm, comparing it with the flooding algorithm from assignment 1. The last parts of the assignment implement a timeout scheme to "un-count" nodes that have disappeared, and finally some discussion of some of the ideas we have learned.

We are not giving out any skeleton code for this assignment. However, you are free to use the solutions from assignment 1 as a starting point, or any other publically available Em* code for that matter.

2. An Aggregation Tree

Many sensor network protocols are based on the concept of a "sink tree" (e.g. Directed Diffusion, TinyDB, and TASK). In this assignment, we will implement a sink tree that aggregates a count of the nodes in each subtree.

The algorithm we suggest is for the sink to send a flooded request, just as in assignment 1. A node that receives that request, rather than flood back, will send messages back one hop to its parent in the tree. This is done by directing the message to address in the "prev_hop" field of the incoming flooded message, sending at the link layer. The message sent from child to parent contains a count of the child's complete subtree. At the parent, messages from children will be received and recorded. Based on these messages, the parent can compute an aggregated count of the nodes in the entire subtree. This aggregate is then passed on to its parent, and reported through status information.

2.1 Using the Link Layer

In Assignment 1, we had two connections open to flooding, one for requests and the other for replies. In this assignment, we will still have the connection for requests, but for replies we will connect to the link layer rather than the flooding layer. To do this, we must call link_open() with the LINK_DEV_DATA link type, rather than LINK_DEV_FLOOD.

Otherwise, sending and receiving messages works the same as before. The only other difference in interface is that the link layer uses "interface addresses" for src and dst, rather than node IDs. You may find it helpful to include the node ID of the peer in your message, so that it is easier to figure out the ID of the node that sent you a message.

2.2 Some Hints

Unlike last time, we aren't providing you with a skeleton. However, you are free to start with the solutions from Assignment 1 as a skeleton, and work from there. To help you out, here are some hints and snippets from my implementation. Not all implementations will follow the same approach, but maybe these will help you figure out whether you're on the right track.

I used the following structure for my reply header:

struct count_msg_reply { 
  node_id_t sending_node; /* The node ID of the sender */
  node_id_t sink;         /* the node ID of the sink */
  uint16_t seqno;
  uint16_t total;         /* Total for sender's tree */
  uint16_t expires;       /* Lifetime in seconds */
};

For each neighbor, I stored the following information:

struct neighbor {
  if_id_t id;         /* neighbor's interface ID (link layer ID) */
  node_id_t node_id;  /* neighbor's node ID */
  int subtree_count;  /* neighbor's reported tree count */
  int expires;        /* absolute expiration time in seconds */
};

As in the case of assignment 1, there are several different approaches to this problem. For example, some methods attempt optimal aggreagation by sending a single packet chain from the leaves of the network, but pay a cost in latency. Other methods might be simpler, possibly with lower latency, but pay a cost in additional traffic.

2.3 Debugging and Testing Your Code

As always, it's best to test with small numbers of nodes. Try using the count-test.conf file from Assignment 1 to test your new code without collisions.

Note: if you are having problems with crashes, you can keep and analyse core files. To enable cores, issue the command "ulimit -c 100000000" in the shell that you run the simulation. Then, use GDB to examine the core.

Submit a tree aggregation implementation that works without collisions. (10 points)

3 Collisions

Once again, turning on collisions may cause problems for your algorithm. Implement a fix. How efficient can you make your scheme? Can you trade off latency for generated traffic?

Hint: There are several ways to solve this problem, depending on the solution you implemented. The "optimal aggregation" solution may not have too many problems with collisions in the first place. Other solutions that generate more traffic may be subject to collisions. Possible schemes to increase robustness include simple periodic retransmissions, explicit acknowledgements, and passive acknowledgements. You may need to apply similar desynchronization schemes to those used in Assignment 1.

You may be best starting with something simple and seeing how it works. Whichever algorithm you use, be sure to include a definite stopping condition. Otherwise it will be difficult to compare your solution with the flooding algorithm (part 4). For example, for a periodic retransmission scheme, the stopping condition could simply limit the number of retransmissions.

Submit a tree aggregation implementation that works with collisions. (10 points)

4 Analysis

In this part, we will analyse the performance of the new scheme, and compare it to the performance of the flooding implementation from the solutions. Test your algorithm with 25 node networks, for field sizes 15, 20, and 25. Measure the latency to achieve an accurate count, as well as the traffic. Then, compare these results with those in the solutions. Be sure to count floods correctly (i.e. deduce the number of packets transmitted).

If you are running on your own machine, or if the simulation servers are not loaded, you can also try larger networks.

To facilitate this measurement, we are providing an analysis tool, in skeletons/sim/analyser.c . This tool monitors all of the running count applications and reports whether your sink is getting the right answer. To get and build this code, do a cvs update and rebuild. To run analyser, give it an argument -n <number of nodes>, e.g. 25.

Submit graphs of latency and traffic load, comparing your algorithm to the flood-based algorithm from the solutions. Include raw data and any relevant tuning parameters. (10 points)


Extra Credit: If your algorithm has tunable knobs, experiment with them and show how they impact latency and traffic. (2 points)

5 Node Death

When a node dies, it will not necessarily be removed from the count. Devise a way to expire neighbors after a certain interval, so that they are removed from the count in a timely manner.

When your modification works, measure the detection latency for a 25 node net with a field size of 20.

If you are interested in scripting the simulator to kill nodes for you, see this page for information on how this can be done.

Submit a graph of latency. Include raw data and any relevant tuning parameters. (5 points)
 
Explain how would you implement an efficient mechanism that could introduce new nodes (or returned nodes) to the count? (5 points)

6 Discussion

Flooding is not the ideal building block for this application. Why? What limitations does it impose that are cumbersome for this application? (3 points)

How could you customize the algorithmic behavior of flooding to enable our application to perform better? (3 points)

What would be a simple interface to this new version of flooding? (2 point)

What algorithms proposed in the reading might be appropriate here, and how could they be applied? (2 points)

7 Extra Credit: Sink Election

In a real distributed system, several nodes may need to know the current count. Describe how a sink election scheme might be implemented. Can you make your implementation correctly handle multiple competing sinks and sink refresh? (3 points)

Submission Instructions

You should submit ONE copy of the code. If you have multiple behaviors, please invoke them via command line options and document them.

Please submit your homework as a .tgz file using the following scheme:
  1. Submit your written answers as a .doc, .txt, or .pdf file.  Be sure to include your name and student ID at the top of the file.
  2. Name your file in the source tree, as emstar/cs213-W04/ps3-answers.XXX
  3. Tar up your portion of the source tree using these commands:
    cd emstar
    tar -czvf Last_First_213-W04-3.tgz cs213-W04
  4. Place the file in a web-accessible place and mail me the URL, or mail me for alternative arrangements.