File: emstar/tos-contrib/sympathy/TODO (CENS CVS)
Revision: 1.33, Sat Mar 26 22:42:33 2005 UTC, by nithya. Branch: MAIN
CVS Tags: rdd_alpha_version_1, pregeonet, acoustic-05-18-06, PRE_TOSNIC_FIX, PRE_64BIT, MOTENIC_PRE_BUGFIX_20050415, LAURA_CALIBRATION_EXPERIMENTS, HEAD, ESS_RELEASE_3_5, ESS_RELEASE_3_4, ESS_RELEASE_3_3, ESS_RELEASE_3_2, ESS_RELEASE_3_1, ESS_RELEASE_3_0, ESS_RELEASE_2_0, ESS_CONNECTIVITY, ESS_CENTROUTE_TESTING, ESS2-CMS-V1_5_pretest, ESS2-CMS-V1_4cMergeSympathy_2, ESS2-CMS-V1_4c, ESS2-CMS-V1_4b, ESS2-CMS-V1_4a, ESS2-CMS-V1_3, ESS2-CMS-V1_2, ESS2-CMS-V1_1, EMSTAR_RELEASE_2_5, CYCLOPS_RELEASE_CANDIDATE_2_0, CYCLOPS_PRERELEASE_STABLE, CENTROUTE_EMSTAR_SOCKETS, BG_1_0, BANGLADESH_ARSENIC_1_2, BANGLADESH_ARSENIC_1_1, AMARSS_JR_DEPLOYMENT_6_05_07
Changes since 1.32: +4 -0 lines. Log: Working version - node reboot/congestion detected; node death not detected, because packet-dropping is not working yet.
for paper:
- explain iterations (epochs) in order to ensure some state synchronization
- explain using time-awake to differentiate between reboot/rollover/stale packets
- failure scenarios:
*** excessive congestion and 2-hop nodes result in insufficient data
delivered at the sink - correlate unexpected events: when a node has a
next-hop with a fault, mention that!
Use the number of transmission attempts (#times tried to tx) as an additional indication of congestion.
Failure Scenarios:
- don't receive data from a node - but it's because we're not receiving data from its
next-hop
- also nobody claims that node as a neighbor!
INITIALLY TESTED:
- when jitter is disabled, congestion is detected; when it is enabled,
congestion goes away (I think?)
=> add "no nodes claim the node as a neighbor" as a fault?
- results in a fault if nothing else
DONE - add to narrative/design file
too much congestion:
- check if multiple nodes sharing the same next-hop (i.e. check if all nodes
with the same next-hop) have lots of errors: then it's probably a congestion issue
- this is a fault - can it be correlated with not enough data?
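A minimal sketch of the shared-next-hop congestion check, assuming the sink keeps, per node and per epoch, the node's next-hop and an error count (for instance the number of tx attempts mentioned above). The struct, its fields, and both thresholds are hypothetical; requiring every sharing node to be in trouble is one reading of "all nodes w same next-hop".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record kept at the sink for one epoch. */
typedef struct {
    uint16_t node_id;
    uint16_t next_hop;
    uint16_t errors;   /* e.g. failed tx attempts reported this epoch */
} node_stats_t;

/* True if at least min_nodes nodes use next_hop and every one of them
 * reported more than err_threshold errors: a hint of congestion at or
 * around the shared next-hop. */
bool likely_congested(const node_stats_t *nodes, size_t n, uint16_t next_hop,
                      size_t min_nodes, uint16_t err_threshold)
{
    size_t sharing = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].next_hop != next_hop)
            continue;
        sharing++;
        if (nodes[i].errors <= err_threshold)
            return false;  /* one healthy node argues against congestion */
    }
    return sharing >= min_nodes;
}
```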
Faults detected:
- faults can be detected from 1-hop away
* too much congestion
- faults can be detected from anywhere
* all others
add conditions: too much congestion, route-flapping
asynchronous notification of route-flapping
Testing Plan:
Node rebooting
Errors induced by Packet-Loss:
INSUFF DATA: Excessive packet loss due to channel/hidden terminal/unsynchronized
radios - this is addressed by the first option (telling a single node to
drop x% of packets from another node).
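A sketch of the "drop x% of packets from another node" hook, assuming it sits in the receiving node's packet path and sees each packet's source address. drop_rule_t and should_drop() are hypothetical names. It drops deterministically rather than with rand(), so a run with drop_pct = 25 loses exactly 25 of every 100 packets from the victim, which keeps test runs repeatable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fault-injection rule: drop drop_pct percent of the
 * packets received from victim_id. */
typedef struct {
    uint16_t victim_id;
    uint8_t  drop_pct;   /* 0..100 */
    uint32_t seen;       /* packets from victim_id seen so far */
    uint32_t dropped;    /* packets from victim_id dropped so far */
} drop_rule_t;

/* Drops a packet whenever doing so keeps the running drop ratio at or
 * below drop_pct, so the target is hit exactly rather than on average. */
bool should_drop(drop_rule_t *r, uint16_t src)
{
    if (src != r->victim_id)
        return false;
    r->seen++;
    if ((uint64_t)(r->dropped + 1) * 100 <= (uint64_t)r->drop_pct * r->seen) {
        r->dropped++;
        return true;
    }
    return false;
}
```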
** NO DATA: node doesn't have a next-hop so it can't send data!! But neighbors have heard
it, so we know it exists!
NO DATA: A node that doesn't have data to send - this should be done at the
sender: have the sender not send data out (this could happen at the comm
layer, or even at the application layer - may be more interesting at the
application layer, since that way we could have the situation where one type of
packet is being sent but another type is NOT being sent).
NODE DIE: A node that has died - this should also be done at the sender: have the
sender not receive/transmit any packets.
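A sketch of the two sender-side faults, assuming the hooks run just before a packet is handed to the radio and just after one is received. The bitmask lets a test suppress one packet type (say 5, DSE, from the type list further down) while another (6, SYMPATHY) still goes out; fault_mode_t and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender-side fault modes for the NO DATA and NODE DIE tests. */
typedef struct {
    bool     dead;             /* NODE DIE: neither send nor receive anything */
    uint32_t suppressed_types; /* NO DATA: bit t set => don't send packet type t */
} fault_mode_t;

/* Called before handing a packet of the given type to the radio. */
bool allow_send(const fault_mode_t *f, uint8_t pkt_type)
{
    if (f->dead)
        return false;
    if (pkt_type < 32 && (f->suppressed_types & (UINT32_C(1) << pkt_type)))
        return false;
    return true;
}

/* Called on every received packet. */
bool allow_receive(const fault_mode_t *f)
{
    return !f->dead;
}
```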
- don't care much about point failures - care if a failure
is persistent - how do we track this??
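One way to track persistence, assuming failures are evaluated once per epoch: report a failure only after it has been seen in min_epochs consecutive epochs, and reset the streak on any clean epoch. The type and the choice of "consecutive" (rather than, say, k of the last N epochs) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node, per-failure-type tracker: point failures are
 * ignored until they repeat for min_epochs epochs in a row. */
typedef struct {
    uint8_t streak;      /* consecutive epochs in which the failure was seen */
    uint8_t min_epochs;  /* streak needed before the failure is "persistent" */
} persist_t;

/* Call once per epoch; returns true while the failure is persistent. */
bool persist_update(persist_t *p, bool failed_this_epoch)
{
    if (!failed_this_epoch) {
        p->streak = 0;        /* a clean epoch resets the streak */
        return false;
    }
    if (p->streak < UINT8_MAX)
        p->streak++;
    return p->streak >= p->min_epochs;
}
```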
For tests:
add asynchronous events:
- too many errors all w same next hop
- routing a pkt for a node that's not a neighbor
- route flapping
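A sketch of the route-flapping event, assuming each node's next-hop is sampled once per epoch: keep one bit per epoch recording whether the next-hop changed, and flag flapping once the last 16 epochs contain max_changes or more changes. The names, the 16-epoch window, and the threshold are hypothetical; last_next_hop should be initialized to the node's first observed next-hop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route-flapping detector over a 16-epoch window. */
typedef struct {
    uint16_t last_next_hop;
    uint16_t history;      /* bit i set => next-hop changed i epochs ago */
    uint8_t  max_changes;  /* changes within the window that count as flapping */
} flap_t;

static int popcount16(uint16_t x)
{
    int c = 0;
    while (x) { c += x & 1u; x >>= 1; }
    return c;
}

/* Call once per epoch with the node's current next-hop. */
bool flap_update(flap_t *f, uint16_t next_hop)
{
    bool changed = (next_hop != f->last_next_hop);
    f->history = (uint16_t)((f->history << 1) | (changed ? 1u : 0u));
    f->last_next_hop = next_hop;
    return popcount16(f->history) >= f->max_changes;
}
```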
if we haven't heard from a node then:
- see how long it takes if we wait for nodes to drop it off their neighbor lists
- flood a request to get info on the node (nodes that have routed pkts for it, the
node itself, and its neighbors should respond with:
num-pkts routed for the node, and the last time they heard a pkt from the node - this involves
tos-code that snoops the lowest layer and understands multihop packets as
well as app-layer pkts to determine the original source)
- if no nodes report it as a neighbor then record that!
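A sketch of how the sink might combine replies to the flooded request, assuming each reply carries the two fields listed above plus a flag saying whether the responder has the node in its neighbor list. The reply layout and names are hypothetical; a summary with neighbors == 0 is the "no nodes report it as a neighbor" case to record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply to a flooded "tell me about node X" request. */
typedef struct {
    uint16_t responder;      /* node answering the request */
    uint16_t pkts_routed;    /* packets it routed for node X */
    uint32_t last_heard;     /* time it last heard a packet from X (0 = never) */
    bool     is_neighbor;    /* X is in the responder's neighbor list */
} node_info_reply_t;

typedef struct {
    size_t   neighbors;      /* responders that list X as a neighbor */
    uint32_t total_routed;   /* packets routed for X across all responders */
    uint32_t latest_heard;   /* most recent time anyone heard X */
} node_info_summary_t;

node_info_summary_t summarize(const node_info_reply_t *r, size_t n)
{
    node_info_summary_t s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (r[i].is_neighbor)
            s.neighbors++;
        s.total_routed += r[i].pkts_routed;
        if (r[i].last_heard > s.latest_heard)
            s.latest_heard = r[i].last_heard;
    }
    return s;  /* s.neighbors == 0: nobody claims X as a neighbor */
}
```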
command-logistics:
add the ability to trigger a check via the status-device
add the ability to be user-interactive: "ping" a node - basically send a request for
metrics! command: ping=<node_id>
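A sketch of parsing the ping=<node_id> command, assuming the command arrives as a NUL-terminated string written to the status device. Only the parsing is shown; how EmStar delivers the write and what the request for metrics looks like are not covered, and parse_ping() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses "ping=<node_id>". Rejects a missing, signed, non-numeric, or
 * out-of-range id and trailing garbage; one trailing newline (as from
 * `echo`) is accepted. */
bool parse_ping(const char *cmd, uint16_t *node_id)
{
    const char *prefix = "ping=";
    size_t plen = strlen(prefix);
    if (strncmp(cmd, prefix, plen) != 0)
        return false;
    const char *num = cmd + plen;
    if (*num < '0' || *num > '9')
        return false;           /* no digits, or a sign/space strtoul would skip */
    char *end;
    unsigned long v = strtoul(num, &end, 10);
    if (*end == '\n')
        end++;
    if (*end != '\0' || v > UINT16_MAX)
        return false;
    *node_id = (uint16_t)v;
    return true;
}
```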
Trigger self-tests: each node should determine if it's getting what it needs
for each module (i.e. ts pkts for ts, beacon pkts for routing,
dse-queries for the application layer) - each module can export an
interface: trigger test and respond with yes/no.
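A sketch of the per-module self-test interface, assuming each module exports a yes/no callback in a table. The counters and test functions are hypothetical stand-ins for "ts pkts for ts, beacon pkts for routing, dse-queries for the application layer".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical self-test interface: a name and a yes/no callback. */
typedef struct {
    const char *name;         /* e.g. "ts", "routing", "dse" */
    bool (*self_test)(void);  /* true = module is receiving what it needs */
} module_test_t;

/* Example checks over hypothetical per-epoch counters. */
static unsigned ts_pkts_heard, beacons_heard, dse_queries_heard;
static bool ts_test(void)      { return ts_pkts_heard > 0; }
static bool routing_test(void) { return beacons_heard > 0; }
static bool dse_test(void)     { return dse_queries_heard > 0; }

/* Runs every module's test; returns the number of failing modules and
 * stores the index of the first failure in *first_failed (n if none). */
size_t run_self_tests(const module_test_t *mods, size_t n, size_t *first_failed)
{
    size_t failures = 0;
    *first_failed = n;
    for (size_t i = 0; i < n; i++) {
        if (!mods[i].self_test()) {
            if (failures == 0)
                *first_failed = i;
            failures++;
        }
    }
    return failures;
}
```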
record children (you know implicitly) as well as parent (next-hop)
Add set_next_hop() primitive that checks if data-loss is due to next-hop
selection - only do this if all nodes using a next-hop are losing
data
- check more info on surrounding nodes using same nh
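A sketch of the bookkeeping above, assuming children are learned from the previous hop of every packet this node forwards. record_parent(), note_forwarded_from(), and loss_due_to_next_hop() are hypothetical; the last one encodes only the "only if all nodes using a next-hop are losing data" condition, not the full set_next_hop() primitive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 16

/* Hypothetical routing bookkeeping: parent = chosen next-hop; children
 * are learned implicitly from forwarded traffic. */
typedef struct {
    uint16_t parent;
    uint16_t children[MAX_CHILDREN];
    size_t   n_children;
} route_info_t;

void record_parent(route_info_t *ri, uint16_t nh) { ri->parent = nh; }

/* Call with the previous hop of every packet forwarded for another node. */
void note_forwarded_from(route_info_t *ri, uint16_t prev_hop)
{
    for (size_t i = 0; i < ri->n_children; i++)
        if (ri->children[i] == prev_hop)
            return;                          /* already known */
    if (ri->n_children < MAX_CHILDREN)
        ri->children[ri->n_children++] = prev_hop;
}

/* Blame next-hop selection only if every node using it is losing data. */
bool loss_due_to_next_hop(const bool *users_losing, size_t n_users)
{
    if (n_users == 0)
        return false;
    for (size_t i = 0; i < n_users; i++)
        if (!users_losing[i])
            return false;
    return true;
}
```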
Later Design
upon reboot - save the last 5 events, the line number of the segfault,
and the number of segfaults
- flood a help message (with the saved events) after segfaulting more than once
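A sketch of the crash record, assuming it lives somewhere that survives a reboot (flash or EEPROM; not shown). It keeps the last 5 events in a ring buffer plus the segfault line and count; how the line number is obtained at crash time is left open. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_EVENTS 5

/* Hypothetical record meant to survive a reboot. */
typedef struct {
    uint16_t events[N_EVENTS];  /* ring buffer of recent event codes */
    uint8_t  next;              /* slot the next event goes into */
    uint8_t  count;             /* valid entries, at most N_EVENTS */
    uint16_t crash_line;        /* source line of the last segfault */
    uint16_t crash_count;       /* number of segfaults so far */
} crash_log_t;

void log_event(crash_log_t *c, uint16_t ev)
{
    c->events[c->next] = ev;
    c->next = (uint8_t)((c->next + 1) % N_EVENTS);
    if (c->count < N_EVENTS)
        c->count++;
}

void log_crash(crash_log_t *c, uint16_t line)
{
    c->crash_line = line;
    c->crash_count++;
}

/* Flood a help message (with the saved events) after more than one segfault. */
bool should_flood_help(const crash_log_t *c) { return c->crash_count > 1; }

/* i = 0 is the oldest saved event, i = count - 1 the newest. */
uint16_t saved_event(const crash_log_t *c, uint8_t i)
{
    return c->events[(c->next + N_EVENTS - c->count + i) % N_EVENTS];
}
```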
Experiments
-----------
- ping-quality:
are pings an accurate representation of link-quality? Couldn't there be
a link that lets pings through but not larger data-packets?
add a test where I compare the 'ground-truth' and measured link-quality with the
number of packets making it through.
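A sketch of the comparison for the ping-quality experiment, assuming per-link counters for small probes and for full-size data packets. The struct and the max_gap threshold are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link counters for the ping-quality experiment. */
typedef struct {
    uint32_t pings_sent, pings_acked;  /* small probe packets */
    uint32_t data_sent, data_recvd;    /* full-size data packets */
} link_counts_t;

static double ratio(uint32_t ok, uint32_t total)
{
    return total ? (double)ok / (double)total : 0.0;
}

/* True if pings overstate the link: the ping success ratio exceeds the
 * data delivery ratio by more than max_gap (e.g. 0.2 = 20 points). */
bool pings_overstate_link(const link_counts_t *l, double max_gap)
{
    return ratio(l->pings_acked, l->pings_sent)
         - ratio(l->data_recvd, l->data_sent) > max_gap;
}
```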
- Are there situations where a sink does not get data from a node, but its neighbors
can still talk to it? what are these situations?
Abstract Issues
- Don't make thresholding static (a fixed number of msgs needed before declaring
a problem) - instead, measure the average time between hearing from each node,
across all nodes, and if one node is some number of sigma away from this, then it is a
problem. Likewise if a node is irregular.
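A sketch of the sigma-based threshold, assuming gap[i] is the time since the sink last heard from node i (the same test could be run on per-node average inter-arrival times). It compares squared deviations, so no sqrt (and no libm) is needed; flag_outliers() and k are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sets flag[i] when gap[i] lies more than k population standard
 * deviations above the mean over all n nodes (k >= 0). */
void flag_outliers(const double *gap, size_t n, double k, bool *flag)
{
    if (n == 0)
        return;
    double mean = 0.0, var = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += gap[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++)
        var += (gap[i] - mean) * (gap[i] - mean);
    var /= (double)n;                       /* population variance */
    for (size_t i = 0; i < n; i++) {
        double d = gap[i] - mean;
        /* d > k*sigma, tested as d*d > k*k*var so sqrt is not needed */
        flag[i] = d > 0.0 && d * d > k * k * var;
    }
}
```

One caveat for choosing k: with n nodes no single node's deviation can exceed sqrt(n-1) population standard deviations (Samuelson's inequality), so with 10 nodes a threshold of k = 3 can never fire.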
- Figure out a way to determine if the number of beacons corresponds with the
actual long data packets that are getting through.
What I know:
------------
mh-hdr = 8B
when the sink calls flood, the nodes' NodeI.recvFlood interface gets called -
the data length is the length of the data sent (so nodes don't get the multihop_hdr)
7th byte = type (5: DSE, 6: SYMPATHY)
mh data starts on 9th byte (first byte is: 1st byte)
packets passed up to dse/sympathy are TOS type 4 (TOS[4,125]),
beacons are TOS type 1, and the debugging packets sent by dse are TOS type 2
DSE packets:
byte 9,10 = seqNo
byte 11,12 = srcAddr
byte 13 = type (1:TOPO_DATA)
byte 14 = pkt-length
if TOPO:
byte 15 = len of next-hop data
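A sketch that applies the byte map above to a raw captured packet, header included (as in the TOS[4,125] dump further down; per the recvFlood note, flood receivers don't see the header). The notes count bytes from 1, so "byte 9" is buf[8]. The 16-bit fields are assumed little-endian, which the notes do not state; the leading "02 00" of the dump below for src=0.0.0.2 appears consistent with that. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MH_HDR_LEN     8   /* mh-hdr = 8B */
#define MH_TYPE_OFF    6   /* "7th byte" (1-based) = type */
#define MH_TYPE_DSE    5
#define MH_TYPE_SYMP   6
#define DSE_TOPO_DATA  1

typedef struct {
    uint16_t seq_no;
    uint16_t src_addr;
    uint8_t  type;
    uint8_t  pkt_len;
    uint8_t  nh_len;   /* only meaningful when type == DSE_TOPO_DATA */
} dse_hdr_t;

/* Assumption: 16-bit fields are little-endian on the wire. */
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* buf starts at the multihop header ("1st byte"). */
bool parse_dse(const uint8_t *buf, size_t len, dse_hdr_t *out)
{
    if (len < MH_HDR_LEN + 6 || buf[MH_TYPE_OFF] != MH_TYPE_DSE)
        return false;
    const uint8_t *d = buf + MH_HDR_LEN;     /* byte 9 (1-based) */
    out->seq_no   = get_le16(d);             /* bytes 9,10 */
    out->src_addr = get_le16(d + 2);         /* bytes 11,12 */
    out->type     = d[4];                    /* byte 13 */
    out->pkt_len  = d[5];                    /* byte 14 */
    out->nh_len   = 0;
    if (out->type == DSE_TOPO_DATA) {
        if (len < MH_HDR_LEN + 7)
            return false;
        out->nh_len = d[6];                  /* byte 15 */
    }
    return true;
}
```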
ext_type = 1 for Beacon
ext_type = 2:
struct is: uint16_t node_id,
uint8_t egress quality
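A sketch for decoding ext_type = 2 records, assuming they are packed back to back as 3 bytes each (2-byte node_id, assumed little-endian, then the 1-byte egress quality) and that the caller passes a pointer to the first record. Decoding field by field avoids memcpy into the struct, whose size is typically 4 on a PC because of padding. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t node_id;
    uint8_t  egress_quality;
} nbr_entry_t;

#define NBR_ENTRY_WIRE_LEN 3   /* 2-byte node_id + 1-byte quality, no padding */

/* Decodes up to max records from p; returns how many were decoded.
 * A trailing partial record is ignored. */
size_t parse_nbr_entries(const uint8_t *p, size_t len, nbr_entry_t *out, size_t max)
{
    size_t n = 0;
    while (n < max && len >= NBR_ENTRY_WIRE_LEN) {
        out[n].node_id = (uint16_t)(p[0] | (p[1] << 8));
        out[n].egress_quality = p[2];
        p += NBR_ENTRY_WIRE_LEN;
        len -= NBR_ENTRY_WIRE_LEN;
        n++;
    }
    return n;
}
```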
sympathy-req:
>> src=0.0.0.2 dst=255.255.255.255 type=TOS[4,125] data_len=9 rssi=93 time=...
0000: 02 00 FF FF 01 00 06 03 03
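Applying the layout above to this dump: with data_len=9 and an 8-byte multihop header, byte 7 is 06 (SYMPATHY) and a single payload byte, 03, follows; what that byte means is not recorded here. A small classifier with hypothetical names that reproduces this reading:

```c
#include <stddef.h>
#include <stdint.h>

#define MH_HDR_LEN  8
#define MH_TYPE_OFF 6    /* 7th byte, 1-based */

typedef enum { PKT_SHORT, PKT_DSE, PKT_SYMPATHY, PKT_OTHER } pkt_kind_t;

/* Classifies a captured TOS[4,125] payload by the multihop type byte and
 * points *data at the application bytes after the 8-byte header. */
pkt_kind_t classify(const uint8_t *buf, size_t len,
                    const uint8_t **data, size_t *data_len)
{
    if (len < MH_HDR_LEN)
        return PKT_SHORT;
    *data = buf + MH_HDR_LEN;
    *data_len = len - MH_HDR_LEN;
    switch (buf[MH_TYPE_OFF]) {
    case 5:  return PKT_DSE;
    case 6:  return PKT_SYMPATHY;
    default: return PKT_OTHER;
    }
}
```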