For paper:
- explain iterations (epochs) used to ensure some state synchronization
- explain using time-awake to differentiate between reboot/rollover/stale packets
- failure scenarios:
  *** excessive congestion and 2-hop nodes result in insufficient data
      delivered at the sink - correlate unexpected events: when a node has a
      next-hop with a fault - mention that!
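
Rough sketch of the time-awake idea, to make the paper explanation concrete. Everything here is an assumption, not the existing code: the names (classify_packet, last_seen_t), the units (seconds), and the rule itself. The rule assumed is: if a node has been awake for less time than has passed at the sink since its last packet, it must have rebooted in between; if its time-awake did not advance, the packet is stale; if time-awake advanced but the sequence number went backwards, the counter rolled over.

```c
#include <stdint.h>

/* All names here are hypothetical; none come from the actual code base. */
typedef enum { PKT_FRESH, PKT_ROLLOVER, PKT_STALE, PKT_REBOOT } pkt_class_t;

typedef struct {
    uint16_t seq;      /* sequence number of the last accepted packet */
    uint32_t awake_s;  /* sender's time-awake carried in that packet (s) */
    uint32_t rx_s;     /* sink clock when that packet arrived (s) */
} last_seen_t;

/* Classify a newly received packet against the last accepted one. */
pkt_class_t classify_packet(const last_seen_t *last, uint16_t seq,
                            uint32_t awake_s, uint32_t rx_s)
{
    /* Assumes the sink clock is monotonic, so rx_s >= last->rx_s. */
    uint32_t elapsed = rx_s - last->rx_s;

    /* Awake for less time than we have been waiting: booted in between. */
    if (awake_s < elapsed)
        return PKT_REBOOT;
    /* Same boot, but sent no later than the packet we already have. */
    if (awake_s <= last->awake_s)
        return PKT_STALE;
    /* Same boot, sent later, yet the counter is lower: it wrapped. */
    if (seq < last->seq)
        return PKT_ROLLOVER;
    return PKT_FRESH;
}
```

Known weak spot of this sketch: a stale packet from a node that only just booted can look like a reboot, since both have a small time-awake.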

Use #times tried to tx as an additional indication of congestion.

Failure Scenarios:
- don't receive data from a node - but it's because we're not receiving data from its
  next-hop
- also nobody claims that node as a neighbor!

INITIAL TESTED:
- when jitter is disabled - detects congestion, and when it is enabled,
  congestion goes away (I think?)
  => add: no nodes claim node as a neighbor, as a fault?
     - results in a fault if nothing else

DONE - add to narrative/design file
too much congestion:
- check if multiple nodes sharing the same next-hop (i.e. check whether all
  nodes with the same next-hop) have lots of errors: then probably a
  congestion issue
- this is a fault - can correlate with not enough data?
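
The congestion check above could look roughly like this. It is a sketch under assumptions: node_report_t, ERR_THRESHOLD and next_hop_congested() are made-up names, the threshold value is arbitrary, and "lots of errors" is taken to mean a per-epoch tx error count above that threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node report collected at the sink for one epoch. */
typedef struct {
    uint16_t node_id;
    uint16_t next_hop;
    uint16_t tx_errors;  /* failed transmissions this epoch */
} node_report_t;

#define ERR_THRESHOLD 10u  /* assumed value, not tuned */

/* True if at least two nodes share next_hop and every one of them
 * reports more than ERR_THRESHOLD errors. */
bool next_hop_congested(const node_report_t *r, size_t n, uint16_t next_hop)
{
    size_t sharing = 0, erroring = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].next_hop != next_hop)
            continue;
        sharing++;
        if (r[i].tx_errors > ERR_THRESHOLD)
            erroring++;
    }
    return sharing >= 2 && erroring == sharing;
}
```

Requiring all sharers to show errors keeps a single bad link from being reported as congestion.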

Faults detected:
- faults can be detected from 1-hop away
  * too much congestion
- faults can be detected from anywhere
  * all others

add conditions: too much congestion, route-flapping
asynch notification of - route-flapping

Testing Plan:
Node rebooting
Errors induced by Packet-Loss:
INSUFF DATA: Excessive packet loss due to channel/hidden terminal/unsynchronized
  radios - this is addressed by the first option (telling a single node to
  drop x% of packets from another node).

** NO DATA: node doesn't have a next-hop so it can't send data!! But neighbors have heard
  it so we know it exists!
NO DATA: A node that doesn't have data to send - this should be done at the
  sender: have the sender not send data out (this could happen at the comm
  layer, or even at the application layer - may be more interesting at the
  application layer - that way we could have the situation where one type of
  packets is being sent - but another type is NOT being sent).

NODE DIE: A node that has died - this should also be done at the sender: have the
  sender not receive/transmit any packets.

- don't care much about point failures - care if a failure
  is persistent - how do we track this??
For tests:
add asynchronous events:
- too many errors all with the same next hop
- routing a pkt for a node that's not a neighbor
- route flapping

if haven't heard from a node then:
- see how long it takes if we wait for nodes to drop it off their neighbor lists
- flood request to get info on node (nodes that have routed pkts for it, the
  node itself, and neighbors should respond with:
  num-pkts routed for node, last time it heard a pkt from node - involves
  tos-code that snoops the lowest layer and understands multihop packets as
  well as app-layer pkts to determine original source)
- if no nodes report it as a neighbor then record that!

command-logistics:
add ability to trigger a check to status-device
ability for user-interactive use: "ping" a node - basically send a request for
  metrics! command: ping=<node_id>

Trigger self-tests: each node should determine if it's getting what it needs
  for each module (i.e. ts pkts for ts, beacon pkts for routing,
  dse-queries for application layer) - each module can export an
  interface: trigger test and respond with yes/no.
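
One possible shape for the per-module self-test interface, written as plain C rather than nesC. All names (module_test_t, run_self_tests, the *_seen counters) are hypothetical, and "getting what it needs" is reduced here to "saw at least one packet of the expected kind this epoch", which is only an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each module exports one yes/no test. */
typedef bool (*self_test_fn)(void);

typedef struct {
    const char  *name;
    self_test_fn test;
} module_test_t;

/* Hypothetical per-module counters, reset at the start of each epoch. */
unsigned ts_pkts_seen, beacon_pkts_seen, dse_queries_seen;

static bool ts_self_test(void)      { return ts_pkts_seen > 0; }
static bool routing_self_test(void) { return beacon_pkts_seen > 0; }
static bool app_self_test(void)     { return dse_queries_seen > 0; }

const module_test_t MODULES[] = {
    { "ts",          ts_self_test },
    { "routing",     routing_self_test },
    { "application", app_self_test },
};
#define NUM_MODULES (sizeof MODULES / sizeof MODULES[0])

/* Run every test; store the names of failing modules in failed[]
 * (which must have room for n entries) and return how many failed. */
size_t run_self_tests(const module_test_t *mods, size_t n,
                      const char **failed)
{
    size_t nfail = 0;

    for (size_t i = 0; i < n; i++) {
        if (!mods[i].test())
            failed[nfail++] = mods[i].name;
    }
    return nfail;
}
```

A triggered check would call run_self_tests(MODULES, NUM_MODULES, failed) and report the failing names back to the sink.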

record children (you know implicitly) as well as parent (next-hop)

Add set_next_hop() primitive that checks if data-loss is due to next-hop
  selection - only do this if all nodes using a next-hop are losing
  data
- check more info on surrounding nodes using the same next-hop

Later Design
upon reboot - save last 5 events, and line number of segfault,
  and number of segfaults
- flood help message after segfaulting > 1 time, with events

Experiments
-----------
- ping-quality:
  are pings an accurate representation of link-quality - couldn't there be
  a link that is allowing pings through but not larger data-packets?
  add a test where I compare the 'ground-truth' with measured link-quality to
  number of packets making it through.
- Are there situations where a sink does not get data from a node, but its neighbors
  can still talk to it? What are these situations?

Abstract Issues
- Don't make thresholding static (number of msgs needed before declaring
  a problem) - instead, measure the average interval at which you are
  hearing from all nodes, and if one node is some sigma away from this then
  it is a problem. Or if a node is irregular.
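
A minimal sketch of the non-static threshold, assuming the sink keeps one "seconds since last heard" value per node. The function name flag_quiet_nodes and the parameter k (how many sigma) are made up for illustration. Squared deviations are compared so no sqrt() is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* gap_s[i]: seconds since node i was last heard (assumed metric).
 * Sets flagged[i] for every node whose gap is more than k standard
 * deviations above the mean gap; returns the number flagged. */
size_t flag_quiet_nodes(const double *gap_s, size_t n, double k,
                        bool *flagged)
{
    if (n == 0)
        return 0;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += gap_s[i];
    double mean = sum / (double)n;

    double var = 0.0;  /* population variance */
    for (size_t i = 0; i < n; i++) {
        double d = gap_s[i] - mean;
        var += d * d;
    }
    var /= (double)n;

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double d = gap_s[i] - mean;
        /* d > k*sigma, written without sqrt: d > 0 and d^2 > k^2 * var */
        flagged[i] = (d > 0.0) && (d * d > k * k * var);
        if (flagged[i])
            count++;
    }
    return count;
}
```

Only nodes that are quieter than average are flagged; a node heard from unusually often is ignored here. Catching "irregular" nodes would need a per-node history instead of a single value.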

- Figure out a way to determine if the number of beacons corresponds with
  actual long data packets that are getting through.

What I know:
------------
mh-hdr = 8B
when sink calls flood, the nodes' NodeI.recvFlood interface gets called -
  len of data is len of data sent (so nodes don't get the multihop_hdr)

7th byte = type (5: DSE, 6: SYMPATHY)
mh data starts on the 9th byte (bytes are numbered from 1)
packets passed up to dse/sympathy are TOS type 4 (TOS[4,125]),
  beacons are TOS type 1, and the debugging packets sent by dse are TOS type 2

DSE packets:
byte 9,10 = seqNo
byte 11,12 = srcAddr
byte 13 = type (1: TOPO_DATA)
byte 14 = pkt-length
if TOPO:
  byte 15 = len of next-hop data

ext_type = 1 for Beacon
ext_type = 2:
  struct is: uint16_t node_id,
             uint8_t egress quality
sympathy-req:
>> src=0.0.0.2 dst=255.255.255.255 type=TOS[4,125] data_len=9 rssi=93 time=...
0000: 02 00 FF FF 01 00 06 03 03
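
The byte offsets noted above can be turned into a small parser. This is a sketch, not the real code: the names (mh_type, parse_dse, dse_hdr_t) are invented, and little-endian byte order for the 16-bit fields is an assumption that still needs checking against a real DSE capture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MH_HDR_LEN       = 8,  /* mh-hdr = 8B */
    MH_TYPE_OFFSET   = 6,  /* 7th byte, zero-based index 6 */
    MH_TYPE_DSE      = 5,
    MH_TYPE_SYMPATHY = 6,
    DSE_TOPO_DATA    = 1
};

typedef struct {
    uint16_t seq_no;    /* bytes 9,10 */
    uint16_t src_addr;  /* bytes 11,12 */
    uint8_t  type;      /* byte 13 */
    uint8_t  pkt_len;   /* byte 14 */
} dse_hdr_t;

/* Returns the multihop type byte, or -1 if the packet is too short. */
int mh_type(const uint8_t *pkt, size_t len)
{
    return len >= MH_HDR_LEN ? pkt[MH_TYPE_OFFSET] : -1;
}

/* ASSUMPTION: 16-bit fields are little-endian. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fills *out and returns true only for a DSE packet long enough to
 * hold the fixed DSE fields (bytes 9 through 14). */
bool parse_dse(const uint8_t *pkt, size_t len, dse_hdr_t *out)
{
    if (len < MH_HDR_LEN + 6 || pkt[MH_TYPE_OFFSET] != MH_TYPE_DSE)
        return false;
    out->seq_no   = rd16le(pkt + 8);
    out->src_addr = rd16le(pkt + 10);
    out->type     = pkt[12];
    out->pkt_len  = pkt[13];
    return true;
}
```

On the sympathy-req dump above, mh_type() gives 6 (SYMPATHY), matching the 7th byte, and parse_dse() rejects the packet because it is not DSE.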