Another very interesting day at CoNEXT. Van Jacobson presented his extremely interesting CCN work (definitely the highlight of the conference – maybe this will be known as the “CCN conference” in the future... :). Besides that, other very interesting talks too.
Keynote Presentation by the 2009 ACM Sigcomm Rising Star Ratul Mahajan (Microsoft Research, USA)
The title of the talk was “How to build research network systems in your spare time”.
The goal: articulate a research method for building network systems.
0. Pick domain (be wary of the hot trends);
1. Know the problem well (papers alone are rarely a good source – scrutinise, measure, survey);
2. Debate several solution ideas (read more papers for solutions – from other areas for instance – than for problems);
3. Start small and then embellish (to evaluate, see if it works, when and why it works, and what are the benefits and costs);
4. Make it real (release data and code, deploy, talk to practitioners).
Networking Named Content
V. Jacobson, D. Smetters, J. Thornton, M. Plass, N. Briggs, R. Braynard (Palo Alto Research Center)
For 150 years, communication has meant a wire connecting two devices. For users, the web forever changed that; content matters, not the hosts it came from.
Design criteria 1: IP runs over anything (that moves bits to some destination), anything runs over IP. CCN runs over anything (that moves bits in space and time), anything runs over CCN. But you don’t need any entity in layer two. You just talk. There is no state to establish. Notion of communication generalises.
Design criteria 2: an evolutionary path: a) incremental deployment gives immediate value, b) painless, viral, bottom up growth, c) use existing infrastructure, including its lessons.
Design criteria 3: security (communication = trusted info). Trust has to be a property of the info, not the path it comes over or the container it’s in. Any entity should be able to assess the integrity, relevance and provenance of any piece of info.
Basic idea: 1) consumer announces an interest over any and all available communications links (an http get kind of thing). 2) Interest identifies a collection of data – all data items whose name has the interest as a prefix. 3) Anything that hears the interest and has an element of the data sends it.
Answer to some “Buts”: “This doesn’t handle conversations or realtime” – yes it does – see Rearch voccn paper yesterday.
“this is just google” – this is IP for content. They don’t search for data, they route it.
“this will never scale” – hierarchically structured names give same log(n) scaling as IP but CCN tables can be much smaller since multi-source model allows inexact state (e.g., bloom filter) – tradeoff: a little bit of bandwidth can save state.
Measurements: on an “apples for apples” comparison, rise time is worse than normal TCP, and achiever lower throughput because there are more headers for signatures (but not that bad). But, on shared content distribution, that’s where CCN wins very clearly.
Other cool stuff: 1) names can be context sensitive and active. 2) Robust, simple single point configuration (you are not talking to something). 3) Security model makes infrastructure easy to protect and hard to attack. 4) Unifies communication and storage (files and wires are really the same thing).
Open source, GPL’d, CCN stack and VoCCN linphone at www.ccnx.org.
A comment from the audience resumes the expectations this work is generating: “It has excited me more than anything I have heard in the past 10 years”.
Virtually Eliminating Router Bugs
E. Keller, M. Yu (Princeton University), M. Caesar (University of Illinois at Urbana-Champaign), J. Rexford (Princeton University)
Routers have complex software, so they have bugs (XORP has 826k lines... CISCO routers have millions of lines).
Main question: how to detect and stop the problem caused by the bug before it spreads?
Answer: Run multiple, diverse routing instances (software and data diversity ensures correctness - SDD).
SDD challenges in routers: 1) making replication transparent to neighbour routers (they can be normal routers), 2) handling transient real time nature of routers (react quickly to network events, but do not over react).
SDD opportunities: 1) easy to vote on standardised output, 2) easy to recover from errors via bootstraps (routers depend very little on history), 3) diversity is effective in avoiding router bugs.
Hence, outline of the talk: 1) exploiting SDD, 2) bug tolerant router architecture, 3) prototype and evaluation.
Why does diversity works? 1) There is enough diversity in routers (software: quagga, xorp, etc.), protocols (ospf, is-is), 2) enough resources for diversity (extra processor blades for hardware reliability, multi-core processors), 3) effective in avoiding bugs
Evaluate diversity effect: most bugs can be avoided by diversity. 88% of the bugs avoided in their experiment.
Effect of software diversity: by picking 10 bugs from xorp and quagga they realised that none were present in the other implementation. Also, static code analysis on version diversity: some bugs from one version are corrected, others added, but in the end they don’t overlap that much.
Bug tolerant router: protocol routing instances running in parallel. Then when an update arrives, there is a voter that votes on the results, generating a single setup to the forwarding table. So it is transparent model (it intercepts system calls only).
How to do the voting (two shown, others in paper): 1) master slave approach (speed reaction time, but doesn’t handle transient behaviour well) and 2) continuous majority (handles transience better).
Voting code is small (500 lines of code)
Prototype and evaluation: 1) no modification of router software, simple hypervisor, built on linux with xorp and quagga. 2) Evaluated using BGP trace from Router Views. 3) Evaluation checking delay introduced.
Test with 3 XORP and 3 quagga routing instances. Single router: 0.066% fault rate. Master slave (reduces to 0.0006%, with delay of 0.02 seconds) and with continuous majority reduces even more, with a bit higher delay.
Small overhead: Delay overhead of hypervisor: 0. 1% (0.06sec). Overhead of 5 routing instances (4.6%).
More info on http://verb.cs.princeton.edu/.
MDCube: A High Performance Network Structure for Modular Data Center Interconnection
H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang (Microsoft Research Asia)
Mega data centre: millions of servers. Each container has thousands of servers, and they want to connect hundreds of these.
Design goals: 1) high bandwidth for data intensive computing, 2) cost effective, 3) addressable cabling complexity
Existing approaches: 1) tree (oversubscribed, bottlenecked root), 2) fat tree (multiple rooted tree, costly as large number of switches)
MDCube design: leverage the link rate hierarchy of Ethernet switches (1GBps for server switch port, and 10gbps to 40gbps for switch uplink). Use switch uplinks for intercontainer interconnection.
Construction of MDCube: 1) treat each container as a “virtual node”, 2) treat switches high speed uplink interfaces as “virtual interfaces”, 3) connecting switches to peer switches directly, 4) virtual nodes form a modular data centre cube: MDCube. Inner container structure: BCube (presented at sigcomm).
With 48x1G+4x10G they can support 5M+ servers...
Properties of MDCube: 1) small network diameter,2) high capacity between servers, 3) large ABT (aggregate bottleneck throughput), 4) graceful incremental deployment.
MDCube routing: 1) hierarchical routing (divide and conquer), topology is known; 2) loosely controlled source routing with load balancing and fault tolerance. Detour routing: similar to Valiant Load Balance. In VLB, a random node is selected. But they choose to detour only at container level. Fault tolerant routing: hierarchical routing approach (source node only chose which container to route, and path inside container is maintained by BCube).
Greening the Internet with Nano Data Centers
V. Valancius (Georgia Institute of Technology), N. Laoutaris (Telefonica Research), L. Massoulie, C. Diot (Thomson), P. Rodriguez (Telefonica Research)
Data centres at the moment host applications. Problems: expensive to build, expensive to run. Far from user. Lots of redundancy for robustness. Power usage: use lots of power, for example for cooling, and underused resources.
Nano datacenters (NaDa): some content servers (not as many as today), and a nano data centre network (p2p), with gateways. Benefits: ISP friendly (reduce load on backbone), self scalable (adding a new user also joins in a guy to supply content), saves energy (reusing baseline power, no need for cooling and redundancy, reduce energy consumed in network).
Challenges: limited uplink bandwidth, scale, incentives for users, data privacy, cost of gateway.
Focus of this talk on how to save energy.
Proportional Energy use in NaDa: currently servers have a not very nice power usage profile (big baseline power). With NaDa, using dedicated home gateways for NaDa service, would be even less efficient then servers. But we can amortise the baseline energy using gateways only when they are active for something else. So power consumption is proportional to load.
Evaluation: 1) energy measurements of content streaming servers, residential gateways, and DSLAMs. 2) Content access traces (from commercial IPTV provider, youtube, netflix), 3) gateway availability (lower and upper bounds from measurements).
Event driven simulation, with various parameters: 1) upstream bw, 2) storage, 3) video streaming rate, 4) number of NaDa users, 5) network distance, 6) data centre cooling and energy distribution losses, 7) residential energy distribution losses.
Measurements: server vs gateway. Carrier grade IPTV streaming server – server energy use 211 joules /gbit; 2) Thomson residential DSLM gateways: DSLAM power use unaffected, gateway energy use: 100 joules /gbit.
Most important variable for NaDa: bandwidth. Uptstream rate (UR) and video rate (VR). When UR>VR everything fine: each gateway can serve more than one user, and increased replicas used. If upstream bandwidth is increased we can get more energy efficiency in networks.
SWARM: The Power of Structure in Community Wireless Mesh Networks
S. Das (Qualcomm), K. Papagiannaki (Intel Labs), S. Banerjee (University of Wisconsin at Madison), Y. Tay (National University of Singapore)
The work is aimed at community wireless networks (meraki, roofnet, CUWIN), where we have wireless forwarding nodes (mesh routers) and few gateway nodes with network connectivity. The clients connect to a mesh router for access to a gateway.
Observations: 1) mostly single radio networks (low cost, off the shelf); 2) traffic patterns mainly between gateways and mesh routers; 3) this is a static/stable network and the objective is to find gateways to access internet, so using normal ad-hoc techniques is not good.
Problems with current approaches: 1) do not optimise network topology for the intended use of the network; 2) interference not taken into account; 3) requires operation on same channel.
Motivations for SWARM: 1) improve performance by organising the network better; 2) gateway trying to minimise interference; 3) nodes assigned to gateways for good performance given a certain network scenario.
Their scheme involves 3 phases. Phase 1: enumeration (iterate through all possible assignments of MRs to gateways) and pruning (for disconnected or imbalanced configurations). Phase 2: Construct a tree (simple interference aware model). Phase 3: channel assignment (extension of MAX-chop algorithm for WLANs, mobicom 2006): choose channel hoping sequence that minimises conflicts.
Evaluation by simulation in qualnet and also a deployment on a testbed shows that SWARM increases throughput significantly.
EZ-Flow: Removing Turbulence in IEEE 802.11 Wireless Mesh Networks without Message Passing
A. Aziz (EPFL), D. Starobinski (Boston University), P. Thiran, A. Fawal (EPFL)
Multi-hop scenario: greedy sources, with no source rate limiting. The intermediate nodes are relays, and a single channel with CSMA/CA, and with turbulence (instability – this problem occurs in a stable wireless network larger than 3 nodes).
Design goal: network stabilisation (delay reduction), unmodified MAC layer, backward compatibility.
EZ flow: analogy to road traffic problem (same as a solution in New York, to generate a “flow of green lights”: EZ-pass). It is useless to send more packets to a heavily loaded successor.
EZ-flow lies just above the MAC layer. It dedicates one MAC queue per successor (max. 4). It is composed of 2 modules: Buffer Occupancy Estimator (deriving successor buffer) and Channel Access Adaptation (adapts the cw accordingly).
BOE module: works passively without any message passing. This is the main advantage of the scheme. Also, no header modification (hence, runs with current hardware).
How to derive buffer occupancy implicitly: 1) keep in memory a list of identifiers of recently sent packet. 2) use broadcast nature of the medium to monitor the forwarded packet. 3) obtain (precisely) the current buffer occupancy at successor.
Why is explicit messaging less precise? Packet header can be modified just before going to the MAC, and all this adds queuing delay, so the info can be out of date (that doesn’t occur in the passive option).
CAA module: 1) adapt channel access probability my modifying cw. 2) decision based on successor buffer size, derived by BOE, plus some thresholds.
Nice results in reducing delay and in stabilising the network.
The Age of Impatience: Optimal Replication Schemes for Opportunistic Networks
J. Reich (Columbia University), A. Chaintreau (Thomson)
Infrastructure-based content dissemination drawbacks: expensive, not always available, capacity can’t keep up with demand.
Opportunistic content dissemination: each meeting of nodes is a chance to transfer data. Potential: huge pool of untapped bandwidth; doesn’t require infrastructure. But: contacts unpredictable, may be much slower, limited local memory and energy buffer
Problems: 1) how to chose what content should be placed where (in which peer) to satisfy impatient users? 2) How to attain a desired allocation with an opportunistic p2p protocol?
Contributions of this paper: 1) they show what is the optimal cache allocation for impatient users (analytic model), and 2) how to apply these insights in a distributed scheme?
Impatience: increasing wait, decrease utility. Their objective function is the aggregate expected delay utility (U). They choose an allocation (X) to maximise the objective – this minimises losses due to impatience.
Main analytic results: 1) objective U is submodular in X. 2) In a homogenous network U is concave.
Their distributed system QCR is demand/cache reactive: depending on the impatience factor (can the user wait for, say, one hour for the service?) they defines parameters like cache size.
Feasibility of content dissemination between devices in moving vehicles
T. Zahn, G. O'Shea, A. Rowstron (Microsoft Research Cambridge)
Can we use opportunistic WiFi communication to disseminate content between PND (Personal Navigation Devide) in vehicles (assuming no static infrastructure, AP or cellular)? Most prior work involved infrastructure.
Challenges for vehicle to vehicle communication: 1) initial density low (thus exploit every encounter), 2) short connections (e.g. passing side-street), 3) variable link quality.
Why PNDs? 1) Lots of them around, good storage capacity; 2) not just road maps (also gas stations or restaurants info, other services, etc.).
How to update data? 1) USB cables; 2) built-in GSM/3G (still expensive); 3) bluetooth to cell phone (suitable contract and data plan needed – some cost); 4) FM radio – low bit rate. So: what about WiFi?
Design: 1) assume a global set of versioned files; 2) when 2 devices meet a) rapid device detection over ad-hoc network, b) find a common file with different versions, c) send data using light weight FTP.
Phase 1: rapid peer detection – join or form 802.11 ad-hoc network (fixed channel).
Phase 2: discovery protocol: use of bloom filters to a) hash some of its filenames, and also b) hash filename and versions. Then send this message to neighbour. Receiver does the same, and sends back similar message. Then make the transfers of the new versions of files between them.
Phase 3: transport protocol. Block oriented transfer of files of fixed max size. Ack is a bit vector.
Evaluation with their raw protocol and TCP/IP. They can transmit significant amounts of data even for vehicles moving at considerable speed.
Cool-Tether: Energy Efficient On-the-fly WiFi Hot-spots using Mobile Phones
A. Sharma (University of California, Santa Barbara), V. Navda, R. Ramjee, V. Padmanabhan (Microsoft Research India), E. Belding (University of California, Santa Barbara)
There are more mobile users than broadband users: so it is probably better to provide internet connectivity using mobile phones.
Motivation: 1) provide ubiquitous connectivity to devices, 2) leverage smartphones as internet gateways (considering battery life), 3) multiple phones around you at home, work, or on the move.
Other tethering mechanisms: 1) via usb (not possible to use multiple phones), 2) WiFi adhoc mode (no power save mode), 3) via Bluetooth (short range, low data rates, hence high energy/bit cost)
Design goals: 1) support multiple gateways, 2) reduce energy consumption (optimise LAN and WAN).
Incremental cost of sending more data is negligible. Hence, sending at highest possible rate is energy efficient. Also, activating an interface cost a lot in terms of energy, so it is probably not good to use many mobile phones at the same time.
Energy aware design: 1) keep WAN pipe full whenever in use. For this they aggregate web requests in a gatherer present on the internet (the cloud deals with all the requests a single web page request needs to make, since one page has many links inside requiring several requests). This way we transform the short bursts into a single high burst), 2) use optimal number of number of phone (they’ve done some analytical work to find the optimal number of mobile phones), 3) use reverse mode WiFi: laptop is the AP, mobile phones the clients.
Based on 1) (keep WAN pipe full) they get very nice savings in energy, and also reduce latency. They compare their results with COMBINE, and they achieve 38% to 70% savings compared with this scheme.