What's on IPTV's neighbourhood?
Tuesday, 15 December 2009
Saturday, 5 December 2009
CoNEXT 1 - SIGCOMM 0
Thursday, 3 December 2009
CoNEXT, Day 3
Again, some cool and useful papers, and a nice panel at the end of the day moderated by Paul Mockapetris (the inventor of DNS). Other info: the next CoNEXT will take place in Philadelphia, from Nov. 30 to Dec. 3, 2010.
Content Availability in Swarming Systems: Models, Measurements and Bundling Implications
D. Menasche (University of Massachusetts at Amherst), A. Rocha (Universidade Federal do Rio de Janeiro), B. Li (Tsinghua University), D. Towsley, A. Venkataramani (University of Massachusetts at Amherst)
This paper received the best paper award.
BitTorrent: a huge success because of its self-scalability property. However, there is a problem: the availability of less popular content.
Swarm: set of peers interested in the same content. Publisher (seed): interested in dissemination. Peers (leechers): interested in download.
Their setup: monitors on around 100 PlanetLab nodes; 45 thousand swarms monitored for 8 months.
Finding 1: content unavailability is severe; 40% of swarms unavailable 50% of the time. Swarms are more available during the first month after they are published.
Finding 2: bundling is common (a single swarm consisting of multiple files – songs in an album, episodes of a TV show). In music, around 75% are bundles.
Finding 3: bundled content is more available. Among all books, for instance, 62% of swarms had no seeds, but for collections of books the figure drops to 36%. “Friends” episodes: 52 swarms, 23 with at least one seed (21 of them bundles) and 29 with no seeds (only 7 of them bundles).
For the availability model, they ask the question: “How long is the content available without a publisher?” They use an M/G/inf queuing model for this.
Other questions the model tries to answer: What is the fraction of time in which content is unavailable? What is the mean download time of peers?
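To make the flavour of that model concrete, here is a tiny simulation sketch in Python (my own illustration, not the authors' code, and the parameter values are arbitrary): peers arrive as a Poisson process and hold the content for a random time, and we measure the fraction of time in which nobody holds it. For an M/G/inf queue this should match exp(-arrival_rate * mean_holding_time).

import random, math

def idle_fraction(lam=2.0, mean_stay=1.0, horizon=10000.0):
    # Poisson arrivals at rate lam; each peer holds the content for an
    # exponential time with mean mean_stay (any distribution works for M/G/inf).
    t, intervals = 0.0, []
    while True:
        t += random.expovariate(lam)
        if t > horizon:
            break
        intervals.append((t, t + random.expovariate(1.0 / mean_stay)))
    # Fraction of [0, horizon] not covered by any peer's holding interval.
    intervals.sort()
    covered, end = 0.0, 0.0
    for a, b in intervals:
        a, b = max(a, end), min(b, horizon)
        if b > a:
            covered += b - a
        end = max(end, b)
    return 1.0 - covered / horizon

print(idle_fraction(), math.exp(-2.0 * 1.0))   # simulated vs. analytical unavailability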
Questions for the validation part: 1) can bundling reduce download time? 2) What is the impact of bundling if different files have different popularities?
Answers: 1) bundling reduces download time (idle waiting plus active download time), but only up to a certain point; the model captures this trend. 2) Bundling reduces download time for unpopular content at the expense of popular content.
FairTorrent: Bringing Fairness to Peer-to-Peer Systems
A. Sherman, J. Nieh, C. Stein (Columbia University)
Problem: 1) free riders and low contributing peers consume bandwidth and cause slower downloads, and 2) high contributors lose. Not fair.
Inherent flaw of existing schemes: they are all based on rate allocation, which is inherently imprecise and performs poorly in realistic scenarios. What if we don’t use rate allocation?
Idea: the FairTorrent algorithm. 1) For leechers: each leecher maintains a deficit counter with each other leecher, and always sends data to the leecher with the lowest deficit. The effect is fast rate convergence. 2) For seeds: pick the next leecher using round robin, which evenly splits bandwidth among leechers.
Properties: 1) fast convergence of upload/download rates, 2) resilience, 3) fully distributed, 4) requires no estimates of peers’ intended rate allocations.
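A minimal sketch of the leecher-side deficit counter (my own illustration; the block size and the send_block transport hook are assumptions, and a real client would of course integrate this with the BitTorrent wire protocol):

class FairTorrentLeecher:
    def __init__(self, peers, block_size=16 * 1024):
        self.block_size = block_size
        # deficit[p] = bytes sent to p minus bytes received from p
        self.deficit = {p: 0 for p in peers}

    def on_receive(self, peer, nbytes):
        self.deficit[peer] -= nbytes

    def pick_next_peer(self):
        # Always serve the peer to whom we "owe" the most data,
        # i.e. the one with the lowest deficit counter.
        return min(self.deficit, key=self.deficit.get)

    def send_one_block(self, send_block):
        peer = self.pick_next_peer()
        send_block(peer, self.block_size)   # abstract transport call
        self.deficit[peer] += self.block_size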
Results: Really fast convergence compared to other systems. Download times also better.
More info at http://www.cs.columbia.edu/~asherman/fairtorrent
Scaling All-Pairs Overlay Routing
D. Sontag, Y. Zhang (MIT), A. Phanishayee, D. Andersen (Carnegie Mellon University), D. Karger (MIT)
The goal: find the best routes in full-mesh overlay networks. Motivation: rapidly route around failures (resilient overlay networks) or find lower latency (Skype).
Prior approaches to scaling routing: choose from a few alternate paths (by eliminating the high latency ones for instance). They show how to find optimal routes in a more scalable manner.
Scaling to large overlay networks: where is the bottleneck? 1) Each node probes all others, so 2) per-node communication scales linearly. Link-state routing sends info to all nodes: quadratic scaling.
They present an algorithm that finds all optimal routes with only n·sqrt(n) per-node communication. They prove in the paper that their algorithm is theoretically the best possible.
Their approach to one-hop routes: send link state to a subset of nodes only, not to all. These are the rendezvous nodes (RNs). Each RN then calculates the best routes to all the destinations it received info for, and sends these recommendations back to the same nodes that sent it their link state. So, with fewer rendezvous servers, you have less communication.
How to assign rendezvous servers? 1) every two nodes should have a rendezvous server in common. 2) They use quorum systems to provide this property (they use grid quorum). Each node has 2sqrt(n) rendezvous servers.
Summary: round 1) size n link state tables sent to 2sqrt(n) nodes; round 2) size 2sqrt(n) recommendation tables to 2sqrt(n) nodes.
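A rough sketch of how a grid quorum gives every two nodes a common rendezvous node (my own illustration, assuming for simplicity that n is a perfect square):

import math

def rendezvous_set(node_id, n):
    # Place the n nodes on a sqrt(n) x sqrt(n) grid; a node's rendezvous set
    # is its row plus its column (about 2*sqrt(n) nodes), so any two nodes
    # share at least two rendezvous nodes (row of one meets column of the other).
    side = int(math.isqrt(n))
    r, c = divmod(node_id, side)
    row = {r * side + j for j in range(side)}
    col = {i * side + c for i in range(side)}
    return row | col

n = 16
a, b = rendezvous_set(3, n), rendezvous_set(10, n)
print(sorted(a & b))    # the rendezvous nodes that 3 and 10 have in common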
Strategies to deal with rendezvous node failures: built-in redundancy (every pair of nodes has 2 rendezvous nodes in common).
Multi-hop algorithm: repeat the one-hop algorithm using modified link costs.
They applied the algorithm to find optimal one-hop routes in a Resilient Overlay Network (RON). Key question: how do rendezvous failures affect bandwidth requirements? They deployed this on PlanetLab with 140 nodes for 2 hours and observed link failures: on average, 50% of nodes experience over 8 concurrent link failures. Double rendezvous failures are infrequent.
StarClique: Providing Privacy Guarantees in Social Networks using Graph Augmentation
K. Puttaswamy, A. Sala, B. Zhao (University of California, Santa Barbara)
Social apps are growing. You share valuable info with friends (recommendations, social bookmarking). Example: a social web caching app. By sharing a web cache with friends, you improve performance, and it also works as a collaborative phishing filter.
The major concern of users of these apps is privacy. 1) Friends receive data, but if a friend’s computer is compromised, the info can be used for other purposes; 2) botnets are targeting social network credentials; 3) effective spamming via social engineering; 4) the rise of effective phishing attacks.
Naive defense: remove the user identity from the shared data. But it is still possible to mount a social intersection attack: several attackers intersect their views of the social network topology. Properties of the attack: 1) it needs 2 or more attackers, 2) it is application independent, 3) it works within the application’s bounds.
They measured the attack’s effectiveness (on Facebook), and realised that more than 70% of the users can be exactly identified by other users... so the attack can be really effective.
Now, the solution to defend against this attack. 1) Intuition: increase the number of common friends (more common friends lead to better privacy); 2) adapt the k-anonymity approach; 3) anonymise the social graph to achieve k-anonymity.
Cost of solution: users receive additional data, and app servers process and send additional data
Goals of their system model: minimise number of latent edges added, and keep the latent edges close in social distance. They evaluated their system in 6 different traces from previous studies (facebook, flickr, etc.).
Lockr: Better Privacy for Social Networks
A. Tootoonchian (University of Toronto), S. Saroiu (Microsoft Research), Y. Ganjali (University of Toronto), A. Wolman (Microsoft Research)
The state of privacy on Online Social Networks (OSNs): 1) no control over what Facebook does with my social info; 2) users need to duplicate their social info; 3) users must register to see friends’ content; 4) privacy terms can be contradictory, confusing, and make some users very nervous. The sense of privacy is really not very good.
Lockr’s goal: today users have no control of their social info. They want to put the user in control of their info. How? By using social attestation. You can say that a certain user is your “friend”, or “colleague”, or “drinking buddy”. The social network is created by passing out attestations. You can store your content anywhere, but you set a social Access Control List when you upload this stuff. Everything revolves around you. Also, there is an attestation verification protocol (for you to download info from a user).
Lockr works with any OSN. It also works with P2P systems, for instance.
Privacy benefits: 1) users form one social network using social attestation; 2) they continue sharing content via the normal OSN or P2P mechanisms.
What privacy problems are solved by Lockr?
1) “Users have no control over their social info”: with Lockr, users can choose where to store their attestations.
2) “Users must register to see friends’ content”: Lockr removes this; users just show their attestation (the only requirement is mapping an ID to a public key).
3) “Social deadlock” in P2P (each user wants to verify the other’s attestation first – who goes first?): in Lockr you also have a relationship key (all your friends hold your “friends” relationship key).
4) “Hard to revoke attestations”: in Lockr attestations have expiration dates. “Expired attestations still have valid relationship keys”: a) rotate relationship keys daily until the attestation expires – but that means lots of keys, so b) they use one-way hash chains.
5) “OSN sites can now ‘sell’ attestations”: they use zero-knowledge proofs for attestation verification. You keep your attestation’s signature (never sharing it), but you can prove to anyone that you hold it without sending it.
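To illustrate the one-way hash chain trick for daily relationship keys, here is a minimal sketch (a standard hash-chain construction, not necessarily the exact scheme used in Lockr): the key for day t is obtained by hashing a secret seed a fixed number of times, so handing a friend the key for their expiry day lets them derive all earlier keys but none of the later ones.

import hashlib, os

def day_key(seed, last_day, day):
    # key(day) = H^(last_day - day)(seed); fewer hash steps for later days.
    assert 0 <= day <= last_day
    k = seed
    for _ in range(last_day - day):
        k = hashlib.sha256(k).digest()
    return k

seed = os.urandom(32)                              # secret for the final day of the chain
k_expiry = day_key(seed, last_day=365, day=90)     # handed out with a 90-day attestation
# The friend can derive day 89, 88, ... from k_expiry, but not day 91:
assert hashlib.sha256(k_expiry).digest() == day_key(seed, 365, 89)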
Is Lockr a real system? Yep - they implemented Lockr Center as a Facebook app, and added Lockr support to BitTorrent and Flickr.
Code: http://www.lockr.org
ARES: An Anti-jamming REinforcement System for 802.11 Networks
K. Pelechrinis, I. Broustis, S. Krishnamurthy (University of California, Riverside), C. Gkantsidis (Microsoft Research, Cambridge)
Launching DoS attacks on WiFi networks is easy. Jammers can disrupt communication. How to deal with jammers? Frequency hopping was a possibility, but it is weak (WiOpt 2009: only 4 jammers can block the entire 5 GHz spectrum).
So, how to alleviate this without frequency hopping?
Main contributions: 1) fixed rates are often preferable in the presence of a jammer; 2) clear channel assessment (CCA) is important.
1) Interaction between random jamming and rate control. Jamming effects last beyond the jamming period when rate control is used: in the presence of an intermittent jammer, the rate control algorithm might be slow to converge to the optimal rate. Remedy: fixed rate assignments immediately increase throughput, so it is important to decide when to perform rate adaptation. This can be done analytically, and they show that fixed rates provide high throughput gains. However, perfect knowledge is not realistic, so they include a Markovian Rate Control (MRC) that does not require knowledge of any of the parameters needed for the analytical solution. It can be tuned to be close to optimal.
2) Rate control removes transient jamming effects. What about constant jamming effects? Solution: power control. Power adaptation helps only when transmitter is not in the jammer’s range, and low transmission rates are used. Increasing power helps when transmission rate is low.
How to deal with high-power jammers (compared to a legitimate wireless card)? The transmitter needs to ignore jamming signals, and the receiver must decode legitimate packets. Tuning the CCA helps here. There are, however, side effects: the transmitter may unintentionally become a jammer itself, and the receiver may treat legitimate signals as noise. So they present a simple heuristic for setting the CCA on the link.
Using these observations on rate and power control they built ARES. It works like this: if a jammer is detected, check whether the CCA is tunable. If so, invoke power control; if that doesn’t solve the problem, do rate control. If the CCA is not tunable, just do rate control.
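The decision flow, as I understood it from the talk, boils down to something like this sketch (the inputs are placeholders for the driver-level checks a real implementation would need):

def ares_react(jammer_detected, cca_tunable, power_control_recovers_link):
    # Returns the action ARES would take, following the decision flow above.
    if not jammer_detected:
        return "no action"
    if cca_tunable:
        if power_control_recovers_link:
            return "power control"           # tuning CCA/transmit power sufficed
        return "power control + rate control"
    return "rate control"                    # CCA not tunable: fall back directly

print(ares_react(True, True, False))         # -> power control + rate control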
They implemented their system in an indoor testbed. With rate control only it improves performance up to 100%. With a mobile jammer the system can increase performance (using power control) by 150%. Note: by adding more APs the benefits are reduced.
ThunderDome: Discovering Upload Constraints Using Decentralized Bandwidth Tournaments
J. Douceur, J. Mickens, T. Moscibroda (Microsoft Research), D. Panigrahi (MIT)
Motivation: upload constraints limit scale and performance in online games, for instance. How to discover these constraints? Bandwidth estimation is non-trivial...
ThunderDome provides estimates that are fast, accurate and decentralised.
Many tools exist for one-way bandwidth discovery, based on packet rate (pathload, PTR). What is the problem with these? 1) Directional ambiguity; 2) what is the proper probing schedule? 3) How to deal with probing error?
Solution: 1) pair-wise bandwidth probes: measure bandwidth in both directions (X->Y and Y->X) between pairs of nodes to determine upload speeds; 2) decentralised bandwidth tournaments; 3) probe aggregation.
Participant model assumptions: 1) peers gather and then stay for tens of minutes (e.g., games), 2) hosts have public IP addresses or cone NATs, 3) hosts do not lie. Network model assumptions: 1) peers are connected by a hub-and-spoke topology, 2) bandwidth constraints are asymmetric.
The pair-wise bandwidth probe: hosts X and Y transmit to each other. With a single exchange they can get the upload speed of the loser. You do this between pairs of nodes (the “tournament”), and in the end you have the upload speed of every host (except one).
But there may be measurement errors: a probe may misjudge one-way bandwidth, confusing the uploads of the two hosts. This occurs when the separation between upload and download speed is too small (quite common on dial-up). To recover from this they add more matches to the “tournament”: basically, more samples.
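A toy sketch of the pairwise probe and tournament (my own illustration, assuming each host's download capacity exceeds its upload capacity so that one exchange reveals the smaller upload, and ignoring the extra samples used for error recovery):

def pairwise_probe(up, down, x, y):
    # Each host observes the rate at which it receives from the other.
    rate_at_x = min(up[y], down[x])
    rate_at_y = min(up[x], down[y])
    # The slower observed rate is the loser's upload speed.
    if rate_at_x <= rate_at_y:
        return y, rate_at_x                 # y "loses": its upload is revealed
    return x, rate_at_y

def tournament(hosts, up, down):
    estimates, remaining = {}, list(hosts)
    while len(remaining) > 1:
        nxt = []
        for x, y in zip(remaining[::2], remaining[1::2]):
            loser, bw = pairwise_probe(up, down, x, y)
            estimates[loser] = bw
            nxt.append(x if loser == y else y)
        if len(remaining) % 2:
            nxt.append(remaining[-1])       # odd host out gets a bye this round
        remaining = nxt
    return estimates                        # every host except the winner is estimated

up = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 0.8}
down = {h: 8.0 for h in up}                 # plentiful downstream
print(tournament(list(up), up, down))       # {'b': 0.5, 'd': 0.8, 'a': 1.0}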
They also have a system to coordinate the “tournament”, basically building a tree. These coordinators can be regular peers, so it is mostly decentralised.
Where the Sidewalk Ends: Extending the Internet AS Graph Using Traceroutes From P2P Users
K. Chen, D. Choffnes, R. Potharaju, Y. Chen, F. Bustamante (Northwestern University), D. Pei (AT&T Labs Research), Y. Zhao (Northwestern University)
Currently, the publicly available AS graph is incomplete. ISPs do not want to disclose their connectivity.
Existing approaches: passive (BGP routing tables/updates) and active (traceroutes). Limitation: a small number of vantage points (ASes with monitors inside).
A promising platform should: 1) have more VPs, 2) allow active measurement, 3) scale with the Internet’s growth. So P2P seems a natural fit.
Their approach: 1) increase the number of VPs using P2P users, 2) run active traceroutes from the edge of the Internet, 3) map IP paths to AS paths and extract AS links. They also offer many corrections for false AS links, bad traceroute results, etc.
It seems a more complete AS map is now available online, which is cool:
http://aqualab.cs.northwestern.edu/projects/SidewalkEnds.html
Exploiting Dynamicity in Graph-based Traffic Analysis: Techniques and Applications
M. Iliofotou, M. Faloutsos (University of California, Riverside), M. Mitzenmacher (Harvard University)
Focus on the network-wide behaviour of internet apps (IP-to-IP communication patterns), and build Traffic Dispersion Graphs (TDGs). The goal of these graphs is behavioural profiling (assign apps to different groups, and detect unusual behaviours).
Previous work: static graphs to capture network-wide behaviour. These can distinguish client-server from p2p apps. But how to distinguish different flavours of these two?
Here they use dynamic graph characteristics to extract more app properties. The graph is dynamic because nodes and edges change with time. Questions that arise: how to express the dynamicity of traffic graphs? What metrics to use?
How to express dynamicity: 1) the graph is represented as a sequence of snapshots taken over time, 2) use metrics that quantify differences between graphs.
Metrics: static, like average degree, degree distribution, etc., but also binary ones that allow checking how nodes or edges change with time.
To quantify dynamic behaviour, they create a new metric: detailed edge similarity, which captures both persistent and non-persistent edges, for instance.
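To make the idea of an edge similarity metric concrete, here is a tiny sketch (my simplification: a plain Jaccard index over edge sets; the paper's detailed edge similarity is richer than this):

def edge_similarity(g1, g2):
    # Fraction of edges that persist between two consecutive graph snapshots.
    e1, e2 = set(g1), set(g2)
    persistent = e1 & e2
    return len(persistent) / len(e1 | e2) if (e1 | e2) else 1.0

snap_t0 = {("10.0.0.1", "10.0.0.2"), ("10.0.0.1", "10.0.0.3")}
snap_t1 = {("10.0.0.1", "10.0.0.2"), ("10.0.0.4", "10.0.0.5")}
print(edge_similarity(snap_t0, snap_t1))    # 1 persistent edge out of 3 -> ~0.33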
Application: detecting polymorphic blending. Action: an app mimics the traffic of another app to avoid detection. Effect: traffic of polluting app will be added to the target app. This will change the dynamics of the graphs.
They looked at many metrics (both static and dynamic), to see which detect polymorphic blending better. For small blending the dynamic metrics are more sensitive and result in more alarms. However, the best is to mix the static and dynamic metrics.
RFDump: An Architecture for Monitoring the Wireless Ether
K. Lakshminarayanan, S. Sapra, S. Seshan, P. Steenkiste (Carnegie Mellon University)
Popularity causes crowding. Look at the unlicensed WiFi spectrum: what if we turn on the microwave oven? How to troubleshoot these problems?
Sniffers don’t work well at the physical layer due to interference from all these sources. How could we sniff such a wireless network? A possibility would be to add lots of interface cards (WiFi, Bluetooth, etc.).
What about using software defined radio (SDR)? SDR hardware exposes physical layer info and supports programmable analysis modules.
Challenges of SDR: 1) how to process 256 Mbps of data in real time? 2) How to differentiate between samples? The requirements: 1) real time, 2) multi-protocol support, 3) extensibility.
Another naive solution: demodulate all received signals. The good thing is that it is protocol extensible; the problem is that the real-time capability is missing. Demodulation is costly, and all demodulators have to process everything. How to make it more efficient?
A better solution: use an energy filter after the SDR. This filter decides which signals to pass on (removing noise, for instance), so the demodulators do less work. But this is only OK when medium utilisation is low; if it is high, it can’t keep up in real time.
Their solution: RFDump. After the energy filter, insert a fast detector that decides which demodulator each signal should be sent to. This solves everything: it is protocol extensible and real time. Detectors can be fast because they can tolerate false positives and some delay.
The tricky thing is how the detector works. How to detect which protocol it is? 1) Check timing: 802.11 uses SIFS and DIFS, Bluetooth uses TDD slots; 2) check how the phase changes: 802.11b uses DBPSK, Bluetooth uses GMSK; 3) also check frequency: look at the channel width – 802.11b uses 22 MHz, Bluetooth 1 MHz.
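A highly simplified sketch of the dispatch decision, using only the channel-width hint (the thresholds and the stubbed-out feature extraction are my own guesses, for illustration only):

def pick_demodulator(channel_width_mhz):
    # Decide which demodulator should get a batch of samples.
    if channel_width_mhz > 10:          # ~22 MHz channels suggest 802.11b
        return "802.11b"
    if channel_width_mhz <= 1.5:        # ~1 MHz channels suggest Bluetooth
        return "bluetooth"
    return "unknown"                    # could be dropped or sent to all demodulators

print(pick_demodulator(22.0), pick_demodulator(1.0))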
They implemented this on the GNU Radio and USRP SDR platform, and achieve pretty nice results.
Panel: What will be the origin of the next network revolution?
Moderator: Paul Mockapetris (inventor of DNS, ISI)
Members: Bruce Davie (Cisco), Van Jacobson (one of the main contributors to TCP/IP, PARC), Dirk Trossen (BT Research), Kenjiro Cho (IJJ Japan), Max Ott (NICTA)
More info here: http://conferences.sigcomm.org/co-next/2009/panel.php
In general, everyone seemed to agree that research needs more money, that betting on risky projects is important (not only incremental research), that more cooperation from industry is needed (it is decreasing), and that more freedom is needed (don’t expect great results in six months).
An interesting point by Keshav: it is easy to get money from industry, but then the company doesn’t have anyone to follow the research project. “No one answers my calls!” The company seems to give the money but then doesn’t care about the results. Bruce Davie agreed, saying that it is cheaper for Cisco to give that money “away” than to find a person inside the company to follow the research work.
Wednesday, 2 December 2009
CoNEXT, Day 2
Another very interesting day at CoNEXT. Van Jacobson presented his extremely interesting CCN work (definitely the highlight of the conference – maybe this will be known as the “CCN conference” in the future... :). Besides that, other very interesting talks too.
Keynote Presentation by the 2009 ACM Sigcomm Rising Star Ratul Mahajan (Microsoft Research, USA)
The title of the talk was “How to build research network systems in your spare time”.
The goal: articulate a research method for building network systems.
Method:
0. Pick domain (be wary of the hot trends);
1. Know the problem well (papers alone are rarely a good source – scrutinise, measure, survey);
2. Debate several solution ideas (read more papers for solutions – from other areas for instance – than for problems);
3. Start small and then embellish (to evaluate, see if it works, when and why it works, and what are the benefits and costs);
4. Make it real (release data and code, deploy, talk to practitioners).
Networking Named Content
V. Jacobson, D. Smetters, J. Thornton, M. Plass, N. Briggs, R. Braynard (Palo Alto Research Center)
For 150 years, communication has meant a wire connecting two devices. For users, the web forever changed that; content matters, not the hosts it came from.
Design criteria 1: IP runs over anything (that moves bits to some destination), anything runs over IP. CCN runs over anything (that moves bits in space and time), anything runs over CCN. But you don’t need any entity in layer two. You just talk. There is no state to establish. Notion of communication generalises.
Design criteria 2: an evolutionary path: a) incremental deployment gives immediate value, b) painless, viral, bottom up growth, c) use existing infrastructure, including its lessons.
Design criteria 3: security (communication = trusted info). Trust has to be a property of the info, not the path it comes over or the container it’s in. Any entity should be able to assess the integrity, relevance and provenance of any piece of info.
Basic idea: 1) consumer announces an interest over any and all available communications links (an http get kind of thing). 2) Interest identifies a collection of data – all data items whose name has the interest as a prefix. 3) Anything that hears the interest and has an element of the data sends it.
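A toy sketch of that interest/data exchange (my own illustration; the real CCN forwarding engine also keeps a pending-interest table, a FIB and a content store, none of which is modelled here):

class Node:
    def __init__(self):
        self.store = {}                        # name -> data

    def publish(self, name, data):
        self.store[name] = data

    def on_interest(self, interest):
        # Answer if we hold any data whose name has the interest as a prefix.
        for name, data in self.store.items():
            if name.startswith(interest):
                return name, data
        return None

def express_interest(interest, neighbours):
    # Announce over any and all links; the first matching answer wins.
    for node in neighbours:
        answer = node.on_interest(interest)
        if answer:
            return answer
    return None

n = Node()
n.publish("/parc.com/videos/demo.mp4/seg0", b"...")
print(express_interest("/parc.com/videos/demo.mp4", [n]))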
Answers to some “buts”: “This doesn’t handle conversations or real time” – yes it does – see the ReArch VoCCN paper from yesterday.
“This is just Google” – this is IP for content. They don’t search for data, they route it.
“This will never scale” – hierarchically structured names give the same log(n) scaling as IP, but CCN tables can be much smaller since the multi-source model allows inexact state (e.g., Bloom filters). Tradeoff: a little bit of bandwidth can save state.
Measurements: in an “apples to apples” comparison, rise time is worse than normal TCP and throughput is lower because of the extra header space for signatures (but it is not that bad). On shared content distribution, though, CCN wins very clearly.
Other cool stuff: 1) names can be context sensitive and active. 2) Robust, simple single point configuration (you are not talking to something). 3) Security model makes infrastructure easy to protect and hard to attack. 4) Unifies communication and storage (files and wires are really the same thing).
Open source, GPL’d, CCN stack and VoCCN linphone at www.ccnx.org.
A comment from the audience sums up the expectations this work is generating: “It has excited me more than anything I have heard in the past 10 years”.
Virtually Eliminating Router Bugs
E. Keller, M. Yu (Princeton University), M. Caesar (University of Illinois at Urbana-Champaign), J. Rexford (Princeton University)
Routers have complex software, so they have bugs (XORP has 826k lines of code... Cisco routers have millions of lines).
Main question: how to detect and stop the problem caused by the bug before it spreads?
Answer: Run multiple, diverse routing instances (software and data diversity ensures correctness - SDD).
SDD challenges in routers: 1) making replication transparent to neighbour routers (they can be normal routers), 2) handling transient real time nature of routers (react quickly to network events, but do not over react).
SDD opportunities: 1) easy to vote on standardised output, 2) easy to recover from errors via bootstraps (routers depend very little on history), 3) diversity is effective in avoiding router bugs.
Hence, outline of the talk: 1) exploiting SDD, 2) bug tolerant router architecture, 3) prototype and evaluation.
Why does diversity work? 1) There is enough diversity in routers (software: Quagga, XORP, etc.; protocols: OSPF, IS-IS); 2) there are enough resources for diversity (extra processor blades for hardware reliability, multi-core processors); 3) it is effective in avoiding bugs.
Evaluate diversity effect: most bugs can be avoided by diversity. 88% of the bugs avoided in their experiment.
Effect of software diversity: by picking 10 bugs from XORP and Quagga they realised that none were present in the other implementation. Also, static code analysis on version diversity: some bugs from one version are corrected, others added, but in the end they don’t overlap that much.
Bug-tolerant router: several routing instances running in parallel. When an update arrives, a voter votes on their results, generating a single update to the forwarding table. So it is a transparent model (it only intercepts system calls).
How to do the voting (two options shown, others in the paper): 1) a master-slave approach (fast reaction time, but doesn’t handle transient behaviour well) and 2) continuous majority (handles transience better).
Voting code is small (500 lines of code)
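A minimal sketch of continuous-majority voting over the routes produced by the diverse instances (my own illustration of the idea, not the actual voter code):

from collections import Counter

def vote_fib(instance_ribs):
    # instance_ribs: list of dicts, prefix -> next hop, one per routing instance
    # (e.g. several XORP and Quagga instances running in parallel).
    fib, quorum = {}, len(instance_ribs) // 2 + 1
    prefixes = set().union(*instance_ribs)
    for prefix in prefixes:
        votes = Counter(rib[prefix] for rib in instance_ribs if prefix in rib)
        next_hop, count = votes.most_common(1)[0]
        if count >= quorum:
            fib[prefix] = next_hop          # majority agreement: install it
    return fib

ribs = [{"10.0.0.0/8": "A", "192.168.0.0/16": "B"},
        {"10.0.0.0/8": "A", "192.168.0.0/16": "C"},
        {"10.0.0.0/8": "A"}]
print(vote_fib(ribs))                       # only 10.0.0.0/8 -> A reaches a majority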
Prototype and evaluation: 1) no modification of router software, a simple hypervisor, built on Linux with XORP and Quagga; 2) evaluated using a BGP trace from Route Views; 3) the evaluation checks the delay introduced.
Test with 3 XORP and 3 Quagga routing instances. Single router: 0.066% fault rate. Master-slave reduces this to 0.0006%, with a delay of 0.02 seconds; continuous majority reduces it even more, with slightly higher delay.
Small overhead: the delay overhead of the hypervisor is 0.1% (0.06 s), and the overhead of running 5 routing instances is 4.6%.
More info on http://verb.cs.princeton.edu/.
MDCube: A High Performance Network Structure for Modular Data Center Interconnection
H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang (Microsoft Research Asia)
Mega data centre: millions of servers. Each container has thousands of servers, and they want to connect hundreds of these.
Design goals: 1) high bandwidth for data intensive computing, 2) cost effective, 3) addressable cabling complexity
Existing approaches: 1) tree (oversubscribed, bottlenecked root), 2) fat tree (multi-rooted tree, costly due to the large number of switches).
MDCube design: leverage the link-rate hierarchy of Ethernet switches (1 Gbps for server-facing switch ports, 10-40 Gbps for switch uplinks). Use the switch uplinks for inter-container interconnection.
Construction of MDCube: 1) treat each container as a “virtual node”, 2) treat the switches’ high-speed uplink interfaces as “virtual interfaces”, 3) connect switches to peer switches directly, 4) the virtual nodes form a modular data centre cube: MDCube. Inner-container structure: BCube (presented at SIGCOMM).
With 48x1G+4x10G they can support 5M+ servers...
Properties of MDCube: 1) small network diameter, 2) high capacity between servers, 3) large ABT (aggregate bottleneck throughput), 4) graceful incremental deployment.
MDCube routing: 1) hierarchical routing (divide and conquer), with known topology; 2) loosely controlled source routing with load balancing and fault tolerance. Detour routing: similar to Valiant Load Balancing (VLB). In VLB a random node is selected, but here they detour only at the container level. Fault-tolerant routing: hierarchical routing approach (the source node only chooses which container to route through, and the path inside a container is maintained by BCube).
Greening the Internet with Nano Data Centers
V. Valancius (Georgia Institute of Technology), N. Laoutaris (Telefonica Research), L. Massoulie, C. Diot (Thomson), P. Rodriguez (Telefonica Research)
Data centres today host applications. Problems: expensive to build, expensive to run, far from users, lots of redundancy for robustness. Power usage: they use lots of power (for cooling, for example) and resources are underused.
Nano data centres (NaDa): some content servers (not as many as today) plus a nano data centre network (P2P) built from gateways. Benefits: ISP friendly (reduces load on the backbone), self-scalable (adding a new user also adds a node that can supply content), saves energy (reuses baseline power, no need for cooling and redundancy, reduces energy consumed in the network).
Challenges: limited uplink bandwidth, scale, incentives for users, data privacy, cost of gateway.
The focus of this talk is on how to save energy.
Proportional energy use in NaDa: currently servers have an unfavourable power usage profile (big baseline power). Using dedicated home gateways for the NaDa service would be even less efficient than servers. But the baseline energy can be amortised by using gateways only when they are already active for something else; then power consumption is proportional to load.
Evaluation: 1) energy measurements of content streaming servers, residential gateways, and DSLAMs. 2) Content access traces (from commercial IPTV provider, youtube, netflix), 3) gateway availability (lower and upper bounds from measurements).
Event driven simulation, with various parameters: 1) upstream bw, 2) storage, 3) video streaming rate, 4) number of NaDa users, 5) network distance, 6) data centre cooling and energy distribution losses, 7) residential energy distribution losses.
Measurements: server vs. gateway. 1) Carrier-grade IPTV streaming server: energy use of 211 joules/Gbit; 2) Thomson residential gateways and DSLAMs: DSLAM power use unaffected, gateway energy use of 100 joules/Gbit.
The most important variables for NaDa: bandwidth, i.e. the upstream rate (UR) and the video rate (VR). When UR > VR everything is fine: each gateway can serve more than one user, and more replicas are used. If upstream bandwidth is increased, we get more energy efficiency in the network.
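A back-of-the-envelope comparison using the energy figures quoted above (the traffic volume is an arbitrary assumption of mine, and the data centre cooling and distribution losses the talk also accounts for are ignored):

server_j_per_gbit = 211.0              # carrier-grade IPTV streaming server
gateway_j_per_gbit = 100.0             # residential gateway, baseline power already paid for
monthly_traffic_gbit = 50e3            # assumed figure, just for illustration

server_energy = server_j_per_gbit * monthly_traffic_gbit
nada_energy = gateway_j_per_gbit * monthly_traffic_gbit
print("saving: %.0f%%" % (100 * (1 - nada_energy / server_energy)))   # ~53%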
SWARM: The Power of Structure in Community Wireless Mesh Networks
S. Das (Qualcomm), K. Papagiannaki (Intel Labs), S. Banerjee (University of Wisconsin at Madison), Y. Tay (National University of Singapore)
The work is aimed at community wireless networks (Meraki, Roofnet, CUWiN), where we have wireless forwarding nodes (mesh routers) and a few gateway nodes with network connectivity. Clients connect to a mesh router for access to a gateway.
Observations: 1) mostly single radio networks (low cost, off the shelf); 2) traffic patterns mainly between gateways and mesh routers; 3) this is a static/stable network and the objective is to find gateways to access internet, so using normal ad-hoc techniques is not good.
Problems with current approaches: 1) do not optimise network topology for the intended use of the network; 2) interference not taken into account; 3) requires operation on same channel.
Motivations for SWARM: 1) improve performance by organising the network better; 2) gateway trying to minimise interference; 3) nodes assigned to gateways for good performance given a certain network scenario.
Their scheme involves 3 phases. Phase 1: enumeration (iterate through all possible assignments of MRs to gateways) and pruning (of disconnected or imbalanced configurations). Phase 2: construct a tree (simple interference-aware model). Phase 3: channel assignment (an extension of the MAX-chop algorithm for WLANs, MobiCom 2006): choose the channel hopping sequence that minimises conflicts.
Evaluation by simulation in QualNet, and also a deployment on a testbed, shows that SWARM increases throughput significantly.
EZ-Flow: Removing Turbulence in IEEE 802.11 Wireless Mesh Networks without Message Passing
A. Aziz (EPFL), D. Starobinski (Boston University), P. Thiran, A. Fawal (EPFL)
Multi-hop scenario: greedy sources with no source rate limiting. The intermediate nodes are relays, there is a single channel with CSMA/CA, and there is turbulence (instability – this problem occurs in wireless networks larger than 3 nodes).
Design goal: network stabilisation (delay reduction), unmodified MAC layer, backward compatibility.
EZ-Flow: an analogy to a road traffic problem (like the New York solution of generating a “flow of green lights”: EZ-Pass). It is useless to send more packets to a heavily loaded successor.
EZ-Flow lies just above the MAC layer. It dedicates one MAC queue per successor (max. 4). It is composed of 2 modules: the Buffer Occupancy Estimator (derives the successor’s buffer occupancy) and Channel Access Adaptation (adapts the contention window, cw, accordingly).
BOE module: works passively without any message passing. This is the main advantage of the scheme. Also, no header modification (hence, runs with current hardware).
How to derive buffer occupancy implicitly: 1) keep in memory a list of identifiers of recently sent packets; 2) use the broadcast nature of the medium to monitor the packets forwarded by the successor; 3) obtain (precisely) the current buffer occupancy at the successor.
Why is explicit messaging less precise? The packet header can only be modified just before the packet goes to the MAC, and queuing adds delay, so the info can be out of date (which doesn’t happen with the passive option).
CAA module: 1) adapt the channel access probability by modifying the cw; 2) the decision is based on the successor’s buffer size, derived by the BOE, plus some thresholds.
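A minimal sketch of the passive buffer occupancy estimation (my own illustration): remember the IDs of packets sent to the successor, drop an ID whenever the successor is overheard forwarding that packet, and what remains approximates the successor's backlog.

class BufferOccupancyEstimator:
    def __init__(self):
        self.in_flight = set()              # ids of packets sent but not yet overheard

    def on_send(self, packet_id):
        self.in_flight.add(packet_id)

    def on_overhear(self, packet_id):
        self.in_flight.discard(packet_id)   # the successor forwarded it

    def successor_backlog(self):
        return len(self.in_flight)

boe = BufferOccupancyEstimator()
for pid in range(5):
    boe.on_send(pid)
boe.on_overhear(0); boe.on_overhear(1)
print(boe.successor_backlog())              # 3 packets estimated to still be queued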
Nice results in reducing delay and in stabilising the network.
The Age of Impatience: Optimal Replication Schemes for Opportunistic Networks
J. Reich (Columbia University), A. Chaintreau (Thomson)
Infrastructure-based content dissemination drawbacks: expensive, not always available, capacity can’t keep up with demand.
Opportunistic content dissemination: each meeting of nodes is a chance to transfer data. Potential: a huge pool of untapped bandwidth, and no infrastructure required. But: contacts are unpredictable, transfers may be much slower, and local memory and energy are limited.
Problems: 1) how to choose what content should be placed where (on which peer) to satisfy impatient users? 2) How to attain a desired allocation with an opportunistic P2P protocol?
Contributions of this paper: 1) they show what the optimal cache allocation for impatient users is (analytic model), and 2) how to apply these insights in a distributed scheme.
Impatience: as waiting increases, utility decreases. Their objective function is the aggregate expected delay utility (U). They choose an allocation (X) to maximise the objective – this minimises losses due to impatience.
Main analytic results: 1) the objective U is submodular in X; 2) in a homogeneous network U is concave.
Their distributed system, QCR, is demand/cache reactive: depending on the impatience factor (can the user wait, say, one hour for the service?) it defines parameters like the cache size.
Feasibility of content dissemination between devices in moving vehicles
T. Zahn, G. O'Shea, A. Rowstron (Microsoft Research Cambridge)
Can we use opportunistic WiFi communication to disseminate content between PNDs (Personal Navigation Devices) in vehicles, assuming no static infrastructure, APs or cellular? Most prior work involved infrastructure.
Challenges for vehicle to vehicle communication: 1) initial density low (thus exploit every encounter), 2) short connections (e.g. passing side-street), 3) variable link quality.
Why PNDs? 1) Lots of them around, good storage capacity; 2) not just road maps (also gas stations or restaurants info, other services, etc.).
How to update the data? 1) USB cables; 2) built-in GSM/3G (still expensive); 3) Bluetooth to a cell phone (a suitable contract and data plan are needed – some cost); 4) FM radio – low bit rate. So: what about WiFi?
Design: 1) assume a global set of versioned files; 2) when 2 devices meet: a) rapid device detection over an ad-hoc network, b) find common files with different versions, c) send data using a lightweight FTP.
Phase 1: rapid peer detection – join or form 802.11 ad-hoc network (fixed channel).
Phase 2: discovery protocol. Use Bloom filters to a) hash some of the device’s filenames, and b) hash filenames plus versions. This message is sent to the neighbour; the receiver does the same and sends back a similar message. Then they transfer the newer versions of the files between them.
Phase 3: transport protocol. Block oriented transfer of files of fixed max size. Ack is a bit vector.
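A toy sketch of the phase-2 discovery idea (my own illustration, with a tiny inline Bloom filter; the real protocol's encoding of names and versions is certainly different):

import hashlib

class Bloom:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.v = bits, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.v |= 1 << p

    def __contains__(self, item):
        return all(self.v >> p & 1 for p in self._positions(item))

my_files = {"maps/uk": 7, "poi/fuel": 3}
their_filter = Bloom()
their_filter.add("maps/uk:6")                  # the peer advertises name:version pairs
offers = [f for f, v in my_files.items() if f"{f}:{v}" not in their_filter]
print(offers)                                   # files/versions the peer appears to lack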
Evaluation with their raw protocol and TCP/IP. They can transmit significant amounts of data even for vehicles moving at considerable speed.
Cool-Tether: Energy Efficient On-the-fly WiFi Hot-spots using Mobile Phones
A. Sharma (University of California, Santa Barbara), V. Navda, R. Ramjee, V. Padmanabhan (Microsoft Research India), E. Belding (University of California, Santa Barbara)
There are more mobile users than broadband users: so it is probably better to provide internet connectivity using mobile phones.
Motivation: 1) provide ubiquitous connectivity to devices, 2) leverage smartphones as internet gateways (considering battery life), 3) multiple phones around you at home, work, or on the move.
Other tethering mechanisms: 1) via usb (not possible to use multiple phones), 2) WiFi adhoc mode (no power save mode), 3) via Bluetooth (short range, low data rates, hence high energy/bit cost)
Design goals: 1) support multiple gateways, 2) reduce energy consumption (optimise LAN and WAN).
The incremental cost of sending more data is negligible, hence sending at the highest possible rate is energy efficient. Also, activating an interface costs a lot in terms of energy, so it is probably not good to use many mobile phones at the same time.
Energy-aware design: 1) keep the WAN pipe full whenever it is in use. For this they aggregate web requests in a gatherer on the internet (the cloud deals with all the requests a single web page needs to make, since one page has many embedded links requiring several requests); this way the short bursts are transformed into a single high burst. 2) Use the optimal number of phones (they’ve done some analytical work to find it). 3) Use reverse-mode WiFi: the laptop is the AP and the mobile phones are the clients.
Based on 1) (keep WAN pipe full) they get very nice savings in energy, and also reduce latency. They compare their results with COMBINE, and they achieve 38% to 70% savings compared with this scheme.
Tuesday, 1 December 2009
CoNEXT, Day 1
I am attending CoNEXT these days, and as usual I will maintain some notes on the talks I find more interesting.
ReArch'09
During the morning I decided to attend the ReArch workshop. Some interesting talks in the morning sessions, with Mark Handley and Van Jacobson figuring at the top of the list.
Keynote Presentation by Mark Handley (University College London)
A similar talk to the one Mark gave when he received the Royal Society-Wolfson Research Merit Award (as far as I can remember).
The Internet control architecture can be divided in three parts: routing, congestion control, and traffic engineering. These three things have been a bit disconnected because they were not planned together: they were an accumulation of fixes to specific problems (TCP with Jacobson’s additions for congestion control, BGP for routing, etc.). The control plane was made of incremental fixes, where the power of legacy had an important role.
It would be nice to have an architecture where all control protocols fit well.
Main question: is it possible to change the internet architecture in a planned way, so as to achieve long term goals?
Looking back at the Internet control architecture, are there opportunities to change the game? In routing, it’s nearly impossible to replace BGP: it has a huge network effect. For congestion control it is nearly impossible to replace TCP. But there is lots of stuff to do in between these “layers”.
A wish list for control mechanisms: very high robustness (no downtime, robust to attack), load dependent routing (move traffic away from congestion), diverse traffic mix, and sane economics (reward net operators for investment).
Also, consider the problem of multi-homing. Multi-homing provides redundancy (more than one path provides robustness). However, routing is not a good way to access that redundancy; it would be better to access it at the transport layer, where congestion control can see it. With multi-homing (multipath), servers can do load balancing. Idea of pooling resources: several links working as if they were a single pool. Multipath TCP pools the resources of multiple links.
Economics: what is the marginal cost of a packet? If no one else wanted to send at that time, no cost. It only makes sense to charge if our traffic displaces another user’s traffic. We should be charging for congestion.
Common models – rate-based or volume-based charging (x gigabytes per month) – don’t offer the right incentives; they are economically inefficient. Also, some apps are latency sensitive while others only care about long-term throughput. Charging for congestion volume would thus make sense, encouraging machine-to-machine traffic to move to uncongested times.
ISPs can’t charge for congestion because they can’t see it properly. But end systems can – and some routers too. Bob Briscoe’s re-feedback and re-ECN ideas (congestion exposure): to indicate to an ISP the congestion seen by traffic downstream. Congestion exposure is thus an enabler for sane economics.
VoCCN: Voice Over Content-Centric Networks
Van Jacobson, Diana Smetters, Nicholas H. Briggs, Michael Plass, Paul Stewart, James D. Thornton, Rebecca L. Braynard (Palo Alto Research Center (PARC), USA)
Everybody knows that content based networking is great for content dissemination, but can’t handle conversational or real time traffic: everybody is half right... In fact, content networking is more general than IP, and does anything that IP can.
They implemented VoCCN, a VoIP functional equivalent based on CCN to prove this.
VoCCN: why bother? VoIP works badly for multipoint, multi-interface and mobility. VoIP wants to talk to an IP address, and in a voice call we want to talk to a person. Also, VoIP security is poor (only SSL between proxies).
With CCN, no need to map from a user to an address: we have very flexible names. And sender can encrypt its identity, with only the receiver able to decrypt it. No need for the user to reveal its identity. Supports secure VoIP.
They’ve built the voice app on top of CCN (instead of IP), and in the end the app performance was very similar to VoIP. They flipped frantically from one Ethernet connection to another, and the shift was almost imperceptible, and no packets were lost.
All this is available open source: www.ccnx.org. The VoCCN linphone code should be there by the end of the week.
Classifying Network Complexity
Michael H. Behringer (Cisco, France)
Network complexity is increasing: everybody knows that. But what is “network complexity”?
You need some complexity to achieve a certain level of robustness. How to deal with complexity? Divide and conquer (layering, o-o: “classes” matter, not instantiations), shifting complexity (e.g., away from the human – make simple user interfaces – look at the iPhone), meta-languages, structural approaches (reduce dependencies by design).
The “complexity cube” idea: a cube with three axes that represent the three areas of complexity: operator, physical network, network management.
Future work: quantitative metrics, impact of the rate of change, investigate human factors.
IP Version 10.0: A Strawman Design Beyond IPv6
Kenneth Carlberg (SAIC, USA); Saleem Bhatti (University of St Andrews, United Kingdom); Jon Crowcroft (University of Cambridge, United Kingdom)
Once upon a time the internet was unknown to the general public, best effort was the only game in town, and people used telnet. But now there is a need for security, and a need for a next-generation IP.
We were running out of address space. CIDR and NAT are near-term solutions. There is also the associated routing-table size explosion; solutions: new lookup algorithms to reduce the impact, and faster hardware. But multi-homing has renewed the problem...
Next-generation IP: 1) Simple IP (Steve’s IP) – minimise the header, add more extensibility (one or more Next Headers), a flow ID, and a larger flat address structure. 2) The P Internet Protocol (Paul’s IP) – change addressing to a locator/identifier split, with a hierarchical and variable-length locator (implied source routing). Finally, 3) the grand compromise of ’94: Simple IP Plus, i.e. Simple IP with the hierarchical addresses of Paul’s IP: IPv6.
Critique of IPv6: not much of an architectural change. 1) Large 128-bit addresses (the combined locator & identifier is retained; providers still cling to NATs – they have no economic incentive to migrate). 2) Same-size diff-serv field. 3) Multiple next headers (encapsulation or MPLS). 4) End-to-end flow labels (the “market” uses islands to cut through routing (MPLS)). Note: recent reports show IPv6 traffic is 1/100 of 1% of all IP traffic... Main questions: does “more” qualify as architectural change? Where are the “must have” features?
Four significant discussions on the “locator/identifier split”: 1) 1977: TCP and mobility; 2) 1992-93: Paul’s IP work; 3) 1996: O’Dell’s 8+8 proposal; 4) 2007: IAB report. Three efforts now: HIP, LISP and ILNP.
Multi-homing problem: provider-independent prefixes tend to be popular, but are non-aggregatable.
IPv10 design: retain the minimalism and extensibility of IPv6. Incorporate the identifier/locator split. Besides headers, also introduce tails (changing the state insertion model: temporary headers and tails...). The header includes header navigation and forwarding info; the trailer includes trailer navigation, end-to-end info, and diff-serv.
Impact of tails: they change the end-to-end model of constructing headers (facilitating temporary insertion of overhead info) and avoid inefficient encapsulation. They also foster the need to go beyond current ASIC header-lookup limitations.
Should we be more radical in our design? Are there any must have features in IPv10?
ACM CoNEXT Student Workshop
During the afternoon we had the poster session, where I presented the poster “Relative Delay Estimator for Multipath Transport”. After that, a nice panel session offered advice to PhD students, including a discussion on what is best: publishing in conferences or in journals? Contrary to most other fields, CS researchers tend to prefer conferences... However, there were some important points in favour of journals - namely the idea of creating a "scientific archive". Keshav, for instance, strongly defended journals, and invited everybody to publish their research in CCR... :) Also, TON wants to allow bigger papers and reduce the cost of adding extra pages. We may see a move towards journal papers in the meantime. A nice approach seems to be to publish in a top conference, and then to publish a longer version as a journal paper. Let's do it! :)
Sunday, 15 November 2009
New internet TV announcements
Thursday, 1 October 2009
How broad is your band in the land?
Akamai's view doesn't appear to be consistent with Cisco's:
http://news.bbc.co.uk/1/hi/technology/8282839.stm
(e-mail by Jon Crowcroft)
Thursday, 10 September 2009
SIGCOMM Day 4
Session 6: Routing and Forwarding (Chair: Jia Wang, AT&T Research)
Stable and Flexible iBGP
Ashley Flavel (University of Adelaide), Matthew Roughan (University of Adelaide)
LIPSIN: Line speed Publish/Subscribe Inter-Networking
Petri Jokela (Ericsson), Andras Zahemszky (Ericsson), Somaya Arianfar (Ericsson), Pekka Nikander (Ericsson), Christian Esteve (University of Campinas)
· Most Internet applications are internally publish/subscribe in nature
· Supporting efficient publish/subscribe requires 1) data-oriented naming, 2) efficient multicast, and 3) in-network caching.
· Deployment of native IP-based multicast has failed, and overlay-based multicast systems are inherently inefficient.
· Scalable and efficient publish/subscribe will require substantial architectural changes, such as moving from endpoint-oriented systems to information-centric architectures.
· Proposal: a novel multicast forwarding fabric, suitable for large-scale topic-based publish/subscribe. Fabric more energy efficient than the currently used ones.
PLUG: Flexible lookup modules for rapid deployment of new protocols in high-speed routers
Lorenzo De Carli (University of Wisconsin-Madison), Yi Pan (University of Wisconsin-Madison), Amit Kumar (University of Wisconsin-Madison), Cristian Estan (University of Wisconsin-Madison), Karthikeyan Sankaralingam (University of Wisconsin-Madison)
· New protocols for the data link and network layer are being proposed to address limitations of current protocols (scalability, security, manageability).
· High-speed routers and switches that implement these protocols traditionally perform packet processing using ASICs. Why? High speed, low power, low chip area. But inflexible.
· Proposal: a flexible lookup module, PLUG (Pipelined Lookup Grid).
· They show that PLUG presents similar results to that of specialized lookup modules (in terms of throughput, area, power, and latency).
Session 7: Network Management (Chair: Thomas Karagiannis, Microsoft Research)
Modeling and Understanding End-to-End Class of Service Policies in Operational Networks
Yu-Wei Eric Sung (Purdue University), Carsten Lund (AT&T Labs Research), Mark Lyn (AT&T Labs), Sanjay Rao (Purdue University), Subhabrata Sen (AT&T Labs Research)
· Extensive use of service differentiation in Virtual Private Networks (VPNs) operated for business enterprises.
· The resulting Class of Service (CoS) designs embed complex policy decisions (bandwidth availability, cost).
· These complex high-level policies are realized through low-level router configurations (a tedious, error-prone process).
· They have proposed a formal approach to modeling CoS policies from router configuration files.
· They also present a practical and computationally efficient tool that can determine the CoS treatments received by an arbitrary set of flows across multiple routers
· They have validated their system using router configuration data from a cross-section of 150 diverse enterprise VPNs.
Towards Automated Performance Diagnosis in a Large IPTV Network
Ajay Mahimkar (The University of Texas at Austin), Zihui Ge (AT&T Labs - Research), Aman Shaikh (AT&T Labs - Research), Jia Wang (AT&T Labs - Research), Jennifer Yates (AT&T Labs - Research), Yin Zhang (The University of Texas at Austin), Qi Zhao (AT&T Labs - Research)
· IPTV is increasingly being deployed and offered as a commercial service.
· IPTV distribution network: hierarchical structure (instead of mesh), stringent requirements on reliability and performance, different distribution protocols (IP multicast), serious scalability challenges in managing millions of network elements.
· They characterise and troubleshoot performance issues in one of the largest IPTV networks in North America.
· Measurement data used: device usage, error logs, user activity logs, video quality alarms, customer trouble tickets.
· They developed a novel diagnosis tool called Giza tailored to the enormous scale and hierarchical structure of the IPTV network.
· Multi-resolution data analysis to detect and localize regions experiencing problems.
· They make use of several statistical data mining techniques to troubleshoot the identified problems and diagnose their root causes.
Detailed Diagnosis in Enterprise Networks
Srikanth Kandula (Microsoft Research), Ratul Mahajan (Microsoft Research), Patrick Verkaik (UC San Diego), Sharad Agarwal (Microsoft Research), Jitu Padhye (Microsoft Research), Paramvir Bahl (Microsoft Research)
· They analysed trouble tickets from small enterprise networks, and concluded that their operators need detailed fault diagnosis.
· They built a system (NetMedic) that enables detailed diagnosis by looking at info from operating systems and applications.
· Main idea: use the joint behaviour of two components in the past to estimate the likelihood of them impacting one another in the present.
· Prototype is effective at diagnosing faults that we inject in a live environment.
Session 8: Network Measurement (Chair: Gianluca Iannaccone, Intel Labs Berkeley)
Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator
Ramana Rao Kompella (Purdue University), Kirill Levchenko (UC San Diego), Alex C. Snoeren (UC San Diego), George Varghese (UC San Diego)
· Network applications have stringent end-to-end latency requirements
· Fine-grain measurement demands cannot be met effectively by existing technologies, such as SNMP, NetFlow, or active probing.
· They propose a hash-based primitive LDA (Lossy Difference Aggregator) to measure latencies down to tens of microseconds and losses as infrequent as one in a million.
· They define a compact data structure that computes the average and standard deviation of latency and loss rate in a coordinated streaming environment (a rough sketch of the idea follows this list).
· The system was compared with Poisson-spaced active probing with similar overheads. The LDA mechanism delivers orders of magnitude smaller relative error.
· Active probing requires 60 times as much bandwidth to deliver similar levels of accuracy.
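A rough sketch of the LDA idea as I understand it (my own illustration; the bucket count and the example delay are arbitrary): sender and receiver hash each packet into one of B buckets and keep a (timestamp sum, packet count) pair per bucket; buckets whose counts match on both sides saw no loss, so their timestamp-sum difference divided by the packet count gives the average one-way delay.

class LDA:
    def __init__(self, buckets=128):
        self.sum = [0.0] * buckets
        self.cnt = [0] * buckets
        self.b = buckets

    def record(self, packet_id, timestamp):
        i = hash(packet_id) % self.b
        self.sum[i] += timestamp
        self.cnt[i] += 1

def average_delay(sender, receiver):
    usable_sum, usable_cnt = 0.0, 0
    for i in range(sender.b):
        # Only buckets with matching counts are loss-free and therefore usable.
        if sender.cnt[i] == receiver.cnt[i] and sender.cnt[i] > 0:
            usable_sum += receiver.sum[i] - sender.sum[i]
            usable_cnt += sender.cnt[i]
    return usable_sum / usable_cnt if usable_cnt else None

s, r = LDA(), LDA()
for pid in range(1000):
    t = pid * 0.001
    s.record(pid, t)
    if pid != 500:                      # one lost packet
        r.record(pid, t + 0.00004)      # 40 microseconds of one-way delay
print(average_delay(s, r))              # ~4e-05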
Spatio-Temporal Compressive Sensing and Internet Traffic Matrices
Yin Zhang (University of Texas at Austin), Matthew Roughan (University of Adelaide), Walter Willinger (AT&T Labs -- Research), Lili Qiu (University of Texas at Austin)
· How to fill missing values in a matrix (could be a traffic matrix, a delay matrix, or a social proximity matrix)?
· They focus on traffic matrices.
· Missing values are quite common – direct measurement is infeasible, measurements are unreliable, anomalies appear, and future traffic has not yet appeared.
· Many network tasks are sensitive to missing values
· Ideas: 1) exploit the low-rank nature of Traffic Matrices (TMs); 2) exploit spatio-temporal properties (TM rows and columns close to each other are often close in value); 3) exploit local structures in TMs.
· Compressive sensing is used (generic technique for dealing with missing values that exploits the presence of structure and redundancy). However, existing compressive-sensing solutions perform poorly for traffic matrix interpolation.
· In the paper a novel spatio-temporal compressive sensing framework was developed. Key components: 1) a new technique called Sparsity Regularized Matrix Factorization (SRMF) that leverages the low-rank nature of traffic matrices and their spatio-temporal properties (a rough sketch of the factorization step follows this list); 2) a mechanism for combining low-rank approximations with local interpolation procedures.
· They used real datasets from 3 networks.
· Compared to all the other algorithms, this one is always better. Even with 98% missing values, the error is only about 20%.
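The rough sketch of the factorization step promised above (my own illustration: plain regularized alternating least squares over the observed entries only; the paper's SRMF additionally uses spatio-temporal regularization terms and local interpolation, which are omitted here):

import numpy as np

def lowrank_fill(X, mask, rank=2, lam=0.1, iters=50):
    # Factor X ~ L @ R.T using only the observed entries (mask == True),
    # then read the estimates for the missing entries off the product.
    n, m = X.shape
    L, R = np.random.rand(n, rank), np.random.rand(m, rank)
    for _ in range(iters):
        for i in range(n):                       # update each row factor
            cols = mask[i]
            A = R[cols].T @ R[cols] + lam * np.eye(rank)
            L[i] = np.linalg.solve(A, R[cols].T @ X[i, cols])
        for j in range(m):                       # update each column factor
            rows = mask[:, j]
            A = L[rows].T @ L[rows] + lam * np.eye(rank)
            R[j] = np.linalg.solve(A, L[rows].T @ X[rows, j])
    return L @ R.T

X = np.outer(np.arange(1, 6), np.arange(1, 5)).astype(float)   # a rank-1 "traffic matrix"
mask = np.random.rand(*X.shape) > 0.3                          # ~70% of entries observed
print(np.round(lowrank_fill(X, mask), 1))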
Passive Aggressive Measurement with MGRP
Pavlos Papageorgiou (University of Maryland, College Park), Justin McCann (University of Maryland, College Park), Michael Hicks (University of Maryland, College Park)
· Active probing is very expensive – it consumes bandwidth, and probes can interfere with the data.
· With custom active measurement you shape the application data for measurement. This is efficient, but neither modular nor reusable.
· They propose MGRP – main intuition: MGRP piggybacks application data inside active probes.
· MGRP properties: 1) an end-to-end measurement architecture (schedules probes for transmission and piggybacks application data on them), 2) transparent to applications, 3) independent of measurement algorithms, 4) easy to adapt existing measurement tools, 5) can piggyback data across applications.
· It enables aggressive probing with passive-like overhead.
· Piggybacking reduces the bandwidth wasted by probes and enables measurement tools to be more aggressive, faster, and more accurate.
· Any measurement algorithm can now be written as if active, but implemented as passive.
Session 9: Performance Optimization (Chair: Ratul Mahajan, Microsoft Research)
ROAR: Increasing the Flexibility and Performance of Distributed Search
Costin Raiciu (UCL), Felipe Huici (UCL, NEC Labs), Mark Handley (UCL), David S. Rosenblum (UCL)
· We rely on distributed search every day.
· Characteristics: a big data set that doesn’t fit in one server. To reduce latency you use multiple servers and put a bit of the data in each, but the data is also replicated, creating clusters of servers.
· P x R = N (number of query partitions x replication level = number of servers).
· P affects system behaviour. It dictates how much data each node stores, and it impacts latency.
· Partitioning determines latency and cost.
· Question: how to choose P, without knowing the workload? Choose the biggest value? Not desirable. CPU load increases...
· P is very difficult to change with existing solutions.
· How does Google change P? In a way that requires over-provisioning and lots of copies of the data (an estimated 20 TB per data centre).
· The proposal: ROAR. Key observation: we do not need clusters to ensure each query meets all the data.
· ROAR uses consistent hashing. It uses a single parameter: P.
· ROAR copies zero data to increase P (while Google’s algorithm has to copy a lot of data).
· Minimal data is copied when P decreases.
· They demonstrate the system using a privacy-preserving search application built upon ROAR.
· ROAR Changes P efficiently.
· The system scales. They tested this with 1000 servers in Amazon EC2.
Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication
Vijay Vasudevan (Carnegie Mellon University), Amar Phanishayee (Carnegie Mellon University), Hiral Shah (Carnegie Mellon University), Elie Krevat (Carnegie Mellon University), David Andersen (Carnegie Mellon University), Greg Ganger (Carnegie Mellon University), Garth Gibson (Carnegie Mellon University and Panasas, Inc), Brian Mueller (Panasas, Inc.)
· Timeout delay in TCP is big – 200 ms could be a big problem for datacenters.
· Some datacenter applications are very sensitive to these 200ms timeouts.
· Solution: enable microsecond retx (being safe in the wide area).
· Datacenter environment: 1-10Gbps, commodity Ethernet switches, and 10-100 microseconds latency.
· In a datacenter environment, 1 TCP timeout means more than 1000 RTTs...
· Long periods of idle links due to these high timeouts
· Other TCP variants did not prevent TCP timeouts.
· Timeouts also increase latency
· The solution: Microsecond TCP retransmissions
· 1st solution: simple one line change in Linux (eliminate minRTO). It improves throughput, but still not microsecond RTO.
· Requirements for microsecond RTO: TCP must track RTT in microseconds (timestamp option). Efficient high resolution kernel timers (use HPET for efficient interrupt signalling).
· They were able to provide high throughput sustained for up to 47 servers.
· Is it safe? One has to look at the interaction with delayed ACK, and the performance in the wide area.
· Better performance with delayed ACK disabled.
· Is it safe for the wide area? Potential concerns: stability (could we cause congestion collapse?) and performance (do we often time out unnecessarily?).
· Stability is preserved, because timeouts retain exponential backoff and spurious timeouts slow rate of transfer.
· Today’s TCP has mechanisms to detect spurious timeouts (using timestamps) and to recover from spurious timeouts (forward RTO), and both are widely implemented.
· They analysed wide-area performance without minRTO, to see if it harms the wide area. They didn’t find any difference in throughput. Reason? Few total timeouts (spurious or legitimate).
Matchmaking for Online Games and Other Latency-Sensitive P2P Systems
Sharad Agarwal (Microsoft Research), Jacob R. Lorch (Microsoft Research)