High Bandwidth Data Dissemination for Large-scale Distributed Systems
[Bullet source code (SOSP version) and more details are available here.]
The goal of this work is to provide infrastructure for high bandwidth dissemination from a single source to a large group of end hosts. A generic application that benefits from this service is file distribution to a large population of Internet users or content distribution network (CDN) servers. For instance, consider the requirements of distributing DVD-quality movies across the Internet to a large population of users. Software updates, such open source ISOs or Microsoft Windows XP updates, are a second example. A more recent example includes executing jobs on a wide-area test-bed, like PlanetLab or the Grid, where the executable must be distributed to a number of machines before each machine can run it locally or where multi-gigabyte data sets must be synchronized before the computation can proceed.
A related application is real-time multimedia streaming to a large group of end hosts. For example, worldwide soccer fans could watch the Internet broadcast of the World Cup final. Significant challenges in this work include scalable orchestration of data retrieval while achieving sustained high-bandwidth by overcoming variations in network bandwidth, latency, and loss rate, as well as node and network failures.
Bullet [1] operates on the premise that the model for high-bandwidth multicast data dissemination should be re-examined. The hypothesis of this work is that, relative to traditional techniques that use a logical tree to distribute content, an overlay mesh can provide higher bandwidth – leveraging the bandwidth from simultaneous parallel downloads from multiple sources rather than a single parent – and higher reliability – retrieving data from multiple peers reduces the potential damage from a single node failure. Harnessing the additional bandwidth available from an overlay mesh can qualitatively improve the utility of a broad range of important and emerging application classes, for instance PlanetLab, Grid, and publish/subscribe.
Departing from the traditional tree dissemination model raises several challenges. For example, sending disjoint data from the source or any other node can increase system performance because less duplicate data is sent over the network. What is the correct data dissemination strategy? Once a sender splits the data and sends it in to the overlay network, how does a participant locate missing content? Data location should complete quickly and in a scalable fashion, without maintaining global state or performing global network operation, because such an approach would not accommodate a large number of participants. Once a node finds the data, potentially at multiple sites, it needs to decide which data item to retrieve from which site, as quickly as possible, while not causing congestion. When performed system-wide, this process effectively forms the overlay mesh for data dissemination. How should the mesh adapt to dynamic network conditions and failures? These problems are difficult to solve even if global network state was available to every system participant.
One key contribution of this work is the design and analysis of an overlay construction algorithm that creates a mesh over any distribution tree that matches the properties of the underlying network topology. As a benefit, Bullet eliminates the overhead required to probe for available bandwidth in traditional distributed tree construction techniques. Another contribution of this work is a mechanism for making data deliberately disjoint and then distributing it in a uniform way that makes the probability of finding a peer containing missing data equal for all nodes. An insight that makes Bullet scalable is the use of RanSub[2] for locating missing data from peers in an efficient manner. Further, we propose a mechanism for recovering data from peers in a disjoint manner that minimizes retrieval of duplicate data objects. A large-scale evaluation shows that Bullet running over a random tree can achieve twice the throughput of streaming over a bandwidth tree computed by an off-line algorithm with perfect information about the network.
Bullet’ [3] continues the overlay mesh data dissemination approach advocated by Bullet, and considers the problem of large file dissemination to a large group of Internet users. The first contribution of this work is the exploration of the design space of file distribution protocols. Specifically, we examine the problems of source sending strategy, push vs. pull of data, need for data encoding, download peer set selection, data request strategy, and flow control. Second, we conduct a detailed performance evaluation of a number of competing systems running in both controlled emulation environments and live across the Internet. Our experience shows that protocols which have tunable parameters might perform well in a certain range of conditions, but that once outside that range they will break down and perform worse than if they had been tuned differently. To combat this problem, we employ adaptive strategies to self-tune to dynamically changing network conditions. For example, our system uses a feedback control loop to determine dynamically the quantity of outstanding data requested from each sending peer. This method of flow control keeps the data connection occupied while risking waiting for the least amount of data in case bandwidth from the peer deteriorates. We have compared Bullet with BitTorrent and SplitStream. In the cases we considered, Bullet outperforms the other systems.
[1] “Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh,”
Dejan Kostić, Adolfo Rodriguez, Jeannie Albrecht, Amin Vahdat, Proceedings of the 19th ACM Symposium on Operating System Principles, October 2003.
[2] “Using Random Subsets to Build Scalable Network Services”, Dejan Kostić, Adolfo Rodriguez, Jeannie Albrecht, Abhijeet Bhirud, and Amin Vahdat. Proceedings of 4th USENIX Symposium on Internet Technologies and Systems (USITS), March 2003.
[3] “Maintaining High-bandwidth under Dynamic Network Conditions”, Dejan Kostić, Ryan Braud, Charles Killian, Erik Vandekieft, James W. Anderson, Alex C. Snoeren and Amin Vahdat, Proceedings of 2005 USENIX Annual Technical Conference, April 2005.
[4] High-bandwidth Data Dissemination for Large-scale Distributed Systems, Dejan Kostic, Alex C. Snoeren, Amin Vahdat, Ryan Braud, Charles Killian, James W. Anderson, Jeannie Albrecht, Adolfo Rodriguez, and Erik Vandekieft, ACM Transactions on Computer Systems (TOCS), February 2008.