Resource:Seminar
{{SemNote
|time=2021-10-08 8:40
|addr=Main Building B1-612
|note=Useful links: [[Resource:Reading_List|Reading list]]; [[Resource:Seminar_schedules|Schedules]]; [[Resource:Previous_Seminars|Previous seminars]].
}}
===Latest===
{{Latest_seminar
|abstract=Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving. As more data-intensive applications move to run on top of task-based systems, collective communication efficiency has become an important problem. Unfortunately, traditional collective communication libraries (e.g., MPI, Horovod, NCCL) are an ill fit, because they require the communication schedule to be known before runtime and they do not provide fault tolerance. We design and implement Hoplite, an efficient and fault-tolerant collective communication layer for task-based distributed systems. Our key technique is to compute data transfer schedules on the fly and execute the schedules efficiently through fine-grained pipelining. At the same time, when a task fails, the data transfer schedule adapts quickly to allow other tasks to keep making progress. We apply Hoplite to a popular task-based distributed framework, Ray. We show that Hoplite speeds up asynchronous stochastic gradient descent, reinforcement learning, and serving an ensemble of machine learning models that are difficult to execute efficiently with traditional collective communication by up to 7.8x, 3.9x, and 3.3x, respectively.
|confname=SIGCOMM 2021
|link=https://dl.acm.org/doi/pdf/10.1145/3452296.3472897
|title=Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems
|speaker=Xianyang
}}
{{Latest_seminar
|abstract=This paper re-evaluates the performance of the EPaxos consensus protocol for geo-replication and proposes an enhancement that uses synchronized clocks to reduce operation latency. The benchmarking approach used for the original EPaxos evaluation does not trigger or measure the full impact of conflict behavior on system performance. Our re-evaluation confirms the original claim that EPaxos provides optimal median commit latency in a WAN, but it shows much worse tail latency than previously reported (more than 4x worse than Multi-Paxos). Furthermore, performance is highly sensitive to application workloads, particularly at the tail. In addition, we show how synchronized clocks can be used to reduce conflicts in geo-replication. By imposing intentional delays on message processing, we can achieve roughly in-order deliveries to multiple replicas. When applied to EPaxos, this technique reduced conflicts by at least 50% without introducing additional overhead, decreasing mean latency by up to 7.5%.
|confname=NSDI 2021
|link=https://www.usenix.org/system/files/nsdi21-tollman.pdf
|title=EPaxos Revisited
|speaker=Jianfei
}}


=== History ===
{{Resource:Previous_Seminars}}

=== Instructions ===

Please use the Latest_seminar and Hist_seminar templates to update this page.

* Update the time and address information.
* Copy the markup of the current Latest seminar section to this page.
* Change {{Latest_seminar... to {{Hist_seminar..., and add the corresponding date field |date=.
* Fill in every field of the Latest seminar entry.
* Never leave the link field empty; if a paper has no link, use this page's address: https://mobinets.org/index.php?title=Resource:Seminar
* Format reference
** Latest_seminar:

{{Latest_seminar
|abstract=
|confname=
|link=
|title=
|speaker=
}}

** Hist_seminar:

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}
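
As a worked example, archiving the first Latest seminar entry above would look like the following sketch; the |date= value is an assumption taken from the |time= field of this page's SemNote (2021-10-08), and should be replaced with the actual date the talk was given:

{{Hist_seminar
|confname=SIGCOMM 2021
|link=https://dl.acm.org/doi/pdf/10.1145/3452296.3472897
|title=Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems
|speaker=Xianyang
|date=2021-10-08
}}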