Difference between revisions of "Resource:Seminar"

Revision as of 11:36, 28 May 2024

Time: Friday 10:30-12:00
Address: 4th Research Building A518
Useful links: Readling list; Schedules; Previous seminars.

[SIGCOMM '23] Masking Corruption Packet Losses in Datacenter Networks with Link-local Retransmission, Jiacheng
Abstract: Packet loss due to link corruption is a major problem in large warehouse-scale datacenters. The current state-of-the-art approach of disabling corrupting links is not adequate because, in practice, all the corrupting links cannot be disabled due to capacity constraints. In this paper, we show that, it is feasible to implement link-local retransmission at sub-RTT timescales to completely mask corruption packet losses from the transport endpoints. Our system, LinkGuardian, employs a range of techniques to (i) keep the packet buffer requirement low, (ii) recover from tail packet losses without employing timeouts, and (iii) preserve packet ordering. We implement LinkGuardian on the Intel Tofino switch and show that for a 100G link with a loss rate of 10−3, LinkGuardian can reduce the loss rate by up to 6 orders of magnitude while incurring only 8% reduction in effective link speed. By eliminating tail packet losses, LinkGuardian improves the 99.9th percentile flow completion time (FCT) for TCP and RDMA by 51x and 66x respectively. Finally, we also show that in the context of datacenter networks, simple out-of-order retransmission is often sufficient to significantly mitigate the impact of corruption packet loss for short TCP flows.
[NSDI '23] ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems, Haotian
Abstract: Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the ordered key-value (KV) stores (e.g., B-trees and learned indexes) keep all data sorted to provide rang query service via the high-performance network. However, existing ordered KVs fail to work well on the disaggregated memory systems, due to either consuming multiple network roundtrips to search the remote data or heavily relying on the memory nodes equipped with insufficient computing resources to process data modifications. In this paper, we propose a scalable RDMA-oriented KV store with learned indexes, called ROLEX, to coalesce the ordered KV store in the disaggregated systems for efficient data storage and retrieval. ROLEX leverages a retraining-decoupled learned index scheme to dissociate the model retraining from data modification operations via adding a bias and some data-movement constraints to learned models. Based on the operation decoupling, data modifications are directly executed in compute nodes via one-sided RDMA verbs with high scalability. The model retraining is hence removed from the critical path of data modification and asynchronously executed in memory nodes by using dedicated computing resources. Our experimental results on YCSB and real-world workloads demonstrate that ROLEX achieves competitive performance on the static workloads, as well as significantly improving the performance on dynamic workloads by up to 2.2 times than state-of-the-art schemes on the disaggregated memory systems. We have released the open-source codes for public use in GitHub.

[Topic] [ The path planning algorithm for multiple mobile edge servers in EdgeGO], Rong Cong, 2020-11-18

[Mobisys20] Combating packet collisions using non-stationary signal scaling in LPWANs, Wenliang Mao, 2020-11-18
[Topic] [ Dependency-Aware and Latency-Optimal Service Cache in Edge networks], Jiwei Mo, 2020-11-18
[talk] Paper Carnival 2020, ALL, 2020-09-24,25,26

请使用Latest_seminar和Hist_seminar模板更新本页信息.

- 修改时间和地点信息
- 将当前latest seminar部分的code复制到这个页面中
- 将{{Latest_seminar... 修改为 {{Hist_seminar...，并增加对应的日期信息|date=
- 填入latest seminar各字段信息
- link请务必不要留空，如果没有link则填本页地址 https://mobinets.org/index.php?title=Resource:Seminar

格式说明
- Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

- Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}

@@ Line 7: / Line 7: @@
 ===Latest===
 {{Latest_seminar
-|abstract=In this paper, we focus on the problem of efficiently locating a target object described with free-form text using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual searching, we propose a system where a user can enter target commands using free-form text; we call this system Zero-shot Active Visual Search (ZAVIS). ZAVIS detects and plans to search for a target object inputted by a user through a semantic grid map represented by static landmarks (e.g., desk or bed). For efficient planning of object search patterns, ZAVIS considers commonsense knowledge-based co-occurrence and predictive uncertainty while deciding which landmarks to visit first. We validate the proposed method with respect to SR (success rate) and SPL (success weighted by path length) in both simulated and real-world environments. The proposed method outperforms previous methods in terms of SPL in simulated scenarios, and we further demonstrate ZAVIS with a Pioneer-3AT robot in real-world studies.
+|abstract=Packet loss due to link corruption is a major problem in large warehouse-scale datacenters. The current state-of-the-art approach of disabling corrupting links is not adequate because, in practice, all the corrupting links cannot be disabled due to capacity constraints. In this paper, we show that, it is feasible to implement link-local retransmission at sub-RTT timescales to completely mask corruption packet losses from the transport endpoints. Our system, LinkGuardian, employs a range of techniques to (i) keep the packet buffer requirement low, (ii) recover from tail packet losses without employing timeouts, and (iii) preserve packet ordering. We implement LinkGuardian on the Intel Tofino switch and show that for a 100G link with a loss rate of 10−3, LinkGuardian can reduce the loss rate by up to 6 orders of magnitude while incurring only 8% reduction in effective link speed. By eliminating tail packet losses, LinkGuardian improves the 99.9th percentile flow completion time (FCT) for TCP and RDMA by 51x and 66x respectively. Finally, we also show that in the context of datacenter networks, simple out-of-order retransmission is often sufficient to significantly mitigate the impact of corruption packet loss for short TCP flows.
-|confname=ICRA 2023
+|confname=SIGCOMM '23
-|link=https://ieeexplore.ieee.org/document/10161345
+|link=https://dl.acm.org/doi/pdf/10.1145/3603269.3604853
-|title=Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants
+|title=Masking Corruption Packet Losses in Datacenter Networks with Link-local Retransmission
-|speaker=Zhenhua
+|speaker=Jiacheng
-|date=2024-05-24}}
+|date=2024-05-31}}
 {{Latest_seminar
-|abstract=Network monitoring systems struggle with the issue that the measurement data is incomplete, with only a subset of origin-destination (OD) pairs or time slots observed, due to the high deployment and measurement cost. Recent studies show that the missing data can be inferred from partial measurements using neural network models and tensor methods. However, these recovery approaches fail to achieve accuracy, adaptability and high speed, simultaneously. In this paper, we propose RecMon, a deep learning-based data recovery system that satisfies the above three criteria. A global spatio-temporal attention mechanism and a data augmentation algorithm are proposed to improve the recovery accuracy. A semi-supervised learning-based scheme is devised for fast and effective model updates. We conduct extensive experiments on three real-world datasets to compare RecMon with four state-of-the-art methods in terms of online recovery performance. The experimental results show that RecMon can adapt to the latest state of the network and accurately recover network measurement data in less than 100 milliseconds. When 90% of the data is missing, the recovery accuracy of RecMon improves over the strongest baseline method by 22.7%, 16.0%, and 8.2% in the three datasets, respectively.
+|abstract=Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the ordered key-value (KV) stores (e.g., B-trees and learned indexes) keep all data sorted to provide rang query service via the high-performance network. However, existing ordered KVs fail to work well on the disaggregated memory systems, due to either consuming multiple network roundtrips to search the remote data or heavily relying on the memory nodes equipped with insufficient computing resources to process data modifications. In this paper, we propose a scalable RDMA-oriented KV store with learned indexes, called ROLEX, to coalesce the ordered KV store in the disaggregated systems for efficient data storage and retrieval. ROLEX leverages a retraining-decoupled learned index scheme to dissociate the model retraining from data modification operations via adding a bias and some data-movement constraints to learned models. Based on the operation decoupling, data modifications are directly executed in compute nodes via one-sided RDMA verbs with high scalability. The model retraining is hence removed from the critical path of data modification and asynchronously executed in memory nodes by using dedicated computing resources. Our experimental results on YCSB and real-world workloads demonstrate that ROLEX achieves competitive performance on the static workloads, as well as significantly improving the performance on dynamic workloads by up to 2.2 times than state-of-the-art schemes on the disaggregated memory systems. We have released the open-source codes for public use in GitHub.
-|confname=INFOCOM 2023
+|confname=NSDI '23
-|link=https://xplorestaging.ieee.org/document/10229025
+|link=https://www.usenix.org/system/files/fast23-li-pengfei.pdf
-|title=RecMon: A Deep Learning-based Data Recovery System for Network Monitoring
+|title=ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems
-|speaker=Zhenguo
+|speaker=Haotian
-|date=2024-05-24}}
+|date=2024-05-31}}
 {{Resource:Previous_Seminars}}

Navigation menu

Difference between revisions of "Resource:Seminar"

Revision as of 11:36, 28 May 2024

Contents

Difference between revisions of "Resource:Seminar"

Revision as of 11:36, 28 May 2024

Latest

History

2024

2023

2022

2021

2020

2019

2018

2017

Instructions