Resource:Seminar



{{Latest_seminar
|abstract = Cloud operators utilize collective communication optimizers to enhance the efficiency of the single-tenant, centrally managed training clusters they manage. However, current optimizers struggle to scale for such use cases and often compromise solution quality for scalability. Our solution, TE-CCL, adopts a traffic-engineering-based approach to collective communication. Compared to a state-of-the-art optimizer, TACCL, TE-CCL produced schedules with 2× better performance on topologies TACCL supports (and its solver took a similar amount of time as TACCL's heuristic-based approach). TE-CCL additionally scales to larger topologies than TACCL. On our GPU testbed, TE-CCL outperformed TACCL by 2.14× and RCCL by 3.18× in terms of algorithm bandwidth.
|confname = SIGCOMM'24
|link = https://dl.acm.org/doi/10.1145/3651890.3672249
|title = Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
|speaker = Shuhong
|date = 2024-10-25
}}
{{Latest_seminar
|abstract = The proliferation of edge devices has pushed computing from the cloud to the data sources, and video analytics is among the most promising applications of edge computing. Running video analytics is compute- and latency-sensitive, as video frames are analyzed by complex deep neural networks (DNNs) which put severe pressure on resource-constrained edge devices. To resolve the tension between inference latency and resource cost, we present Polly, a cross-camera inference system that enables co-located cameras with different but overlapping fields of views (FoVs) to share inference results between one another, thus eliminating the redundant inference work for objects in the same physical area. Polly’s design solves two basic challenges of cross-camera inference: how to identify overlapping FoVs automatically, and how to share inference results accurately across cameras. Evaluation on NVIDIA Jetson Nano with a real-world traffic surveillance dataset shows that Polly reduces the inference latency by up to 71.4% while achieving almost the same detection accuracy with state-of-the-art systems.
|confname = INFOCOM'23
|link = https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10229045
|title = Cross-Camera Inference on the Constrained Edge
|speaker = Xinyan
|date = 2024-10-25
}}
{{Latest_seminar
|abstract = Smart cameras with on-device deep learning inference capabilities are enabling distributed video analytics at the data source without sending raw video data over the often unreliable and congested wireless network. However, how to unleash the full potential of the computing power of the camera network requires careful coordination among the distributed cameras, catering to the uneven workload distribution and the heterogeneous computing capabilities. This paper presents CrossVision, a distributed framework for real-time video analytics, that retains all video data on cameras while achieving low inference delay and high inference accuracy. The key idea behind CrossVision is that there is a significant information redundancy in the video content captured by cameras with overlapped Field-of-Views (FoVs), which can be exploited to reduce inference workload as well as improve inference accuracy between correlated cameras. CrossVision consists of three main components to realize its function: a Region-of-Interest (RoI) Matcher that discovers video content correlation based on a segmented FoV transformation scheme; a Workload Balancer that implements a randomized workload balancing strategy based on a bulk-queuing analysis, taking into account the cameras’ predicted future workload arrivals; an Accuracy Guard that ensures that the inference accuracy is not sacrificed as redundant information is discarded. We evaluate CrossVision in a hardware-augmented simulator and on real-world cross-camera datasets, and the results show that CrossVision is able to significantly reduce inference delay while improving the inference accuracy compared to a variety of baseline approaches.
|confname= TMC'24
|link = https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10202594
|title= CrossVision: Real-Time On-Camera Video Analysis via Common RoI Load Balancing
|speaker=Xinyan
|date=2024-10-25
}}
{{Resource:Previous_Seminars}}

Revision as of 17:10, 24 October 2024

Time: 2024-10-25 10:30-12:00
Address: 4th Research Building A533
Useful links: 📚 Reading list; 📆 Schedules; 🧐 Previous seminars.

Latest

  1. [SIGCOMM'24] Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem, Shuhong
    Abstract: Cloud operators utilize collective communication optimizers to enhance the efficiency of the single-tenant, centrally managed training clusters they manage. However, current optimizers struggle to scale for such use cases and often compromise solution quality for scalability. Our solution, TE-CCL, adopts a traffic-engineering-based approach to collective communication. Compared to a state-of-the-art optimizer, TACCL, TE-CCL produced schedules with 2× better performance on topologies TACCL supports (and its solver took a similar amount of time as TACCL's heuristic-based approach). TE-CCL additionally scales to larger topologies than TACCL. On our GPU testbed, TE-CCL outperformed TACCL by 2.14× and RCCL by 3.18× in terms of algorithm bandwidth.
  2. [INFOCOM'23] Cross-Camera Inference on the Constrained Edge, Xinyan
    Abstract: The proliferation of edge devices has pushed computing from the cloud to the data sources, and video analytics is among the most promising applications of edge computing. Running video analytics is compute- and latency-sensitive, as video frames are analyzed by complex deep neural networks (DNNs) which put severe pressure on resource-constrained edge devices. To resolve the tension between inference latency and resource cost, we present Polly, a cross-camera inference system that enables co-located cameras with different but overlapping fields of views (FoVs) to share inference results between one another, thus eliminating the redundant inference work for objects in the same physical area. Polly’s design solves two basic challenges of cross-camera inference: how to identify overlapping FoVs automatically, and how to share inference results accurately across cameras. Evaluation on NVIDIA Jetson Nano with a real-world traffic surveillance dataset shows that Polly reduces the inference latency by up to 71.4% while achieving almost the same detection accuracy with state-of-the-art systems.
  3. [TMC'24] CrossVision: Real-Time On-Camera Video Analysis via Common RoI Load Balancing, Xinyan
    Abstract: Smart cameras with on-device deep learning inference capabilities are enabling distributed video analytics at the data source without sending raw video data over the often unreliable and congested wireless network. However, how to unleash the full potential of the computing power of the camera network requires careful coordination among the distributed cameras, catering to the uneven workload distribution and the heterogeneous computing capabilities. This paper presents CrossVision, a distributed framework for real-time video analytics, that retains all video data on cameras while achieving low inference delay and high inference accuracy. The key idea behind CrossVision is that there is a significant information redundancy in the video content captured by cameras with overlapped Field-of-Views (FoVs), which can be exploited to reduce inference workload as well as improve inference accuracy between correlated cameras. CrossVision consists of three main components to realize its function: a Region-of-Interest (RoI) Matcher that discovers video content correlation based on a segmented FoV transformation scheme; a Workload Balancer that implements a randomized workload balancing strategy based on a bulk-queuing analysis, taking into account the cameras’ predicted future workload arrivals; an Accuracy Guard that ensures that the inference accuracy is not sacrificed as redundant information is discarded. We evaluate CrossVision in a hardware-augmented simulator and on real-world cross-camera datasets, and the results show that CrossVision is able to significantly reduce inference delay while improving the inference accuracy compared to a variety of baseline approaches.

History

2024

2023

2022

2021

2020

  • [Topic] The path planning algorithm for multiple mobile edge servers in EdgeGO, Rong Cong, 2020-11-18

2019

2018

2017

Instructions

Please update this page using the Latest_seminar and Hist_seminar templates.

    • Update the time and venue information.
    • Copy the code of the current latest seminar section onto this page.
    • Change {{Latest_seminar... to {{Hist_seminar... and add the corresponding date with |date= (see the filled-in examples after the format blocks below).
    • Fill in every field of each latest seminar entry.
    • Never leave the link field empty; if there is no link, use this page's address: https://mobinets.org/index.php?title=Resource:Seminar
  • Format reference
    • Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}
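
For example, the first entry in the Latest section above is produced by a call like the following. Note that the entries on this page also fill |abstract= and |date=, even though the skeleton above does not list them; the abstract text is replaced here by a placeholder to keep the example short.

{{Latest_seminar
|abstract = (full paper abstract, as shown in the entry above)
|confname = SIGCOMM'24
|link = https://dl.acm.org/doi/10.1145/3651890.3672249
|title = Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
|speaker = Shuhong
|date = 2024-10-25
}}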

    • Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}
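
When a talk is archived, the same entry is converted to a Hist_seminar call that keeps the date it was presented on. A sketch for the entry above, assuming Hist_seminar takes exactly the fields listed in the skeleton (add |abstract= as well if the template supports it):

{{Hist_seminar
|confname = SIGCOMM'24
|link = https://dl.acm.org/doi/10.1145/3651890.3672249
|title = Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
|speaker = Shuhong
|date = 2024-10-25
}}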