Difference between revisions of "Resource:Seminar"

From MobiNetS


{{Latest_seminar
{{Latest_seminar
|abstract = Video super-resolution (VSR) on mobile devices aims to restore high-resolution frames from their low-resolution counterparts, satisfying the requirements of performance, FLOPs and latency. On one hand, partial feature processing, as a classic and acknowledged strategy, is developed in current studies to reach an appropriate trade-off between FLOPs and accuracy. However, the splitting of partial feature processing strategy are usually performed in a blind manner, thereby reducing the computational efficiency and performance gains. On the other hand, current methods for mobile platforms primarily treat VSR as an extension of single-image super-resolution to reduce model calculation and inference latency. However, lacking inter-frame information interaction in current methods results in a suboptimal latency and accuracy trade-off. To this end, we propose a novel architecture, termed Feature Aggregating Network with Inter-frame Interaction (FANI), a lightweight yet considering frame-wise correlation VSR network, which could achieve real-time inference while maintaining superior performance. Our FANI accepts adjacent multi-frame low-resolution images as input and generally consists of several fully-connection-embedded modules, i.e., Multi-stage Partial Feature Distillation (MPFD) for capturing multi-level feature representations. Moreover, considering the importance of inter-frame alignment, we further employ a tiny Attention-based Frame Alignment (AFA) module to promote inter-frame information flow and aggregation efficiently. Extensive experiments on the well-known dataset and real-world mobile device demonstrate the superiority of our proposed FANI, which means that our FANI could be well adapted to mobile devices and produce visually pleasing results.
|abstract = Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, the training efficiency is hindered by communication costs introduced by these parallel paradigms. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training by designing two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system, DeepSpeed-MoE, achieving 1.13× to 5.77× speedup on 1296 manually configured MoE layers and approximately 3× improvement on two real-world MoE models based on BERT and GPT-2.
|confname = ICDM'23
|confname =INFOCOM'24
|link = https://ieeexplore.ieee.org/abstract/document/10415812
|link = https://ieeexplore.ieee.org/abstract/document/10621327
|title= Feature Aggregating Network with Inter-Frame Interaction for Efficient Video Super-Resolution
|title= Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
|speaker=Shuhong
|speaker=Mengqi
|date=2024-10-25
|date=2024-11-1
}}
}}
{{Latest_seminar
{{Latest_seminar
|abstract = The proliferation of edge devices has pushed computing from the cloud to the data sources, and video analytics is among the most promising applications of edge computing. Running video analytics is compute- and latency-sensitive, as video frames are analyzed by complex deep neural networks (DNNs) which put severe pressure on resource-constrained edge devices. To resolve the tension between inference latency and resource cost, we present Polly, a cross-camera inference system that enables co-located cameras with different but overlapping fields of views (FoVs) to share inference results between one another, thus eliminating the redundant inference work for objects in the same physical area. Polly’s design solves two basic challenges of cross-camera inference: how to identify overlapping FoVs automatically, and how to share inference results accurately across cameras. Evaluation on NVIDIA Jetson Nano with a real-world traffic surveillance dataset shows that Polly reduces the inference latency by up to 71.4% while achieving almost the same detection accuracy with state-of-the-art systems.
|abstract = HD map is a key enabling technology towards fully autonomous driving. We propose VI-Map, the first system that leverages roadside infrastructure to enhance real-time HD mapping for autonomous driving. The core concept of VI-Map is to exploit the unique cumulative observations made by roadside infrastructure to build and maintain an accurate and current HD map. This HD map is then fused with on-vehicle HD maps in real time, resulting in a more comprehensive and up-to-date HD map. By extracting concise bird-eye-view features from infrastructure observations and utilizing vectorized map representations, VI-Map incurs low compute and communication overhead. We conducted end-to-end evaluations of VI-Map on a real-world testbed and a simulator. Experiment results show that VI-Map can construct decimeter-level (up to 0.3 m) HD maps and achieve real-time (up to a delay of 42 ms) map fusion between driving vehicles and roadside infrastructure. This represents a significant improvement of 2.8× and 3× in map accuracy and coverage compared to the state-of-the-art online HD mapping approaches. A video demo of VI-Map on our real-world testbed is available at https://youtu.be/p2RO65R5Ezg.
|confname= INFOCOM'23
|confname=Mobicom'23
|link = https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10229045
|link = https://dl.acm.org/doi/abs/10.1145/3570361.3613280
|title= Cross-Camera Inference on the Constrained Edge
|title= VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving
|speaker=Xinyan
|speaker=Wangyang
|date=2024-10-25
|date=2024-11-1
}}
{{Latest_seminar
|abstract = Smart cameras with on-device deep learning inference capabilities are enabling distributed video analytics at the data source without sending raw video data over the often unreliable and congested wireless network. However, how to unleash the full potential of the computing power of the camera network requires careful coordination among the distributed cameras, catering to the uneven workload distribution and the heterogeneous computing capabilities. This paper presents CrossVision, a distributed framework for real-time video analytics, that retains all video data on cameras while achieving low inference delay and high inference accuracy. The key idea behind CrossVision is that there is a significant information redundancy in the video content captured by cameras with overlapped Field-of-Views (FoVs), which can be exploited to reduce inference workload as well as improve inference accuracy between correlated cameras. CrossVision consists of three main components to realize its function: a Region-of-Interest (RoI) Matcher that discovers video content correlation based on a segmented FoV transformation scheme; a Workload Balancer that implements a randomized workload balancing strategy based on a bulk-queuing analysis, taking into account the cameras’ predicted future workload arrivals; an Accuracy Guard that ensures that the inference accuracy is not sacrificed as redundant information is discarded. We evaluate CrossVision in a hardware-augmented simulator and on real-world cross-camera datasets, and the results show that CrossVision is able to significantly reduce inference delay while improving the inference accuracy compared to a variety of baseline approaches.
|confname= TMC'24
|link = https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10202594
|title= CrossVision: Real-Time On-Camera Video Analysis via Common RoI Load Balancing
|speaker=Xinyan
|date=2024-10-25
}}
}}
{{Resource:Previous_Seminars}}

Revision as of 11:50, 31 October 2024

Time: 2024-10-25 10:30-12:00
Address: 4th Research Building A533
Useful links: 📚 Reading list; 📆 Schedules; 🧐 Previous seminars.

Latest

  1. [INFOCOM'24] Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules, Mengqi
    Abstract: Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, the training efficiency is hindered by communication costs introduced by these parallel paradigms. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training by designing two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system, DeepSpeed-MoE, achieving 1.13× to 5.77× speedup on 1296 manually configured MoE layers and approximately 3× improvement on two real-world MoE models based on BERT and GPT-2.
  2. [Mobicom'23] VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving, Wangyang
    Abstract: HD map is a key enabling technology towards fully autonomous driving. We propose VI-Map, the first system that leverages roadside infrastructure to enhance real-time HD mapping for autonomous driving. The core concept of VI-Map is to exploit the unique cumulative observations made by roadside infrastructure to build and maintain an accurate and current HD map. This HD map is then fused with on-vehicle HD maps in real time, resulting in a more comprehensive and up-to-date HD map. By extracting concise bird-eye-view features from infrastructure observations and utilizing vectorized map representations, VI-Map incurs low compute and communication overhead. We conducted end-to-end evaluations of VI-Map on a real-world testbed and a simulator. Experiment results show that VI-Map can construct decimeter-level (up to 0.3 m) HD maps and achieve real-time (up to a delay of 42 ms) map fusion between driving vehicles and roadside infrastructure. This represents a significant improvement of 2.8× and 3× in map accuracy and coverage compared to the state-of-the-art online HD mapping approaches. A video demo of VI-Map on our real-world testbed is available at https://youtu.be/p2RO65R5Ezg.

History

2024

2023

2022

2021

2020

  • [Topic] The path planning algorithm for multiple mobile edge servers in EdgeGO, Rong Cong, 2020-11-18

2019

2018

2017

Instructions

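Please use the Latest_seminar and Hist_seminar templates to update this page.

    • Update the time and address information
    • Copy the code of the current latest seminar section to this page
    • Change {{Latest_seminar... to {{Hist_seminar..., and add the corresponding date field |date=
    • Fill in the fields for the new latest seminar
    • Never leave link empty; if there is no link, use this page's address https://mobinets.org/index.php?title=Resource:Seminar
  • Format description
    • Latest_seminar: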

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

    • Hist_seminar:

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}
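
As an illustration of the archiving step described above, the Parm entry from the Latest section might look like this once moved to the history list. This is only a sketch: the field values are copied from the entry on this page, and it assumes Hist_seminar takes exactly the five fields listed in the blank template. Whether it also accepts an abstract field is not stated here, so the abstract is left out.

```
{{Hist_seminar
|confname=INFOCOM'24
|link=https://ieeexplore.ieee.org/abstract/document/10621327
|title=Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
|speaker=Mengqi
|date=2024-11-1
}}
```

Only the template name changes from Latest_seminar to Hist_seminar. The date field records when the talk was given.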