Difference between revisions of "Resource:Seminar"

Revision as of 05:38, 28 November 2025

Time: 2025-11-21 10:30
Address: 4th Research Building A518
Useful links: 📚 Reading list; 📆 Schedules; 🧐 Previous seminars.

Latest

[ToN'25] Spliceosome: On-Camera Video Thinning and Tuning for Timely and Accurate Analytics, Zhongwei Sun
Abstract: Running deep neural networks (DNNs) on large-scale videos from widely distributed cameras presents two significant challenges. Firstly, video quality for analytical purposes is severely impacted by the camera deployment environment, which is termed Pixel Recession in this paper. Secondly, low-latency video streaming from the source camera to edge servers is greatly hindered by the rapid expansion of video traffic. Despite numerous efforts such as enhancing the video structure, uneven encoding, and filtering frames captured on camera, these methods have proven insufficient to address the challenges at hand. We propose Spliceosome, a novel video analytics system that effectively overcomes the pixel recession and streaming bottlenecks. In brief, Spliceosome 1) recovers from pixel recession by adaptive video knobs (i.e., brightness and contrast) tuning in ARP (anchor region proposal) granularity, and 2) lowers the transmission volume by video thinning, which uses only single-channel information for video encoding. We implemented Spliceosome using only commercial off-the-shelf hardware. Our experimental results demonstrate that Spliceosome outperforms other alternative designs by 4.71-14.47%, 40.94-58.71%, and 14.28% in detection accuracy, end-to-end delay, and efficiency of DNNs inference, respectively.
[NSDI'25] Accelerating Design Space Exploration for LLM Training Systems with Multi-experiment Parallel Simulation, Qinyong
Abstract: The rapid expansion of large language models (LLMs) requires the development of extensive GPU clusters, with companies deploying clusters with tens to hundreds of thousands of GPUs. This growth significantly expands the design space for LLM training systems, requiring thorough exploration of different parallelization strategies, communication parameters, congestion control, fabric topology, etc. Current methods require up to 10k simulation experiments to identify optimal configurations, with inadequate exploration leading to significant degradation of training performance. In this paper, we tackle the overlooked problem of efficiently conducting parallel simulation experiments for design space exploration. Our analysis and experiments show that Single-process Multi-experiment (SPME) achieves superior performance by reducing scheduling overhead and optimizing resource utilization, yet remains insufficient for current AI cluster scales. To enhance SPME’s efficacy, we introduce Multiverse, a novel GPU-based AI training simulator. Multiverse leverages the computing throughput of GPUs efficiently with optimizations such as a pull-based synchronization, high-fidelity intra-server communication, and a kernel-fusion technique. Extensive experiments validate the accuracy and efficiency of Multiverse, demonstrating less than 3.0% discrepancy with real-world LLM training on clusters of up to 54,000 GPUs, achieving 43.1−73.2X speedup over state-of-the-art CPU-based simulators in various use cases.

History

[ToN'25] Spliceosome: On-Camera Video Thinning and Tuning for Timely and Accurate Analytics, Zhongwei Sun, 2025-11-28
[NSDI'25] Accelerating Design Space Exploration for LLM Training Systems with Multi-experiment Parallel Simulation, Qinyong, 2025-11-28

[ASAP'25] ReaLLM: A Trace-Driven Framework for Rapid Simulation of Large-Scale LLM Inference, JunZhe, 2025-11-21
[ICDE'25] Effective Task Assignment in Mobility Prediction-Aware Spatial Crowdsourcing, Zhenguo, 2025-11-21
[INFOCOM'25] QuESat: Satellite-Assisted Quantum Internet for Global-Scale Entanglement Distribution, Yaliang, 2025-11-07
[INFOCOM'25] GeoLM: Performance-oriented Leader Management for Geo-Distributed Consensus Protocol, Linqi Liu, 2025-11-07
[Sensys'24] MagicStream: Bandwidth-conserving Immersive Telepresence via Semantic Communication, Mengfan Wang, 2025-10-31
[Systems Journal] Collaborative Task Offloading for LEO Satellite Internet of Things: A Novel Computing Coordinate Graph-Based Approach, Yifei Zhou, 2025-10-31
[Sensys'24] FDLoRa: Tackling Downlink-Uplink Asymmetry with Full-duplex LoRa Gateways, Kai Chen, 2025-10-23
[NSDI'25] ONCache: A Cache-Based Low-Overhead Container Overlay Network, Daobing Zeng, 2025-10-24
[Arxiv] HyperCam: Low-Power Onboard Computer Vision for IoT Cameras, Menghao Liu, 2025-10-17
[SIGCOMM'25 (short paper)] NIER: Practical Neural-enhanced Low-bitrate Video Conferencing, Xinyan Wang, 2025-9-26
[INFOCOM'25] HyperJet: Joint Communication and Computation Scheduling for Hypergraph Tasks in Distributed Edge Computing, Yi Zhou, 2025-9-26
[NSDI'25] Large Network UWB Localization: Algorithms and Implementation, Bangguo, 2025-9-26
[NSDI'25] Dissecting and Streamlining the Interactive Loop of Mobile Cloud Gaming, Li Chen, 2025-9-9
[APNet'25] FlexSpark: Robust and Efficient Multi-Device Collaborative Inference over Wireless Network, Ruizhen, 2025-9-19
[Mobisys'25] RISENSE: Long-Range In-Band Wireless Control of Passive Reconfigurable Intelligent Surfaces, Haifeng, 2025-9-12
[Mobicom'25] L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery, Jiyi, 2025-9-12
[Begin of new semester] Paper Carnival 2025, All, 2025-08-27
[JSAC'24] ISCom: Interest-Aware Semantic Communication Scheme for Point Cloud Video Streaming on Metaverse XR Devices, Jiyi, 2025-06-13
[TUTORIAL] Idea share, OldBee, 2025-06-13
[IDEA] ReDream: Residual Feature-Driven Mixed Sparse Coding for Model Partitioning, Xianyang, 2025-05-23
[Mobisys'24] CACTUS: Dynamically Switchable Context-aware micro-Classifiers for Efficient IoT Inference, Zhenhua, 2025-04-18
[TC'24] A GPU-Enabled Real-Time Framework for Compressing and Rendering Volumetric Videos, Mengfan, 2025-04-18
[INFOCOM'24] Efficient and Straggler-Resistant Homomorphic Encryption for Heterogeneous Federated Learning, Dongting, 2025-03-28
[INFOCOM'25] Link Configuration for Fidelity-Constrained Entanglement Routing in Quantum Networks, Yaliang, 2025-03-27
[Arxiv] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search, Qinyong, 2025-03-14
[ToN'23] LiFi for Low-Power and Long-Range RF Backscatter, Mengyu, 2025-03-14
[NSDI'25] Region-based Content Enhancement for Efficient Video Analytics at the Edge, Xinyan, 2025-03-07
[AAAI'22] Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer, Bairong, 2025-03-07
[NSDI'24] Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed, Youwei, 2025-02-28
[IOPSCIENCE'21] SeQUeNCe: a customizable discrete-event simulator of quantum networks, Junzhe, 2025-02-21
[CISCE'24] A Long Distance Environmental Monitoring System Based on Low Power IoT, Ayesha Rasool, 2025-02-21
[MobiCom'24] Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving, Jiahao, 2025-01-10
[MobiSys'24] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs, Jiale, 2025-01-10
[MobiCom'24] MuV2: Scaling up Multi-user Mobile Volumetric Video Streaming via Content Hybridization and Sharing, Jiyi, 2025-01-03
[MobiCom'24] M2HO: Mitigating the Adverse Effects of 5G Handovers on TCP, Jiacheng, 2025-01-03

[Topic] The path planning algorithm for multiple mobile edge servers in EdgeGO, Rong Cong, 2020-11-18

[MobiSys'20] Combating packet collisions using non-stationary signal scaling in LPWANs, Wenliang Mao, 2020-11-18
[Topic] Dependency-Aware and Latency-Optimal Service Cache in Edge networks, Jiwei Mo, 2020-11-18
[talk] Paper Carnival 2020, ALL, 2020-09-24,25,26

[INFOCOM'20] Optimizing Federated Learning on Non-IID Data with Reinforcement Learning, YuHong Jiang, 2020-5-16

[INFOCOM'20] Joint Optimization of Signal Design and Resource Allocation in Wireless D2D Edge Computing, Shiqi Hu, 2020-4-20
[INFOCOM'20] LiteNap: Downclocking LoRa Reception, Wenliang Mao, 2020-4-13
[INFOCOM'20] Delay-Optimal Distributed Edge Computing in Wireless Edge Networks, Chang Shu, 2020-3-30
[IoTJ 2018] Over-the-Air Computation for IoT Networks: Computing Multiple Functions With Antenna Arrays, Yuhong Jiang, 2020-3-23
[SIGCOMM'19] RF-based Inertial Measurement, Weifeng Gao, 2020-3-16

2019

[ICDCS'19] FRAME: Fault Tolerant and Real-Time Messaging for Edge Computing, Xiaosong Wang, 2019-12-25
[INFOCOM'19] Intelligent Edge-Assisted Crowdcast with Deep Reinforcement Learning for Personalized QoE, Hengwei Deng, 2019-12-25
[ieee communications magazine'18] Orchestration of Microservices for IoT Using Docker and Edge Computing, Changsheng Liu, 2019-12-17
[Computer Science'13] Playing Atari with Deep Reinforcement Learning, Jie Zhang, 2019-12-17
[ICNP'19] Exploiting Rateless Codes and Cross-Layer Optimization for Low-Power Wide-Area Networks, Silin Feng, 2019-11-13
[ICDCS'19] DMRA: A Decentralized Resource Allocation Scheme for Multi-SP Mobile Edge Computing, Jiwei Mo, 2019-11-13
[MobiCom'19] Edge Assisted Real-time Object Detection for Mobile Augmented Reality, Yunpeng Han, 2019-11-06
[NSDI'20] Frequency Configuration for Low-Power Wide-Area Networks in a Heartbeat, Xiong Wang, 2019-11-06
[MobiSys'16] Mobility Modeling and Prediction in Bike-Sharing Systems, Anqi Yang, 2019-10-30
[Tech. Rep.] LoRa Localization, Xuan Yang, 2019-10-30
[SigComm'19] E2E: Embracing User Heterogeneity to Improve Quality of Experience on the Web, Jingwei Li, 2019-10-23
[ICDCS'19] CMFL: Mitigating Communication Overhead for Federated Learning, Yuhong Jiang, 2019-10-23
[Tech.Rep.] Report on LoRa reliable protocols, Wenliang Mao, 2019-10-16
[ICDCS'19] Computation Offloading for Mobile-Edge Computing with Multi-user, Chang Shu, 2019-10-16

[Paper_Carnival_2019] Paper Carnival 2019, ALL, 2019-09-28,29,30
[INFOCOM'19] Octans: Optimal Placement of Service Function Chains in Many-Core Systems, Yuntong Zhang, 2019-05-22
[INFOCOM'19] Adaptive Interference-Aware VNF Placement for Service-Customized 5G Network Slices, Zhe Wang, 2019-05-22
[Tech. Rep.] Recent progress and further trends on EdgeCloudSim, Yunpeng Han, 2019-04-19
[MobiCom'19] mD-Track: Leveraging Multi-Dimensionality for Passive Indoor Wi-Fi Tracking, Xuan Yang, 2019-04-19
[NSDI'19] Correctness and Performance for Stateful Chained Network Functions, Yunpeng Han, 2019-04-19
[INFOCOM'19] Charging Oriented Sensor Placement and Flexible Scheduling in Rechargeable WSN, Wenjie Huang, 2019-04-12
[SIGCOMM'13] Developing a Predictive Model of Quality of Experience for Internet Video, Yuhong Jiang, 2019-04-12
[INFOCOM'19] Brush like a Dentist: Accurate Monitoring of Toothbrushing via Wrist-Worn Gesture Sensing, Jingwei Li, 2019-03-29
[INFOCOM'19] Nomad: An Efficient Consensus Approach for Latency-Sensitive Edge-Cloud Applications, Anqi Yang, 2019-03-29
[INFOCOM'19] Winning at the Starting Line: Joint Network Selection and Service Placement for Mobile Edge Computing, Chang Shu, 2019-03-22
[INFOCOM'19] Interference Recycling: Exploiting Interfering Signals to Enhance Data Transmission, Wenliang Mao, 2019-03-22
[COMST'18] Small Cells in the Forthcoming 5G/IoT: Traffic Modeling and Deployment Overview, Anqi Yang, 2019-01-04

2018

[SIGCOMM'18] Elastic Sketch: Adaptive and Fast Network-wide Measurements, Wenliang Mao, 2018-12-21
[TMC'17] Performance analysis of mobile data offloading in heterogeneous networks, Yunpeng Han, 2018-12-06
[COMST'18] Small Cells in the Forthcoming 5G/IoT: Traffic Modelling and Deployment Overview, Anqi Yang, 2018-12-06
[TMC'17] A Reliability-Augmented Particle Filter for Magnetic Fingerprinting based Indoor Localization on Sma, Wenjie Huang, 2018-11-30
[ToN'18] A Distributed Computation Offloading Strategy in Small-Cell Networks Integrated With Mobile Edge Computing, Yuhong Jiang, 2018-11-23
[ICNP'18] Networking Support For Physical-Layer Cross-Technology Communication, Jingwei Li, 2018-11-23
[IoT Journal'18] Mobile-Edge Computation Offloading for Ultra-Dense IoT Networks, Chang Shu, 2018-11-16
[IPSN'17] BLEnd: Practical Continuous Neighbor Discovery for Bluetooth Low Energy, Minghang Yang, 2018-11-16
[Topic] LoRa Applications (two papers), Xinyuan Huang, 2018-10-19
[TMC'17] Static and Mobile Target k-Coverage in Wireless Rechargeable Sensor Networks, Shuowei Chen, 2018-10-19
[EWSN'17] MOR: Multichannel Opportunistic Routing for Wireless Sensor Networks, Xuan Yang, 2018-10-12
[TMC'17] Hermes: Latency Optimal Task Assignment for Resource-constrained Mobile Computing, Yunpeng Han, 2018-10-12
[MobiCom'17] FoggyCache: Cross-Device Approximate Computation Reuse, Jingwei Li, 2018-09-30
[INFOCOM'18] Knowledge-centric proactive edge caching over mobile content distribution network, Anqi Yang, 2018-09-21
[TWC'18] Enhancing Video Rate Adaptation with Mobile Edge Computing and Caching in Software-defined Mobile Ne, Yuhong Jiang, 2018-09-21
[INFOCOM'18] Dynamic, Latency-Optimal vNF Placement at the Network Edge, Chang Shu
[TMC'17] Neighbor Discovery and Rendezvous Maintenance with Extended Quorum Systems for Mobile Applications, Minghang Yang, 2018-09-14
[Special Session] 3-day discussion on recent papers in wireless, networking and mobile, Chang Shu
[IPSN'18] Charm: Exploiting Geographical Diversity Through Coherent Combining in Low-Power Wide-Area Networks, Weifeng Gao, 2018-06-15
[INFOCOM'18] Adaptive VNF Scaling and Flow Routing with Proactive Demand Prediction, Chang Shu, 2018-06-15
[ComMag'17] The Algorithmic Aspects of Network Slicing, Yunpeng Han, 2018-06-08
[IPSN'18] Continuous Wireless Link Rates for Internet of Things, Luqi Yang, 2018-06-08
[INFOCOM'18] TwinBee: Reliable Physical-Layer Cross-Technology Communication with Symbol-Level Coding, Xinyuan Huang, 2018-06-01
[Invited Tech.Rep.] Report on recent research progress, Songfan Li, 2018-06-01
[Special Session] Scheduling Algorithms for Resource-Constrained Systems, Prof. Dakai Zhu from UTSA, 2018-05-28
[CVPR'17] Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning, Hui Cao, 2018-05-21
[INFOCOM'18] Self-Adapting Quorum-Based Neighbor Discovery in Wireless Sensor Networks, Minghang Yang, 2018-05-21
[Special Session] From Location to Activity: Human-centric Sensing and Analytics, Prof. Tao Gu, 2018-05-11
[INFOCOM'18] LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals, Shuowei Chen, 2018-04-27
[JSAC'17] QoE-Aware and Reliable Traffic Steering for Service Function Chaining in Mobile Networks, Zhe Wang, 2018-04-27
[JSAC'17] Distributed Service Function Chaining, Yuntong Zhang, 2018-04-13
[INFOCOM'18] Joint Service Caching and Task Offloading for Mobile Edge Computing in Dense Networks, Zi Wang, 2018-04-13
[INFOCOM'18] One-Hop Out-of-Band Control Planes for Low-Power Multi-Hop Wireless Networks, Chang Shu, 2018-03-16
[SigComm'16] OpenBox: A Software-Defined Framework for Developing, Deploying, and Managing Network Functions
[SigComm'17] Empowering Low-Power Wide Area Networks in Urban Settings, Weifeng Gao, 2018-02-02
[ComMag16] Hypergraph Theory: Applications in 5G Heterogeneous Ultra-Dense Networks, Yunpeng Han, 2018-01-26
[TCST'17] Optimal UAV Route Planning for Coverage Search of Stationary Target in River, Hui Cao, 2018-01-26

2017

[MobiCom'17] ReflexCode: Coding with Superposed Reflection Light for LED-Camera Communication, Xinyuan Huang, 2017-12-08
[Proc. IEEE 2016] Using Smart Edge IoT Devices for Safer, Rapid Response With Industry IoT Control Operations, Minghang Yang
[INFOCOM'17] Approximation Algorithms for The NFV Service Distribution Problem, Yuntong Zhang, 2017-11-24
[CCS'17] DolphinAttack: Inaudible Voice Commands, Zifei Zhao, 2017-11-24
[NSDI'17] Improving User Perceived Page Load Times Using Gaze, Yaoyao Pang, 2017-11-17
[CoNEXT'16] Flurries: Countless Fine-Grained NFs for Flexible Per-Flow Customization, Zhe Wang, 2017-11-17
[MobiCom'17] PassiveVLC: Enabling Practical Visible Light Backscatter Communication for Battery-free IoT Applicat, Weifeng Gao, 2017-11-10
[INFOCOM'17] Service Chain Embedding with Maximum Flow in Software-defined Network and Application to The Next-Ge, Chang Shu, 2017-11-10
[INFOCOM'17] BAC: Bandwidth-Aware Compression for Efficient Live Migration of Virtual Machines, Yunpeng Han, 2017-11-03
[MobiCom'17] WEBee: Physical-Layer Cross-Technology Communication via Emulation, Shuowei Chen, 2017-11-03
[SigComm'17] NFVnice: Dynamic Backpressure and Scheduling for NFV Service Chains, Hui Cao, 2017-10-27
[MobiCom'17] Continuous Authentication for Voice Assistants, Heng Yuan, 2017-10-27
[TWC'17] Computation Offloading and Resource Allocation in Wireless Cellular Networks With Mobile Edge Computing, Xinyuan Huang, 2017-10-20
[ToN'17] Chase: Taming concurrent broadcast for flooding in asynchronous duty cycle networks, Minghang Yang, 2017-10-20
[TOSN'17] Improving Performance of Synchronous Transmission-Based Protocols Using Capture Effect over Multicha, Luqi Yang, 2017-10-13
[SigComm'17] Dynamic Service Chaining with Dysco, Chang Shu, 2017-10-13
[INFOCOM'17] ER: Early Recognition of Inattentive Driving Leveraging Audio Devices on Smartphones, Zifei Zhao, 2017-09-29
[INFOCOM'17] LightTouch: Securely Connecting Wearables to Ambient Displays with User Intent, Yaoyao Pang, 2017-09-22
[INFOCOM'17] Traffic Aware Placement of Interdependent NFV Middleboxes, Zhe Wang, 2017-09-22
[SigComm'17] NFP: Enabling Network Function Parallelism in NFV, Yuntong Zhang, 2017-09-22
[NFV-SDN'16] Efficient service Graph Embedding: A Practical Approach, Chang Shu, 2017-09-11
[SenSys'17] Network-wide Consensus Utilizing the Capture Effect in Low-power Wireless Networks, Weifeng Gao, 2017-09-11
[INFOCOM'17] Survivable and Bandwidth Guaranteed Embedding of Virtual Clusters in Cloud Data Centers, Yuntong Zhang, 2017-06-26
[MobiCom'15] Keystroke Recognition Using WiFi Signals, Weiwang Li, 2017-06-26

Instructions

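Please use the Latest_seminar and Hist_seminar templates to update the information on this page.

- Update the time and address information.
- Copy the code of the current latest seminar section into this page.
- Change {{Latest_seminar... to {{Hist_seminar... and add the corresponding date field |date=
- Fill in the fields of the latest seminar.
- Never leave link empty; if there is no link, use this page's address https://mobinets.org/index.php?title=Resource:Seminar

Format description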
- Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

- Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}
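
The update steps above can be sketched in Python as a rough illustration, not as an official tool of this wiki. The helper name `latest_to_hist` and the constant `PAGE_URL` are made up for this example, and the sketch assumes each template field sits on its own line, as in the format shown above.

```python
import re

# Fallback address from the instructions: used when a seminar has no link.
PAGE_URL = "https://mobinets.org/index.php?title=Resource:Seminar"


def latest_to_hist(wikitext: str, date: str) -> str:
    """Turn one {{Latest_seminar ...}} block into a {{Hist_seminar ...}} block.

    Hypothetical helper: assumes one field per line and a closing "}}".
    """
    if "{{Latest_seminar" not in wikitext or "}}" not in wikitext:
        raise ValueError("expected a complete {{Latest_seminar ...}} block")

    # Step 1: rename the template.
    text = wikitext.replace("{{Latest_seminar", "{{Hist_seminar", 1)

    # Step 2: never leave link empty; fall back to the page address.
    text = re.sub(
        r"(\|\s*link\s*=)[ \t]*(?=\n)",
        lambda m: m.group(1) + PAGE_URL,
        text,
    )

    # Step 3: add the |date= field before the closing braces if it is missing.
    if not re.search(r"\|\s*date\s*=", text):
        head, closing, tail = text.rpartition("}}")
        text = f"{head}|date={date}\n{closing}{tail}"
    return text


if __name__ == "__main__":
    block = (
        "{{Latest_seminar\n"
        "|confname=ToN'25\n"
        "|link=\n"
        "|title=Spliceosome: On-Camera Video Thinning and Tuning for Timely and Accurate Analytics\n"
        "|speaker=Zhongwei Sun\n"
        "}}"
    )
    print(latest_to_hist(block, "2025-11-28"))
```

The converted block can then be pasted at the top of the history list, and the Latest section filled with the new talks by hand.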

@@ Line 8: / Line 8: @@
 {{Latest_seminar
-|abstract = As Large Language Models (LLMs) continue to scale, optimizing their deployment requires efficient hardware and system co-design. However, current LLM performance evaluation frameworks fail to capture both chip-level execution details and system-wide behavior, making it difficult to assess realistic performance bottlenecks. In this work, we introduce ReaLLM, a trace-driven simulation framework designed to bridge the gap between detailed accelerator design and large-scale inference evaluation. Unlike prior simulators, ReaLLM integrates kernel profiling derived from detailed microarchitectural simulations with a new trace-driven end-to-end system simulator, enabling precise evaluation of parallelism strategies, batching techniques, and scheduling policies. To address the high computational cost of exhaustive simulations, ReaLLM constructs a precomputed kernel library based on hypothesized scenarios, interpolating results to efficiently explore a vast design space of LLM inference systems. Our validation against real hardware demonstrates the framework's accuracy, achieving an average end-to-end latency prediction error of only 9.1% when simulating inference tasks running on 4 NVIDIA H100 GPUs. We further use ReaLLM to evaluate popular LLMs' end-to-end performance across traces from different applications and identify key system bottlenecks, showing that modern GPU-based LLM inference is increasingly compute-bound rather than memory-bandwidth bound at large scale. Additionally, we significantly reduce simulation time with our precomputed kernel library by a factor of 6× for full-simulations and 164× for workload SLO exploration. ReaLLM is open-source and available at https://github.com/bespoke-silicon-group/reallm..
+|abstract = Running deep neural networks (DNNs) on large-scale videos from widely distributed cameras presents two significant challenges. Firstly, video quality for analytical purposes is severely impacted by the camera deployment environment, which is termed Pixel Recession in this paper. Secondly, low-latency video streaming from the source camera to edge servers is greatly hindered by the rapid expansion of video traffic. Despite numerous efforts such as enhancing the video structure, uneven encoding, and filtering frames captured on camera, these methods have proven insufficient to address the challenges at hand. We propose Spliceosome, a novel video analytics system that effectively overcomes the pixel recession and streaming bottlenecks. In brief, Spliceosome 1) recovers from pixel recession by adaptive video knobs (i.e., brightness and contrast) tuning in ARP (anchor region proposal) granularity, and 2) lowers the transmission volume by video thinning, which uses only single-channel information for video encoding. We implemented Spliceosome using only commercial off-the-shelf hardware. Our experimental results demonstrate that Spliceosome outperforms other alternative designs by 4.71-14.47%, 40.94-58.71%, and 14.28% in detection accuracy, end-to-end delay, and efficiency of DNNs inference, respectively.
-|confname =ASAP'25
+|confname =ToN'25
-|link = https://ieeexplore.ieee.org/abstract/document/11113621
+|link = https://ieeexplore.ieee.org/abstract/document/10843977
-|title= ReaLLM: A Trace-Driven Framework for Rapid Simulation of Large-Scale LLM Inference
+|title= Spliceosome: On-Camera Video Thinning and Tuning for Timely and Accurate Analytics
-|speaker=JunZhe
+|speaker=Zhongwei Sun
-|date=2025-11-21
+|date=2025-11-28
 }}{{Latest_seminar
-|abstract =With the proliferation of mobile devices, spatial crowdsourcing has emerged as a promising paradigm for facilitating location-based services, encompassing various applications across academia and industries. Recently, pioneering works have attempted to infer workers' mobility patterns from historical data to improve the quality of task assignment. However, these studies have overlooked or under-examined issues such as the dynamic mobility patterns of crowd workers, especially in the context of newcomers, the misalignment between the objectives of mobility prediction and task assignment, and the effective utilization of predicted mobility patterns. In this paper, we investigate a problem we term Task Assignment in Mobility Prediction-aware Spatial Crowdsourcing (TAMP). To address the TAMP problem, we first propose a task-adaptive meta-learning algorithm, which trains a set of specific meta-knowledge for workers' mobility prediction models through game theory-based learning task clustering and meta-training within each cluster. Then, we design a task assignment-oriented loss function and develop a task assignment algorithm that incorporates prediction performance, prioritizing assignments with higher confidence of completion. Extensive experiments on real-world datasets validate that our proposed methods can effectively improve the quality of task assignment.
+|abstract =The rapid expansion of large language models (LLMs) requires the development of extensive GPU clusters, with companies deploying clusters with tens to hundreds of thousands of GPUs. This growth significantly expands the design space for LLM training systems, requiring thorough exploration of different parallelization strategies, communication parameters, congestion control, fabric topology, etc. Current methods require up to 10k simulation experiments to identify optimal configurations, with inadequate exploration leading to significant degradation of training performance. In this paper, we tackle the overlooked problem of efficiently conducting parallel simulation experiments for design space exploration. Our analysis and experiments show that Single-process Multi-experiment (SPME) achieves superior performance by reducing scheduling overhead and optimizing resource utilization, yet remains insufficient for current AI cluster scales. To enhance SPME’s efficacy, we introduce Multiverse, a novel GPU-based AI training simulator. Multiverse leverages the computing throughput of GPUs efficiently with optimizations such as a pull-based synchronization, highfidelity intra-server communication, and a kernel-fusion technique. Extensive experiments validate the accuracy and efficiency of Multiverse, demonstrating less than 3.0% discrepancy with real-world LLM training on clusters of up to 54,000 GPUs, achieving 43.1−73.2X speedup over state-of-the-art CPU-based simulators in various use cases.
-|confname =ICDE'25
+|confname =NSDI'25
-|link = https://ieeexplore.ieee.org/document/11113007
+|link = https://www.usenix.org/conference/nsdi25/presentation/gui
-|title= Effective Task Assignment in Mobility Prediction-Aware Spatial Crowdsourcing
+|title= Accelerating Design Space Exploration for LLM Training Systems with Multi-experiment Parallel Simulation
-|speaker= Zhenguo
+|speaker=Qinyong
-|date=2025-11-21
+|date=2025-11-28
 }}
 {{Resource:Previous_Seminars}}
