Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Time: 2026-04-10 10:30
Address: 4th Research Building A518
Useful links: 📚 Reading list; 📆 Schedules; 🧐 Previous seminars.

[OSDI'25] PipeThreader: Software-defined pipelining for efficient DNN execution, Junzhe
Abstract: To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.

[Topic] The path planning algorithm for multiple mobile edge servers in EdgeGO, Rong Cong, 2020-11-18

[Mobisys20] Combating packet collisions using non-stationary signal scaling in LPWANs, Wenliang Mao, 2020-11-18
[Topic] Dependency-Aware and Latency-Optimal Service Cache in Edge networks, Jiwei Mo, 2020-11-18
[talk] Paper Carnival 2020, ALL, 2020-09-24,25,26

Please use the Latest_seminar and Hist_seminar templates to update the information on this page.

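- Update the time and location information.
- Copy the code of the current Latest seminar section into this page.
- Change {{Latest_seminar... to {{Hist_seminar..., and add the corresponding date with |date=
- Fill in each field of the new Latest seminar entry.
- Never leave link empty; if there is no link, use this page's address https://mobinets.org/index.php?title=Resource:Seminar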

Format description
- Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

- Hist_seminar:

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}
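For example, when the current OSDI'25 talk is moved to the history, its entry could be converted roughly as follows. This is a sketch: the field values are copied from the Latest_seminar entry in the revision below (including its date value, taken as-is), and whether a history entry should also keep the |abstract= field is not specified on this page.

{{Hist_seminar
|confname=OSDI'25
|link=https://www.usenix.org/conference/osdi25/presentation/cheng
|title=PipeThreader: Software-defined pipelining for efficient DNN execution
|speaker=Junzhe
|date=2026-4-9
}}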

@@ Line 1: / Line 1: @@
 {{SemNote
-|time='''2023-02-13 9:30'''
+|time='''2026-04-10 10:30'''
-|addr=4th Research Building A527-B
+|addr=4th Research Building A518
-|note=Useful links: [[Resource:Reading_List|Readling list]]; [[Resource:Seminar_schedules|Schedules]]; [[Resource:Previous_Seminars|Previous seminars]].
+|note=Useful links: [[Resource:Reading_List|📚 Readling list]]; [[Resource:Seminar_schedules|📆 Schedules]]; [[Resource:Previous_Seminars|🧐 Previous seminars]].
 }}
 ===Latest===
 {{Latest_seminar
-|abstract = Mobile crowd sensing (MCS) is a popular sensing paradigm that leverages the power of massive mobile workers to perform various location-based sensing tasks. To assign workers with suitable tasks, recent research works investigated mobility prediction methods based on probabilistic and statistical models to estimate the worker’s moving behavior, based on which the allocation algorithm is designed to match workers with tasks such that workers do not need to deviate from their daily routes and tasks can be completed as many as possible. In this paper, we propose a new multi-task allocation method based on mobility prediction, which differs from the existing works by (1) making use of workers’ historical trajectories more comprehensively by using the fuzzy logic system to obtain more accurate mobility prediction and (2) designing a global heuristic searching algorithm to optimize the overall task completion rate based on the mobility prediction result, which jointly considers workers’ and tasks’ spatiotemporal features. We evaluate the proposed prediction method and task allocation algorithm using two real-world datasets. The experimental results validate the effectiveness of the proposed methods compared against baselines.
+|abstract = To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.
-|confname=Mobicom 2022
+|confname =OSDI'25
-|link=https://dl.acm.org/doi/pdf/10.1145/3495243.3560544
+|link = https://www.usenix.org/conference/osdi25/presentation/cheng
-|title=BSMA: Scalable LoRa networks using full duplex gateways
+|title= PipeThreader: Software-defined pipelining for efficient DNN execution
-|speaker=Kaiwen}}
+|speaker=Junzhe
-{{Latest_seminar
+|date=2026-4-9
-|abstract = On-device deep neural network (DNN) training holds the potential to enable a rich set of privacy-aware and infrastructure-independent personalized mobile applications. However, despite advancements in mobile hardware, locally training a complex DNN is still a nontrivial task given its resource demands. In this work, we show that the limited memory resources on mobile devices are the main constraint and propose Sage as a framework for efficiently optimizing memory resources for on-device DNN training. Specifically, Sage configures a flexible computation graph for DNN gradient evaluation and reduces the memory footprint of the graph using operator- and graph-level optimizations. In run-time, Sage employs a hybrid of gradient checkpointing and micro-batching techniques to dynamically adjust its memory use to the available system memory budget. Using implementation on off-the-shelf smartphones, we show that Sage enables local training of complex DNN models by reducing memory use by more than 20-fold compared to a baseline approach. We also show that Sage successfully adapts to run-time memory budget variations, and evaluate its energy consumption to show Sage's practical applicability.
+}}
-|confname=MobiSys 2022
-|link=https://dl.acm.org/doi/pdf/10.1145/3498361.3539765
-|title=Memory-efficient DNN Training on Mobile Devices
-|speaker=Wenjie}}
-{{Latest_seminar
-|abstract = We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, and (b) stragglers when a DAG stage invokes a set of parallel functions that must complete before starting the next DAG stage. To address these limitations, we propose WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or budget. We introduce three optimizations: (1) Fusion combines in-series functions together in a single VM to reduce the communication overhead between cascaded functions. (2) Bundling executes a group of parallel invocations of a function in one VM to improve resource sharing among the parallel workers to reduce skew. (3) Resource Allocation assigns the right VM size to each function or function bundle in the DAG to reduce the E2E latency and cost. We implement WISEFUSE to evaluate it experimentally using three popular serverless applications with different DAG structures, memory footprints, and intermediate data sizes. Compared to competing approaches and other alternatives, WISEFUSE shows significant improvements in E2E latency and cost. Specifically, for a machine learning pipeline, WISEFUSE achieves P95 latency that is 67% lower than Photons, 39% lower than Faastlane, and 90% lower than SONIC without increasing the cost.
-|confname=Proceedings of the ACM on Measurement and Analysis of Computing Systems 2022
-|link=https://dl.acm.org/doi/pdf/10.1145/3530892
-|title=WiseFuse: Workload Characterization and DAG Transformation for Serverless Workflows
-|speaker=Qinyong}}
-=== History ===
 {{Resource:Previous_Seminars}}

Contents

Latest

History

2024

2023

2022

2021

2020

2019

2018

2017

Instructions