Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Time: 2026-04-10 10:30
Address: 4th Research Building A518
Useful links: 📚 Readling list; 📆 Schedules; 🧐 Previous seminars.

[OSDI'25] PipeThreader: Software-defined pipelining for efficient DNN execution, Junzhe
Abstract: To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.

[Topic] [ The path planning algorithm for multiple mobile edge servers in EdgeGO], Rong Cong, 2020-11-18

[Mobisys20] Combating packet collisions using non-stationary signal scaling in LPWANs, Wenliang Mao, 2020-11-18
[Topic] [ Dependency-Aware and Latency-Optimal Service Cache in Edge networks], Jiwei Mo, 2020-11-18
[talk] Paper Carnival 2020, ALL, 2020-09-24,25,26

请使用Latest_seminar和Hist_seminar模板更新本页信息.

- 修改时间和地点信息
- 将当前latest seminar部分的code复制到这个页面中
- 将{{Latest_seminar... 修改为 {{Hist_seminar...，并增加对应的日期信息|date=
- 填入latest seminar各字段信息
- link请务必不要留空，如果没有link则填本页地址 https://mobinets.org/index.php?title=Resource:Seminar

格式说明
- Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

- Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}

@@ Line 1: / Line 1: @@
 {{SemNote
-|time='''2022-4-22 10:20'''
+|time='''2026-04-10 10:30'''
-|addr=4th Research Building A527-B
+|addr=4th Research Building A518
-|note=Useful links: [[Resource:Reading_List|Readling list]]; [[Resource:Seminar_schedules|Schedules]]; [[Resource:Previous_Seminars|Previous seminars]].
+|note=Useful links: [[Resource:Reading_List|📚 Readling list]]; [[Resource:Seminar_schedules|📆 Schedules]]; [[Resource:Previous_Seminars|🧐 Previous seminars]].
 }}
 ===Latest===
 {{Latest_seminar
-|abstract = Heterogeneity across clients in federated learning (FL) usually hinders the optimization convergence and generalization performance when the aggregation of clients’ knowledge occurs in the gradient space. For example, clients may differ in terms of data distribution, network latency, input/output space, and/or model architecture, which can easily lead to the misalignment of their local gradients. To improve the tolerance to heterogeneity, we propose a novel federated prototype learning (FedProto) framework in which the clients and server communicate the abstract class prototypes instead of the gradients. FedProto aggregates the local prototypes collected from different clients, and then sends the global prototypes back to all clients to regularize the training of local models. The training on each client aims to minimize the classification error on the local data while keeping the resulting local prototypes sufficiently close to the corresponding global ones. Moreover, we provide a theoretical analysis to the convergence rate of FedProto under non-convex objectives. In experiments, we propose a benchmark setting tailored for heterogeneous FL, with FedProto outperforming several recent FL approaches on multiple datasets. Code is available at https://github.com/yuetan031/fedproto.
+|abstract = To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.
-|confname= AAAI 2022
+|confname =OSDI'25
-|link=https://www.aaai.org/AAAI22Papers/AAAI-6846.YueT.pdf
+|link = https://www.usenix.org/conference/osdi25/presentation/cheng
-|title= FedProto: Federated Prototype Learning across Heterogeneous Clients
+|title= PipeThreader: Software-defined pipelining for efficient DNN execution
-|speaker=Jianqi
+|speaker=Junzhe
-}}
+|date=2026-4-9
-{{Latest_seminar
-|abstract = The Edge-based Multi-agent visual SLAM plays a key role in emerging mobile applications such as search-and-rescue, inventory automation, and drone grouping. This algorithm relies on a central node to maintain the global map and schedule agents to execute their individual tasks. However, as the number of agents continues growing, the operational overhead of the visual SLAM system such as data redundancy, bandwidth consumption, and localization errors also scale, which challenges the system scalability. In this paper, we present the design and implementation of SwarmMap, a framework design that scales up collaborative visual SLAM service in edge offloading settings. At the core of SwarmMap are three simple yet effective system modules — a change log-based server-client synchronization mechanism, a priority-aware task scheduler, and a lean representation of the global map that work hand-in-hand to address the data explosion caused by the growing number of agents. We make SwarmMap compatible with the robotic operating system (ROS) and open-source it. Existing visual SLAM applications could incorporate SwarmMap to enhance their performance and capacity in multi-agent scenarios. Comprehensive evaluations and a three-month case study at one of the world's largest oil fields demonstrate that SwarmMap can serve 2× more agents (>20 agents) than the state of the arts with the same resource overhead, meanwhile maintaining an average trajectory error of 38cm, outperforming existing works by > 55%.
-|confname= NSDI 2022
-|link=https://www.usenix.org/system/files/nsdi22-paper-xu_jingao.pdf
-|title=SwarmMap: Scaling Up Real-time Collaborative Visual SLAM at the Edge
-|speaker=Jianfei
 }}
-=== History ===
 {{Resource:Previous_Seminars}}

Navigation menu

Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Contents

Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Latest

History

2024

2023

2022

2021

2020

2019

2018

2017

Instructions