Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Time: 2026-04-10 10:30
Address: 4th Research Building A518
Useful links: 📚 Readling list; 📆 Schedules; 🧐 Previous seminars.

[OSDI'25] PipeThreader: Software-defined pipelining for efficient DNN execution, Junzhe
Abstract: To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.

[Topic] [ The path planning algorithm for multiple mobile edge servers in EdgeGO], Rong Cong, 2020-11-18

[Mobisys20] Combating packet collisions using non-stationary signal scaling in LPWANs, Wenliang Mao, 2020-11-18
[Topic] [ Dependency-Aware and Latency-Optimal Service Cache in Edge networks], Jiwei Mo, 2020-11-18
[talk] Paper Carnival 2020, ALL, 2020-09-24,25,26

请使用Latest_seminar和Hist_seminar模板更新本页信息.

- 修改时间和地点信息
- 将当前latest seminar部分的code复制到这个页面中
- 将{{Latest_seminar... 修改为 {{Hist_seminar...，并增加对应的日期信息|date=
- 填入latest seminar各字段信息
- link请务必不要留空，如果没有link则填本页地址 https://mobinets.org/index.php?title=Resource:Seminar

格式说明
- Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

- Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}

@@ Line 1: / Line 1: @@
 {{SemNote
-|time='''2024-11-8 10:30-12:00'''
+|time='''2026-04-10 10:30'''
 |addr=4th Research Building A518
 |note=Useful links: [[Resource:Reading_List|📚 Readling list]]; [[Resource:Seminar_schedules|📆 Schedules]]; [[Resource:Previous_Seminars|🧐 Previous seminars]].
@@ Line 8: / Line 8: @@
 {{Latest_seminar
-|abstract = Collaborative inference is the current state-of-the-art solution for mobile-server neural network inference offloading. However, we find that existing collaborative inference solutions only focus on partitioning the DNN computation, which is only a small part of achieving an efficient DNN offloading system. What ultimately determines the performance of DNN offloading is how the execution system utilizes the characteristics of the given DNN offloading task on the mobile, network, and server resources of the offloading environment. To this end, we design CoActo, a DNN execution system built from the ground up for mobile-server inference offloading. Our key design philosophy is Coactive Inference Offloading, which is a new, improved concept of DNN offloading that adds two properties, 1) fine-grained expression of DNNs and 2) concurrency of runtime resources, to existing collaborative inference. In CoActo, system components go beyond simple model splitting of existing approaches and operate more proactively to achieve the coactive execution of inference workloads. CoActo dynamically schedules concurrent interleaving of the mobile, server, and network operations to actively increase resource utilization, enabling lower end-to-end latency. We implement CoActo for various mobile devices and server environments and evaluate our system with distinct environment settings and DNN models. The experimental results show that our system achieves up to 2.1 times speed-up compared to the state-of-the-art collaborative inference solutions.
+|abstract = To effectively utilize heterogeneous specialized hardware units in modern GPUs, such as TensorCores and Tensor Memory Accelerators, this paper introduces PipeThreader, a new DNN compiler. PipeThreader proposes shifting scheduling functionality from hardware to software so as to enable more efficient and sophisticated computation pipelining with minimal manual effort. This is achieved through sTask-graph, a new DNN computation abstraction, a hierarchical hardware abstraction that captures the capabilities of specialized units, and new scheduling primitives. As a result, PipeThreader can discover efficient pipeline scheduling for well-studied DNN architectures like FlashAttention, achieving comparable or even superior performance. Additionally, it can uncover novel pipeline schemes for emerging models like Mamba2, delivering significantly better performance compared to state-of-the-art hand-crafted implementations. The code is open-sourced at https://github.com/tile-ai/tilelang.
-|confname = Mobisys'24
+|confname =OSDI'25
-|link = https://dl.acm.org/doi/10.1145/3643832.3661885
+|link = https://www.usenix.org/conference/osdi25/presentation/cheng
-|title= CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent Execution
+|title= PipeThreader: Software-defined pipelining for efficient DNN execution
-|speaker=Zhenhua
+|speaker=Junzhe
-|date=2024-11-22
+|date=2026-4-9
-}}
-{{Latest_seminar
-|abstract = Caching is an indispensable technique for low-cost and fast data serving. The eviction algorithm, at the heart of a cache, has been primarily designed to maximize efficiency—reducing the cache miss ratio. Many eviction algorithms have been designed in the past decades. However, they all trade off throughput, simplicity, or both for higher efficiency. Such a compromise often hinders adoption in production systems.
-This work presents SIEVE, an algorithm that is simpler than LRU and provides better than state-of-the-art efficiency and scalability for web cache workloads. We implemented SIEVE in five production cache libraries, requiring fewer than 20 lines of code changes on average. Our evaluation on 1559 cache traces from 7 sources shows that SIEVE achieves up to 63.2% lower miss ratio than ARC. Moreover, SIEVE has a lower miss ratio than 9 state-of-the-art algorithms on more than 45% of the 1559 traces, while the next best algorithm only has a lower miss ratio on 15%. SIEVE's simplicity comes with superior scalability as cache hits require no locking. Our prototype achieves twice the throughput of an optimized 16-thread LRU implementation. SIEVE is more than an eviction algorithm; it can be used as a cache primitive to build advanced eviction algorithms just like FIFO and LRU.
-|confname =NSDI'24
-|link = https://www.usenix.org/conference/nsdi24/presentation/zhang-yazhuo
-|title= SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches
-|date=2024-11-22
 }}
 {{Resource:Previous_Seminars}}

Navigation menu

Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Contents

Difference between revisions of "Resource:Seminar"

Latest revision as of 10:37, 10 April 2026

Latest

History

2024

2023

2022

2021

2020

2019

2018

2017

Instructions