Difference between revisions of "Resource:Seminar"

From MobiNetS
Jump to: navigation, search
 
(84 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{SemNote
{{SemNote
|time='''2024-09-20 10:30-12:00'''
|time='''2025-12-12 10:30'''
|addr=4th Research Building A518
|addr=4th Research Building A518
|note=Useful links: [[Resource:Reading_List|📚 Readling list]]; [[Resource:Seminar_schedules|📆 Schedules]]; [[Resource:Previous_Seminars|🧐 Previous seminars]].
|note=Useful links: [[Resource:Reading_List|📚 Readling list]]; [[Resource:Seminar_schedules|📆 Schedules]]; [[Resource:Previous_Seminars|🧐 Previous seminars]].
Line 8: Line 8:


{{Latest_seminar
{{Latest_seminar
|abstract = Overlapping cameras offer exciting opportunities to view a scene from different angles, allowing for more advanced, comprehensive and robust analysis. However, existing video analytics systems for multi-camera streams are mostly limited to (i) per-camera processing and aggregation and (ii) workload-agnostic centralized processing architectures. In this paper, we present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras. We identify multi-camera, multi-target tracking as the primary task of multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy identification tasks by leveraging object-wise spatio-temporal association in the overlapping fields of view across multiple cameras. We further develop a set of techniques to perform these operations across distributed cameras without cloud support at low latency by (i) dynamically ordering the camera and object inspection sequence and (ii) flexibly distributing the workload across smart cameras, taking into account network transmission and heterogeneous computational capacities. Evaluation of three real-world overlapping camera datasets with two Nvidia Jetson devices shows that Argus reduces the number of object identifications and end-to-end latency by up to 7.13× and 2.19× (4.86× and 1.60× compared to the state-of-the-art), while achieving comparable tracking quality.
|abstract = Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only learn the contextual semantics of code during pre-training, neglecting executability information closely related to the execution state of the code, which results in unguaranteed code executability and unreliable automated code translation. To address this issue, we propose ExeCoder, an LLM specifically designed for code translation, aimed at utilizing executability representations such as functional semantics, syntax structures, and variable dependencies to enhance the capabilities of LLMs in code translation. To evaluate the effectiveness of ExeCoder, we manually enhanced the widely used benchmark TransCoder-test, resulting in a benchmark called TransCoder-test-X that serves LLMs. Evaluation of TransCoder-test-X indicates that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics, and even outperforms the renowned closed-source LLM GPT-4o.  
|confname=TMC' 24
|confname =EMNLP'25
|link = https://ieeexplore.ieee.org/abstract/document/10682605
|link = https://arxiv.org/abs/2501.18460
|title= Argus: Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras
|title= ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation
|speaker=Bairong
|speaker=Youwei Ran
|date=2024-9-29
|date=2025-12-12
}}
}}
{{Latest_seminar
{{Latest_seminar
|abstract = We present FarfetchFusion, a fully mobile live 3D telepresence system. Enabling mobile live telepresence is a challenging problem as it requires i) realistic reconstruction of the user and ii) high responsiveness for immersive experience. We first thoroughly analyze the live 3D telepresence pipeline and identify three critical challenges: i) 3D data streaming latency and compression complexity, ii) computational complexity of volumetric fusion-based 3D reconstruction, and iii) inconsistent reconstruction quality due to sparsity of mobile 3D sensors. To tackle the challenges, we propose a disentangled fusion approach, which separates invariant regions and dynamically changing regions with our low-complexity spatio-temporal alignment technique, topology anchoring. We then design and implement an end-to-end system, which achieves realistic reconstruction quality comparable to existing server-based solutions while meeting the real-time performance requirements (<100 ms end-to-end latency, 30 fps throughput, <16 ms motion-to-photon latency) solely relying on mobile computation capability.
|abstract =Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. We will open-source all the hardware and software implementations upon publication.
|confname=MobiCom' 23
|confname =CoRL'24
|link = https://dl.acm.org/doi/abs/10.1145/3570361.3592525
|link = https://openreview.net/forum?id=FO6tePGRZj
|title= FarfetchFusion: Towards Fully Mobile Live 3D Telepresence Platform
|title= Mobile ALOHA: Learning Bimanual Mobile Manipulation using Low-Cost Whole-Body Teleoperation
|speaker=Mengfan
|speaker=Yi Zhou
|date=2024-9-29
|date=2025-12-12
}}
}}
{{Resource:Previous_Seminars}}
{{Resource:Previous_Seminars}}

Latest revision as of 23:32, 11 December 2025

Time: 2025-12-12 10:30
Address: 4th Research Building A518
Useful links: 📚 Readling list; 📆 Schedules; 🧐 Previous seminars.

Latest

  1. [EMNLP'25] ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation, Youwei Ran
    Abstract: Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only learn the contextual semantics of code during pre-training, neglecting executability information closely related to the execution state of the code, which results in unguaranteed code executability and unreliable automated code translation. To address this issue, we propose ExeCoder, an LLM specifically designed for code translation, aimed at utilizing executability representations such as functional semantics, syntax structures, and variable dependencies to enhance the capabilities of LLMs in code translation. To evaluate the effectiveness of ExeCoder, we manually enhanced the widely used benchmark TransCoder-test, resulting in a benchmark called TransCoder-test-X that serves LLMs. Evaluation of TransCoder-test-X indicates that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics, and even outperforms the renowned closed-source LLM GPT-4o.
  2. [CoRL'24] Mobile ALOHA: Learning Bimanual Mobile Manipulation using Low-Cost Whole-Body Teleoperation, Yi Zhou
    Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. We will open-source all the hardware and software implementations upon publication.

History

2024

2023

2022

2021

2020

  • [Topic] [ The path planning algorithm for multiple mobile edge servers in EdgeGO], Rong Cong, 2020-11-18

2019

2018

2017

Instructions

请使用Latest_seminar和Hist_seminar模板更新本页信息.

    • 修改时间和地点信息
    • 将当前latest seminar部分的code复制到这个页面
    • 将{{Latest_seminar... 修改为 {{Hist_seminar...,并增加对应的日期信息|date=
    • 填入latest seminar各字段信息
    • link请务必不要留空,如果没有link则填本页地址 https://mobinets.org/index.php?title=Resource:Seminar
  • 格式说明
    • Latest_seminar:

{{Latest_seminar
|confname=
|link=
|title=
|speaker=
}}

    • Hist_seminar

{{Hist_seminar
|confname=
|link=
|title=
|speaker=
|date=
}}