About
Hi there! This is Chuyue, a senior CS student at ShanghaiTech University, conducting research on LLM/VLM agents under the supervision of Professor Kan Ren. I also previously worked with him on multimodal time-series foundation models. Before that, I worked as an Applied Research Intern at Tencent's Multimodal Model Department, Visual Agent Group, focusing on multi-level autonomous planning, tool invocation, reward allocation, and confidence evaluation for long-trajectory visual generation. During my junior year, I spent a year as an exchange student at UC Berkeley, where I served as a Research Assistant and SWE volunteer for the CircleCat team, working on LLM-for-code research. My research interests span LLM/VLM agents, LLM/VLM/MLLM, and multimodal generative foundation models, with a current focus on self-learning mechanisms for LLM/VLM agents to address challenges such as data scarcity and reward sparsity.
Publications
Shuqi Gu, Chuyue Li, Baoyu Jing, Kan Ren.
Designed a framework that enables finer-grained cross-modal semantic alignment and control when generating time series data from unstructured text.
Research Internship/Experience
Applied Research Intern
Tencent Hunyuan - Large Multimodal Model Dept., Visual Agent Team
Multi-level Auto-Planning Interactive Visual Generative Agent
Co-Author | Paper Under Review
- Designed multi-turn interactive auto-planning system with multi-agent collaboration for generating professional image and video deliverables
- Implemented intent recognition, hierarchical planning, multimodal understanding, self-reflection, deep search, RAG, and fine-grained tool selection
- Led automatic evaluation system and reward function design for long-trajectory planning, tool invocation, and visual generation; developed multimodal deep search and domain-specific RAG modules
Research Assistant
ShanghaiTech VDI Center
Multimodal Time Series Conditional Generation (June 2024 - March 2025)
Second Author | Accepted by ICML 2025
- Designed framework for fine-grained cross-modal semantic alignment in generating time series from unstructured text
- Implemented Multi-view Noise Estimator for multi-resolution modeling across temporal, spatial, and diffusion perspectives
- Proposed Multi-focal Text Processor with learnable anchor vectors for hierarchical text-to-time-series alignment
- Achieved 20%+ improvement in FID, J-FTSD, and CTTP scores over baselines
Second Author | Paper Under Review
- Multi-Task Adaptive Unified Model for Multimodal Time Series Tasks
Research Assistant
CircleCat
First Author | To be submitted January 2026
- LLM-for-code project.
Software Engineer Intern
CircleCat
Developed software systems with a focus on LLM applications and code analysis tools.
Education
University of California, Berkeley
CS Exchange Student
ShanghaiTech University
B.S. in Computer Science
Selected Projects
NYUSHDIC
- Led design and implementation of multi-turn interactive agent system for generating 3D visual deliverables from text
- Constructed 2D bridge based on text-to-image transformers to mitigate modality gaps in diffusion-based text-to-3D generation
- Used CoT and multi-agent collaboration framework supporting multi-turn dialogue and iterative modification