About

Hi there! 😊👋 I'm Chuyue, a senior CS student at ShanghaiTech University, conducting research on LLM/VLM agents under the supervision of Professor Kan Ren, with whom I previously worked on multimodal time-series foundation models. I also worked as an Applied Research Intern in the Visual Agent Group of Tencent's Multimodal Model Department, focusing on multi-level autonomous planning, tool invocation, reward allocation, and confidence evaluation for long-trajectory visual generation. During my junior year, I spent a year as an exchange student at UC Berkeley, where I served as a Research Assistant and SWE volunteer on the CircleCat team, working on LLM-for-code research. My research interests span LLM/VLM agents, LLMs/VLMs/MLLMs, and multimodal generative foundation models, with a current focus on self-learning mechanisms for LLM/VLM agents that address challenges such as data scarcity and reward sparsity.

Publications

Shuqi Gu, Chuyue Li, Baoyu Jing, Kan Ren. ICML 2025.

Designed a framework that enables finer-grained cross-modal semantic alignment and control in generating time series from unstructured text.

Research & Internship Experience

Applied Research Intern

Tencent Hunyuan – Large Multimodal Model Dept., Visual Agent Team

June 2025 - October 2025

Multi-level Auto-Planning Interactive Visual Generative Agent
Co-Author | Paper Under Review

  • Designed multi-turn interactive auto-planning system with multi-agent collaboration for generating professional image and video deliverables
  • Implemented intent recognition, hierarchical planning, multimodal understanding, self-reflection, deep search, RAG, and fine-grained tool selection
  • Led automatic evaluation system and reward function design for long-trajectory planning, tool invocation, and visual generation; developed multimodal deep search and domain-specific RAG modules

Research Assistant

ShanghaiTech VDI Center

June 2024 - Present | Advisor: Prof. Kan Ren

Multimodal Time Series Conditional Generation (June 2024 - March 2025)
Second Author | Accepted by ICML 2025

  • Designed framework for fine-grained cross-modal semantic alignment in generating time series from unstructured text
  • Implemented Multi-view Noise Estimator for multi-resolution modeling across temporal, spatial, and diffusion perspectives
  • Proposed Multi-focal Text Processor with learnable anchor vectors for hierarchical text-to-time-series alignment
  • Achieved 20%+ improvement in FID, J-FTSD, and CTTP scores over baselines

Unified Generative Modeling for Multimodal Time Series (January 2025 - Present)
Second Author | Paper Under Review

  • Developing a multi-task adaptive unified model for multimodal time series tasks

Research Assistant

CircleCat

Feb 2025 - Dec 2025 | Advisors: Dr. Kazuma Hashimoto (Google DeepMind), Dr. Lingyu Gao (Duolingo)

First Author | To be submitted January 2026

  • Research project on LLMs for code

Software Engineer Intern

CircleCat

Nov 2024 - Mar 2025

  • Developed software systems with a focus on LLM applications and code analysis tools

Education

University of California, Berkeley

CS Exchange Student

Aug 2024 - May 2025 | GPA: 3.9/4.0

ShanghaiTech University

B.S. in Computer Science

Sep 2022 - Present

Selected Projects

NYUSHDIC

  • Lead design and implementation of multi-turn interactive agent system for generating 3D visual deliverables from text
  • Construct 2D bridge based on text-to-image transformers to mitigate modality gaps in diffusion-based text-to-3D generation
  • Utilize CoT and multi-agent collaboration framework supporting multi-turn dialogue and iterative modification