CL-25: The 2nd CL Workshop
Continual Learning meets Multimodal Foundation Models:
Fundamentals and Advances

In conjunction with ACM MM 2025

28 October, 2025 (8:45 AM - 12:30 PM)

Location: Dublin, Ireland

Call For Papers

In recent years, advances in multimodal foundation models (MMFMs) have generated growing interest in enhancing their generalization abilities through continual learning (CL), so that they can process diverse data types, from text to visuals, and continuously update their capabilities from real-time inputs. Despite significant progress in both the theory and applications of continual learning, the community still faces serious challenges. This workshop aims to provide a venue where academic researchers and industry practitioners can come together to discuss the principles, limitations, and applications of multimodal foundation models in continual learning for multimedia, and to promote understanding of MMFMs in continual learning, innovative algorithms, and research on new multimodal technologies and applications.

Scope and Topics
Topics of interest include, but are not limited to:

  • Lifelong / Continual / Incremental / Online Learning
  • Few-shot & Transfer Learning related to Continual Learning
  • Applications and use-cases of Continual Learning
  • Meta-learning & Curriculum Learning & Active Learning
  • Reinforcement Learning and Robotics in Continual Learning
  • Ethical and Safety considerations for machines that can learn continuously
  • Continuous domain adaptation / Test-time adaptation
  • Vision / Sound / Speech / Language Foundation Models in any possible combination
  • Self / Semi / Weakly supervised training of MMFMs
  • Multi-task and Continual Learning for MMFMs
  • Efficient training and inference of MMFMs
  • Parameter-efficient fine-tuning, prompting, and adapters for MMFMs
  • Generative MMFMs (e.g., text-to-image / video / 3D generation)
  • Ethics, risks, and fairness of MMFMs
  • Benchmarks, scenarios, evaluation protocols, and metrics for the above topics

Keynote Speakers


Liyuan Wang is an Assistant Professor in the Department of Psychological and Cognitive Science at Tsinghua University. He received his B.S. and Ph.D. degrees from Tsinghua University and subsequently conducted postdoctoral research there. His work lies at the intersection of machine learning and neuroscience, focusing on continual and lifelong learning in intelligent systems. He aims to develop general computational models that bridge artificial and biological intelligence and to translate them into key applications in AI4Science, AI4Health, AIGC, and embodied intelligence. His research has appeared in leading venues including Nature Machine Intelligence, IEEE TPAMI, NeurIPS, ICML, ICLR, CVPR, and ICCV.

Adrian G. Bors (Senior Member, IEEE) received the M.Sc. degree in electronics engineering from the Polytechnic University of Bucharest, Bucharest, Romania, in 1992, and the Ph.D. degree in informatics from the University of Thessaloniki, Thessaloniki, Greece, in 1999. In 1999, he joined the Department of Computer Science, University of York, York, U.K., where he is currently an Associate Professor. He was a Research Scientist at the University of Tampere, Tampere, Finland, and held visiting positions at the University of California at San Diego (UCSD), San Diego, CA, USA; the University of Montpellier, Montpellier, France; and MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates. He has authored and co-authored more than 180 research articles, including 50 in journals. His research interests include machine learning, computer vision, pattern recognition, and image processing. Dr. Bors was an Associate Editor of IEEE Transactions on Image Processing from 2010 to 2014 and of IEEE Transactions on Neural Networks from 2001 to 2009. He was also a Co-Guest Editor of special issues of the International Journal of Computer Vision in 2018 and the journal Pattern Recognition in 2015.

Wenya Wang is an Assistant Professor in the College of Computing and Data Science, Nanyang Technological University (NTU), Singapore. Prior to joining NTU, she was a Postdoctoral Researcher at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. After obtaining her PhD at NTU, she received the “International Postdoctoral Fellowship” awarded by the College of Engineering and the “Lee Kuan Yew Postdoctoral Fellowship”, and she was awarded the Asian Young Scientist Fellowship in 2025. Her research interests lie in Natural Language Processing, Large Language Models, and Multimodal Reasoning, particularly investigating and harnessing the power of generative models to enhance reasoning and explainability. She has regularly served as an Area Chair for ACL, EMNLP, NAACL, and ICLR.

Jingjing Li is a Professor at the University of Electronic Science and Technology of China. His research focuses on multimodal learning; he has published over 80 papers in TPAMI and other CCF A-level venues. He has received multiple national awards, including the Wu Wenjun AI Outstanding Youth Award and the ACM SIGAI China Rising Star Award.

Program

The workshop will run from 8:45 a.m. to 12:30 p.m. and include four invited talks and four paper presentations.

Time

Programme

08:45-08:50

Opening Remarks

08:50-09:30

Keynote Speaker

Keynote: From Context to Parameters: Generalization and Transfer in Modalities and Domains

  • [Abstract]

  • [Slides]

  • Multimodal foundation models (MFMs) are increasingly deployed in dynamic, open-world environments where they must generalize to new tasks and modalities and transfer knowledge across diverse domains. Achieving this requires tackling two complementary challenges: adapting in context for immediate, task-specific generalization, and evolving parameters for long-term retention and scalable transfer. In this keynote, I will present effective strategies through the lens of context and parameter adaptations. On the context side, I will discuss how multimodal models can leverage demonstrations and structured reasoning to generalize on the fly, adapting to new tasks without additional training. On the parameter side, I will examine how models can evolve to retain prior knowledge and expand to new modalities and domains, enabling continual learning over time.
Wenya Wang
Nanyang Technological University

09:30-10:10

Keynote Speaker

Keynote (online): Continual Learning of Visual Representations

  • [Abstract]

  • [Slides]

  • Continually learning and acquiring new concepts from a dynamically changing environment is an important requirement for an artificial intelligence system. Existing deep learning methods fail to achieve this goal and suffer from significant performance degradation when retrained on a new dataset. We discuss the main approaches to continual learning: regularization, expansion architectures, and replay mechanisms. A series of recent approaches to the continual learning of image tasks will be introduced during the lecture, along with experimental results. Limitations of existing continual learning systems will also be discussed, together with directions for future research.
Adrian G. Bors
University of York

10:10-10:50

Keynote Speaker

Keynote: Continual Learning for Multi-modal Human-centric Applications

  • [Abstract]

  • [Slides]

  • Continual learning is a fundamental mechanism for enabling long-term adaptation and knowledge accumulation in intelligent systems. However, in the era of large-scale pretraining, it remains a critical challenge to extend continual learning to dynamic, heterogeneous real-world environments. This talk presents our recent efforts on continual learning in pretrained models, with a focus on two key directions. First, we propose a modality-heterogeneous continual pretraining framework for multi-modal physiological signal generation, enabling real-time and robust monitoring of human health conditions. Second, inspired by the spatial cognition mechanisms of the biological brain, we develop embodied agents that construct and refine cognitive maps through continuous collection of spatial knowledge, thus equipping multi-modal language models with strong long-horizon generalization in complex environments. Together, these advances point toward the development of brain-inspired embodied intelligence with lifelong adaptability.
Liyuan Wang
Tsinghua University

10:50-11:30

Keynote Speaker

Keynote: Generalizing Vision-Language Models to Novel Domains

  • [Abstract]

  • [Slides]

  • Vision-language pretraining has enabled powerful vision-language models (VLMs) with strong zero-shot capabilities. Yet their performance drops on domain-specific tasks, motivating research on transferring and generalizing VLM knowledge to downstream applications. This talk briefly reviews generalization settings, methodologies, and benchmarks, categorizing approaches into prompt-based, parameter-based, and feature-based methods. We also discuss our recent research on generalizing VLMs to novel domains.
Jingjing Li
University of Electronic Science and Technology of China

11:30-11:43

Morning Tea

11:43-11:51

SR-ML: A Sequence-level Routing with Mixed Low-rank Experts Framework for Continual Learning

11:51-11:59

Low Altitude-R1: Exploring the Upper Limits of Target Detection in Low-altitude Scenarios with Reinforcement Learning

11:59-12:07

LaST-LoRA: Adaptive Knowledge Reuse and Latent Subspace Tracking for Continual Learning

12:07-12:15

NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation

12:15-12:30

Panel & Closing Remarks

Submission

  • CL-25 will be held in conjunction with ACM MM 2025.
  • Accepted papers will be presented at the workshop; authors retain the right to submit them to journals.
  • We invite submissions of original research papers addressing, but not limited to, the topics listed above. Submissions should adhere to the ACM Multimedia 2025 formatting guidelines and will undergo a rigorous peer-review process. The template can be found via:
  • Submissions may range from 4 to 8 pages, with up to 2 additional pages permitted for references. There is no distinction between long and short papers; authors are free to determine the appropriate length for their paper.
  • Papers must be submitted via:

Organizers

Program Committee

Wenbin Li

Nanjing University

Qi Fan

Nanjing University

Rui Yan

Nanjing University of Science and Technology

Xiangbo Shu

Nanjing University of Science and Technology

Hongguang Zhang

Systems Engineering Institute, AMS

Qi Wang

Tsinghua University

Lei Wang

University of Wollongong

Student Organizer

Yunchen Wu

Nanjing University

Shangge Liu

Nanjing University

Important Dates

  • Paper Submission Deadline: July 11, 2025

  • Paper Acceptance Notification: August 1, 2025

  • Camera-Ready Deadline: August 11, 2025

Contacts

Contact the Organizing Committee: woods.cl.acm.mm@gmail.com