In recent years, with the advancement of multimodal foundation models (MMFMs), there has been growing interest in enhancing their generalization abilities through continual learning (CL), enabling them to process diverse data types, from text to visuals, and to continuously update their capabilities from real-time inputs. Despite significant advances in the theory and applications of continual learning, the community still faces serious challenges. Our workshop provides a venue where academic researchers and industry practitioners can come together to discuss the principles, limitations, and applications of multimodal foundation models in continual learning for multimedia, to deepen understanding of multimodal foundation models in continual learning, and to promote innovative algorithms and research on new multimodal technologies and applications.
Scope and Topics
Topics of interest include, but are not limited to:
The workshop will include four invited talks and four paper presentations, running from 8:45 a.m. to 12:30 p.m.
Time | Programme
08:45-08:50 | Opening Remarks
10:10-10:50 | Keynote Speaker
Keynote: Continual Learning for Multi-modal Human-centric Applications
Continual learning is a fundamental mechanism for enabling long-term adaptation and knowledge accumulation in intelligent systems. However, in the era of large-scale pretraining, it remains a critical challenge to extend continual learning to dynamic, heterogeneous real-world environments. This talk presents our recent efforts on continual learning in pretrained models, with a focus on two key directions. First, we propose a modality-heterogeneous continual pretraining framework for multi-modal physiological signal generation, enabling real-time and robust monitoring of human health conditions. Second, inspired by the spatial cognition mechanisms of the biological brain, we develop embodied agents that construct and refine cognitive maps through continuous collection of spatial knowledge, thus equipping multi-modal language models with strong long-horizon generalization in complex environments. Together, these advances point toward the development of brain-inspired embodied intelligence with lifelong adaptability.
09:30-10:10 | Keynote Speaker
Keynote (online): Continual Learning of Visual Representations
Continually learning and acquiring new concepts from a dynamically changing environment is an important requirement for an artificial intelligence system. Existing deep learning methods fail to achieve this goal and suffer from significant performance degradation when retrained on a new dataset. We discuss the main approaches to continual learning: regularization, architecture expansion, and replay mechanisms. A series of recent approaches to the continual learning of image tasks will be introduced during the plenary lecture, and experimental results will be provided. Limitations of existing continual learning systems will also be discussed, together with directions for future research.
08:50-09:30 | Keynote Speaker
Keynote: From Context to Parameters: Generalization and Transfer in Modalities and Domains
Multimodal foundation models (MFMs) are increasingly deployed in dynamic, open-world environments where they must generalize to new tasks and modalities and transfer knowledge across diverse domains. Achieving this requires tackling two complementary challenges: adapting in context for immediate, task-specific generalization, and evolving parameters for long-term retention and scalable transfer. In this keynote, I will present effective strategies through the lens of context and parameter adaptations. On the context side, I will discuss how multimodal models can leverage demonstrations and structured reasoning to generalize on the fly, adapting to new tasks without additional training. On the parameter side, I will examine how models can evolve to retain prior knowledge and expand to new modalities and domains, enabling continual learning over time.
10:50-11:30 | Keynote Speaker
Keynote: Generalizing vision-language models to novel domains
Vision-language pretraining has enabled powerful vision-language models (VLMs) with strong zero-shot capabilities. Yet, their performance drops in domain-specific tasks, motivating research on transferring and generalizing VLM knowledge to downstream applications. This talk briefly reviews generalization settings, methodologies, and benchmarks, categorizing approaches into prompt-based, parameter-based, and feature-based methods. We also discuss our recent research on generalizing VLMs to novel domains.
11:30-11:43 | Morning Tea
11:43-11:51 | SR-ML: A Sequence-level Routing with Mixed Low-rank Experts Framework for Continual Learning
11:51-11:59 | Low Altitude-R1: Exploring the Upper Limits of Target Detection in Low-altitude Scenarios with Reinforcement Learning
11:59-12:07 | LaST-LoRA: Adaptive Knowledge Reuse and Latent Subspace Tracking for Continual Learning
12:07-12:15 | NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
12:15-12:30 | Panel & Closing Remarks
Contact the Organizing Committee: woods.cl.acm.mm@gmail.com