In recent years, with the advancement of multimodal foundation models (MMFMs), there has been growing interest in enhancing their generalization abilities through continual learning (CL), enabling them to process diverse data types, from text to visuals, and to continuously update their capabilities from real-time inputs. Despite significant advances in the theory and applications of continual learning, the community still faces serious challenges. Our workshop aims to provide a venue where academic researchers and industry practitioners can come together to discuss the principles, limitations, and applications of multimodal foundation models in continual learning for multimedia, to promote understanding of these models, and to foster research on innovative algorithms and new multimodal technologies and applications.
Scope and Topics
Topics of interest include, but are not limited to:
The workshop includes four invited keynotes and four oral paper presentations, and runs as a full-day meeting.
Time | Programme
09:00-09:05 | Opening Remarks
09:05-09:50 | Keynote: Adaptation Without Forgetting: Repurposing Foundation Models for Zero-Shot and Few-Shot Semantic Segmentation
Foundation vision models, trained in either a supervised or an unsupervised manner, possess extensive knowledge about diverse object appearances. These models are often adapted to new computer vision tasks, such as transitioning from classification to segmentation, by adding extra parameters. In practice, however, this adaptation often relies on a limited set of object categories, causing the system to overfit to the seen categories and forget the foundation model's knowledge about other categories. In this talk, we present our recent efforts to address this challenge, with a focus on zero-shot and few-shot semantic segmentation. Our findings demonstrate that parameter-efficient tuning, carefully designed loss functions, and specific inputs for the newly added module can significantly enhance performance compared to straightforward extensions of foundation models.
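The core idea in this abstract, adapting a frozen foundation model by training only a small set of new parameters, can be illustrated with a minimal PyTorch sketch. This is not the speaker's implementation; the backbone choice, head architecture, and class count are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Frozen pretrained backbone: its knowledge about diverse categories is preserved
backbone = nn.Sequential(*list(resnet50(weights=ResNet50_Weights.DEFAULT).children())[:-2])
for p in backbone.parameters():
    p.requires_grad = False  # no backbone updates -> no forgetting in the backbone

# Small trainable head mapping backbone features to per-pixel class logits
num_classes = 21  # hypothetical number of segmentation classes
head = nn.Sequential(
    nn.Conv2d(2048, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, num_classes, kernel_size=1),
)

def segment(images: torch.Tensor) -> torch.Tensor:
    feats = backbone(images)   # (B, 2048, H/32, W/32)
    logits = head(feats)       # (B, num_classes, H/32, W/32)
    # upsample logits back to the input resolution
    return nn.functional.interpolate(
        logits, size=images.shape[-2:], mode="bilinear", align_corners=False
    )

# Only the head's parameters are trained
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
```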
09:50-10:35 | Keynote: Adapting Foundation Models: A Case Study on Remote Sensing Imagery
Large visual models, such as CLIP and Stable Diffusion (SD), demonstrate remarkable performance in general image recognition and generation tasks. Their continual learning is twofold: enhancing their performance with more natural images as input, and adapting them to specialized image domains. Our research targets the latter, using remote sensing (RS) imagery as a case study. RS relies on specialized satellites, presents challenges in image annotation, and suffers from data scarcity and class imbalance, especially in specialized spectral bands. Adapting models in this domain often leads to strong biases, where features of major classes overshadow those of minor classes. To address this, we recently introduced debLoRA, a generic training approach compatible with various low-rank model adaptation methods (such as LoRA) that produces debiased features. In this talk, we will delve into this method and present the results achieved.
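As a reference point for the low-rank adaptation methods this keynote builds on, a plain LoRA linear layer can be sketched in a few lines of PyTorch. This shows standard LoRA, not the debiasing procedure debLoRA adds on top, and the rank and scaling values are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay fixed
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrap an existing projection; only ~2 * r * d parameters are trained
layer = LoRALinear(nn.Linear(768, 768), r=8)
```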
10:35-11:00 | Morning Tea
11:00-11:45 | Keynote: Evolving AI: Advancing Continual Learning in Large Language Models
Continual learning with large language models (LLMs) is crucial for enabling AI systems to adapt and evolve in real time, maintaining and enhancing knowledge without succumbing to catastrophic forgetting, and thereby ensuring sustained operational efficiency and relevance. This talk explores the integration of continual learning with large language models across multimodal information sources. We begin by reviewing traditional continual learning, illustrating its application to text, image, and speech extraction and to multimodal knowledge graph construction. We then redefine continual learning for LLMs, focusing on overcoming catastrophic forgetting and enhancing knowledge retention through continual pre-training, instruction tuning, and alignment. Looking ahead, we discuss challenges such as data evolution and contamination, and propose innovations in architectures and learning paradigms, including the evolution of language agents and proactive continual learning.
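One common recipe for mitigating catastrophic forgetting during continual instruction tuning is rehearsal: mixing a small replay buffer of earlier tasks' examples into each new task's training data. The sketch below shows only that data-mixing step; the ratio, quota, and the `fine_tune` call in the outline are illustrative assumptions, not the speaker's method.

```python
import random

def build_continual_trainset(new_task_examples, replay_buffer, replay_ratio=0.2):
    """Mix new-task instruction data with rehearsal examples from earlier tasks."""
    n_replay = int(len(new_task_examples) * replay_ratio)
    rehearsal = random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    mixed = new_task_examples + rehearsal
    random.shuffle(mixed)
    return mixed

def update_replay_buffer(replay_buffer, new_task_examples, per_task_quota=500):
    """Keep a small reservoir of each task's data for future rehearsal."""
    kept = random.sample(new_task_examples, min(per_task_quota, len(new_task_examples)))
    return replay_buffer + kept

# Training-loop outline: fine-tune on each task's mixed set in sequence
# buffer = []
# for task in task_stream:
#     data = build_continual_trainset(task.examples, buffer)
#     fine_tune(model, data)  # any standard instruction-tuning step (hypothetical helper)
#     buffer = update_replay_buffer(buffer, task.examples)
```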
11:45-12:30 | Keynote: Zero- and Few-Shot Keypoint Detection: From Modulation to Multimodal Prompting
Keypoint detection has been an important topic in computer vision for over 20 years. Early methods relied on unsupervised techniques such as Hessian or Harris corner detectors, or SIFT interest points. Modern keypoint detection can now be performed within a few-shot learning paradigm, where annotated support keypoints (e.g., paw, nose, ears, eyes) are detected in an unannotated query image. Applications of such keypoints include pose estimation, fine-grained recognition, and pose warping. In this talk, I will discuss our earlier work on few-shot keypoint detection (FSKD), which can generalize to unseen animal species (e.g., training on dogs, testing on cats) and keypoint types (e.g., training on paws, testing on ears). I will also cover how saliency maps and DINO features can enhance attention in keypoint detection, how the traditional modulation and detection stages can be streamlined into a single step, and how contrastive learning can improve performance. Finally, I will explain our recent work on multimodal (image, text) keypoint prompting with CLIP for generalized zero- and few-shot keypoint detection.
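The idea of text-prompted keypoint localization with CLIP can be conveyed with a toy sketch: score sliding-window crops against a keypoint description and pick the best match. This is a deliberately naive illustration, not the speaker's method; the checkpoint, window size, and scoring scheme are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def locate_keypoint(image: Image.Image, prompt: str, window=64, stride=32):
    """Score sliding-window crops against a text prompt; return the best window's center."""
    crops, centers = [], []
    for y in range(0, image.height - window + 1, stride):
        for x in range(0, image.width - window + 1, stride):
            crops.append(image.crop((x, y, x + window, y + window)))
            centers.append((x + window // 2, y + window // 2))
    inputs = processor(text=[prompt], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_text[0]  # similarity of the prompt to each crop
    return centers[int(sims.argmax())]

# e.g. locate_keypoint(Image.open("cat.jpg"), "the left ear of a cat")
```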
12:30-14:00 | Lunch
14:00-14:30 | Fast and Accurate Continual Test Time Domain Adaptation
14:30-15:00 | Incremental Image Generation with Diffusion Models by Label Embedding Initialization and Fusion
15:00-15:30 | EAGLE Network: A Novel Incremental Learning Framework for Detecting Unknown Logos in Open-World Environments
15:30-16:00 | Afternoon Tea
16:00-16:30 | FAM-Logo: Forward Compatible Multimodal Framework for Few-Shot Logo Incremental Classification
16:30-17:15 | Panel & Closing Remarks
Contact the Organizing Committee: woods.cl.acm.mm@gmail.com