[2024-Mar-13] Guiding Instruction-based Image Editing via Multimodal Large Language Models

Institute of Information Systems and Applications
Speaker:	Dr. Tsu-Jui Fu (Ph.D. candidate at UCSB)
Topic:	Guiding Instruction-based Image Editing via Multimodal Large Language Models
Date:	13:20-15:00 Wednesday 13-Mar-2024
Link:	https://meet.google.com/iid-yado-ftt
Location:	Delta 103
Hosted by:	Prof. Chun-Yi Lee

Abstract

Instruction-based image editing improves the controllability and flexibility of image manipulation by using natural commands instead of elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and in generating visually aware responses through language models. In this talk, we investigate how MLLMs can facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). The talk will include a background review of MLLMs and diffusion models for visual generation, so everyone is welcome to join!

Bio

Tsu-Jui (https://tsujuifu.github.io) is a Ph.D. candidate at UCSB and an incoming research scientist at Apple. His research focuses on vision and language and on text-guided visual editing. He is also interested in language grounding and information extraction. He has completed research internships at Apple AI/ML, Meta AI, Microsoft Azure AI, and Microsoft Research.

All faculty and students are welcome to join.
