multimodal llm
GPT-4V(ision) is a Generalist Web Agent, if Grounded Paper • 2401.01614 • Published Jan 3 • 21
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models Paper • 2402.13577 • Published Feb 21 • 7
Text2Image
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper • 2401.11708 • Published Jan 22 • 28
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19 • 14