-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 125 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 52 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 84
Collections
Discover the best community collections!
Collections including paper arxiv:2407.14358
-
Taming Data and Transformers for Audio Generation
Paper • 2406.19388 • Published -
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Paper • 2406.11768 • Published • 20 -
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper • 2407.02869 • Published • 18 -
Stable Audio Open
Paper • 2407.14358 • Published • 22
-
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Paper • 2407.15841 • Published • 38 -
Stable Audio Open
Paper • 2407.14358 • Published • 22 -
PlacidDreamer: Advancing Harmony in Text-to-3D Generation
Paper • 2407.13976 • Published • 5 -
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Paper • 2407.14329 • Published • 4
-
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 9 -
mozilla-foundation/common_voice_17_0
Viewer • Updated • 13M • 53.3k • 142 -
Stable Audio Open
Paper • 2407.14358 • Published • 22 -
fnlp/AnyGPT-chat
Text Generation • Updated • 5.21k • 15