Collections including paper arxiv:2407.14358

Papers I want to read

Papers in my to-read list

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 52
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 84

generative audio

Taming Data and Transformers for Audio Generation

Paper • 2406.19388 • Published Jun 27
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Paper • 2406.11768 • Published Jun 17 • 20
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3 • 18
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

stabilityai/stable-audio-open-1.0

Text-to-Audio • Updated Jul 31 • 23.4k • 859
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22
Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 52

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 38
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22
PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Paper • 2407.13976 • Published Jul 19 • 5
Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Paper • 2407.14329 • Published Jul 19 • 4

Audio generation

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11 • 13
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9
mozilla-foundation/common_voice_17_0

Viewer • Updated Jun 16 • 13M • 53.3k • 142
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22
fnlp/AnyGPT-chat

Text Generation • Updated Jun 5 • 5.21k • 15
