K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published 24 days ago • 33
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published 29 days ago • 5
Q-Ground: Image Quality Grounding with Large Multi-modality Models Paper • 2407.17035 • Published Jul 24 • 1
Magpie-Qwen2 Datasets Collection Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated 5 days ago • 10
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 169
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
InternVL 2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 16 items • Updated Aug 10 • 72
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13 • 4
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? Paper • 2406.03070 • Published Jun 5 • 2
MaPO Collection This collection includes the models and datasets as a part of the MaPO release. • 9 items • Updated Jun 12 • 5
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 1 day ago • 332
LLaVA++ (LLaMA-3 and Phi-3-Mini) Collection Extending Visual Capabilities of LLaVA with LLaMA-3 and Phi-3 • 11 items • Updated Jun 11 • 22
LLaVA-LLaMA-3 Collection Reproduction of various LLaVA models based on LLaMA-3 backbone. • 4 items • Updated Aug 16 • 2
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2 • 673
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 46
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
Common Corpus Collection The largest public domain dataset for training LLMs. • 27 items • Updated Jul 17 • 111
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 63
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs Paper • 2402.07116 • Published Feb 11 • 2
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models Paper • 2312.15300 • Published Dec 23, 2023 • 2
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 264 items • Updated Jun 22 • 392
Q-Refine: A Perceptual Quality Refiner for AI-Generated Image Paper • 2401.01117 • Published Jan 2 • 8
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 47
SelfEval: Leveraging the discriminative nature of generative models for evaluation Paper • 2311.10708 • Published Nov 17, 2023 • 14
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26