LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs • arXiv:2408.07055 • Published Aug 13, 2024 • 65 upvotes
Scalify: scale propagation for efficient low-precision LLM training • arXiv:2407.17353 • Published Jul 24, 2024 • 11 upvotes
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models • arXiv:2407.11062 • Published Jul 10, 2024 • 8 upvotes
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models • arXiv:2407.12327 • Published Jul 17, 2024 • 75 upvotes
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA • arXiv:2406.17419 • Published Jun 25, 2024 • 14 upvotes
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks • arXiv:2406.12066 • Published Jun 17, 2024 • 8 upvotes
An Image is Worth 32 Tokens for Reconstruction and Generation • arXiv:2406.07550 • Published Jun 11, 2024 • 55 upvotes
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization • arXiv:2405.15071 • Published May 23, 2024 • 34 upvotes
BLINK: Multimodal Large Language Models Can See but Not Perceive • arXiv:2404.12390 • Published Apr 18, 2024 • 24 upvotes
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models • arXiv:2404.12387 • Published Apr 18, 2024 • 38 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024 • 590 upvotes
SDXL-Lightning: Progressive Adversarial Diffusion Distillation • arXiv:2402.13929 • Published Feb 21, 2024 • 27 upvotes
Instruction-tuned Language Models are Better Knowledge Learners • arXiv:2402.12847 • Published Feb 20, 2024 • 24 upvotes
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • arXiv:2402.00159 • Published Jan 31, 2024 • 59 upvotes
SliceGPT: Compress Large Language Models by Deleting Rows and Columns • arXiv:2401.15024 • Published Jan 26, 2024 • 67 upvotes
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference • arXiv:2401.08671 • Published Jan 9, 2024 • 13 upvotes
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • Published Jan 8, 2024 • 70 upvotes
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU • arXiv:2312.12456 • Published Dec 16, 2023 • 40 upvotes
Generative Multimodal Models are In-Context Learners • arXiv:2312.13286 • Published Dec 20, 2023 • 34 upvotes
LLM-FP4: 4-Bit Floating-Point Quantized Transformers • arXiv:2310.16836 • Published Oct 25, 2023 • 13 upvotes
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots • arXiv:2310.13724 • Published Oct 19, 2023 • 8 upvotes
BitNet: Scaling 1-bit Transformers for Large Language Models • arXiv:2310.11453 • Published Oct 17, 2023 • 96 upvotes
Enable Language Models to Implicitly Learn Self-Improvement From Data • arXiv:2310.00898 • Published Oct 2, 2023 • 22 upvotes
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model • arXiv:2309.16058 • Published Sep 27, 2023 • 55 upvotes
Large Language Models Cannot Self-Correct Reasoning Yet • arXiv:2310.01798 • Published Oct 3, 2023 • 32 upvotes
DreamLLM: Synergistic Multimodal Comprehension and Creation • arXiv:2309.11499 • Published Sep 20, 2023 • 58 upvotes
TextBind: Multi-turn Interleaved Multimodal Instruction-following • arXiv:2309.08637 • Published Sep 14, 2023 • 7 upvotes
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) • arXiv:2309.08968 • Published Sep 16, 2023 • 22 upvotes
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages • arXiv:2309.09400 • Published Sep 17, 2023 • 82 upvotes
Efficient Memory Management for Large Language Model Serving with PagedAttention • arXiv:2309.06180 • Published Sep 12, 2023 • 25 upvotes
Textbooks Are All You Need II: phi-1.5 technical report • arXiv:2309.05463 • Published Sep 11, 2023 • 86 upvotes
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models • arXiv:2308.16137 • Published Aug 30, 2023 • 39 upvotes
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation • arXiv:2308.15975 • Published Aug 30, 2023 • 11 upvotes
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers • arXiv:2308.13494 • Published Aug 25, 2023 • 9 upvotes