ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA Paper • 2603.10256 • Published 3 days ago
Principled Coarse-Grained Acceptance for Speculative Decoding in Speech Paper • 2511.13732 • Published Nov 5, 2025
ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline Paper • 2508.06094 • Published Aug 8, 2025
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits Paper • 2506.09988 • Published Jun 11, 2025
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions Paper • 2411.09018 • Published Nov 13, 2024
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation Paper • 2403.01306 • Published Mar 2, 2024
MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations Paper • 2312.03631 • Published Dec 6, 2023