view article Article Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models wubingheng โข Aug 5, 2025 โข 7