Qiyan Zhao

Institute of Automation, Chinese Academy of Sciences

GitHub | Google Scholar | Email: qiyanzhao618@gmail.com

About Me

I am a Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (advised by Prof. Xu-Yao Zhang), focusing on explainable AI and multimodal large language models. Previously, I was a research intern at Shanghai Jiao Tong University, where I had the privilege of collaborating with and learning from Xiaofeng Zhang. I have also collaborated closely with Dr. Koon-Ting Yip at the University of Macau.

My current interests lie in multimodal reasoning and embodied intelligence (VLA). Please feel free to reach out if you are interested in related topics.

News

2026.4 — We have 2 papers accepted to ACL 2026.
2026.2 — We have 1 paper accepted to CVPR 2026.
2026.1 — We have 1 paper accepted to ICRA 2026.
2026.1 — We have 2 papers, including 1 oral, accepted to ICLR 2026.
2025.10 — We have 1 paper accepted to ACM MM 2025.
2025.9 — We have 1 paper accepted to The Visual Computer.
2025.6 — We have 1 paper accepted to IJCNN 2025.

Publications

Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective

Qiyan Zhao, Xiaofeng Zhang, Shuochen Chang, Qianyu Chen, Xiaosong Yuan, Xuhang Chen, Luoqi Liu, Jiajun Zhang, Xu-Yao Zhang, Da-Han Wang

ICLR 2026 [Paper] [Code]

MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models

Qiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen, Da-Han Wang, Xu-Yao Zhang

ACM MM 2025 [Paper] [Code]

Hallucination Begins Where Saliency Drops

Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang

ICLR 2026 Oral [Paper] [Code]

Fixing Semantic Blind Spots in Anchor Tokens of dMLLMs

Ruixuan Xu, Jiexi Xu, Qiyan Zhao, Xiaofeng Zhang

ACL 2026 Findings [Paper]

TokenPenalty: Alleviating Attention Sinks and Positional Decay in LVLMs

Xiaofeng Zhang, Yuanchao Zhu, Qiyan Zhao, Xiaosong Yuan, Jiawei Cao, Xuhang Chen

ACL 2026 Findings [Paper]

SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

Koon-Ting Yip, Qiyan Zhao, Wenhao Yu, Liangyu Yuan, Mingkai Li, Xiaofeng Zhang, Jianmin Ji, Yanyong Zhang, Qing Jiang, Ka-Veng Yuen

CVPR 2026 [Paper]

C²ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal Models Reasoning

Koon-Ting Yip, Qiyan Zhao, Wenhao Yu, Xiaofeng Zhang, Jianmin Ji, Yanyong Zhang, Ka-Veng Yuen

ICRA 2026 [Paper] [Code]

Textmamba: Scene text detector with mamba

Qiyan Zhao, Yue Yan, Da-Han Wang

IJCNN 2025 [Paper] [Code]

Few-shot cross-modal text detection via CLIP

Qiyan Zhao, Xiaofeng Zhang, Tiange Zhang, Jiuze Li

The Visual Computer [JCR 2, CCF C] [Code]

Services

Invited Reviewer for: ICML, NeurIPS, CVPR, ICLR, ACM MM, ECCV