Qiyan Zhao

Institute of Automation, Chinese Academy of Sciences

About Me

I am a Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (advised by Prof. Xu-Yao Zhang), focusing on explainable AI and multimodal large language models. Previously, I was a research intern at Shanghai Jiao Tong University, where I had the privilege of collaborating with and learning from Xiaofeng Zhang. I have also collaborated closely with Dr. Koon-Ting Yip at the University of Macau.

My current interests lie in multimodal reasoning and embodied intelligence (VLA). Please feel free to reach out if you are interested in related topics.

News

  • 2026.4 — We have 2 papers accepted to ACL 2026.
  • 2026.2 — We have 1 paper accepted to CVPR 2026.
  • 2026.1 — We have 1 paper accepted to ICRA 2026.
  • 2026.1 — We have 2 papers, including 1 oral, accepted to ICLR 2026.
  • 2025.10 — We have 1 paper accepted to ACM MM 2025.
  • 2025.9 — We have 1 paper accepted to The Visual Computer.
  • 2025.6 — We have 1 paper accepted to IJCNN 2025.

Publications

Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
Qiyan Zhao, Xiaofeng Zhang, Shuochen Chang, Qianyu Chen, Xiaosong Yuan, Xuhang Chen, Luoqi Liu, Jiajun Zhang, Xu-Yao Zhang, Da-Han Wang
ICLR 2026 [Paper] [Code]
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models
Qiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen, Da-Han Wang, Xu-Yao Zhang
ACM MM 2025 [Paper] [Code]
Hallucination Begins Where Saliency Drops
Xiaofeng Zhang, Yuanchao Zhu, Chaochen Gu, Xiaosong Yuan, Qiyan Zhao, Jiawei Cao, Feilong Tang, Sinan Fan, Yaomin Shen, Chen Shen, Hao Tang
ICLR 2026 Oral [Paper] [Code]
Fixing Semantic Blind Spots in Anchor Tokens of dMLLMs
Ruixuan Xu, Jiexi Xu, Qiyan Zhao, Xiaofeng Zhang
ACL 2026 Findings [Paper]
TokenPenalty: Alleviating Attention Sinks and Positional Decay in LVLMs
Xiaofeng Zhang, Yuanchao Zhu, Qiyan Zhao, Xiaosong Yuan, Jiawei Cao, Xuhang Chen
ACL 2026 Findings [Paper]
SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs
Koon-Ting Yip, Qiyan Zhao, Wenhao Yu, Liangyu Yuan, Mingkai Li, Xiaofeng Zhang, Jianmin Ji, Yanyong Zhang, Qing Jiang, Ka-Veng Yuen
CVPR 2026 [Paper]
C2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal Models Reasoning
Koon-Ting Yip, Qiyan Zhao, Wenhao Yu, Xiaofeng Zhang, Jianmin Ji, Yanyong Zhang, Ka-Veng Yuen
ICRA 2026 [Paper] [Code]
TextMamba: Scene Text Detector with Mamba
Qiyan Zhao, Yue Yan, Da-Han Wang
IJCNN 2025 [Paper] [Code]
Few-Shot Cross-Modal Text Detection via CLIP
Qiyan Zhao, Xiaofeng Zhang, Tiange Zhang, Jiuze Li
The Visual Computer [JCR 2, CCF C] [Code]

Services