MPhil Student @ HKUST(GZ)

Zikai Zhou

I am an MPhil researcher at HKUST(GZ), working on generative AI, efficient sampling, and visual foundation models. My research focuses on diffusion and flow-matching sampling, inference acceleration, video generation, and data-centric model improvement.

Email | Google Scholar | GitHub

2026

Core contributor to Qwen Image 2.0, 3.0, and Turbo foundation model iterations.

ICML 2026

Gold Reviewer, Top 25%.

ICCV 2025

Golden Noise accepted at ICCV and cited 103 times on Google Scholar.

Research

Efficient generation from algorithms to data flywheels.

Sampling and acceleration

Designing faster and more reliable sampling procedures for diffusion, flow matching, and masked generative transformers.

Data-centric generation

Building dataset condensation, filtering, distillation, and evaluation pipelines that close the loop between user signals and model training.

Video and image foundation models

Developing T2V, I2V, V2V, and text-to-image systems with stronger fidelity, controllability, and production readiness.

Evaluation pitfalls

Studying where generative evaluation breaks, including guidance, benchmark design, and automated pre-training feedback.

Publications

Research papers and technical reports

Full publication list

Technical Report 2026 | Post-training

Qwen-Image-2.0-RL Technical Report

Qwen Team, including Zikai Zhou.

Presents a reinforcement learning and on-policy distillation post-training pipeline for Qwen-Image-2.0, improving generation quality, instruction following, and image editing accuracy.

arXiv 2026 | Agentic image generation

Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

Qwen Team, including Zikai Zhou.

Introduces Qwen-Image-Agent, a context-centric framework that plans, reasons, searches, uses memory, and incorporates feedback to bridge underspecified real-world requests and sufficient generation context.

Technical Report 2026 | Embodied world model

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Qwen Team, including Zikai Zhou.

Introduces Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence, targeting robotic manipulation, autonomous driving, indoor navigation, and language-guided planning.

Technical Report 2026 | Qwen Image

Qwen-Image-2.0 Technical Report

Qwen Team, including Zikai Zhou.

Documents the second-generation Qwen Image foundation model, including model iteration, data construction, training, and evaluation practices for large-scale text-to-image generation.

Technical Report 2026 | VAE

Qwen-Image-VAE-2.0 Technical Report

Qwen Team, including Zikai Zhou.

Presents the visual autoencoding component behind Qwen Image, focusing on compact visual representation, reconstruction quality, and downstream generation fidelity.

Technical Report 2026 | Fast generation

Qwen-Image-Flash: Beyond Objective Design

T. Wu, K. Yan, Zikai Zhou, L. Jiang, J. Li, J. Zhang, K. Gao, N. Tang, S. Yin, X. Chen, et al.

Introduces a fast Qwen Image variant that emphasizes practical responsiveness while maintaining visual quality and instruction-following behavior.

Technical Report 2026 | Evaluation

Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

Qwen Team, including Zikai Zhou.

Builds an evaluation suite for modern text-to-image models, moving beyond simple generation quality toward creative instruction following and practical creation scenarios.

ICML 2026 | Few-step generation

Optimizing Few-Step Generation with Adaptive Matching Distillation

Lichen Bai*, Zikai Zhou*, Wenliang Zhong, Shitong Shao, Shuo Yang, Shuo Chen, Bojun Cheng, Zeke Xie.

Studies how to distill generation trajectories into a small number of sampling steps through adaptive matching, targeting fast inference without sacrificing image quality.

ICML 2026 | Video editing

Lightning Unified Video Editing via In-Context Sparse Attention

Shitong Shao*, Zikai Zhou*, Haopeng Li, Yingwei Song, Wenliang Zhong, Lichen Bai, Zeke Xie.

Uses in-context sparse attention to unify video editing operations in a faster pipeline, improving edit consistency while reducing unnecessary attention computation.

ICML 2026 | Video diffusion

Exploring Data-Free LoRA Transferability for Video Diffusion Models

Y. Wang, W. Zhong, Lichen Bai, Zikai Zhou, Shitong Shao, Bojun Cheng, Shuo Chen, Shuo Yang, et al.

Examines whether LoRA modules can transfer across video diffusion models without access to the original training data, clarifying when lightweight adaptation remains reusable.

arXiv 2026 | Efficient DiT

PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers

Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Haoyi Xiong, Zeke Xie.

Proposes piecewise sparse attention for diffusion transformers, aiming to retain global generation quality while spending attention only where it matters.

TPAMI 2026 | T2I generation

Improved and Accelerated Text-to-Image Generation with Collect, Reflect, and Refine

Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Bo Han, Zeke Xie.

Builds a closed-loop generation framework that collects feedback, reflects on failure modes, and refines outputs to improve and accelerate text-to-image generation.

ICLR 2026 | Evaluation

Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

Dian Xie*, Shitong Shao*, Lichen Bai, Zikai Zhou, Bojun Cheng, Shuo Yang, Jun Wu, Zeke Xie.

Shows that guidance choices can distort text-to-image evaluation, motivating evaluation protocols that avoid misleading comparisons across models.

CVPR 2026 | Dataset condensation

Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data

Rui Huang, Shitong Shao, Zikai Zhou, Tian Ye, Lichen Bai, Shuo Yang, Zeke Xie.

Condenses diffusion training data so that models can learn efficiently from smaller, more informative synthetic subsets.

ICCV 2025 | 103 citations

Golden Noise for Diffusion Models: A Learning Framework

Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie.

Introduces a learning framework for finding stronger initial noise patterns, improving diffusion generation by optimizing the starting point of the sampling process.

ICLR 2025 | Video synthesis

IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis

Shitong Shao*, Zikai Zhou*, Lichen Bai, Haoyi Xiong, Zeke Xie.

Combines image diffusion priors with video sampling to improve visual fidelity and temporal generation in video synthesis.

ICLR 2025 | Sampling

Zigzag Diffusion Sampling: The Path to Success Is Zigzag

Lichen Bai, Shitong Shao, Zikai Zhou, Zipeng Qi, Zhiqiang Xu, Haoyi Xiong, Zeke Xie.

Explores a non-monotonic sampling path where diffusion models can self-reflect during generation, improving sample quality through zigzag refinement.

arXiv 2024 / IJCV under review | Masked generative transformer

Exploring the Design Space for Inference Enhancement and Acceleration of Masked Generative Transformer

Shitong Shao*, Zikai Zhou*, Tian Ye, Lichen Bai, Zhiqiang Xu, Shuo Yang, Bo Han, Zeke Xie.

Systematically studies inference design choices for high-resolution masked generative transformers, identifying practical levers for quality and speed.

TPAMI under review | Flow sampling

Reflective Flow Sampling Enhancement

Zikai Zhou*, Muyao Wang*, Shitong Shao, Dian Xie, Lichen Bai, Zeke Xie.

Extends reflective sampling ideas to flow-based generation, targeting stronger sample quality through intermediate trajectory correction.

NeurIPS 2024 | Dataset condensation

Elucidating the Design Space of Dataset Condensation

Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen.

Maps the main choices in dataset condensation and clarifies how different recipes influence downstream training performance.

IJCAI 2024 Oral | Knowledge distillation

Rethinking Centered Kernel Alignment in Knowledge Distillation

Zikai Zhou, Yunhang Shen, Shitong Shao, Linrui Gong, Shaohui Lin.

Revisits CKA as a distillation signal and analyzes how representation alignment should be used for more effective teacher-student learning.

IJCNN 2024 | Adversarial attacks

Enhancing Adversarial Attacks: The Similar Target Method

S. Zhang, Z. Wang, Zikai Zhou, J. Liu, H. Chen.

Improves adversarial attack construction by selecting similar target classes, making perturbation objectives more effective and semantically grounded.

arXiv 2023 | Normalization

AFN: Adaptive Fusion Normalization via an Encoder-Decoder Framework

Zikai Zhou, S. Zhang, Z. Wang, H. Chen.

Proposes an encoder-decoder normalization framework that adaptively fuses feature statistics for more flexible representation transformation.

Under review | Video diffusion

The Blessing of Smooth Initialization for Video Diffusion Models

Shitong Shao, Lichen Bai, Zikai Zhou, Tian Ye, Yunfeng Cai, Kaishun Wu, Zeke Xie.

Studies how smooth initialization can stabilize video diffusion generation and improve temporal coherence at inference time.

* Equal contribution. Technical report entries reflect the Qwen Image series contributions listed in the CV and Google Scholar profile.

Experience

Research that ships into model systems.

Mar. 2026 - Present

Research Intern, Qwen Image Foundation Model Team, Alibaba Group

Core contributor to Qwen Image 2.0, 3.0, and Turbo. Built automated data flywheels, large-scale distillation pipelines, and pre-training evaluation systems for image foundation models.

Jun. 2024 - Feb. 2026

Algorithm Engineer, Fuguang AI

Led R&D and deployment of T2V, I2V, and V2V models, including data collection, auto-labeling, quality assessment, inference optimization, and prompt-engineering modules.

Apr. 2024 - Sep. 2025

Research Assistant, xLeaf Lab, HKUST(GZ)

Worked on AIGC inference acceleration and sampling optimization for diffusion and flow matching, contributing to Golden Noise and Zigzag Diffusion Sampling.

Education

HKUST(GZ) and BIT

MPhil student in Artificial Intelligence at HKUST(GZ), supervised by Prof. Zeke Xie. B.Eng. in Computer Science from Beijing Institute of Technology, Outstanding Graduate 2025.

Academic Service

Reviewer and chairing

ICML 2026 Gold Reviewer. Session Chair at IJCAI 2024. Reviewer for ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, AAAI, ACM MM, and IJCAI.

Talk

Inference optimization

Invited by Qingke Community to present research on inference optimization for diffusion models.