Xiaoyun Yuan, Ph.D.

I am a postdoctoral researcher at the Sigma Lab, Tsinghua University (supervisor: Dr. Lu Fang), where I work on computational photography.

We built the gigapixel dataset PANDA for large-scale, human-centric analysis, and we are hosting competitions based on it. Welcome!

Ph.D., HKUST, 2020  /  B.Sc., USTC, 2014

xiaoyunyuan AT tsinghua.edu.cn  /  Google Scholar  /  LinkedIn  /  GitHub


Our paper "GigaMVS: A Benchmark for Ultra-large-scale Gigapixel-level 3D Reconstruction" was accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Sept 24, 2021

Our paper "Mapping human brain function with massively parallel high-speed three-dimensional photoacoustic computed tomography" won the Best Paper Award in Photons Plus Ultrasound: Imaging and Sensing 2021. Mar 11, 2021

Selected publications

I am interested in gigapixel imaging, optical computing, and photoacoustic imaging. Representative papers are listed below.

A modular hierarchical array camera (cover article)
Xiaoyun Yuan*, Mengqi Ji*, Jiamin Wu, David J. Brady, Qionghai Dai, Lu Fang
Light: Science & Applications, 2021

We develop an unstructured array camera system that adopts a hierarchical modular design with multiscale hybrid cameras composing different modules. Intelligent computations are designed to collaboratively operate along both intra- and intermodule pathways. This system can adaptively allocate imagery resources to dramatically reduce the hardware cost and possesses unprecedented flexibility, robustness, and versatility.

Massively parallel functional photoacoustic computed tomography of the human brain
Shuai Na*, Jonathan J. Russin*, Li Lin*, Xiaoyun Yuan*, Peng Hu, Kay B. Jann, Lirong Yan, Konstantin Maslov, Junhui Shi, Danny J. Wang, Charles Y. Liu, Lihong V. Wang
Nature Biomedical Engineering, 2021

Here, we show that massively parallel ultrasonic transducers arranged hemispherically around the human head can produce tomographic images of the brain with a 10-cm-diameter FOV and spatial and temporal resolutions of 350 µm and 2 s, respectively. Our findings establish the use of photoacoustic computed tomography for human brain imaging.

Multiscale-VR: Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality
Jianing Zhang*, Tianyi Zhu*, Anke Zhang*, Xiaoyun Yuan*, Zihan Wang, Sebastian Beetschen, Lan Xu, Xing Lin, Qionghai Dai, Lu Fang
IEEE International Conference on Computational Photography (ICCP), 2020

In this work, we propose Multiscale-VR, a multiscale unstructured camera-array computational imaging system for high-quality gigapixel 3D panoramic videography that creates six-degree-of-freedom, multiscale, interactive VR content. The Multiscale-VR imaging system comprises scalable, cylindrically distributed global and local cameras: global stereo cameras are stitched to cover a 360° field of view, while unstructured local monocular cameras are adapted to the global cameras for flexible, high-resolution video-streaming arrangement.

High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation
Li Lin*, Peng Hu*, Xin Tong*, Shuai Na*, Rui Cao, Xiaoyun Yuan, David C Garrett, Junhui Shi, Konstantin Maslov, Lihong V Wang
Nature Communications, 2021

We developed a three-dimensional photoacoustic computed tomography (3D-PACT) system that features large imaging depth, scalable field of view with isotropic spatial resolution, high imaging speed, and superior image quality.

Multiscale gigapixel video: A cross resolution image matching and warping approach
Xiaoyun Yuan, Lu Fang, Qionghai Dai, David J Brady, Yebin Liu
IEEE International Conference on Computational Photography (ICCP), 2017

We present a multi-scale camera array to capture and synthesize gigapixel videos in an efficient way. Our acquisition setup contains a reference camera with a short-focus lens to get a large field-of-view video and a number of unstructured long-focus cameras to capture local-view details.

Magic glasses: from 2D to 3D
Xiaoyun Yuan*, Difei Tang*, Yebin Liu, Qing Ling, Lu Fang
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2016

This paper proposes a virtual 3D eyeglasses try-on system driven by a 2D Internet image of a human face wearing a pair of eyeglasses. The main technical challenge of this system is automatically reconstructing a 3D eyeglasses model from the 2D glasses on a frontal human face.

Crossnet++: Cross-scale large-parallax warping for reference-based super-resolution
Yang Tan*, Haitian Zheng*, Yinheng Zhu, Xiaoyun Yuan, Xing Lin, David Brady, Lu Fang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

We present CrossNet++, an end-to-end network containing novel two-stage cross-scale warping modules. Stage I learns to narrow down the parallax with the strong guidance of landmarks and intensity-distribution consensus. Stage II then performs finer-grained alignment and aggregation in the feature domain to synthesize the final super-resolved image. To further address large parallax, new hybrid loss functions comprising a warping loss, a landmark loss, and a super-resolution loss are proposed to regularize training and enable better convergence.

Panda: A gigapixel-level human-centric video dataset
Xueyang Wang*, Xiya Zhang*, Yinheng Zhu*, Yuchen Guo*, Xiaoyun Yuan, Liuyu Xiang, Zerun Wang, Guiguang Ding, David Brady, Qionghai Dai, Lu Fang
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2020
Paper / Project page

PANDA is the first gigaPixel-level humAN-centric viDeo dAtaset, built for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world large-scale scenes with both a wide field of view (~1 km² area) and high-resolution detail (~gigapixel-level per frame). The scenes may contain 4k head counts with over 100× scale variation. PANDA provides enriched, hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups, and 2.9k interactions.