Xiaoyun Yuan, Ph.D.

I am a postdoctoral researcher (supervisor: Dr. Lu Fang) at the Sigma Lab, Tsinghua University, where I work on Optical Intelligent Computing and Computational Photography.

Updated Oct 27, 2023

Ph.D., HKUST 2020          B.Sc., USTC 2014

xyuanag AT connect.ust.hk  /  Google Scholar  /  Github


1. Our paper "Training large-scale optoelectronic neural networks with dual-neuron optical-artificial learning" was accepted by Nature Communications. Sept 25, 2023.

2. We are holding the GigaVision challenges, with a total prize of USD 400,000 (CNY 3,000,000). The GigaVision program seeks to revolutionize computer vision by combining gigapixel videography's wide field of view with high-resolution detail. Welcome! Sept 1, 2022

3. Our paper "A multichannel optical computing architecture for advanced machine vision" was accepted by Light: Science & Applications. Aug 18, 2022

Selected publications

I'm interested in Gigapixel Imaging, Optical Computing, and Photoacoustic Imaging. Representative papers are listed below.

Training large-scale optoelectronic neural networks with dual-neuron optical-artificial learning
Xiaoyun Yuan, Yong Wang, Zhihao Xu, Tiankuang Zhou, Lu Fang
Nature Communications, 2023
Paper      Code

We present DANTE, a dual-neuron optical-artificial learning architecture. Optical neurons model the optical diffraction, while artificial neurons approximate the intensive optical-diffraction computations with lightweight functions. DANTE also improves convergence by employing iterative global artificial-learning steps and local optical-learning steps. In simulation experiments, DANTE successfully trains large-scale ONNs with 150 million neurons on ImageNet, previously unattainable, and accelerates training speeds significantly on the CIFAR-10 benchmark compared to single-neuron learning. In physical experiments, we develop a two-layer ONN system based on DANTE, which can effectively extract features to improve the classification of natural images.
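To make the dual-neuron idea concrete, here is a minimal toy sketch (not the authors' DANTE code): an expensive "optical neuron" (here, a toy phase-mask-plus-FFT diffraction model) is approximated by a lightweight "artificial neuron" surrogate, which a local optical-learning step refits to the optical system's outputs. The linear surrogate, all function names, and the toy dimensions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def optical_forward(x, phase):
    """Stand-in for optical diffraction: intensity after a phase mask and FFT."""
    field = x * np.exp(1j * phase)
    return np.abs(np.fft.fft(field)) ** 2

def artificial_forward(x, w):
    """Lightweight surrogate: a linear map standing in for the optics."""
    return x @ w

def local_optical_learning(xs, phase, w, lr=1e-2, steps=200):
    """Local step: refit the surrogate to the optical system's outputs."""
    ys = np.stack([optical_forward(x, phase) for x in xs])  # expensive pass, done once
    for _ in range(steps):
        grad = xs.T @ (artificial_forward(xs, w) - ys) / len(xs)
        w = w - lr * grad  # gradient descent on the surrogate's fit error
    return w

n = 8
phase = rng.uniform(0, 2 * np.pi, n)    # fixed "optical" parameters
w = rng.normal(scale=0.1, size=(n, n))  # surrogate weights
xs = rng.normal(size=(32, n))           # a batch of inputs

ys = np.stack([optical_forward(x, phase) for x in xs])
err_before = np.mean((artificial_forward(xs, w) - ys) ** 2)
w = local_optical_learning(xs, phase, w)
err_after = np.mean((artificial_forward(xs, w) - ys) ** 2)
```

In the full method, such local refits would alternate with global artificial-learning steps that train the network end to end through the cheap surrogate; the sketch shows only the local fitting half of that loop.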

A multichannel optical computing architecture for advanced machine vision (Editors' Highlight)
Zhihao Xu*, Xiaoyun Yuan*, Tiankuang Zhou, Lu Fang
Light: Science & Applications, 2022

Herein, we develop Monet: a multichannel optical neural network architecture for universal multiple-input multiple-channel optical computing, based on a novel projection-interference-prediction framework in which the inter- and intra-channel connections are mapped to optical interference and diffraction. For the first time, Monet validates that multichannel processing can be implemented optically with high efficiency, enabling real-world intelligent multichannel-processing tasks, including 3D and motion detection, to be solved via optical computing.

GigaMVS: A Benchmark for Ultra-large-scale Gigapixel-level 3D Reconstruction
Jianing Zhang*, Jinzhi Zhang*, Shi Mao*, Mengqi Ji, Guangyu Wang, Zequn Chen, Tian Zhang, Xiaoyun Yuan, Qionghai Dai, Lu Fang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

Multiview stereopsis (MVS) methods, which can reconstruct both the 3D geometry and texture from multiple images, have been rapidly developed and extensively investigated from the feature engineering methods to the data-driven ones. However, there is no dataset containing both the 3D geometry of large-scale scenes and high-resolution observations of small details to benchmark the algorithms. To this end, we present GigaMVS, the first gigapixel-image-based 3D reconstruction benchmark for ultra-large-scale scenes. ...

A modular hierarchical array camera (cover article)
Xiaoyun Yuan*, Mengqi Ji*, Jiamin Wu, David J. Brady, Qionghai Dai, Lu Fang
Light: Science & Applications, 2021

We develop an unstructured array camera system that adopts a hierarchical modular design with multiscale hybrid cameras composing different modules. Intelligent computations are designed to collaboratively operate along both intra- and intermodule pathways. This system can adaptively allocate imagery resources to dramatically reduce the hardware cost and possesses unprecedented flexibility, robustness, and versatility.

Massively parallel functional photoacoustic computed tomography of the human brain
Shuai Na*, Jonathan J. Russin*, Li Lin*, Xiaoyun Yuan*, Peng Hu, Kay B. Jann, Lirong Yan, Konstantin Maslov, Junhui Shi, Danny J. Wang, Charles Y. Liu, Lihong V. Wang
Nature Biomedical Engineering, 2021

Here, we show that massively parallel ultrasonic transducers arranged hemispherically around the human head can produce tomographic images of the brain with a 10-cm-diameter FOV and spatial and temporal resolutions of 350 µm and 2 s, respectively. Our findings establish the use of photoacoustic computed tomography for human brain imaging.

Multiscale-VR: Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality
Jianing Zhang*, Tianyi Zhu*, Anke Zhang*, Xiaoyun Yuan*, Zihan Wang, Sebastian Beetschen, Lan Xu, Xing Lin, Qionghai Dai, Lu Fang
IEEE International Conference on Computational Photography (ICCP), 2020

In this work, we propose Multiscale-VR, a multiscale unstructured camera array computational imaging system for high-quality gigapixel 3D panoramic videography that creates the six-degree-of-freedom multiscale interactive VR content. The Multiscale-VR imaging system comprises scalable cylindrical-distributed global and local cameras, where global stereo cameras are stitched to cover 360° field-of-view, and unstructured local monocular cameras are adapted to the global camera for flexible high-resolution video streaming arrangement.

High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation
Li Lin*, Peng Hu*, Xin Tong*, Shuai Na*, Rui Cao, Xiaoyun Yuan, David C Garrett, Junhui Shi, Konstantin Maslov, Lihong V Wang
Nature Communications, 2021

We developed a three-dimensional photoacoustic computed tomography (3D-PACT) system that features large imaging depth, scalable field of view with isotropic spatial resolution, high imaging speed, and superior image quality.

Multiscale gigapixel video: A cross resolution image matching and warping approach
Xiaoyun Yuan, Lu Fang, Qionghai Dai, David J Brady, Yebin Liu
IEEE International Conference on Computational Photography (ICCP), 2017

We present a multi-scale camera array to capture and synthesize gigapixel videos in an efficient way. Our acquisition setup contains a reference camera with a short-focus lens to get a large field-of-view video and a number of unstructured long-focus cameras to capture local-view details.

Magic glasses: from 2D to 3D
Xiaoyun Yuan*, Difei Tang*, Yebin Liu, Qing Ling, Lu Fang
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2016

This paper proposes a virtual 3D eyeglasses try-on system driven by a 2D Internet image of a human face wearing a pair of eyeglasses. The main technical challenge of this system is automatically reconstructing a 3D eyeglasses model from the 2D glasses on a frontal human face.

Crossnet++: Cross-scale large-parallax warping for reference-based super-resolution
Yang Tan*, Haitian Zheng*, Yinheng Zhu, Xiaoyun Yuan, Xing Lin, David Brady, Lu Fang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

We present CrossNet++, an end-to-end network containing novel two-stage cross-scale warping modules. Stage I learns to narrow down the parallax with the strong guidance of landmarks and intensity-distribution consensus. Stage II then performs finer-grained alignment and aggregation in the feature domain to synthesize the final super-resolved image. To further address large parallax, new hybrid loss functions comprising a warping loss, a landmark loss, and a super-resolution loss are proposed to regularize training and enable better convergence.

Panda: A gigapixel-level human-centric video dataset
Xueyang Wang*, Xiya Zhang*, Yinheng Zhu*, Yuchen Guo*, Xiaoyun Yuan, Liuyu Xiang, Zerun Wang, Guiguang Ding, David Brady, Qionghai Dai, Lu Fang
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2020
Paper / Project page

PANDA is the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world large-scale scenes with both a wide field of view (~1 km^2 area) and high-resolution details (~gigapixel-level per frame). The scenes may contain up to 4k head counts with over 100× scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups, and 2.9k interactions.