I do research on 3D computer vision at Qualcomm AI Research in San Diego. I did my PhD at Notre Dame EE under Dr. Nicholas Zabaras on deep learning for modeling PDE systems. I studied automatic control at Tongji University.
I'm interested in 3D reconstruction, inverse rendering, generative design, and XR applications. I have also worked on data compression.
Scale up object-level point cloud pretraining (global embedding) with point-text-image contrastive learning on Objaverse. Data engineering and scaling matter.
Resolve the ambiguity of material and lighting estimation with factorized precomputed radiance transfer, differentiable path tracing, radiance caching, and material segmentation.
Lift the amazing low-shot 2D part detection capability of GLIP to 3D point clouds, together with point oversegmentation and multiview fusion.
AFAIK, SwinT-ChARM is the first neural image codec that outperforms VTM in rate-distortion performance with comparable decoding time on GPU.
Built on the hyperprior model, an embedded bitstream is obtained with nested quantization and per-element sorting by prior standard deviation.
PReLU can replace GDN in the hyperprior model to compress YUV (and RGB!) images without loss of coding gain.