🚀 I received my doctoral degree from Southern Methodist University (SMU) on November 18, 2025. I am the first, and currently the only, person in SMU history to complete the Ph.D. program in just two years! Prior to that, I received my master's degree from the Chinese University of Hong Kong (CUHK) in 2022, and my bachelor's degree from the University of Electronic Science and Technology of China (UESTC) in 2020.
✨✨✨ I am currently on the job market and welcome any opportunities or discussions. Please feel free to reach out if there is a potential fit. ✨✨✨
My research interests focus on developing biomarker-driven multimodal AI systems for clinical decision support and disease progression modeling. Specifically: Medical AI -> developing multimodal models for diagnosis and clinical decision support that integrate diverse medical data.
My long-term vision is to advance AI for healthcare by building clinically grounded, trustworthy, and multimodal intelligence systems. I frame this mission through the Hippocrates paradigm:
Research Map: The Schematic Overview of My Research Vision

* Equal Contribution, † Corresponding Author
Area: Glaucoma, Ophthalmology AI, Data Mining
GlaBoost is a multimodal framework for glaucoma risk prediction that integrates structured clinical data, fundus image embeddings, and expert textual descriptions into a unified feature space.
It leverages pretrained visual and language encoders alongside an enhanced XGBoost classifier to achieve high predictive performance, reaching 98.71% validation accuracy on real-world datasets.
Importantly, its feature importance analysis aligns with clinical knowledge, offering an interpretable and scalable solution for glaucoma diagnosis.
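The late-fusion idea behind GlaBoost can be sketched as follows. This is a minimal illustration only: the embedding dimensions are hypothetical, and the actual pretrained encoders and enhanced XGBoost classifier are not reproduced here.

```python
import numpy as np

def fuse_modalities(clinical: np.ndarray,
                    image_emb: np.ndarray,
                    text_emb: np.ndarray) -> np.ndarray:
    """Concatenate one patient's features from the three modalities
    (structured clinical fields, fundus image embedding, expert text
    embedding) into a single vector for a downstream classifier."""
    return np.concatenate([clinical, image_emb, text_emb])

# Hypothetical sizes: 12 clinical fields, 512-d image embedding,
# 256-d text embedding -> one 780-d fused feature vector.
fused = fuse_modalities(np.zeros(12), np.zeros(512), np.zeros(256))
print(fused.shape)  # (780,)
```

In practice the fused vectors would be fed to a gradient-boosted tree classifier, whose per-feature importance scores give the interpretability mentioned above.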
Area: Low Resource Language Processing, Text to Speech, Generative AI
Tibetan speech modeling is constrained by limited parallel corpora across major dialects, hindering multi-dialect synthesis.
To address this, we propose TMD-TTS, a unified framework that generates parallel dialectal speech from explicit dialect labels.
By modeling fine-grained acoustic and linguistic variations, TMD-TTS significantly improves dialectal expressiveness and enables high-quality speech generation across Tibetan dialects.
Area: Glaucoma, Ophthalmology AI, Medical Report Generation
Existing methods for glaucoma report generation suffer from redundant narratives and insufficient emphasis on clinically critical features.
To address these limitations, we propose DA-SPL, a dual-attention multimodal framework that improves cross-modal representation and pathology-aware description.
It enables accurate extraction of subtle disease patterns and generates clinically consistent diagnostic reports with superior performance.
Area: Low Resource Language Processing, Large Language Model, Benchmark
Due to the lack of standardized evaluation in Tibetan NLP, existing large language models cannot be reliably assessed or compared, particularly in reasoning and safety-critical scenarios.
To address this, we propose TLUE, the first unified benchmark for Tibetan LLMs, which enables consistent, reproducible, and comprehensive evaluation.
By resolving fragmented and inconsistent evaluation practices, TLUE establishes a foundation for developing reliable and culturally aligned language models in low-resource settings.
Area: Glaucoma, Ophthalmology AI, Medical Report Generation
Retinal vessel analysis in OCTA is critical for understanding glaucoma progression, yet existing methods struggle with complex vessel structures and rely heavily on large labeled datasets.
To address this, we propose VeinCluster, an unsupervised segmentation algorithm that extracts major vessels and vascular nodes from OCTA images using pixel density-based modeling.
Without requiring extensive annotations or high computational resources, VeinCluster achieves accurate and interpretable vessel segmentation, outperforming existing methods. It further enables downstream analysis of blood flow patterns and supports glaucoma progression modeling.
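The unsupervised, density-based grouping of vessel pixels can be illustrated with a toy stand-in: threshold a pixel-density map and label its connected high-density regions. The threshold and 4-connectivity here are assumptions for illustration, not VeinCluster's actual algorithm.

```python
import numpy as np

def vessel_components(density: np.ndarray, thresh: float):
    """Label 4-connected regions of high-density pixels in a 2-D map.
    Returns a label image and the number of components found."""
    mask = density >= thresh
    labels = np.zeros(density.shape, dtype=int)
    count = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # pixel already assigned to a component
        count += 1
        stack = [(sy, sx)]
        while stack:  # iterative flood fill
            y, x = stack.pop()
            if not (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]):
                continue
            if not mask[y, x] or labels[y, x]:
                continue
            labels[y, x] = count
            stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, count

# Toy density map with two separated high-density vessel fragments.
toy = np.array([[0.9, 0.9, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.8, 0.8]])
_, n = vessel_components(toy, 0.5)
print(n)  # 2
```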
Area: AIoT for Public Health, YOLOv5, Mask Detection
IoT-based deep learning systems are often constrained by limited bandwidth and computational resources, leading to latency and deployment challenges.
To address this, we develop an improved lightweight YOLOv5 framework for efficient edge-side applications, including mask detection, vehicle counting, and target tracking.
By optimizing model efficiency and deploying via Docker and Kubernetes, the system achieves faster inference, reduced storage requirements, and seamless edge–cloud interaction.
This enables real-time, scalable, and resource-efficient intelligent services in IoT environments.
Area: Resource-Efficient AI, Support Vector Machine, Meter Detection
Existing meter recognition methods rely heavily on deep learning and large-scale data, limiting their effectiveness in small or occluded datasets.
To overcome this, we propose MC-FE, a feature-driven multi-classifier framework that adaptively selects discriminative features, along with ML-KRP for precise localization.
The approach enables robust and accurate recognition without large-scale data, outperforming state-of-the-art methods in small-data scenarios.
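The "adaptively selects discriminative features" step can be sketched with a generic variance-based ranking; this is a hypothetical stand-in for MC-FE's selection criterion, shown only to make the idea concrete.

```python
import numpy as np

def select_top_features(X: np.ndarray, k: int) -> np.ndarray:
    """Rank feature columns by variance across samples and keep the
    k most variable ones (low-variance columns carry little
    discriminative signal). Returns sorted column indices."""
    order = np.argsort(X.var(axis=0))[::-1]
    return np.sort(order[:k])

# Column 0 is constant, so the two varying columns are kept.
X = np.array([[1.0, 5.0, 2.0],
              [1.0, 9.0, 2.5],
              [1.0, 1.0, 2.2]])
print(select_top_features(X, 2))  # [1 2]
```

A small fixed feature budget like this is one way classifiers stay usable on small or occluded datasets where deep models overfit.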

Project: Federated Learning Alignment Method under Multi-Type Data Distribution (2022KQNCX084)
Category: Guangdong Province Higher Education Youth Innovation Talent Project - Natural Science
Role: Co-PI, with Dr. Siyang Jiang
Description: A study on developing alignment methods in federated learning to address challenges posed by heterogeneous data distributions across clients, including differences in features, labels, and modalities.
→ Survey:
→ Benchmark:
→ Dataset: