Hao Xuan

About Me

AI Researcher with 5+ years of experience developing scalable machine learning frameworks for modeling complex scientific data in biomedical and clinical domains. Research focuses on representation learning from high-dimensional biological sequences and literature through algorithmic approaches including large language models, transformer-based architectures, and multi-agent reasoning systems. Led the design of extensible alignment frameworks and domain-specific information extraction models for large-scale scientific discovery tasks. Experienced in building generalizable AI systems that bridge machine learning, natural language processing, and computational biology to enable data-driven hypothesis generation and predictive modeling. Proficient in Python, PyTorch, and C/C++ for developing research-grade AI infrastructure.

Detailed Research Interests

I am dedicated to exploring how to (1) equip AI systems with the knowledge and reasoning skills needed to solve biomedical tasks, and (2) develop scalable machine learning frameworks for biological sequence analysis.

1) AI for Biomedical Discovery
I explore how AI can assist and automate the scientific workflow in biomedical domains.

Fine-tuned LLMs: Adapt large language models for biomedical applications using parameter-efficient fine-tuning methods such as LoRA.
Biomedical NLP: Build systems for named entity recognition and information extraction from scientific literature.
Multi-agent reasoning: Develop multi-agent systems for evidence-based drug discovery and scientific inquiry.
Knowledge integration: Combine parametric and retrieved knowledge for biomedical question answering and hypothesis generation.

2) Biological Sequence Analysis
I aim to build extensible frameworks for biological sequence alignment and analysis across scales.

Multi-scale alignment: Develop general, extensible algorithmic frameworks for sequence alignment across scales and applications.
K-mer methods: Explore k-mer signature-based approaches for early cancer screening, microbiome analysis, and biomarker discovery.
Omics pipelines: Create production-grade, scalable pipelines for high-throughput omics data analysis and integrative modeling.

Research Trajectory

My research develops AI-driven computational systems that translate biological sequencing data into clinically actionable therapeutic insights through an end-to-end precision medicine pipeline: Sequencing → Representation Learning → Clinical Inference → Therapeutic Reasoning → In-Silico Screening. I build scalable frameworks to extract disease-associated molecular patterns from large-scale microbiome and genomic sequencing data, learn patient-specific biological representations, integrate molecular evidence with biomedical knowledge for clinical outcome prediction, and support treatment hypothesis generation through AI-driven reasoning. My long-term goal is to enable integrative AI systems for early cancer detection and personalized treatment in precision oncology.

Selected Publications

A general and extensible algorithmic framework to biological sequence alignment across scales and applications
Hao Xuan, Hongyang Sun, Xiangtao Liu, Hanyuan Zhang, Jun Zhang, Cuncong Zhong
bioRxiv, 2026.
[DOI]
Neonatal gut microbiota succession in mice mapped by maturation, site, injury, and single immunoglobulin interleukin-1 related receptor genotype
Hao Xuan, Shahid Umar, Wenhao Yu, Imran Ahmed, Cuncong Zhong, Michael Morowitz, Venkatesh Sampath
iScience, 2025.
[DOI]
Lactobacillus rhamnosus modulates murine neonatal gut microbiota and inflammation caused by pathogenic Escherichia coli
Hao Xuan, Shahid Umar, Cuncong Zhong, Wenhao Yu, Imran Ahmed, Jessica L. Wheatley, Venkatesh Sampath, Sabrina Chavez-Bueno
BMC Microbiology, 2024.
[DOI]

* Equal contribution

News

November 2025 Served as Co-PI on an NSF STTR Phase I proposal submitted by H2Alpha Inc. for H2Oreo, an AI-powered bioinformatics platform. The proposal is currently under revision for resubmission.

February 2025 Started as Co-Founder & Director at H2Alpha Inc.