Developing algorithms for new sequencing data analysis

Although many tools are available for next-generation sequencing (NGS) data analysis, the vast volumes of sequencing data being generated at ever-decreasing cost still call for more sensitive, specific, and efficient algorithms. Third-generation sequencing (TGS) platforms (PacBio, Oxford Nanopore, etc.) can generate reads thousands of base pairs long, which can span large genomic regions within a single haplotype. However, the error rates of TGS platforms are very high (e.g. ~15% for PacBio), which makes data analysis extremely challenging. This leaves many open questions for bioinformaticians: how to develop novel methods that overcome these drawbacks and fully exploit the properties of TGS data. In addition, newly developed technologies, such as 10X Genomics, Strand-seq, and improved single-cell sequencing, are generating diverse types of sequencing data. We are also interested in providing efficient solutions for these data types.

Functions and mechanisms of genomic rearrangements

Compared to single nucleotide variants and short indels, genomic rearrangements, or structural variations (SVs), can contribute more to polymorphism within species and to divergence between species. However, the functions of SVs and the mechanisms that generate them remain open questions. For example, both the observed patterns of chromothripsis and the hypotheses proposed to explain it remain puzzling. We are interested in in-depth, hypothesis-driven analysis of SVs.

Disease genomics

Disease is an abnormal phenotype. How genotype, together with epigenetic alterations and environmental factors, determines phenotype is a long-standing and complex problem. We are interested in a wide range of related questions, such as identifying causal variants, pan-cancer and pan-disease genomic data analysis, and integrative data analysis.

Collaborations

We have extensive collaborations across the UAB campus and nationwide.