NSforest: A Machine Learning Method to Identify Marker Genes from Single Cell/Single Nuclei RNA Sequencing Data
Cells are the fundamental functional units of multicellular organisms, with different cell types playing distinct physiological roles in the body. Single cell transcriptional profiling using RNA sequencing now produces large datasets, enabling the identification of novel human cell types at an unprecedented rate.
NSforest is a method based on random forest machine learning for identifying sets of necessary and sufficient marker genes, which can be used for quantitative PCR and multiplex FISH, and to assemble consistent and reproducible cell type definitions for incorporation into the Cell Ontology (CL). The representation of defined cell type classes and their relationships in the CL using this strategy will make the cell type classes findable, accessible, interoperable, and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the role that distinct cellular phenotypes play in human health and disease.
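The general idea of using a random forest to shortlist marker genes can be sketched as follows. This is a minimal illustration on synthetic data, not the NSforest implementation. The function names (`rank_marker_candidates`, `score_single_marker`), the midpoint threshold, and the choice of an F-beta score to evaluate a candidate are all assumptions made for the example. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score


def rank_marker_candidates(expr, labels, target, n_trees=200, seed=0):
    """Rank genes by random forest importance for a target-vs-rest classifier.

    Hypothetical helper: expr is a (cells x genes) matrix, labels holds one
    cluster label per cell, target is the cluster of interest.
    """
    y = (labels == target).astype(int)
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(expr, y)
    # Gene indices sorted from most to least important
    order = np.argsort(forest.feature_importances_)[::-1]
    return order, y


def score_single_marker(expr, y, gene, beta=0.5):
    """Score one gene as a stand-alone marker (illustrative assumption).

    A cell is called positive when the gene's expression exceeds the midpoint
    between the mean in the target cluster and the mean in all other cells.
    beta < 1 weights precision more heavily than recall.
    """
    values = expr[:, gene]
    threshold = (values[y == 1].mean() + values[y == 0].mean()) / 2
    predicted = (values > threshold).astype(int)
    return fbeta_score(y, predicted, beta=beta)


# Synthetic expression matrix: 300 cells, 20 genes, three clusters of 100 cells
rng = np.random.default_rng(0)
n_cells, n_genes = 300, 20
labels = np.repeat(np.array(["A", "B", "C"]), n_cells // 3)
expr = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)
expr[labels == "A", 3] += 10.0  # gene 3 is made a strong marker for cluster A

order, y = rank_marker_candidates(expr, labels, "A")
top_gene = int(order[0])
top_score = score_single_marker(expr, y, top_gene)
print("top-ranked gene:", top_gene, "F-beta:", round(top_score, 3))
```

In this toy setting only one gene carries signal, so a single marker is enough. On real data, several top-ranked genes would typically need to be evaluated in combination to find a set that is both necessary and sufficient for a cell type.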
Publications
Cell type discovery and representation in the era of high-content single cell phenotyping. BMC Bioinformatics. 2017-12-21; 18(Suppl 17): 559.
Funding
This work is funded by the Chan Zuckerberg Initiative DAF under grant no. 2018-182730.