Discriminative Machine Learning for Blood Cancer Precision Diagnostics

Diagnosing blood cancer usually requires accurate identification of cancer cell populations in blood and bone marrow samples. Flow cytometry (FCM) is a primary diagnostic assay routinely used in clinical practice to diagnose leukemia. The assay workflow consists of multiple manual analysis steps performed by technicians, followed by interpretation by hematopathologists. Challenges to this process include technical variability in the manual analysis, the difficulty of identifying atypical leukemic cells, and the growing number of antigens measured for diagnosis.

Instead of analyzing individual patient samples ad hoc, we developed a suite of machine learning methods that leverage preexisting clinical FCM samples to improve the precise identification of leukemic cells. Our discriminative learning method optimizes cell population identification and sample classification simultaneously, which makes the otherwise “black box” machine learning classification interpretable and produces results that hematopathologists can recognize.
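The idea of optimizing gate locations and sample classification together can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the published method: a single one-dimensional interval gate whose edges are softened with logistic functions so that they are differentiable, the fraction of a sample's cells inside the gate as the sample-level feature, and a logistic-regression classifier on that feature, all trained jointly by gradient descent. The function names (soft_gate, train, predict), the temperature tau, the learning rates, and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_gate(x, a, b, tau):
    """Soft membership of marker value x in the interval gate [a, b],
    plus its partial derivatives with respect to the gate edges a and b."""
    sa = sigmoid((x - a) / tau)   # ~1 when x is above the lower edge
    sb = sigmoid((b - x) / tau)   # ~1 when x is below the upper edge
    return sa * sb, -sa * (1 - sa) * sb / tau, sa * sb * (1 - sb) / tau

def train(samples, labels, a=2.0, b=6.0, tau=0.5, steps=300, lr_clf=5.0, lr_gate=0.2):
    """Hypothetical joint fit of gate edges (a, b) and classifier (w, c) by gradient
    descent on the mean cross-entropy of p = sigmoid(w * f + c), where f is the
    soft fraction of a sample's cells that fall inside the gate."""
    w = c = 0.0
    n = len(samples)
    for _ in range(steps):
        gw = gc = ga = gb = 0.0
        for cells, y in zip(samples, labels):
            f = dfa = dfb = 0.0
            for x in cells:
                m, dm_da, dm_db = soft_gate(x, a, b, tau)
                f, dfa, dfb = f + m, dfa + dm_da, dfb + dm_db
            k = len(cells)
            f, dfa, dfb = f / k, dfa / k, dfb / k
            err = sigmoid(w * f + c) - y   # dLoss/dLogit for cross-entropy
            gw += err * f
            gc += err
            ga += err * w * dfa            # chain rule back into the gate edges
            gb += err * w * dfb
        w -= lr_clf * gw / n
        c -= lr_clf * gc / n
        a -= lr_gate * ga / n
        b -= lr_gate * gb / n
    return a, b, w, c

def predict(cells, params, tau=0.5):
    """Classify one sample with learned parameters (a, b, w, c)."""
    a, b, w, c = params
    f = sum(soft_gate(x, a, b, tau)[0] for x in cells) / len(cells)
    return sigmoid(w * f + c) > 0.5

# Hypothetical synthetic data: "positive" samples carry an extra population near 4.
rng = random.Random(0)

def make_sample(positive):
    cells = [rng.gauss(0.0, 1.0) for _ in range(80 if positive else 100)]
    if positive:
        cells += [rng.gauss(4.0, 0.5) for _ in range(20)]
    return cells

labels = [i % 2 for i in range(20)]
samples = [make_sample(y == 1) for y in labels]
params = train(samples, labels)
acc = sum(predict(s, params) == (y == 1) for s, y in zip(samples, labels)) / len(samples)
print(f"learned gate [{params[0]:.2f}, {params[1]:.2f}], training accuracy {acc:.2f}")
```

The learned interval should end up bracketing the extra population near 4 while excluding most of the shared population near 0. The soft edges controlled by tau are what make the gate position trainable by gradient descent; shrinking tau makes the gate behave more like a hard manual gate.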

Collaborating with researchers at Stanford, UC Irvine, and UC San Diego, we lead a 5-year project to develop a web-based computational infrastructure, FlowGate, that improves the accessibility and usability of cutting-edge cytometry data analysis approaches for both translational research and clinical diagnosis. The back-end infrastructure is built on a large cluster computer at the San Diego Supercomputer Center with 1,944 compute nodes, which supports web-based interactive analytics and visualization across samples.

Our experiments showed that both typical and atypical chronic lymphocytic leukemia (CLL) cells can be clearly captured using our computational approach. The approach is general and potentially applicable to other types of blood cancer. We are working with collaborators from diagnostic labs at Stanford, Children's Hospital Los Angeles (CHLA)/USC, the University of Washington, and UC San Diego to extend and apply the machine learning approach to the diagnosis of more types of blood cancer, including acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), and multiple myeloma (MM). Our goal is to improve and apply machine intelligence to elucidate cancer heterogeneity and disease endotypes in support of precision cancer medicine.

Key Findings

  • Automated gating analysis reduces human bias in current manual identification of leukemic cells
  • We developed a novel discriminative learning approach that can optimize gating locations and sample classification simultaneously
  • Non-linear embedding (dimensionality reduction) can be combined with automated gating analysis to identify atypical CLL cells that could not otherwise be identified
Automated gating analysis using DAFi identifies CLL cells in natural shapes.
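The directed filtering idea behind DAFi can be sketched as follows, assuming plain k-means as the clustering step and axis-aligned rectangular gates. At each level of a user-defined gating hierarchy, the surviving cells are clustered, and a whole cluster is kept when its centroid, projected onto the gate's two markers, lies inside the gate. Because membership is decided per cluster rather than per cell, the kept population follows the cluster's own shape instead of being clipped at the rectangle's edges. The function names, k, the farthest-point seeding, and the synthetic three-marker data are hypothetical illustrations, not the published DAFi implementation.

```python
import math
import random

def centroid(group):
    """Per-marker mean of a list of cells (tuples of marker intensities)."""
    return tuple(sum(col) / len(group) for col in zip(*group))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point seeding; returns the clusters."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups

def in_gate(point, dims, box):
    """True if `point`, projected onto marker indices `dims`, lies inside `box`."""
    return all(lo <= point[d] <= hi for d, (lo, hi) in zip(dims, box))

def directed_filter(cells, dims, box, k=3):
    """Keep whole clusters whose centroid (not each individual cell) is inside the gate."""
    if not cells:
        return []
    kept = []
    for group in kmeans(cells, k):
        if group and in_gate(centroid(group), dims, box):
            kept.extend(group)
    return kept

def apply_hierarchy(cells, gates, k=3):
    """Walk the gating hierarchy parent to child, re-clustering only the survivors."""
    for dims, box in gates:
        cells = directed_filter(cells, dims, box, k)
    return cells

# Hypothetical data: two populations that overlap on markers 0-1, plus an unrelated one.
rng = random.Random(1)

def blob(center, n, sd=0.5):
    return [tuple(rng.gauss(m, sd) for m in center) for _ in range(n)]

cells = blob((2, 2, 2), 150) + blob((2, 2, 14), 100) + blob((12, 12, 8), 150)
parent_gate = ((0, 1), ((1, 3), (1, 3)))   # rectangle on markers 0 and 1
child_gate = ((0, 2), ((1, 3), (1, 3)))    # rectangle on markers 0 and 2
level1 = apply_hierarchy(cells, [parent_gate])
level2 = apply_hierarchy(cells, [parent_gate, child_gate])
outside = [c for c in level1 if not in_gate(c, *parent_gate)]
print(len(level1), len(level2), len(outside))
```

In this toy run, level1 keeps both target populations in full, including cells that fall outside the parent rectangle, and the child gate then retains only cells from the population centered at (2, 2, 2).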
Non-linear embedding transformation (UMAP) of preexisting clinical flow cytometry data clearly identifies typical and atypical CLL, as well as minimal residual disease (MRD) cases.

Publications

Machine Learning of Discriminative Gate Locations for Clinical Diagnosis.
Cytometry. Part A : the journal of the International Society for Analytical Cytology. 2020-03-01; 97.3: 296-307.
PMID: 31691488

DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.
Cytometry. Part A : the journal of the International Society for Analytical Cytology. 2018-06-01; 93.6: 597-610.
PMID: 29665244

Funding

This work is funded in part by the US National Center for Advancing Translational Sciences (NCATS), grant number U01TR001801.

Collaborators

Jack Bui, Huan-You Wang and Nicholas Bevins
Department of Pathology, UC San Diego

Brent Wood
Children's Hospital Los Angeles/University of Southern California

Holden T. Maecker
Human Immune Monitoring Center, Stanford University

Jean Oak
Department of Pathology, Stanford University

Sindhu Cherian
Department of Laboratory Medicine, University of Washington

Padhraic Smyth
Department of Computer Science, University of California, Irvine

Quang Vinh Nguyen
School of Computing, Engineering and Mathematics, Western Sydney University
