
Editor's Note: At the recent European Hematology Association (EHA) Annual Congress, a premier event held in Milan, Italy, top global experts in hematology convened to discuss cutting-edge diagnostic and therapeutic strategies for hematologic malignancies. Among the topics, Acute Myeloid Leukemia (AML) garnered significant attention. Professor Amin Turki of the HARMONY Alliance, on behalf of his collaborative team, presented a pivotal study titled "Development and Independent Validation of an Unsupervised Genomic Classification for Patients with Acute Myeloid Leukemia using Hierarchical Dirichlet Mixture Models," demonstrating how machine learning can bring revolutionary breakthroughs to the precision subtyping and personalized treatment of AML.
Acute Myeloid Leukemia (AML) is a highly heterogeneous malignancy whose complex biology makes precise clinical subtyping and treatment difficult. Existing clinical guidelines, such as the European LeukemiaNet (ELN) 2022 recommendations, use genetic mutations as the core basis for risk stratification, dividing patients into favorable, intermediate, and adverse-risk groups. This framework still does not fully account for the differences in prognosis observed among patients. To gain a deeper understanding of the AML genetic landscape and explore more precise biological subtypes, the HARMONY Alliance initiated a large-scale study aimed at redefining the AML classification system “from the ground up” using an innovative unsupervised learning approach.
A New Path: Exploring AML Genomic Subtyping Based on Unsupervised Learning
Whereas traditional research methods often rely on known biological or clinical labels, the core breakthrough of this study lies in its adoption of a purely unsupervised learning strategy, using only patients’ genomic feature data and “letting the data speak for itself.” The study’s lead, Professor Amin Turki, emphasized in his presentation, “Our goal was to explore the underlying biological patterns of AML from a large-scale, heterogeneous, real-world dataset through a novel data processing method. We believe that identifying patient subgroups with unique biological patterns will provide critical information for designing new therapeutic strategies and achieving individualized risk prediction.”
To this end, the research team selected an advanced class of machine learning models: hierarchical Dirichlet mixture models. This method determines the number of gene clusters adaptively and robustly, and it generalizes well, indicating significant potential for clinical classification applications.
In terms of technical breakthroughs, the study achieved key methodological innovations. Professor Turki explained, “We defined gene clusters as probability distributions, specifically as multivariate Fisher’s non-central hypergeometric distributions. This means that each patient is not manually or rigidly assigned to a category but is precisely allocated to the most probable biological cluster through statistical methods. This approach is more objective and biologically meaningful.”
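The idea of allocating each patient to the most probable cluster can be illustrated with a minimal sketch. It deliberately swaps in a much simpler model than the one described: each cluster is an independent Bernoulli profile over binary mutation flags, not a multivariate Fisher's non-central hypergeometric distribution, and the number of clusters is fixed in advance, not learned. The gene panel, cluster names, weights, and probabilities are all hypothetical values chosen for illustration and are not taken from the study.

```python
import math

# Hypothetical gene panel; a patient is a binary vector (1 = mutation present).
GENES = ["NPM1", "FLT3", "NRAS", "IDH2"]

# Hypothetical clusters: a mixture weight plus a per-gene mutation probability.
# Simplification: independent Bernoulli per gene, NOT the Fisher's non-central
# hypergeometric distribution described in the presentation.
CLUSTERS = {
    "cluster_A": {"weight": 0.5, "probs": [0.90, 0.40, 0.10, 0.05]},
    "cluster_B": {"weight": 0.3, "probs": [0.05, 0.10, 0.60, 0.05]},
    "cluster_C": {"weight": 0.2, "probs": [0.05, 0.05, 0.05, 0.80]},
}


def log_likelihood(profile, probs):
    """Log-probability of a binary mutation profile under one cluster."""
    total = 0.0
    for present, p in zip(profile, probs):
        total += math.log(p) if present else math.log(1.0 - p)
    return total


def posterior(profile, clusters=CLUSTERS):
    """Posterior probability of each cluster given the profile (Bayes' rule)."""
    log_joint = {
        name: math.log(c["weight"]) + log_likelihood(profile, c["probs"])
        for name, c in clusters.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    unnormalized = {name: math.exp(v - peak) for name, v in log_joint.items()}
    norm = sum(unnormalized.values())
    return {name: v / norm for name, v in unnormalized.items()}


def assign(profile, clusters=CLUSTERS):
    """Return the name of the most probable cluster for the profile."""
    post = posterior(profile, clusters)
    return max(post, key=post.get)
```

For example, `assign([1, 1, 0, 0])` scores a hypothetical patient with NPM1 and FLT3 mutations against every cluster and returns the one with the highest posterior probability, while `posterior` exposes the full probability vector so that uncertain assignments remain visible.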
The development cohort for this study was sourced from the HARMONY Alliance, encompassing 5,244 adult AML patients from multiple European countries and centers. All patients had comprehensive data on 86 genetic features, and the vast majority had received intensive chemotherapy as their primary treatment. The cohort was well-balanced across different age groups and ELN risk categories, providing a solid foundation for robust model training.
Major Findings: Discovery and Validation of 17 Novel Biological Subgroups
Using this innovative methodology for in-depth analysis of the large patient cohort, the research team achieved groundbreaking results: the model successfully identified 17 novel biological subgroups with distinct genetic profiles. Kaplan-Meier survival analysis revealed significant differences in Overall Survival (OS) among these 17 subgroups, demonstrating the powerful prognostic predictive capability of the new classification system.
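For readers unfamiliar with Kaplan-Meier analysis, the following is a minimal sketch of the product-limit estimator that underlies such survival curves. It is not the team's analysis code, and the function name and any input data are illustrative assumptions. In practice one curve would be computed per subgroup and the curves compared.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve
```

Calling `kaplan_meier(times, events)` once per subgroup yields step functions that can be plotted side by side. Censored patients lower the number at risk without producing a step, which is why the estimator handles incomplete follow-up.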
Particularly noteworthy was the model’s successful refinement of several important existing genetic subtypes. For instance, classic NPM1-mutated AML was subdivided into three prognostically distinct subgroups, and AML with myelodysplasia-related changes (MRC) was also divided into three categories. The re-stratification of inversion 16 (inv(16)), a classic “favorable-risk” type, perfectly illustrates the clinical value of the new model. The study found that inv(16) could be divided into two subgroups:
• Classic Subgroup: Primarily co-occurs with NRAS mutations. Its prognosis is relatively favorable, consistent with the traditional understanding of inv(16).
• High-Risk Subgroup: Co-occurs with FMS-like tyrosine kinase 3 (FLT3) mutations. Patients with this combination had significantly worse overall survival than those in the classic subgroup.
Multivariate Cox regression analysis further confirmed that after adjusting for factors such as age, ELN risk stratification, and allogeneic hematopoietic stem cell transplantation, inv(16) patients with a co-occurring FLT3 mutation had a significantly higher risk of death than the reference group, losing the prognostic advantage traditionally associated with inv(16).
To test the reliability of the findings, the team used an independent, publicly available UK National Cancer Research Institute (NCRI) trial cohort as a validation set. The results showed that the distribution of the 17 biological subgroups was almost perfectly recapitulated in the validation cohort. The clinical phenotypic characteristics of patients in each subgroup (such as age, peripheral blood blast percentage, and platelet count) were also highly consistent between the development and validation cohorts, further supporting the stability and generalizability of the classification model.
Clinical Value: A Complement and Refinement to the Existing ELN Risk Stratification
So, what is the relationship between this new 17-subgroup classification system and the current ELN 2022 guidelines? In the Q&A session, Professor Turki clearly stated that the new model serves as a complement and refinement to the existing guidelines, not a replacement.
He stated, “Our analysis shows that for many patients, the ELN risk assessment remains very accurate and effective. However, our model provides additional value in certain aspects. For example, it successfully subdivides key groups like inv(16) and NPM1, and identifies new clusters related to IDH2 and myelodysplasia. Through multivariate analysis, we found that these new subgroups provide prognostic information independent of the ELN stratification. This indicates that the new model can serve as a powerful supplement to the ELN guidelines, helping clinicians perform more refined risk assessments and opening up new possibilities for future clinical trials of targeted therapies in specific genetic subgroups.”
Expert Dialogue and Future Outlook
During the interactive session of the conference, attending experts raised several key questions about the model’s details. When asked whether the model considered the clonal architecture of mutations or Variant Allele Frequency (VAF), Professor Turki acknowledged that the current model is based on binary data (presence or absence of mutations) and does not yet integrate VAF information. However, he believes that combining single-cell sequencing data with VAF information is a highly valuable direction for future exploration.
Regarding the high-risk subgroup of inv(16) with co-occurring FLT3 mutations, an expert inquired about the specific type of FLT3 mutation. Professor Turki responded that this subgroup included both Internal Tandem Duplication (ITD) and Tyrosine Kinase Domain (TKD) mutation types, with TKD mutations showing a more prominent association.
This research not only provides a new dimension to the biological understanding of AML but also showcases the immense potential of combining innovative unsupervised learning algorithms with large-scale, high-quality real-world data in handling complex medical information. Through this deep dive into the genetic heterogeneity of AML, the study not only refines prognostic prediction for individual patients but also provides a solid theoretical foundation for the future development of precision-targeted drugs and the implementation of personalized treatment strategies. True to its founding mission, the HARMONY Alliance, through open collaboration, enables the global sharing of data and intelligence, with its achievements ultimately propelling the entire field of hematologic oncology toward an era of true precision medicine.
Source/Interview: Onco-look