Sigma Xi 2021 Andrew Gao Presentation
1. Machine Learning and Bioinformatics Approach Yields Noninvasive miRNA Biomarkers for Early Lung Cancer Detection
Andrew Gao
2. Abstract
Non-small cell lung cancer (NSCLC), the most common type of lung cancer, affects
millions of people. In 2020, lung cancer caused 1.8 million deaths, partly because it is
difficult to diagnose lung cancer at early stages. Detecting cancer earlier results in better
survival. One potential method for early diagnosis, liquid biopsy, relies on biomarkers
present in body fluids. MicroRNAs (miRNAs) in blood could serve as biomarkers for
noninvasive diagnostic/prognostic tests. MiRNAs are RNA molecules that regulate the
expression of specific genes. Through differential expression analysis of four public
datasets, the present study identified 13 miRNAs that are consistently underexpressed
in the tissue, blood, and serum of NSCLC patients. Kaplan-Meier survival analysis found
that six miRNAs had statistically significant prognostic power (miR-140, miR-199a,
miR-29c, miR-320e, miR-103a, miR-526b). Functional enrichment analysis of the genes
targeted by the miRNAs demonstrated that they are involved in several hallmarks of
cancer, such as the epithelial to mesenchymal transition. A machine learning model was
constructed using microRNA expression data. Recursive feature elimination was
performed to select miRNAs with the greatest diagnostic value. Five classifiers were
tested on the selected miRNAs, with Random Forest and Logistic Regression performing
the best. A novel three-microRNA panel with 91.5% accuracy for NSCLC detection
was identified (miR-320e, miR-103a, miR-526b). These miRNAs also have significant
prognostic power for lung adenocarcinoma. The machine learning and analysis workflow
was adapted into an open-source online tool for automatic biomarker selection,
available at biomarkergenie.com. Future steps include experimental validation and clinical
trials.
3. Background
• 1.8 million deaths from lung cancer in 2018
– 5-year survival rate is only 21%
• My grandfather passed away at 67 due to non-small cell lung cancer
• Hard to accurately diagnose lung cancer
– CT scan
– Tissue biopsy (follow-up)
• Cells release molecules like DNA and RNA into the bloodstream
• Drawing blood is noninvasive
• Different levels of biomarkers could indicate cancer
– High levels of miR-1228-3p (Xue 2020)
Decreased gene expression (degraded mRNA can't be translated into protein)
4. Statement of Purpose
Purpose: Identify noninvasive microRNA biomarkers for non-small cell lung cancer (NSCLC) diagnostic and prognostic tests.
Criteria:
1. Must be found in blood
2. Consistently differentially expressed across studies
○ Account for high variance in results between studies
3. Statistically significant change
4. Ideally has biological relevance
Outline:
1. Find miRNAs that are reliably differentially expressed in lung cancer
2. Find what pathways these miRNAs are involved in
3. Check if these miRNAs can predict survival
4. Identify best combination of miRNAs for diagnosing lung cancer
5. Materials:
● Computer
● RStudio
○ R
○ limma
● Google Colab
● Atom code editor
● Python
○ Matplotlib
○ Seaborn
○ scikit-learn
○ Streamlit
● Jvenn
● Heroku
● GitHub
● Gene Expression Omnibus (GEO)
○ GSE137140
○ GSE93300
○ GSE94536
○ GSE53882
● The Cancer Genome Atlas (TCGA)
● Search Tool for the Retrieval of Interacting Proteins (STRING)
● Kaplan-Meier Plotter (kmplot.com)
● GraphPad Prism 9
● Cytoscape
○ MCODE
● Gene Ontology
○ Panther
● miRWalk
● miRDB
● miRTarBase
● GeneCards
Data: miRNA expression profiling data
Control: People without cancer
Experimental: NSCLC patients
Procedure:
6. Selected datasets and characteristics
Differential expression data
● Four datasets were selected.
● In total, 1978 NSCLC and 1932 control samples.
The limma package in R was used to calculate logFC and p-values (t-test).
Cutoff: p<0.05
logFC: log fold change, the log2 ratio of expression in disease vs. controls
negative = underexpressed
positive = overexpressed
Table columns: microRNA name, ratio of expression (logFC), other name
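The logFC sign convention above can be sketched in a few lines of Python. This is only an illustration, not the limma workflow used in the study: limma fits linear models in R, whereas this sketch uses a plain Welch t statistic from the standard library and does not compute a p-value. The expression values and the function names `log_fold_change` and `welch_t` are hypothetical.

```python
import math
from statistics import mean, variance


def log_fold_change(disease, control):
    """Mean log2 expression in disease minus mean log2 expression in controls."""
    return mean(disease) - mean(control)


def welch_t(disease, control):
    """Welch's t statistic for two groups that may have unequal variances."""
    se = math.sqrt(variance(disease) / len(disease)
                   + variance(control) / len(control))
    return (mean(disease) - mean(control)) / se


# Hypothetical log2 expression values for one miRNA (not data from the study).
nsclc = [5.1, 4.8, 5.3, 4.9, 5.0]
healthy = [6.9, 7.2, 7.0, 6.8, 7.1]

logfc = log_fold_change(nsclc, healthy)
t_stat = welch_t(nsclc, healthy)
direction = "underexpressed" if logfc < 0 else "overexpressed"
print(f"logFC = {logfc:.2f} ({direction}), t = {t_stat:.1f}")
```

Because the inputs are already on a log2 scale, the difference of means equals the log2 of the expression ratio, which is why a negative value reads as underexpression.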
7. Venn diagram of overlapping differentially expressed microRNAs between datasets
Heatmap of logFC values of each of the 13 microRNAs across all datasets
13 microRNAs are differentially expressed in all four datasets (p<0.05).
All are underexpressed (negative logFC).
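The overlap step behind a Venn diagram like this one amounts to a set intersection with a direction check. A minimal sketch, assuming each dataset's results are available as a mapping from miRNA name to a (logFC, p-value) pair; the dataset labels, miRNA names, values, and the function `consistent_hits` are all hypothetical and are not results from the study.

```python
def consistent_hits(results, alpha=0.05):
    """Return miRNAs significant (p < alpha) in every dataset with one logFC sign."""
    significant = [
        {name for name, (logfc, p) in table.items() if p < alpha}
        for table in results.values()
    ]
    shared = set.intersection(*significant)
    consistent = {}
    for name in shared:
        # Collect the direction of change seen in each dataset.
        is_negative = {results[dataset][name][0] < 0 for dataset in results}
        if len(is_negative) == 1:
            consistent[name] = "under" if is_negative.pop() else "over"
    return consistent


# Hypothetical (logFC, p-value) tables for three made-up datasets.
results = {
    "dataset_A": {"miR-A": (-1.2, 0.001), "miR-B": (-0.7, 0.03), "miR-C": (0.9, 0.20)},
    "dataset_B": {"miR-A": (-0.9, 0.010), "miR-B": (0.5, 0.04), "miR-C": (1.1, 0.01)},
    "dataset_C": {"miR-A": (-1.5, 0.002), "miR-B": (-0.6, 0.02)},
}

hits = consistent_hits(results)
print(hits)
```

Here miR-B is significant everywhere but changes direction between datasets, so only miR-A survives; requiring a consistent sign is one simple way to account for variance between studies.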
8. 349 target genes are involved in many hallmarks of cancer:
● epithelial to mesenchymal transition
● SMAD protein phosphorylation
● heterochromatin
● miRNA silencing (impaired)
● transforming growth factor beta
Protein-protein interaction (PPI) network of target genes
Highly interconnected clusters of interacting genes (using Cytoscape)
9. Kaplan-Meier Survival Analysis
miR-140, miR-199a, miR-29c, miR-320e, miR-103a, and miR-526b have statistically significant prognostic power for patient survival (p<0.05)
High expression = better survival (consistent with these miRNAs being underexpressed in NSCLC patients)
Figure panels: Squamous Carcinoma, Adenocarcinoma
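For readers unfamiliar with the method, the Kaplan-Meier estimate behind such survival curves can be sketched in plain Python. The study itself used Kaplan Meier Plotter (kmplot.com); the function `kaplan_meier` and the follow-up data below are hypothetical, and the sketch does not compute the p-value that compares the two groups.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one death.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t (death or censoring) leaves the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for patients split by expression of one miRNA.
high_curve = kaplan_meier([10, 24, 30, 36, 48], [1, 0, 1, 0, 0])
low_curve = kaplan_meier([5, 8, 12, 20, 30], [1, 1, 1, 0, 1])

for label, curve in (("high", high_curve), ("low", low_curve)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In this made-up example the high-expression group keeps a higher survival probability throughout, which is the pattern the slide describes.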
10. Can these miRNAs distinguish lung cancer and healthy controls?
Input raw data
Stage characteristics of input data (GSE137140)
Most samples are Stage 1 (72%)
Columns: microRNAs, target = label (1 for cancer, 0 for control)
Initial 3-component PCA using 13 miRNAs shows distinct separation between lung cancer and healthy samples
63% explained variance
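A 3-component PCA and its explained variance can be sketched with NumPy as follows. The data here are synthetic stand-ins (random numbers with a shifted mean for the "cancer" group), not GSE137140, and the sketch assumes the explained-variance figure on the slide refers to the cumulative share of the first three components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a samples x 13-miRNA matrix (not data from the study).
n_per_group = 100
controls = rng.normal(loc=0.0, scale=1.0, size=(n_per_group, 13))
cancer = rng.normal(loc=-1.0, scale=1.0, size=(n_per_group, 13))
X = np.vstack([controls, cancer])
y = np.array([0] * n_per_group + [1] * n_per_group)  # 1 for cancer, 0 for control

# PCA via singular value decomposition of the mean-centered matrix.
X_centered = X - X.mean(axis=0)
_, singular_values, components = np.linalg.svd(X_centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Project every sample onto the first three principal components.
scores = X_centered @ components[:3].T
cumulative = explained[:3].sum()
print(f"first 3 components explain {cumulative:.0%} of the variance")
```

Plotting `scores` colored by `y` would give the kind of separation plot described on the slide.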
11. Which classifier works the best?
Extra Forest, Random Forest, and Recursive Feature Elimination (Logistic Regression) rank 13 miRNAs by importance
● The three rankings generally agreed
Random Forest performs best
20% of data for training
80% for testing
Which miRNAs are most important?
Top 3: miR-320e, miR-103a, miR-526b
Chart of Feature Importances based on Extra Forest Ranker
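The ranking step can be sketched with scikit-learn. This is not the study's notebook: the data are synthetic, the feature names are placeholders, "Extra Forest" is assumed to mean scikit-learn's `ExtraTreesClassifier`, and the split fraction `TEST_FRACTION` is a parameter set to 0.8 only to mirror the slide as written.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

TEST_FRACTION = 0.8  # assumption: mirrors the slide; 0.2 is the more common choice

# Synthetic stand-in for a samples x 13-miRNA matrix with 3 informative features.
X, y = make_classification(n_samples=400, n_features=13, n_informative=3,
                           n_redundant=0, random_state=0)
names = [f"miRNA_{i}" for i in range(13)]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_FRACTION, stratify=y, random_state=0)

# Recursive feature elimination wrapped around logistic regression.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X_train, y_train)
rfe_top = [name for name, keep in zip(names, rfe.support_) if keep]

# Tree ensembles rank features through their impurity-based importances.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
extra = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)


def top3(model):
    """Names of the three features with the highest importance."""
    order = model.feature_importances_.argsort()[::-1][:3]
    return [names[i] for i in order]


forest_accuracy = forest.score(X_test, y_test)
print("RFE:", rfe_top)
print("Random Forest:", top3(forest))
print("Extra Trees:", top3(extra))
print(f"Random Forest test accuracy: {forest_accuracy:.3f}")
```

Comparing the three printed lists is the programmatic version of checking whether the rankers agree.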
12. Which combination of miRNAs works the best?
Confusion matrix of results for top 3 microRNAs
10-fold cross-validation on top 3 miRNAs:
Accuracy = 0.915 (stdev = 0.13)
10-fold cross-validation on top 4 miRNAs:
Accuracy = 0.916 (stdev = 0.13)
(virtually identical)
Accuracy:
miR-320e: 82.5%
top 3: 91.6%
top 4: 92.6%
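Comparing panel sizes by 10-fold cross-validation can be sketched with scikit-learn as below. The data are synthetic, the column indices merely stand in for a three- and a four-miRNA panel, and Random Forest is used only because the slides report it as the best performer; none of the numbers this prints are results from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 4 candidate features for 300 samples (not study data).
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)


def cv_accuracy(columns):
    """Mean and standard deviation of accuracy over the 10 folds."""
    scores = cross_val_score(model, X[:, columns], y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()


mean3, std3 = cv_accuracy([0, 1, 2])      # stand-in for a 3-miRNA panel
mean4, std4 = cv_accuracy([0, 1, 2, 3])   # stand-in for a 4-miRNA panel
print(f"3 features: accuracy = {mean3:.3f} (stdev = {std3:.3f})")
print(f"4 features: accuracy = {mean4:.3f} (stdev = {std4:.3f})")
```

If adding the fourth feature changes the mean by less than the fold-to-fold spread, the smaller panel is the more economical choice, which is the reasoning the slide applies.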
14. Conclusion
Combined machine learning/bioinformatics approach identifies:
• 2 miRNA panel with 90% accuracy
– miR-320e + miR-103a
• 3 miRNA panel with 91.5% accuracy
– miR-320e + miR-103a + miR-526b
– prognostic biomarkers for squamous carcinoma
• 3 miRNA panel with 86% accuracy
– miR-140 + miR-199a + miR-29c
– prognostic biomarkers for adenocarcinoma
• Potential “2 in 1” tests
Web tool:
• Speeds up exploratory analysis
• No setup needed
• Increases accessibility of machine learning
• Widely applicable to any disease or omics data
15. Conclusion
Advantages
• biomarkers were tested on majority Stage 1 patient data
• consistent across large sample size and across 4 studies
– methodology of this study accounts for variance between studies
• greater sampling flexibility (microRNAs are present in serum AND plasma)
Limitations:
• small sample size for some data
• groups LUSC and LUAD together
• flaws with training data
– batch effect/confounding
Next steps
• Experimental validation
• Differences between LUAD and LUSC
• Clinical trials
• Add functionality to web tool
– Regression
Topics:
1. microRNAs in cancer
2. differential expression analysis
3. target gene prediction
4. functional enrichment analysis
5. machine learning classification
6. biomarkergenie.com
16. References
1. Press Release N° 263. (2018). In World Health Organization. International Agency for Research on Cancer.
2. American Cancer Society. Facts & Figures 2019. American Cancer Society. Atlanta, Ga. 2019. Howlader N, Noone AM, Krapcho M, Miller D,
Bishop K, Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds). SEER Cancer Statistics
Review, 1975-2014, National Cancer Institute. Bethesda, MD, https://seer.cancer.gov/csr/1975_2014/, based on November 2016 SEER data
submission, posted to the SEER web site, April 2017.
3. Heneghan, H. M., Miller, N., & Kerin, M. J. (2010). MiRNAs as biomarkers and therapeutic targets in cancer. Current Opinion in
Pharmacology, 10(5), 543–550.
4. Farazi, T. A., Spitzer, J. I., Morozov, P., & Tuschl, T. (2010). miRNAs in human cancer. The Journal of Pathology, 223(2), 102–115.
5. Ma, J., Lin, Y., Zhan, M., Mann, D. L., Stass, S. A., & Jiang, F. (2015). Differential miRNA expressions in peripheral blood mononuclear cells
for diagnosis of lung cancer. Laboratory Investigation, 95(10), 1197–1206.
6. Hennessey, P. T., Sanford, T., Choudhary, A., Mydlarz, W. W., Brown, D., Adai, A. T., Ochs, M. F., Ahrendt, S. A., Mambo, E., & Califano, J.
A. (2012). Serum microRNA Biomarkers for Detection of Non-Small Cell Lung Cancer. PLoS ONE, 7(2), e32307.
7. Shen, J., Todd, N. W., Zhang, H., Yu, L., Lingxiao, X., Mei, Y., Guarnera, M., Liao, J., Chou, A., Lu, C. L., Jiang, Z., Fang, H., Katz, R. L., &
Jiang, F. (2010). Plasma microRNAs as potential biomarkers for non-small-cell lung cancer. Laboratory Investigation, 91(4), 579–587.
8. Heneghan, H. M., Miller, N., Lowery, A. J., Sweeney, K. J., Newell, J., & Kerin, M. J. (2010). Circulating microRNAs as Novel Minimally
Invasive Biomarkers for Breast Cancer. Annals of Surgery, 251(3), 499–505.
9. Xue, W.-X., Zhang, M.-Y., Rui Li, Liu, X., Yin, Y.-H., & Qu, Y.-Q. (2020, July 7). Serum miR-1228-3p and miR-181a-5p as Noninvasive
Biomarkers for Non-Small Cell Lung Cancer Diagnosis and Prognosis. BioMed Research International.
10. Ying, Lisha, et al. "Development of a serum miRNA panel for detection of early stage non-small cell lung cancer." Proceedings of the National
Academy of Sciences 117.40 (2020): 25036-25042.
11. Chugh, P., & Dittmer, D. P. (2012). Potential pitfalls in microRNA profiling. Wiley Interdisciplinary Reviews: RNA, 3(5), 601–616.
https://doi.org/10.1002/wrna.1120
12. Tang, Gusheng, et al. "Different normalization strategies might cause inconsistent variation in circulating microRNAs in patients with
hepatocellular carcinoma." Medical science monitor: international medical journal of experimental and clinical research 21 (2015): 617.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4345856/
13. Kenny, Louise C., et al. "Novel biomarkers for pre-eclampsia detected using metabolomics and machine learning." Metabolomics 1.3 (2005):
227-234. https://link.springer.com/article/10.1007/s11306-005-0003-1
14. Huang, Yao, et al. "Serum microRNA panel excavated by machine learning as a potential biomarker for the detection of gastric cancer."
Oncology reports 39.3 (2018): 1338-1346. https://www.spandidos-publications.com/10.3892/or.2017.6163
Full list in notebook.
17. Acknowledgement
Thank you to my parents, my teachers, the Van Allen lab, and my mentor Wendy Slijk for their support!