Joining Separate Paradigms: Text Mining & Deep Neural Networks to Characterize Cell Type in the Cortex and Hippocampus


UVA Data Science Institute MSDS students Caitlin Dreisbach ('18), Morgan Wall ('18), and Ali Zaidi ('18) presented a talk based on their capstone research project, part of the MSDS program, at the 2018 Tom Tom Applied Machine Learning Conference in Charlottesville, Va. Learn more about the project at https://dsi.virginia.edu/projects/connecting-mind-and-body.


Joining Separate Paradigms:
Text Mining & Deep Neural Networks
to Characterize Cell Type in the
Cortex and Hippocampus
C. Dreisbach, M. Wall, A. Zaidi | Candidates for Master’s in Data Science, Data Science Institute
A. Flower, PhD | Data Science Institute
C. Overall, PhD | Center for Brain Immunology & Glia (BIG)

Jonathan Kipnis, PhD
Jacalyn Huband, PhD

Gene Expression Profiles → Clustering → Cell Type Identification

Workflow: Retrieval → Read → QC → Plot → Normalize → Split (cortex, hippocampus) → Cluster → Save

Denoising Autoencoder (Tan et al., 2016)
Cluster-level scRNA-seq data → Adding random noise → Encoding with high and low weights → Decode nodes to reconstruct expression
[Diagram labels: 10, 62, 92; 𝒲; 1, 19804, …, 𝑛]
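The "encoding with high and low weights" step can be illustrated with a small Python sketch. It is one possible reading, not the presenters' code: it assumes each hidden node holds one weight per gene and flags a gene as high weight when its weight lies more than 2.5 standard deviations from that node's mean weight (the 2.5 figure comes from the speaker notes). The function name `high_weight_genes`, the gene labels, and the weight values are hypothetical.

```python
import statistics


def high_weight_genes(weights, cutoff=2.5):
    """Return genes whose weight is more than `cutoff` population
    standard deviations away from the node's mean weight.

    `weights` maps gene name -> weight for a single hidden node.
    Applying the cutoff to weights is an assumption of this sketch.
    """
    values = list(weights.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All weights identical: nothing stands out.
        return []
    return [gene for gene, w in weights.items() if abs(w - mean) / std > cutoff]


# Hypothetical node: 19 genes with small weights and one clear outlier.
node_weights = {f"G{i}": 0.01 * (i % 3) for i in range(1, 20)}
node_weights["G20"] = 0.9

print(high_weight_genes(node_weights))  # ['G20']
```

With only a handful of genes no weight can reach a 2.5 standard deviation cutoff, which is why the example uses 20 genes rather than a six-gene toy row.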

        G1    G2    G3    G4    G5    G6
Node1   .07   .83   0     0     .40   .21
GO:0034613: cellular protein localization
GO:0046907: intracellular transport
Topic 1 …
Topic 2 …
Topic 3 …
Term frequency-inverse document frequency (TF-IDF)

TF*IDF Algorithm
$\text{TF*IDF}_{t,d} = \text{TF}_{t,d} \cdot \log_{10}(N / \text{DF}_t)$
Document 1: Cat, Dog, The, The
Document 2: The, The, Cat, Cat, Cat, The
Word   Term Frequency in Document 1   # Documents Appears In   IDF = log(Ndocs/NdocsTerm)   TF*IDF Score
Cat    1                              2                        0                            0
The    2                              2                        0                            0
Dog    1                              1                        0.301                        0.301
The word “dog” is the most interesting, as its TF*IDF score is 0.301.
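The worked example on this slide can be reproduced with a short Python sketch. It assumes raw term counts for TF and a base-10 logarithm for IDF (which matches the 0.301 on the slide), and it assumes Document 1 contains "cat, dog, the, the" while Document 2 contains the remaining six words. The function name `tf_idf` is hypothetical, and this is not necessarily the weighting variant the presenters used on gene annotations.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    TF is the raw count of the term in the document; IDF is
    log10(number of documents / number of documents containing the term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        scores.append(
            {t: tf * math.log10(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return scores


# Assumed split of the words shown on the slide into two documents.
doc1 = ["cat", "dog", "the", "the"]
doc2 = ["the", "the", "cat", "cat", "cat", "the"]

scores = tf_idf([doc1, doc2])
print({t: round(s, 3) for t, s in scores[0].items()})
# {'cat': 0.0, 'dog': 0.301, 'the': 0.0}
```

Terms that occur in every document get an IDF of zero, so only "dog", which is unique to Document 1, receives a non-zero score.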

HIPPOCAMPUS EXAMPLE
Biomarker-labeled microglia
Immune system process; Apoptotic process; Catabolic process
“immun[e]”; “necrosi[s]”; “innat[e]”; “nf[kb]”
Cx3cr1; AIF1; Ppr2r1a
Legend: Biomarker; Intersecting gene across nodes

Selected References
Herculano-Houzel, S. (2009). The Human Brain in Numbers: A Linearly Scaled-Up Primate Brain. Frontiers in Human Neuroscience, 3, 31.
Lun, A., McCarthy, D., & Marioni, J. (2016). A step-by-step workflow for low-level analysis of single-cell RNA-seq data [version 1; referees: 5 approved with reservations]. F1000Research, 5, 2122. doi:10.12688/f1000research.9501.1
Tan, J., Hammond, J., Hogan, D., & Greene, C. (2016). ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions. mSystems, 1(1), e00025-15. doi:10.1128/mSystems.00025-15
Zeisel, A., Muñoz-Manchado, A. B., Codeluppi, S., et al. (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science, 347(6226), 1138–1142.

Questions?
cnd2y, mkw5ck, saz8zj @virginia.edu



Editor's notes

CAIT: Hello, fellow TomTom’ers! My name is Cait Dreisbach, and this is Morgan Wall and Ali Zaidi. We have been fortunate to spend the last year as members of the Data Science Institute where, in 37 short days (not that we are counting), we will graduate with a Master’s in Data Science. As part of the degree, each student takes on a capstone project with 2-3 other students. Today we will be presenting our work titled “Joining Separate Paradigms: Text Mining and Deep Neural Networks to Characterize Cell Type in the Cortex and Hippocampus.” We have been supported by our advising team, Dr. Abigail Flower from the DSI and Dr. Chris Overall from the Center for Brain Immunology and Glia.
CAIT: Our group has been generously supported by several organizations: the Data Science Institute at the University of Virginia, which I know is well represented in this audience today; the laboratory of Dr. Joni Kipnis, which is credited with finding the physical connection between the brain and the immune system; and Advanced Research Computing Services, for the use of Rivanna, UVA’s high-performance computing cluster. Finally, we’d like to thank TomTom for letting us spread our wings today and giving us a chance to discuss our work!
CAIT: Now, I want you to look at this adorable mouse and put yourself back in your 7th grade classroom. Typically, when we give this presentation to data-oriented folks, the connection to biology leaves listeners confused. What we want to do today is leverage your 7th grade brain, specifically the time you learned about cells! We know that organisms are made up of tissues, which are just groups of cells that work together to perform a certain task. You’ve probably seen cells that look like this, or this… The goal of this research project was to label cells by type and identify their specific function.
CAIT: Traditionally, we take an organism and extract a collection of cells. Standard procedures collect a large number of cells, but revolutionary progress in biotechnology now allows us to separate out a single cell. From that single cell, we can unravel the DNA to better understand how specific genes are expressed. Gene expression tells scientists what a cell is doing at any given time. Typical analytic methods include clustering, which helps us determine cell type. This is where we come in, with the goal of leveraging data science methods, including autoencoding and text mining, to refine the otherwise somewhat subjective measures of identifying cell type and function.
MORGAN: The initial part of our capstone was a substantial data processing workflow, starting with sourcing our candidate mouse-model dataset, published in Science in 2015. We downloaded the publicly available data from the Gene Expression Omnibus, the data repository run by NCBI. Next, we read the data into R using Bioconductor, a widely used collection of bioinformatics packages, to perform quality control and normalization. Normalization is the systematic removal of non-biological variation between experiments. After normalizing the entire dataset, we split the data into the two major brain regions it represents, the cortex and the hippocampus. At this point, you can think of the data as two separate CSV files, each with almost 20,000 gene columns and several thousand rows of single-cell observations.
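The normalize-then-split step described in this note can be sketched in Python as follows. This is a simplified stand-in, not the Bioconductor workflow the presenters ran in R: it assumes a plain library-size normalization (scale each cell to a fixed total, then log-transform), and the field names `region` and `counts`, the scale factor, and the toy counts are all hypothetical.

```python
import math
from collections import defaultdict


def normalize_cell(counts, scale=10_000):
    """Scale one cell's gene counts to a common total, then apply log(1 + x).

    This removes differences in sequencing depth between cells; it is a
    simple assumption here, not the method used in the original project.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale) for c in counts]


def normalize_and_split(cells):
    """Normalize every cell, then group the normalized rows by brain region."""
    by_region = defaultdict(list)
    for cell in cells:
        by_region[cell["region"]].append(normalize_cell(cell["counts"]))
    return dict(by_region)


# Toy data: three cells, three genes each (real data has ~20,000 genes).
cells = [
    {"region": "cortex", "counts": [10, 0, 30]},
    {"region": "hippocampus", "counts": [100, 0, 300]},
    {"region": "cortex", "counts": [5, 5, 0]},
]

regions = normalize_and_split(cells)
print(sorted(regions))         # ['cortex', 'hippocampus']
print(len(regions["cortex"]))  # 2
```

The first two toy cells have the same gene proportions at different sequencing depths, so they end up with identical normalized profiles, which is the point of removing depth as a source of variation.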
MORGAN:
MORGAN: ***HIGHLIGH
CAIT: Our first approach was to use a denoising autoencoder (DAE) to isolate candidate genes in the cortex. A DAE is an algorithm that aims to discover more robust features, which prevents it from simply learning the identity function. We trained the autoencoder to reconstruct the input from a corrupted version of it. The corrupted version is created by adding random noise, and then the genes are weighted based on their expression in each cell. Genes that have a mean expression level above 2.5 standard deviations are labeled as high weight. Using a non-linear sigmoid function, we were able to reduce the features from just under 20,000 genes to a matrix of fewer than 5,000.
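A single forward pass of a denoising autoencoder, as described in this note, might look like the Python sketch below. It is illustrative only and omits training: it assumes masking noise (randomly zeroing inputs), a sigmoid activation, and tied encoder/decoder weights, none of which are confirmed by the slides. All function names, sizes, and values are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, noise_rate, rng):
    """Masking noise: each input is set to zero with probability noise_rate."""
    return [0.0 if rng.random() < noise_rate else v for v in x]


def encode(x, W, b):
    """Hidden activations, one per node: h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W, b)]


def decode(h, W, c):
    """Tied-weight reconstruction: x_hat_i = sigmoid(sum_j W[j][i] * h[j] + c[i])."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i]) for i in range(len(c))]


def reconstruction_error(x, x_hat):
    """Mean squared error against the clean (uncorrupted) input."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
n_genes, n_nodes = 6, 2
x = [0.1, 0.9, 0.0, 0.0, 0.5, 0.3]  # hypothetical expression scaled to [0, 1]
W = [[rng.uniform(-0.5, 0.5) for _ in range(n_genes)] for _ in range(n_nodes)]
b = [0.0] * n_nodes
c = [0.0] * n_genes

x_noisy = corrupt(x, noise_rate=0.1, rng=rng)
h = encode(x_noisy, W, b)      # compressed representation (one value per node)
x_hat = decode(h, W, c)        # attempt to rebuild the clean input
loss = reconstruction_error(x, x_hat)
```

Training would repeat this pass over many cells and adjust `W`, `b`, and `c` to lower `loss`. Because the error is measured against the clean input rather than the noisy one, the network cannot succeed by copying its input through unchanged.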
ALI
ALI
ALI
MORGAN FUTURE DIRECTIONS
MORGAN
MORGAN
