Heuristic PCA Based Feature Extraction and Its Application to Bioinformatics
Y-h. Taguchi, Dept. Phys., Chuo Univ.
Y. Murakami, Grad. Sch. Med., Osaka City Univ.
M. Iwadate, Dept. Biol. Sci., Chuo Univ.
H. Umeyama, Dept. Biol. Sci., Chuo Univ.
A. Okamoto, Dept. Sch. Health Sci., Aichi Univ. Edu.
0. Why PCA?
PCA = principal component analysis
Motivation:
Unsupervised Feature Selection
How does PCA help?
Toy example: 100 features × 20 samples (samples 1 to 10 = class 1, samples 11 to 20 = class 2)
  10 ordered features: 11111111110000000000 (the same for all 10)
  90 random features: random 0/1 strings, e.g.
    01000000110110011111
    00011110000101011101
    ...
    01000011000110101111
How can we select the 10 ordered features without using the class information?
Embedding 100 features into 2D using PCA
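[Figure: left, 2D PCA embedding of the 100 features, in which the 10 ordered features lie apart from the 90 random ones; right, PC1 values of the 20 samples, class 1 vs. class 2.]
PC1 represents the discrimination between class 1 and class 2.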
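The toy experiment can be sketched in Python with NumPy. This is a minimal illustration under assumptions not stated on the slides: features are rows and samples are columns, each sample column is centred before an SVD, and the k features farthest from the origin on the chosen component(s) are taken as the outliers. `make_toy_data` and `select_features` are hypothetical helper names.

```python
import numpy as np


def make_toy_data(n_ordered=10, n_random=90, n_per_class=10, seed=0):
    """Hypothetical toy matrix: rows are features, columns are samples.

    The first n_ordered rows are 1 for class 1 and 0 for class 2;
    the remaining rows are random 0/1 values."""
    rng = np.random.default_rng(seed)
    ordered_row = np.r_[np.ones(n_per_class), np.zeros(n_per_class)]
    ordered = np.tile(ordered_row, (n_ordered, 1))
    random_part = rng.integers(0, 2, size=(n_random, 2 * n_per_class)).astype(float)
    return np.vstack([ordered, random_part])


def feature_scores(x):
    """Embed features (rows) with PCA: centre each sample column, then SVD."""
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u * s  # row i holds the PC scores of feature i


def select_features(x, k=10, components=(0,)):
    """Return the (sorted) indices of the k features farthest from the origin.

    components=(0,) uses PC1 only; (1, 2) would use PC2 and PC3."""
    scores = feature_scores(x)[:, list(components)]
    distance = np.linalg.norm(scores, axis=1)  # sign of a PC is arbitrary
    return np.sort(np.argsort(distance)[::-1][:k])


x = make_toy_data()
print(select_features(x))  # compare with rows 0-9, the ordered features
```

With 90 random 0/1 features and only 20 samples the ordered features usually, but not necessarily always, rank at the top; passing `components=(1, 2)` corresponds to looking at PC2 and PC3, as in the RA example.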
Applying a “weak” unitary transformation to the space spanned by the 20 samples...
[Figure: the 100 features × 20 samples matrix (10 ordered + 90 random features; class 1 and class 2) before and after the transformation.]
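The same 2D embedding is obtained:
[Figure: 2D PCA embedding of the 100 features after the transformation; the 10 ordered features still lie apart from the 90 random ones, while PC1 now only “weakly” represents the discrimination between class 1 and class 2 in the 20 samples.]
Thus we can still select the 10 features.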
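Why the embedding survives can be checked numerically. If Q is any orthogonal matrix acting on the sample space, centring the columns of XQ gives X_c Q, so (X_c Q)(X_c Q)^T = X_c X_c^T and the feature scores are unchanged up to the sign of each PC, while the sample loadings become Q^T times the original ones. In the sketch below a “weak” transformation is assumed to mean an orthogonal matrix close to the identity, built from the QR decomposition of I + εG for small ε and Gaussian G; this construction and the helper names are illustrative assumptions.

```python
import numpy as np


def weak_orthogonal(n, eps=0.1, seed=0):
    """Orthogonal n x n matrix close to the identity (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(np.eye(n) + eps * rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs so q stays near I


def pca(x):
    """Return (feature scores, sample loadings) of the column-centred matrix."""
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return u * s, vt.T


rng = np.random.default_rng(0)
class_row = np.r_[np.ones(10), np.zeros(10)]
x = np.vstack([np.tile(class_row, (10, 1)),
               rng.integers(0, 2, size=(90, 20)).astype(float)])

q = weak_orthogonal(20)
scores, loadings = pca(x)
scores_q, loadings_q = pca(x @ q)

# Feature embedding: identical up to the sign of each component.
print(np.allclose(np.abs(scores[:, :2]), np.abs(scores_q[:, :2])))
# PC1 sample loadings are rotated by q, which blurs the clean class contrast.
print(np.round(loadings[:, 0], 2))
print(np.round(loadings_q[:, 0], 2))
```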
Linear discriminant analysis + leave-one-out cross validation using the 10 ordered features:

                 True class 1   True class 2
  Predicted 1         8              2
  Predicted 2         2              8

Accuracy = Sensitivity = Specificity = 80%
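The summary statistics follow directly from the confusion matrix. As a small cross-check (not part of the original analysis), the Python helper below computes them; treating class 1 here, and lung cancer in the control vs. lung cancer table of Real example 2, as the “positive” class is an assumption about the convention used on the slides.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix summaries for a two-class problem.

    tp: positives predicted positive, fn: positives predicted negative,
    fp: negatives predicted positive, tn: negatives predicted negative."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "precision": tp / (tp + fp),    # positive predictive value
    }


# Toy example, class 1 as positive: 8 of 10 correct in each class.
print(binary_metrics(tp=8, fn=2, fp=2, tn=8))  # every value is 0.8
# Control vs. lung cancer, lung cancer as positive.
lung = binary_metrics(tp=24, fn=8, fp=14, tn=56)
print({name: round(value, 3) for name, value in lung.items()})
# → {'accuracy': 0.784, 'sensitivity': 0.75, 'specificity': 0.8, 'precision': 0.632}
```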

How about real examples?
1. Real example 1: Disease-associated aberrant promoter methylation
[Figure: methylation of a gene promoter]
Three autoimmune diseases: SLE, RA, DM
[MZ twin pair (healthy + sick) + 2 healthy controls] × 5 sets = 20 samples per disease; × 3 diseases = 60 samples
vs. ≈1000 potential methylation sites
Embedding of ~1000 promoters in the 20 RA samples into 2D with PCA (PC2 vs. PC3)
[Figure: left, PC2 vs. PC3 embedding of the promoters, with the outlier promoters selected; right, PC2 values of the 20 RA samples, split into male and female (◯ sick twin, △ healthy twin, + healthy control 1, ☓ healthy control 2).]
Along PC2:
  Twins: healthy > sick
  Controls: no such trend
  The 4th set: no such trend
→ This is why unsupervised feature selection is needed.
Scatter plots between healthy and RA twins (red dots = selected promoters); P-values of the five panels:
$P < 2.2\times10^{-16}$, $P = 2.2\times10^{-12}$, $P = 3.7\times10^{-12}$, $P = 3.9\times10^{-1}$, $P < 2.2\times10^{-16}$

Individual promoters are significantly aberrantly methylated. Thus, the feature selection is successful.
After repeating the same procedure for the two additional diseases (SLE and DM)...
the selected promoters turn out to be largely shared among the three autoimmune diseases.

No other method achieves such close agreement among the three autoimmune diseases.
Lessons to learn:
A predefined class definition (e.g., 'sick twin' vs. 'healthy twin + two healthy controls') is not a good strategy for extracting “important” features, which can exhibit much more complicated behavior (e.g., upregulated in males while downregulated in females).
Additional Remarks
Similar procedures were applied to squamous cell carcinoma (*), and genes with genotype-specific DNA methylation were extracted. Literature searches identified these genes as cancer-related, and in silico drug screening was performed for them (BMC Syst. Biol., in press; to be presented at APBC2014).
(*) Esophageal cancer
2. Real example 2: Circulating biomarker findings for liver diseases
Why a “circulating biomarker”?
→ Non-invasive, and thus less stressful for patients.
Circulating = blood, etc.
Target in this talk: microRNAs in blood
→ A microRNA (miRNA) is a non-protein-coding RNA that regulates other transcripts.
Data set: 14 diseases + healthy controls
For example, 2D embedding of ~900 blood miRNAs using PCA in 32 lung cancer patients + 70 healthy controls
[Figure: PC1 vs. PC2 embedding of the miRNAs; the 10 outlier miRNAs are selected.]
However, PC1 no longer clearly distinguishes lung cancer from normal controls (not shown here).
Prediction: control vs. lung cancer
LDA with PCA, leave-one-out cross validation (using the 10 miRNAs, up to the 5th PC)

                          True control   True lung cancer
  Predicted control            56                8
  Predicted lung cancer        14               24

Accuracy 0.784
Specificity 0.800
Sensitivity 0.750
Precision 0.632
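The slide gives the protocol only in outline (LDA on the PCA scores up to the 5th PC, leave-one-out). A minimal NumPy sketch of one reading of it follows. Its assumptions, none of which are stated on the slide: the input is a samples × miRNAs matrix already restricted to the 10 selected miRNAs, PCA is refitted on the training samples in every fold, and LDA uses a pooled covariance with class-frequency priors. `loocv_pca_lda` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_lda(z, y):
    """LDA: class means, pooled within-class covariance, log priors."""
    classes = np.unique(y)
    means = np.array([z[y == c].mean(axis=0) for c in classes])
    resid = z - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    prec = np.linalg.pinv(cov)
    log_prior = np.log([np.mean(y == c) for c in classes])
    return classes, means, prec, log_prior


def predict_lda(model, z):
    classes, means, prec, log_prior = model
    # Linear discriminant: z^T P mu_k - 0.5 mu_k^T P mu_k + log pi_k
    lin = z @ prec @ means.T
    const = -0.5 * np.einsum("kd,de,ke->k", means, prec, means) + log_prior
    return classes[np.argmax(lin + const, axis=1)]


def loocv_pca_lda(x, y, n_pcs=5):
    """Leave-one-out predictions; PCA is refitted on each training set."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mu = x[train].mean(axis=0)
        _, _, vt = np.linalg.svd(x[train] - mu, full_matrices=False)
        w = vt[:n_pcs].T  # projection onto the first n_pcs PCs
        model = fit_lda((x[train] - mu) @ w, y[train])
        preds[i] = predict_lda(model, ((x[i] - mu) @ w)[None, :])[0]
    return preds


# Hypothetical data: 70 controls (0) and 32 cancers (1), 10 features each.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(70, dtype=int), np.ones(32, dtype=int)]
x = rng.standard_normal((102, 10)) + 1.0 * y[:, None]
pred = loocv_pca_lda(x, y)
print(np.mean(pred == y))  # leave-one-out accuracy on the synthetic data
```

Refitting PCA inside each fold is the cautious choice; because PCA ignores the labels, fitting it once on all samples would also be defensible.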
What is the advantage of PCA-based feature extraction? → Stability
Cross-validation test (10 folds) of the stability of feature extraction (100 trials):
14 diseases vs. normal control × 10 miRNAs = 140 miRNAs selected.
Ideally, all 140 miRNAs would be selected in every one of the 100 trials.
In practice, 129 of the 140 miRNAs were selected with 100% probability.
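The slides do not spell out the resampling scheme. One plausible reading, sketched below under that assumption: in each of 100 trials the samples are shuffled into 10 folds, one fold is held out, the top-k features are re-selected from the remaining samples, and a feature counts as stable if it is chosen in every trial. The selection rule (top-k |PC1 score|), the helper names, and the data are illustrative.

```python
import numpy as np


def top_k_pc1(x, k):
    """Top-k features (rows) by |PC1 score| after centring the sample columns."""
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return np.argsort(np.abs(u[:, 0] * s[0]))[::-1][:k]


def selection_frequency(x, k=10, n_trials=100, n_folds=10, seed=0):
    """Fraction of trials in which each feature is selected.

    Each trial drops one randomly chosen fold of samples (columns)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = x.shape
    counts = np.zeros(n_features)
    for _ in range(n_trials):
        folds = np.array_split(rng.permutation(n_samples), n_folds)
        held_out = folds[rng.integers(n_folds)]
        keep = np.setdiff1d(np.arange(n_samples), held_out)
        counts[top_k_pc1(x[:, keep], k)] += 1
    return counts / n_trials


# Hypothetical data: 10 class-ordered features + 90 small-noise features.
rng = np.random.default_rng(1)
ordered = np.tile(np.r_[np.ones(10), np.zeros(10)], (10, 1))
x = np.vstack([ordered, 0.05 * rng.standard_normal((90, 20))])
freq = selection_frequency(x)
print(np.flatnonzero(freq == 1.0))  # features selected in 100% of trials
```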
Comparison of stability with other feature extraction methods:
UFF(*): 111 out of 140 miRNAs
t-test based: 40 out of 140 miRNAs
SAM: 30 out of 140 miRNAs
gsMMD: 5 and 1 out of 140 miRNAs
RFE: 1 out of 140 miRNAs
ensemble RFE: 0 out of 140 miRNAs
(*) The only other unsupervised feature extraction method in the comparison
Lessons to learn:
A predefined class definition (e.g., 'sick twin' vs. 'healthy twin + two healthy controls') is not a good strategy for extracting “stable” features. Relying too heavily on class information may impair the stability of the selected features.
Additional remarks:
The sets of 10 miRNAs selected as biomarkers discriminating each of the 14 diseases from normal controls largely overlapped (every set of 10 was drawn from a common pool of 12 miRNAs). Moreover, these 12 miRNAs also discriminate seven additional diseases from healthy controls, even across different measurement methods, samples, and studies (submitted).
3. Real example 3: Analysis of the proteome during bacterial incubation
Purpose:
Antibiotics are a disaster for bacteria in general: they also kill non-toxic bacteria and thus promote drug resistance. Drugs that target proteins more specific to each bacterial species would be better and more effective. To develop them, we first need to know how the proteome changes in response to environmental changes.
Data set:
Two incubation conditions: stable (normal) and shaking (oxidative stress)
Two fractions: cellular and supernatant
Four time points: from the early through the middle to the final growth phase
Three biological replicates
In total: 2 × 2 × 4 × 3 = 48 samples are available
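The sample count is the full factorial of the four design factors. The short enumeration below makes that explicit; the labels are illustrative placeholders, not the original sample names.

```python
from itertools import product

conditions = ["stable", "shaking"]      # normal vs. oxidative stress
fractions = ["cellular", "supernatant"]
time_points = [1, 2, 3, 4]              # early through final growth phase
replicates = [1, 2, 3]                  # biological replicates

samples = list(product(conditions, fractions, time_points, replicates))
print(len(samples))  # 2 * 2 * 4 * 3 = 48
```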
2D embedding of the 48 samples using PCA
[Figure: PC1 vs. PC2 embedding of the samples; groups labelled cellular, early supernatant, and late supernatant.]
PCA embedding of proteins
[Figure: PC1 vs. PC2 embedding of the proteins.]
23 proteins selected (ribosomal proteins, underlined in the original figure: rplL, rpmC, rplX, rpsF, rpmG, rpmD):
SPy1489:hlpA
SPy2039:speB
Spy1073:rplL
SPy2005
SPy2018:emm1
Spy0059:rpmC
Spy0611:tufA
Spy0274:plr
Spy0062:rplX
SPy2043:mf
Spy0613:tpi
Spy2079:AhpC
SPy1831:rpsF
Spy2160:rpmG
SPy1373:ptsH
SPy0731:eno
Spy1371:gapN
Spy1881:pgk
SPy0711:speC
Spy0071:rpmD
SPy2070:groEL
Spy0019
SPy0712:mf2
[Figure: PC1 vs. PC2 embedding using the 23 proteins extracted via PCA.]
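The two embeddings (all proteins vs. the 23 extracted ones) differ only in which rows enter the decomposition. The sketch below assumes a proteins × samples matrix, selection by distance from the origin in the (PC1, PC2) feature plane, and sample positions taken from the right singular vectors (loadings); centring conventions vary, and the helper names and random data are hypothetical stand-ins for the real proteome measurements.

```python
import numpy as np


def pca_2d(x):
    """Feature scores and sample loadings on PC1-PC2 of a column-centred matrix."""
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return (u * s)[:, :2], vt[:2].T


def embed_samples_with_selected(x, k=23):
    """Select k outlying features in the PC1-PC2 plane, then re-embed samples."""
    feat_scores, _ = pca_2d(x)
    selected = np.argsort(np.linalg.norm(feat_scores, axis=1))[::-1][:k]
    _, sample_loadings = pca_2d(x[selected])
    return selected, sample_loadings


rng = np.random.default_rng(0)
x = rng.lognormal(size=(500, 48))  # hypothetical: 500 proteins x 48 samples
selected, sample_2d = embed_samples_with_selected(x)
print(selected.shape, sample_2d.shape)  # (23,) (48, 2)
```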
Lessons to learn:
Even when there is no criterion for what kind of classification to assume, unsupervised feature extraction can select prominent features.
4. Discussion
Real example 1:
Promoters methylated in common among the three autoimmune diseases were found by unsupervised feature extraction.
Real example 2:
Stable circulating biomarkers were selected for 14 diseases using unsupervised feature extraction.
Real example 3:
Prominent features were successfully extracted with unsupervised feature extraction.
Unsupervised feature extraction seems to be the best method; however...
When does PCA-based feature extraction work?
Is PCA-based feature extraction the best?
Are there better unsupervised feature extraction methods?
How can we evaluate unsupervised feature extraction?
Is there any quantity to be maximized?
I believe the people here are experts on these topics.
Help me...

Weitere ähnliche Inhalte

Was ist angesagt?

Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Philip Bourne
 
The Biophotonic Scanner, Supplementation & The "Cady White Paper"
The Biophotonic Scanner, Supplementation & The "Cady White Paper"The Biophotonic Scanner, Supplementation & The "Cady White Paper"
The Biophotonic Scanner, Supplementation & The "Cady White Paper"Louis Cady, MD
 
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...yuliamax
 
IncellDx Oncobreast 3Dx CSUPERB Poster
IncellDx Oncobreast 3Dx CSUPERB PosterIncellDx Oncobreast 3Dx CSUPERB Poster
IncellDx Oncobreast 3Dx CSUPERB PosterAmanda Chargin
 
Genomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelGenomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelJohn B. Cole, Ph.D.
 
Chimeric Antigen Receptors (paper with corresponding power point)
Chimeric Antigen Receptors (paper with corresponding power point)Chimeric Antigen Receptors (paper with corresponding power point)
Chimeric Antigen Receptors (paper with corresponding power point)Kevin B Hugins
 
Rna seq - PDX models
Rna seq - PDX models Rna seq - PDX models
Rna seq - PDX models Amitha Dasari
 
Meacho targeting
Meacho targetingMeacho targeting
Meacho targetingArun kumar
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesSean Ekins
 
Anticancer drug screening
Anticancer drug screeningAnticancer drug screening
Anticancer drug screeningshishirkawde
 
MDC Connects: Biomarker identification - Assessing Immune Function
MDC Connects: Biomarker identification - Assessing Immune FunctionMDC Connects: Biomarker identification - Assessing Immune Function
MDC Connects: Biomarker identification - Assessing Immune FunctionMedicines Discovery Catapult
 
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...Shannon Chesley
 
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX ModelsWhole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX ModelsTom Koch
 
2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, LeidenAlain van Gool
 
Reprogramming cellular identity
Reprogramming cellular identityReprogramming cellular identity
Reprogramming cellular identityCaleb Henderson
 

Was ist angesagt? (20)

Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases Drug Repurposing Against Infectious Diseases
Drug Repurposing Against Infectious Diseases
 
Biomarkers
BiomarkersBiomarkers
Biomarkers
 
The Biophotonic Scanner, Supplementation & The "Cady White Paper"
The Biophotonic Scanner, Supplementation & The "Cady White Paper"The Biophotonic Scanner, Supplementation & The "Cady White Paper"
The Biophotonic Scanner, Supplementation & The "Cady White Paper"
 
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
AACR 2014 Abstract# 3730: A quick and cost effective 12-cell line panel assay...
 
2007 antiproliferative, cytotoxic and antitumour activity
2007 antiproliferative, cytotoxic and antitumour activity2007 antiproliferative, cytotoxic and antitumour activity
2007 antiproliferative, cytotoxic and antitumour activity
 
IncellDx Oncobreast 3Dx CSUPERB Poster
IncellDx Oncobreast 3Dx CSUPERB PosterIncellDx Oncobreast 3Dx CSUPERB Poster
IncellDx Oncobreast 3Dx CSUPERB Poster
 
Prevalence of Resistant Enzymes and Their Therapeutic Challenges
Prevalence of Resistant Enzymes and Their Therapeutic ChallengesPrevalence of Resistant Enzymes and Their Therapeutic Challenges
Prevalence of Resistant Enzymes and Their Therapeutic Challenges
 
Ames test
Ames testAmes test
Ames test
 
Genomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a modelGenomic evaluation of low-heritability traits: dairy cattle health as a model
Genomic evaluation of low-heritability traits: dairy cattle health as a model
 
Chimeric Antigen Receptors (paper with corresponding power point)
Chimeric Antigen Receptors (paper with corresponding power point)Chimeric Antigen Receptors (paper with corresponding power point)
Chimeric Antigen Receptors (paper with corresponding power point)
 
Rna seq - PDX models
Rna seq - PDX models Rna seq - PDX models
Rna seq - PDX models
 
Meacho targeting
Meacho targetingMeacho targeting
Meacho targeting
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
 
Anticancer drug screening
Anticancer drug screeningAnticancer drug screening
Anticancer drug screening
 
MDC Connects: Biomarker identification - Assessing Immune Function
MDC Connects: Biomarker identification - Assessing Immune FunctionMDC Connects: Biomarker identification - Assessing Immune Function
MDC Connects: Biomarker identification - Assessing Immune Function
 
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...
a-rat-pharmacokinetic-pharmacodynamic-model-for-assessment-of-lipopolysacchar...
 
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX ModelsWhole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
 
2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden
 
Reprogramming cellular identity
Reprogramming cellular identityReprogramming cellular identity
Reprogramming cellular identity
 
Grant Proposal
Grant ProposalGrant Proposal
Grant Proposal
 

Ähnlich wie Heuristic PCA for Unsupervised Bioinformatics Feature Extraction

From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...Joaquin Dopazo
 
Clasificación de riesgo en renal metastásico
Clasificación de riesgo en renal metastásicoClasificación de riesgo en renal metastásico
Clasificación de riesgo en renal metastásicoMauricio Lema
 
Navigating through disease maps
Navigating through disease mapsNavigating through disease maps
Navigating through disease mapsJoaquin Dopazo
 
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCC
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCCCapacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCC
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCCMauricio Lema
 
Provenge (sipuleucel t)
Provenge (sipuleucel t)Provenge (sipuleucel t)
Provenge (sipuleucel t)Vinblast
 
Slides for st judes
Slides for st judesSlides for st judes
Slides for st judesSean Ekins
 
Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Gul Muneer
 
Personalized medicine
Personalized medicinePersonalized medicine
Personalized medicinecancerdrg
 
2013-11-14 NVKCL symposium, Utrecht
2013-11-14 NVKCL symposium, Utrecht2013-11-14 NVKCL symposium, Utrecht
2013-11-14 NVKCL symposium, UtrechtAlain van Gool
 
Provenge (Sipuleucel T)
Provenge (Sipuleucel T)Provenge (Sipuleucel T)
Provenge (Sipuleucel T)Cytokinine
 
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...European School of Oncology
 
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...KineMed, Inc.
 
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyMoving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyCityAge
 
Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Gerald Lushington
 
A comparative study using different measure of filteration
A comparative study using different measure of filterationA comparative study using different measure of filteration
A comparative study using different measure of filterationpurkaitjayati29
 
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...U.S. EPA Office of Research and Development
 
MET EV 2.pptx, metabolimics,genomics,approach
MET EV 2.pptx, metabolimics,genomics,approachMET EV 2.pptx, metabolimics,genomics,approach
MET EV 2.pptx, metabolimics,genomics,approachJyotshnaBolisetty
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverySean Ekins
 

Ähnlich wie Heuristic PCA for Unsupervised Bioinformatics Feature Extraction (20)

From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...
 
Clasificación de riesgo en renal metastásico
Clasificación de riesgo en renal metastásicoClasificación de riesgo en renal metastásico
Clasificación de riesgo en renal metastásico
 
Navigating through disease maps
Navigating through disease mapsNavigating through disease maps
Navigating through disease maps
 
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCC
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCCCapacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCC
Capacitación fuerza de ventas BMS Nivo/Ipi 1ra línea RCC
 
Toxicokinetics
ToxicokineticsToxicokinetics
Toxicokinetics
 
Provenge (sipuleucel t)
Provenge (sipuleucel t)Provenge (sipuleucel t)
Provenge (sipuleucel t)
 
Slides for st judes
Slides for st judesSlides for st judes
Slides for st judes
 
Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...Assessing the clinical utility of cancer genomic and proteomic data across tu...
Assessing the clinical utility of cancer genomic and proteomic data across tu...
 
Personalized medicine
Personalized medicinePersonalized medicine
Personalized medicine
 
2013-11-14 NVKCL symposium, Utrecht
2013-11-14 NVKCL symposium, Utrecht2013-11-14 NVKCL symposium, Utrecht
2013-11-14 NVKCL symposium, Utrecht
 
Provenge (Sipuleucel T)
Provenge (Sipuleucel T)Provenge (Sipuleucel T)
Provenge (Sipuleucel T)
 
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...
G. Ceresoli - Prostate and renal cancer - State of the art and update on syst...
 
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...
Personalized & Translational Medicine - KineMed, Inc. - Marc Hellerstein, MD,...
 
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel DudleyMoving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
Moving from Big Data to Better Models of Disease and Drug Response - Joel Dudley
 
Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...Personalized medicine via molecular interrogation, data mining and systems bi...
Personalized medicine via molecular interrogation, data mining and systems bi...
 
SRC TMCOS 2015 2
SRC TMCOS 2015 2SRC TMCOS 2015 2
SRC TMCOS 2015 2
 
A comparative study using different measure of filteration
A comparative study using different measure of filterationA comparative study using different measure of filteration
A comparative study using different measure of filteration
 
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
Using Computational Toxicology to Enable Risk-Based Chemical Safety Decision ...
 
MET EV 2.pptx, metabolimics,genomics,approach
MET EV 2.pptx, metabolimics,genomics,approachMET EV 2.pptx, metabolimics,genomics,approach
MET EV 2.pptx, metabolimics,genomics,approach
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discovery
 

Mehr von Y-h Taguchi

Tensor decomposition based and principal component analysis based unsupervise...
Tensor decomposition based and principal component analysis based unsupervise...Tensor decomposition based and principal component analysis based unsupervise...
Tensor decomposition based and principal component analysis based unsupervise...Y-h Taguchi
 
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明Y-h Taguchi
 
Tensor decomposition­based unsupervised feature extraction identified the un...
Tensor decomposition­based unsupervised  feature extraction identified the un...Tensor decomposition­based unsupervised  feature extraction identified the un...
Tensor decomposition­based unsupervised feature extraction identified the un...Y-h Taguchi
 
Tensor decomposition ­based unsupervised feature extraction applied to matrix...
Tensor decomposition ­based unsupervised feature extraction applied to matrix...Tensor decomposition ­based unsupervised feature extraction applied to matrix...
Tensor decomposition ­based unsupervised feature extraction applied to matrix...Y-h Taguchi
 
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発Y-h Taguchi
 
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...Y-h Taguchi
 
Rectified factor networks for biclustering of omics data
Rectified factor networks for biclustering of omics dataRectified factor networks for biclustering of omics data
Rectified factor networks for biclustering of omics dataY-h Taguchi
 
テンソル分解を用いた教師なし学習による変数選択
テンソル分解を用いた教師なし学習による変数選択テンソル分解を用いた教師なし学習による変数選択
テンソル分解を用いた教師なし学習による変数選択Y-h Taguchi
 
主成分分析を用いた教師なし学習による変数選択を用いたヒストン脱アセチル化酵素阻害剤の機能探索
主成分分析を用いた教師なし学習による変数選択を用いたヒストン脱アセチル化酵素阻害剤の機能探索主成分分析を用いた教師なし学習による変数選択を用いたヒストン脱アセチル化酵素阻害剤の機能探索
主成分分析を用いた教師なし学習による変数選択を用いたヒストン脱アセチル化酵素阻害剤の機能探索Y-h Taguchi
 
『主成分分析を用いた教師なし学習による変数選択』 を用いたデング出血熱原因遺伝子の推定
『主成分分析を用いた教師なし学習による変数選択』 を用いたデング出血熱原因遺伝子の推定『主成分分析を用いた教師なし学習による変数選択』 を用いたデング出血熱原因遺伝子の推定
『主成分分析を用いた教師なし学習による変数選択』 を用いたデング出血熱原因遺伝子の推定Y-h Taguchi
 
miRNA-mRNA相互作用同定を用いた 腎芽腫関連遺伝子の推定
miRNA-mRNA相互作用同定を用いた 腎芽腫関連遺伝子の推定miRNA-mRNA相互作用同定を用いた 腎芽腫関連遺伝子の推定
miRNA-mRNA相互作用同定を用いた 腎芽腫関連遺伝子の推定Y-h Taguchi
 
Principal component analysis based unsupervised feature extraction applied to...
Principal component analysis based unsupervised feature extraction applied to...Principal component analysis based unsupervised feature extraction applied to...
Principal component analysis based unsupervised feature extraction applied to...Y-h Taguchi
 
microRNA-mRNA interaction identification in Wilms tumor using principal compo...
microRNA-mRNA interaction identification in Wilms tumor using principal compo...microRNA-mRNA interaction identification in Wilms tumor using principal compo...
microRNA-mRNA interaction identification in Wilms tumor using principal compo...Y-h Taguchi
 
Comprehensive analysis of transcriptome andmetabolome analysis in Intrahepati...
Comprehensive analysis of transcriptome andmetabolome analysis in Intrahepati...Comprehensive analysis of transcriptome andmetabolome analysis in Intrahepati...
Comprehensive analysis of transcriptome andmetabolome analysis in Intrahepati...Y-h Taguchi
 
主成分分析を用いた教師なし学習による出芽酵母 の時間周期遺伝子発現プロファイルの解析
主成分分析を用いた教師なし学習による出芽酵母 の時間周期遺伝子発現プロファイルの解析主成分分析を用いた教師なし学習による出芽酵母 の時間周期遺伝子発現プロファイルの解析
主成分分析を用いた教師なし学習による出芽酵母 の時間周期遺伝子発現プロファイルの解析Y-h Taguchi
 
PCAを用いた2群の有意差検定
PCAを用いた2群の有意差検定PCAを用いた2群の有意差検定
PCAを用いた2群の有意差検定Y-h Taguchi
 
SFRP1 is a possible candidate for epigenetic therapy in non­small cell lung ...
SFRP1 is a possible candidate for epigenetic  therapy in non­small cell lung ...SFRP1 is a possible candidate for epigenetic  therapy in non­small cell lung ...
SFRP1 is a possible candidate for epigenetic therapy in non­small cell lung ...Y-h Taguchi
 
A cross-species bi-clustering approach to identifying conserved co-regulated ...
A cross-species bi-clustering approach to identifying conserved co-regulated ...A cross-species bi-clustering approach to identifying conserved co-regulated ...
A cross-species bi-clustering approach to identifying conserved co-regulated ...Y-h Taguchi
 
主成分分析を用いた教師なし学習による変数選択法を用いたがんにおけるmRNA-miRNA相互作用のより信頼性のある同定
主成分分析を用いた教師なし学習による変数選択法を用いたがんにおけるmRNA-miRNA相互作用のより信頼性のある同定主成分分析を用いた教師なし学習による変数選択法を用いたがんにおけるmRNA-miRNA相互作用のより信頼性のある同定
主成分分析を用いた教師なし学習による変数選択法を用いたがんにおけるmRNA-miRNA相互作用のより信頼性のある同定Y-h Taguchi
 
Identification of aberrant gene expression associated with aberrant promoter ...
Identification of aberrant gene expression associated with aberrant promoter ...Identification of aberrant gene expression associated with aberrant promoter ...
Identification of aberrant gene expression associated with aberrant promoter ...Y-h Taguchi
 

Mehr von Y-h Taguchi (20)

Tensor decomposition based and principal component analysis based unsupervise...
Tensor decomposition based and principal component analysis based unsupervise...Tensor decomposition based and principal component analysis based unsupervise...
Tensor decomposition based and principal component analysis based unsupervise...
 
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明
主成分分析を用いた教師なし学習による筋萎縮性側索硬化症とがんの遺伝的関連性の解明
 
Tensor decomposition­based unsupervised feature extraction identified the un...
Tensor decomposition­based unsupervised  feature extraction identified the un...Tensor decomposition­based unsupervised  feature extraction identified the un...
Tensor decomposition­based unsupervised feature extraction identified the un...
 
Tensor decomposition ­based unsupervised feature extraction applied to matrix...
Tensor decomposition ­based unsupervised feature extraction applied to matrix...Tensor decomposition ­based unsupervised feature extraction applied to matrix...
Tensor decomposition ­based unsupervised feature extraction applied to matrix...
 
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発
遺伝子発現プロファイルからの 薬剤標的タンパクの統計的推定法の開発
 
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...
Identification of Candidate Drugs for Heart Failure using Tensor Decompositio...
 

Heuristic PCA for Unsupervised Bioinformatics Feature Extraction

  • 1. Heuristic PCA-Based Feature Extraction and Its Application to Bioinformatics. Y-h. Taguchi, Dept. Phys., Chuo Univ.; Y. Murakami, Grad. Sch. Med., Osaka City Univ.; M. Iwadate, Dept. Biol. Sci., Chuo Univ.; H. Umeyama, Dept. Biol. Sci., Chuo Univ.; A. Okamoto, Dept. Sch. Health Sci., Aichi Univ. Edu.
  • 2. 0. Why PCA? PCA = principal component analysis. Motivation: unsupervised feature selection. How can PCA be used for it?
  • 3. Toy data: 100 features × 20 samples (class 1: samples 1-10, class 2: samples 11-20). The 10 ordered features are all 11111111110000000000; the 90 random features look like 01000000110110011111, 00011110000101011101, …, 01000011000110101111. How can we select the 10 ordered features without using the class labels?
  • 4. Embedding the 100 features into 2D using PCA (figure: 10 ordered features vs 90 random features).
  • 5. PC1 represents the discrimination between class 1 and class 2 (figure: PC1 values of the 20 samples, class 1 vs class 2).
  • 6. Applying a “weak” unitary transformation to the space spanned by the 20 samples (figure: the 100 features × 20 samples matrix, with 10 ordered and 90 random features, before and after the transformation).
  • 7. The 2D embedding is the same, so we can still select the 10 features (figure: 10 ordered vs 90 random features).
  • 8. PC1 now only “weakly” represents the discrimination between class 1 and class 2 (figure: PC1 values of the 20 samples).
  • 9. Linear discriminant analysis + leave-one-out cross-validation using the 10 ordered features (rows: predicted class, columns: true class):
      Predicted \ True   class 1   class 2
      class 1                  8         2
      class 2                  2         8
    Accuracy = Sensitivity = Specificity = 80%. How about real examples?
  • 10. 1. Real example 1: disease-associated aberrant promoter methylation (figure: methylation of a gene promoter). Three autoimmune diseases: SLE, RA and DM. [MZ twins (healthy + sick) + 2 healthy controls] × 5 sets = 20 samples per disease; × 3 diseases = 60 samples, vs ≈ 1000 potential methylation sites.
  • 11. Embedding of ~1000 promoters over the 20 RA samples into 2D with PCA (PC2 vs PC3); the outlier promoters are selected.
  • 12. PC2 for RA (figure: the 20 samples, male and female; ◯ sick twin, △ healthy twin, + healthy control 1, ☓ healthy control 2). Twins: healthy > sick. Controls: no such difference. The 4th set: no such difference. → This is why unsupervised feature selection is needed.
  • 13. Scatter plots of healthy twins vs RA twins; red dots = selected promoters. P < 2.2×10^-16, P = 2.2×10^-12, P = 3.7×10^-12, P = 3.9×10^-1, P < 2.2×10^-16. Individual promoters are significantly aberrantly methylated, so the feature selection was successful. We then repeated the same procedure for the two other diseases (SLE and DM)....
  • 14. The promoters selected for the three autoimmune diseases are mostly shared. No other method can achieve such close agreement among the three autoimmune diseases.
  • 15. Lessons learned: a predefined class definition (e.g., 'sick twin' vs 'healthy twin + two healthy controls') is not a good strategy for extracting “important” features, which can exhibit much more complicated behavior (e.g., upregulated in males but downregulated in females).
  • 16. Additional remarks: a similar procedure was applied to squamous cell carcinoma (*), and genes with genotype-specific DNA methylation were extracted. These genes were identified as cancer-related through literature searches, and in silico drug screening was performed for them (BMC Syst. Biol., in press; to be presented at APBC2014). (*) Esophageal cancer.
  • 17. 2. Real example 2: circulating biomarker discovery for liver diseases. Why “circulating biomarkers”? → They are non-invasive and thus less stressful for patients. Circulating = in blood, etc. Target in this talk: microRNAs in blood → microRNAs are non-protein-coding RNAs that regulate other transcripts.
  • 18. Data set: 14 diseases + healthy controls. For example, the 2D embedding of ~900 blood miRNAs using PCA over 32 lung cancer patients + 70 healthy controls, with 10 outlier miRNAs selected (figure axes: PC1, PC2). However, PC1 no longer shows a clear distinction between lung cancer and normal controls (not shown here).
  • 19. Prediction, control vs lung cancer: LDA with PCA, leave-one-out cross-validation (using 10 miRNAs, up to the 5th PC); rows: predicted class, columns: true class:
      Predicted \ True   control   lung cancer
      control                 56             8
      lung cancer             14            24
    Accuracy 0.784, Specificity 0.800, Sensitivity 0.750, Precision 0.632
  • 21. What is the advantage of PCA-based feature extraction? → Stability. Stability of feature extraction was tested by cross-validation (10 folds, 100 trials): 14 diseases (each vs normal controls) × 10 miRNAs = 140 selected miRNAs. Ideally, all 140 miRNAs would be selected in all 100 trials. In practice, 129 of the 140 miRNAs were selected with 100% probability.
  • 22. Comparison of stability with other feature extraction methods: UFF(*): 111 out of 140 miRNAs; t-test based: 40 out of 140; SAM: 30 out of 140; gsMMD: 5 and 1 out of 140; RFE: 1 out of 140; ensemble RFE: 0 out of 140. (*) The only other unsupervised feature extraction method.
  • 23. Lessons learned: a predefined class definition (e.g., 'sick twin' vs 'healthy twin + two healthy controls') is not a good strategy for extracting “stable” features. Relying too heavily on class information can reduce the stability of the selected features.
  • 24. Additional remarks: the sets of 10 miRNAs selected as biomarkers discriminating each of the 14 diseases from normal controls largely overlapped (every set of 10 was drawn from a common pool of 12 miRNAs). Moreover, these 12 miRNAs also discriminate seven additional diseases from healthy controls, even with different measurement methods, samples and studies (submitted).
  • 25. 3. Real example 3: analysis of the proteome during bacterial incubation. Purpose: antibiotics are a disaster for bacteria indiscriminately; they also kill non-toxic bacteria and thus promote drug resistance. Drugs that target proteins more specific to each bacterium would be better and more effective. To develop them, we first need to know how the proteome changes in response to environmental changes.
  • 26. Data set: two incubation conditions, stable (normal) and shaking (oxidative stress); two fractions, cellular and supernatant; four time points, from the early through the middle to the final growth phase; three biological replicates. In total, 2 × 2 × 4 × 3 = 48 samples are available.
  • 27. 2D embedding of the 48 samples using PCA (figure axes: PC1, PC2; groups: cellular, early supernatant, late supernatant).
  • 28. PCA embedding of the proteins (figure axes: PC1, PC2): 23 proteins selected (ribosomal proteins: Spy1073:rplL, Spy0059:rpmC, Spy0062:rplX, SPy1831:rpsF, Spy2160:rpmG, Spy0071:rpmD). SPy1489:hlpA, SPy2039:speB, Spy1073:rplL, SPy2005, SPy2018:emm1, Spy0059:rpmC, Spy0611:tufA, Spy0274:plr, Spy0062:rplX, SPy2043:mf, Spy0613:tpi, Spy2079:AhpC, SPy1831:rpsF, Spy2160:rpmG, SPy1373:ptsH, SPy0731:eno, Spy1371:gapN, Spy1881:pgk, SPy0711:speC, Spy0071:rpmD, SPy2070:groEL, Spy0019, SPy0712:mf2
  • 29. PCA embedding using the 23 proteins extracted via PCA (figure axes: PC1, PC2).
  • 31. Lessons learned: even when there is no criterion specifying which classification to assume, unsupervised feature extraction can select prominent features.
  • 32. 4. Discussion. Real example 1: promoters aberrantly methylated in common across the three autoimmune diseases were found by unsupervised feature extraction. Real example 2: stable circulating biomarkers were selected for 14 diseases using unsupervised feature extraction. Real example 3: prominent features were successfully extracted with unsupervised feature extraction.
  • 33. Unsupervised feature extraction seems to be the best method; however: When does PCA-based feature extraction work? Is PCA-based feature extraction the best? Are there better unsupervised feature extraction methods? How can we evaluate unsupervised feature extraction? Is there any quantity to be maximized?
  • 34. I believe the people here are experts on these topics. Help me....
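The toy example on slides 3-5 can be reproduced in a few lines. This sketch is plain Python with no NumPy, so it runs anywhere. It rests on several assumptions of ours:

- the 90 random features are fair coin flips;
- class 1 is the first 10 samples, as the 11111111110000000000 pattern suggests;
- each sample is mean-centred over the features;
- PC1 is found by power iteration on the samples × samples scatter matrix.

The features are the points being embedded, so each feature gets a PC1 score and each sample a PC1 loading. Function names and seeds are hypothetical; this is not the authors' code.

```python
import random


def make_toy_data(seed=0):
    """Hypothetical toy data: 100 features x 20 samples (10 ordered rows, 90 coin-flip rows)."""
    rng = random.Random(seed)
    ordered = [[1] * 10 + [0] * 10 for _ in range(10)]  # class 1 = first 10 samples
    noise = [[rng.randint(0, 1) for _ in range(20)] for _ in range(90)]
    return ordered + noise


def center_columns(x):
    """Subtract each sample's (column's) mean taken over all features."""
    means = [sum(col) / len(x) for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def first_pc(x, iters=500, seed=1):
    """PC1 by power iteration; returns (feature scores, unit sample loadings)."""
    xc = center_columns(x)
    p = len(xc[0])
    scatter = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(scatter[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(t * t for t in w) ** 0.5
        w = [t / norm for t in w]
    scores = [sum(a * b for a, b in zip(row, w)) for row in xc]
    return scores, w


def select_top(scores, k):
    """Indices of the k features with the largest |PC1 score| (no class labels used)."""
    return sorted(range(len(scores)), key=lambda i: -abs(scores[i]))[:k]


x = make_toy_data()
scores, loadings = first_pc(x)
print("selected features:", sorted(select_top(scores, 10)))
print("class-1 loading sum:", round(sum(loadings[:10]), 2),
      "class-2 loading sum:", round(sum(loadings[10:]), 2))
```

Under these assumptions the ten identical ordered features share one PC1 score. Its magnitude is typically well above that of most random features, and the class-1 and class-2 loadings have opposite signs. Because the other features are random, one of them can occasionally score as high as the ordered ones, so the printed top 10 is not guaranteed to be exactly features 0-9.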
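Slides 6-8 claim that a “weak” unitary transformation of the sample space leaves the 2D embedding of the features unchanged. A short argument shows why:

- Let X (features × samples) be right-multiplied by an orthogonal matrix Q.
- The column means become mᵀQ, so centring commutes with Q.
- The scatter matrix becomes QᵀCQ, whose top eigenvector is Qᵀw.
- The feature scores X_cQQᵀw = X_cw are therefore unchanged up to sign.
- Only the sample loadings are rotated, which fits PC1 reflecting the classes only "weakly" afterwards.

The sketch checks this numerically. It assumes "weak" means an orthogonal matrix close to the identity, built here from a few small Givens rotations that mix class-1 with class-2 columns. The angle, the column pairs and the helper names are our choices. Only centring is applied, because scaling each sample to unit variance would break the exact invariance.

```python
import math
import random


def make_toy_data(seed=0):
    """Hypothetical toy data: 10 ordered rows + 90 coin-flip rows, 20 samples."""
    rng = random.Random(seed)
    ordered = [[1.0] * 10 + [0.0] * 10 for _ in range(10)]
    noise = [[float(rng.randint(0, 1)) for _ in range(20)] for _ in range(90)]
    return ordered + noise


def weak_rotation(x, angle=0.3, pairs=((0, 10), (4, 15), (9, 19))):
    """Right-multiply x by a product of small Givens rotations.

    The product is an orthogonal matrix close to the identity that mixes a few
    class-1 columns (0-9) with class-2 columns (10-19).
    """
    y = [list(row) for row in x]
    c, s = math.cos(angle), math.sin(angle)
    for i, j in pairs:
        for row in y:
            a, b = row[i], row[j]
            row[i], row[j] = c * a - s * b, s * a + c * b
    return y


def pc1_scores(x, iters=1000, seed=1):
    """PC1 scores of the features (rows) after mean-centring each column."""
    means = [sum(col) / len(x) for col in zip(*x)]
    xc = [[v - m for v, m in zip(row, means)] for row in x]
    p = len(xc[0])
    scatter = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(scatter[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(t * t for t in w) ** 0.5
        w = [t / norm for t in w]
    return [sum(a * b for a, b in zip(row, w)) for row in xc]


x = make_toy_data()
before = pc1_scores(x)
after = pc1_scores(weak_rotation(x))
# An eigenvector is only defined up to sign, so align the two runs first.
sign = 1.0 if sum(a * b for a, b in zip(before, after)) >= 0 else -1.0
max_diff = max(abs(a - sign * b) for a, b in zip(before, after))
print("largest change in any feature's PC1 score:", max_diff)
```

The printed difference should be at rounding-error level. The same features are therefore selected before and after the transformation, even though the sample side of PC1 has changed.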
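The summary statistics on slides 9 and 19 follow directly from the 2 × 2 tables, read with rows = predicted class and columns = true class. The small helper below recomputes them; its name is hypothetical. It takes class 1 (slide 9) and lung cancer (slide 19) as the positive class. The reported precision of 0.632 is consistent with the latter choice.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summary statistics of a 2x2 confusion matrix.

    tp: positives predicted positive, fn: positives predicted negative,
    fp: negatives predicted positive, tn: negatives predicted negative.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Slide 9, class 1 as positive: predicted 1 = (8 true 1, 2 true 2), predicted 2 = (2, 8).
print(binary_metrics(tp=8, fn=2, fp=2, tn=8))
# {'accuracy': 0.8, 'sensitivity': 0.8, 'specificity': 0.8, 'precision': 0.8}

# Slide 19, lung cancer as positive: predicted control = (56 control, 8 cancer),
# predicted lung cancer = (14 control, 24 cancer).
m = binary_metrics(tp=24, fn=8, fp=14, tn=56)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.784, 'sensitivity': 0.75, 'specificity': 0.8, 'precision': 0.632}
```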
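Slide 11 selects promoters that are outliers in the PC2 vs PC3 plane, but the slides do not define "outlier". The sketch assumes "farthest from the origin in that plane". It computes the first few PCs by power iteration, re-orthogonalising each new vector against the PCs already found, in plain Python. The function names, the random stand-in data (smaller than ~1000 promoters so it runs quickly) and k = 10 are all hypothetical.

```python
import random


def center_columns(x):
    """Subtract each sample's (column's) mean taken over all features."""
    means = [sum(col) / len(x) for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def top_loadings(xc, n_pc, iters=300, seed=0):
    """Unit sample-space loading vectors of PC1..PC{n_pc}.

    Power iteration on Xc^T Xc; each new vector is re-orthogonalised
    (modified Gram-Schmidt) against the PCs already found.
    """
    p = len(xc[0])
    scatter = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    found = []
    for _ in range(n_pc):
        w = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(scatter[i][j] * w[j] for j in range(p)) for i in range(p)]
            for f in found:  # remove components along earlier PCs
                d = sum(a * b for a, b in zip(w, f))
                w = [a - d * b for a, b in zip(w, f)]
            norm = sum(t * t for t in w) ** 0.5
            w = [t / norm for t in w]
        found.append(w)
    return found


def feature_scores(xc, loadings):
    """PC scores of every feature (row) on every loading vector."""
    return [[sum(a * b for a, b in zip(row, w)) for w in loadings] for row in xc]


def select_outliers(scores, pcs=(1, 2), k=10):
    """k features farthest from the origin in the plane of the given PCs.

    pcs is 0-based, so the default (1, 2) means PC2 vs PC3.
    """
    def dist2(i):
        return sum(scores[i][c] ** 2 for c in pcs)
    return sorted(range(len(scores)), key=dist2, reverse=True)[:k]


# Random stand-in for ~1000 promoters x 20 samples (hypothetical data).
rng = random.Random(42)
x = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(300)]
xc = center_columns(x)
loadings = top_loadings(xc, n_pc=3)
scores = feature_scores(xc, loadings)
print("PC2 vs PC3 outliers:", select_outliers(scores, pcs=(1, 2), k=10))
```

In practice a library SVD would replace the hand-written power iteration. The selection rule itself (distance in a chosen PC plane) is the part that carries over.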
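Slides 9 and 19 use linear discriminant analysis with leave-one-out cross-validation, and slide 19 applies it to PC scores (10 miRNAs, up to the 5th PC). No further details are given. The sketch below is therefore our own minimal two-class LDA wrapped in a leave-one-out loop:

- a pooled covariance matrix and a log-prior term;
- a tiny ridge we add for numerical safety;
- per-sample PC scores taken as fixed input, since the slides do not state whether the PCA was recomputed inside each fold.

The data are synthetic, not the lung-cancer data, and the function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(xs, ys, ridge=1e-6):
    """Two-class LDA with a pooled covariance matrix; returns (w, bias)."""
    p = len(xs[0])
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in (0, 1)}
    mu = {k: [sum(col) / len(g) for col in zip(*g)] for k, g in groups.items()}
    cov = [[0.0] * p for _ in range(p)]
    for k, g in groups.items():
        for x in g:
            d = [a - b for a, b in zip(x, mu[k])]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    dof = len(xs) - 2
    cov = [[cov[i][j] / dof + (ridge if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    w = solve(cov, [a - b for a, b in zip(mu[1], mu[0])])
    midpoint = [(a + b) / 2 for a, b in zip(mu[0], mu[1])]
    prior_term = math.log(len(groups[1]) / len(groups[0]))
    bias = prior_term - sum(a * b for a, b in zip(w, midpoint))
    return w, bias


def predict(model, x):
    w, bias = model
    return 1 if sum(a * b for a, b in zip(w, x)) + bias > 0 else 0


def loocv_predictions(xs, ys):
    """Leave-one-out: refit on all other samples, predict the held-out one."""
    return [predict(fit_lda(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]), xs[i])
            for i in range(len(xs))]


# Synthetic stand-in for per-sample scores on the first 5 PCs (not the slide's data).
rng = random.Random(0)
xs = ([[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
      + [[rng.gauss(3.0, 1.0) for _ in range(5)] for _ in range(20)])
ys = [0] * 40 + [1] * 20
preds = loocv_predictions(xs, ys)
acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
print(f"LOOCV accuracy on synthetic data: {acc:.3f}")
```

The held-out predictions can be tallied into the same 2 × 2 table as on slides 9 and 19.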
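Slide 21 measures stability by re-running the selection on resampled data and counting how often each miRNA is selected. The resampling scheme is summarised only as "10 folds, 100 trials". The sketch assumes each trial leaves out one random tenth of the samples, re-runs the selector, and reports the features chosen in every trial. The PC1-based selector and the toy data are hypothetical stand-ins. Any function of the form `select(matrix, k) -> set of row indices` can be plugged in, which is one way the methods on slide 22 could be compared on equal terms.

```python
import random


def make_toy_data(seed=0):
    """Hypothetical stand-in data: 10 class-aligned rows + 90 coin-flip rows, 20 samples."""
    rng = random.Random(seed)
    ordered = [[1] * 10 + [0] * 10 for _ in range(10)]
    return ordered + [[rng.randint(0, 1) for _ in range(20)] for _ in range(90)]


def pc1_select(x, k, iters=200, seed=1):
    """Hypothetical selector: the k rows with the largest |PC1 score|."""
    means = [sum(col) / len(x) for col in zip(*x)]
    xc = [[v - m for v, m in zip(row, means)] for row in x]
    p = len(xc[0])
    s = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(s[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(t * t for t in w) ** 0.5
        w = [t / norm for t in w]
    score = [abs(sum(a * b for a, b in zip(r, w))) for r in xc]
    return set(sorted(range(len(x)), key=lambda i: -score[i])[:k])


def selection_stability(x, select, k, n_trials=100, n_folds=10, seed=0):
    """Count, per feature, the trials in which `select` picks it.

    Each trial leaves out one random fold (1/n_folds of the samples, i.e. columns)
    and re-runs the selection on the remaining samples.
    """
    rng = random.Random(seed)
    n_samples = len(x[0])
    fold_size = max(1, n_samples // n_folds)
    counts = [0] * len(x)
    for _ in range(n_trials):
        left_out = set(rng.sample(range(n_samples), fold_size))
        kept = [j for j in range(n_samples) if j not in left_out]
        for i in select([[row[j] for j in kept] for row in x], k):
            counts[i] += 1
    return counts


x = make_toy_data()
n_trials = 20  # fewer than the slide's 100 trials, to keep the pure-Python run short
counts = selection_stability(x, pc1_select, k=10, n_trials=n_trials)
always = [i for i, c in enumerate(counts) if c == n_trials]
print(f"features selected in all {n_trials} trials:", always)
```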
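The 48 proteome samples on slide 26 form a full factorial design. Enumerating it makes the count 2 × 2 × 4 × 3 = 48 explicit. The time-point labels below are placeholders, since the slide does not name them.

```python
from itertools import product

conditions = ["stable", "shaking"]       # normal vs oxidative stress
fractions = ["cellular", "supernatant"]
time_points = ["T1", "T2", "T3", "T4"]   # hypothetical labels, early -> final growth phase
replicates = [1, 2, 3]                   # biological replicates

design = list(product(conditions, fractions, time_points, replicates))
print(len(design))   # 48
print(design[0])     # ('stable', 'cellular', 'T1', 1)
```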