Infinite Latent Process Decomposition

Tomonari Masada

A topic model for analyzing microarray data

Technology · Health & Medicine

infinite Latent Process Decomposition
Tomonari MASADA (正田備也)
tomonari.masada@gmail.com
Nagasaki University (長崎大學)

Problem
From array data, extract gene clusters sample by sample.
[Intuition] Different samples may show different groupings of gene expressions.

What We Do
Neither gene clustering nor sample clustering.
Clustering of gene-sample pairs.

Previous Work
LPD: Latent Process Decomposition [Rogers et al. 05]
• Bayesian modeling
• Assignment of each gene-sample pair to a process (process = cluster)

Weakness
[Ying et al. 08]
• K (# processes) must be given as an input.
• LPD is inefficient when K is large.
In many cases, we don’t know the optimal K.

Our New Method
iLPD: infinite Latent Process Decomposition
• Bayesian nonparametrics (K → ∞)

Merits
• K can be truncated. (K → ∞ only theoretically.)
• Memory size is fixed.
• Parallelization is easy.
• K can be set with little thought.

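Model Details
\begin{align*}
\pi &\sim \mathrm{truncatedGEM}(\gamma), &
\pi_k &= \tilde{\pi}_k \prod_{l=1}^{k-1} (1 - \tilde{\pi}_l), &
\tilde{\pi}_k &\sim \mathrm{Beta}(1, \gamma) \\
\gamma &\sim \mathrm{Gamma}(a_\gamma, b_\gamma), &
\alpha &\sim \mathrm{Gamma}(a_\alpha, b_\alpha), &
\rho &\sim \mathrm{Gamma}(a_\rho, b_\rho) \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha \pi), &
z_{dg} &\sim \mathrm{Multi}(\theta_d) \\
\mu_{gk} &\sim \mathrm{Gauss}(\mu_0, \rho), &
\lambda_{gk} &\sim \mathrm{Gamma}(a_0, b_0), &
k &= 1, \ldots, K \\
x_{dg} &\sim \mathrm{Gauss}(\mu_{g z_{dg}}, \lambda_{g z_{dg}})
\end{align*}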
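The generative process on this slide can be sketched as a small sampler. This is only an illustration under stated assumptions, not the author's implementation: it assumes that ρ and λ are precisions (inverse variances), that Gamma(a, b) is parameterized by shape a and rate b, and that the truncation gives the last stick all remaining mass. The function names and default hyperparameter values are hypothetical.

```python
import random


def truncated_stick_breaking(gamma, K, rng):
    """Draw mixing weights pi from a stick-breaking (GEM) prior truncated at K."""
    pi, remaining = [], 1.0
    for k in range(K):
        # Assumption: the last stick takes all remaining mass so that pi sums to 1.
        frac = 1.0 if k == K - 1 else rng.betavariate(1.0, gamma)
        pi.append(frac * remaining)
        remaining *= 1.0 - frac
    return pi


def dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [v / total for v in draws]


def sample_data(D, G, K, gamma=1.0, alpha=5.0, mu0=0.0, rho=1.0,
                a0=2.0, b0=2.0, seed=0):
    """Generate D samples x G genes of synthetic expression values."""
    rng = random.Random(seed)
    pi = truncated_stick_breaking(gamma, K, rng)
    # Per-gene, per-process Gaussian parameters.
    # Assumption: rho and lambda are precisions; Gamma(a, b) uses rate b,
    # so the scale passed to gammavariate is 1 / b.
    mu = [[rng.gauss(mu0, rho ** -0.5) for _ in range(K)] for _ in range(G)]
    lam = [[rng.gammavariate(a0, 1.0 / b0) for _ in range(K)] for _ in range(G)]
    x, z = [], []
    for _ in range(D):
        # theta_d ~ Dirichlet(alpha * pi): process proportions of sample d
        theta = dirichlet([alpha * p for p in pi], rng)
        # z_dg ~ Multi(theta_d): one process per gene-sample pair
        z_d = rng.choices(range(K), weights=theta, k=G)
        # x_dg ~ Gauss(mu_{g, z_dg}, lambda_{g, z_dg})
        x_d = [rng.gauss(mu[g][k], lam[g][k] ** -0.5) for g, k in enumerate(z_d)]
        z.append(z_d)
        x.append(x_d)
    return pi, z, x


pi, z, x = sample_data(D=4, G=6, K=5)
```

Each entry of `z` assigns one gene-sample pair to a process, which is the "clustering of gene-sample pairs" described earlier; `K` here is only the truncation level.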

Inference (CVB)
Collapsed Variational Bayesian Inference
○ Fixed memory size
○ Easy parallelization
× Special function evaluation: digamma, trigamma, tetragamma functions
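To give a sense of the special-function cost listed above, here is a minimal standard-library sketch of the three functions for positive arguments. It uses a common textbook approach (shift the argument upward with the recurrence, then apply a truncated asymptotic series); it is not taken from the slides. In practice a library routine such as SciPy's `scipy.special.polygamma` would normally be used instead.

```python
import math


def digamma(x):
    """psi(x) for x > 0, via recurrence plus an asymptotic series."""
    acc = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    y = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return acc + math.log(x) - 0.5 / x - y * (1 / 12 - y * (1 / 120 - y / 252))


def trigamma(x):
    """psi'(x) for x > 0."""
    acc = 0.0
    while x < 6.0:                      # psi'(x) = psi'(x + 1) + 1/x^2
        acc += 1.0 / (x * x)
        x += 1.0
    y = 1.0 / (x * x)
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return acc + 1.0 / x + 0.5 * y + (y / x) * (1 / 6 - y * (1 / 30 - y / 42))


def tetragamma(x):
    """psi''(x) for x > 0."""
    acc = 0.0
    while x < 6.0:                      # psi''(x) = psi''(x + 1) - 2/x^3
        acc -= 2.0 / (x ** 3)
        x += 1.0
    y = 1.0 / (x * x)
    # -1/x^2 - 1/x^3 - 1/(2x^4) + 1/(6x^6) - 1/(6x^8)
    return acc - y - y / x - y * y * (0.5 - y * (1 / 6 - y / 6))
```

The series are truncated after a few terms, so the results are approximate; the shift threshold of 6 is a choice made for this sketch.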

Experiment
http://www.gems-system.org/

Dataset name     Samples  Genes   Diagnostic task
11_Tumors        174      12,534  11 various human tumor types
14_Tumors        308      15,010  14 various human tumor types and 12 normal tissue types
9_Tumors         60       5,727   9 various human tumor types
Brain_Tumor1     90       5,921   5 human brain tumor types
Brain_Tumor2     50       10,368  4 malignant glioma types
Leukemia1        72       5,328   AML, ALL B-cell, and ALL T-cell
Leukemia2        72       11,226  AML, ALL, and mixed-lineage leukemia (MLL)
Lung_Cancer      203      12,601  4 lung cancer types and normal tissues
SRBCT            83       2,309   Small, round blue cell tumors (SRBCT) of childhood
Prostate_Tumor   102      10,510  Prostate tumor and normal tissues
DLBCL            77       5,470   DLBCL and follicular lymphomas

Evaluation
• Compare iLPD with LPD [Ying et al. 08]
• Train iLPD on a randomly selected 90% of the data
• Evaluate the posterior density on the remaining 10% (test data) and calculate the geometric mean
• Average over 25 runs
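The evaluation protocol can be sketched as follows. This is a hedged illustration only: the slides do not say how the 90/10 split is drawn, so the sketch assumes individual observed values are held out at random, and it uses a single fitted Gaussian as a hypothetical stand-in for the trained model's posterior density. All function names are made up for this example.

```python
import math
import random


def split_entries(entries, test_fraction, rng):
    """Randomly split entries into (train, test)."""
    shuffled = entries[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def geometric_mean(values):
    """Geometric mean computed in log space to avoid underflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gaussian_density(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def evaluate(values, runs=25, seed=0):
    """Average, over several random splits, the geometric mean test density."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = split_entries(values, 0.1, rng)
        # Stand-in "model": one Gaussian fitted to the training values.
        # The real method would plug in the iLPD or LPD posterior density here.
        mean = sum(train) / len(train)
        var = sum((v - mean) ** 2 for v in train) / len(train)
        scores.append(geometric_mean([gaussian_density(v, mean, var) for v in test]))
    return sum(scores) / len(scores)
```

A higher score means the model assigns more density to the held-out values.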

Results
• iLPD is more efficient for a large K than LPD.
• There is a dataset that is not well analyzed.
  – LPD-type methods may not be a panacea.
  Cf. BMC Bioinformatics 2010, 11:552
  – Nonparametric Bayesian method based on Indian Buffet Processes

Future Work
• Practical evaluation
• Result interpretation
• GPGPU acceleration
• Visualization

[Figures: iLPD vs. LPD at 10, 20, and 40 processes, one panel per dataset: Brain_Tumor1, Brain_Tumor2, DLBCL, Leukemia1, Leukemia2, Lung_Cancer, Prostate_Tumor, SRBCT, 11_Tumors, 14_Tumors, 9_Tumors]
