4. Previous Work: LPD [Rogers et al. 05]
Latent Process Decomposition
• Bayesian modeling
• Assignment of each gene-sample pair to a process (process = cluster)
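As an illustration of "assignment of each gene-sample pair to a process", the following is a minimal generative sketch. It assumes the usual LPD-style setup: a Dirichlet-distributed process mixture per sample and a Gaussian expression model per gene and process. The function name `sample_lpd`, the hyperparameter values, and the priors on the Gaussian parameters are hypothetical choices for this example, not taken from the slides.

```python
import numpy as np

def sample_lpd(n_samples, n_genes, n_processes, alpha=1.0, seed=0):
    """Draw synthetic expression data from an LPD-style generative model (sketch)."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-gene, per-process Gaussian parameters (illustrative priors).
    mu = rng.normal(0.0, 2.0, size=(n_genes, n_processes))
    sigma = rng.uniform(0.5, 1.5, size=(n_genes, n_processes))
    # Each sample has its own mixture weights over the K processes.
    theta = rng.dirichlet(np.full(n_processes, alpha), size=n_samples)
    # z[a, g] is the process assigned to the (sample a, gene g) pair.
    z = np.empty((n_samples, n_genes), dtype=int)
    for a in range(n_samples):
        z[a] = rng.choice(n_processes, size=n_genes, p=theta[a])
    # Expression level is drawn from the Gaussian of the assigned process.
    genes = np.arange(n_genes)
    expr = rng.normal(mu[genes, z], sigma[genes, z])
    return expr, z, theta

expr, z, theta = sample_lpd(n_samples=5, n_genes=20, n_processes=3)
```

Here `z` plays the role of the cluster label for each gene-sample pair; inference in LPD goes in the opposite direction, estimating these assignments from observed expression data.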
5. Weakness of LPD [Ying et al. 08]
• K (the number of processes) must be given as an input.
• LPD is inefficient when K is large.
In many cases, we do not know the optimal K in advance.
10. Experiment
http://www.gems-system.org/
Dataset name    Samples  Genes   Diagnostic task
11_Tumors       174      12,534  11 various human tumor types
14_Tumors       308      15,010  14 various human tumor types and 12 normal tissue types
9_Tumors        60       5,727   9 various human tumor types
Brain_Tumor1    90       5,921   5 human brain tumor types
Brain_Tumor2    50       10,368  4 malignant glioma types
Leukemia1       72       5,328   AML, ALL B-cell, and ALL T-cell
Leukemia2       72       11,226  AML, ALL, and mixed-lineage leukemia (MLL)
Lung_Cancer     203      12,601  4 lung cancer types and normal tissues
SRBCT           83       2,309   Small, round blue cell tumors (SRBCT) of childhood
Prostate_Tumor  102      10,510  Prostate tumor and normal tissues
DLBCL           77       5,470   DLBCL and follicular lymphomas
11. Evaluation
• Compare iLPD with LPD [Ying et al. 08].
• Train iLPD on a randomly selected 90% of the data.
• Evaluate the posterior density on the remaining 10% (test data) and calculate the geometric mean.
• Average over 25 runs.
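The evaluation protocol above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the split is taken over samples (rows), and an independent per-gene Gaussian (`fit_gaussian`, `log_density`) stands in for iLPD/LPD, which are not implemented here. All function names are hypothetical. The geometric mean of the test densities is computed in the log domain, since exp(mean of log densities) equals the geometric mean and raw densities underflow easily with thousands of genes.

```python
import numpy as np

def fit_gaussian(train):
    """Stand-in density model (NOT iLPD/LPD): one independent Gaussian per gene."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-6  # small offset avoids zero variance
    return mu, sigma

def log_density(model, x):
    """Log density of each test sample (row of x) under the stand-in model."""
    mu, sigma = model
    ll = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2
    return ll.sum(axis=1)

def evaluate(data, fit, log_density, n_runs=25, test_frac=0.1, seed=0):
    """Average log geometric-mean test density over repeated random 90/10 splits."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    n_test = max(1, int(round(test_frac * n)))
    scores = []
    for _ in range(n_runs):
        perm = rng.permutation(n)
        test, train = data[perm[:n_test]], data[perm[n_test:]]
        model = fit(train)
        # Mean of log densities = log of the geometric mean of densities.
        scores.append(np.mean(log_density(model, test)))
    return float(np.mean(scores))

# Usage with synthetic data (50 samples, 4 genes).
data = np.random.default_rng(1).normal(size=(50, 4))
score = evaluate(data, fit_gaussian, log_density)
```

To compare two methods in this setup, one would pass each method's own `fit` and `log_density` pair with the same `seed`, so that both see identical splits.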
12. Results
• iLPD is more efficient than LPD when K is large.
• There is a dataset that is not analyzed well.
  – LPD-type methods may not be a panacea.
  – Cf. BMC Bioinformatics 2010, 11:552: a nonparametric Bayesian method based on Indian Buffet Processes