1. The document discusses accelerating collapsed variational Bayesian inference (CVB) for latent Dirichlet allocation (LDA) using Nvidia CUDA compatible GPU devices.
2. It describes parallelizing CVB for LDA by assigning different topics to different GPU threads, achieving near-linear speedup over a single-threaded CPU implementation.
3. Experiments on text and image datasets demonstrate that the GPU implementation provides faster inference than the CPU version, though data transfer latency and device memory limits remain challenges for large-scale problems.
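The computation being parallelized is the per-token CVB update, in which the K topic-specific terms are mutually independent, which is why the slides can assign one GPU thread per topic. Below is a host-side sketch of the second-order CVB update (collapsed variational Bayes with a Gaussian correction, as in Teh et al.'s formulation); the function name and argument layout are illustrative, and symmetric priors α, β are assumed.

```python
import numpy as np

def cvb_update_token(alpha, beta, W,
                     E_njk, V_njk,   # doc-topic counts: mean/variance, shape (K,)
                     E_nkw, V_nkw,   # topic-word counts for this word: mean/variance, shape (K,)
                     E_nk,  V_nk):   # topic totals: mean/variance, shape (K,)
    """One CVB update of gamma_jwk for a single (document j, word w) token.

    Each of the K terms below is independent of the others, so on a GPU
    one thread can compute one topic's term, as the slides propose.
    """
    a = alpha + E_njk            # document-topic factor
    b = beta + E_nkw             # topic-word factor
    c = W * beta + E_nk          # per-topic normalizer (W = vocabulary size)
    log_gamma = (np.log(a) + np.log(b) - np.log(c)
                 - V_njk / (2.0 * a ** 2)     # second-order (variance) corrections
                 - V_nkw / (2.0 * b ** 2)
                 + V_nk  / (2.0 * c ** 2))
    log_gamma -= log_gamma.max()              # stabilize before exponentiating
    gamma = np.exp(log_gamma)
    return gamma / gamma.sum()                # normalize over the K topics
```

After each token's γ is recomputed, the count statistics E[·] and Var[·] are updated by removing the old γ contribution and adding the new one.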
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA compatible devices
1. Title slide: Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices. Tomonari MASADA (正田 備也), Nagasaki University, [email_address]
5. Each topic is a word multinomial drawn from a symmetric Dirichlet prior Dir(β). Tomonari MASADA (IEA-AIE 2009) [Figure: three topic-word multinomials; for each topic t1, t2, t3, probabilities φ_k1, φ_k2, φ_k3, φ_k4 over vocabulary words v1–v4.]
7. Each document j has topic proportions θ_j1, θ_j2, θ_j3; each word token is generated by first choosing a topic from θ_j and then drawing the word from that topic's multinomial. [Figure: a document's word sequence v3 v1 v3 v2 v2 linked to the three topic-word multinomial tables φ_kw over v1–v4.]
15. Statistics maintained by CVB and their sizes (j: doc id, w: word id, k: topic id): E[n_jk] and Var[n_jk] of size O(JK); E[n_kw] and Var[n_kw] of size O(KW); E[n_k] and Var[n_k] of size O(K); γ_jwk of size O(MK).
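These sizes determine whether a problem fits in device memory. A small sketch of the footprint estimate follows; the corpus dimensions in the test are illustrative, not figures from the paper.

```python
def cvb_memory_bytes(J, W, K, M, dtype_bytes=4):
    """Estimate memory for the CVB statistics listed above.

    J: number of documents, W: vocabulary size, K: number of topics,
    M: number of distinct (document, word) pairs. Assumes single-precision
    floats (4 bytes), the usual choice on CUDA devices of that era.
    """
    doc_topic  = 2 * J * K * dtype_bytes   # E[n_jk], Var[n_jk]: O(JK)
    topic_word = 2 * K * W * dtype_bytes   # E[n_kw], Var[n_kw]: O(KW)
    topic      = 2 * K * dtype_bytes       # E[n_k],  Var[n_k]:  O(K)
    gamma      = M * K * dtype_bytes       # gamma_jwk:          O(MK)
    return doc_topic + topic_word + topic + gamma
```

The γ_jwk array dominates for realistic corpora, which is why device memory limits are cited as a challenge for large-scale problems.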
26. Data transfer latency between host memory and device memory: transfer one large block instead of many smaller ones! [Figure: CUDA memory hierarchy, showing host memory; device memory; a grid of thread blocks, each with shared memory; and threads, each with registers.]
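The batching advice on this slide can be sketched host-side: pack several arrays into one contiguous buffer so the host-to-device copy is a single large transfer rather than many small ones. This is an illustrative Python sketch (function names are hypothetical); a real CUDA program would issue one cudaMemcpy on the packed buffer.

```python
import numpy as np

def pack_for_transfer(arrays):
    """Pack arrays into one contiguous buffer for a single bulk transfer.

    Returns the buffer plus (offset, shape) records needed to recover
    each array on the device side.
    """
    flat = [np.ascontiguousarray(a).ravel() for a in arrays]
    buf = np.concatenate(flat)               # one contiguous block
    offsets, pos = [], 0
    for a in arrays:
        offsets.append((pos, a.shape))
        pos += a.size
    return buf, offsets

def unpack(buf, offsets):
    """Recover the original arrays from the packed buffer (as views)."""
    return [buf[o:o + int(np.prod(s))].reshape(s) for o, s in offsets]
```

One large copy amortizes the per-transfer latency that dominates when many small blocks are moved individually.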
27. Parameters of the approximated posterior (j: doc id, w: word id, k: topic id): E[n_jk] and Var[n_jk] of size O(JK); E[n_kw] and Var[n_kw] of size O(KW); E[n_k] and Var[n_k] of size O(K); and the token-level responsibilities γ_jwk.