HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu

Complex Network Clustering Using GPU-based
Parallel Non-negative Matrix Factorization

Xidian University
Huming Zhu, Maoguo Gong, Baolin Huang

zhuhum@mail.xidian.edu.cn
2013.11

OpenCL Course
• ID: 0222277, 0242277
• OpenCL Programming, Practice
• 2011, 2012, 2013

Contents

1 Complex Network Clustering of NMF
2 Parallel Bayesian NMF on GPU
3 Sparse BNMF on GPU
4 Experiment
5 Conclusion

Complex Network Clustering

* All pictures are from the Internet

12/7/13

Xidian University


Complex Network Clustering
Network clustering aims to divide a network into several communities such that the
number of edges linking nodes of the same community is higher than the number of
edges joining nodes that belong to different communities.

•  Network clustering is essential for understanding how a network is organized and
how it functions.
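
The community criterion above can be checked by counting edges. A minimal sketch, assuming an undirected edge list and a per-node community label; the helper name `edge_counts` and the toy graph (two triangles joined by one edge) are illustrative and not from the slides:

```python
def edge_counts(edges, community):
    """Return (intra, inter): edges inside communities vs. edges between them."""
    intra = sum(1 for u, v in edges if community[u] == community[v])
    return intra, len(edges) - intra

# Two triangles (nodes 0-2 and 3-5) joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = [0, 0, 0, 1, 1, 1]
print(edge_counts(edges, community))  # (6, 1)
```

A good clustering keeps the first number large relative to the second.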


Non-negative Matrix Factorization (NMF)
•  The NMF problem is defined as searching for an approximation of a non-negative
matrix A, with respect to some metric (e.g., a norm), by factoring A into the product
W × H of two reduced non-negative matrices W and H.
•  NMF has been applied in many areas, such as image processing.
•  It offers strong interpretability and has a close relationship to clustering methods.
•  It needs a lot of computation power.
[1] D. D. Lee, H. S. Seung: Learning the parts of objects by non-negative matrix factorization. Nature 401, pp. 788–791 (1999).
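
To make the factorization A ≈ W × H concrete, here is a minimal sketch of plain NMF using the standard multiplicative updates for the Frobenius norm. It is not the Bayesian variant presented in this talk; the function names, the random initialization, and the small constant `eps` are assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Factor non-negative A (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def frob_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

# Toy block matrix with two obvious groups of rows.
A = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
W, H = nmf(A, k=2)
print(frob_error(A, W, H))
```

For clustering, each node is typically assigned to the column of W in which it has the largest value.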


Bayesian NMF
Input: Nonnegative data (observation) matrix A, fixed hyperparameters a, b.
Output: Nonnegative matrices W and H.
Step 1: Initialize W and H to nonnegative values.

Step 5: If converged, stop; otherwise, go to Step 2.


Parallel Bayesian NMF

• P-BNMF
• Sparse-BNMF


P-BNMF kernels

• Matrix multiplication
• Matrix square sum
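
For the first of these kernels, a GPU matrix multiplication is commonly organized in tiles so that each work-group handles one block of the output. The sketch below mirrors that blocking on the CPU. The tile size of 16 follows the {16 16 1} work-group size reported for mat_mult_AB in the profiling tables; the function name and the loop structure are assumptions and not the actual kernel.

```python
def tiled_matmul(A, B, tile=16):
    """Compute C = A * B block by block, one output tile at a time."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C (one work-group row)
        for j0 in range(0, m, tile):      # tile column of C
            for k0 in range(0, k, tile):  # walk over tiles of the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] += acc
    return C

print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))
# [[19.0, 22.0], [43.0, 50.0]]
```

On the GPU, the two innermost tile loads would go through local memory so that each value of A and B is fetched from global memory once per tile.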


Matrix multiplication

•  Update matrix: W*H
•  Kernel: mat_mult_AB


Sum of squares of a matrix
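
A reference sketch of the reduction, assuming that the kernels named mat_squ_sum_row and mat_squ_sum_col in the profiling tables compute one sum of squares per row and per column respectively (that reading of the names is an assumption):

```python
def squ_sum_row(M):
    """Sum of squared entries for each row of M."""
    return [sum(x * x for x in row) for row in M]

def squ_sum_col(M):
    """Sum of squared entries for each column of M."""
    return [sum(x * x for x in col) for col in zip(*M)]

M = [[1, 2], [3, 4]]
print(squ_sum_row(M))  # [5, 25]
print(squ_sum_col(M))  # [10, 20]
```

On a GPU each of these sums is a parallel reduction, which is why the two kernels use one-dimensional work-groups along the reduced axis.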



Sparse-BNMF
Problem

• The GPU has only 1 GB of memory, which limits the network scale that P-BNMF can handle.
Solution

• Store the matrix in a sparse storage format (CSR); this gives Sparse-BNMF.
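
A rough back-of-the-envelope check of the memory argument, using the largest LFR network from the experiments (50000 vertices, 748337 edges). The 4-byte value and index sizes and the assumption that each undirected edge is stored in both directions are mine; only the adjacency matrix is counted, not W and H.

```python
BYTES_PER_VALUE = 4  # assumption: 32-bit floats and 32-bit indices

def dense_bytes(n):
    """Memory for a dense n x n matrix."""
    return n * n * BYTES_PER_VALUE

def csr_bytes(n, nnz):
    """Memory for CSR: nnz values, nnz column indices, n + 1 row offsets."""
    return (2 * nnz + n + 1) * BYTES_PER_VALUE

n, edges = 50000, 748337
nnz = 2 * edges  # assumption: each undirected edge appears as (i, j) and (j, i)
print(dense_bytes(n))     # 10000000000 bytes, about 10 GB
print(csr_bytes(n, nnz))  # 12173396 bytes, about 12 MB
```

Under these assumptions the dense matrix alone exceeds the 1 GB device memory by an order of magnitude, while the CSR form fits easily.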


Sparse-BNMF

CSR column: Aj_column, Av_column, Ap_column

CSR: Aj, Av, Ap
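
A small sketch of how the three CSR arrays can be built from a dense matrix. Following the pseudo-code of the kernel, Ap holds the row offsets, Aj the column indices, and Av the nonzero values. Treating the column variant as the CSR arrays of the transposed matrix is an assumption, and both function names are hypothetical.

```python
def to_csr(dense):
    """Return (Ap, Aj, Av) for a dense matrix given as a list of rows."""
    Ap, Aj, Av = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                Aj.append(j)   # column index of this nonzero
                Av.append(v)   # value of this nonzero
        Ap.append(len(Aj))     # offset where the next row starts
    return Ap, Aj, Av

def to_csr_column(dense):
    """Column-oriented arrays, built here as the CSR form of the transpose."""
    return to_csr([list(col) for col in zip(*dense)])

A = [[0, 1, 0],
     [0, 0, 2],
     [3, 0, 0]]
print(to_csr(A))         # ([0, 1, 2, 3], [1, 2, 0], [1, 2, 3])
print(to_csr_column(A))  # ([0, 1, 2, 3], [2, 0, 1], [3, 1, 2])
```

Row `i` owns the entries `Ap[i]` up to, but not including, `Ap[i + 1]`, which is exactly how the kernel locates its nonzeros.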


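Pseudo-code for the A_WH_csr kernel

// For every nonzero A(row, col) stored in CSR, compute 1 / (W*H)(row, col).
// The kernel signature, the local buffers and the index definitions are
// assumptions; the slide shows only the body. Work-group size is 16 x 1.
__kernel void A_WH_csr(__global const uint *Ap, __global const uint *Aj,
                       __global const float *W, __global const float *H,
                       __global float *Av_result,
                       uint row_num, uint widthA, uint widthB)
{
    __local float As[16];       // 16 consecutive entries of one row of W
    __local float Bs[16 * 16];  // 16 x 16 tile of H, one column per work-item
    uint localid   = get_local_id(0);
    uint groupidx  = get_group_id(0);
    uint groupidy  = get_group_id(1);
    uint globalidy = get_global_id(1);

    uint row = globalidy;
    if (row < row_num)
    {
        uint rowStart = Ap[row];      // start position in Aj of this row
        uint rowEnd   = Ap[row + 1];  // end position (exclusive) of this row
        // position of this PE (processing element); the size of a group is 16*1
        uint index = rowStart + groupidx * 16 + localid;
        bool groupActive = rowStart + groupidx * 16 < rowEnd;  // any nonzero in this group?
        bool peActive    = index < rowEnd;                     // does this PE map to a nonzero?
        int col = peActive ? Aj[index] : 0;  // column index of this PE's nonzero
        int aStart = widthA * groupidy;      // first element of this row of W
        int aEnd   = aStart + widthA - 1;
        int aStep  = 16;
        float Csub = 0.000001f;              // small offset avoids division by zero
        int bStart = col;
        int bStep  = 16 * widthB;
        for (int a = aStart, b = bStart; a < aEnd; a += aStep, b += bStep)
        {
            if (groupActive)
            {
                As[localid] = W[a + localid];
                barrier(CLK_LOCAL_MEM_FENCE);
            }
            if (peActive)
            {
                for (int k = 0; k < 16; k++)
                    Bs[k * 16 + localid] = H[b + k * widthB];
                for (int k = 0; k < 16; k++)
                    Csub += Bs[k * 16 + localid] * As[k];
            }
            if (groupActive)
                barrier(CLK_LOCAL_MEM_FENCE);  // As is overwritten in the next iteration
        }
        if (peActive)
            Av_result[index] = 1.0f / Csub;
    }
}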
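
As a cross-check for the kernel, a serial reference of the same computation can be written in a few lines. This is a sketch under the assumption that W is stored as a list of rows and H as a list of K rows; the function name is hypothetical, and the offset 1e-6 matches the initial value of Csub in the pseudo-code.

```python
def a_wh_csr_reference(Ap, Aj, W, H, eps=1e-6):
    """For each CSR nonzero (row, col), return 1 / ((W*H)[row][col] + eps)."""
    result = [0.0] * len(Aj)
    for row in range(len(Ap) - 1):
        for index in range(Ap[row], Ap[row + 1]):
            col = Aj[index]
            csub = eps + sum(W[row][k] * H[k][col] for k in range(len(H)))
            result[index] = 1.0 / csub
    return result

# 2 x 2 example with nonzeros at (0, 1) and (1, 0); H is the identity, so W*H = W.
Ap, Aj = [0, 1, 2], [1, 0]
W = [[1.0, 2.0], [3.0, 4.0]]
H = [[1.0, 0.0], [0.0, 1.0]]
print(a_wh_csr_reference(Ap, Aj, W, H))
```

Only the positions where A is nonzero are evaluated, so the dense product W*H never has to be formed.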


Machine

Host
  Product Name          HP xw9400 workstation
  OS                    Windows 7 x64 Edition
  CPU                   4× Dual-Core AMD Opteron 2220, 2.80 GHz
  Memory                32 GB

Device
  Product Name          AMD Radeon HD 7770
  Engine Speed          1000 MHz
  Processing Elements   640
  Memory                1 GB GDDR5
  Memory Bandwidth      72 GB/s
  PCI                   PCI Express® 3.0 x16

• AMD Accelerated Parallel Processing (APP) SDK v2.7, OpenCL 1.2
• Microsoft Visual Studio 2010


Evaluation: Modularity (Q) [1]

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)$$

A higher Q indicates a better network structure.

Synthetic networks

Data         Vertices   Edges     Q
Benchmark    128        1024      0.450
LFR          500        5135      0.813
LFR          1000       9582      0.904
LFR          5000       38007     0.908
LFR          10000      148470    0.860
LFR          50000      748337    0.900

Real-world networks

Data         Vertices   Edges     Q
Facebook     324        4436      0.620
Email        1133       5451      0.531
Netscience   1461       2742      0.905
Power        4941       6594      0.599
Scientists   6650       59870     0.647
Hep          7610       15751     0.772

[1]. M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks,
Phys. Rev. E 69 (2) (2004) 026113.
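
The modularity Q defined on this slide can be evaluated directly from an adjacency matrix. In the sketch below, A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, and δ(C_i, C_j) is 1 when nodes i and j share a community. The function name and the toy graph (two triangles joined by one edge) are illustrative assumptions.

```python
def modularity(adj, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    n = len(adj)
    k = [sum(row) for row in adj]  # node degrees
    two_m = sum(k)                 # 2m: every edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += adj[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Two triangles (nodes 0-2 and 3-5) joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

print(modularity(adj, [0, 0, 0, 1, 1, 1]))  # 5/14, about 0.357
print(modularity(adj, [0, 0, 0, 0, 0, 0]))  # 0.0 for a single community
```

Splitting the graph into its two triangles scores higher than putting everything into one community, which is the behaviour the measure is meant to reward.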


Network demo

Netscience (part)
• The netscience network is a co-authorship network of scientists working on
network theory and experiment.

Facebook

Speedup

Data         Vertex   K     BNMF(s)     P-BNMF(s)   Sparse-BNMF(s)   P-Ratio   Sparse-Ratio
Benchmark    128      64    4.165       0.166       0.226            4.37      3.1
LFR          500      128   109.9       0.823       1.096            67.63     51.35
LFR          1000     128   712.5       2.98        2.798            187.58    181.6
LFR          5000     128   31031.5     109.96      71.167           279.39    417.21
LFR          10000    128   186321.7    615.09      334.23           302.92    556.2
LFR          50000    128   *           *           8250.28          *         *
Facebook     324      128   46.25       1.328       1.656            34.82     27.93
Email        1133     128   774.4       3.901       3.042            162.24    189.33
Netscience   1461     128   1253.2      6.725       4.628            166.11    215.81
Power        4941     128   26202.4     108.30      61.787           239.29    404.38
Hep          7610     128   76827.2     271.28      152.66           281.75    491.85
Scientists   6650     128   63254.5     208.2       125.55           303.81    503.84

K is the number of clusters; BNMF(s) is the serial running time; P-Ratio is the speedup
of P-BNMF over BNMF; Sparse-Ratio is the speedup of Sparse-BNMF over BNMF.
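
The ratio columns are speedups, that is, the serial running time divided by the parallel running time. A minimal sketch of that arithmetic, using the LFR 10000 row and the Facebook row as inputs; the helper name `speedup` is hypothetical:

```python
def speedup(serial_time, parallel_time):
    """Speedup of a parallel run relative to the serial baseline."""
    return serial_time / parallel_time

# LFR network with 10000 vertices: BNMF(s) 186321.7, P-BNMF(s) 615.09
print(round(speedup(186321.7, 615.09), 2))  # 302.92
# Facebook network: BNMF(s) 46.25, Sparse-BNMF(s) 1.656
print(round(speedup(46.25, 1.656), 2))      # 27.93
```

Both values agree with the P-Ratio and Sparse-Ratio entries listed for those rows.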

Speedup

• Network: Netscience.
• Number of clusters K from 64 to 256.
• Sparse-BNMF achieves the better speedup.

• Using CodeXL to analyze OpenCL kernels on AMD GPUs


Kernel information provided by CodeXL

Table 1. P-BNMF kernels

Method             GlobalWorkSize    WorkGroupSize   Time
Update_H           {1472 128 1}      {16 16 1}       6.12726
mat_mult_AB        {1472 1472 1}     {16 16 1}       10.73615
mat_dot_div        {1472 1472 1}     {16 16 1}       3.70267
mat_mult_AtB       {1472 128 1}      {16 16 1}       9.72355
mat_dot_mult       {1472 128 1}      {16 16 1}       0.2917
mat_squ_sum_row    {1472 128 1}      {64 1 1}        0.5483
mat_squ_sum_col    {128 1472 1}      {1 64 1}        7.27985
update_invbeta     {128 1 1}         {4 1 1}         0.03763
Update_W           {128 1472 1}      {16 16 1}       6.25437
mat_mult_AB        {1472 1472 1}     {16 16 1}       10.75037
mat_dot_div        {1472 1472 1}     {16 16 1}       3.64148
mat_mult_ABt       {128 1472 1}      {16 16 1}       9.04222
mat_dot_mult       {128 1472 1}      {16 16 1}       0.27763

Table 2. Sparse-BNMF kernels

Method             GlobalWorkSize    WorkGroupSize   Time
Update_H           {1472 128 1}      {16 16 1}       6.11407
A_WH_csr_col       {1472 1472 1}     {1 16 1}        7.76119
mat_mult_A_s_col   {1461 2048 1}     {1 16 1}        5.36341
mat_dot_mult       {1472 128 1}      {16 16 1}       0.30133
mat_squ_sum_row    {1472 128 1}      {64 1 1}        0.55304
mat_squ_sum_col    {128 1472 1}      {1 64 1}        6.99467
update_invbeta     {128 1 1}         {4 1 1}         0.03748
Update_W           {128 1472 1}      {16 16 1}       6.17718
A_WH_csr           {1472 1472 1}     {16 1 1}        6.29185
mat_mult_s_Bt      {2048 1461 1}     {16 1 1}        5.37615
mat_dot_mult       {128 1472 1}      {16 16 1}       0.2843

• In Table 1, the highlighted kernels are W*H, dot multiply, and AtB.
• In Table 2, the sparse kernels are A_WH_csr_col and mat_mult_A_s_col.
• CSR is better.

P-BNMF vs. Sparse-BNMF

             P-BNMF            Sparse-BNMF
Size         small (<10000)    big
Speedup      low               high

• The Sparse-BNMF algorithm effectively solves the memory limit problem,
which enables it to handle larger-scale networks.


Our work

• Presented P-BNMF and Sparse-BNMF:
  •  P-BNMF;
  •  Sparse-BNMF, using CSR storage;
• Speedup.

Future

• Portability.