Trelles_QnormBOSC2009

Qnorm: A library of parallel methods for gene-expression Q-normalization
José Manuel Mateos-Duran, Pjotr Prins, Andrés Rodríguez and Oswaldo Trelles
The Bioinformatics Open Source Conference (BOSC)

European Concerted Research Action (COST)
Bioinformatics new generation open source (Bingos)
Improving open source software for high performance computing in Biology
Problem: New high-throughput (HT) technologies in several areas of life sciences produce enormous amounts of data, creating a bottleneck in our ability to process and analyse the data.
Solution: Increase communication between the Bioinformatics, HPC and OSS communities for adapting/developing capable software tools.

Quantile normalization
1) Load data to memory
2) Order each column of R producing a set of indexes I[G][E]=p (where p is the original position of the value in the column)
3) Obtain A[G], the average value for each row
4) Assign the average value to all entries: O[g][e]=A[g], g=1 to G; e=1 to E
5) Sort each column O[g][E] by the index I[g][E] (reproduce the original order)
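The five steps above can be sketched in plain C as follows. This is a minimal in-memory illustration, not the library's code: the function name `qnorm`, the row-major layout `R[g * E + e]`, and the tie-breaking rule (equal values keep their original order) are assumptions made for the example. Steps 4 and 5 are fused into a single scatter through the index, which matches the `dataOut[dIndex[j]] = AvG[j].Av` idiom used later in the slides.

```c
#include <stdlib.h>

/* Column currently being ranked; read by the qsort comparator.
   A file-scope pointer keeps the sketch short but is not thread-safe. */
static const double *rank_col;

/* Orders row indexes by the value they point to in rank_col. */
static int cmp_by_value(const void *a, const void *b) {
    int ia = *(const int *)a, ib = *(const int *)b;
    if (rank_col[ia] < rank_col[ib]) return -1;
    if (rank_col[ia] > rank_col[ib]) return 1;
    return (ia > ib) - (ia < ib);   /* ties: keep original order (assumption) */
}

/* R and O are G x E matrices stored row-major (assumes G > 0 and E > 0).
   Returns 0 on success, -1 if memory allocation fails. */
int qnorm(const double *R, double *O, int G, int E) {
    int *I = malloc((size_t)G * (size_t)E * sizeof *I);  /* index I[g][e] */
    double *A = calloc((size_t)G, sizeof *A);            /* row averages  */
    double *col = malloc((size_t)G * sizeof *col);
    int *idx = malloc((size_t)G * sizeof *idx);
    if (!I || !A || !col || !idx) {
        free(I); free(A); free(col); free(idx);
        return -1;
    }

    for (int e = 0; e < E; e++) {
        /* Step 1: load one column. */
        for (int g = 0; g < G; g++) { col[g] = R[g * E + e]; idx[g] = g; }
        /* Step 2: sort the column, keeping original positions. */
        rank_col = col;
        qsort(idx, (size_t)G, sizeof *idx, cmp_by_value);
        /* Step 3 (partial): accumulate the g-th smallest value into A[g]. */
        for (int g = 0; g < G; g++) {
            I[g * E + e] = idx[g];
            A[g] += col[idx[g]];
        }
    }
    for (int g = 0; g < G; g++) A[g] /= E;   /* Step 3: finish the averages */

    /* Steps 4 and 5: write each average back at the original position. */
    for (int e = 0; e < E; e++)
        for (int g = 0; g < G; g++)
            O[I[g * E + e] * E + e] = A[g];

    free(I); free(A); free(col); free(idx);
    return 0;
}
```

Calling `qnorm(R, O, G, E)` on a small matrix leaves `R` untouched and fills `O`; after the call every column of `O` contains the same set of values, arranged according to the ranks of the corresponding column of `R`.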

Parallel prototype: code reorganization
{
  nE = LoadProject(fname, fList);
  for (i=0; i<nE; i++) {          // for each Exp [STEP 1]
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // Manage the Index in memory or disk
  }
  for (i=0; i<nG; i++)            // Global average
    AvG[i].Av /= AvG[i].num;
  // produce the ORDERED output file [STEP 2]
  // Prepare Out file & one column 'dataOut' array
  for (i=0; i<nE; i++) {
    // Get the column index (from memory or disk)
    for (j=0; j<nG; j++) {        // prepare OUT array
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // File positioning and writing the vector
  }
}

Shared memory version
{
  nE = LoadProject(fname, fList);
  for (i=0; i<nE; i++) {          // for each Exp
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // Manage the Index in memory or disk
  }
  for (i=0; i<nG; i++)            // Global average
    AvG[i].Av /= AvG[i].num;
  // produce the ORDERED output file [STEP 2]
  // Prepare Output file and one column 'dataOut' array
  for (i=0; i<nE; i++) {
    // Get the column index (from memory or from disk)
    for (j=0; j<nG; j++) {        // complete output vector
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // File positioning and writing the vector
  }
}
OpenMP annotations shown on the slide:
  #pragma omp parallel shared(From, To, Range)
  // Open general parallel section
  #pragma omp parallel shared(From, To, Range)
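As an illustration of how the first loop of the shared memory version might be parallelized, the sketch below distributes experiments (columns) across OpenMP threads. It is written under stated assumptions and is not the authors' implementation: the name `rank_averages`, the row-major layout, and the per-thread `partial` buffer merged inside a critical section are choices made for this example. It computes only the rank averages (the role of `PartialRowAccum` plus the global average) and leaves out index management. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially.

```c
#include <stdlib.h>

/* Ascending comparison of two doubles for qsort.
   It touches no shared state, so threads can sort concurrently. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* A[g] = mean over all experiments of the g-th smallest value.
   R is a G x E matrix stored row-major (assumes G > 0 and E > 0).
   Returns 0 on success, -1 if any thread failed to allocate memory. */
int rank_averages(const double *R, double *A, int G, int E) {
    int failed = 0;
    for (int g = 0; g < G; g++) A[g] = 0.0;

    #pragma omp parallel shared(R, A, failed)
    {
        /* Private work buffers, one pair per thread. */
        double *col = malloc((size_t)G * sizeof *col);
        double *partial = calloc((size_t)G, sizeof *partial);
        int ok = (col != NULL && partial != NULL);
        if (!ok) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Each thread sorts and accumulates its share of the columns. */
        #pragma omp for
        for (int e = 0; e < E; e++) {
            if (!ok) continue;
            for (int g = 0; g < G; g++) col[g] = R[g * E + e];
            qsort(col, (size_t)G, sizeof *col, cmp_double);
            for (int g = 0; g < G; g++) partial[g] += col[g];
        }

        /* Merge the per-thread sums into the shared vector, one thread at a time. */
        if (ok) {
            #pragma omp critical
            for (int g = 0; g < G; g++) A[g] += partial[g];
        }
        free(col);
        free(partial);
    }

    if (failed) return -1;
    for (int g = 0; g < G; g++) A[g] /= E;   /* global average */
    return 0;
}
```

Accumulating into a private buffer and merging once per thread keeps synchronization out of the inner loop; the price is one extra vector of G doubles per thread.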

Message Passing version
Master:
  Get Parameters, Initialize
  CalculateBlocks(nP, IniBlocks)
  Broadcast(IniBlocks)
  while (ThereIsBlocks) {
    Receive(ResultBlock, who)
    AverageBlock(ResultBlock)
    if (!SendAllBlocks) {
      CalculateNextBlock(NextBlock)
      Send(who, NextBlock)
    }
  }
  Broadcast(ENDsignal)
  ReportResults
Slave(s):
  Start with params
  Receive(Block)
  while (!ENDsignal) {
    for each experiment in block {
      LoadExperiment(Exp)
      SortExperiment(Exp)
      AcumulateAverage(Exp);
    }
    AverageBlock(ResultBlock)
    Send(ResultBlock)
    Receive(Block);
  }
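The master's block bookkeeping can be illustrated with a small helper that hands out contiguous ranges of experiments on request. This is only a sketch of the idea behind CalculateNextBlock: the `Block` and `Scheduler` types, the function `next_block`, and the fixed block size are hypothetical and not taken from the slides, and the actual message passing (send, receive, broadcast) is left out.

```c
/* A contiguous range of experiments assigned to one worker (hypothetical type). */
typedef struct {
    int first;   /* index of the first experiment in the block */
    int count;   /* number of experiments in the block */
} Block;

/* Master-side state: how many experiments exist and which one is next. */
typedef struct {
    int nE;          /* total number of experiments */
    int blockSize;   /* experiments per block (the last block may be shorter) */
    int next;        /* first experiment not yet handed out */
} Scheduler;

/* Fills *b with the next block and returns 1, or returns 0 when no
   experiments remain (or the block size is not positive). */
int next_block(Scheduler *s, Block *b) {
    if (s->blockSize <= 0 || s->next >= s->nE) return 0;
    int remaining = s->nE - s->next;
    b->first = s->next;
    b->count = remaining < s->blockSize ? remaining : s->blockSize;
    s->next += b->count;
    return 1;
}
```

In a master loop, each time a result arrives from a worker the master would call `next_block` and, if it returns 1, send the new block to that same worker; faster workers therefore receive more blocks, which balances the load dynamically.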

GPU version
CPU:
  nE = LoadProject(fname, fList);
  for (i=0; i<nE; i++) {          // for each Exp
    LoadFile(fList, i, dataIn);
    CopyToGPU(dataIn);
    <<kernel>> QSortGPU(dataIn, dIndex)
    CopyFromGPU(dIndex);
    WriteToDisk(dIndex);
    <<kernel>> RowAccum(dataIn, AvG)
  }
  <<kernel>> GlobalAvg(AvG, nE)
  CopyFromGPU(AvG);
  // Step 2: Produce Output File
  // Using indexes and global average
GPU (NVIDIA CUDA Programming Model), kernels:
  QSortGPU(dataIn, dIndex)
  RowAccum(dataIn, AvG)
  GlobalAvg(AvG, nE)

Hardware & Data
Input: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays.
Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
Pablo: Shared Memory Cluster, up to 256 nodes (IBM JS20), 512 CPUs, 1 TB distributed memory.
  Each node: 2 CPUs IBM PowerPC 970FX single-core, 64 bits, 2 GHz, 4 GB RAM
  HD: 40 GB (local)
  Interconnection network: Myrinet
Picasso: Shared Memory Cluster, up to 64 nodes (HP Superdome), 128 CPUs, 128 GB SM (shared memory).
  Each node: 2 CPUs Intel Itanium-2 dual-core, 1.6 GHz
Almeria:
  CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2
  GPU: GeForce 9800 GX2, 600/1500 MHz, 2x1 GHz DDR3, 1 GB & 512 bits
  HD: 2 x 72 GB (RAID 0), Western Digital Raptors, 10000 RPM

Benchmarking
Panels: Distributed memory, Shared memory, GPU
2.9x total speed-up
5.5x processing speed-up

Conclusions
Background
  Application domain: bioinformatics (diverse, disperse, heterogeneous, huge data…)
  I/O and memory oriented applications
  Large collection of sequential code unable to deal with computational demands
Aims
  Characterize the application domain
  Start a library of (common) parallel procedures
Benchmarking
  Performance is strongly related to code dependencies
  Parallel models (shared, distributed, etc.) are appropriate for different code structures
  Shared memory is good but expensive
  GPU-based solutions seem to be a good alternative for local installations
  I/O-bound applications should look for performance gains in the I/O device
