Qnorm: A library of parallel methods for gene-expression Q-normalization
José Manuel Mateos-Duran; Pjotr Prins; Andrés Rodríguez and Oswaldo Trelles
The Bioinformatics Open Source Conference (BOSC)
European Concerted Research Action (COST)
Bioinformatics new generation open source (Bi ng os)
Improving open source software for high performance computing in Biology
Problem: New high-throughput (HT) technologies in several areas of the life sciences produce enormous amounts of data, creating a bottleneck in our ability to process and analyse the data.
Solution: Increase communication between the Bioinformatics, HPC and OSS communities for adapting/developing capable software tools.
Quantile normalization
1) Load data to memory
2) Order each column of R, producing a set of indexes I[G][E]=p (where p is the original position of the value in the column)
3) Obtain A[G], the average value for each row
4) Assign the average value to all entries: O[g][e]=A[g], g=1 to G; e=1 to E
5) Sort each column O[g][E] by the index I[g][E] (reproduce the original order)
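The five steps can be sketched in plain C as follows. This is a minimal in-memory illustration, not the Qnorm library code: the `Entry` type, the `qnorm` function name, the row-major layout and the tie-breaking rule (equal values keep their original order) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a value paired with its original row position. */
typedef struct {
    double value;
    int    pos;
} Entry;

static int cmp_entry(const void *a, const void *b)
{
    const Entry *x = a, *y = b;
    if (x->value < y->value) return -1;
    if (x->value > y->value) return 1;
    return x->pos - y->pos;              /* ties keep their original order */
}

/* Quantile-normalize R (G rows x E columns, row-major) into O.
 * Returns 0 on success, -1 if memory could not be allocated. */
int qnorm(const double *R, double *O, int G, int E)
{
    assert(G > 0 && E > 0);
    Entry  *col = malloc((size_t)G * sizeof *col);
    int    *I   = malloc((size_t)G * (size_t)E * sizeof *I);
    double *A   = calloc((size_t)G, sizeof *A);
    if (!col || !I || !A) { free(col); free(I); free(A); return -1; }

    for (int e = 0; e < E; e++) {
        /* Step 2: sort column e, remembering each value's original row. */
        for (int g = 0; g < G; g++) {
            col[g].value = R[(size_t)g * E + e];
            col[g].pos   = g;
        }
        qsort(col, (size_t)G, sizeof *col, cmp_entry);
        for (int g = 0; g < G; g++) {
            I[(size_t)g * E + e] = col[g].pos;
            A[g] += col[g].value;        /* Step 3: accumulate the sum per rank */
        }
    }
    for (int g = 0; g < G; g++)
        A[g] /= E;                       /* Step 3: average per rank */

    /* Steps 4 and 5: write the rank average back at the original position. */
    for (int e = 0; e < E; e++)
        for (int g = 0; g < G; g++)
            O[(size_t)I[(size_t)g * E + e] * E + e] = A[g];

    free(col); free(I); free(A);
    return 0;
}
```

Calling `qnorm(R, O, 3, 2)` on the 3 x 2 matrix with rows (5, 4), (2, 1), (3, 6) gives the sorted columns (2, 3, 5) and (1, 4, 6), the rank averages 1.5, 3.5 and 5.5, and the output rows (5.5, 3.5), (1.5, 1.5), (3.5, 5.5).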
Parallel prototype
Code reorganization
{
  nE = LoadProject(fname, fList);
  for (i = 0; i < nE; i++) {  // for each Exp [STEP 1]
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // Manage the index in memory or on disk
  }
  for (i = 0; i < nG; i++)  // Global average
    AvG[i].Av /= AvG[i].num;

  // Produce the ORDERED output file [STEP 2]
  // Prepare the output file and a one-column 'dataOut' array
  for (i = 0; i < nE; i++) {
    // Get the column index (from memory or disk)
    for (j = 0; j < nG; j++) {  // prepare OUT array
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // File positioning and writing the vector
  }
}
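The point of this reorganization is that only one experiment column has to be in memory at a time: step 1 folds each sorted column into a running accumulator, and step 2 rebuilds each output column from its saved index. The sketch below illustrates those three pieces in C. The `Average` struct mirrors the `AvG[i].Av` and `AvG[i].num` fields used on the slide, but the function signatures, and the `GlobalAverage` and `BuildOutputColumn` names, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Running sum and count per rank, mirroring AvG[i].Av and AvG[i].num. */
typedef struct {
    double Av;
    int    num;
} Average;

/* STEP 1: add one already-sorted column to the per-rank accumulators.
 * After this call the column itself can be discarded. */
void PartialRowAccum(Average *AvG, const double *sortedCol, int nG)
{
    assert(nG >= 0);
    for (int i = 0; i < nG; i++) {
        AvG[i].Av += sortedCol[i];
        AvG[i].num++;
    }
}

/* Turn the accumulated sums into averages (the "Global average" loop). */
void GlobalAverage(Average *AvG, int nG)
{
    for (int i = 0; i < nG; i++)
        if (AvG[i].num > 0)
            AvG[i].Av /= AvG[i].num;
}

/* STEP 2: rebuild one output column. dIndex[j] is the original position
 * of the value that had rank j in this experiment. */
void BuildOutputColumn(double *dataOut, const int *dIndex,
                       const Average *AvG, int nG)
{
    for (int j = 0; j < nG; j++) {
        assert(dIndex[j] >= 0 && dIndex[j] < nG);
        dataOut[dIndex[j]] = AvG[j].Av;
    }
}
```

With this split, memory use in step 1 is one data column plus the accumulator, at the cost of keeping every column index in memory or on disk until step 2.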
Shared memory version
{
  nE = LoadProject(fname, fList);
  for (i = 0; i < nE; i++) {  // for each Exp
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // Manage the index in memory or on disk
  }
  for (i = 0; i < nG; i++)  // Global average
    AvG[i].Av /= AvG[i].num;

  // Produce the ORDERED output file [STEP 2]
  // Prepare the output file and a one-column 'dataOut' array
  for (i = 0; i < nE; i++) {
    // Get the column index (from memory or from disk)
    for (j = 0; j < nG; j++) {  // complete output vector
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // File positioning and writing the vector
  }
}
OpenMP directives annotating the two loops over experiments:
#pragma omp parallel shared(From, To, Range)  // Open general parallel section
#pragma omp parallel shared(From, To, Range)
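A hedged sketch of how the shared-memory idea can be expressed with OpenMP in C. It does not reproduce the slide's `From`/`To`/`Range` partitioning; instead it swaps in `#pragma omp parallel for`, which lets the runtime split the iterations. The function name `SortAndAverage` and the column-major layout are assumptions. Without an OpenMP-enabled compiler flag the pragmas are ignored and the code runs sequentially with the same result.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* cols holds nE columns stored one after another, each of length nG.
 * Sorts every column in place and writes the per-rank average to avg[nG]. */
void SortAndAverage(double *cols, double *avg, int nG, int nE)
{
    assert(nG > 0 && nE > 0);

    /* Each iteration touches only its own column, so the columns can be
     * sorted by different threads without synchronization. */
    #pragma omp parallel for schedule(static)
    for (int e = 0; e < nE; e++)
        qsort(cols + (size_t)e * nG, (size_t)nG, sizeof(double), cmp_double);

    /* Each iteration writes only avg[g], so ranks can be split as well. */
    #pragma omp parallel for schedule(static)
    for (int g = 0; g < nG; g++) {
        double sum = 0.0;
        for (int e = 0; e < nE; e++)
            sum += cols[(size_t)e * nG + g];
        avg[g] = sum / nE;
    }
}
```

Parallelizing the averaging loop over ranks rather than over experiments avoids concurrent updates to the same accumulator entry, so no atomic operations or reductions are needed in this sketch.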
Message Passing version

Master:
  Get Parameters, Initialize
  CalculateBlocks(nP, IniBlocks)
  Broadcast(IniBlocks)
  while (ThereIsBlocks) {
    Receive(ResultBlock, who)
    AverageBlock(ResultBlock)
    if (!SendAllBlocks) {
      CalculateNextBlock(NextBlock)
      Send(who, NextBlock)
    }
  }
  Broadcast(ENDsignal)
  ReportResults

Slave(s):
  Start with params
  Receive(Block)
  while (!ENDsignal) {
    for each experiment in block {
      LoadExperiment(Exp)
      SortExperiment(Exp)
      AcumulateAverage(Exp);
    }
    AverageBlock(ResultBlock)
    Send(ResultBlock)
    Receive(Block);
  }
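Two parts of the master's loop can be illustrated without any message-passing library: handing out the next block of experiments, and merging a slave's partial result. The C sketch below borrows the names `CalculateNextBlock` and `AverageBlock` from the slide, but their signatures, the `Block` and `Partial` types and the `FinalAverage` helper are hypothetical, and the actual Send/Receive calls are deliberately left out.

```c
#include <assert.h>

/* A half-open range [from, to) of experiment indexes given to one slave. */
typedef struct {
    int from;
    int to;
} Block;

/* Per-rank partial result from a slave: a sum and the number of values in it. */
typedef struct {
    double sum;
    int    num;
} Partial;

/* Hand out the next block of at most blockSize experiments.
 * Returns 1 and fills *b while work remains, 0 once all nE are assigned. */
int CalculateNextBlock(int *next, int nE, int blockSize, Block *b)
{
    assert(blockSize > 0);
    if (*next >= nE)
        return 0;
    b->from = *next;
    b->to   = (*next + blockSize < nE) ? *next + blockSize : nE;
    *next   = b->to;
    return 1;
}

/* Fold one slave's partial sums into the master's accumulators. */
void AverageBlock(Partial *global, const Partial *result, int nG)
{
    for (int g = 0; g < nG; g++) {
        global[g].sum += result[g].sum;
        global[g].num += result[g].num;
    }
}

/* Final per-rank average once every block has been merged. */
double FinalAverage(const Partial *p)
{
    return p->num > 0 ? p->sum / p->num : 0.0;
}
```

Carrying sums and counts, rather than already-divided averages, keeps the final result correct when blocks contain different numbers of experiments, as happens with the last block whenever `nE` is not a multiple of the block size.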
GPU version

CPU:
  nE = LoadProject(fname, fList);
  for (i = 0; i < nE; i++) {  // for each Exp
    LoadFile(fList, i, dataIn);
    CopyToGPU(dataIn);
    <<kernel>> QSortGPU(dataIn, dIndex)
    CopyFromGPU(dIndex);
    WriteToDisk(dIndex);
    <<kernel>> RowAccum(dataIn, AvG)
  }
  <<kernel>> GlobalAvg(AvG, nE)
  CopyFromGPU(AvG);
  // Step 2: Produce Output File
  // Using indexes and global average

GPU (NVIDIA CUDA Programming Model), GPU kernels:
  QSortGPU(dataIn, dIndex)
  RowAccum(dataIn, AvG)
  GlobalAvg(AvG, nE)
Hardware & Data
Input: Affymetrix raw CEL files (GPL3718) / 6.5M probes x 470 arrays.
Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
Pablo: Shared Memory Cluster, up to 256 nodes / JS20-IBM, 512 CPUs, 1 TB distributed memory. Each node: 2 CPUs IBM PowerPC single-core 970FX, 64 bits, 2 GHz & 4 GB RAM. HD: 40 GB (local). Interconnection network: MERINET.
Picasso: Shared Memory Cluster, up to 64 nodes, Superdome HP, 128 CPUs, 128 GB SM. Each node: 2 CPUs Intel Itanium-2 Dual Core, 1.6 GHz.
Almeria: CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2. GPU: GeForce 9800 GX2, 600/1500 MHz, 2x1 GHz DDR3, 1 GB & 512 bits. HD: 2 x 72 GB (RAID 0), Western Digital Raptors, 10000 RPM.
Benchmarking
[Figure: benchmark plots for Distributed memory, Shared memory and GPU; annotated values: 2.9x total speed-up, 5.5x processing speed-up]
Conclusions
Background
  - Application domain: bioinformatics (diverse, disperse, heterogeneous, huge data…)
  - I/O and memory oriented applications
  - Large collection of sequential code unable to deal with computational demands
Aims
  - Characterizing the application domain
  - Starting a library of (common) parallel procedures
Benchmarking
  - Performance is strongly related to code dependencies
  - Parallel models (shared, distributed, etc.) are appropriate for different code structures
  - Shared memory is good but expensive
  - GPU-based solutions seem to be a good alternative for local installations
  - I/O-bound applications should look for performance gains in the I/O device

Trelles_QnormBOSC2009

  • 1. Qnorm: A library of parallel methods for gene-expression Q-normalization. José Manuel Mateos-Duran, Pjotr Prins, Andrés Rodríguez and Oswaldo Trelles. The Bioinformatics Open Source Conference (BOSC)
  • 2.
  • 3. Quantile normalization
       1) Load data to memory
       2) Order each column of R, producing a set of indexes I[G][E]=p (where p is the original position of the value in the column)
       3) Obtain A[G], the average value for each row
       4) Assign the average value to all entries: O[g][e]=A[g], g=1 to G; e=1 to E
       5) Sort each column O[g][E] by the index I[g][E] (reproduce the original order)
  • 4. Code reorganization (parallel prototype)
       {
         nE = LoadProject(fname, fList);
         for (i=0; i<nE; i++) {  // for each Exp [STEP 1]
           LoadFile(fList, i, dataIn);
           Qnorm1(dataIn, dIndex, fList[i].nG);
           PartialRowAccum(AvG, dataIn, nG);
           // Manage the Index in memory or disk
         }
         for (i=0; i<nG; i++)    // Global average
           AvG[i].Av /= AvG[i].num;
         // produce the ORDERED output file [STEP 2]
         // Prepare Out file & one column 'dataOut' array
         for (i=0; i<nE; i++) {
           // Get the column index (from memory or disk)
           for (j=0; j<nG; j++)  // prepare OUT array
             dataOut[dIndex[j]] = AvG[j].Av;
           // File positioning and writing the vector
         }
       }
  • 5. Shared memory version
       {
         nE = LoadProject(fname, fList);
         for (i=0; i<nE; i++) {  // for each Exp
           LoadFile(fList, i, dataIn);
           Qnorm1(dataIn, dIndex, fList[i].nG);
           PartialRowAccum(AvG, dataIn, nG);
           // Manage the Index in memory or disk
         }
         for (i=0; i<nG; i++)    // Global average
           AvG[i].Av /= AvG[i].num;
         // produce the ORDERED output file [STEP 2]
         // Prepare Output file and one column 'dataOut' array
         for (i=0; i<nE; i++) {
           // Get the column index (from memory or from disk)
           for (j=0; j<nG; j++)  // complete output vector
             dataOut[dIndex[j]] = AvG[j].Av;
           // File positioning and writing the vector
         }
       }
       OpenMP annotations on the slide:
         #pragma omp parallel shared(From, To, Range)  // Open general parallel section
         #pragma omp parallel shared(From, To, Range)
  • 6. Message Passing version
       Master:
         Get Parameters, Initialize
         CalculateBlocks(nP, IniBlocks)
         Broadcast(IniBlocks)
         while (ThereIsBlocks) {
           Receive(ResultBlock, who)
           AverageBlock(ResultBlock)
           if (!SendAllBlocks) {
             CalculateNextBlock(NextBlock)
             Send(who, NextBlock)
           }
         }
         Broadcast(ENDsignal)
         ReportResults
       Slave(s):
         Start with params
         Receive(Block)
         while (!ENDsignal) {
           for each experiment in block {
             LoadExperiment(Exp)
             SortExperiment(Exp)
             AcumulateAverage(Exp);
           }
           AverageBlock(ResultBlock)
           Send(ResultBlock)
           Receive(Block);
         }
  • 7. GPU version
       CPU:
         nE = LoadProject(fname, fList);
         for (i=0; i<nE; i++) {  // for each Exp
           LoadFile(fList, i, dataIn);
           CopyToGPU(dataIn);
           <<kernel>> QSortGPU(dataIn, dIndex)
           CopyFromGPU(dIndex);
           WriteToDisk(dIndex);
           <<kernel>> RowAccum(dataIn, AvG)
         }
         <<kernel>> GlobalAvg(AvG, nE)
         CopyFromGPU(AvG);
         // Step 2: Produce Output File
         // Using indexes and global average
       GPU (NVIDIA CUDA Programming Model), GPU kernels:
         QSortGPU(dataIn, dIndex)
         RowAccum(dataIn, AvG)
         GlobalAvg(AvG, nE)
  • 8. Hardware & Data
       Input: Affymetrix raw CEL files (GPL3718) / 6.5M probes x 470 arrays.
       Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
       Pablo: Shared Memory Cluster, up to 256 nodes / IBM JS20, 512 CPUs, 1 TB distributed memory. Each node: 2 single-core IBM PowerPC 970FX CPUs (64 bits, 2 GHz) & 4 GB RAM. HD: 40 GB (local). Interconnection network: Myrinet.
       Picasso: Shared Memory Cluster, up to 64 nodes / HP Superdome, 128 CPUs, 128 GB shared memory. Each node: 2 dual-core Intel Itanium-2 CPUs, 1.6 GHz.
       Almeria: CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2. GPU: GeForce 9800 GX2, 600/1500 MHz, 2x1 GHz DDR3, 1 GB & 512 bits. HD: 2 x 72 GB (RAID 0), Western Digital Raptors, 10000 RPM.
  • 9. Benchmarking
       Input: Affymetrix raw CEL files (GPL3718) / 6.5M probes x 470 arrays.
       Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
       Charts: Distributed memory, Shared memory, GPU.
       2.9x total speed-up, 5.5x processing speed-up.
  • 10. Conclusions
       Background
         Application domain: bioinformatics (diverse, disperse, heterogeneous, huge data…)
         I/O- and memory-oriented applications
         Large collection of sequential code unable to deal with computational demands
       Aims
         Characterizing the application domain
         Starting up a library of (common) parallel procedures
       Benchmarking
         Performance is strongly related to code dependencies
         Parallel models (shared, distributed, etc.) are appropriate for different code structures
         Shared memory is good but expensive
         GPU-based solutions seem to be a good alternative for local installations
         I/O-bound applications should seek performance in the I/O device
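
The five quantile-normalization steps on slide 3 can be sketched sequentially in C as follows. This is a minimal illustration, not the Qnorm library code: the names `Entry`, `cmp_entry` and `qnorm`, and the column-major in-memory layout, are assumptions made for the example. It also assumes every experiment has the same number of genes and no missing values, so the per-row count `AvG[i].num` from slide 4 reduces to `E`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a value paired with its original row position. */
typedef struct { double value; int pos; } Entry;

static int cmp_entry(const void *a, const void *b) {
    double x = ((const Entry *)a)->value, y = ((const Entry *)b)->value;
    return (x > y) - (x < y);
}

/* Quantile-normalize a G x E matrix in place.
   Layout (assumed): data[e * G + g] is gene g of experiment e.
   Returns 0 on success, -1 if memory allocation fails. */
int qnorm(double *data, int G, int E) {
    assert(data != NULL && G > 0 && E > 0);
    size_t n = (size_t)G;
    Entry *col = malloc(n * sizeof *col);
    int *index = malloc(n * (size_t)E * sizeof *index);
    double *avg = calloc(n, sizeof *avg);
    if (!col || !index || !avg) {
        free(col); free(index); free(avg);
        return -1;
    }
    for (int e = 0; e < E; e++) {
        /* Step 2: sort one column, remembering original positions. */
        for (int g = 0; g < G; g++) {
            col[g].value = data[(size_t)e * n + g];
            col[g].pos = g;
        }
        qsort(col, n, sizeof *col, cmp_entry);
        /* Step 3 (partial): accumulate the sum for each rank. */
        for (int r = 0; r < G; r++) {
            index[(size_t)e * n + r] = col[r].pos;
            avg[r] += col[r].value;
        }
    }
    for (int r = 0; r < G; r++)   /* Step 3: global average per rank */
        avg[r] /= E;
    /* Steps 4 and 5: write each rank average back to the original row. */
    for (int e = 0; e < E; e++)
        for (int r = 0; r < G; r++)
            data[(size_t)e * n + index[(size_t)e * n + r]] = avg[r];
    free(col); free(index); free(avg);
    return 0;
}
```

Because only one column is sorted at a time, the same loop structure works when columns are streamed from disk, as in the reorganized code on slide 4. Tied values are not treated specially in this sketch: each receives the average of its own rank.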

Editor's notes

  1. Now, let's address the use of parallel processing in bioinformatics applications