- 1. © 2012 IBM Corporation
Software and Hardware Infrastructures to Conquer Data Explosion in Life Science - Life Science Network Basel
Romeo Kienzler
Data Scientist and Architect, Postgraduate in Information Systems and Bioinformatics
IBM Innovation Center Zurich, romeo.kienzler@ch.ibm.com
https://www.ibm.com/developerworks/mydeveloperworks/profiles/user/RomeoKienzler
- 2.
Outline
● Data Growth
● Data Growth in Life Science
● BigData in Life Science
● How to Address BigData?
● Outlook
- 3.
Data Growth
[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS; the gap between the two is the missed opportunity]
100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 e-mails passed through the net, 80 % of them spam and viruses. => Filtering is becoming more and more important.
As much data was produced up to 2003 as has been produced between 2003 and now.
- 4.
New Data Sources in Life Sciences
● DNA (RNA) Sequencing
● Next-Generation Sequencing
● DNA Transistor
● Imaging and Video
● Unstructured Text
- 5.
Data Growth in Life Sciences
Source: www.osehra.org
- 6.
Data Growth in Life Sciences
Source: www.crops.org
- 8.
Images and Videos
Source: www.phys.org
- 9.
Examples – Text Analytics
Source: www.theglobalistreport.com
- 11.
SIIP (Strategic IP Insight Platform)
Integrated chemical, biological and textual search
Deep analytics on scientific literature and patents
Aggregation of worldwide patent data and scientific literature (30M+ documents), continuously updated
- 12.
The Challenge
● Store a huge amount of data
● Process a huge amount of data (including search)
● Don't consume too much energy
- 15.
Use Many Hard Drives: Limits
(*) Given a disk capacity of 25 TB
300 crashes per day, data loss after two weeks
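The failure counts follow from simple arithmetic: with N independent drives and an annualized failure rate (AFR) f, roughly N·f/365 drives fail per day. The slide does not state the drive count or AFR behind its figure, so the sketch below is only an illustration under an assumed, constant, hypothetical AFR.

```python
# Back-of-the-envelope drive-failure arithmetic for a large disk farm.
# Assumption (not from the slide): drives fail independently at a constant
# annualized failure rate (AFR). Any AFR value passed in is hypothetical.

DAYS_PER_YEAR = 365.0


def expected_failures_per_day(num_drives: int, afr: float) -> float:
    """Expected drive failures per day for num_drives drives at the given AFR.

    afr is a fraction, e.g. 0.02 means 2 % of drives fail per year.
    """
    if num_drives < 0 or not 0.0 <= afr <= 1.0:
        raise ValueError("num_drives must be >= 0 and afr must lie in [0, 1]")
    return num_drives * afr / DAYS_PER_YEAR


def drives_for_daily_failures(failures_per_day: float, afr: float) -> float:
    """Inverse question: how many drives yield this many failures per day?"""
    if afr <= 0.0:
        raise ValueError("afr must be positive")
    return failures_per_day * DAYS_PER_YEAR / afr


if __name__ == "__main__":
    # Hypothetical AFR of 2 %; 300 failures per day is the slide's figure.
    print(round(drives_for_daily_failures(300, 0.02)))
```

With the hypothetical 2 % AFR, 300 failures per day corresponds to roughly 5.5 million drives; a lower AFR pushes that number up proportionally.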
- 16.
Separate the Signal From the Noise¹
¹http://www.ibmsystemsmag.com/power/businessstrategy/BI-and-Analytics/signal_noise/
- 17.
Store only what you need
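One concrete way to "store only what you need" in sequencing is to drop low-quality reads before they reach long-term storage. The sketch below is an illustrative filter, not a method from the talk: it assumes 4-line FASTQ records with Phred+33 quality encoding, and the default threshold of 20 is a hypothetical choice.

```python
# Keep only sequencing reads whose mean base quality passes a threshold.
# Assumptions: 4-line FASTQ records with Phred+33 quality encoding; the
# default threshold of 20 is illustrative, not a recommendation from the talk.
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Stream (header, sequence, quality) tuples from FASTQ text lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def mean_phred(qual: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_reads(reads: Iterable[Read], min_mean_q: float = 20.0) -> Iterator[Read]:
    """Yield only reads whose mean quality is at least min_mean_q."""
    for read in reads:
        if mean_phred(read[2]) >= min_mean_q:
            yield read
```

Because every function works on iterators, reads can be streamed straight from an open file without loading a whole sequencing run into memory.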
- 18.
Use many CPUs
Supercomputers in the past:
➔ Weather
➔ Atom bombs
➔ Science
➔ Crash tests
Supercomputer in a rack:
➔ 18 TB main memory, 1008 CPU cores, 113 TFLOPS (for comparison, no. 1 on the TOP500 list: 17,590 TFLOPS in 2013, 71 TFLOPS in 2004)
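Using many CPUs usually means splitting the data into chunks, processing each chunk independently (map), and combining the partial results (reduce). The sketch below illustrates that pattern; the task (counting G/C bases) and all function names are hypothetical. In CPython, threads do not execute pure-Python code in parallel because of the GIL, so the executor is a parameter: a `ThreadPoolExecutor` is enough to exercise the logic, while a `ProcessPoolExecutor` spreads the work over several cores.

```python
# Split-apply-combine: spread a per-sequence computation over many workers.
# Illustrative sketch; the G/C-counting task and all names are hypothetical.
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Sequence, Tuple


def gc_counts(chunk: Sequence[str]) -> Tuple[int, int]:
    """Map step: return (G+C count, total bases) for one chunk of sequences."""
    gc = total = 0
    for seq in chunk:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += len(s)
    return gc, total


def chunks(items: Sequence[str], n: int) -> List[Sequence[str]]:
    """Split items into at most n contiguous, roughly equal chunks."""
    if n < 1:
        raise ValueError("n must be >= 1")
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def gc_fraction(seqs: Sequence[str], executor: Executor, workers: int) -> float:
    """Run gc_counts on each chunk via the executor, then reduce the sums."""
    parts = executor.map(gc_counts, chunks(seqs, workers))
    gc = total = 0
    for part_gc, part_total in parts:
        gc += part_gc
        total += part_total
    return gc / total if total else 0.0
```

To use processes instead of threads, pass `ProcessPoolExecutor(max_workers=...)` from a script file; the worker function must stay at module level so it can be pickled and sent to the worker processes.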
- 20.
Use specialized CPUs: GPUs
Source: www.ethz.ch
Source: www.nvidia.com
- 21.
Use specialized CPUs: FPGAs
Source: www.virtex.com
- 22.
Example FPGA: IBM PureData
● Up to 1.28 PB storage
● Up to 10 racks
● Up to 500 GB/s throughput
● Up to 1120 FPGA cores + 1120 Intel CPU cores / 960 hard drives
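Combining the two upper bounds on the slide gives a feel for what "fast" means at this scale: how long one full pass over all stored data would take. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the peak throughput is sustained for the whole scan, which real workloads need not achieve.

```python
# Time for one full sequential pass over all stored data, using the upper
# bounds quoted on the slide (1.28 PB capacity, 500 GB/s aggregate throughput).
# Assumption: decimal units and throughput sustained for the whole scan;
# real figures depend on compression and workload.

PB = 10**15
GB = 10**9


def full_scan_seconds(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Seconds needed to read capacity_bytes at a constant throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return capacity_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    seconds = full_scan_seconds(1.28 * PB, 500 * GB)
    print(f"{seconds:.0f} s = {seconds / 60:.1f} min")
```

Under these assumptions a full scan takes 2,560 seconds, or about 43 minutes.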
- 23.
Example FPGA: Convey Computer
● Accelerates BWA by 15x
● Accelerates Smith-Waterman
Source: www.conveycomputer.com
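Smith-Waterman is a good candidate for hardware acceleration because it fills a dynamic-programming table whose size is the product of the two sequence lengths. The plain-Python sketch below computes only the local-alignment score with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions, not those of any particular accelerator.

```python
# Smith-Waterman local alignment score with a linear gap penalty.
# Illustrative scoring (match +2, mismatch -1, gap -1) is an assumption.
# Time O(len(a) * len(b)); memory O(len(b)) because only two rows are kept.


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("ACGT", "ACT")` aligns A, C, a gap and T for a score of 5; the inner double loop is exactly the part that FPGAs and GPUs parallelize.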
- 24.
Example: Algorithms
Source: www.biomedcentral.com/1471-2105/9/S2/S10
- 25.
Example: Cloud
● Managed Infrastructure
● Dynamic Provisioning
● Specialized HW
● SaaS
Source: www.basespace.illumina.com
- 26.
Conclusion
● Main BigData sources are sequences and plain text
● Many others to come (e.g. images and videos)
● Store data on many commodity hard drives (energy problem not solved)
● Filter signal from noise
● Process data on many CPUs
● Use specialized hardware / CPUs
● Research into algorithm performance
- 27.
Outlook
● Currently very heterogeneous infrastructures
● Trends:
  ● Virtualization
  ● Standardization
  ● Consumerization
● Limits:
  ● Space
  ● Energy consumption
● What shall I do?
  ● RELAX
- 28.
The future will be full of surprises
A battery-powered, pocket-sized supercomputer?
Raspberry Pi
Parallella
- 29.
Acknowledgements
Slides 14–16 and 21 are taken from a keynote speech by Axel Köster, IBM Germany.