Big Data Infrastructure for Translational Research
Christopher G. Wilson, Ph.D.
Associate Professor, Physiology and Pediatrics
Center for Perinatal Biology
Translational Medicine, April 18th, 2015
Disclosures
The work reported here was supported, in part,
by NIH grants:
1R01HL081622-01 (NHLBI)
1R03HD064830-01 (NICHD)
Outline
• Defining “Big Data”
• Big data comes in multiple modes/types
• Scaling data acquisition to build Big Data sets
•  Patient bed
•  Unit
•  Institution-wide
• Continuing challenges
What is “Big Data”?
• Big data is a blanket term for any collection of data sets so
large and complex that they become difficult to process using
typical data management tools and data processing
applications.
• Big data usually includes data sets so large that commonly
used software (such as Microsoft Office) cannot capture,
curate, manage, and process them quickly and
efficiently.
• Big data set sizes are a constantly moving target, ranging
from hundreds of gigabytes (1 GB = 10⁹ bytes) to terabytes (1 TB = 10¹² bytes)
and even petabytes (1 PB = 10¹⁵ bytes) of data in a single data set.
A feast of data!
• The world’s technological per-capita capacity to store
information has roughly doubled every 40 months since the
1980s
• Global Internet traffic has reached almost 1000 exabytes
(1 EB = 10¹⁸ bytes) annually and continues to grow*
• The challenge for both business and research science is
coming up with the tools to extract usable information from this
data
*Cisco Systems estimate
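The doubling figure above can be made concrete with a simple exponential model. This is a back-of-the-envelope sketch that assumes smooth, constant doubling every T = 40 months, which real storage growth only approximates:

```latex
C(t) = C_0 \, 2^{t/T}, \qquad T = 40 \text{ months}

\frac{C(120 \text{ months})}{C_0} = 2^{120/40} = 2^{3} = 8 \quad \text{(one decade)}

\frac{C(360 \text{ months})}{C_0} = 2^{360/40} = 2^{9} = 512 \quad \text{(three decades)}
```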
Where does so much data come from?
Data sets grow to vast size because they are increasingly
being gathered by:
• Ubiquitous information-sensing mobile devices (phones,
Fitbits, Jawbones, etc.)
• Surveillance technologies (remote sensing devices like
drones or traffic cameras)
• Software logs from your internet activity (Hello—Facebook!)
• Radio-frequency identification (RFID) tags
• Wireless sensor networks (once again, the kind of thing your
phone “wants” to attach to when you are out and about)
• And scientific instruments, clinical monitors, patient
samples…
[Image: personal fitness trackers]
Work-flow of “Big Data” analysis
Or…
• Obtain data
• Scrub data
• Explore data
• Model the data
• Interpret the data
• Present the data
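The six steps above can be sketched as a chain of small functions. This is a minimal, illustrative pipeline using only the Python standard library; the CSV contents, the 20 to 250 bpm plausibility limits, and the 100 bpm reference limit are hypothetical values chosen for the example, not clinical guidance:

```python
import csv
import io
import statistics

# Hypothetical raw export: one missing value, one non-numeric entry, one implausible reading.
RAW_CSV = """patient_id,heart_rate
p1,82
p2,
p3,not_recorded
p4,95
p5,300
"""


def obtain(text: str) -> list[dict]:
    """Obtain: parse raw CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows: list[dict], low: float = 20.0, high: float = 250.0) -> list[dict]:
    """Scrub: drop blank/non-numeric values and physiologically implausible readings."""
    clean = []
    for row in rows:
        try:
            hr = float(row["heart_rate"])
        except ValueError:
            continue  # empty string or text such as "not_recorded"
        if low <= hr <= high:
            clean.append({"patient_id": row["patient_id"], "heart_rate": hr})
    return clean


def explore(rows: list[dict]) -> dict:
    """Explore: a simple descriptive summary."""
    values = [r["heart_rate"] for r in rows]
    return {"n": len(values), "min": min(values), "max": max(values)}


def model(rows: list[dict]) -> dict:
    """Model: here just a mean and sample standard deviation."""
    values = [r["heart_rate"] for r in rows]
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


def interpret(fit: dict, upper_limit: float = 100.0) -> str:
    """Interpret: compare the model against an assumed reference limit."""
    return "within range" if fit["mean"] <= upper_limit else "elevated"


def present(summary: dict, fit: dict, verdict: str) -> str:
    """Present: a one-line text report."""
    return (f"n={summary['n']}, range={summary['min']:.0f}-{summary['max']:.0f} bpm, "
            f"mean={fit['mean']:.1f} bpm ({verdict})")


rows = scrub(obtain(RAW_CSV))
summary, fit = explore(rows), model(rows)
print(present(summary, fit, interpret(fit)))
# n=2, range=82-95 bpm, mean=88.5 bpm (within range)
```

Keeping each step as a separate function makes it easy to rerun a single stage (for example, after changing the scrubbing rules) without touching the rest of the workflow.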
Data analytics is a team sport!
•  Project manager—responsible for setting clear project objectives and deliverables.
The project manager should be someone with more experience in data analysis
and a more comprehensive background than the other team members.
•  Statistician—should have a strong mathematics/statistics background and will be
responsible for reporting and developing the statistics workflow for the project.
•  Visualization specialist—responsible for the design/development of data
visualization (figures/animation) for the project.
•  Database specialist—develops ontology/meta-tags to represent the data and
incorporates this information in the team's chosen database schema.
•  Content expert—has the strongest background in the focus area of the project
(physiologist, systems biologist, molecular biologist, biochemist, clinician, etc.) and
is responsible for providing background material relevant to the project's focus.
•  Web developer/integrator—responsible for web-content related to the project,
including the final report formatting (for web/hardcopy display).
•  Data analyst/programmer—the most junior member of the team will take on
general responsibilities to assist the other team members. This is a learning
opportunity for a team member who is new to data analysis and needs time to
develop the skills necessary to fully participate in the workflow.
Data analytics is a team sport!
[Diagram: combined roles: project manager/content expert (physician/scientist); database/web developer; statistician/data viz; programmer]
Team members can have multiple roles…
What tools are typically used?
• A 64-bit computing environment is typical (large RAM and large
storage, massively parallel software running on clusters/cloud
servers)
• Data is acquired and stored in a database or distributed store (SQL for
some, but NoSQL databases like MongoDB, CouchDB, Clusterpoint, etc.,
and Hadoop's distributed file system are “better”)
• Data screening & cleaning using “scripting” languages (Perl
or Python typically) and processing using tools like
MapReduce
• “Industrial strength” statistical packages (typically R, SAS, or
SPSS)
• Visualization (D3/IDL/MATLAB/Python/Plot.ly, etc.)
• Metadata tagging (XML and variants)
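MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The sketch below runs all three phases in a single Python process on hypothetical diagnosis-code records, purely to show the data flow; on a Hadoop cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself. The function names are illustrative, not part of any Hadoop API:

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit (code, 1) for every diagnosis code in one comma-separated record."""
    return [(code.strip(), 1) for code in record.split(",") if code.strip()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework on a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical per-encounter records of ICD-10-style codes.
records = ["I10,E11", "I10", "J45, I10"]
print(map_reduce(records))  # {'I10': 3, 'E11': 1, 'J45': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be parallelized independently, which is what lets the approach scale to very large data sets.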
How can we meet the challenge of Big Data collection/integration in a translational setting?
What are the challenges for clinicians/researchers?
The amount of biomedical data that is increasingly available
provides both opportunity and challenge for the translational
investigator.
• Molecular biology has provided tools to allow understanding of
genomics and proteomics.
• There is a growing body of data on the connectomics of signaling pathways
• Patient demographic data and other EHR/EMR metrics are a resource
that is only now being widely deployed and interrogated.
• Patient physiology (bedside monitors) can be used to provide
fundamental information about patient health and adaptation to
pathophysiologies.
• The Health Insurance Portability and Accountability Act of 1996 (HIPAA)
imposes necessary constraints on how patient data are handled.
Courtesy Michael De Georgia & J. Michael Schmidt
Big Data to Decisions!
» Technology challenges for “Data to Decisions”
~  Transforming data from multiple sources into meaningful information (evidence-context dependent)
~  Association of data from diverse, heterogeneous, asynchronous sources
~  Merging/fusion of information for alerts and decision support
~  Human guided processing and analysis
Multi-source Analysis for Pattern Discovery: extract & synthesize information from diverse data
[Diagram: multiple data sources feed three stages:
• Source-to-Evidence (information processing & extraction): text analytics, image analysis, signal processing, data association
• Data Fusion (alerting & decision support): combine information, weigh evidence, real-time alerting
• User Interface (display & analysis): visualization, queries, data provenance, sensitivity]
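Association of asynchronous sources usually starts with time alignment. A minimal sketch, assuming each stream is a time-sorted list of (seconds, value) pairs: each sample of a regularly sampled stream is paired with the most recent reading from an irregular stream, provided that reading is no older than a tolerance. The function `align_backward` and the example values are hypothetical; pandas offers similar behavior through `merge_asof` with `direction="backward"` and a `tolerance`:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in seconds, value)


def align_backward(primary: List[Sample], secondary: List[Sample],
                   tolerance: float) -> List[Tuple[float, float, Optional[float]]]:
    """Pair each primary sample with the most recent secondary value at or before
    its timestamp, if that value is no older than `tolerance` seconds.
    Both input lists must already be sorted by time."""
    sec_times = [t for t, _ in secondary]
    aligned = []
    for t, value in primary:
        i = bisect_right(sec_times, t) - 1  # index of last secondary time <= t
        if i >= 0 and t - sec_times[i] <= tolerance:
            aligned.append((t, value, secondary[i][1]))
        else:
            aligned.append((t, value, None))  # no recent enough secondary reading
    return aligned


heart_rate = [(0.0, 80.0), (1.0, 82.0), (2.0, 85.0), (3.0, 90.0)]  # regular 1 Hz stream
blood_pressure = [(0.5, 120.0), (2.9, 110.0)]  # irregular, asynchronous readings
for row in align_backward(heart_rate, blood_pressure, tolerance=1.5):
    print(row)
```

Looking backward only (never using a reading from the future) mirrors what a real-time system can actually know at each moment, which matters if the aligned data later drive alerts.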
Real-time Decision Support
Providing useful information to the clinician
» Real-time decision support to clinicians at the point of care
~  Codify best practice protocols
~  Enable efficient treatment decisions
~  Reduce needless procedures
~  Optimize coordination among caregivers
~  Reduce the probability of mistakes being made
» Key features that affect decision support
~  Methods to retrieve, merge, and present data and information
~  Algorithms to extract information from complex, heterogeneous data
~  Visualization/graphical feedback to better understand patient conditions
» Automated alerting for conditions of concern
~  Combining information across data streams
~  Accumulation of weak evidence from multiple sources
~  Enhanced retrieval and visualization of information
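One common way to accumulate weak evidence is to add log likelihood ratios to the prior log-odds (a naive Bayes combination). This sketch illustrates that general technique and is not the method used by the system described here; it assumes the streams provide conditionally independent evidence, which physiological signals often violate, and the prior, likelihood ratios, stream names, and 0.5 alert threshold are all hypothetical:

```python
import math
from typing import Mapping


def posterior_probability(prior: float, likelihood_ratios: Mapping[str, float]) -> float:
    """Combine independent evidence: prior log-odds plus the sum of log likelihood ratios."""
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios.values():
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds back to a probability


def should_alert(prior: float, likelihood_ratios: Mapping[str, float],
                 threshold: float = 0.5) -> bool:
    return posterior_probability(prior, likelihood_ratios) >= threshold


# Hypothetical values: each stream alone is only weak evidence (likelihood ratio = 2).
prior = 0.05
one_signal = {"heart_rate_trend": 2.0}
five_signals = {"heart_rate_trend": 2.0, "resp_rate": 2.0, "temperature": 2.0,
                "spo2_trend": 2.0, "lactate": 2.0}
print(round(posterior_probability(prior, one_signal), 3))    # 0.095
print(round(posterior_probability(prior, five_signals), 3))  # 0.627
```

In this example no single stream raises the probability above 10%, but five concordant weak signals push it past the alert threshold.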
Challenges inherent in Big Data Analytics
• Capture
• Curation
• Storage
• Search
• Sharing
• Transfer
• Analysis
• Visualization
Data is multi-modal
[Diagram: a unified data set integrating physiology waveforms (ECG, EEG, SaO2, BP), radiology (X-ray, MRI, CAT, etc.), EMR/EHR, and “-omics” data]
Bedside Patient Data Acquisition
Scaling to a hospital-wide data center
Ken Loparo, Michael De Georgia, Frank Jacono, Farhad Kaffashi
CWRU IMEDS™ Proof of Concept Demonstration
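A rough estimate shows why scaling from the patient bed to the unit to the institution moves data volumes from gigabytes into the terabyte range. All parameters here are assumptions for illustration (12 waveform channels per bed, 500 Hz sampling, 2-byte samples, no compression, and no video or imaging):

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def bytes_per_day(channels: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Raw (uncompressed) waveform bytes generated by one bed per day."""
    return channels * sample_rate_hz * bytes_per_sample * SECONDS_PER_DAY


def bytes_per_year(beds: int, per_bed_per_day: int) -> int:
    """Raw bytes per year for a given number of continuously monitored beds."""
    return beds * per_bed_per_day * DAYS_PER_YEAR


# Assumed bed configuration: 12 waveform channels, 500 Hz, 16-bit (2-byte) samples.
per_bed = bytes_per_day(channels=12, sample_rate_hz=500, bytes_per_sample=2)
print(f"One bed:             {per_bed / 1e9:.2f} GB/day")
print(f"50-bed unit:         {bytes_per_year(50, per_bed) / 1e12:.1f} TB/year")
print(f"500-bed institution: {bytes_per_year(500, per_bed) / 1e12:.1f} TB/year")
```

Under these assumptions, one bed produces about 1.04 GB/day, a 50-bed unit about 18.9 TB/year, and a 500-bed institution about 189.2 TB/year of raw waveform data alone.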
Why is IMEDS™ Different?
The Approach
~  “Bottom-up” development with clinicians and engineers working
side-by-side
~  Open source architecture design
~  Total integrated, “plug-and-play” system solution
~  Unbiased approach
~  Unified effort, rather than stove-piped, “one-off” solutions to small
pieces of the problem
~  Non-profit nation-wide consortium
~  Builds on existing infrastructures
~  Leverages best available technology, regardless of source
Courtesy Michael De Georgia & J. Michael Schmidt
Courtesy of Susanna-Assunta Sansone, PhD
IPython interface (http://ipython.org)
•  Reproducible
•  Version controlled (git)
•  Interactive analysis
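Reproducibility depends on knowing exactly which data, parameters, and software produced a result. A minimal sketch of a provenance record that could be saved alongside notebook output and committed to git with the analysis code; the analysis name, the parameter, and the input bytes are hypothetical:

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(data: bytes, analysis: str, parameters: dict) -> dict:
    """Capture what is needed to rerun an analysis: an input fingerprint,
    the parameters used, and the software environment."""
    return {
        "analysis": analysis,
        "parameters": parameters,
        "input_sha256": hashlib.sha256(data).hexdigest(),  # changes if the input changes
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


raw = b"time,ecg\n0.000,0.12\n0.002,0.15\n"  # hypothetical input file contents
record = provenance_record(raw, "qrs_detection", {"threshold_mv": 0.5})
print(json.dumps(record, indent=2))
```

Storing the hash rather than the data itself keeps the record small and safe to version-control, while still revealing whether a rerun used the same input.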
Worldwide movement for FAIR (Findable, Accessible, Interoperable, Reusable) data
Barend Mons and Susanna-Assunta Sansone
http://bd2k.nih.gov/workshops.html#ADDS
Launched on May 27th, 2014
A new online-only publication for descriptions of scientifically valuable datasets in
the life, environmental and biomedical sciences, but not limited to these
• Credit for sharing your data
• Focused on reuse and reproducibility
• Peer reviewed, curated
• Promoting community data repositories
• Open Access
Supported by:
Courtesy of Susanna-Assunta Sansone, PhD
Data Processing
[Diagram: data processing approaches for an Integrated Patient Database, grouped as Mechanistic, Graphical, Probabilistic, and Complex Systems Analysis approaches. Methods shown: decision tree analysis, artificial neural networks, Bayesian networks, hierarchical clustering, classical and Bayesian statistical inference, time-domain and frequency-domain analysis, scale-invariant (fractal) analysis, and approximate entropy.]
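Approximate entropy (ApEn), listed above, quantifies how predictable a time series is: regular signals score near zero, irregular ones score higher. The sketch below is a direct, O(N²) implementation of Pincus's definition (Chebyshev distance, self-matches counted), comparing a perfectly periodic series with uniform random noise; the tolerance r = 0.2 × SD is a common convention, assumed here rather than taken from the slides:

```python
import math
import random
import statistics
from typing import Sequence


def approximate_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r), using the Chebyshev (max-abs)
    distance between templates and counting each template's match with itself."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(k: int) -> float:
        templates = [x[i:i + k] for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # C_i: fraction of templates within tolerance r of template a (including a)
            matches = sum(1 for b in templates
                          if max(abs(p - q) for p, q in zip(a, b)) <= r)
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


periodic = [0.0, 1.0] * 100
random.seed(0)
noisy = [random.random() for _ in range(200)]
for name, series in [("periodic", periodic), ("random", noisy)]:
    tol = 0.2 * statistics.pstdev(series)  # assumed convention: r = 0.2 x SD
    print(name, round(approximate_entropy(series, m=2, r=tol), 3))
```

The quadratic cost is acceptable for short epochs but becomes slow for long recordings, where vectorized or tree-based neighbor counting is typically used instead.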
Data Analysis Methods
Python as a data analytics environment
Advantages to using a Big Data approach
• Speed of data reduction and analysis
• Visualization of complex data sets can be done relatively
quickly
• Capacity for storage and processing of vast data sets is
inherent in the tool stack
• Scalability of cloud/cluster storage
• Potential for “Big Impact” on research and clinical care
Disadvantages to a Big Data approach
• Often not hypothesis driven (a fishing expedition?)
• May require expensive computing infrastructure, depending on
data processing and storage needs
• Requires significant programming skill to develop and use the
tool stack
• Typically requires “team based” data analysis and
management (programmer, database manager, design/
visualization person, etc.)
• Just because you have lots of data doesn’t mean you have
an obvious or easy way to extract information from it!
Summary
• We live in a data-rich era.
• The data available to us is multi-modal and requires
integration.
• Data collection and integration can occur at many scales
(bedside to institution) but the data must be converted into
usable information.
• Team-based science depends upon a wide range of data
analytics skills.
• Curation, reproducibility, and shared access to data are
ongoing challenges.
Where do you find your data analytics team members?
Syllabus Overview (10 week course)
Foundations 1: Using text editors, using the IPython notebook for data exploration, using
version control software (git), using the class wiki.
Foundations 2: Using IPython/NumPy/SciPy, importing and manipulating data with Pandas,
data visualization in IPython.
Analysis Methods: Basic signal theory overview, time-series data, plotting (lines, histograms,
bars, etc.), dynamical systems analyses of data variability, information theory measures
(entropy) of complexity, frequency domain/spectral measures (FFT, time-varying spectrum),
wavelets.
Handling sequence data: Using R/Bioconductor, differences between mRNA-Seq, gene array,
proteomics, and deep-sequencing data, visualizing data from gene/RNA arrays.
Data set storage and retrieval: Basics of relational databases, SQL vs. NoSQL, cloud
storage/NAS/computing clusters, interfacing with Hadoop/MapReduce, metadata and ontology
for biomedical/patient data (XML), using secure databases (REDCap).
Data integrity and security: The Health Insurance Portability and Accountability Act (HIPAA)
and what it means for data management, de-identifying patient data (handling PHI), data
security best practices, making data available to the public—implications for data transparency
and large-scale data mining.
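De-identification can be sketched for a single structured record: drop direct identifiers, replace the medical record number (MRN) with a random study ID, and coarsen dates. This is an illustration only, with hypothetical field names, and is not a complete HIPAA Safe Harbor implementation: Safe Harbor covers 18 identifier categories, requires ages over 89 to be aggregated (which this sketch does not do), and free-text notes need separate handling:

```python
import secrets
from typing import Dict

# Hypothetical field names for direct identifiers in a source extract.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "mrn"}


def deidentify(record: dict, id_map: Dict[str, str]) -> dict:
    """Drop direct identifiers, replace the MRN with a random study ID, and
    reduce an ISO-format birth date ('YYYY-MM-DD') to its year.

    id_map (MRN -> study ID) is the re-identification key; it must be stored
    separately under access control, never shipped with the research data."""
    mrn = record["mrn"]
    if mrn not in id_map:
        # Random surrogate: not derived from the MRN or any other patient attribute.
        id_map[mrn] = secrets.token_hex(8)
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "birth_date"}
    cleaned["study_id"] = id_map[mrn]
    if "birth_date" in record:
        # Keeps the year only; the Safe Harbor rule for ages over 89 is NOT applied here.
        cleaned["birth_year"] = record["birth_date"][:4]
    return cleaned


key: Dict[str, str] = {}
row = {"mrn": "123456", "name": "Jane Doe", "birth_date": "1970-06-15",
       "phone": "555-0100", "diagnosis": "I10", "heart_rate": 82}
print(deidentify(row, key))
```

Using a random surrogate rather than a hash of the MRN avoids producing a study ID that could be recomputed from the identifier itself.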
Coalition Institutions
The coding Queen and her Court…
Abby Dobyns
Princesses of Python
Rhaya Johnson
Regie Felix and Adaeze Anyanwu
And a Princeling….
Jamie Tillett
Acknowledgements
Loma Linda
• Andy Hopper
• Traci Marin
• Charles Wang
• Wilson Aruni
• Valery Filippov
CWRU
•  Michael De Georgia
•  Kenneth Loparo
•  Frank Jacono
•  Farhad Kaffashi
UC Riverside
• Thomas Girke (Bioinformatics)
La Sierra University
• Marvin Payne
CSU San Bernardino
• Art Concepcion (Bioinformatics)
UC Irvine
• Alex Nicolau (Comp Sci/Bioinf)
My laboratory’s git repository: https://github.com/drcgw/bass
Questions?!
Further reading
• Doing Data Science by Cathy O’Neil and Rachel Schutt
• Data Analysis with Open-Source Tools by Philipp Janert
• The Art of R Programming by Norman Matloff
• R for Everyone by Jared P. Lander
• Python for Data Analysis by Wes McKinney
• Think Python by Allen B. Downey
• Think Stats by Allen B. Downey
• Think Complexity by Allen B. Downey
• Every one of Edward Tufte’s books (The Visual Display
of Quantitative Information, Visual Explanations,
Envisioning Information, Beautiful Evidence)
Example: Patient physiology waveforms + EMR
Example: Interrogating sequence data

Starting the Hadoop Journey at a Global Leader in Cancer Research
 

Kürzlich hochgeladen

Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.ktanvi103
 
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...chandigarhentertainm
 
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In FaridabadCall Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabadgragmanisha42
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Vipesco
 
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service availableCall Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service availablegragmanisha42
 
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetJalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Sheetaleventcompany
 
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅gragmanisha42
 
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetraisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...Ahmedabad Call Girls
 
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetHubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171Call Girls Service Gurgaon
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★indiancallgirl4rent
 
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetSambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...indiancallgirl4rent
 
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Sheetaleventcompany
 
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMuzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipurgragmanisha42
 

Kürzlich hochgeladen (20)

Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
 
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
❤️Call girls in Jalandhar ☎️9876848877☎️ Call Girl service in Jalandhar☎️ Jal...
 
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In FaridabadCall Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
Call Girls Service Faridabad 📲 9999965857 ヅ10k NiGhT Call Girls In Faridabad
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510
 
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service availableCall Girl Raipur 📲 9999965857 whatsapp live cam sex service available
Call Girl Raipur 📲 9999965857 whatsapp live cam sex service available
 
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetJalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
 
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
 
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetraisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
 
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetHubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171VIP Call Girl Sector 10 Noida Call Me: 9711199171
VIP Call Girl Sector 10 Noida Call Me: 9711199171
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★
Enjoyment ★ 8854095900 Indian Call Girls In Dehradun 🍆🍌 By Dehradun Call Girl ★
 
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetSambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sambalpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
 
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
 
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMuzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
 

2015 04-18-wilson cg

  • 1. Big Data Infrastructure for Translational Research Christopher G. Wilson, Ph.D. Associate Professor Physiology and Pediatrics Center for Perinatal Biology Translational Medicine, April 18th, 2015
  • 2. Disclosures The work reported here was supported, in part, by NIH grants: 1R01HL081622-01 (NHLBI) 1R03HD064830-01 (NICHD)
  • 5. Outline • Defining “Big Data” • Big data is of multiple modes/types • Scaling data acquisition to build Big Data sets (patient bed, unit, institution-wide) • Continuing challenges
  • 7. What is “Big Data”? • Big data is a blanket term for any collection of data sets so large and complex that they become difficult to process using typical data management tools and data processing applications. • Big data usually includes data sets so large that commonly used software (like Microsoft Office) cannot capture, curate, manage, and process the data quickly and efficiently. • Big data set sizes are a constantly moving target, ranging from hundreds of gigabytes (10⁹ bytes) to terabytes (10¹² bytes) and even petabytes (10¹⁵ bytes) of data in a single data set.
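When a file is too large for a spreadsheet or for RAM, one common workaround is to stream it in chunks and keep only running totals. A minimal sketch, assuming pandas is installed and the data sit in a CSV file; the `heart_rate` column and the tiny inline data are hypothetical stand-ins for a real monitor export:

```python
import io

import pandas as pd


def streaming_mean(source, column, chunksize=100_000):
    """Mean of one column, reading the file in chunks instead of all at once."""
    total, count = 0.0, 0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna()  # skip missing readings
        total += float(values.sum())
        count += int(values.size)
    return total / count if count else float("nan")


# Usage: a tiny in-memory CSV stands in for a multi-gigabyte export
csv_text = "time,heart_rate\n0,120\n1,\n2,130\n3,140\n"
print(streaming_mean(io.StringIO(csv_text), "heart_rate", chunksize=2))
```

The same pattern works for any statistic that can be updated incrementally (counts, sums, minima, maxima); medians and other order statistics need different techniques.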
  • 8. A feast of data! • The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s. • Global Internet traffic has reached almost 1000 exabytes (10¹⁸ bytes) annually and continues to grow.* • The challenge for both business and research science is developing tools that extract usable information from these data. *Cisco Systems estimate
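Assuming smooth exponential growth with the 40-month doubling time quoted above, the implied annual and per-decade growth factors work out as follows:

```latex
\begin{aligned}
G(t)   &= 2^{t/40} \qquad (t \text{ in months}) \\
G(12)  &= 2^{12/40} = 2^{0.3} \approx 1.23 \qquad \text{(about 23\% more capacity per year)} \\
G(120) &= 2^{120/40} = 2^{3} = 8 \qquad \text{(roughly eightfold per decade)}
\end{aligned}
```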
  • 9. Where does so much data come from? Data sets grow to vast size because they are increasingly being gathered by: • Ubiquitous information-sensing mobile devices (phones, Fitbit and Jawbone trackers, etc.) • Surveillance technologies (remote sensing devices like drones or traffic cameras) • Software logs of your internet activity (hello, Facebook!) • Radio-frequency identification (RFID) tags • Wireless sensor networks (once again, the kind of thing your phone “wants” to attach to when you are out and about) • Scientific instruments, clinical monitors, patient samples…
  • 11. Workflow of “Big Data” analysis
  • 12. Or… • Obtain data • Scrub data • Explore data • Model the data • Interpret the data • Present the data
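The six steps above can be sketched as a chain of small functions. This is an illustrative toy, not the author's pipeline: the column names `age_days` and `resp_rate` and the numbers are made up, and the model is a plain least-squares line using only the Python standard library.

```python
import csv
import io
import statistics


def obtain(text):
    """Obtain: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: keep rows whose values parse as numbers; return (age, rate) pairs."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["age_days"]), float(row["resp_rate"])))
        except (TypeError, ValueError):
            continue  # missing or malformed value: drop the row
    return clean


def explore(pairs):
    """Explore: simple descriptive statistics of the outcome variable."""
    rates = [rate for _, rate in pairs]
    return {"n": len(rates), "mean": statistics.mean(rates), "sd": statistics.stdev(rates)}


def model(pairs):
    """Model: ordinary least-squares fit of resp_rate = intercept + slope * age_days."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - slope * mean_x, slope


def interpret(slope):
    """Interpret: turn the fitted slope into a plain-language statement."""
    direction = "decreases" if slope < 0 else "increases"
    return f"respiratory rate {direction} by {abs(slope):.2f} breaths/min per day of age"


def present(summary, finding):
    """Present: a one-line report (a real project would also produce figures)."""
    return f"n={summary['n']}, mean={summary['mean']:.1f}: {finding}"


raw = "age_days,resp_rate\n1,60\n2,58\n3,\n4,54\n5,52\n"  # toy data, not measurements
pairs = scrub(obtain(raw))
intercept, slope = model(pairs)
print(present(explore(pairs), interpret(slope)))
```

Keeping each step in its own function makes it easy to swap one stage (say, a different model) without touching the others.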
  • 13. Data analytics is a team sport!
    • Project manager: responsible for setting clear project objectives and deliverables. The project manager should have more experience in data analysis and a broader background than the other team members.
    • Statistician: should have a strong mathematics/statistics background; responsible for developing the project's statistics workflow and reporting its results.
    • Visualization specialist: responsible for designing and developing the project's data visualizations (figures/animation).
    • Database specialist: develops the ontology/metadata tags that represent the data and incorporates this information into the team's chosen database schema.
    • Content expert: has the strongest background in the project's focus area (physiologist, systems biologist, molecular biologist, biochemist, clinician, etc.) and is responsible for providing relevant background material.
    • Web developer/integrator: responsible for the project's web content, including formatting the final report for web and hardcopy display.
    • Data analyst/programmer: the most junior team member, who takes on general tasks to assist the other team members. This is a learning opportunity for someone new to data analysis who needs time to develop the skills to participate fully in the workflow.
  • 14. Data analytics is a team sport! Roles: project manager/content expert (physician/scientist); database/web developer; statistician/data visualization; programmer. Team members can have multiple roles…
  • 15. What tools are typically used? • A 64-bit computing environment is typical (big RAM and big storage, massively parallel software running on clusters/cloud servers) • Data are acquired and stored in a database (SQL for some, but NoSQL databases like MongoDB, CouchDB, Clusterpoint, etc., often alongside the Hadoop framework, are “better”) • Data screening and cleaning use “scripting” languages (typically Perl or Python), with processing done by tools like MapReduce • “Industrial strength” statistical packages (typically R, SAS, or SPSS) • Visualization (D3/IDL/MATLAB/Python/Plot.ly, etc.) • Metadata tagging (XML and variants)
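MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The sketch below imitates that pattern in a single Python process to compute a per-patient mean heart rate; it does not use Hadoop's API, and the records are invented.

```python
from collections import defaultdict
from functools import reduce

# Toy records: (patient_id, heart_rate) readings from several monitors (made-up values)
records = [("p1", 150), ("p2", 132), ("p1", 146), ("p3", 160), ("p2", 128)]


def map_phase(record):
    """Map: emit (key, value); carrying (sum, count) lets the reducer average later."""
    patient_id, hr = record
    return patient_id, (hr, 1)


def shuffle(pairs):
    """Shuffle: group emitted values by key (the framework does this in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: add up the (sum, count) pairs, then divide to get the mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count


grouped = shuffle(map(map_phase, records))
means = {key: reduce_phase(vals) for key, vals in grouped.items()}
print(means)  # {'p1': 148.0, 'p2': 130.0, 'p3': 160.0}
```

In a real cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself; emitting (sum, count) rather than a running mean is what lets partial results be combined safely.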
  • 17. How can we meet the challenge of Big Data collection/integration in a translational setting?
  • 18. What are the challenges for clinicians/researchers? The amount of biomedical data that is increasingly available provides both opportunity and challenge for the translational investigator. • Molecular biology has provided tools to allow understanding of genomics and proteomics. • There is growing data on the connectomics of signaling pathways • Patient demographic data and other EHR/EMR metrics are a resource that is only now being widely deployed and interrogated. • Patient physiology (bedside monitors) can be used to provide fundamental information about patient health and adaptation to pathophysiologies. • Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a necessary challenge for data handling.
  • 19. Courtesy Michael De Georgia & J. Michael Schmidt
  • 20. Big Data to Decisions!
    • Technology challenges for “Data to Decisions”:
      • Transforming data from multiple sources into meaningful information (evidence-context dependent)
      • Association of data from diverse, heterogeneous, asynchronous sources
      • Merging/fusion of information for alerts and decision support
      • Human-guided processing and analysis
    • Figure: multi-source analysis for pattern discovery (extract and synthesize information from diverse data). Multiple sources feed source-to-evidence information processing and extraction (text analytics, image analysis, signal processing), then data association, then data fusion for alerting and decision support (combine information, weigh evidence, real-time alerting), then a user interface for display and analysis (visualization, queries, data provenance, sensitivity).
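One way to approach associating data from heterogeneous, asynchronous sources is an as-of join: attach to each row of one stream the most recent row of another stream, within a tolerance. A minimal sketch using pandas `merge_asof`, assuming hypothetical minute-by-minute heart rates and irregularly timed lactate results; this is an illustration, not the method used in the work shown on the slide.

```python
import pandas as pd

# Hypothetical streams: heart rate every minute, lab results at irregular times
vitals = pd.DataFrame({
    "time": pd.to_datetime(["2015-04-18 08:00", "2015-04-18 08:01",
                            "2015-04-18 08:02", "2015-04-18 08:03"]),
    "heart_rate": [142, 145, 151, 149],
})
labs = pd.DataFrame({
    "time": pd.to_datetime(["2015-04-18 07:59:30", "2015-04-18 08:02:30"]),
    "lactate": [1.8, 2.6],
})

# For each vitals row, take the latest lab drawn at or before that time,
# but only if it is less than 2 minutes old; otherwise leave NaN.
aligned = pd.merge_asof(
    vitals.sort_values("time"),   # both inputs must be sorted on the key
    labs.sort_values("time"),
    on="time",
    direction="backward",
    tolerance=pd.Timedelta("2min"),
)
print(aligned)
```

`direction="backward"` uses only results already available at each time point, which avoids leaking future information into a real-time alert; the 2-minute tolerance is an arbitrary illustrative choice.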
  • 21. Real-time Decision Support: providing useful information to the clinician
    • Real-time decision support to clinicians at the point of care: codify best-practice protocols; enable efficient treatment decisions; reduce needless procedures; optimize coordination among caregivers; reduce the probability of mistakes
    • Key features that affect decision support: methods to retrieve, merge, and present data and information; algorithms to extract information from complex, heterogeneous data; visualization/graphical feedback to better understand patient conditions
    • Automated alerting for conditions of concern: combining information across data streams; accumulation of weak evidence from multiple sources; enhanced retrieval and visualization of information
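Accumulating weak evidence from multiple sources can be made concrete with Bayes' rule in log-odds form: each data stream contributes a likelihood ratio, and the log ratios add up. A minimal sketch, assuming the streams are conditionally independent (a strong simplification) and using made-up prior and likelihood-ratio values; the slide does not specify an algorithm.

```python
import math


def combine_weak_evidence(prior_prob, likelihood_ratios):
    """Posterior probability after combining independent likelihood ratios.

    Odds form of Bayes' rule: posterior odds = prior odds * LR1 * LR2 * ...
    Summing logs instead of multiplying keeps the arithmetic numerically stable.
    Assumes the evidence streams are conditionally independent.
    """
    log_odds = math.log(prior_prob / (1 - prior_prob))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


def should_alert(prior_prob, likelihood_ratios, threshold=0.5):
    """Alert only when the accumulated posterior reaches the threshold."""
    return combine_weak_evidence(prior_prob, likelihood_ratios) >= threshold


# Hypothetical 5% prior risk; likelihood ratios from three data streams
print(should_alert(0.05, [2.0]))            # False: one weak signal is not enough
print(should_alert(0.05, [2.0, 3.0, 4.0]))  # True: posterior = 24/43, about 0.56
```

No single signal here would justify an alert on its own, but together they push the posterior past the threshold; correlated signals would make this estimate overconfident.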
  • 22. Challenges inherent in Big Data Analytics • Capture • Curation • Storage • Search • Sharing • Transfer • Analysis • Visualization
  • 23. Data is multi-modal: a unified data set combines physiology waveforms (ECG, EEG, SaO₂, BP), radiology (X-ray, MRI, CAT, etc.), EMR/EHR, and “-omics” data
  • 24. Bedside Patient Data Acquisition
  • 25. Scaling to a hospital-wide data center (Ken Loparo, Michael De Georgia, Frank Jacono, Farhad Kaffashi)
  • 27. CWRU IMEDS™ Proof of Concept Demonstration
  • 28. Why is IMEDS™ Different? The approach: • “Bottom-up” development with clinicians and engineers working side-by-side • Open source architecture design • Totally integrated, “plug-and-play” system solution • Unbiased approach • Unified effort, rather than stove-piped, “one-off” solutions to small pieces of the problem • Non-profit nationwide consortium • Builds on existing infrastructures • Leverages the best available technology, regardless of source
  • 29. Courtesy Michael De Georgia & J. Michael Schmidt
  • 30. Challenges inherent in Big Data Analytics • Capture • Curation • Storage • Search • Sharing • Transfer • Analysis • Visualization
  • 34. IPython interface (http://ipython.org) • Reproducible • Version controlled (git) • Interactive analysis
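Notebooks and git capture the code; reproducing a result also requires knowing exactly which data and settings were used. A minimal sketch of a provenance record that could be saved next to a notebook and committed with it, assuming only the Python standard library; the `window_s` parameter is hypothetical.

```python
import hashlib
import json
import os
import platform
import tempfile


def file_sha256(path, block_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def provenance_record(data_path, parameters):
    """Collect what a rerun needs: input checksum, analysis settings, environment."""
    return {
        "data_file": os.path.basename(data_path),
        "data_sha256": file_sha256(data_path),
        "parameters": parameters,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


# Usage with a throwaway file; in practice, point this at the real input data
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"time,heart_rate\n0,120\n")
record = provenance_record(tmp.name, {"window_s": 30})  # hypothetical parameter
os.remove(tmp.name)
print(json.dumps(record, indent=2))
```

If the checksum stored in the record no longer matches the data file, the analysis is being rerun on different input than the one that produced the committed results.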
  • 35. Challenges inherent in Big Data Analytics • Capture • Curation • Storage • Search • Sharing • Transfer • Analysis • Visualization
  • 36. Worldwide movement for FAIR (Findable, Accessible, Interoperable, Reusable) data. Barend Mons and Susanna-Assunta Sansone, http://bd2k.nih.gov/workshops.html#ADDS
  • 37. Launched on May 27th, 2014: a new online-only publication for descriptions of scientifically valuable datasets in the life, environmental, and biomedical sciences, but not limited to these • Credit for sharing your data • Focused on reuse and reproducibility • Peer reviewed, curated • Promoting community data repositories • Open access. Courtesy of Susanna-Assunta Sansone, PhD
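FAIR data and data-descriptor publications both depend on machine-readable metadata. A minimal sketch that builds an XML metadata record with Python's standard `xml.etree.ElementTree`; the element names and values are illustrative assumptions, and a real deposit would follow the target repository's schema (for example, the DataCite metadata schema).

```python
import xml.etree.ElementTree as ET


def dataset_metadata(identifier, title, creator, license_name, keywords):
    """Build a small machine-readable metadata record for one dataset.

    Element names here are illustrative only, not a published schema.
    """
    root = ET.Element("dataset", attrib={"identifier": identifier})
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator
    ET.SubElement(root, "license").text = license_name
    keyword_list = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    return ET.tostring(root, encoding="unicode")


xml_text = dataset_metadata(
    identifier="example-0001",  # hypothetical; a real record would carry a DOI
    title="Neonatal respiratory waveforms (example)",
    creator="Example Lab",
    license_name="CC-BY-4.0",
    keywords=["physiology", "respiration"],
)
print(xml_text)
```

A persistent identifier and an explicit license are what make a record findable and reusable; the keywords help search engines and repository indexes.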
  • 38. Challenges inherent in Big Data Analytics • Capture • Curation • Storage • Search • Sharing • Transfer • Analysis • Visualization
  • 39. Data analysis methods applied to an integrated patient database after data processing: decision tree analysis, artificial neural networks, mechanistic approaches, graphical approaches, Bayesian networks, hierarchical clustering, probabilistic approaches, classical statistical inference, Bayesian statistical inference, and complex systems analysis (time domain, frequency domain, scale-invariant (fractal) analysis, approximate entropy)
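Approximate entropy, listed above under complex systems analysis, quantifies how predictable a time series is: low values indicate regular, repeating patterns, high values indicate irregularity. A minimal NumPy sketch of the standard definition (templates of length m, tolerance r, self-matches counted), assuming r defaults to 0.2 × the series' standard deviation, a common convention; this is not necessarily how the author's lab computes it.

```python
import numpy as np


def approximate_entropy(signal, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D series.

    m: template length (embedding dimension); r: match tolerance.
    Self-matches are counted, as in the original definition, so no log(0) occurs.
    Builds an all-pairs distance array: memory grows as N^2, so use short series.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * x.std()

    def phi(k):
        # Every overlapping template of length k, one per row
        templates = np.array([x[i:i + k] for i in range(n - k + 1)])
        # Chebyshev (max absolute) distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # For each template, the fraction of templates within tolerance r
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


rng = np.random.default_rng(0)
t = np.arange(300)
regular = np.sin(2 * np.pi * t / 25)  # periodic signal: expect low ApEn
irregular = rng.standard_normal(300)  # white noise: expect higher ApEn
print(approximate_entropy(regular), approximate_entropy(irregular))
```

ApEn depends on N, m, and r, so values are only comparable between recordings analyzed with the same settings.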
  • 41. Python as a data analytics environment
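As a small taste of Python for physiological signals, the sketch below uses NumPy's FFT to find the dominant frequency of a respiratory trace. The trace is simulated (a 0.75 Hz sine plus Gaussian noise) and the 50 Hz sampling rate is an assumption.

```python
import numpy as np

fs = 50.0                         # sampling rate in Hz (assumed)
t = np.arange(int(60 * fs)) / fs  # exactly 60 s of sample times
rng = np.random.default_rng(1)
# Simulated respiratory trace: 0.75 Hz breathing (45 breaths/min) plus noise
resp = np.sin(2 * np.pi * 0.75 * t) + 0.3 * rng.standard_normal(t.size)


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest spectral peak, ignoring the DC bin."""
    x = np.asarray(x, dtype=float) - np.mean(x)  # remove the mean (DC offset)
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)    # matching frequency axis
    return freqs[1:][np.argmax(power[1:])]


f = dominant_frequency(resp, fs)
print(f"{f:.2f} Hz = {60 * f:.0f} breaths/min")
```

With 60 s of data the frequency resolution is 1/60 Hz, so the 0.75 Hz component falls exactly on an FFT bin; real recordings usually also need windowing and a time-varying spectrum.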
  • 42. Advantages to using a Big Data approach • Speed of data reduction and analysis • Visualization of complex data sets can be done relatively quickly • Capacity for storage and processing of vast data sets is inherent in the tool stack • Scalability of cloud/cluster storage • Potential for “Big Impact” on research and clinical care
  • 43. Disadvantages to a Big Data approach • Often not hypothesis driven (a fishing expedition?) • Can require expensive computing technology, depending on data processing and storage needs • Requires significant programming skill to develop and use the tool stack • Typically requires “team based” data analysis and management (programmer, database manager, design/visualization person, etc.) • Just because you have lots of data doesn’t mean you have an obvious or easy way to extract the information!
  • 44. Summary • We live in a data-rich era. • The data available to us are multi-modal and require integration. • Data collection and integration can occur at many scales (bedside to institution), but the data must be converted into usable information. • Team-based science depends upon a wide range of data analytics skills. • Curation of, reproducibility of, and shared access to data remain ongoing challenges.
  • 45. Where do you find your data analytics team members?
  • 46. Syllabus Overview (10-week course)
    • Foundations 1: Using text editors, using the IPython notebook for data exploration, using version control software (git), using the class wiki.
    • Foundations 2: Using IPython/NumPy/SciPy, importing and manipulating data with Pandas, data visualization in IPython.
    • Analysis Methods: Basic signal theory overview, time-series data, plotting (lines, histograms, bars, etc.), dynamical systems analyses of data variability, information theory measures (entropy) of complexity, frequency domain/spectral measures (FFT, time-varying spectrum), wavelets.
    • Handling Sequence Data: Using R/Bioconductor; differences between mRNA-Seq, gene-array, proteomics, and deep-sequencing data; visualizing data from gene/RNA arrays.
    • Data Set Storage and Retrieval: Basics of relational databases, SQL vs. NoSQL, cloud storage/NAS/computing clusters, interfacing with Hadoop/MapReduce, metadata and ontology for biomedical/patient data (XML), using secure databases (REDCap).
    • Data Integrity and Security: The Health Insurance Portability and Accountability Act (HIPAA) and what it means for data management, de-identifying patient data (handling PHI), data security best practices, making data available to the public (implications for data transparency and large-scale data mining).
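The de-identification topic in the last module can be illustrated with a partial sketch in the spirit of HIPAA's Safe Harbor method, which removes 18 categories of identifiers and aggregates ages over 89 into a single "90 or older" group. It handles only a few hypothetical fields (`mrn`, `name`, `phone`, `birth_date`) and is not a compliance tool. The study ID is random rather than a hash of the MRN, because Safe Harbor does not allow re-identification codes derived from the individual's information.

```python
import secrets
from datetime import date

# Hypothetical field names for direct identifiers (a small subset of the 18 categories)
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def deidentify(record, id_map, reference_date):
    """Return a copy of one patient record with identifiers removed or coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    mrn = out.pop("mrn")
    # Random study ID, not derived from the MRN; the same MRN always maps to the same ID
    out["subject_id"] = id_map.setdefault(mrn, secrets.token_hex(8))
    dob = out.pop("birth_date")  # drop the full date of birth
    had_birthday = (reference_date.month, reference_date.day) >= (dob.month, dob.day)
    age = reference_date.year - dob.year - (0 if had_birthday else 1)
    out["age_years"] = "90+" if age > 89 else age  # aggregate ages over 89
    return out


id_map = {}  # linkage table: keep under access control, never in the shared data set
patient = {"mrn": "000123", "name": "Jane Doe", "phone": "555-0100",
           "birth_date": date(1920, 6, 1), "diagnosis": "sepsis"}
print(deidentify(patient, id_map, reference_date=date(2015, 4, 18)))
```

Keeping the MRN-to-ID table separate allows authorized re-linkage (for example, to add follow-up data) without exposing identifiers in the analysis data set.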
  • 48. The coding Queen and her Court: Abby Dobyns; Princesses of Python: Rhaya Johnson, Regie Felix, and Adaeze Anyanwu; and a Princeling: Jamie Tillett
  • 49. Acknowledgements
    • Loma Linda: Andy Hopper, Traci Marin, Charles Wang, Wilson Aruni, Valery Filippov
    • CWRU: Michael De Georgia, Kenneth Loparo, Frank Jacono, Farhad Kaffashi
    • UC Riverside: Thomas Girke (Bioinformatics)
    • La Sierra University: Marvin Payne
    • CSU San Bernardino: Art Concepcion (Bioinformatics)
    • UC Irvine: Alex Nicolau (Comp Sci/Bioinf)
    • My laboratory’s git repository: https://github.com/drcgw/bass
  • 51. Further reading • Doing Data Science by Cathy O’Neil and Rachel Schutt • Data Analysis with Open-Source Tools by Philipp Janert • The Art of R Programming by Norman Matloff • R for Everyone by Jared P. Lander • Python for Data Analysis by Wes McKinney • Think Python by Allen B. Downey • Think Stats by Allen B. Downey • Think Complexity by Allen B. Downey • Every one of Edward Tufte’s books (The Visual Display of Quantitative Information, Visual Explanations, Envisioning Information, Beautiful Evidence)
  • 52. Example: Patient physiology waveforms + EMR
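The slide's example is not reproduced in the transcript. As a hypothetical illustration of combining the two modalities, the sketch below reduces each patient's ECG R-peak times to a mean heart rate and joins that feature to an EMR extract on a shared de-identified subject ID. All IDs, values, and column names are invented, and pandas and NumPy are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical R-peak times (seconds) detected from each patient's ECG
r_peaks = {
    "s01": np.array([0.0, 0.4, 0.8, 1.2, 1.6]),
    "s02": np.array([0.0, 0.5, 1.0, 1.5]),
}
# Hypothetical EMR extract keyed by the same de-identified subject ID
emr = pd.DataFrame({"subject_id": ["s01", "s02"], "gestational_age_wk": [28, 36]})


def heart_rate_bpm(peaks):
    """Mean heart rate from R-R intervals: 60 / mean interval in seconds."""
    return 60.0 / np.mean(np.diff(peaks))


# One row of waveform-derived features per patient
waveform_features = pd.DataFrame({
    "subject_id": list(r_peaks),
    "mean_hr_bpm": [heart_rate_bpm(p) for p in r_peaks.values()],
})

# validate="one_to_one" raises an error if either table has duplicate IDs
combined = emr.merge(waveform_features, on="subject_id", how="inner", validate="one_to_one")
print(combined)
```

Reducing waveforms to per-patient features before the join keeps the EMR table small; time-resolved questions would instead need a time-aligned join.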