Manoj Vig
manojvig@gmail.com
https://www.linkedin.com/in/manojvig
Twitter - #manojvig
I am an employee of Shire Pharmaceuticals. The statements and
opinions expressed within this session are my own and do not
represent those of Shire.
There are some references to technical design patterns being
implemented within Shire, but the explanations of those
implementations provided in this session are purely technical.
This presentation outlines general technology direction and trend
analysis. Shire has no obligation to pursue any approaches
outlined in this document or use any functionality documented or
discussed in today's session.
Volume (Petabytes of Data)
Variety (Structured, Unstructured, Images, Sounds)
Velocity (Batch, sub-second response, stream, changes in data)
Characteristics of a big data system
• Handle large volume of data
• Designed for Scalability & Failover
• Support multiple workloads
• Security, multi-tenancy & privacy
• Cost-effective
1. Mobility
2. Social
3. Apache Hadoop: multiple workloads / distributed computing
Participant Recruitment
Adherence & Engagement
User Interaction
• Frequent Data Generation
Remote Data Exchange
Data Generation
Participant Engagement
Patient & Site Identification
Social Listening
Distributed, Scale, Unstructured, Velocity, Security, Access
Big Data Processing Systems
Twitter
Twitter API (multi-threaded data acquisition)
Curation
Filter Algorithms Rank
Location Profile
Distributed, Scalable, Fast & Economical
Key Decision Makers
Targeted Ads
Visualizations
Web/Mobile
Delivery Channels
Automated Process
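The acquisition, curation and ranking stages on this slide can be illustrated with a minimal Python sketch. It is not the implementation described in the talk: `fetch_tweets` is a hypothetical stand-in that returns canned data instead of calling the Twitter API, and the filter rule (keep tweets that carry a location) and the ranking key (follower count) are assumptions chosen only to make the flow concrete.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    location: str
    followers: int


def fetch_tweets(keyword):
    """Hypothetical stand-in for a Twitter API call; returns canned data."""
    sample = {
        "clinical trial": [
            Tweet("Looking for a clinical trial near me", "Boston", 120),
            Tweet("clinical trial results published", "London", 5400),
        ],
        "rare disease": [
            Tweet("Living with a rare disease", "Boston", 800),
            Tweet("Buy followers now", "", 10),
        ],
    }
    return sample.get(keyword, [])


def acquire(keywords, workers=4):
    """Multi-threaded acquisition: one fetch per keyword, run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(fetch_tweets, keywords))
    return [tweet for batch in batches for tweet in batch]


def curate(tweets):
    """Filter step: drop tweets without a location (assumed curation rule)."""
    return [t for t in tweets if t.location]


def rank(tweets):
    """Rank step: order by follower count as a crude proxy for reach."""
    return sorted(tweets, key=lambda t: t.followers, reverse=True)


def pipeline(keywords):
    """Acquire, curate, then rank, mirroring the stages on the slide."""
    return rank(curate(acquire(keywords)))
```

Calling `pipeline(["clinical trial", "rare disease"])` returns the three located tweets ordered by follower count. In a real deployment each stage would run on the distributed platform rather than in a single process.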
Security, governance, privacy and audit
BI Reports & Dashboards
Data Analysts
Data Scientists
Apps (Web + Mobile)
Devices
Data Feeds
Data Service: multiple data sources, multiple processing workloads and multiple delivery channels
Impala / Tez (Interactive)
HDFS (Hadoop Distributed File System)
MR (Batch)
Spark (Stream, ETL, DS)
Hive (DW)
Robust Cloud Infrastructure (e.g. AWS EC2)
Governance, Security & Audit
YARN (Cluster Resource Manager)
HBase (NoSQL)
Solr (Search)
Spark (MLlib, Graph)
Custom/proprietary/Visualization Apps
CTMS
Common Data Ingestion
ClinicalTrials.gov
Metadata
Data Quality
Searchable Data Catalog
Streaming
CRO Data Feed
Genomic Data
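The pairing of workloads and engines on this slide can be written down as a small lookup table. The sketch below is illustrative only: the workload keys are my own labels, "DS" is assumed to mean data science, and `engines_for` is a hypothetical helper rather than part of any Hadoop component.

```python
# Workload-to-engine pairs as labelled on the slide.
# Keys are illustrative names; "data science" assumes that is what "DS" abbreviates.
ENGINES = {
    "interactive": ["Impala", "Tez"],
    "batch": ["MapReduce"],
    "stream": ["Spark"],
    "etl": ["Spark"],
    "data science": ["Spark"],
    "data warehouse": ["Hive"],
    "nosql": ["HBase"],
    "search": ["Solr"],
    "machine learning": ["Spark MLlib"],
    "graph": ["Spark"],
}


def engines_for(workload):
    """Return the engines the slide associates with a workload type."""
    key = workload.strip().lower()
    if key not in ENGINES:
        raise KeyError(f"no engine listed for workload: {workload!r}")
    return ENGINES[key]
```

For example, `engines_for("Interactive")` yields `["Impala", "Tez"]`. All of these engines share HDFS for storage and YARN for cluster resources, which is what lets one platform serve several workloads.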
CTMS
Streaming
ClinicalTrials.gov
UK Clinical Trials Gateway
Other R&D Datasets
SAS Datasets
Genomic Datasets
Apache Solr Running on Hadoop Cluster
HDFS (Data Landing)
Apache Solr
Data Indexing
Information Extraction (Spark)
Pattern Recognition (Spark)
Machine Learning (Spark)
Metadata Driven Ontology (HBase)
Data Indexing
Solr
APIs
Web UI
Mobile Apps
Desktop Widgets
Dashboards
Data Sources
Consumption
HBase
APIs
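The land, extract, index and query flow on this slide can be sketched with a toy inverted index. This is a plain-Python stand-in for the indexing idea, not Apache Solr and not its API: `TinyIndex`, `extract_terms` and the sample documents are all hypothetical names introduced for illustration.

```python
import re
from collections import defaultdict


def extract_terms(text):
    """Crude information extraction: lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TinyIndex:
    """Toy inverted index mapping each term to the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._docs = {}

    def add(self, doc_id, source, text):
        """Index one landed document and remember which source it came from."""
        self._docs[doc_id] = {"source": source, "text": text}
        for term in extract_terms(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = extract_terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)
```

A short usage example with made-up records from three of the listed sources:

```python
index = TinyIndex()
index.add("d1", "CTMS", "Phase 2 asthma study enrolling")
index.add("d2", "ClinicalTrials.gov", "Asthma trial phase 3")
index.add("d3", "SAS Datasets", "Oncology study")
matches = index.search("asthma study")  # only d1 contains both terms
```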
• Technology is here to stay
• Data generation speed will accelerate
• Data access will get easier
• Device connectivity will increase
• Technological disruption is inevitable
• Are Recommender Systems Now Mainstream?
◦ https://icrunchdatanews.com/recommender-systems-now-mainstream/
• The Impact of Real-time Computing Systems – Part 1
◦ https://icrunchdatanews.com/impact-real-time-computing-systems-part-1/
• The Impact of Real-time Computing Systems – Part 2
◦ https://icrunchdatanews.com/impact-real-time-computing-systems-part-2/
• ASCOT: a text mining-based web-service
◦ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3339391/

Clinical Trials & Big Data-Final
