2. I am an employee of Shire Pharmaceuticals. The statements and
opinions expressed within this session are my own and do not
represent those of Shire.
This session refers to some technical design patterns being
implemented within Shire, but the explanations of those
implementations given in this session are purely technical.
This presentation outlines general technology direction and trend
analysis. Shire has no obligation to pursue any approaches
outlined in this document or use any functionality documented or
discussed in today’s session.
3. Characteristics of a big data system
Volume (Petabytes of data)
Variety (Structured, unstructured, images, sounds)
Velocity (Batch, sub-second response, stream, changes in data)
Handle large volumes of data
Designed for scalability & failover
Support multiple workloads
Security, multi-tenancy & privacy
Cost-effective
4. 1. Mobility 2. Social 3. Apache Hadoop (multiple workloads / distributed computing)
8. Security, governance, privacy and audit
Data Service: multiple data sources, multiple processing workloads and multiple delivery channels
Delivery channels: BI Reports & Dashboards, Data Analysts, Data Scientists, Apps (Web + Mobile), Devices, Data Feeds
Custom/proprietary/Visualization Apps
Processing workloads: Impala / Tez (Interactive), MR (Batch), Spark (Stream, ETL, DS), Hive (DW), HBase (NoSQL), Solr (Search), Spark (MLlib, Graph)
YARN (Cluster Resource Manager)
HDFS (Hadoop Distributed File System)
Robust Cloud Infrastructure (e.g. AWS EC2)
Governance, Security & Audit
Common Data Ingestion: Metadata, Data Quality, Searchable Data Catalog
Data sources: CTMS, ClinicalTrials.gov, Streaming, CRO Data Feed, Genomic Data
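The Common Data Ingestion layer on this slide (metadata, data quality, searchable data catalog) can be illustrated with a small sketch. This is a plain-Python stand-in under stated assumptions, not the implementation described in the session: the class names, field names, the single "required fields" quality rule, and the sample records are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Catalog record for one ingested feed (all field names are hypothetical)."""
    name: str
    source: str
    required_fields: tuple
    accepted: int = 0
    rejected: int = 0


@dataclass
class Catalog:
    """In-memory stand-in for a searchable data catalog."""
    entries: dict = field(default_factory=dict)

    def register(self, entry):
        self.entries[entry.name] = entry

    def search(self, term):
        """Case-insensitive match on dataset name or source."""
        term = term.lower()
        return sorted(
            e.name for e in self.entries.values()
            if term in e.name.lower() or term in e.source.lower()
        )


def ingest(catalog, name, source, required_fields, records):
    """Apply one data-quality rule (required fields present and non-empty),
    record the counts in the catalog, and return the accepted records."""
    entry = DatasetEntry(name, source, tuple(required_fields))
    good = []
    for rec in records:
        if all(rec.get(f) not in (None, "") for f in required_fields):
            good.append(rec)
            entry.accepted += 1
        else:
            entry.rejected += 1
    catalog.register(entry)
    return good


# Made-up sample feeds; the identifiers are placeholders, not real data.
catalog = Catalog()
trials = ingest(
    catalog, "trial_registry", "ClinicalTrials.gov", ["trial_id", "title"],
    [
        {"trial_id": "T-001", "title": "Example study A"},
        {"trial_id": "", "title": "Missing identifier"},
    ],
)
sites = ingest(
    catalog, "site_status", "CTMS", ["site_id"],
    [{"site_id": "S-10"}, {"site_id": "S-11"}],
)
```

Every feed passes through the same `ingest` function, so the quality rule and the catalog entry come from one place regardless of source. In a Hadoop deployment the accepted records would land in HDFS rather than in a Python list.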
9. Apache Solr Running on Hadoop Cluster
Data Sources: CTMS, Streaming, ClinicalTrials.gov, UK Clinical Trials Gateway, Other R&D Datasets, SAS Datasets, Genomic Datasets
HDFS (Data Landing)
Apache Solr
Data Indexing
Information Extraction (Spark)
Pattern Recognition (Spark)
Machine Learning (Spark)
Metadata Driven Ontology (HBase)
Data Indexing
Solr APIs
HBase APIs
Consumption: Web UI, Mobile Apps, Desktop Widgets, Dashboards
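The flow on this slide (data landing, information extraction, data indexing, search APIs) can be sketched in miniature. The code below is a pure-Python stand-in and makes several assumptions: it does not call Solr, Spark or HDFS, the function names are hypothetical, "information extraction" is reduced to tokenizing with a tiny stopword list, and the landed documents are invented sample text.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "for"}


def extract_terms(text):
    """Information-extraction stand-in: lowercase tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(landed_docs):
    """Data-indexing stand-in: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in landed_docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search-API stand-in: ids of documents that contain every query term."""
    terms = extract_terms(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Invented documents standing in for files landed in HDFS.
landed = {
    "doc1": "Phase 2 trial of an enzyme replacement therapy",
    "doc2": "Genomic dataset for a rare disease cohort",
    "doc3": "Phase 3 trial in a rare disease population",
}
index = build_index(landed)
```

A call such as `search(index, "rare disease")` returns the ids of the documents that contain both terms. A real deployment would delegate indexing and querying to Solr and run the extraction step as a Spark job; the sketch only shows how the stages hand data to one another.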
10. Technology is here to stay
Data Generation speed will accelerate
Data Access will get easier
Device connectivity will increase
Technological disruption is inevitable
11.
12. Are Recommender Systems Now Mainstream?
◦ https://icrunchdatanews.com/recommender-systems-now-mainstream/
The Impact of Real-time Computing Systems – Part 1
◦ https://icrunchdatanews.com/impact-real-time-computing-systems-part-1/
The Impact of Real-time Computing Systems – Part 2
◦ https://icrunchdatanews.com/impact-real-time-computing-systems-part-2/
ASCOT: a text mining-based web-service
◦ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3339391/