© 2016 WIPRO LTD | WWW.WIPRO.COM | CONFIDENTIAL
Big Data Ready Enterprise
Sri Harsha Boda – Wipro Technologies
Agenda
1. About BDRE
2. Typical use cases around Big Data Platform
3. Common challenges when implementing at scale
4. How BDRE addresses the needs across the lifecycle
5. FastTrack Implementation using BDRE
6. Demo
7. Typical enterprise deployment view with BDRE
8. Metadata Management in depth
About BDRE
BDRE is an open source project released under the Apache License 2.0.
Code is available on GitHub.
Wipro's largest open source contribution to date.
Community Choice winner in the Modern Data Applications track, Hadoop Summit San Jose 2016.
Typical Big Data analytics use cases organizations are embarking on
Information Delivery
Enterprise Data Hub / Lake
Information, Integration & Governance
Batch Data Processing
Event Stream & Micro-batch Processing
Enterprise Data Provisioning Platform
Low Latency Store
Complex multistep pipeline transformation
Migration of EDW workloads
Data as a Service
Enterprise Analytical Platform
Common challenges when implementing these use cases at scale
Limited skilled resources and shorter implementation cycles
Rapid ingestion of data
Rework across several complex multi-step processes
Robust application deployment support
Support for flexible operations & SLA management
Robust operational metadata across technologies
How BDRE addresses the needs across the lifecycle
Pluggable architecture | Community driven | Distribution compatible
Diagram: with BDRE versus without BDRE.
Without BDRE: starting from basic Hadoop at the base, the operational functions you would like to build require development effort from scratch.
With BDRE: "pre-built operational functions" are brought in by BDRE, so Hadoop applications need only minimal development effort through customization of BDRE components.
Supporting operational functions (value-add through BDRE):
OPERATIONAL METADATA
RAPID INGESTION
VISUAL DATA PIPELINE
AUTOMATED WORKFLOW
ONE TOUCH DEPLOYMENT
SLA MANAGEMENT
RICH VISUALIZATION
Implementation jumpstart: Big Data Ready Enterprise
FastTrack Implementation using BDRE
Key features that can be rapidly implemented using the product
Data Ingestion via Multiple Sources
Abstraction layer: Component to ingest a variety of data (CPY, XML, DB, Mainframes)
Streaming Data Ingest: 16 sources, including Twitter, Flume, logs and message queues
File Monitoring: Component to check the validity of incoming data at file and record level
Cluster to Cluster Hive Table Migration
Job Automation & Security Integration
UI based Workflow Designer
Supports Hive, Pig, Map Reduce, Spark, R, Python
Automated Workflow Generator – Oozie/Airflow
Authentication: Integration with Kerberos & JAAS
Data Quality and Data Profiling
Enforce Data Quality and Data Processing rules (during or after ingestion)
DQ Analysis, Integrity & Failure Handling
Data Loading - Test Data Generation
One Touch Deployment
Automated central deployment and application management.
Registry of all workflow processes / templates
Automated Process flow Planner
Operational Metadata & Lineage
Job registry
Configuration management
Dependency management - Pipelining
Batch management/tracking
Real Time Execution status
Ingestion registry
Job monitoring and proactive/reactive alerting
Restartability
Analytics & Visualization
Support for Executing Models – R, Python, Spark
Zero-coding, UI-based configuration for common use cases
UI-based metadata interaction & search
Data Exploration integration with notebooks
Visual Representation of workflow
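The File Monitoring feature listed above checks incoming data at file level and at record level. The Python sketch below illustrates that two-stage idea under stated assumptions: the input is a delimited text file, the file-level check is "non-empty with the expected header", and the record-level check is "one non-empty value per column". `monitor_file` and `FileCheckResult` are hypothetical names, not part of BDRE.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: names and checks are illustrative, not BDRE's API.


@dataclass
class FileCheckResult:
    file_ok: bool
    file_errors: list = field(default_factory=list)
    good_records: list = field(default_factory=list)
    bad_records: list = field(default_factory=list)


def monitor_file(lines, expected_header, delimiter=","):
    """Validate an incoming delimited file at file level, then at record level."""
    result = FileCheckResult(file_ok=True)
    # File-level checks: the file must not be empty and must carry the expected header.
    if not lines:
        result.file_ok = False
        result.file_errors.append("empty file")
        return result
    header = lines[0].rstrip("\n").split(delimiter)
    if header != expected_header:
        result.file_ok = False
        result.file_errors.append(f"unexpected header: {header}")
        return result
    # Record-level checks: one non-empty value per header column.
    for line in lines[1:]:
        values = line.rstrip("\n").split(delimiter)
        if len(values) == len(expected_header) and all(v.strip() for v in values):
            result.good_records.append(values)
        else:
            result.bad_records.append(line.rstrip("\n"))
    return result


incoming = ["id,name,amount", "1,alice,10", "2,,20", "3,carol"]
res = monitor_file(incoming, ["id", "name", "amount"])
print(res.file_ok, len(res.good_records), res.bad_records)
```

Rejecting the whole file on a file-level failure before looking at records keeps a malformed delivery from being partially loaded; record-level failures only divert the offending rows.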
Typical enterprise deployment view with BDRE
Diagram components:
Browser
App Server: BDRE UI App and BDRE REST API, with JAAS authentication
Edge Node
Eventing Framework, Espresso Email, SLA notification, Proactive Reporting
Oozie Workflow Generator; workflows: Data Quality, Non-Hadoop, Ingestion, Semantic and Bulk Data Generation
Job Deploy Scripts
Operational Metadata (RDBMS), Metastore, Rule Engine (for DQ)
Hadoop Cluster: NN (NameNode), RM (ResourceManager) and jobs
App Store (Git repo): job export/import
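The deployment view includes an SLA notification component. As an illustration only, the sketch below shows one way SLA breaches could be detected from job timing records: a job breaches its SLA if it finishes, or is still running, after its start time plus its allowed duration. The record fields (`start`, `sla`, `end`) and the `find_sla_breaches` and `notify` functions are hypothetical; BDRE's actual SLA model may differ.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: job record fields and function names are assumptions.


def find_sla_breaches(jobs, now):
    """Return (job_id, reason) pairs for jobs that have missed their SLA deadline."""
    breaches = []
    for job in jobs:
        deadline = job["start"] + job["sla"]
        end = job.get("end")  # None while the job is still running
        if end is None and now > deadline:
            breaches.append((job["id"], "still running past SLA"))
        elif end is not None and end > deadline:
            breaches.append((job["id"], "finished late"))
    return breaches


def notify(breaches, send):
    # `send` stands in for a notification channel such as email.
    for job_id, reason in breaches:
        send(f"SLA breach for process {job_id}: {reason}")


t0 = datetime(2016, 6, 28, 1, 0)
jobs = [
    {"id": 100, "start": t0, "sla": timedelta(hours=1), "end": t0 + timedelta(minutes=40)},
    {"id": 200, "start": t0, "sla": timedelta(hours=1), "end": t0 + timedelta(minutes=90)},
    {"id": 300, "start": t0, "sla": timedelta(hours=1)},
]
now = t0 + timedelta(hours=2)
messages = []
notify(find_sla_breaches(jobs, now), messages.append)
print(messages)
```

Checking running jobs against the deadline, not only completed ones, is what makes the alert proactive: process 300 is flagged before it ever finishes.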
BDRE Metadata Management system
Intra and Inter Process Dependency
Pid    Enq id    Parent id
300    Null      Null
301    100       300
302    Null      300
303    Null      300
304    200       300
Diagram: processes 100 (sub-processes 101, 102, 103), 200 (201 to 205), 300 (301 to 304) and 400 (401, 402).
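One way to read the dependency table: Parent id groups sub-processes under a parent process (intra-process dependency), while Enq id names the upstream process whose batches a sub-process consumes (inter-process dependency). That reading of Enq id is an assumption based on the Batch Management slide. The Python sketch below derives both kinds of dependency from the table rows; `resolve_dependencies` is a hypothetical helper, not a BDRE function.

```python
from collections import defaultdict

# Rows from the table: (pid, enq_id, parent_id); None stands for Null.
PROCESS_ROWS = [
    (300, None, None),
    (301, 100, 300),
    (302, None, 300),
    (303, None, 300),
    (304, 200, 300),
]


def resolve_dependencies(rows):
    """Split the table into intra-process (parent -> sub-processes) and
    inter-process (sub-process -> upstream process) dependencies."""
    children = defaultdict(list)  # intra-process dependency
    upstream = {}                 # inter-process dependency (assumed meaning of Enq id)
    for pid, enq_id, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(pid)
        if enq_id is not None:
            upstream[pid] = enq_id
    return dict(children), upstream


children, upstream = resolve_dependencies(PROCESS_ROWS)
# Upstream processes whose batches parent process 300 waits on.
waits_on = sorted({upstream[c] for c in children.get(300, []) if c in upstream})
print(children, upstream, waits_on)
```

Under this reading, process 300 runs sub-processes 301 to 304, and its steps 301 and 304 depend on batches from processes 100 and 200 respectively.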
Job Status Management
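Job-level APIs: InitJob, HaltJob (success), TermJob (failure)
Step-level APIs: InitStep, HaltStep (success), TermStep (failure)
Diagram: the APIs record status in the BDRE operational metadata; messages go to a success queue or a fail queue on a message queue (MQ), where a consumer picks them up and can raise tickets in JIRA.
The HaltJob and TermJob APIs can send messages to the MQ for proactive alerting.
Alternatively, BDRE could connect directly to any alerting or ticket management system, bypassing the MQ.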
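The job-level flow above can be sketched as a small state machine. In this minimal in-memory sketch, a dictionary stands in for the operational metadata, `queue.Queue` objects stand in for the MQ success and fail queues, and a callback stands in for the JIRA consumer. The class, method names and message format are hypothetical; only the InitJob, HaltJob and TermJob names come from the slide. Step-level APIs would follow the same pattern.

```python
from enum import Enum
from queue import Queue


class Status(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


class JobStatusTracker:
    """Hypothetical stand-in for the InitJob/HaltJob/TermJob APIs."""

    def __init__(self):
        self.metadata = {}            # pid -> Status; stands in for operational metadata
        self.success_queue = Queue()  # stands in for the MQ success queue
        self.fail_queue = Queue()     # stands in for the MQ fail queue

    def init_job(self, pid):
        self.metadata[pid] = Status.RUNNING

    def _finish(self, pid, status, target_queue, reason=None):
        # Only a running job can be halted or terminated.
        if self.metadata.get(pid) is not Status.RUNNING:
            raise ValueError(f"process {pid} is not running")
        self.metadata[pid] = status
        # Publish a message for proactive alerting.
        target_queue.put({"pid": pid, "status": status.value, "reason": reason})

    def halt_job(self, pid):
        self._finish(pid, Status.SUCCESS, self.success_queue)

    def term_job(self, pid, reason):
        self._finish(pid, Status.FAILURE, self.fail_queue, reason)


def consume_failures(tracker, open_ticket):
    """Drain the fail queue; `open_ticket` stands in for a ticketing client."""
    while not tracker.fail_queue.empty():
        msg = tracker.fail_queue.get()
        open_ticket(f"Process {msg['pid']} failed: {msg['reason']}")


tracker = JobStatusTracker()
tracker.init_job(100)
tracker.halt_job(100)
tracker.init_job(200)
tracker.term_job(200, "input file missing")
tickets = []
consume_failures(tracker, tickets.append)
print(tickets)
```

Decoupling status updates from alerting through queues means the job APIs never block on the ticketing system; the direct-connection alternative mentioned above would replace the queue with a synchronous call.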
Batch Management
Diagram: processes 100 (sub-processes 101 to 103), 200 (201 to 205), 300 (301 to 304) and 400 (401, 402), with workflow ids 100, 200, 300 and 400, connected through batch queues into a logical pipeline between the processes.
Upon each successful execution of an upstream process, a row is added to the queue table for every downstream process.
Each downstream process looks up the queue and processes all pending batches enqueued by its upstream processes.
Multiple source batches consumed = one target batch produced.
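The queue-table mechanism can be sketched in memory as follows. The sketch assumes, following the dependency table, that processes 100 and 200 both feed process 300; the `BatchQueue` class and its methods are hypothetical and do not reflect BDRE's actual queue table schema.

```python
from collections import defaultdict
from itertools import count

# Hypothetical in-memory sketch of the queue table; not BDRE's schema.


class BatchQueue:
    def __init__(self, downstream_of):
        self.downstream_of = downstream_of  # upstream pid -> list of downstream pids
        self.pending = defaultdict(list)    # downstream pid -> [(upstream pid, batch id)]
        self._batch_ids = count(1)

    def new_batch_id(self):
        return next(self._batch_ids)

    def on_success(self, upstream_pid, batch_id):
        """Upstream succeeded: add a queue row for every downstream process."""
        for downstream_pid in self.downstream_of.get(upstream_pid, []):
            self.pending[downstream_pid].append((upstream_pid, batch_id))

    def run_downstream(self, pid):
        """Consume all pending batches for `pid` and produce one target batch."""
        consumed = self.pending.pop(pid, [])
        if not consumed:
            return None, []
        return self.new_batch_id(), consumed


bq = BatchQueue({100: [300], 200: [300]})
bq.on_success(100, bq.new_batch_id())  # batch 1 from process 100
bq.on_success(200, bq.new_batch_id())  # batch 2 from process 200
bq.on_success(100, bq.new_batch_id())  # batch 3 from process 100
target, consumed = bq.run_downstream(300)
print(target, consumed)
```

Process 300 consumes all three pending source batches in a single run and produces one target batch, which the caller could in turn pass to `on_success(300, target)` to feed further downstream processes.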
Data Quality Component
Diagram: a map-only MapReduce job (Mapper 1 to Mapper n) reads the original file with all records from Hadoop, applies the rules (rule definition in the Rule Engine UI, served through the Guvnor API) and writes good records and bad records separately.
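The data quality flow can be approximated as below. In BDRE the rules are defined in the rule engine UI and served through the Guvnor API; here, plain Python predicates stand in for those rules, and contiguous list slices stand in for the input splits that Hadoop hands to each mapper. `RULES`, `dq_map` and `split_file` are hypothetical names.

```python
import re

# Hypothetical sketch: plain predicates stand in for rules fetched via Guvnor.
RULES = {
    "id is numeric": lambda r: r["id"].isdigit(),
    "email looks valid": lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) is not None,
}


def dq_map(lines, rules, columns=("id", "email")):
    """Map step: classify each record as good or bad; there is no reduce step."""
    for line in lines:
        values = line.split(",")
        if len(values) != len(columns):
            yield "bad", line, ["wrong field count"]
            continue
        record = dict(zip(columns, values))
        failed = [name for name, rule in rules.items() if not rule(record)]
        yield ("bad" if failed else "good"), line, failed


def split_file(lines, rules, num_mappers=2):
    """Cut the input into contiguous splits (one per mapper) and collect outputs."""
    chunk = max(1, -(-len(lines) // num_mappers))  # ceiling division
    splits = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    good, bad = [], []
    for split in splits:  # each split would run in a separate mapper
        for label, line, failed in dq_map(split, rules):
            if label == "good":
                good.append(line)
            else:
                bad.append((line, failed))
    return good, bad


lines = ["1,a@x.com", "2,bad-email", "x3,c@y.org", "4,d@z.net"]
good, bad = split_file(lines, RULES)
print(good)
print(bad)
```

Because each record is classified independently of every other record, no shuffle or reduce phase is needed, which is why a map-only job is sufficient for this kind of rule checking.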
Important Links
BDRE GitHub repo: https://github.com/WiproOpenSourcePractice/openbdre
Contains source code, setup instructions and demo videos.
To contribute, please sign up at BDRE Jira: https://openbdre.atlassian.net/
Please join the community: https://groups.google.com/forum/#!forum/bdre
If you have any questions or suggestions, please email bdre-queries@googlegroups.com.
Sri Harsha Boda
Thank You
sri.boda@wipro.com