
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World San Jose 2016

2,268 views


When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma, thought leader and coauthor of Architecting Data Lakes, offers lessons learned from the field to get you started.

Published in: Data & Analytics


  1. Building a modern data architecture
     March 31, 2016
     Ben Sharma | CEO and Founder
     ben@zaloni.com
  2. Delivering on the business of big data
     •  Award-winning provider of enterprise data lake management solutions:
        integrated data lake management platform, self-service data preparation
     •  Data Lake Design and Implementation Services
     •  Data Science Professional Services
     Funded by top-tier technology investors.
     Zaloni Proprietary
  3. Data lakes will be central to the modern data architecture
     Agility, Insight, Scalability
  4. The promise of a data lake: All data is welcome…
     •  Store all types of data: structured and unstructured
     •  Store raw data in its original form for an extended period of time
     •  Use various tools to correlate, enrich and query the data for insights
     •  Provide democratized access via a single unified view across the enterprise
  5. Data architecture modernization
     [Diagram comparing a traditional and a new architecture]
     Traditional: Sources, ETL, EDW
     New: Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and a Discovery Sandbox; EDW
     Consumption labels: Data Discovery, Analytics, BI in one column; Data Discovery, Analytics, BI, Data Science in the other
  6. Data lake challenges and complications
     Challenges, shown on the slide under the headings Building, Managing and Delivering:
     •  Ingestion
     •  Lack of Visibility
     •  Privacy and Compliance
     •  Quality Issues
     •  Reliance on IT
     •  Reusability
     •  Rate of Change
     •  Skills Gap
     •  Complexity
     Enable the data lake: Ingest, Organize, Catalog
     Govern the data in the lake: Cleanse, Secure, Operationalize
     Engage the business: Discover, Enrich, Provision
  7. Data lake reference architecture
     [Diagram; labels as they appear on the slide]
     Source System: File Data, DB Data, ETL Extracts, Streaming
     Zones: Transient Loading Zone, Raw Data, Refined Data, Trusted Data, Discovery Sandbox, Consumption Zone
     Zone annotations: Original unaltered data attributes, Tokenized Data, APIs, Reference Data, Master Data, Data Wrangling, Data Discovery, Exploratory Analytics, Integrate to common format, Data Validation, Data Cleansing, Aggregations
     Data lake services: Metadata, Data Quality, Data Catalog, Security
     Other systems: OLTP or ODS, Enterprise Data Warehouse, Logs (or other unstructured data), Cloud Services
     Users: Business Analysts, Researchers, Data Scientists
  8. Data lake management platform
     Integrated Data Lake Management:
     •  Unified Data Management
     •  Managed Ingestion
     •  Data Reliability
     •  Data Visibility
     •  Data Security and Privacy
  9. First things first… managed ingestion
     •  Ability to ingest vast amounts of data
     •  Ability to handle a wide variety of formats (streaming, files, custom)
     •  Ability to handle a wide variety of sources
     •  Capture operational metadata implicitly as new data arrives
     •  Build in repeatability through automation to pick up incoming data and apply pre-defined processing
     [Diagram labels: Various Sources, Streaming, Unstructured Data]
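
The "capture operational metadata implicitly as new data arrives" point on slide 9 can be illustrated with a small sketch. This is not Zaloni's implementation: the function name `ingest_file`, the raw-zone directory layout, and the choice of metadata fields are assumptions made for illustration, and the record count assumes newline-delimited records.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_file(source: Path, raw_zone: Path) -> dict:
    """Copy one file into the raw zone and return its operational metadata.

    Hypothetical helper: the metadata is produced as a side effect of
    ingestion, so nobody has to register it by hand afterwards.
    """
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source.name
    shutil.copy2(source, target)  # raw data is stored unaltered

    data = target.read_bytes()
    return {
        "source": str(source),
        "target": str(target),
        "size_bytes": len(data),
        # Assumption: one record per line; other formats need their own counter.
        "record_count": len(data.splitlines()),
        "sha256": hashlib.sha256(data).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

Running the same function from a scheduler or a directory watcher is one way to get the repeatability the slide asks for, since every arrival goes through identical processing.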
  10. Capture metadata to improve data visibility and reliability
     •  Reduced time to insight for analytics
     •  File and record level watermarking provides data lineage

     Type of Metadata | Description                                                   | Example
     -----------------|---------------------------------------------------------------|--------
     Technical        | Captures the form and structure of each data set              | Type of data (text, JSON, Avro), structure of the data (fields and their types)
     Operational      | Captures lineage, quality, profile and provenance of the data | Source and target locations of data, size, number of records, lineage
     Business         | Captures what it all means to the user                        | Business names, descriptions, tags, quality and masking rules
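
The three metadata types on slide 10 could be modeled roughly as follows. The class and field names are hypothetical, chosen to mirror the examples in the table; a real catalog would carry more attributes and persist them somewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TechnicalMetadata:
    data_format: str            # e.g. "text", "JSON", "Avro"
    fields: Dict[str, str]      # field name -> field type


@dataclass
class OperationalMetadata:
    source: str                 # where the data came from
    target: str                 # where it landed in the lake
    size_bytes: int
    record_count: int


@dataclass
class BusinessMetadata:
    business_name: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    masked_fields: List[str] = field(default_factory=list)


@dataclass
class CatalogEntry:
    """One data set, described from all three angles."""
    technical: TechnicalMetadata
    operational: OperationalMetadata
    business: BusinessMetadata


def search_by_tag(catalog: List[CatalogEntry], tag: str) -> List[CatalogEntry]:
    """Return the entries whose business tags include `tag`."""
    return [entry for entry in catalog if tag in entry.business.tags]
```

Keeping the three kinds separate reflects who typically supplies them: technical and operational metadata can be derived by tooling, while business metadata usually comes from people who know what the data means.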
  11. Data ready: Data preparation required for actionable data
     •  Interactive data preparation to address errors, corrupted formats, duplicates
     •  Data enrichment to go from raw to refined
     •  Self service to prepare data without IT request/SQL knowledge
     [Diagram derived from Gartner report on Self Service Data Preparation; labels: Raw Data, Data Preparation, Transform, Explore, Refined Data, Orchestrate and automate workflows, Automation, Reusable Transformations, BI Reports, Enterprise Data Integrations, Data Science, Data Discovery, Analytics]
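
As a rough illustration of the "errors, corrupted formats, duplicates" bullet on slide 11, the sketch below takes raw records to refined ones. The record layout (`id`, `date`, `amount`), the ISO date format, and the function name `prepare` are all assumptions for the example, not something the slides specify.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def prepare(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split raw rows into (clean, rejected).

    Clean rows have typed, normalized values and unique ids.
    Rows that cannot be parsed are kept aside rather than silently dropped.
    """
    seen_ids = set()
    clean, rejected = [], []
    for row in rows:
        try:
            record = {
                "id": int(row["id"]),
                # Assumption: dates arrive as YYYY-MM-DD, possibly padded.
                "date": datetime.strptime(row["date"].strip(), "%Y-%m-%d")
                .date()
                .isoformat(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            rejected.append(row)  # corrupted or incomplete record
            continue
        if record["id"] in seen_ids:
            continue  # duplicate of a row already accepted
        seen_ids.add(record["id"])
        clean.append(record)
    return clean, rejected
```

Returning the rejected rows alongside the clean ones keeps the step auditable, which matters once a transformation like this is reused and automated.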
  12. Protect sensitive data
     •  Data lakes enable multiple groups to share access to centrally stored data
     •  Differing permissions require enhanced data security
        §  Mask or tokenize data before it is published in the lake for consumption
        §  Policy-based security
     •  Metadata management enables audit and traceability
     •  End result: more open and democratized access to data in the lake for those with permission
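
The "mask or tokenize before publishing" step on slide 12 might look like the sketch below. It uses keyed hashing (HMAC-SHA256) as one possible tokenization technique; the slides do not say which technique Zaloni uses, and vault-based or format-preserving schemes are common alternatives. The policy format and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def tokenize(value: str, key: bytes) -> str:
    """Deterministic token: the same value and key always give the same token,
    so tokenized columns can still be joined on."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if visible <= 0:
        return "*" * len(value)
    return "*" * max(len(value) - visible, 0) + value[-visible:]


def apply_policy(record: Dict, policy: Dict[str, str], key: bytes) -> Dict:
    """Apply a per-field policy ("tokenize" or "mask"); other fields pass through."""
    protected = {}
    for name, value in record.items():
        action = policy.get(name)
        if action == "tokenize":
            protected[name] = tokenize(str(value), key)
        elif action == "mask":
            protected[name] = mask(str(value))
        else:
            protected[name] = value
    return protected


# Example policy for a hypothetical member record.
policy = {"ssn": "tokenize", "phone": "mask"}
record = {"ssn": "123-45-6789", "phone": "1234567890", "plan": "gold"}
published = apply_policy(record, policy, key=b"demo-key-not-for-production")
# published["phone"] is "******7890"; published["plan"] is unchanged.
```

Driving protection from a policy table rather than hard-coding it per data set is what makes the approach "policy-based": the same rules can be applied consistently to every data set that contains a given field.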
  13. Discover, Enrich, Provision
     Self Service Data Preparation for Analytics: Catalog, Wrangling, Collaboration
     •  See what data is available across your enterprise
     •  Blend data in the lake without a costly IT project
     •  Perform interactive data-driven transformations
     •  Collaborate and share data assets and transformations with peers
     EXPLORE, PREPARE, OPERATIONALIZE
  14. Catalog with KPIs
  15. “Ground to Cloud” hybrid architectures
     •  Seeing rapid increase of big data in the cloud
     •  Leverage cloud platforms as complementary to on-premises
     •  Support sensitive data on premises and external data in the cloud (e.g. client data, machine-generated)
     Key data challenges for hybrid environments:
     •  VISIBILITY: Need enterprise-wide data catalog (logical data lake)
     •  GOVERNANCE: Need consistent data governance requirements for hybrid platforms
  16. Manage the complete data pipeline
     •  INGEST: Manage data ingestion so you know what is in your Hadoop Data Lake
     •  ORGANIZE: Define and capture metadata for ease of searching and browsing
     •  ENRICH: Orchestrate and manage the data preparation process
     •  ENGAGE: Data visibility and self-service data preparation
  17. Network Data Lake architecture
     [Diagram; labels as they appear on the slide]
     Network Data: CDR, DPI, IPFIX, SNMP, RADIUS
     Enterprise Data: CRM, Billing, Inventory
     Network Data Lake components: Manage Ingestion; Manage Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor
     Consumers: BI Tools, Custom Apps, Data Warehouse, Enterprise Data Warehouse, Exploration & Ad-hoc Analytics
     Custom Applications: Subscriber Usage, Network Usage
  18. Managed data lake for healthcare payers
     [Diagram; labels as they appear on the slide]
     Data Sources: Relational, Streaming, Files
     Data Sets: Claims, EMR, Lab/Pathology, Pharmacy, Member, Social, Enterprise Data
     Ingestion: Batch Ingestion, Streaming Ingestion, Change Data Capture
     Data Lake Management (Edge Node): Configure Ingestion; Administer Metadata; Manage, Monitor, Schedule; Operations and Metadata Store; Data Quality & Rules Engine; Transformation Engine; Workflow Executor
     Consumers: Analytical Applications, Enterprise Data Warehouse
     Applications: HEDIS Reporting, Bundle Payments, Medical Benefits Management, Scorecards, Enterprise Reports
  19. Data Lake for BCBS239 Compliance (RDARR)
     •  Automated Data Acquisition Framework providing timeliness of data
     •  Capture Metadata in all phases: Ingestion, Transformation
     •  Integration with Enterprise Metadata Management
     •  Integrated Data Quality Analysis
     [Diagram; labels as they appear on the slide]
     Source Systems: RDBMS, Mainframes, Flat files, Binary files
     Metadata: Metadata repositories, Metadata Management solution, Register/update metadata, Extract/Read metadata
     Data Acquisition Automation: Data Ingestion, Data Quality and Validation, Layout Standardization, Operational Metadata Generation
     Data at Rest
  20. Getting Started
     1. Questions: Business drivers AND Business Questions (Where is fraud occurring? How to optimize inventory?)
     2. Inputs: Data, Use Cases, Platform (Subject areas; Source system; Capabilities, Process; Ingest, Organize, Enrich, Explore)
     3. Outcomes: Roadmap, Prototype, Analytics Strategy
  21. Stop by booth #1335 and ask for a copy of our new book and a free t-shirt!
     DON’T GO IN THE DATA LAKE WITHOUT US
