Data Lake Architecture
Modern Strategies & Approaches
Donna Burbank, Managing Director
Global Data Strategy, Ltd.
August 23rd, 2018
Follow on Twitter @donnaburbank
Twitter Event hashtag: #DAStrategies
Global Data Strategy, Ltd. 2018
Donna Burbank
Donna is a recognised industry expert in
information management with over 20 years
of experience in data strategy, information
management, data modeling, metadata
management, and enterprise architecture.
Her background is multi-faceted across
consulting, product development, product
management, brand strategy, marketing,
and business leadership.
She is currently the Managing Director at
Global Data Strategy, Ltd., an international
information management consulting
company that specializes in the alignment of
business drivers with data-centric
technology. In past roles, she has served in
key brand strategy and product
management roles at CA Technologies and
Embarcadero Technologies for several of the
leading data management products in the
market.
As an active contributor to the data
management community, she is a long time
DAMA International member, Past President
and Advisor to the DAMA Rocky Mountain
chapter, and was recently awarded the
Excellence in Data Management Award from
DAMA International in 2016.
Donna is also an analyst at the Boulder BI
Brain Trust (BBBT) where she provides advice
and gains insight on the latest BI and
Analytics software in the market. She was on
several review committees for the Object
Management Group for key information
management and process modeling
notations.
She has worked with dozens of Fortune 500
companies worldwide in the Americas,
Europe, Asia, and Africa and speaks regularly
at industry conferences. She has co-
authored two books: Data Modeling for the
Business and Data Modeling Made Simple
with ERwin Data Modeler and is a regular
contributor to industry publications. She can
be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
DATAVERSITY Data Architecture Strategies
This Year’s Line Up for 2018
• January (on demand): Panel: Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February (on demand): Building an Enterprise Data Strategy – Where to Start?
• March (on demand): Modern Metadata Strategies
• April (on demand): The Rise of the Graph Database
• May (on demand): Data Architecture Best Practices for Today’s Rapidly Changing Data Landscape
• June (on demand): Artificial Intelligence: Real-World Applications for Your Organization
• July (on demand): Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic Asset
• August: Data Lake Architecture – Modern Strategies & Approaches
• September: Master Data Management: Practical Strategies for Integrating into Your Data Architecture
• October: Business-Centric Data Modeling: Strategies for Maximizing Business Benefit
• December: Panel: Self-Service Reporting and Data Prep – Benefits & Risks
Today’s Topic
Building a Successful Data Lake Architecture
• Data Lake or Data Swamp? By now, we’ve likely all heard the comparison.
• Data Lake architectures can integrate vast amounts of disparate data from across the
organization for strategic business analytic value.
• But without a proper architecture and metadata management strategy in place, a Data Lake can
quickly devolve into a swamp of information that is difficult to understand.
• This webinar will offer practical strategies to architect and manage your Data Lake in a way that
optimizes its success.
Data Lakes – the Opportunity
• Data Lakes provide a response to the opportunity & reality of today’s data-focused world.
• Consumer data provides a myriad of opportunities
• IoT data for machine logs, sensors, etc.
• And more…
• aka “Big Data”
Opportunity and Complexity
Figure: consumer data sources, including purchasing patterns, photos & video, support call logs, web click activity, social media interactions, IoT data from wearable tech, location data from phone, etc.
What is Big Data?
• Big Data is often characterised by the “3 Vs”:
• Volume: Is there a high volume of data? (e.g. terabytes per day)
• Velocity: Is data generated or changed at a rapid pace? (e.g. per second, sub-second)
• Variety: Is data stored across multiple formats? (e.g. machine data, media files, log files)
• Understanding and managing these sources, and integrating them into the
larger Business Intelligence ecosystem, can yield valuable insights from data.
• Social Media Sentiment Analysis – e.g. What are customers saying about our products?
• Web Browsing Analytics – Customer usage patterns
• Internet of Things (IoT) Analysis – e.g. Sensor data, Machine log data
• Customer Support – e.g. Call log analysis
• This ability leads to the “4th V” of Big Data – Value.
• Value: Valuable insights gained from the ability to analyze and
discover new patterns and trends from high-volume and/or
cross-platform systems.
The Business Need Spans Traditional and
Modern Technology
• Business need: “Tell me what customers are saying about our product.”
Traditional Databases & DW (Sybase, SAP, DB2, Oracle, SQL Server, SQL Azure, Informix, Teradata)
• DBA: “Which customer database do you want me to pull this from? We have 25.”
• Data Architect: “And, by the way, the databases all store customer information in a different format. “CUST_NM” on DB2, “cust_last_nm” on Oracle, etc. It’s a mess.”
Big Data & Data Lake
• Data Scientist: “I’ll need to input the raw data from thousands of sources, and write a program to parse and analyze the relevant information.”
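The Data Architect's complaint, that the same attribute appears under different column names in different databases, can be illustrated with a small lookup table. This is a minimal sketch and not a method from the presentation: the canonical name `customer_name`, the `CUST_ID` / `cust_id` columns, and the assumption that “CUST_NM” and “cust_last_nm” describe the same attribute are all hypothetical.

```python
# Hypothetical mapping from (source system, physical column) to one shared
# business name. Only CUST_NM and cust_last_nm come from the slide; treating
# them as the same attribute is an assumption made for illustration.
COLUMN_ALIASES = {
    ("DB2", "CUST_NM"): "customer_name",
    ("Oracle", "cust_last_nm"): "customer_name",
}


def to_canonical(system: str, row: dict) -> dict:
    """Rename known physical columns to canonical names; leave others as-is."""
    return {
        COLUMN_ALIASES.get((system, column), column): value
        for column, value in row.items()
    }


db2_row = to_canonical("DB2", {"CUST_NM": "Smith", "CUST_ID": 7})
oracle_row = to_canonical("Oracle", {"cust_last_nm": "Smith", "cust_id": 7})
```

After the mapping, both rows expose the value under `customer_name`, while unmapped columns keep their original names and still need a decision from whoever owns the shared definitions.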
The 5th “V” - Veracity
• Only through proper Governance, Data Quality Management, Metadata Management, etc., can
organizations achieve the 5th “V” – Veracity.
• Veracity: Trust in the accuracy, quality and content of the organization’s information assets.
• i.e. The hard work doesn’t go away with Big Data
• Data Science: “Raw data used in Self-Service Analytics and BI environments is often so poor that many data scientists and BI professionals spend an estimated 50–90% of their time cleaning and reformatting data to make it fit for purpose.” (Source: DataCenterJournal.com)
• Data Lakes: “The absence of commonly understood and shared metadata and data definitions is cited as one of the main impediments to the success of Data Lakes.” (Source: Radiant Advisors)
• Data Science: “Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day.” (Source: Forbes 2016)
• Digitization & Data Quality: “71% of interviewees expect digitization to grow their business. But 70% say the biggest barrier is finding the right data; 62% cite inconsistent data.” (Source: Stibo Systems)
Big Data a Growing Trend
• Over 70% of organizations are
either using Big Data solutions, or
planning to in the future.
• Analysis & Discovery are leading
trends including:
• Data Science & Discovery
• Reporting & Analytics
• “Sandbox” Exploration
Analysis & Discovery are Key Drivers
Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
Big Data Concerns
• The Complexity of current Big
Data solutions & the Skills
Required to manage them
were also common issues.
• Security is a leading concern, and Data
Governance was a top write-in response.
Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
Balance Opportunity & Risk
With the Opportunity of Big Data Comes Risk
Architecture
• Scalability
• Cost Considerations
• Latency
• Storage of Diverse Data Sources
Governance
• Privacy
• Security
• Compliance
• Collaboration between New Roles
• With the opportunity from Big Data comes a myriad of risks and concerns such as scalability, security, etc.
• These concerns can be addressed through a combination of Data Lake architecture and the supporting
governance mechanisms.
Aligning Business Strategy and Data Strategy
A Successful Data Strategy links Business Goals with Technology Solutions
• “Top-Down” alignment with business priorities
• “Bottom-Up” management & inventory of data sources
• Managing the people, process, policies & culture around data
• Coordinating & integrating disparate data sources
• Leveraging & managing data for strategic advantage
Traditional Relational Technologies and “Big Data”:
a Paradigm Shift
Traditional
• Top-Down, Hierarchical
• Design, then Implement
• “Passive”, Push technology
• “Manageable” volumes of information
• “Stable” rate of change
• Data Warehouse
• Business Intelligence
Big Data
• Distributed, Democratic
• Discover and Analyze
• Collaborative, Interactive
• Massive volumes of information
• Rapid and Exponential rate of growth
• Data Lake
• Statistical Analysis
“Traditional” way of Looking at the World: Hierarchies
• Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying
biological systems.
Kingdom
Phylum
Class
Order
Family
Genus
Species
“New” Way of Looking at the World - Emergence
In philosophy, systems theory, science, and art, emergence is
the way complex systems and patterns arise out of a
multiplicity of relatively simple interactions.
- Wikipedia
Example messages:
• “I love my new Levis jeans.”
• “Is Levi coming to my party?”
• “Sale #LEVIS 20% at Macys.”
• “LOL. TTYL. Leving soon.”
Data Warehouse vs. Data Lake
Data Warehouse
A Data Warehouse is a storage repository that holds current
and historical data used for creating analytical reports. Data
structures & requirements are pre-defined, and data is
organized & stored according to these definitions.
Data Lake
A Data Lake is a storage repository that holds a vast
amount of raw data in its native format, including
structured, semi-structured, and unstructured data.
The data structure & requirements are not defined until
the data is needed.
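The difference between pre-defined structure (Data Warehouse) and structure applied only when the data is needed (Data Lake) is often described as schema-on-write versus schema-on-read. The sketch below is an illustration under stated assumptions, not an implementation from the presentation: in-memory lists stand in for both stores, and the schema, field names, and function names are hypothetical.

```python
import json

# Schema-on-write (warehouse style): the structure is fixed before loading.
# This schema and its field names are hypothetical.
WAREHOUSE_SCHEMA = {"customer_id": int, "product": str}


def load_into_warehouse(table: list, record: dict) -> None:
    """Reject any record that does not match the pre-defined schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field_name, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[field_name], expected_type):
            raise ValueError(f"{field_name} must be {expected_type.__name__}")
    table.append(record)


# Schema-on-read (lake style): keep the raw payload, interpret it when queried.
def load_into_lake(lake: list, raw_payload: str) -> None:
    """Store the payload in its native format without validation."""
    lake.append(raw_payload)


def read_products(lake: list) -> list:
    """Apply structure at read time; skip payloads that do not fit this question."""
    products = []
    for raw in lake:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON: irrelevant to this particular query
        if isinstance(doc, dict) and "product" in doc:
            products.append(doc["product"])
    return products


warehouse, lake = [], []
load_into_warehouse(warehouse, {"customer_id": 1, "product": "jeans"})
load_into_lake(lake, '{"product": "jeans", "comment": "love them"}')
load_into_lake(lake, "LOL. TTYL.")
```

The warehouse loader refuses anything outside its schema, while the lake accepts every payload and leaves the parsing work to each reader, which is where the data scientist's "write a program to parse and analyze" effort lands.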
Combining DW & Big Data Provides Value
• There are numerous ways to gain value from data
• Relational Database and Data Warehouse systems are one key source of value
• Customer information
• Product information
• Big Data can offer new insights from data
• From new data sources (e.g. social media, IoT)
• By correlating multiple new and existing data sources (e.g. network patterns & customer data)
• Integrating DW and Big Data can provide valuable new insights.
• Examples include:
• Customer Experience Optimization
• Churn Management
• Products & Services Innovation
Data Lake Adoption is Varied
• Most are using a Data Lake along
with a Data Warehouse
• Many are not currently using a Data
Lake
Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
Poll: Are you currently implementing a Data Lake? (Yes / No)
Integrating the Data Lake & Traditional Data Sources
• The Data Lake has a different architecture & purpose than traditional data sources such as data
warehouses.
• But the two environments can co-exist to share relevant information.
Figure: the integrated environment, with the following areas.
Data Analysis & Discovery – Data Lake
• Sandbox
• Lightly Modeled Data
• Data Exploration
Enterprise Systems of Record
• Master & Reference Data
• Data Warehouse
• Data Marts
• Operational Data
Reporting & Analytics
• Advanced Analytics
• Self-Service BI
• Standard BI Reports
Spanning all areas
• Data Governance & Collaboration
• Security & Privacy
The Data Ecosystem
• Know what to manage closely and what to leave alone
• The more the data is shared across & beyond the organization, the more formal governance needs to be
Core Enterprise Data
• Common data elements used by multiple stakeholders, departments, etc. (e.g. DW)
• Highly governed
• Highly published & shared
Examples
• Common Financial Metrics: for Financial & Regulatory Reporting
• Common Attributes: Core attributes reused across multiple areas
Functional & Operational Data
• Lightly modeled & prepared data for limited sharing & reuse
• Collaboration-based governance
• May be future candidates for core data
Examples
• Operational Reporting
• Non-productionized analytical model data
• Ad hoc reporting & discovery
Exploratory Data
• Raw or lightly prepped data for exploratory analysis
• Mainly ad hoc, one-off analysis
• Light touch governance
Examples
• Raw data sets for exploratory analytics
• External & Open data sources
Master & Reference Data
• Common data elements used by multiple stakeholders across functional areas, applications, etc.
• Highly governed
• Highly published & shared
Examples
• Reference Data: Department Codes, Country Codes, etc.
• Master Data: Customer, Product, Student, Supplier, etc.
Publish & Promote
• Exploratory analysis uses core data sets when applicable.
• Derived variables of value can be fed into Core Enterprise, or even Master Data.
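The relationship between how widely data is shared and how formally it is governed can be sketched as a small tier model. This is a hedged illustration only: the numeric ordering of the tiers and the `promote` function are assumptions introduced here, while the governance labels are taken from the slide text.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Data ecosystem tiers; the numeric ordering is an assumption."""
    EXPLORATORY = 1
    FUNCTIONAL_OPERATIONAL = 2
    CORE_ENTERPRISE = 3
    MASTER_REFERENCE = 4


# Governance style per tier, using the wording from the slide.
GOVERNANCE = {
    Tier.EXPLORATORY: "light touch",
    Tier.FUNCTIONAL_OPERATIONAL: "collaboration-based",
    Tier.CORE_ENTERPRISE: "highly governed",
    Tier.MASTER_REFERENCE: "highly governed",
}


def promote(current: Tier, target: Tier) -> Tier:
    """Move a data set to a more formally governed tier (hypothetical rule)."""
    if target <= current:
        raise ValueError("promotion must move to a more formally governed tier")
    return target


# A derived variable found during exploration is fed into Core Enterprise data.
new_tier = promote(Tier.EXPLORATORY, Tier.CORE_ENTERPRISE)
```

Allowing a jump of more than one tier mirrors the note that derived variables can be fed into Core Enterprise, or even Master Data; a real operating model would attach review steps to each move.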
Governance Requires Interaction Between Roles
Figure: roles that need to be aligned, grouped as Data Warehouse–centric roles, Data Lake–centric roles, and cross-cutting Governance & Architecture roles.
Roles shown: Data Scientist, “Citizen Data Scientist”, Data Architect, BI Reporting Analyst, ETL Developer, Data Steward, DW Developer, Data Lake Platform Administrator, Data Governance Manager.
Metadata Repository vs. Data Catalogue
• The collaboration paradigm of the data lake can require a different way of managing metadata
Different Data Sources Require Different Ways of Working
Encyclopedia – Metadata Repository
• Created by a few, then published as read-only
• Single source of “vetted” truth
• Slowly-changing
• For Standardized, Enterprise Data Sets (Data Warehouse)
Wikipedia – Data Catalogue
• Created by many, edited by many
• Eventual consistency with multiple inputs
• Dynamic
• For Data Exploration, Self Service (Data Lake)
Collaboration, Governance & Metadata
Data Lakes require new ways of collaborating
Figure: the data ecosystem tiers (Core Enterprise Data, Functional & Operational Data, Exploratory Data, Reference & Master Data) set against the two governance styles below.
Metadata Repository – Stricter Governance
• Glossary: Strictly vetted
• Data Dictionary: approved sources
• Data Lineage: detailed source/target mapping at field level
• Audit Trails
• PII mapping and audit
• Data classification
Data Catalogue – Collaborative Governance
• Glossary: Crowdsourced & open
• Data Dictionary: exploratory sources
• Data Lineage: high-level data flow and lineage between source and target
• Usage ranking
• Usefulness ranking and “likes”
• Tagging
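The collaborative features listed for a Data Catalogue (tagging, “likes”, usage ranking) can be sketched as a small data structure. This is a minimal illustration, not the API of any catalogue product: the class, its fields, and the example table names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One crowdsourced catalogue entry (hypothetical structure)."""
    name: str
    definition: str = ""
    tags: set = field(default_factory=set)
    likes: int = 0
    access_count: int = 0

    def tag(self, label: str) -> None:
        """Tagging: anyone may attach a free-form label."""
        self.tags.add(label.lower())

    def like(self) -> None:
        """Usefulness ranking: count one 'like'."""
        self.likes += 1

    def record_access(self) -> None:
        """Usage ranking: count one query against this data set."""
        self.access_count += 1


def rank_by_usage(entries: list) -> list:
    """Return entries ordered from most to least accessed."""
    return sorted(entries, key=lambda entry: entry.access_count, reverse=True)


clicks = CatalogEntry("web_click_activity")
calls = CatalogEntry("support_call_logs")
clicks.tag("Clickstream")
clicks.like()
for _ in range(3):
    calls.record_access()
clicks.record_access()
most_used = rank_by_usage([clicks, calls])[0].name
```

A stricter metadata repository would replace these open counters with approval workflows and audit trails; the point of the sketch is only that catalogue metadata is written by many users as a side effect of normal work.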
Data Catalogue: Harnessing “Tribal Knowledge”
Usage Ranking
• Which queries are others using?
• Which tables are accessed the most?
• Which glossary terms are most often searched?
• Etc.
Helpfulness Ranking
• Which definitions are most complete & helpful?
• Which algorithms offer a helpful starting point?
• Which queries offer great logic to share?
• Etc.
Collaboration & Crowdsourcing
Term: Part Number
Alternate Names: Component Number
Definition: A part number is an 8 digit alphanumeric field that uniquely identifies a machine part used in the manufacturing process.
Comment: “Is this truly the same as the old Component Number? That was a 10 digit numeric field. It didn’t have letters.”
Reply: “Yes, it is. I had the same problem for the finance app, and I wrote a quick program to convert the numbers. We just strip off the first two chars now. Click here to find it.”
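The conversion described in the reply, stripping the first two characters of the old 10 digit Component Number, might look like the sketch below. The program the reply links to is not shown in the slides, so the function name and the input validation are assumptions.

```python
def component_to_part_number(component_number: str) -> str:
    """Convert a legacy 10-digit Component Number to an 8-character Part Number."""
    value = component_number.strip()
    # The validation is an assumption: the slide only says the old field
    # was a 10 digit numeric field.
    if len(value) != 10 or not (value.isascii() and value.isdigit()):
        raise ValueError(
            f"expected a 10-digit numeric Component Number, got {component_number!r}"
        )
    return value[2:]  # drop the first two characters, as the reply describes


part_number = component_to_part_number("0012345678")
```

Sharing a snippet like this next to the glossary term is the kind of "tribal knowledge" the catalogue is meant to capture, since the next person with the same question finds both the answer and the code.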
Avoiding Silos
• Don’t create Data Lily Pads – i.e. disparate Data Lakes not connected with a wider Data Strategy.
• Often, teams create their own “stealth” data
lakes in order to solve an immediate, tactical
problem.
• This approach loses the value of cross-
functional data sharing.
• Cost issues and redundancy are also a
concern.
Considerations & Risks to Avoid
• The World of Data Lakes brings with it new risks and concerns
Platform
- On Prem
- Cloud
- Provider selection
Skills
- Outsourced
- In House
- Training Requirements
Cost
- Is Cloud the right model for our scalable usage?
- Are we shutting off sandboxes when we’re done?
Data Lifecycle
- What can be cold storage vs hot storage?
- When can data be deleted?
- How do we move from Exploration to Enterprise?
Data Security
- Who has access?
- How is PII managed?
Data Governance
- Is there common semantic meaning?
- How do teams work together – operating model?
- Policies & Procedures
- Who is spinning up a sandbox & why?
Summary
• Data Lakes can give an organization significant opportunity to gain value from
cross-functional, disparate data sources.
• Data Warehouses and Data Lakes work well together for a comprehensive enterprise
view.
• Data Governance is critical for the success of data lakes:
• Collaboration and sharing of information
• Access control and security
• Lifecycle and promotion to enterprise data assets
• Operating model and ways of working between roles and departments
DATAVERSITY Data Architecture Strategies
This Year’s Line Up for 2018 – Join Us Next Month
• January (on demand): Panel: Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February (on demand): Building an Enterprise Data Strategy – Where to Start?
• March (on demand): Modern Metadata Strategies
• April (on demand): The Rise of the Graph Database: Practical Use Cases & Approaches to Benefit your Business
• May (on demand): Data Architecture Best Practices for Today’s Rapidly Changing Data Landscape
• June (on demand): Artificial Intelligence: Real-World Applications for Your Organization
• July (on demand): Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic Asset
• August (soon on demand): Data Lake Architecture – Modern Strategies & Approaches
• September: Master Data Management: Practical Strategies for Integrating into Your Data Architecture
• October: Business-Centric Data Modeling: Strategies for Maximizing Business Benefit
• December: Panel: Self-Service Reporting and Data Prep – Benefits & Risks
White Paper: Trends in Data Architecture
Free Download
• Download from www.globaldatastrategy.com
• Under ‘Resources/Whitepapers’
Questions?
Thoughts? Ideas?

Más contenido relacionado

Was ist angesagt?

Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data GovernanceJohn Bao Vuu
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationDenodo
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture StrategiesDATAVERSITY
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Modern Metadata Strategies
Modern Metadata StrategiesModern Metadata Strategies
Modern Metadata StrategiesDATAVERSITY
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessInformatica
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Reference master data management
Reference master data managementReference master data management
Reference master data managementDr. Hamdan Al-Sabri
 
Data Governance
Data GovernanceData Governance
Data GovernanceRob Lux
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata ManagementDATAVERSITY
 
Gartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner
 

Was ist angesagt? (20)

Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data Governance
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data Virtualization
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture Strategies
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Modern Metadata Strategies
Modern Metadata StrategiesModern Metadata Strategies
Modern Metadata Strategies
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Why an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessWhy an AI-Powered Data Catalog Tool is Critical to Business Success
Why an AI-Powered Data Catalog Tool is Critical to Business Success
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Reference master data management
Reference master data managementReference master data management
Reference master data management
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata Management
 
Gartner: Master Data Management Functionality
Gartner: Master Data Management FunctionalityGartner: Master Data Management Functionality
Gartner: Master Data Management Functionality
 

Ähnlich wie Data Lake Architecture – Modern Strategies & Approaches

DAS Slides: Self-Service Reporting and Data Prep – Benefits & Risks
DAS Slides: Self-Service Reporting and Data Prep – Benefits & RisksDAS Slides: Self-Service Reporting and Data Prep – Benefits & Risks
DAS Slides: Self-Service Reporting and Data Prep – Benefits & RisksDATAVERSITY
 
Data Architecture Best Practices for Today’s Rapidly Changing Data Landscape
Data Architecture Best Practices for Today’s Rapidly Changing Data LandscapeData Architecture Best Practices for Today’s Rapidly Changing Data Landscape
Data Architecture Best Practices for Today’s Rapidly Changing Data LandscapeDATAVERSITY
 
Data Architecture Strategies Webinar: Emerging Trends in Data Architecture – ...
Data Architecture Strategies Webinar: Emerging Trends in Data Architecture – ...Data Architecture Strategies Webinar: Emerging Trends in Data Architecture – ...
Data Architecture Strategies Webinar: Emerging Trends in Data Architecture – ...DATAVERSITY
 
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...DATAVERSITY
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big DataDATAVERSITY
 
DAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDATAVERSITY
 
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the SameDAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the Same
DAS Slides: Cloud-Based Data Warehousing – What’s New and What Stays the SameDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
The Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata ManagementThe Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata ManagementDATAVERSITY
 
DAS Slides: Data Governance and Data Architecture – Alignment and Synergies
DAS Slides: Data Governance and Data Architecture – Alignment and SynergiesDAS Slides: Data Governance and Data Architecture – Alignment and Synergies
DAS Slides: Data Governance and Data Architecture – Alignment and SynergiesDATAVERSITY
 
Data Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceData Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceDATAVERSITY
 
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Lake Architecture – Modern Strategies & Approaches

  • 1. Data Lake Architecture Modern Strategies & Approaches Donna Burbank, Managing Director Global Data Strategy, Ltd. August 23rd, 2018 Follow on Twitter @donnaburbank Twitter Event hashtag: #DAStrategies
  • 2. Global Data Strategy, Ltd. 2018. Donna Burbank: Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership. She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market. As an active contributor to the data management community, she is a long-time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was awarded the Excellence in Data Management Award from DAMA International in 2016. Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and Analytics software in the market. She was on several review committees for the Object Management Group for key information management and process modeling notations. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, “Data Modeling for the Business” and “Data Modeling Made Simple with ERwin Data Modeler”, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com. Donna is based in Boulder, Colorado, USA.
  • 3. DATAVERSITY Data Architecture Strategies: This Year’s Line Up for 2018
      • January (on demand): Panel: Emerging Trends in Data Architecture – What’s the Next Big Thing?
      • February (on demand): Building an Enterprise Data Strategy – Where to Start?
      • March (on demand): Modern Metadata Strategies
      • April (on demand): The Rise of the Graph Database
      • May (on demand): Data Architecture Best Practices for Today’s Rapidly Changing Data Landscape
      • June (on demand): Artificial Intelligence: Real-World Applications for Your Organization
      • July (on demand): Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic Asset
      • August: Data Lake Architecture – Modern Strategies & Approaches
      • September: Master Data Management: Practical Strategies for Integrating into Your Data Architecture
      • October: Business-Centric Data Modeling: Strategies for Maximizing Business Benefit
      • December: Panel: Self-Service Reporting and Data Prep – Benefits & Risks
  • 4. Today’s Topic: Building a Successful Data Lake Architecture
      • Data Lake or Data Swamp? By now, we’ve likely all heard the comparison.
      • Data Lake architectures can integrate vast amounts of disparate data from across the organization for strategic business analytics.
      • But without a proper architecture and metadata management strategy in place, a Data Lake can quickly devolve into a swamp of information that is difficult to understand.
      • This webinar offers practical strategies for architecting and managing your Data Lake so that it succeeds.
  • 5. Data Lakes – the Opportunity
      • Data Lakes provide a response to the opportunity & reality of today’s data-focused world.
      • Consumer data provides a myriad of opportunities
      • IoT data for machine logs, sensors, etc.
      • And more…
      • aka “Big Data”
      • [Diagram: Opportunity and Complexity. Consumer data sources shown: Purchasing Patterns, Photos & Video, Support Call Logs, Web Click Activity, Social Media Interactions, Consumer IoT data from wearable tech, Location data from phone, etc.]
  • 6. What is Big Data?
      • Big Data is often characterised by the “3 Vs”:
          • Volume: Is there a high volume of data? (e.g. terabytes per day)
          • Velocity: Is data generated or changed at a rapid pace? (e.g. per second, sub-second)
          • Variety: Is data stored across multiple formats? (e.g. machine data, media files, log files)
      • Understanding and managing these sources, and integrating them into the larger Business Intelligence ecosystem, makes it possible to gain valuable insights from data:
          • Social Media Sentiment Analysis – e.g. What are customers saying about our products?
          • Web Browsing Analytics – Customer usage patterns
          • Internet of Things (IoT) Analysis – e.g. Sensor data, Machine log data
          • Customer Support – e.g. Call log analysis
      • This ability leads to the “4th V” of Big Data – Value.
          • Value: Valuable insights gained from the ability to analyze and discover new patterns and trends from high-volume and/or cross-platform systems.
      • [Diagram: Volume, Velocity, Variety, Value]
  • 7. The Business Need Spans Traditional and Modern Technology
      • Business need: “Tell me what customers are saying about our product.”
      • Traditional Databases & DW (Sybase, SAP, DB2, Oracle, SQL Server, SQL Azure, Informix, Teradata):
          • DBA: “Which customer database do you want me to pull this from? We have 25.”
          • Data Architect: “And, by the way, the databases all store customer information in a different format. ‘CUST_NM’ on DB2, ‘cust_last_nm’ on Oracle, etc. It’s a mess.”
      • Big Data & Data Lake:
          • Data Scientist: “I’ll need to input the raw data from thousands of sources, and write a program to parse and analyze the relevant information.”
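The Data Architect's complaint on this slide can be made concrete with a small sketch. It assumes, purely for illustration, that `CUST_NM` on DB2 and `cust_last_nm` on Oracle hold the same business attribute. The canonical name `customer_name`, the source labels, and the sample rows are all hypothetical. The technique shown is a simple metadata mapping, not a method described in the slides.

```python
# Hypothetical metadata: (source system, physical column) -> shared business name.
# Assumption: both physical columns describe the same attribute.
COLUMN_MAP = {
    ("db2", "CUST_NM"): "customer_name",
    ("oracle", "cust_last_nm"): "customer_name",
}


def to_canonical(source, row):
    """Rename one row's columns to canonical names; unmapped columns are kept as-is."""
    return {COLUMN_MAP.get((source, col), col): value for col, value in row.items()}


# Made-up sample rows, one per source system.
db2_row = {"CUST_NM": "Smith"}
oracle_row = {"cust_last_nm": "Jones"}

unified = [to_canonical("db2", db2_row), to_canonical("oracle", oracle_row)]
print(unified)  # [{'customer_name': 'Smith'}, {'customer_name': 'Jones'}]
```

With 25 customer databases, a mapping like this is only maintainable if it is held as shared metadata rather than hard-coded in each program.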
  • 8. The 5th “V” - Veracity
      • Only through proper Governance, Data Quality Management, Metadata Management, etc., can organizations achieve the 5th “V” – Veracity.
      • Veracity: Trust in the accuracy, quality and content of the organization’s information assets.
      • i.e. the hard work doesn’t go away with Big Data.
      • Raw data used in Self-Service Analytics and BI environments is often so poor that many data scientists and BI professionals spend an estimated 50–90% of their time cleaning and reformatting data to make it fit for purpose. (Source: DataCenterJournal.com)
      • The absence of commonly understood and shared metadata and data definitions is cited as one of the main impediments to the success of Data Lakes. (Source: Radiant Advisors)
      • Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day. (Source: Forbes 2016)
      • 71% of interviewees expect digitization to grow their business. But 70% say the biggest barrier is finding the right data; 62% cite inconsistent data. (Source: Stibo Systems)
      • [Panel labels: Data Science, Data Lakes, Data Science, Digitization & Data Quality]
  • 9. Big Data a Growing Trend: Analysis & Discovery are Key Drivers
      • Over 70% of organizations are either using Big Data solutions, or planning to in the future.
      • Analysis & Discovery are leading trends including:
          • Data Science & Discovery
          • Reporting & Analytics
          • “Sandbox” Exploration
      • Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
  • 10. Big Data Concerns
      • The Complexity of current Big Data solutions & the Skills Required to manage them were also common issues.
      • Security is a leading concern, and Data Governance was a top write-in response.
      • Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
  • 11. Balance Opportunity & Risk: With the Opportunity of Big Data Comes Risk
      • Architecture: Scalability, Cost Considerations, Latency, Storage of Diverse Data Sources
      • Governance: Privacy, Security, Compliance, Collaboration between New Roles
      • With the opportunity from Big Data comes a myriad of risks and concerns such as scalability, security, etc.
      • These concerns can be addressed through a combination of Data Lake architecture and the supporting governance mechanisms.
  • 12. A Successful Data Strategy links Business Goals with Technology Solutions
      • [Diagram: Aligning Business Strategy and Data Strategy. Copyright 2018 Global Data Strategy, Ltd]
      • “Top-Down” alignment with business priorities
      • “Bottom-Up” management & inventory of data sources
      • Managing the people, process, policies & culture around data
      • Coordinating & integrating disparate data sources
      • Leveraging & managing data for strategic advantage
  • 13. Traditional Relational Technologies and “Big Data”: a Paradigm Shift
      • Traditional (Design, then Implement):
          • Top-Down, Hierarchical
          • “Passive”, Push technology
          • “Manageable” volumes of information
          • “Stable” rate of change
          • Data Warehouse
          • Business Intelligence
      • Big Data (Discover and Analyze):
          • Distributed, Democratic
          • Collaborative, Interactive
          • Massive volumes of information
          • Rapid and Exponential rate of growth
          • Data Lake
          • Statistical Analysis
  • 14. “Traditional” Way of Looking at the World: Hierarchies • Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying biological systems. • Kingdom > Phylum > Class > Order > Family > Genus > Species
  • 15. “New” Way of Looking at the World: Emergence • “In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions.” (Wikipedia) • Example messages: “I love my new Levis jeans.” • “Is Levi coming to my party?” • “Sale #LEVIS 20% at Macys.” • “LOL. TTYL. Leving soon.”
  • 16. Data Warehouse vs. Data Lake • Data Warehouse: A Data Warehouse is a storage repository that holds current and historical data used for creating analytical reports. Data structures & requirements are pre-defined, and data is organized & stored according to these definitions. • Data Lake: A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure & requirements are not defined until the data is needed.
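The contrast on this slide is often described as schema-on-write versus schema-on-read. The following is a minimal sketch of that idea, not taken from the presentation: the sample records, field names, and the functions `to_warehouse_row`, `load_warehouse`, and `read_lake_amounts` are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical raw events, kept in their native format as they might land in a lake.
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"customer": {"id": 3}, "note": "free text, no amount"}',
]


def to_warehouse_row(raw):
    """Schema-on-write: enforce the pre-defined structure before storing."""
    record = json.loads(raw)
    return {
        "customer_id": int(record["customer_id"]),  # KeyError if the field is missing
        "amount": float(record["amount"]),
    }


def load_warehouse(raw_events):
    """Return (accepted rows, rejected raw records) under the fixed schema."""
    accepted, rejected = [], []
    for raw in raw_events:
        try:
            accepted.append(to_warehouse_row(raw))
        except (KeyError, ValueError):
            rejected.append(raw)
    return accepted, rejected


def read_lake_amounts(raw_events):
    """Schema-on-read: keep every record raw, apply structure only at query time."""
    amounts = []
    for raw in raw_events:
        record = json.loads(raw)
        if "amount" in record:
            amounts.append(float(record["amount"]))
    return amounts
```

In this sketch the warehouse path rejects the third record because it does not fit the pre-defined structure, while the lake path keeps all three records and only interprets the fields a given question needs.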
  • 17. Combining DW & Big Data Provides Value • There are numerous ways to gain value from data • Relational Database and Data Warehouse systems are one key source of value • Customer information • Product information • Big Data can offer new insights from data • From new data sources (e.g. social media, IoT) • By correlating multiple new and existing data sources (e.g. network patterns & customer data) • Integrating DW and Big Data can provide valuable new insights. • Examples include: • Customer Experience Optimization • Churn Management • Products & Services Innovation • Figure labels: Data Warehouse; Data Lake; New Insights
  • 18. Data Lake Adoption is Varied • Most are using a Data Lake along with a Data Warehouse • Many are not currently using a Data Lake • Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
  • 19. Poll: Are you currently implementing a Data Lake? • YES • NO
  • 20. Integrating the Data Lake & Traditional Data Sources • The Data Lake has a different architecture & purpose than traditional data sources such as data warehouses. • But the two environments can co-exist to share relevant information. • Diagram labels: Data Analysis & Discovery – Data Lake; Enterprise Systems of Record; Data Governance & Collaboration; Master & Reference Data; Data Warehouse; Data Marts; Operational Data; Security & Privacy; Sandbox; Lightly Modeled Data; Data Exploration; Reporting & Analytics; Advanced Analytics; Self-Service BI; Standard BI Reports
  • 21. The Data Ecosystem • Know what to manage closely and what to leave alone • The more the data is shared across & beyond the organization, the more formal governance needs to be • Master & Reference Data: Common data elements used by multiple stakeholders across functional areas, applications, etc.; highly governed; highly published & shared. Examples: Reference Data (Department Codes, Country Codes, etc.); Master Data (Customer, Product, Student, Supplier, etc.) • Core Enterprise Data: Common data elements used by multiple stakeholders, departments, etc. (e.g. DW); highly governed; highly published & shared. Examples: Common Financial Metrics for Financial & Regulatory Reporting; Common Attributes (core attributes reused across multiple areas) • Functional & Operational Data: Lightly modeled & prepared data for limited sharing & reuse; collaboration-based governance; may be future candidates for core data. Examples: Operational Reporting; non-productionized analytical model data; ad hoc reporting & discovery • Exploratory Data: Raw or lightly prepped data for exploratory analysis; mainly ad hoc, one-off analysis; light touch governance. Examples: raw data sets for exploratory analytics; External & Open data sources • Exploratory analysis uses core data sets when applicable. • Derived variables of value can be fed into Core Enterprise, or even Master Data (Publish, Promote).
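One way to picture the tiers and the promote path described on this slide is as an ordered set of levels with a governance check on the way up. This is only a sketch under stated assumptions: the tier ordering, the `steward_approved` flag, and the rule that entry into a highly governed tier needs sign-off are illustrative choices, not something the presentation specifies.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Tier(IntEnum):
    # Assumed ordering for illustration: higher value = more formal governance.
    EXPLORATORY = 1
    FUNCTIONAL_OPERATIONAL = 2
    CORE_ENTERPRISE = 3
    MASTER_REFERENCE = 4


# Governance style per tier, using the wording from the slide.
GOVERNANCE = {
    Tier.EXPLORATORY: "light touch",
    Tier.FUNCTIONAL_OPERATIONAL: "collaboration-based",
    Tier.CORE_ENTERPRISE: "highly governed",
    Tier.MASTER_REFERENCE: "highly governed",
}


@dataclass(frozen=True)
class DataSet:
    name: str
    tier: Tier
    steward_approved: bool = False  # hypothetical sign-off flag


def promote(data_set):
    """Move a data set up one tier; highly governed tiers require sign-off (assumed rule)."""
    if data_set.tier == Tier.MASTER_REFERENCE:
        raise ValueError(f"{data_set.name} is already in the top tier")
    target = Tier(data_set.tier + 1)
    if GOVERNANCE[target] == "highly governed" and not data_set.steward_approved:
        raise PermissionError(f"{data_set.name} needs steward approval for {target.name}")
    return replace(data_set, tier=target)
```

For example, a hypothetical `DataSet("churn_features", Tier.EXPLORATORY)` can move to the functional tier freely, but a further call to `promote` raises `PermissionError` until `steward_approved` is set.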
  • 22. Governance Requires Interaction Between Roles • Roles shown: Data Scientist; “Citizen Data Scientist”; Data Architect; BI Reporting Analyst; ETL Developer; Data Steward; DW Developer; Data Lake Platform Administrator; Data Governance Manager • Groupings shown: Data Warehouse-centric roles; Data Lake-centric roles; Cross-cutting Governance & Architecture Roles; Alignment
  • 23. Metadata Repository vs. Data Catalogue: Different Data Sources Require Different Ways of Working • The collaboration paradigm of the data lake can require a different way of managing metadata • Encyclopedia – Metadata Repository (for standardized, enterprise data sets; Data Warehouse): created by a few, then published as read-only; single source of “vetted” truth; slowly-changing • Wikipedia – Data Catalogue (for data exploration, self service; Data Lake): created by many, edited by many; eventual consistency with multiple inputs; dynamic
  • 24. Collaboration, Governance & Metadata: Data Lakes require new ways of collaborating • Data tiers shown: Core Enterprise Data; Functional & Operational Data; Exploratory Data; Reference & Master Data • Metadata Repository, Stricter Governance: • Glossary: strictly vetted • Data Dictionary: approved sources • Data Lineage: detailed source/target mapping at field level • Audit Trails • PII mapping and audit • Data classification • Data Catalogue – Collaborative Governance: • Glossary: crowdsourced & open • Data Dictionary: exploratory sources • Data Lineage: high-level data flow and lineage between source and target • Usage ranking • Usefulness ranking and “likes” • Tagging
  • 25. Data Catalogue: Harnessing “Tribal Knowledge” • Usage Ranking: Which queries are others using? Which tables are accessed the most? Which glossary terms are most often searched? Etc. • Helpfulness Ranking: Which definitions are most complete & helpful? Which algorithms offer a helpful starting point? Which queries offer great logic to share? Etc. • Collaboration & Crowdsourcing, example glossary entry: Term: Part Number; Alternate Names: Component Number; Definition: A part number is an 8 digit alphanumeric field that uniquely identifies a machine part used in the manufacturing process. • Comment: “Is this truly the same as the old Component Number? That was a 10 digit numeric field. It didn’t have letters.” • Reply: “Yes, it is. I had the same problem for the finance app, and I wrote a quick program to convert the numbers. We just strip off the first two chars now. Click here to find it.”
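The ranking and crowdsourcing ideas on this slide can be sketched as a small in-memory catalogue. This is an illustrative assumption rather than any particular product's data model: `CatalogueEntry`, its fields, and the two ranking functions are hypothetical names, with usage measured by access count and helpfulness by “likes”.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    name: str
    kind: str                  # e.g. "table", "query", "glossary term"
    access_count: int = 0      # how often others have used or searched this entry
    likes: int = 0             # crowdsourced helpfulness votes
    tags: set = field(default_factory=set)
    comments: list = field(default_factory=list)


def rank_by_usage(entries):
    """Most-used entries first."""
    return sorted(entries, key=lambda e: e.access_count, reverse=True)


def rank_by_helpfulness(entries):
    """Most-liked entries first."""
    return sorted(entries, key=lambda e: e.likes, reverse=True)


# Hypothetical catalogue content, loosely based on the Part Number example.
part_number = CatalogueEntry("Part Number", "glossary term", access_count=120, likes=4)
part_number.tags.add("manufacturing")
part_number.comments.append("Is this truly the same as the old Component Number?")

conversion_query = CatalogueEntry("component_to_part", "query", access_count=35, likes=17)
parts_table = CatalogueEntry("parts", "table", access_count=300, likes=2)

CATALOGUE = [part_number, conversion_query, parts_table]
```

The two rankings can disagree: in this made-up data the `parts` table is accessed most, while the conversion query is the entry people found most helpful.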
  • 26. Avoiding Silos • Don’t create Data Lily Pads, i.e. disparate Data Lakes not connected with a wider Data Strategy. • Often, teams create their own “stealth” data lakes in order to solve an immediate, tactical problem. • This approach loses the value of cross-functional data sharing. • Cost issues and redundancy are also a concern.
  • 27. Considerations & Risks to Avoid • The world of Data Lakes brings with it new risks and concerns • Platform: On Prem; Cloud; Provider selection • Skills: Outsourced; In House; Training Requirements • Cost: Is Cloud the right model for our scalable usage? Are we shutting off sandboxes when we’re done? • Data Lifecycle: What can be cold storage vs hot storage? When can data be deleted? How do we move from Exploration to Enterprise? • Data Security: Who has access? How is PII managed? • Data Governance: Is there common semantic meaning? How do teams work together (operating model)? Policies & Procedures. Who is spinning up a sandbox & why?
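The lifecycle and cost questions above (hot vs cold storage, when to delete, whether idle sandboxes are shut off) can be turned into explicit rules. The sketch below is one possible shape for such rules; the thresholds, the `legal_hold` flag, and the function names are assumptions chosen for illustration and would be set by an organization's own policies.

```python
from datetime import date, timedelta

# Assumed thresholds for illustration only.
COLD_AFTER = timedelta(days=90)          # untouched this long: move to cold storage
DELETE_AFTER = timedelta(days=730)       # untouched this long: eligible for deletion
SANDBOX_IDLE_LIMIT = timedelta(days=14)  # sandbox unused this long: flag for shutdown


def storage_action(last_accessed, today, legal_hold=False):
    """Classify a data set as 'hot', 'cold', or 'delete' based on its last access date."""
    age = today - last_accessed
    if age >= DELETE_AFTER and not legal_hold:
        return "delete"
    if age >= COLD_AFTER:
        return "cold"
    return "hot"


def idle_sandboxes(last_used_by_sandbox, today):
    """Return the names of sandboxes that have been idle past the limit, sorted."""
    return sorted(
        name
        for name, last_used in last_used_by_sandbox.items()
        if today - last_used >= SANDBOX_IDLE_LIMIT
    )
```

Writing the rules down this way also gives governance a concrete artifact to review, which ties the cost and lifecycle questions back to the policies and procedures item on the slide.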
  • 28. Summary • Data Lakes can provide significant opportunity for an organization to gain value from cross-functional, disparate data sources. • Data Warehouses and Data Lakes work well together for a comprehensive enterprise view. • Data Governance is critical for the success of data lakes: • Collaboration and sharing of information • Access control and security • Lifecycle and productionization into enterprise data assets • Operating model and ways of working between roles and departments
  • 29. DATAVERSITY Data Architecture Strategies: This Year’s Line Up for 2018 – Join Us Next Month • January (on demand): Panel: Emerging Trends in Data Architecture – What’s the Next Big Thing? • February (on demand): Building an Enterprise Data Strategy – Where to Start? • March (on demand): Modern Metadata Strategies • April (on demand): The Rise of the Graph Database: Practical Use Cases & Approaches to Benefit your Business • May (on demand): Data Architecture Best Practices for Today’s Rapidly Changing Data Landscape • June (on demand): Artificial Intelligence: Real-World Applications for Your Organization • July (on demand): Data as a Profit Driver – Emerging Techniques to Monetize Data as a Strategic Asset • August (soon on demand): Data Lake Architecture – Modern Strategies & Approaches • September: Master Data Management: Practical Strategies for Integrating into Your Data Architecture • October: Business-Centric Data Modeling: Strategies for Maximizing Business Benefit • December: Panel: Self-Service Reporting and Data Prep – Benefits & Risks
  • 30. White Paper: Trends in Data Architecture • Free Download • Download from www.globaldatastrategy.com • Under ‘Resources/Whitepapers’
  • 31. Questions? Thoughts? Ideas?