Building your Big Data
Roadmap and Data
Governance Strategy
Lessons from a Data Nerd - Raquel Seville
About Raquel
➔ Big Data Nerd: I have been working with massive volumes of
structured and unstructured data for over a decade
➔ Analytics Wrangler: I am keen on BI user adoption and having the
right tools, talent and processes to ensure maximum ROI
➔ Blogger: I blog all things data at www.exportBI.com
➔ Author: SAP OpenUI5 for Mobile BI and Analytics
➔ SAP Mentor
➔ Foodie, Travel Addict @QuelzSeville
Digital Age?
“The Stone Age did not
end for lack of stone, and
the Oil Age will end long
before the world runs out
of oil.”
- Saudi Oil Minister
Sheikh Yamani
What is the fastest growing
commodity in the world?
It’s not oil! It’s Data.
Data is the new oil...
Data Never
Sleeps: In 60 seconds,
WhatsApp users send 29
million messages,
Google receives 4
million search queries,
Instagram has over 65k
photos uploaded, and
Facebook has 3.3
million posts
Data is growing!
The exponential growth of data
and intelligent things in an
environment of ubiquitous
Internet connectivity is enabling
a fourth industrial revolution —
digital business
transformation
- Jen Underwood, Founder,
Impact Analytix, LLC
Say Hello to Netflix.
Over 109 million members
globally in over 190 countries
Streaming over 125 million
hours of content per day
Data warehouse size is over
60 petabytes
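To put the 60-petabyte figure in perspective: a petabyte is 10^15 bytes in decimal (SI) units. A minimal sketch of the conversion follows; the constant and variable names are chosen here for illustration and do not come from the talk.

```python
# Decimal (SI) storage units, in bytes.
GB = 10**9
TB = 10**12
PB = 10**15

# "Over 60 petabytes" from the slide, taken as exactly 60 PB for the arithmetic.
warehouse_bytes = 60 * PB

warehouse_tb = warehouse_bytes // TB
warehouse_gb = warehouse_bytes // GB

print(f"{warehouse_tb:,} TB")  # 60,000 TB
print(f"{warehouse_gb:,} GB")  # 60,000,000 GB
```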
Netflix Big Data
Two streams of data - event and
dimension data
Event data from cloud services via
Ursula (data pipeline)
Dimension data is pulled from
Cassandra cluster
S3 is the single source of truth (Cloud
DW from AWS)
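The split between event and dimension data can be illustrated with a small sketch: events record what happened, and dimension data describes the things the events refer to. Everything below (the PlayEvent fields, the TITLES lookup, the sample values) is hypothetical and only illustrates the idea; it is not Netflix's schema or pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayEvent:
    """A hypothetical event record: something that happened."""
    member_id: int
    title_id: int
    seconds_watched: int


# Hypothetical dimension data: descriptive attributes keyed by title_id.
TITLES = {
    101: {"name": "Example Series", "genre": "Drama"},
    102: {"name": "Example Film", "genre": "Comedy"},
}


def enrich(events, titles):
    """Attach the genre from the dimension data to each event."""
    rows = []
    for event in events:
        dim = titles.get(event.title_id, {"genre": "unknown"})
        rows.append({
            "member_id": event.member_id,
            "genre": dim["genre"],
            "seconds_watched": event.seconds_watched,
        })
    return rows


def seconds_by_genre(rows):
    """Aggregate watch time per genre."""
    totals = {}
    for row in rows:
        totals[row["genre"]] = totals.get(row["genre"], 0) + row["seconds_watched"]
    return totals


events = [
    PlayEvent(1, 101, 1200),
    PlayEvent(2, 102, 300),
    PlayEvent(1, 101, 600),
    PlayEvent(3, 999, 60),  # title missing from the dimension data
]
totals = seconds_by_genre(enrich(events, TITLES))
print(totals)
```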
The biggest big data
challenge at Netflix is
scale
Netflix offers Big Data as
a Service
Netflix developed Genie to manage
access to clusters and data abstraction
Genie is a federated job orchestration
engine
It is designed to manage various big
data jobs such as Hadoop, Pig, Hive
Read more: https://github.com/Netflix/genie
Source: QCon SF 2016 - Netflix Big Data Infrastructure
https://www.infoq.com/presentations/netflix-big-data-infrastructure
Quality user experience
Netflix uses big data to predict the next hit
series and this helps to strengthen their
position as a content provider
Big data also drives customer
recommendations and improves
predictions based on customers' viewing
habits
Your Roadmap
When building your big data roadmap and
data governance strategy, there are two
broad areas you must focus on, which
can be framed as questions:
➔ Where are you now?
Take a closer look at your existing
environment, tools, data, users
➔ Where do you want to be?
Develop a plan and determine
accessibility, budget, stakeholders and
so on.
Where are you now?
Existing tools, infrastructure
and resources that support
reporting, data warehousing,
dashboards, data mining
and analytics
Where are you now?
What are your big data
sources (in-house databases,
social media, websites)?
Where are you now?
Analyse user base, company
size, user roles (SME/Domain
expert, analysts, power users,
consumers), access and
security restrictions
Where are you now?
What problems are you
trying to solve?
What decisions are you trying
to make?
Where are you now?
What is the existing data
culture?
Document processes and
identify gaps
Where do you want to be?
Create a project
plan and identify
scope, budget,
timelines, KPIs,
stakeholders
Where do you want to be?
Determine
accessibility
needs for
deployment, such as
desktop, mobile, cloud,
apps, web
Where do you want to be?
Deliver quick wins
and tangible
value-added solutions
Where do you want to be?
Align to the analytics value
escalator:
descriptive,
diagnostic,
predictive, or
prescriptive
“You can have data without
information, but you cannot
have information without
data.”
– Daniel Keys Moran

BizTech2017 Presentation


Editor's notes

  1. After disruption, there is a shift. With advances in technology, there is greater diversification of energy resources: solar, wind, nuclear, etc.
  2. Internet of Things; digital business
  3. A petabyte (PB) is 10^15 bytes of data: 1,000 terabytes (TB) or 1,000,000 gigabytes (GB).
  4. Multiple Hadoop clusters. Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
  5. https://github.com/Netflix/genie
  6. Parquet file format: it is column-oriented, which allows for improved compression. Parquet files also store additional metadata, such as per-column minimum and maximum values and sizes. This allows operations such as counts or skipping to be performed very quickly. Hadoop Distributed File System (HDFS).
  7. Netflix data: time spent selecting movies, time of day, playback habits
  8. What we are not fully aware of, however, is how to leverage all this data to get the most value for decision making and analysis. The main driver at the core of demystifying this problem is a solid, yet evolving, big data roadmap and data governance strategy.
  9. Problems: poor quality, redundancy, security, privacy, availability, updates, complexity, volume. Decisions: company goals, improving decision making in specific areas, strategic objectives.
  10. Senior management buy-in is critical
  11. Determine accessibility needs for deployment, such as desktop, mobile, cloud, apps, web etc.
  12. Use iterative and agile methods to deliver quick wins and deliver tangible value-added solutions in short sprints (high-value, low-cost)
  13. Decide where you want to be along the analytics value escalator - Descriptive, Diagnostic, Predictive or Prescriptive (Machine Learning/Artificial Intelligence)?
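
Note 6 says that Parquet's stored metadata lets counts and skipping run quickly. The toy model below sketches that idea in plain Python: values are split into groups, each group keeps its minimum and maximum, and a filtered count reads a group's values only when the metadata cannot settle the answer. The RowGroup class and function names are illustrative assumptions, not Parquet's actual file layout or API.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of one column plus the min/max summary stored alongside it."""
    values: list
    minimum: int
    maximum: int


def make_row_groups(values, group_size):
    """Split a column into fixed-size groups and record each group's min and max."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def count_greater_than(groups, threshold):
    """Return (number of matching values, number of groups actually scanned)."""
    count = 0
    scanned = 0
    for group in groups:
        if group.maximum <= threshold:
            continue  # no value can match: skip the group entirely
        if group.minimum > threshold:
            # Every value matches: only the row count is needed, which a real
            # columnar format would also keep in metadata.
            count += len(group.values)
            continue
        scanned += 1  # the metadata is inconclusive, so read the values
        count += sum(1 for value in group.values if value > threshold)
    return count, scanned


groups = make_row_groups(list(range(1, 101)), 25)  # 1-25, 26-50, 51-75, 76-100
count, scanned = count_greater_than(groups, 60)
print(count, scanned)  # 40 1
```

Only one of the four groups is scanned in this example; the other three are answered from their min/max summaries alone.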