BI in the Digital Era
How to do Data Science & Big Data Analytics
Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com
About
Over 14 years' experience in data and analytics.
BI Program Delivery Manager,
AXA Group Solutions
Previous roles: Product manager, Pre-sales, Technical
architect, Project manager and Consulting
Organizations: Thomson Reuters, Saama (Silicon Valley
analytics pure-play), Infosys, TCS
Locations worked: India, UK, US, France
Introduction
The future is digital.
The digital revolution is changing the way we do business and live our lives.
With increasing adoption of social media, smartphones, cloud computing and
technological advancements in data storage and processing, the age of big
data is upon us.
In the digital era it is no longer sufficient to have look-back reports and graphs
when the competition is deriving insights from big data. It is time to test and
learn new strategies, and to build skills in data science and big data
analytics.
In this webinar we'll look at the rapid changes in the way we do analytics and
go beyond the hype to learn about practical approaches and tools you will
need to do data science.
Google Trends
Defining the data explosion …
Big data is high-volume, high-velocity and high-variety information assets that
demand cost-effective, innovative forms of information processing for
enhanced insight and decision making.
- Gartner (the three Vs were first described by Doug Laney, 2001)
Volume
SCALE OF DATA
Variety
TYPES OF DATA
Velocity
SPEED OF DATA GENERATION
Big Data is here …
2.3 TRILLION GIGABYTES
of data created each day
40 ZETTABYTES
[43 Trillion GIGABYTES]
of data will be created by
2020, an increase of 300
times from 2005
6 BILLION People
have cellphones
World population: 7 Billion
4 BILLION + HOURS
of video watched on
YouTube each month
30 BILLION
PIECES OF CONTENT
are shared on Facebook
every month
400 MILLION TWEETS
sent per day by 200 million
monthly active users
Sources: McKinsey Global Institute, Twitter, Cisco, Gartner, EMC, SAS, IBM
Modern cars have close to
100 SENSORS
To monitor items like fuel
level, tire pressure
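The "increase of 300 times from 2005" figure implies a steep yearly growth rate. A minimal sketch of that arithmetic, assuming smooth compound growth over the 15 years from 2005 to 2020 (the slide does not say how the growth was distributed; the function name is illustrative):

```python
import math

def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant yearly growth rate that yields `multiple` after `years`."""
    return multiple ** (1.0 / years) - 1.0

# 300x between 2005 and 2020, taken from the slide.
rate = implied_annual_growth(300, 2020 - 2005)

# Time needed to double at that constant rate.
doubling_years = math.log(2) / math.log(1 + rate)

print(f"Implied growth: {rate:.1%} per year")
print(f"Doubling time: {doubling_years:.1f} years")
```

Under that assumption the data volume grows by roughly 46% a year, which means it doubles in a little under two years.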
… and the Digital Revolution
…and it isn’t just about Web 2.0 / Social
E-TATTOOS
Patents on stick-on
tattoos by Google,
Motorola (mc10)
WEARABLES
Fitbit, Apple, Google
SENSOR-ENABLED PILLS
Proteus
3D PRINTING
Physical objects from digital
models
SMART GRID & METERS
Digital power grid & meters
… but also the Internet of Things
The Digital Wave is …
1.75 BILLION
smartphone users in 2014
World population: 7 Billion
MOORE’S LAW
transistors per integrated circuit
double about every 2 years
26 BILLION Estimated
Connected Devices
in the Internet Of Things
ARTIFICIAL INTELLIGENCE
& ROBOTICS
…disrupting businesses
Digital businesses operate at lower cost, at higher speed and are vastly more
innovative and disruptive. They know how to make the most of opportunities
provided by the Digital Revolution and capture new markets and build new
business models.
Winners Losers
Expectations have changed…
Business expectations have changed.
Digital disruption is forcing business to move faster. Speed is the
single most important thing the business expects from BI.
Business cannot afford to wait for months while IT integrates data sources
and builds ETL to get to the “single version of truth”.
Questions have evolved.
It is no longer sufficient to have look-back reports. Newer business models
ask newer questions: what-if, why, experiment, anticipate and predict.
BI needs to evolve too.
Expectations have changed…
Trends driving changing expectations:
Apps: Can I have an app for that?
Consumer tech such as the search bar and the multi-touch screen makes users
demand the same experience at work that they get from Google or Apple.
Social:
Share, Crowdsource, Collaborate. Flattens hierarchies. Decentralizes
decision-making.
Mobile:
Smartphones and tablets deliver business on-the-go.
Traditional BI
Traditional BI = reports, dashboards, analysis, visualization
“Current-state” questions:
What was sold? SELECT * …
When? Where? How much? GROUP BY Time, Store …
Give me last quarter’s / month’s / week’s figures
“Analysis” = Slice-and-dice, drill-down & across
Performance improvements = Pre-built cubes, summary tables, indexes
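A "current-state" question of the kind listed above can be sketched as a single aggregate query. The example below uses an in-memory SQLite database as a stand-in for a warehouse; the table, columns and figures are made up for illustration.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, store TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2014-01-15", "Paris", "Widget", 120.0),
        ("2014-02-03", "Paris", "Gadget", 80.0),
        ("2014-02-20", "London", "Widget", 200.0),
        ("2014-04-11", "London", "Widget", 50.0),  # outside the quarter below
    ],
)

# "When? Where? How much?" for one past quarter:
# filter on the period, then GROUP BY time and store.
query = """
    SELECT substr(sale_date, 1, 7) AS month, store, SUM(amount) AS total
    FROM sales
    WHERE sale_date BETWEEN '2014-01-01' AND '2014-03-31'
    GROUP BY month, store
    ORDER BY month, store
"""
rows = conn.execute(query).fetchall()
for month, store, total in rows:
    print(month, store, total)
```

Every answer here looks backwards at data that has already been loaded and modeled, which is the limitation the following slides build on.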
The ETL bottleneck
Traditional BI constrained by ETL.
Ever-increasing data. Ever-decreasing ETL time-window. More Performance!!
Image: Courtesy Cloudera
Traditional BI … problems
Transactional data. Partially / fully aggregated. Structured data.
Low-fidelity. Data lineage and traceability difficult.
Constrained by ETL bottleneck.
Strict data modeling required to build data structures BEFORE ETL.
Fast-evolving requirements = Schema changes.
Fill in a CHANGE REQUEST form!
Unstructured data not allowed. Conform every line of business (LOB) to a “single version of truth”.
Self-service – limited functionality, limited to power users. IT needs to help!
Locked-down enterprise vs. Spreadmart Hell.
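The "strict data modeling BEFORE ETL" problem can be illustrated with a small sketch. It contrasts a table whose structure is fixed up front with raw records whose structure is interpreted only when they are read. The second approach is often called schema-on-read, a term these slides do not use, so treat the comparison as an assumption about what the alternative looks like. Table and field names are hypothetical.

```python
import json
import sqlite3

# Structure fixed before loading: the table knows only two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

# A new requirement arrives: records now carry a "device" field.
new_record = {"order_id": 1, "amount": 9.5, "device": "mobile"}

try:
    conn.execute(
        "INSERT INTO orders (order_id, amount, device) VALUES (?, ?, ?)",
        (new_record["order_id"], new_record["amount"], new_record["device"]),
    )
    write_ok = True
except sqlite3.OperationalError:
    # The load fails until the table is altered: the "change request" step.
    write_ok = False

# Structure applied at read time: keep the raw record, pick fields when querying.
raw_store = [json.dumps(new_record), json.dumps({"order_id": 2, "amount": 4.0})]
devices = [json.loads(line).get("device", "unknown") for line in raw_store]

print(write_ok, devices)
```

The trade-off is that the raw store accepts anything, so data quality checks move from load time to query time.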
... So, how can BI evolve with Big Data?
With the data explosion, new tools and technologies have emerged
to manage Big Data.
The best known of these is, of course, Hadoop.
But there are also other technologies, several of which are now being
integrated into the Hadoop ecosystem.
• Elastic cloud computing
• NoSQL databases
• In-memory computing
• Data visualization
Changing paradigm
Business expectations have changed. Questions have evolved.
Big Data = “Next State” questions
What will happen? PREDICTIVE
Why did this happen / why didn’t this happen? EXPLANATORY
What would happen if we did… HYPOTHESIS
How can we prevent …/ How to make this happen? RESPONSE
Focus shifts away from transactions to sub-transactions & behaviors.
Changing paradigm
With changing business expectations BI needs to evolve.
BI in the Digital Era:
A paradigm shift from “Current state” to “Next State” questions.
Answering “Next State” questions requires a scientific approach.
Design experiments, test hypotheses, derive inference / interpret results.
This is Data Science.
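As one possible illustration of "design experiments, test hypotheses", the sketch below runs a permutation test on a made-up A/B experiment. The data, the function name and the choice of a permutation test are all assumptions for the example; the slides do not prescribe a particular method.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(control, variant, n_permutations=5000, seed=0):
    """Estimate a one-sided p-value for mean(variant) > mean(control)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(variant) - mean(control)
    pooled = list(control) + list(variant)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_permutations

# Made-up basket values for two versions of a page.
control = [20, 22, 19, 24, 21, 23, 20, 22]
variant = [23, 25, 22, 27, 24, 26, 23, 25]

observed, p_value = permutation_test(control, variant)
print(f"Observed lift: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value says the lift is unlikely to be a labeling accident. It does not say the lift is worth acting on, which is where interpretation comes in.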
Data Science
Why Data Science?
Statistical Data Science.
Data Scientist:
The sexiest job of the 21st century.
- Harvard Business Review
Data Science
Who is a Data Scientist? / What does a Data Scientist do?
Images courtesy: Drew Conway / Forbes
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
http://b-i.forbesimg.com/danwoods/files/2013/08/HilaryMason_AwesomeNerds_Graphic.png
Data access, Data cleaning, Analysis.
Exploratory Data Analysis.
Data reporting / Interpreting results.
Data Analysts with skills in math/statistics.
Hilary Mason
The Tools for Data Science
Acquire Data. Prepare Data.
Exploratory Data Analysis.
Statistical Inference. Data Reporting.
Rinse. Repeat.
R. RStudio – Free.
Statistics – OpenIntro
Code versioning - GitHub
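The acquire, prepare, explore and report steps can be sketched end to end on a toy dataset. The slide recommends R; the example uses Python only so that it runs with nothing but the standard library. The inline CSV, column names and helper functions are invented for illustration, and the statistical inference step is left out.

```python
import csv
import io
import statistics

# Acquire: a tiny inline CSV stands in for a real source (file, API, database).
RAW = """customer,age,monthly_spend
a,34,120.5
b,,80.0
c,45,not available
d,29,95.0
e,52,150.0
"""

def acquire(text):
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Keep rows whose numeric fields parse; drop the rest."""
    clean = []
    for row in rows:
        try:
            clean.append(
                {"age": int(row["age"]), "spend": float(row["monthly_spend"])}
            )
        except ValueError:
            continue  # missing or non-numeric value
    return clean

def explore(rows):
    spend = [r["spend"] for r in rows]
    return {
        "n": len(rows),
        "mean_spend": statistics.mean(spend),
        "stdev_spend": statistics.stdev(spend),
    }

def report(summary):
    return (
        f"{summary['n']} usable rows, mean spend "
        f"{summary['mean_spend']:.2f} (sd {summary['stdev_spend']:.2f})"
    )

summary = explore(prepare(acquire(RAW)))
print(report(summary))
```

Two of the five rows are dropped during preparation, a reminder that the usable sample is often smaller than the acquired one.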
Pause,
The Dangers of Data Science
Interpreting results is crucial.
Downsides…
The Dangers of Data Science
$1M Netflix prize, 2009
Contest to build a recommendation engine that could more accurately
predict the movies customers would like than Netflix’s in-house Cinematch.
Result: Not implemented.
“Additional accuracy gains that we measured did not seem to justify the
engineering effort needed to bring them into a production environment.”
- Netflix
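The Netflix Prize scored entries by root mean squared error (RMSE) on held-out ratings, with the grand prize tied to a 10% improvement over Cinematch. A minimal sketch of that metric follows; the ratings are made up and do not come from the Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up star ratings for five (user, movie) pairs.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]    # stand-in for an existing system
challenger = [3.8, 3.2, 4.5, 2.6, 3.9]  # stand-in for a contest entry

base_err = rmse(baseline, actual)
new_err = rmse(challenger, actual)
improvement = (base_err - new_err) / base_err

print(f"baseline {base_err:.3f}, challenger {new_err:.3f}, "
      f"improvement {improvement:.1%}")
```

The metric says nothing about engineering cost, which is the gap the Netflix quote points at: a better score is only one input to the decision to ship.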
The Tools for Data Science
Data Wrangling
… and where to get Data
R
Open Refine and Google Freebase
Perl
Microsoft Power Query for Excel
DataHero
Trifacta (formerly Data Wrangler)
Open Data www.data.gov
Google Public Data Explorer www.google.com/publicdata
KDNuggets www.kdnuggets.com/datasets
Wait… what about Hadoop?
The Hadoop Ecosystem
Image: Courtesy Hortonworks
The Hadoop Ecosystem
Image: Apache Foundation
The Hadoop Ecosystem – Data Lake
Images: Hortonworks, Cloudera
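The processing model at the heart of Hadoop is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process to show the idea. It uses no Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# Each string stands in for a block of a much larger file.
documents = ["big data is here", "data science needs data"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

On a real cluster the map and reduce functions run in parallel on the machines that hold the data, and the framework handles the shuffle and any failures.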
Managing Big Data / Data Science projects
How do I get started?
• Start small. Iterate. Prove value. Evolve.
• As in any project, getting buy-in is crucial.
• Don’t boil the ocean / No big-bang
• Be agile
Ongoing
• Look (and look out) for new business models
• Partner with academia
4 key points to assess feasibility:
• Technical
• Data
• Legal / Data Privacy
• Business value
Pause,
Be prepared for disruption.
External.
Or internal. Think cloud computing versus in-house IT (admin, DBA,…)
Managing Big Data / Data Science projects
The regulatory challenge – data privacy & legal
• Unethical but legal? Brand reputation at stake.
• Illegal but ethical? Possibility of changing laws.
• Hardline stances - regulators
• Grey areas
• Competition / Entrenched player / regulatory protection
Managing Big Data / Data Science projects
Security and availability aspects
• Cloud data masking. Privacy – yes, but also …
• Physical security
• Failover plan
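One common way to mask data before it leaves for the cloud is keyed pseudonymization: sensitive values are replaced by a keyed hash, so records can still be joined but the original values are not exposed. The sketch below assumes that approach. The field names, the key handling and the 16-character truncation are illustrative choices, not a recommendation from the slides.

```python
import hashlib
import hmac

# Placeholder key; in practice the key would live in a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Replace a value with a keyed hash that is stable for the same input."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def mask_record(record, sensitive_fields=("name", "email")):
    """Mask only the sensitive fields; leave analytical fields untouched."""
    return {
        key: pseudonymize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }

record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "policy_type": "auto",
    "premium": 420,
}
masked = mask_record(record)
print(masked)
```

Because the same input always maps to the same token, masked datasets can still be joined on the masked column. Whether that meets a given privacy regulation is a legal question, not a technical one.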
Resources
Learning Data Science with R
Coursera Data Science Specialization
from Johns Hopkins
Data Origami
Datacamp
Python:
LearnPython.org
CodeAcademy.com
Scala:
SimplyScala
Resources
Platforms and IDEs
Dataiku Studio
KNIME
AlpineNow
More on Hadoop and Big Data …
Data visualization and … exploration
Data Visualization Tools for the Data Scientist
Statistics plots in R –
Base R, Lattice plots, ggplot2 package
DataViz software
Tableau Public, Qlik Sense Desktop, Visualize Free
Exploration –
R, ZoomData
Questions
BI in the Digital Era
How to do Data Science & Big Data Analytics
Connect
Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com

BI in the Digital Era - Data Science and Big Data Analytics

  • 1. BI in the Digital Era How to do Data Science & Big Data Analytics Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com
  • 2. About Over 14 years experience in Data and Analytics. BI Program Delivery Manager, AXA Group Solutions Previous roles: Product manager, Pre-sales, Technical architect, Project manager and Consulting Organizations: Thomson Reuters, Saama (silicon valley analytics pure-play), Infosys, TCS Locations worked: India, UK, US, France Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com Maloy MANNA
  • 3. Introduction The future is digital. The digital revolution is changing the way we do business and live our lives. With increasing adoption of social media, smartphones, cloud computing and technological advancements in data storage and processing, the age of big data is upon us. In the digital era it is no more sufficient to have look-back reports and graphs when the competition is deriving insights from big data. It is time to test and learn new strategies and learn new skills of data science and big data analytics. In this webinar we'll look at the rapid changes in the way we do analytics and go beyond the hype to learn about practical approaches and tools you will need to do data science. Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com
  • 5. Defining the data explosion … Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. - Gartner, 2001 (Doug Laney) Volume SCALE OF DATA Variety TYPES OF DATA Velocity SPEED OF DATA GENERAION
  • 6. Big Data is here … 2.3 TRILLION GIGABYTES of data created each day40 ZETTABYTES [43 Trillion GIGABYTES] of data will be created by 2020, an increase of 300 times from 2005 6 BILLION People have cellphones World population: 7 Billion 4 BILLION + HOURS of video watched on YouTube each month 30 BILLION PIECES OF CONTENT are shared on Facebook every month 400 MILLION TWEETS sent per day by 200 million monthly active users Sources: McKinsey Global Institute, Twitter, Cisco, Gartner, EMC, SAS, IBM Modern cars have close to 100 SENSORS To monitor items like fuel level, tire pressure
  • 7. … and the Digital Revolution
  • 8. …and it isn’t just about Web 2.0 / Social E-TATTOOS Patents on stick-on tattoos by Google, Motorola (mc10) WEARABLES Fitbit, Apple, Google SENSOR-ENABLED PILLS Proteus 3D PRINTING Physical objects from digital models SMART GRID & METERS Digital power grid & meters
  • 9. … but also the Internet of Things
  • 10. The Digital Wave is … 1.75 BILLION smartphone users in 2014 World population: 7 Billion MOORE’S LAW doubling integrated circuits every 2 years 26 BILLION Estimated Connected Devices in the Internet Of Things ARTIFICIAL INTELLIGENCE & ROBOTICS
  • 11. …disrupting businesses Digital businesses operate at lower cost, at higher speed and are vastly more innovative and disruptive. They know how to make the most of opportunities provided by the Digital Revolution and capture new markets and build new business models. Winners Losers
  • 12. Expectations have changed… Business expectations have changed. Digital disruption is forcing business to move faster. The need for speed is the single most crucial expectation of BI from business. Business cannot afford to wait for months while IT integrates data sources and builds ETL to get to the “single version of truth”. Questions have evolved. It is no longer sufficient to have look-back reports. Newer business models ask newer questions: what-if, why, experiment, anticipate and predict. BI needs to evolve too.
  • 13. Expectations have changed… Trends driving changing expectations: Apps: Can I have an app for that? Search bar, multi-touch screen consumer tech. make users demand the same experience as Google or Apple, in business. Social: Share, Crowdsource, Collaborate. Flattens hierarchies. Decentralizes decision- making. Mobile: Smartphones and tablets deliver business on-the-go.
  • 14. Traditional BI Traditional BI = reports, dashboards, analysis, visualization “Current-state” questions: What was sold? SELECT * … When? Where? How much? GROUP BY Time, Store … Give me last quarter’s / month’s / week’s figures “Analysis” = Slice-and-dice, drill-down & across Performance improvements = Pre-built cubes, summary tables, indexes
  • 15. The ETL bottleneck Traditional BI constrained by ETL. Ever-increasing data. Ever-decreasing ETL time-window. More Performance!! Image: Courtesy Cloudera
  • 16. Traditional BI … problems Transactional data. Partially / fully aggregated. Structured data. Low-fidelity. Data lineage and traceability difficult. Constrained by ETL bottleneck. Strict data modeling required to build data structures BEFORE ETL. Fast-evolving requirements = Schema changes. Fill in a CHANGE REQUEST form! Unstructured data not allowed. Conform all LOB to “single-version-of-truth” Self-service – limited functionality, limited to power users. IT needs to help! Locked-down enterprise vs. Spreadmart Hell.
  • 17. ... So, how can BI evolve with Big Data? The data explosion has been accompanied by new tools and technologies for managing Big Data. The best known of these is, of course, Hadoop. But there are also other technologies, several of which are now being integrated into the Hadoop ecosystem. • Elastic cloud computing • NoSQL databases • In-memory computing • Data visualization
  • 18. Changing paradigm Business expectations have changed. Questions have evolved. Big Data = “Next State” questions What will happen? PREDICTIVE Why did this happen / why didn’t this happen? EXPLANATORY What would happen if we did… HYPOTHESIS How can we prevent …/ How to make this happen? RESPONSE Focus shift away from transactions to sub-transactions & behaviors.
  • 19. Changing paradigm With changing business expectations BI needs to evolve. BI in the Digital Era: A paradigm shift from “Current State” to “Next State” questions. Answering “Next State” questions requires a scientific approach. Design experiments, test hypotheses, derive inferences / interpret results. This is Data Science.
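One way to make "design experiments, test hypotheses, derive inferences" concrete is a two-group experiment evaluated with a permutation test. This is a hedged sketch, not a method named in the slides: the function `permutation_test`, the control and treatment figures, and the choice of a permutation test are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns (observed_difference, p_value). The p-value is the share of
    random relabelings whose absolute difference is at least as large as
    the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical experiment: daily orders without and with a new offer.
control = [10, 12, 11, 13, 12, 11]
treatment = [14, 15, 13, 16, 15, 14]

observed, p_value = permutation_test(control, treatment)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone. Whether it matters to the business is still a question of interpretation.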
  • 20. Data Science Why Data Science? Statistical Data Science. Data Scientist: The sexiest job of the 21st century. - Harvard Business Review
  • 21. Data Science Who is a Data Scientist? / What does a Data Scientist do? Images courtesy: Drew Conway / Forbes http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram http://b-i.forbesimg.com/danwoods/files/2013/08/HilaryMason_AwesomeNerds_Graphic.png Data access, Data cleaning, Analysis. Exploratory Data Analysis. Data reporting / Interpreting results. Data Analysts with skills in math/statistics. Hilary Mason
  • 22. The Tools for Data Science Acquire Data. Prepare Data. Exploratory Data Analysis. Statistical Inference. Data Reporting. Rinse. Repeat. Tools: R and RStudio (free). Statistics: OpenIntro. Code versioning: GitHub.
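The first three steps of the workflow above (acquire, prepare, explore) can be sketched in a few lines. The slide recommends R. Python's standard library is used here purely as an illustration, and the CSV content, column names and the functions `acquire`, `prepare` and `explore` are hypothetical.

```python
import csv
import io
from statistics import mean, median, stdev

# Hypothetical raw extract with the kind of gaps real data tends to have.
RAW_CSV = """store,weekly_sales
Paris,120
London,80
Lyon,
Paris,150
London,not available
Lyon,95
"""


def acquire(text):
    """Acquire: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows):
    """Prepare: keep only rows whose sales value parses as a number."""
    clean = []
    for row in rows:
        try:
            clean.append((row["store"], float(row["weekly_sales"])))
        except ValueError:
            continue  # skip missing or non-numeric values
    return clean


def explore(records):
    """Explore: basic summary statistics of the cleaned values."""
    values = [value for _, value in records]
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


summary = explore(prepare(acquire(RAW_CSV)))
print(summary)
```

Two of the six rows are dropped during preparation, which is exactly the sort of detail that should be reported alongside the results.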
  • 23. Pause: The Dangers of Data Science Interpreting results is crucial.
  • 24. Downsides… The Dangers of Data Science $1M Netflix prize, 2009 Contest to build a recommendation engine that could more accurately predict the movies customers would like than Netflix’s in-house Cinematch. Result: Not implemented. “Additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.” - Netflix
  • 25. The Tools for Data Science Data Wrangling … and where to get Data. Data wrangling: R; OpenRefine and Google Freebase; Perl; Microsoft Power Query for Excel; DataHero; Trifacta (formerly Data Wrangler). Open Data: www.data.gov; Google Public Data Explorer, www.google.com/publicdata; KDnuggets, www.kdnuggets.com/datasets
  • 26. Wait… what about Hadoop? The Hadoop Ecosystem Image: Courtesy Hortonworks
  • 27. The Hadoop Ecosystem Image: Apache Foundation
  • 28. The Hadoop Ecosystem – Data Lake Images: Hortonworks, Cloudera
  • 29. Managing Big Data / Data Science projects How do I get started? • Start small. Iterate. Prove value. Evolve. • As in any project, getting buy-in is crucial. • Don’t boil the ocean / No big-bang • Be agile Ongoing • Look (and look out) for new business models • Partner with academia 4 key points to assess feasibility: • Technical • Data • Legal / Data Privacy • Business value
  • 30. Pause: Be prepared for disruption. External. Or internal. Think cloud computing against in-house IT (admin, DBA,…)
  • 31. Managing Big Data / Data Science projects The regulatory challenge – data privacy & legal • Unethical but legal? Brand reputation at stake. • Illegal but ethical? Possibility of changing laws. • Hardline stances - regulators • Grey areas • Competition / Entrenched player / regulatory protection
  • 32. Managing Big Data / Data Science projects Security and availability aspects • Cloud data masking. Privacy – yes, but also … • Physical security • Failover plan
  • 33. Resources Learning Data Science with R: Coursera Data Science Specialization from Johns Hopkins; Data Origami; DataCamp. Python: LearnPython.org; Codecademy.com. Scala: SimplyScala
  • 34. Resources Platforms and IDEs Dataiku Studio KNIME AlpineNow
  • 35. More on Hadoop and Big Data …
  • 36. Data visualization and … exploration Data Visualization Tools for the Data Scientist. Statistical plots in R: base R, lattice plots, ggplot2 package. DataViz software: Tableau Public, Qlik Sense Desktop, Visualize Free. Exploration: R, ZoomData
  • 37. Questions BI in the Digital Era How to do Data Science & Big Data Analytics Connect Maloy MANNA linkedin.com/in/maloy biguru.wordpress.com