Big Data and Big Analytics –
Why, what and how
Agenda
• Big Data and Big Analytics – What is it?
• Big Analytics vs. the Data Warehouse?
• Big Analytics examples
• Database technologies for Big Analytics
• Questions and Answers
What is Big Data?
• Big Data is data that is not
immediately related to my own
business
• Big Data is largely unstructured
• Big Data consists of data from many
different sources, such as
Facebook, Twitter, web pages, blogs
and any other source you can find
• Big Data is all about volume and
analysis!
Because you want to grow your
business!
• You can get customers from your competitors
– The data on these customers are not in your CRM!
– Why did they choose someone else over
you? Your Data Warehouse has few answers to
this!
• You can grow the market
– Those new customers are not in your CRM or Data
Warehouse either, to a large extent!
• You can do both of these!
Why do I need all this data?
• “My Data Warehouse tells me all I ever want
to know, in gruesome detail, about my
customers, what more do I need?”
• “I get much more data from my CRM system
than I do from friggin’ Facebook!”
• “Why would I need all those pictures from
Facebook and all those twitter texts, they tell
me nuthin’!”
What is Big Analytics?
• To get insights from Big Data, you need a more
powerful analysis: Big Analytics
• Big Analytics often cannot rely
on simple BTREE indexes
• Big Analytics generally provides
better accuracy
the more data you have
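How much accuracy extra data buys can be sketched with the standard error of an estimated share. This is an illustration only: the 30% share of negative tweets and the sample sizes are made-up assumptions, and real gains depend on data quality as much as on volume.

```python
import math

def standard_error(share: float, sample_size: int) -> float:
    """Standard error of a proportion estimated from a simple random sample."""
    return math.sqrt(share * (1.0 - share) / sample_size)

# Hypothetical assumption: 30% of all tweets about a product are negative.
ASSUMED_NEGATIVE_SHARE = 0.30

for n in (100, 10_000, 1_000_000):
    se = standard_error(ASSUMED_NEGATIVE_SHARE, n)
    print(f"{n:>9} tweets -> estimate is typically off by about {se:.4%}")
```

For a simple estimate like this one, the error shrinks with the square root of the sample size: 100 times more tweets gives an estimate about 10 times tighter.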
What is Big Analytics useful for?
• For getting information on things
in the “outside world”
– My competitors
– My competitors’ customers
• For foreseeing trends
– What will be “the next big thing” in my business?
– What new markets are developing?
– What is happening in my current market?
Big Data, Analytics and Insights!
Big Data
Big Analytics
Big Insights!
Big Analytics use cases
• The higher the volume of your business, the
more useful Big Data becomes
– If you have very few customers, Big Data might be
less useful
• Retail is a common use case, but there are
many more
– Finance – Big Data trend analysis
– Intelligence – Analysis of new and unknown
trends and loosely tied groups
– Politics – What is my competition up to?
Big Analytics vs. Data Warehouse
• Your Data Warehouse is very focused and
contains high quality information on low level
data:
“John Doe bought Chocko Chocolate Chip
Cookies for $3.61 on Jan 12 2013”
• Big Data provides much more data, but each
information item has less detail to it:
“Chocko Chocolate Chip Cookies suck!”
“An increasing number of people tweet about
Chocolate Chip Cookies”
Big Analytics vs. Data Warehouse
• What Big Analytics lacks in terms of data item
correctness can be compensated for by:
– Volume: If more than 200,000 tweets agree that
our Chocko cookies suck, then we should probably
look into it.
– Proper analysis: Images can be analyzed for
content and stuff you didn’t think about: Maybe
“Ma Cookies” brand cookies have an edge on us in
that their packaging looks more pleasing? Do we
see “Ma Cookies” being eaten in unexpected
places or at unexpected times?
Big Analytics - Linguistic analysis
• This is for tweets, blogs, Facebook and similar.
Proper linguistic analysis is complex:
– Sentiment
“Ma Cookies might seem like they suck, but they
are actually quite tasty”
– Temporal
“In January 2011 we wrote that Chocko Cookies
used to taste like manure in 2008, but that they
have improved since then”
– Ranking
– Really complex for larger blocks of text
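Why sentiment is hard can be sketched with a toy scorer. Everything here is a hypothetical illustration: the word lists are hand-made, and the "weight the clause after 'but' double" rule is a crude heuristic chosen for this example, not an established method or a real linguistic engine.

```python
import re

# Tiny hand-made word lists; hypothetical, for illustration only.
POSITIVE_WORDS = {"tasty", "improved", "pleasing"}
NEGATIVE_WORDS = {"suck", "sucks", "manure"}

def naive_score(text: str) -> int:
    """Count positive words minus negative words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)

def but_aware_score(text: str) -> int:
    """Weight the clause after the last 'but' twice as heavily as what precedes it."""
    parts = re.split(r"\bbut\b", text.lower())
    if len(parts) == 1:
        return naive_score(text)
    head, tail = " ".join(parts[:-1]), parts[-1]
    return naive_score(head) + 2 * naive_score(tail)

tweet = "Ma Cookies might seem like they suck, but they are actually quite tasty"
print("naive:", naive_score(tweet))
print("but-aware:", but_aware_score(tweet))
```

Plain word counting sees one negative and one positive word and calls the tweet neutral, while a reader sees praise. Even the small fix shown here only handles one sentence pattern, which is why larger blocks of text get complex quickly.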
Other types of Big Analytics
• Image analysis is a fast developing field,
where we find new and interesting use cases
– What are the most popular colors?
– What color are people’s clothes?
– How long has that suitcase been standing on the
floor at the airport?
• Location analysis
– Where did this happen?
– In what city is that? What country?
• Temporal analysis
– When did this happen? When was it published?
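The "most popular colors" question above can be sketched as a counting problem. This is a minimal illustration under stated assumptions: the pixel values are made up, the bucket size of 64 is arbitrary, and decoding real image files is left out.

```python
from collections import Counter

def quantize(pixel, step=64):
    """Round each RGB channel down to a multiple of `step` so similar shades share a bucket."""
    r, g, b = pixel
    return (r // step * step, g // step * step, b // step * step)

def most_popular_colors(pixels, top=2):
    """Return the `top` most frequent color buckets with their pixel counts."""
    return Counter(quantize(p) for p in pixels).most_common(top)

# Hypothetical pixels; in practice they would be decoded from image files.
pixels = [(250, 10, 10), (245, 20, 5), (10, 200, 30), (255, 0, 0), (12, 210, 25)]
print(most_popular_colors(pixels))
```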
New Visualizations for New Insights
• Visualizing data as a report with columns and
rows isn’t always effective
• With new and diverse types of data, we need
new ways of visualizing data
– Location on maps
– Timelines
– Sentiments
• Even with traditional Data Warehouse data,
new visualizations can provide new insights!
• Interactive visualizations
Big Analytics and Visualization examples
What is Mitt Romney talking about?
Map Visualization – Android or iOS
Visualizations by MapBox
• Smartphone OS metadata in Geography view
– iPhone is Red, Android is Green
– Based on data from Verizon passed to NSA
Big Analytics database issues
• Big Analytics is complex!
• Big Analytics doesn’t always allow the
“analyze-once-find-later” attribute
of a classic index!
• Big Analytics is compute intensive
• Big Analytics needs some
programming. Yikes!
Map-Reduce to the rescue
• Map-Reduce allows distributed processing on
large amounts of data
– Map – Algorithm that processes each piece of data in parallel on the nodes
– Reduce – Algorithm that aggregates the mapped results from the nodes
• Hadoop is the best-known and most widely used Map-Reduce
framework
• Map and Reduce still must be developed
• But we still need some kind of database
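The two phases can be sketched in a few lines of plain Python. This is a single-process illustration, not Hadoop code: the tweets, the brand list and all function names are hypothetical, and a real framework would run the map calls on separate nodes and handle the grouping step itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample data: each inner list stands for the tweets held by one node.
TWEETS_PER_NODE = [
    ["chocko cookies suck", "ma cookies are tasty"],
    ["chocko cookies improved", "i love ma cookies"],
]
BRANDS = {"chocko", "ma"}

def map_tweet(tweet: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (brand, 1) pair for every brand mentioned in one tweet."""
    for word in tweet.split():
        if word in BRANDS:
            yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key; a framework does this between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: aggregate all values emitted for one key."""
    return key, sum(values)

mapped = [pair for node in TWEETS_PER_NODE for tweet in node for pair in map_tweet(tweet)]
totals = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
print(totals)
```

Note that both `map_tweet` and `reduce_counts` had to be written by hand for this one question, which is the "still must be developed" point above.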
So, what we need is an Analytical
Database
• Support for complex analysis
• Support for distributed, parallel processing
(Map-Reduce for example)
• Support for storing and processing massive
amounts of data
• Some kind of cool index technology that works
with big data, both reads and writes
– Or maybe. A scary idea just came to me…
No indexes! Because you don’t
need or want them!
• What! What’s wrong with good old BTREEs?
– They are not well suited to Big Data!
– Their usefulness declines as data grows
– Updates slow down significantly as the tree
grows!
– Skewed data doesn’t work well
• SPATIAL? FREETEXT? HASH? BITMAP?
– These are either too specialized or lack the
functionality we need
Calpont InfiniDB
Real-time, Consistent Query Performance
Linear Scale for Massive Data
Removes Limits to Dimensions and Granularity
Easy to Deploy and Maintain
Tiered Query Execution
• User Module – Processes SQL Requests
• Performance Module – Executes the Queries
Single Server or MPP
Map-Reduce for Powerful Analytics
SQL Operations are mapped to Performance Module threads
• Parallel/Distributed Data Access
• Parallel/Distributed Joins (Inner, Outer)
• Parallel/Distributed Sub-queries (From, Where, Select)
• Parallel/Distributed Group By, Distinct, and Aggregation
• Extensible with Parallel/Distributed User Defined Functions
Results are returned to User Module in Reduce Phase
Map → Reduce →
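The idea of mapping a SQL aggregation to parallel workers and combining the results in a reduce step can be sketched as follows. This is not InfiniDB's implementation: the rows, the partitioning and the function names are hypothetical, and Python threads are used only to show the structure, not to demonstrate speed. The query being mimicked is roughly SELECT product, SUM(amount) ... GROUP BY product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of (product, amount_in_cents); each partition stands for
# the slice of data that one worker scans.
PARTITIONS = [
    [("chocko", 361), ("ma", 299), ("chocko", 361)],
    [("ma", 299), ("chocko", 722)],
]

def partial_group_by(rows):
    """'Map' side: one worker computes SUM(amount) GROUP BY product for its partition."""
    sums = Counter()
    for product, amount in rows:
        sums[product] += amount
    return sums

def merge(partials):
    """'Reduce' side: the coordinator combines the partial aggregates."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

with ThreadPoolExecutor(max_workers=2) as pool:
    result = merge(pool.map(partial_group_by, PARTITIONS))

print(result)
```

Because each worker only returns a small partial aggregate, the coordinating step stays cheap even when the partitions themselves are large.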
Calpont InfiniDB
• Support for Amazon EC2
– Full EBS support
– Prepackaged AMIs for ease of provisioning
• Hadoop connector
• Multiple parallel load
options
• Available now!
• This is true of analytics in general, but particularly
true when working with Big Analytics
• The more data you have, the more
relevant questions you can ask
• The more questions you ask, the more
you know
• The more you know, the more questions
you can ask
• The wider the range of data you have, the wider
the questions you can ask
If you think you have all the right answers,
you haven’t asked all the right questions
Questions? Answers!
The question is not “What is
the answer?”, the question is
“What is the question?”.
Henri Poincaré
