Big data rmoug
We are a managed service AND a solution provider of elite database and system administration skills in Oracle, MySQL, and SQL Server.
Big Data is a marketing term, like cloud. All kinds of databases get called "Big Data". But in order for "Big Data" to define a set of solution architectures, we need to define the problem we are solving.

The first requirement of a Big Data data store is that it be a good fit for storing large volumes of data. "Large volume" can mean different things to different people, but if you have less than 5 TB of data, you'll need to work hard to convince me that you need a Big Data solution.

The second requirement is variety – the need to store not just short strings and numbers, but also long texts from emails, web log files, XML, images, and video. It also refers to requirements around frequent changes in the data types stored; some databases deal with schema changes better than others.

Velocity – the requirement to store or serve data very quickly, even under highly concurrent load. The data store should minimize overhead and locking.

Value – the data requires a large amount of processing in order to extract business value, and the data store should support this.

Visualization – when the amounts of data are huge, new techniques for extracting value are required, and data visualization is gaining prominence as a method of data exploration. Big Data solutions should be well integrated with visualization solutions.
One of the main reasons for the explosion of data stored in the last few years is that many problems are easier to solve if you apply more data to them.

Take the Netflix Challenge, for example. Netflix challenged the AI community to improve the movie recommendations Netflix makes to its customers, based on a database of ratings and viewing history. Teams that used the available data more extensively did better than teams that used more advanced algorithms on a smaller data set.

More data also allows businesses to make better, more informed decisions. Why have focus groups decide on a new store design, if you can re-design several stores and compare how customers proceeded through each store and how many left without buying? Online stores make the process even easier.

Modern businesses are becoming more scientific and metrics-driven, relying less on "gut feeling" as the cost of running business experiments and measuring the results decreases.
Data also arrives in more forms and from more sources than ever. Some of these don't fit into a relational database very well, and for some, the relational database does not have the right tools to process the data.

One of Pythian's customers analyzes social media sources, allowing companies to find comments about their performance and service and respond to complaints via non-traditional customer support routes.

Storing Facebook comments and blog posts in Oracle for later processing results in most of the data getting stored in BLOBs, where it is relatively difficult to manage. Most of the processing is done outside of Oracle using Natural Language Processing tools. So why use Oracle for storage at all? Why not store and process the documents elsewhere and only store the ready-to-display results in Oracle?
Companies like Infochimps sell organized public information that can be combined with the data collected by the business itself. This is mostly geographically based information such as houses for sale, local businesses, community surveys, and even petroleum reports. Such information can be valuable for marketing departments, and it is not only for sale: it is accessible through a programmable API, so new data can arrive on the fly, on a regular basis, in your data center.

In general, the trend is that businesses use more and more data that did not originate within the company, whether tweets or purchased data. This means that the business has little control over the format of the data as it arrives, and the format can change overnight.
Data, especially from outside sources, does not arrive in perfect condition to be useful to your business.
Not only does it need to be processed into useful formats, it also needs:
•   Filtering for potentially useful information. 99% of everything is crap
•   Statistical analysis – is this data significant?
•   Integration with existing data
•   Entity resolution. Is "Oracle Corp" the same as "Oracle" and "Oracle Corporation"?
•   De-duplication

Good processing and filtering of data can reduce the volume and variety of data. It is important to distinguish between true and accidental variety.

This requires massive use of processing power. In a way, there is a trade-off between storage space and CPU. If you don't invest CPU in filtering, de-duplicating, and entity resolution, you'll need more storage.
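To make the entity-resolution and de-duplication steps a bit more concrete, here is a minimal sketch in plain Python. The normalization rules and the sample records are made up for this example; real entity resolution uses much richer rules, reference lists, and fuzzy matching.

```python
import re

# Hypothetical normalization rules for company names (illustrative only).
SUFFIXES = re.compile(r"\b(corp|corporation|inc|incorporated|ltd|llc)\.?$", re.IGNORECASE)

def normalize(name: str) -> str:
    """Reduce a company name to a canonical key."""
    key = name.strip().lower()
    key = SUFFIXES.sub("", key)             # drop legal-form suffixes
    key = re.sub(r"[^a-z0-9 ]", " ", key)   # drop punctuation
    return " ".join(key.split())            # collapse whitespace

records = ["Oracle Corp", "Oracle", "Oracle Corporation", "oracle  corp.", "MySQL AB"]

# De-duplicate: keep one representative record per canonical key.
seen = {}
for r in records:
    seen.setdefault(normalize(r), r)

print(seen)   # {'oracle': 'Oracle Corp', 'mysql ab': 'MySQL AB'}
```

The point is not the specific rules; it is that this kind of CPU-heavy cleanup is exactly the trade-off between processing and storage mentioned above.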
•   Bad schema design is not big data
•   Using 8-year-old hardware is not big data
•   Not having a purging policy is not big data
•   Not configuring your database and operating system correctly is not big data
•   Poor data filtering is not big data either

Keep the data you need and use, in a way that you can actually use it.
If doing this requires cutting-edge technology, excellent! But don't tell me you need NoSQL because you don't purge data and have un-optimized PL/SQL running on 10-year-old hardware.
The new volume of data, and the need to transform it, filter it, and clean it up, require:
1. Not only more storage, but also faster access rates
2. Reliable storage. We want high availability and resilient systems
3. Access to as many cores as you can get, to process all this data
4. Cores that are as close to the data as possible, to avoid moving large amounts of data over the network
5. An architecture that allows many of the cores to be used in parallel for data processing
Data warehouses require the data to be structured in a certain way, and it has to be structured that way before the data gets into the data warehouse. This means that we need to know all the questions we would like to answer with this data when designing the schema for the data warehouse.

This works very well in many cases, but sometimes there are issues:
•   The raw data is not relational – images, video, text – and we want to keep the raw data for future use
•   The requirements from the business frequently change

In these cases it is better to store the data as it arrived and impose structure on it as it is parsed and processed (see the sketch below). This allows the business to move from large up-front design to just-in-time processing.

For example: the Astrometry project searches Flickr for photos of the night sky, identifies the part of the sky each photo is from and the prominent celestial bodies in it, and creates a standard database of the positions of elements in the sky.
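A minimal schema-on-read sketch in Python to show the "store now, impose structure when you read" idea. The record layout (JSON text lines) is invented for the example; the point is that each question supplies its own parsing, so a new question does not require redesigning a schema or reloading data.

```python
import json

# Raw events are stored exactly as they arrived; no up-front schema.
raw_events = [
    '{"type": "view", "user": "u1", "page": "/shoes"}',
    '{"type": "buy",  "user": "u1", "sku": "S-17", "amount": 59.90}',
    '{"type": "view", "user": "u2", "page": "/shoes", "referrer": "ad-42"}',
    'not even valid JSON - kept anyway, maybe useful later',
]

def parse_or_none(line: str):
    """Schema-on-read: structure is imposed only at query time."""
    try:
        return json.loads(line)
    except ValueError:
        return None   # unparseable lines are skipped by this question, not lost

# Question 1: how many page views per page?
views = {}
for e in filter(None, map(parse_or_none, raw_events)):
    if e.get("type") == "view":
        views[e["page"]] = views.get(e["page"], 0) + 1
print(views)   # {'/shoes': 2}

# Question 2 (asked months later): total revenue. No schema change needed.
print(sum(e.get("amount", 0) for e in filter(None, map(parse_or_none, raw_events))))
```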
Hadoop is the most common solution for the new Big Data requirements. It's a scalable distributed file system, plus a distributed job processing system on top of the file system. This lets companies keep massive amounts of unstructured data and process it efficiently. The assumption behind Hadoop is that most jobs will want to scan entire data sets, not specific rows or columns, so efficient access to specific data is not a core capability.

Hadoop is open source, and there is a large ecosystem of tools, products, and appliances built around it: open source tools that make data processing on Hadoop easier and more accessible, BI and integration products, improved implementations of Hadoop that are faster or more reliable, Hadoop cloud services, and hardware appliances.
Divide the job into many small tasks, each operating on a separate set of data.
Run each task on the machine that holds its data. If one machine is busy, we can find another with the same data. Machines and tasks are constantly monitored.
Move programs around, not data.
If things are still too slow, more servers (with more disks, allowing more data replication) and more cores are added.
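To make "small tasks on separate slices of data" concrete, here is a minimal Hadoop Streaming-style sketch in Python: a mapper and a reducer that count hits per URL in web logs. The log layout (URL in the seventh space-separated field) is an assumption for the example. With Hadoop Streaming, many copies of the mapper run in parallel, each against the blocks stored on its own node, and the framework sorts the mapper output by key before it reaches the reducers.

```python
# ---- mapper.py : one copy runs per input split, on the node holding the data ----
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 6:                       # assumed log layout: URL is field 7
        print(f"{fields[6]}\t1")              # emit key<TAB>value

# ---- reducer.py : receives the mapper output already sorted by key ----
import sys

current_url, count = None, 0
for line in sys.stdin:
    url, value = line.rstrip("\n").split("\t")
    if url != current_url:
        if current_url is not None:
            print(f"{current_url}\t{count}")  # flush the previous key
        current_url, count = url, 0
    count += int(value)
if current_url is not None:
    print(f"{current_url}\t{count}")
```

It would be launched with something like `hadoop jar hadoop-streaming.jar -input weblogs -output hits -mapper mapper.py -reducer reducer.py`; the exact jar location and the options for shipping the scripts depend on the distribution.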
Modern data centers generate huge amounts of logs from applications and web services.
These logs contain very specific information about how users are using our application and how the application performs.
Hadoop is often used to answer questions like:
•   How many users use each feature in my site?
•   Which page do users usually go to after visiting page X?
•   Do people return more often to my site after I made the new changes?
•   What usage patterns correlate with people who eventually buy a product?
•   What is the correlation between slow performance and purchase rates?

Note that the web logs could be processed, loaded into an RDBMS, and parsed there. However, we are talking about very large amounts of data, each piece of data needs to be read just once to answer each question, and there are very few relations in it. Why bother loading all of this into an RDBMS?
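As an illustration of the "which page do users usually go to after visiting page X" question, here is a small sketch in plain Python over already-parsed click records. The record layout is invented for the example; on Hadoop the same grouping and counting would be spread across mappers and reducers, but the logic is the same.

```python
from collections import Counter, defaultdict

# (user, timestamp, page) click records -- layout invented for the example
clicks = [
    ("u1", 1, "/home"), ("u1", 2, "/pricing"), ("u1", 3, "/signup"),
    ("u2", 1, "/home"), ("u2", 2, "/docs"),
    ("u3", 1, "/pricing"), ("u3", 2, "/signup"),
]

# Group each user's clicks in time order, then count page-to-page transitions.
by_user = defaultdict(list)
for user, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
    by_user[user].append(page)

next_page = defaultdict(Counter)
for pages in by_user.values():
    for here, there in zip(pages, pages[1:]):
        next_page[here][there] += 1

print(next_page["/pricing"].most_common(1))   # [('/signup', 2)]
```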
Hadoop has large storage, high bandwidth, lots of cores, and was built for data aggregation.
Also, it is cheap.
Data is dumped from the OLTP database (Oracle or MySQL) to Hadoop. Transformation code is written on Hadoop to aggregate the data (this is the tricky part), and the result is loaded into the data warehouse (usually Oracle).
This is such a common use case that Oracle built an appliance especially for it.
A lot of the modern web experience revolves around websites being able to predict what you'll do next, or what you'd like to do but don't know about yet:
•   People you may know
•   Jobs you may be interested in
•   Other customers who looked at this product eventually bought…
•   These emails are more important than others

To generate this information, usage patterns are extracted from OLTP databases and logs, the data is analyzed, and the results are loaded into an OLTP database again for use by the customer.

The analysis task started out as a daily batch job, but soon users expected more immediate feedback. More processing resources were brought in to speed up the process. Then the system started incorporating customer feedback into the analysis when making new recommendations. This new information needed more storage and more processing power.
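A minimal sketch of the "customers who looked at this product eventually bought…" idea, in plain Python: count which items co-occur in the same customers' histories and recommend the strongest co-occurrences. Real systems add weighting, recency, and far larger scale; the sample data is made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Made-up purchase histories: customer -> set of items
histories = {
    "alice": {"camera", "tripod", "sd-card"},
    "bob":   {"camera", "sd-card"},
    "carol": {"camera", "tripod"},
    "dave":  {"laptop", "sd-card"},
}

# Count how often each pair of items appears in the same history.
co_bought = defaultdict(Counter)
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        co_bought[a][b] += 1
        co_bought[b][a] += 1

def recommend(item, n=2):
    """Items most often bought together with `item`."""
    return [other for other, _ in co_bought[item].most_common(n)]

print(recommend("camera"))   # ['sd-card', 'tripod'] (or the reverse order on a tie)
```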
The best use cases for Hadoop are either storing large amounts of unprocessed data or off-loading computationally intensive tasks away from expensive Oracle cores.
Businesses want to be able to respond to events automatically and immediately.
This usually means comparing current information to historical data and responding to trends and outliers immediately.
This means speeding up the rates at which data arrives, is stored, and is processed, and at which the results are served.
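As a tiny illustration of "compare current information to historical data and respond to outliers immediately", here is a sketch in plain Python: a rolling window of recent measurements serves as the history, and a new value far from the historical mean triggers a response. The window size, threshold, and metric are placeholders.

```python
from collections import deque
from statistics import mean, stdev

class OutlierDetector:
    """Flag values more than `k` standard deviations away from recent history."""
    def __init__(self, window=100, k=3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def observe(self, value):
        is_outlier = False
        if len(self.history) >= 10:                      # need some history first
            m, s = mean(self.history), stdev(self.history)
            is_outlier = s > 0 and abs(value - m) > self.k * s
        self.history.append(value)
        return is_outlier

detector = OutlierDetector(window=50, k=3.0)
for latency_ms in [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 19, 250]:
    if detector.observe(latency_ms):
        print(f"alert: latency {latency_ms} ms is far outside recent history")
```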
Recommendation systems are an excellent example of how big data brings value to a business.

You get customers to buy more by processing more data with smarter analysis. And these are iterative feedback systems.

The same idea can work within the organization – the recommendations can be on business decisions, made to executives, not necessarily for external customers.

Different tools can be used: analysis of relationship graphs, correlations between past purchases, and clustering of products and customers into groups with similar attributes.
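A small sketch of the "cluster products and customers into groups with similar attributes" idea. It assumes scikit-learn is available; the attributes (visits per month, average basket size) and the number of clusters are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer attributes: [visits per month, average basket in dollars]
customers = np.array([
    [2,  15], [3,  20], [1,  10],      # occasional, small baskets
    [20, 30], [25, 35], [22, 28],      # frequent, mid-size baskets
    [5, 400], [4, 350], [6, 420],      # rare but very large purchases
], dtype=float)

# Scale the attributes so dollars don't dominate visit counts, then cluster.
scaled = StandardScaler().fit_transform(customers)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

for row, label in zip(customers, labels):
    print(f"visits={row[0]:>4.0f}  basket=${row[1]:>5.0f}  ->  group {label}")
```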
Example stolen from Greg Rahn to show why a chart is a powerful data exploration tool for big data.
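I can't reproduce the original example here, but a minimal sketch of the same point, assuming numpy and matplotlib are available: two datasets with nearly identical means and standard deviations look completely different the moment you plot them.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Two datasets with (nearly) the same mean and standard deviation...
unimodal = rng.normal(loc=50, scale=25, size=10_000)
bimodal = np.concatenate([rng.normal(25, 5, 5_000), rng.normal(75, 5, 5_000)])

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(f"{name}: mean={data.mean():5.1f}  std={data.std():5.1f}")

# ...but the summary statistics hide completely different shapes.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True)
for ax, (name, data) in zip(axes, (("unimodal", unimodal), ("bimodal", bimodal))):
    ax.hist(data, bins=50)
    ax.set_title(name)
plt.show()
```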
Oracle's Big Data machine was built to move data between the Oracle RDBMS and Hadoop fast, and I doubt anyone can beat Oracle at that.
Both the tools that are bundled with the machine and the fast InfiniBand connection to Exadata make it very attractive for businesses wishing to use Hadoop as an ETL solution. Note that the tools should also be available.