Database Shootout:
what's best for BI?
2
The New Data Warehousing
Source: Timo Elliott, SAP
3
Let's just buy Teradata...
Forrester Wave, as of April 2011
Gartner Group MQ, as of February 2012
4
Or not...
5
But...
⇨ Back to basics: BI & DWH
⇨ The Need for Speed
⇨ Database Architectures
⇨ #BigData & the Hadoop Hoopla
⇨ The forgotten power of OLAP & MDX
⇨ A Cloudy future?
⇨ Shootout: Evaluating alternatives
7
Business Intelligence (BI)
“Process of identifying, collecting,
combining, analyzing, interpreting
and communicating internal and
external information to support
decision making processes”
“Concepts and methods to improve
business decision making by using
fact-based support systems”
First definition of BI: 1958!
8
Business Intelligence is....
Doing useful stuff with data, in order to…
Support the
Decision Making
process
So why not simply use
Decision Support Systems?
9
How it all started in 1958...
⇨ Hans Peter Luhn (IBM) → A Business Intelligence System
The notion of intelligence is also defined here, in a
more general sense, as the “ability to apprehend
the interrelationships of presented facts in such a
way as to guide action towards a desired goal.”
Full text on Timo Elliott's blog:
http://timoelliott.com/blog/2007/11/the_real_pioneer_of_business_i.html
10
Luhn's Vision (1958!)
A Business Intelligence System
Abstract: An automatic system is being developed to disseminate
information to the various sections of any industrial, scientific or
government organization. This intelligence system will utilize data-processing
machines for auto-abstracting and auto-encoding of documents
and for creating interest profiles for each of the “action points” in an
organization. Both incoming and internally generated documents are
automatically abstracted, characterized by a word pattern, and sent
automatically to appropriate action points. This paper shows the flexibility
of such a system in identifying known information, in finding who needs to
know it and in disseminating it efficiently either in abstract form or as a
complete document.
11
BI is dead; long live Analytics?
12
12
Business Analytics
Evolution
13
The Evolution of Enterprise Business
Intelligence
Enterprise Decision Management
Embracing all relevant data sources
BI injected into everyday business
processes
Master Data Management
Advanced Data Mining / Analytics
Business Activity Monitoring
Common Information &
Processes
Business Value
Disconnected Silos of
Information
Query/Reporting/Online Analytical Processing
Content Management/Data Warehousing
Search
2000 2005 2010 2015
Image courtesy Jim Fitzgerald, IBM research
14
Evolving Business Intelligence Platform Requirements
Image courtesy Teradata
15
Passive Monitoring: BI Starting Point
Source: Mark Madsen, Third Nature
16
Supporting Better Analysis: Common Next Step
Source: Mark Madsen, Third Nature
17
Active Monitoring
Source: Mark Madsen, Third Nature
18
Active Monitoring and Feedback
Source: Mark Madsen, Third Nature
19
Analysis
Source: Mark Madsen, Third Nature
20
Prediction (passive)
Source: Mark Madsen, Third Nature
21
Active Prediction
Source: Mark Madsen, Third Nature
22
Prescription and Enterprise Decision Management
Source: Mark Madsen, Third Nature
24
Watch out, the world is changing
750TB per week (compressed)
⇨ Sensor data
⇨ RFID
⇨ Sentiment Analysis
⇨ Text mining
⇨ Location data
25
Machines generate most data
26
Remember the Origins
•The general conception of a
separate architecture for BI has been
around longer, but this is the first
formal relational architecture and
definition published.
•One thing left out of most designs:
the box labeled business process
definitions.
“An architecture for a business and
information system”, B. A. Devlin, P. T.
Murphy, IBM Systems Journal, Vol.27,
No. 1, (1988)
27
2012: we're still doing this!
Staging
Area
CSV
Files
ETL
ERP
DBMS
Sources ETL Process Data Warehouse EUL
DBMS
Files
ETL
Central DWH &
Data Marts
DBMS ETL
End User Layer,
in case you were
wondering ;-)
28
We’ve (also) accumulated over 20 years of changes
Databases Documents Flat Files XML Queues ERP Applications
Source Environments
Data Consumers
Databases Dashboards OLAP Productivity BAM/BPM Reporting ETL Data Mining Applications
Warehouse
Database
ETL
Marts
ODS
EDR EII
Content
Store
EAI
Stream
processing
SQL Service API
29
The assumption of the warehouse as a
database is gone
29
Traditional tabular
or structured data
Data at rest
Non-traditional
data (logs, audio,
documents)
Parallel
programming
platforms
Databases
Streaming
DBs/engines
Message
streams
Data in motion
Slide 29
Copyright Third Nature, Inc.
30
The Need for Speed
31
Why BI Projects Fail?
1. Query Performance Too Slow
(BI Survey 9)
70% of DWHs experience
performance constraints
of various types
(Gartner DWH MQ 2010)
Poor Query Performance
No 1 reason for replacing DWH
(TDWI Best Practices)
32
Two dimensions of Speed
Companies wishing to maximize BI benefits
should focus on
1) support quality
2) implementation time
3) query response time and
4) breadth of deployment,
in that order.
33
Minimize Implementation Time
Use 'RTF': vs
Off the shelf: Etc.
+ use Agile methods like Scrum or DSDM
+ look at Data Vault model & methodology
34
Minimize Query Response Time
Source: TDWI Next generation Data Warehouse Platforms,
By Philip Russom
Why replace
a data
warehouse
solution?
35
Solving Performance Problems
Replace every single thing before the database?
Migrating to an analytic database is twice as likely as to another row-store database.
36
Applying “Laborware”: think twice...
⇨ Apply traditional optimization techniques:
⇨ Redesign solution
⇨ Add/optimize indexes
⇨ Horizontal partitioning
⇨ Add materialized views
⇨ Rewrite queries
⇨ Reorganize data
⇨ Offload old data
⇨ …
⇨ Costs will increase & recur!
“Hardware will change 
the basic assumptions of 
BI professionals about 
what they can do”
Richard Hackathorn
38
Numbers everyone should know
⇨ L1 cache reference: 0.5 ns
⇨ Branch mispredict: 5 ns
⇨ L2 cache reference: 7 ns
⇨ Mutex lock/unlock: 100 ns
⇨ Main memory reference: 100 ns
⇨ Compress 1K bytes with Zippy: 10,000 ns
⇨ Send 2K bytes over 1 Gbps network: 20,000 ns
⇨ Read 1 MB sequentially from memory: 250,000 ns
⇨ Round trip within same datacenter: 500,000 ns
⇨ Disk seek: 10,000,000 ns
⇨ Read 1 MB sequentially from network: 10,000,000 ns
⇨ Read 1 MB sequentially from disk: 30,000,000 ns
⇨ Send packet CA->Netherlands->CA: 150,000,000 ns
Source: Jeff Dean, Google
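To put these figures in a data warehouse perspective, here is a minimal sketch that derives a few ratios from them. It uses only the numbers listed on the slide; the dictionary keys and function names are made up for illustration, and the scan estimate simply scales the per-MB figure, ignoring caching, parallelism and newer hardware.

```python
# Latency figures copied from the slide above, in nanoseconds.
LATENCY_NS = {
    "main_memory_ref": 100,
    "read_1mb_memory": 250_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
}


def ratio(slow, fast):
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def scan_seconds(megabytes, operation):
    """Naive estimate: sequential read time scaled linearly per MB."""
    return megabytes * LATENCY_NS[operation] / 1e9


if __name__ == "__main__":
    print("disk vs memory, sequential 1 MB:", ratio("read_1mb_disk", "read_1mb_memory"))
    print("disk seek vs memory reference:", ratio("disk_seek", "main_memory_ref"))
    print("scan 1024 MB from memory (s):", scan_seconds(1024, "read_1mb_memory"))
    print("scan 1024 MB from disk (s):", scan_seconds(1024, "read_1mb_disk"))
```

With these inputs a sequential read from disk comes out 120 times slower than from memory, which is the gap that in-memory and columnar designs try to exploit.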
39
Trend: decreasing latency times
source:90’s versus 2010's
40
The Good News: HW Cost Decline
⇨ < 2000:
⇨ tune software
⇨ 2012
⇨ hardware cheap
⇨ Mustang Index ~0.7
⇨ Cost per Gigaflop:
⇨ 1984: $ 15,000,000
⇨ 1997: $ 30,000
⇨ 2003: $ 82
⇨ 2011: $ 1.80
41
Memory Cost Decline
41
We're still waiting!
⇨ 32 GB, May 2010:
⇨ 8 × 4 GB = $1,200
⇨ 4 × 8 GB = $2,000
⇨ 2 × 16 GB = $2,400
⇨ 32 GB, May 2012:
⇨ 8 × 4 GB = $280
⇨ 4 × 8 GB = $350
⇨ 2 × 16 GB = $500
42
Intel keeps pushing the limits
43
CPU: Moore's Law in action?
Same price!
Intel Xeon
E5-2680
635
44
Storage costs keep going down
Year   Size in GB   US $/GB
1955   0.012        6,382,933.00
1960   0.01         3,686,400.00
1970   0.1          265,933.00
1980   2.5          16,000.00
1990   0.34         5,406.00
2000   40           7.17
2010   2,000        0.05
2012   3,000        0.07
“By the end of 2012, drives will have 100 times more 
capacity at 1/100 of the cost per GB compared to 2000”
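The quoted claim can be checked against the table rows for 2000 and 2012. The sketch below is only arithmetic on the slide's own figures; the names `STORAGE` and `change` are illustrative.

```python
# (drive size in GB, US$ per GB) per year, as listed in the table above.
STORAGE = {
    2000: (40, 7.17),
    2010: (2000, 0.05),
    2012: (3000, 0.07),
}


def change(from_year, to_year):
    """Return (capacity growth factor, cost-per-GB reduction factor)."""
    size_old, cost_old = STORAGE[from_year]
    size_new, cost_new = STORAGE[to_year]
    return size_new / size_old, cost_old / cost_new


if __name__ == "__main__":
    capacity_factor, cost_factor = change(2000, 2012)
    print(f"capacity: x{capacity_factor:.0f}, cost per GB: 1/{cost_factor:.0f}")
```

On the table's numbers the 2012 drive has 75 times the capacity of the 2000 drive at roughly 1/102 of the cost per GB, so the quote is close on price and somewhat optimistic on capacity.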
45
Your next data warehouse?
The next-generation SDXC memory card specification,
released to members in April, 2009, dramatically
improves consumers digital lifestyles by increasing
storage capacity from more than 32 GB up to 2 TB and
increasing bus interface speed up to 104 MB per second
in 2009 with a road map to 300 MB per second.
46
Architecture basics
47
Choosing the right architecture is a trade off
Flexibility
Agility
Real-Time
Complexity
Integration
Auditability
Data Volume
Advanced Analysis
Performance
Low cost
Skills & Standards
BI Architecture
48
What this means…
• No ‘one size fits all’ solution
• Easy to over or under provision
• There are always exceptions
• Clueless analysts
• Tech savvy managers (even C-level)
• Excel Junkies
49
Comparing Solutions
⇨ By Technology?
  ⇨ Columns, MPP, In-Memory, etc.
⇨ By Storage Type?
  ⇨ Files, tables, OLAP
⇨ By Deployment type?
  ⇨ Appliance, Cloud, SaaS
⇨ By Features/API?
  ⇨ SQL, MapReduce, R, etc.
⇨ By Speed?
  ⇨ TPC-H, Airline DB, Custom
⇨ By Licence type/price?
  ⇨ CPU, data size, memory usage
50
SQL DB's for BI/Analytics
51
Major BI Vendors have SQL DB's
IBM: ⇨ DB2, Netezza
Microsoft: ⇨ SQL Server
Oracle: ⇨ MySQL, Oracle DB, Exalytics (TimesTen)
SAP: ⇨ Sybase (IQ) & SAP Hana
..and all others are DB agnostic: Microstrategy, SAS, Tableau,
Tibco Spotfire, LogiXML, Pentaho, Jaspersoft, etc.
52
Analytical DB's: What’s Different?
⇨ MPP: Massively Parallel Processing
⇨ Column based data organization
⇨ Data compression
⇨ Read optimization
⇨ In memory operation
⇨ Different disk configuration options
⇨ In DB analytics
⇨ Data mining
⇨ Statistics
53
Architecture: SMP vs MPP
Different storage approaches:
● Shared Disk (clustering)
● Shared Nothing
Most DWH appliance & new software vendors
use Shared Nothing, MPP, Scale Out architecture
54
Scaling Up and Out
Typical Workloads
55
Blurring lines
⇨ 1 machine, up to:
⇨ 8 CPU/80 cores
⇨ 2 TB Ram
⇨ 24 SAS/SSD
⇨24*512 GB SSD = 12 TB
⇨24*900 GB SAS = 21 TB
“In terms of raw speed, nothing beats DASD”
Supermicro SuperServer 5086B-TRF
56
Beware of SPOF's (or: why clustering?)
57
Rows vs Columns
⇨ Nothing new about column storage: Taxir, 1969
⇨ Conceptual (and simplified) view:
EmpID  Lastname  Firstname  Salary
1      Smith     Joe        40000
2      Jones     Mary       50000
3      Johnson   Cathy      44000
Rows:
1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
Columns:
1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000
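The two layouts can be reproduced with a few lines of Python. This is a toy illustration of the conceptual view on the slide, not how any particular product stores data; the function names are invented.

```python
# The employee table from the slide.
EMPLOYEES = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def row_layout(rows):
    """Serialize record by record: all fields of a row sit together."""
    return ";".join(",".join(str(v) for v in row) for row in rows) + ";"


def column_layout(rows):
    """Serialize attribute by attribute: all values of a column sit together."""
    columns = zip(*rows)  # transpose rows into columns
    return ";".join(",".join(str(v) for v in col) for col in columns)


def average_salary(rows):
    """An analytic query touches one column; a column store reads only that."""
    salaries = list(zip(*rows))[3]
    return sum(salaries) / len(salaries)


if __name__ == "__main__":
    print(row_layout(EMPLOYEES))
    print(column_layout(EMPLOYEES))
    print(average_salary(EMPLOYEES))
```

In the column layout the three salaries are adjacent, so an aggregate over Salary does not have to skip past names and IDs as it would in the row layout.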
58
Rows vs Columns (2)
Source: Paraccel®
⇨ Columnar Challenges:
⇨ (fast) Loading
⇨ Updates
59
Rows AND Columns!
⇨ Many vendors offer hybrid row/column options
⇨ Beware of differences between storage & indexing
⇨ Examples:
⇨ Teradata Aster
⇨ Greenplum
⇨ HP Vertica
⇨ Vectorwise
⇨ Microsoft
⇨ Oracle
60
Data compression
⇨ Compression 50-90%
⇨ Some vendors claim > 95%
⇨ DB size < raw data size
61
Read Optimization
⇨ OLTP: 90% write, 10% read
⇨ DWH: 10% write, 90% read
⇨ Common solution:
⇨ 'buffer' area (row oriented)
⇨ background process
updates/inserts to columns
⇨ Bulk loading = direct
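The common solution described above can be sketched as a toy model: single-row writes land in a row-oriented buffer, a background step moves them into column storage, and bulk loads go to the columns directly. The class and method names are hypothetical and no vendor's actual design is implied.

```python
class BufferedColumnStore:
    """Toy model: row-oriented write buffer in front of column storage."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.columns = {name: [] for name in self.column_names}
        self.write_buffer = []  # recent rows, kept row-oriented

    def _append_to_columns(self, row):
        for name, value in zip(self.column_names, row):
            self.columns[name].append(value)

    def insert(self, row):
        """Single-row insert: a cheap append to the row buffer."""
        self.write_buffer.append(tuple(row))

    def flush(self):
        """What the background process would do: move buffered rows to columns."""
        for row in self.write_buffer:
            self._append_to_columns(row)
        self.write_buffer.clear()

    def bulk_load(self, rows):
        """Bulk loads bypass the buffer and write to the columns directly."""
        for row in rows:
            self._append_to_columns(row)

    def scan(self, name):
        """Read one column, including rows still waiting in the buffer."""
        idx = self.column_names.index(name)
        return self.columns[name] + [row[idx] for row in self.write_buffer]


if __name__ == "__main__":
    store = BufferedColumnStore(["emp_id", "salary"])
    store.bulk_load([(1, 40000), (2, 50000)])
    store.insert((3, 44000))
    print(store.scan("salary"))
    store.flush()
    print(store.columns["salary"])
```

Queries stay correct before the flush because `scan` combines both areas; the flush only changes where the rows live.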
62
Memory Usage
⇨ Different approaches
⇨ Query (result) caching
⇨ Dynamic allocation
⇨ Explicit loading (e.g. dim)
⇨ Some products still disk
focused! (e.g. GP)
⇨ VectorWise: RAM as
secondary (!) storage
63
Disk Usage/Configuration
⇨ 1. Disk/partition per CPU (core), e.g. Greenplum
⇨ 2. Software 'Raid' by DBMS, e.g. Paraccel
ADB
64
Disk Usage/Configuration (2)
⇨ 3. Use standard devices, e.g. VectorWise, Vertica
ADB
⇨ 3 is easiest to set up (but some ADB's auto config)
⇨ Speed depends on other things too
65
RAIS instead of RAID
⇨ 1. Failover Node (Hot Standby) ⇨ 2. Data Distribution
A
B
B
A
C
C
etc.
Hot Standby
66
Mixed Storage Solution
⇨ SAN = SOR
⇨ Nodes = Persistent
subset
⇨ 'Blended Scan'
⇨ Patent Pending
Source: Paraccel®
67
ILM: Software meets Hardware
⇨ Different approaches
⇨ Usage (e.g. Teradata)
⇨ Age (e.g. Oracle)
⇨ Partitions (e.g. Sybase IQ)
Burning
Hot
Warm
Cool
Cold
SAS
SATA
www.etre.com
68
Beware of (Interconnect) Bottlenecks
Fast & Expensive SAN
Fast & Expensive Server(s)
1Gb/s
1Gb/s shared
DWH
VM
ERP
VM
MAIL
VM
CRM
VM
Undersized Virtual DWH
You want:
* Dedicated hardware
* Infiniband QDR 12x: 96 Gb/s, or
* 100 Gb Ethernet: 100Gb/s
OR: Local storage (MPP w DASD)
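A rough calculation shows why the interconnect matters. The sketch below estimates how long it takes to move a given volume over the link speeds named on this slide. It assumes the full nominal bandwidth is available and ignores protocol overhead and sharing, so real figures will be worse; the 1 TB volume is an arbitrary example.

```python
def transfer_seconds(data_bytes, link_gbit_per_s):
    """Ideal transfer time at nominal link speed (no overhead, no sharing)."""
    bits = data_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_tb = 1e12  # example volume: 1 TB (decimal)
    for label, gbit in [("1 Gb/s", 1), ("Infiniband QDR 12x", 96), ("100 Gb Ethernet", 100)]:
        print(f"{label}: {transfer_seconds(one_tb, gbit):,.0f} s")
```

At 1 Gb/s a full scan of 1 TB takes more than two hours even in the ideal case, and on a shared link each VM gets only a fraction of that.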
69
In Database Analytics
Source: Fuzzy Logix
70
Everybody Loves R
71
Inevitable In DB analytics
⇨ Fuzzy Logix: IBM/Netezza, IBM/Informix, SAP/Sybase, Paraccel, Microsoft, Asterdata/Teradata
⇨ SAS: IBM/Netezza, EMC Greenplum, Teradata
⇨ R: IBM/Netezza, AsterData/Teradata, Oracle, Greenplum, SAS
72
The Hadoop Hoopla
73
#BigData, the new frontier
Yes, these (and more) are all Open Source!
74
#BigData?
Largest data set analyzed
KDNuggets poll 2012
75
Putting #BigData into perspective*
Median
DWH size
*Idea by Glen Rabie, YellowFin BI
76
*THIS* is Hadoop:
a Distributed File System
Data Distribution Data Retrieval using M/R
77
#BigData & NoSQL: No Standards
“Each NoSQL DB has its own strengths/weaknesses;
most are not (directly) suited for typical BI workloads”
78
The Great Divide(s)
⇨ Pure SQL DB's
⇨ All OS Column Stores
⇨ Paraccel, Kognitio
⇨ In Database Analytics
⇨ Map/Reduce (many)
⇨ R (GreenPlum)
⇨ SAS (TeraData)
⇨ Everything (Netezza iClass)
⇨ NoSQL Databases
⇨ Hive (Hadoop)
⇨ MongoDB
⇨ CouchDB
⇨ etc.
Worlds Colliding
⇨ MapReduce (NoSQL)
  ⇨ Programming model
  ⇨ No DBMS/SQL required
  ⇨ Schema free
  ⇨ Exclusively <key,value>
  ⇨ Java, Python, C++, C, etc.
  ⇨ Text/data mining
  ⇨ Eventually Consistent
⇨ SQL (RDBMS)
  ⇨ Query language
  ⇨ DBMS required
  ⇨ Fixed schema
  ⇨ Complex structure
  ⇨ SQL
  ⇨ Not good at Text
  ⇨ ACID compliant
80
What is MapReduce?
⇨ M/R is now patented by Google (Patent #7,650,331)
⇨ Used in many ADB's
  ⇨ Hadoop, CouchDB
  ⇨ AsterData
  ⇨ GreenPlum
  ⇨ Vertica
  ⇨ ...
MapReduce is a programming
model and an associated
implementation for processing and
generating large data sets.
Users specify a map function that
processes a key/value pair to
generate a set of intermediate
key/value pairs, and a reduce
function that merges all
intermediate values associated with
the same intermediate key
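The definition maps directly onto a small single-process sketch. The word-count example and all function names below are illustrative; a real framework such as Hadoop distributes the map, shuffle and reduce steps across nodes, which this toy version does not attempt.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map: turn one (key, value) input pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, group by intermediate key (the 'shuffle'), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    documents = [("d1", "big data"), ("d2", "big deal")]
    print(run_mapreduce(documents, map_words, reduce_counts))
```

The user supplies only `map_words` and `reduce_counts`; the grouping in `run_mapreduce` is the part a framework would parallelize.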
81
MapReduce Explained
Source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/
MR info: http://www.mapreduce.org (by Aster Data)
82
M/R & SQL: How to get there
⇨ SQL on top of M/R
⇨ e.g. Hive-Hadoop
⇨ M/R invoking SQL
⇨ e.g. Greenplum
⇨ SQL invoking M/R
⇨ e.g. TeraData/Aster Data
⇨ Most ADB vendors implementing/investigating M/R
⇨ e.g. Vertica (Hadoop integration), Oracle, Netezza, etc.
92
(R/H/M)OLAP
⇨ OnLine Analytical Processing
⇨ Analyse multidimensional data
⇨ Basic architecture:
Data Warehouse
MDX
OLAP
engine/server
Analysis front end
93
Stars and Cubes
⇨ Star schema
⇨ Dimension & fact tables
⇨ Best foundation for cubes
⇨ Cubes (logical/physical)
⇨ Dimensions
⇨Hierarchies
⇨ Levels
⇨Attributes
⇨ Measures
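A star schema can be tried out with nothing more than SQLite from the Python standard library. The sketch below builds one fact table with two dimension tables and aggregates a measure by a dimension level. Table names, column names and the sample rows are all invented for illustration.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, family TEXT, name TEXT);
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (product_id INTEGER, time_id INTEGER, unit_sales INTEGER);
        INSERT INTO dim_product VALUES (1, 'Drink', 'Cola'), (2, 'Food', 'Bread');
        INSERT INTO dim_time VALUES (1, 1997, 1), (2, 1997, 2), (3, 1998, 1);
        INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 3, 4);
        """
    )
    return conn


def unit_sales_by_family(conn, year):
    """Aggregate the measure by a dimension level, filtered on another dimension."""
    query = """
        SELECT p.family, SUM(f.unit_sales)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_time t ON t.time_id = f.time_id
        WHERE t.year = ?
        GROUP BY p.family
        ORDER BY p.family
    """
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    connection = build_star_schema()
    print(unit_sales_by_family(connection, 1997))
```

The dimension tables supply the hierarchies and attributes, the fact table supplies the measures; an OLAP engine generates joins like this one from the cube definition.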
94
The power of OLAP
Aggregates, positional calculations (prior vs current), range
calculations (ytd, mtd), level calculations (child to parent
contribution)
95
MDX
⇨ Short for 'Multi Dimensional Expressions'
⇨ ~ SQL for OLAP:
⇨ SELECT
    {set for column headers} ON COLUMNS,
    {set for row headers} ON ROWS
  FROM [Cube Name]
  WHERE {set for filtering}
⇨ SELECT
    {[Measures].[Unit Sales]} ON COLUMNS,
    {[Product].[Drink], [Product].[Food]} ON ROWS
  FROM [Sales]
  WHERE [Time].[1997]
96
The Power of MDX
Positional: ([Measures].[Profit], [Time].CurrentMember.PrevMember)
Range: Aggregate(YTD(), [Measures].[Profit])
“MDX is far
more powerful
than SQL for
the typical BI
questions”
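For readers who do not know MDX, the two calculation types can be spelled out in plain Python. This is only an illustration of what a prior-period lookup and a year-to-date aggregate compute over an ordered time series; the monthly profit values are made up, and an OLAP engine would evaluate such expressions per cell over the time hierarchy.

```python
from itertools import accumulate


def previous_period(values):
    """Positional: for each period, the value of the period before it."""
    return [None] + values[:-1]


def year_to_date(values):
    """Range: running total from the first period of the year up to each period."""
    return list(accumulate(values))


if __name__ == "__main__":
    monthly_profit = [10, 12, 9]  # hypothetical Jan, Feb, Mar figures
    print(previous_period(monthly_profit))
    print(year_to_date(monthly_profit))
```

In SQL the same results usually need window functions or self-joins, which is the kind of verbosity the quote is pointing at.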
97
Adding OLAP to the mix
⇨ Virtual Cubes, e.g.
⇨ Kognitio Pablo
⇨ Pentaho Mondrian
⇨ Microstrategy
⇨ Physical Cubes, e.g.
⇨ Microsoft Analysis Services
⇨ Oracle Essbase
⇨ Jedox Palo
Physical cubes allow 'write back': what-if,
forecasting, budgeting & planning
'New' kid on the block: SAP HANA
99
The promises of the Cloud
⇨ “Utility computing”
⇨ Unlimited capacity
⇨ Pay as you go/by the sip
⇨ Lower costs
⇨ Always up to date
⇨ Invisible OS
⇨ Security
⇨ Safety
100
Cloud still getting Hotter
Source: IBM CIO Survey 2011
101
Types of Cloud Solutions
Virtualization
IaaS (Infrastructure)
PaaS (Platform)
SaaS (Software)
ValueAdded
102
Cloud Cost Components
Storage
Bandwidth
SLA/Service
CPU power
Memory
Data transfer
Requests
103
BI&DWH aaS Scenarios
104
The trouble with Cloud DWH
⇨ DWH aaS vendors:
⇨ e.g. 1010Data, Kognitio, Vertica, EMC/Greenplum
⇨
more will follow
105
What about No Database at all?
Rick F. van der Lans
Key element: abstraction (de-coupling)
106
Data Virtualization concept
Virtual DB
SQL SOAP REST FILE WS-*
Information Consumers
107
© 2011 Composite Software, Inc. / Composite Proprietary
Example: Composite 6
Discovery
Active Cluster
Composite Information Server
XQuery, Java, WSDL, SCA
(Services Centric)
Front-end Applications
Security
Metadata Repository
Views, SQLScript
(Database Centric)
Security
Query Engine
Cost-based
Optimizer
Rules-based
Optimizer
Federation
Engine
Web Services
(HTTP, REST, SOAP, JSON, XQuery)
SQL
(ODBC, JDBC, ADO.NET)
Messaging
(JMS)
Java
(POJO)
Web Services
(HTTP, SOAP, JSON)
Messaging
(JMS)
Application
APIs
MF
Adapter
Java
(POJO)
Advanced Functions
Quality Governance Caching
SQL
(ODBC, JDBC)
URI
Monitor
Manager
Studio
Performance Plus
Adapters
Development
Environment
Runtime Server
Environment
Management
Environment
Applications, Big Data Stores, Excel, Flat Files, Mainframes, Messages, OLAP Cubes,
RDBMS, Web Services, XML Documents
108
Meet
109
A Unified Data Hub
110
Virtual vs Physical trade offs
Source: Mark Madsen
111
The Shootout!
⇨ Things to ask your (potential) vendor
⇨ References
⇨ Assist in a paid POC
⇨ License model & unit of cost: CPU, Core,
Server, (raw) Data volume, Memory used
⇨ Free dev/test editions (only pay for
production use)
⇨ Support options (updates only, mail/phone
support, etc)
⇨ If migrating: trade in discount
⇨ Opt out/de-integration options
112
Does your DB cover the Basics?
⇨ Full SQL 2003 support?
⇨ Easy backup/restore features?
⇨ Scaling up or out?
⇨ Failover & persistency?
⇨ External (management) Tool integration?
113
Which deployment types?
⇨ On Premise ⇨ SaaS/Cloud
Software only
Appliance
Vendor/ISP Customer
114
Size/Workload/Complexity?
Source:
Bloor Group
Source:
Third Nature
Most
organizations
are here
115
What's the question?
Source:
116
Analytical Power?
Source: SAP (Sybase IQ 15.4)
117
Beware of
Benchmarks !
⇨ Differences in
⇨# threads
⇨# cores
⇨# disks
⇨# nodes
⇨CPU generation/speed
1. Always use P.O.C. on your own
data & query workload
2. Don't trust the MQ's
⇨ Ongoing Market Consolidation
⇨ More additional/alternative storage engines
⇨ Hybrid Row/Column solutions
⇨ Every db will get In DB analytics
⇨ Every db will get Hadoop/MR extensions
⇨ Everything in-memory
119
So what's the best
database for BI?
120
Web: www.tholis.com
Email: jos<at>tholis.com
Phone: +31-(0)6-51169606
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
IRC: _grumpy
Jos van Dongen
In BI since 1991
Principal Consultant
Author/Speaker/Analyst
Proud member of #BBBT

Weitere ähnliche Inhalte

Was ist angesagt?

Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Edureka!
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
PostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLPostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLCockroachDB
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsAnil Nair
 
Global Payment Reference Architecture
Global Payment Reference ArchitectureGlobal Payment Reference Architecture
Global Payment Reference ArchitectureRamadas MV
 
Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid Karthik Deivasigamani
 
Veri Ambarı Nedir, Nasıl Oluşturulur?
Veri Ambarı Nedir, Nasıl Oluşturulur?Veri Ambarı Nedir, Nasıl Oluşturulur?
Veri Ambarı Nedir, Nasıl Oluşturulur?Gurcan Orhan
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data PlatformsTreasure Data, Inc.
 
CAP Theorem - Theory, Implications and Practices
CAP Theorem - Theory, Implications and PracticesCAP Theorem - Theory, Implications and Practices
CAP Theorem - Theory, Implications and PracticesYoav Francis
 
Make Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For ItMake Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For ItMarkus Michalewicz
 
Free Load Testing Tools for Oracle Database – Which One Do I Use?
Free Load Testing Tools for Oracle Database – Which One Do I Use?Free Load Testing Tools for Oracle Database – Which One Do I Use?
Free Load Testing Tools for Oracle Database – Which One Do I Use?Christian Antognini
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Anil Nair
 
Réplication de base de données oracle avec Golden Gate
Réplication de base de données oracle avec Golden GateRéplication de base de données oracle avec Golden Gate
Réplication de base de données oracle avec Golden GateMor THIAM
 

Was ist angesagt? (20)

Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
PostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLPostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQL
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret Internals
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Global Payment Reference Architecture
Global Payment Reference ArchitectureGlobal Payment Reference Architecture
Global Payment Reference Architecture
 
Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid Realtime classroom analytics powered by apache druid
Realtime classroom analytics powered by apache druid
 
Rac questions
Rac questionsRac questions
Rac questions
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Veri Ambarı Nedir, Nasıl Oluşturulur?
Veri Ambarı Nedir, Nasıl Oluşturulur?Veri Ambarı Nedir, Nasıl Oluşturulur?
Veri Ambarı Nedir, Nasıl Oluşturulur?
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata Migrations
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
 
CAP Theorem - Theory, Implications and Practices
CAP Theorem - Theory, Implications and PracticesCAP Theorem - Theory, Implications and Practices
CAP Theorem - Theory, Implications and Practices
 
Make Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For ItMake Your Application “Oracle RAC Ready” & Test For It
Make Your Application “Oracle RAC Ready” & Test For It
 
Free Load Testing Tools for Oracle Database – Which One Do I Use?
Free Load Testing Tools for Oracle Database – Which One Do I Use?Free Load Testing Tools for Oracle Database – Which One Do I Use?
Free Load Testing Tools for Oracle Database – Which One Do I Use?
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19Smart monitoring how does oracle rac manage resource, state ukoug19
Smart monitoring how does oracle rac manage resource, state ukoug19
 
Réplication de base de données oracle avec Golden Gate
Réplication de base de données oracle avec Golden GateRéplication de base de données oracle avec Golden Gate
Réplication de base de données oracle avec Golden Gate
 

Andere mochten auch

Visualization 101 BA4All
Visualization 101 BA4AllVisualization 101 BA4All
Visualization 101 BA4AllJos van Dongen
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed DatawarehousingJos van Dongen
 
PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012Jos van Dongen
 
Estado del arte del BI | Jornada Madrid 2014 | UOC
Estado del arte del BI | Jornada Madrid 2014 | UOCEstado del arte del BI | Jornada Madrid 2014 | UOC
Estado del arte del BI | Jornada Madrid 2014 | UOCJosep Curto
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
5 Signs You Need to Re-Think Your Data Integration Strategy
5 Signs You Need to Re-Think Your Data Integration Strategy5 Signs You Need to Re-Think Your Data Integration Strategy
5 Signs You Need to Re-Think Your Data Integration StrategyDarren Cunningham
 
Open Source Business Intelligence
Open Source Business IntelligenceOpen Source Business Intelligence
Open Source Business IntelligenceJos van Dongen
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Jos van Dongen
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataEdward Hsu
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Cambridge Semantics
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleCambridge Semantics
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosDataWorks Summit
 


Database Shootout: What's best for BI?

  • 2. The New Data Warehousing. Source: Timo Elliott, SAP
  • 3. Let's just buy Teradata... Forrester Wave, as of April 2011; Gartner Group MQ, as of February 2012
  • 6. ⇨ Back to basics: BI & DWH ⇨ The Need for Speed ⇨ Database Architectures ⇨ #BigData & the Hadoop Hoopla ⇨ The forgotten power of OLAP & MDX ⇨ A Cloudy future? ⇨ Shootout: Evaluating alternatives
  • 7. Business Intelligence (BI): “Process of identifying, collecting, combining, analyzing, interpreting and communicating internal and external information to support decision making processes.” “Concepts and methods to improve business decision making by using fact-based support systems.” First definition of BI: 1958!
  • 8. Business Intelligence is... doing useful stuff with data, in order to support the decision making process. So why not simply use Decision Support Systems?
  • 9. How it all started in 1958... ⇨ Hans Peter Luhn (IBM) → A Business Intelligence System. The notion of intelligence is also defined here, in a more general sense, as the “ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” Full text on Timo Elliott's blog: http://timoelliott.com/blog/2007/11/the_real_pioneer_of_business_i.html
  • 10. Luhn's Vision (1958!): A Business Intelligence System. Abstract: An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.
  • 11. BI is dead; long live Analytics?
  • 13. The Evolution of Enterprise Business Intelligence. Chart labels: Enterprise Decision Management; Embracing all relevant data sources; BI injected into everyday business processes; Master Data Management; Advanced Data Mining / Analytics; Business Activity Monitoring; Common Information & Processes; Disconnected Silos of Information; Query/Reporting/Online Analytical Processing; Content Management/Data Warehousing; Search. Axes: Business Value against time (2000, 2005, 2010, 2015). Image courtesy Jim Fitzgerald, IBM research
  • 14. Evolving Business Intelligence Platform Requirements. Image courtesy Teradata
  • 15. Passive Monitoring: BI Starting Point. Source: Mark Madsen, Third Nature
  • 16. Supporting Better Analysis: Common Next Step. Source: Mark Madsen, Third Nature
  • 17. Active Monitoring. Source: Mark Madsen, Third Nature
  • 18. Active Monitoring and Feedback. Source: Mark Madsen, Third Nature
  • 20. Prediction (passive). Source: Mark Madsen, Third Nature
  • 21. Active Prediction. Source: Mark Madsen, Third Nature
  • 22. Prescription and Enterprise Decision Management. Source: Mark Madsen, Third Nature
  • 23.
  • 24. Watch out, the world is changing: 750 TB per week (compressed) ⇨ Sensor data ⇨ RFID ⇨ Sentiment Analysis ⇨ Text mining ⇨ Location data
  • 26. Remember the Origins ⇨ The general conception of a separate architecture for BI has been around longer, but this is the first formal relational architecture and definition published. ⇨ One thing left out of most designs: the box labeled business process definitions. Source: “An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
  • 27. 2012: we're still doing this! Diagram labels: Sources (CSV files, ERP, DBMS, files); ETL process; Staging Area (DBMS); Data Warehouse: Central DWH & Data Marts (DBMS); EUL. EUL = End User Layer, in case you were wondering ;-)
  • 28. We’ve (also) accumulated over 20 years of changes. Diagram labels: Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications. Data Consumers: Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, ETL, Data Mining, Applications. Other components: Warehouse Database, ETL, Marts, ODS, EDR, EII, Content Store, EAI, Stream processing, SQL, Service, API
  • 29. The assumption of the warehouse as a database is gone. Data at rest: traditional tabular or structured data; non-traditional data (logs, audio, documents); parallel programming platforms; databases. Data in motion: streaming DBs/engines; message streams. Copyright Third Nature, Inc.
  • 31. Why do BI projects fail? ⇨ No. 1 reason: query performance too slow (BI Survey 9) ⇨ 70% of DWHs experience performance-constrained issues of various types (Gartner DWH MQ 2010) ⇨ Poor query performance is the No. 1 reason for replacing a DWH (TDWI Best Practices)
  • 32. Two dimensions of Speed: Companies wishing to maximize BI benefits should focus on 1) support quality, 2) implementation time, 3) query response time and 4) breadth of deployment, in that order.
  • 33. Minimize Implementation Time ⇨ Use 'RTF' vs. off the shelf, etc. ⇨ Use Agile methods like Scrum or DSDM ⇨ Look at Data Vault model & methodology
  • 34. Minimize Query Response Time. Why replace a data warehouse solution? Source: TDWI Next Generation Data Warehouse Platforms, by Philip Russom
  • 35. Solving Performance Problems: Replace every single thing before the database? Migrating to an analytic database is twice as likely as migrating to another row-store database.
  • 36. Applying “Laborware”: think twice... ⇨ Apply traditional optimization techniques: redesign solution, add/optimize indexes, horizontal partitioning, add materialized views, rewrite queries, reorganize data, offload old data, … ⇨ Costs will increase & recur!
  • 38. Numbers everyone should know (Source: Jeff Dean, Google)
      ⇨ L1 cache reference: 0.5 ns
      ⇨ Branch mispredict: 5 ns
      ⇨ L2 cache reference: 7 ns
      ⇨ Mutex lock/unlock: 100 ns
      ⇨ Main memory reference: 100 ns
      ⇨ Compress 1K bytes with Zippy: 10,000 ns
      ⇨ Send 2K bytes over 1 Gbps network: 20,000 ns
      ⇨ Read 1 MB sequentially from memory: 250,000 ns
      ⇨ Round trip within same datacenter: 500,000 ns
      ⇨ Disk seek: 10,000,000 ns
      ⇨ Read 1 MB sequentially from network: 10,000,000 ns
      ⇨ Read 1 MB sequentially from disk: 30,000,000 ns
      ⇨ Send packet CA->Netherlands->CA: 150,000,000 ns
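To put the latency figures above into proportion, the following minimal Python sketch computes two ratios from them. The values are copied from the slide; the dictionary and the `slowdown` helper are illustrative names, not part of the original deck.

```python
# Latency figures copied from the slide, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times slower operation `slow` is than operation `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


# Sequential 1 MB read: disk versus memory.
disk_vs_memory = slowdown("Read 1 MB sequentially from disk",
                          "Read 1 MB sequentially from memory")
# Random access: one disk seek expressed in main memory references.
seek_vs_ram = slowdown("Disk seek", "Main memory reference")

print(f"1 MB sequential read: disk is {disk_vs_memory:.0f}x slower than memory")
print(f"One disk seek costs as much as {seek_vs_ram:,.0f} main memory references")
```

On these figures, sequential disk reads are two orders of magnitude slower than memory reads and random disk access is far worse, which is the motivation for the in-memory and read-optimized designs discussed later in the deck.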
  • 39. Trend: decreasing latency times, 1990s versus 2010s
  • 40. The Good News: HW Cost Decline ⇨ < 2000: tune software ⇨ 2012: hardware cheap ⇨ Mustang Index ~0.7 ⇨ Cost per Gigaflop: 1984: $15,000,000; 1997: $30,000; 2003: $82; 2011: $1.80
  • 41. Memory Cost Decline. We're still waiting! ⇨ 32 GB, May 2010: 8 × 4 GB = $1,200; 4 × 8 GB = $2,000; 2 × 16 GB = $2,400 ⇨ 32 GB, May 2012: 8 × 4 GB = $280; 4 × 8 GB = $350; 2 × 16 GB = $500
  • 43. CPU: Moore's Law in action? Same price! Intel Xeon E5-2680 635
  • 44. Storage costs keep going down
      Year   Size in GB   US $/GB
      1955   0.012        6,382,933.00
      1960   0.01         3,686,400.00
      1970   0.1          265,933.00
      1980   2.5          16,000.00
      1990   0.34         5,406.00
      2000   40           7.17
      2010   2,000        0.05
      2012   3,000        0.07
      “By the end of 2012, drives will have 100 times more capacity at 1/100 of the cost per GB compared to 2000”
  • 45. Your next data warehouse? The next-generation SDXC memory card specification, released to members in April, 2009, dramatically improves consumers' digital lifestyles by increasing storage capacity from more than 32 GB up to 2 TB and increasing bus interface speed up to 104 MB per second in 2009 with a road map to 300 MB per second.
  • 47. Choosing the right architecture is a trade-off. BI Architecture factors: Flexibility, Agility, Real-Time, Complexity, Integration, Auditability, Data Volume, Advanced Analysis, Performance, Low cost, Skills & Standards
  • 48. What this means… ⇨ No ‘one size fits all’ solution ⇨ Easy to over- or under-provision ⇨ There are always exceptions: clueless analysts, tech-savvy managers (even C-level), Excel junkies
  • 49. Comparing Solutions ⇨ By Technology? Columns, MPP, In-Memory, etc. ⇨ By Storage Type? Files, tables, OLAP ⇨ By Deployment type? Appliance, Cloud, SaaS ⇨ By Features/API? SQL, MapReduce, R, etc. ⇨ By Speed? TPC-H, Airline DB, Custom ⇨ By Licence type/price? CPU, data size, memory usage
  • 50. SQL DBs for BI/Analytics
  • 51. Major BI Vendors have SQL DBs ⇨ IBM: DB2, Netezza ⇨ Microsoft: SQL Server ⇨ Oracle: MySQL, Oracle DB, Exalytics (TimesTen) ⇨ SAP: Sybase (IQ) & SAP HANA. All others are DB agnostic: MicroStrategy, SAS, Tableau, Tibco Spotfire, LogiXML, Pentaho, Jaspersoft, etc.
  • 52. Analytical DBs: What’s Different? ⇨ MPP: Massively Parallel Processing ⇨ Column-based data organization ⇨ Data compression ⇨ Read optimization ⇨ In-memory operation ⇨ Different disk configuration options ⇨ In-DB analytics (data mining, statistics)
  • 53. Architecture: SMP vs MPP. Different storage approaches: Shared Disk (clustering) and Shared Nothing. Most DWH appliance & new software vendors use a Shared Nothing, MPP, Scale Out architecture
  • 54. Scaling Up and Out: Typical Workloads
  • 55. Blurring lines ⇨ 1 machine, up to: 8 CPU/80 cores; 2 TB RAM; 24 SAS/SSD drives ⇨ 24 × 512 GB SSD = 12 TB ⇨ 24 × 900 GB SAS = 21 TB ⇨ “In terms of raw speed, nothing beats DASD”. Supermicro SuperServer 5086B-TRF
  • 56. Beware of SPOFs (or: why clustering?)
  • 57. Rows vs Columns ⇨ Nothing new about column storage: Taxir, 1969 ⇨ Conceptual (and simplified) view:
      EmpID   Lastname   Firstname   Salary
      1       Smith      Joe         40000
      2       Jones      Mary        50000
      3       Johnson    Cathy       44000
      Rows: 1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
      Columns: 1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000
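The conceptual view above can be reproduced with a few lines of Python. This is a minimal sketch using the slide's example table; the function names are illustrative, and real column stores use binary formats rather than delimited strings.

```python
from typing import List, Tuple

# Example table from the slide: EmpID, Lastname, Firstname, Salary.
ROWS: List[Tuple[int, str, str, int]] = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def row_layout(rows) -> str:
    """Store each record contiguously: all fields of row 1, then row 2, and so on."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_layout(rows) -> str:
    """Store each column contiguously: all EmpIDs, then all Lastnames, and so on."""
    columns = zip(*rows)  # transpose rows into columns
    return ";".join(",".join(str(v) for v in col) for col in columns)


def total_salary_from_columns(layout: str) -> int:
    """An aggregate over one column only has to read that column's segment."""
    salary_segment = layout.split(";")[-1]  # Salary is the last column
    return sum(int(v) for v in salary_segment.split(","))


print(row_layout(ROWS))
print(column_layout(ROWS))
print(total_salary_from_columns(column_layout(ROWS)))
```

The last function hints at why analytic queries favour the column layout: summing salaries touches one segment instead of every record.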
  • 58. Rows vs Columns (2) ⇨ Columnar challenges: (fast) loading, updates. Source: Paraccel®
  • 59. Rows AND Columns! ⇨ Many vendors offer hybrid row/column options ⇨ Beware of differences between storage & indexing ⇨ Examples: Teradata Aster, Greenplum, HP Vertica, Vectorwise, Microsoft, Oracle
  • 60. Data compression ⇨ Compression 50-90% ⇨ Some vendors claim > 95% ⇨ DB size < raw data size
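One reason column-organized data tends to compress well is that values within a column repeat. The sketch below uses run-length encoding as an example; this technique is an assumption chosen for illustration, since the slide does not say which compression schemes the vendors use, and the sample column is made up.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[str] = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical sorted country column with many repeated values.
country_column = ["NL"] * 6 + ["BE"] * 3 + ["DE"] * 1
encoded = rle_encode(country_column)

# Reduction in number of stored entries (not bytes) for this toy column.
saving = 1 - len(encoded) / len(country_column)
print(encoded, f"{saving:.0%} fewer entries")
```

The saving here counts entries rather than bytes and depends entirely on how repetitive and well sorted the column is, so it should not be read as a benchmark.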
  • 61. Read Optimization ⇨ OLTP: 90% write, 10% read ⇨ DWH: 10% write, 90% read ⇨ Common solution: 'buffer' area (row oriented); a background process applies updates/inserts to the columns ⇨ Bulk loading = direct
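The "common solution" above can be sketched as a toy Python class: single-row inserts land in a row-oriented buffer, a merge step (standing in for the background process) moves them into column storage, and bulk loads write to the columns directly. The class and method names are hypothetical and do not correspond to any particular product.

```python
from typing import Dict, List


class BufferedColumnStore:
    """Toy store: a row-oriented write buffer in front of column-oriented storage."""

    def __init__(self, column_names: List[str]) -> None:
        self.column_names = column_names
        self.columns: Dict[str, list] = {name: [] for name in column_names}
        self.write_buffer: List[dict] = []  # recent inserts, kept as whole rows

    def insert(self, row: dict) -> None:
        """Single-row insert: a cheap append to the row buffer."""
        self.write_buffer.append(row)

    def merge(self) -> None:
        """Stand-in for the background process: move buffered rows into columns."""
        for row in self.write_buffer:
            for name in self.column_names:
                self.columns[name].append(row[name])
        self.write_buffer.clear()

    def bulk_load(self, rows: List[dict]) -> None:
        """Bulk load bypasses the buffer and writes to the columns directly."""
        for name in self.column_names:
            self.columns[name].extend(row[name] for row in rows)

    def scan(self, name: str) -> list:
        """Read one column, including rows that are still in the buffer."""
        return self.columns[name] + [row[name] for row in self.write_buffer]


store = BufferedColumnStore(["id", "amount"])
store.bulk_load([{"id": 1, "amount": 10}])
store.insert({"id": 2, "amount": 5})
print(store.scan("amount"))  # buffered row is visible before the merge
store.merge()
print(store.columns["amount"])
```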
  • 62. Memory Usage ⇨ Different approaches ⇨ Query (result) caching ⇨ Dynamic allocation ⇨ Explicit loading (e.g. dimensions) ⇨ Some products still disk focused! (e.g. Greenplum) ⇨ VectorWise: RAM as secondary (!) storage
  • 63. Disk Usage/Configuration ⇨ 1. Disk/partition per CPU (core), e.g. Greenplum ⇨ 2. Software 'RAID' by DBMS, e.g. Paraccel ADB
  • 64. Disk Usage/Configuration (2) ⇨ 3. Use standard devices, e.g. VectorWise, Vertica ADB ⇨ Option 3 is easiest to set up (but some ADB's auto config) ⇨ Speed depends on other things too
  • 65. RAIS instead of RAID ⇨ 1. Failover Node (Hot Standby) ⇨ 2. Data Distribution (diagram labels: A B B A C C etc.; Hot Standby)
  • 66. Mixed Storage Solution ⇨ SAN = SOR ⇨ Nodes = Persistent subset ⇨ 'Blended Scan' ⇨ Patent Pending (Source: Paraccel®)
  • 67. ILM: Software meets Hardware ⇨ Different approaches ⇨ Usage (e.g. Teradata) ⇨ Age (e.g. Oracle) ⇨ Partitions (e.g. Sybase IQ) (diagram labels: Burning, Hot, Warm, Cool, Cold; SAS, SATA; www.etre.com)
  • 68. Beware of (Interconnect) Bottlenecks ⇨ Undersized Virtual DWH (diagram labels: Fast & Expensive SAN; Fast & Expensive Server(s); 1 Gb/s, 1 Gb/s shared; DWH VM, ERP VM, MAIL VM, CRM VM) ⇨ You want: * Dedicated hardware * Infiniband QDR 12x: 96 Gb/s, or * 100 Gb Ethernet: 100 Gb/s ⇨ OR: Local storage (MPP w DASD)
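
To put the interconnect numbers from this slide into perspective, here is a back-of-the-envelope calculation in Python. The 1 TB data volume is an assumption picked for the example, and the function ignores protocol overhead, latency and contention, so real transfers will be slower.

```python
def transfer_seconds(data_bytes, link_gbit_per_s):
    """Ideal time to move data over a link, ignoring protocol overhead."""
    bits = data_bytes * 8
    return bits / (link_gbit_per_s * 1e9)

ONE_TB = 1e12  # decimal terabyte, an assumption for the example

# Link speeds in Gb/s, as quoted on the slide.
links = {
    "1 Gb/s shared": 1,
    "Infiniband QDR 12x": 96,
    "100 Gb Ethernet": 100,
}
scan_times = {name: transfer_seconds(ONE_TB, rate) for name, rate in links.items()}
```

Under these idealized assumptions, pulling 1 TB across the 1 Gb/s link takes about 8,000 seconds (well over two hours), against roughly 83 seconds on Infiniband QDR 12x and 80 seconds on 100 Gb Ethernet. And the 1 Gb/s link in the diagram is shared with other VMs.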
  • 71. Inevitable In DB analytics ⇨ Fuzzy Logix: IBM/Netezza, IBM/Informix, SAP/Sybase, Paraccel, Microsoft, Aster Data/Teradata ⇨ SAS: IBM/Netezza, EMC Greenplum, Teradata ⇨ R: IBM/Netezza, Aster Data/Teradata, Oracle, Greenplum, SAS
  • 73. #BigData, the new frontier ⇨ Yes, these (and more) are all Open Source!
  • 74. #BigData? ⇨ Largest data set analyzed (KDNuggets poll 2012)
  • 75. Putting #BigData into perspective* ⇨ Median DWH size ⇨ *Idea by Glen Rabie, YellowFin BI
  • 76. *THIS* is Hadoop: a Distributed File System ⇨ Data Distribution ⇨ Data Retrieval using M/R
  • 77. #BigData & NoSQL: No Standards ⇨ “Each NoSQL DB has its own strengths/weaknesses; most are not (directly) suited for typical BI workloads”
  • 78. The Great Divide(s) ⇨ Pure SQL DB's ⇨ All OS Column Stores ⇨ Paraccel, Kognitio ⇨ In Database Analytics ⇨ Map/Reduce (many) ⇨ R (Greenplum) ⇨ SAS (Teradata) ⇨ Everything (Netezza iClass) ⇨ NoSQL Databases ⇨ Hive (Hadoop) ⇨ MongoDB ⇨ CouchDB ⇨ etc.
  • 79. Worlds Colliding ⇨ MapReduce (NoSQL): Programming model; No DBMS/SQL required; Schema free; Exclusively <key,value>; Java, Python, C++, C, etc.; Text/data mining; Eventually Consistent ⇨ SQL (RDBMS): Query language; DBMS required; Fixed schema; Complex structure; SQL; Not good at Text; ACID compliant
  • 80. What is MapReduce? ⇨ M/R is now patented by Google (Patent #7,650,331) ⇨ Used in many ADB's: Hadoop, CouchDB, Aster Data, Greenplum, Vertica, ... ⇨ “MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.”
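
The definition quoted on this slide maps almost one-to-one onto a word count, the usual first example. The sketch below runs in a single Python process and only shows the programming model; the function names and the two sample documents are made up, and none of the distribution, scheduling or fault tolerance of Hadoop or the ADB implementations is modelled.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: (document name, text) -> list of intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)

def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over every input, group by intermediate key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)  # "shuffle": group by key
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

# Hypothetical input: (document name, document text) pairs.
documents = [("doc1", "big data big hype"), ("doc2", "Data warehouse")]
word_counts = map_reduce(documents, map_fn, reduce_fn)
```

Because each `map_fn` call looks at one input pair and each `reduce_fn` call at one key, both phases can be spread over many machines, which is the whole appeal of the model.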
  • 82. M/R & SQL: How to get there ⇨ SQL on top of M/R (e.g. Hive-Hadoop) ⇨ M/R invoking SQL (e.g. Greenplum) ⇨ SQL invoking M/R (e.g. Teradata/Aster Data) ⇨ Most ADB vendors implementing/investigating M/R (e.g. Vertica (Hadoop integration), Oracle, Netezza, etc.)
  • 92. (R/H/M)OLAP ⇨ OnLine Analytical Processing ⇨ Analyse multidimensional data ⇨ Basic architecture (diagram labels): Data Warehouse, MDX, OLAP engine/server, Analysis front end
  • 93. Stars and Cubes ⇨ Star schema ⇨ Dimension & fact tables ⇨ Best foundation for cubes ⇨ Cubes (logical/physical) ⇨ Dimensions ⇨ Hierarchies ⇨ Levels ⇨ Attributes ⇨ Measures
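
A minimal star schema, with one dimension table and one fact table, can be tried out with Python's built-in `sqlite3` module. Everything in this sketch is invented for illustration: the table and column names, the product rows and the sales figures. A fuller design would also have a separate time dimension instead of a plain `year` column on the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        family     TEXT,   -- higher hierarchy level
        name       TEXT    -- lower hierarchy level
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        year       INTEGER,
        unit_sales INTEGER  -- measure
    );
    INSERT INTO dim_product VALUES
        (1, 'Drink', 'Cola'), (2, 'Drink', 'Beer'), (3, 'Food', 'Bread');
    INSERT INTO fact_sales VALUES
        (1, 1997, 10), (2, 1997, 5), (3, 1997, 7), (3, 1998, 4);
""")

# Roll the measure up to the 'family' level of the product dimension.
rows = conn.execute("""
    SELECT d.family, SUM(f.unit_sales)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    WHERE f.year = 1997
    GROUP BY d.family
    ORDER BY d.family
""").fetchall()
conn.close()
```

The query joins the fact table to its dimension and aggregates the measure by a hierarchy level, which is the kind of question a cube built on top of the star answers without hand-written SQL.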
  • 94. The power of OLAP ⇨ Aggregates, positional calculations (prior vs current), range calculations (ytd, mtd), level calculations (child to parent contribution)
  • 95. MDX ⇨ Short for 'Multi Dimensional Expressions' ⇨ ~ SQL for OLAP: ⇨ SELECT {set for column headers} ON COLUMNS, {set for row headers} ON ROWS FROM [Cube Name] WHERE {set for filtering} ⇨ Example: SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Product].[Drink], [Product].[Food]} ON ROWS FROM [Sales] WHERE [Time].[1997]
  • 96. The Power of MDX ⇨ Positional: [Measures].[Profit], [Time].PrevMember ⇨ Range: Aggregate(YTD(), [Measures].[Profit]) ⇨ “MDX is far more powerful than SQL for the typical BI questions”
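
What the positional and range calculations above compute can be spelled out in plain Python. This is not how an OLAP engine evaluates MDX; it only mimics the idea of `PrevMember` and `YTD()` on a single time level. The monthly profit figures and the function names are made up for the example.

```python
# Hypothetical monthly profit figures for one year, in calendar order.
profit = {"Jan": 100, "Feb": 120, "Mar": 90, "Apr": 150}
months = list(profit)

def prev_member(month):
    """Positional: the member before `month` on the time level, or None."""
    i = months.index(month)
    return months[i - 1] if i > 0 else None

def growth_vs_prior(month):
    """Prior vs current: profit change compared with the previous month."""
    prev = prev_member(month)
    return None if prev is None else profit[month] - profit[prev]

def ytd(month):
    """Range: aggregate profit from the first month up to and including `month`."""
    i = months.index(month)
    return sum(profit[m] for m in months[: i + 1])
```

In MDX these are one-line expressions because the cube already knows the order of the members and the hierarchy they belong to; in SQL the same questions typically need self-joins or window functions.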
  • 97. Adding OLAP to the mix ⇨ Virtual Cubes, e.g. Kognitio Pablo, Pentaho Mondrian, Microstrategy ⇨ Physical Cubes, e.g. Microsoft Analysis Services, Oracle Essbase, Jedox Palo ⇨ Physical cubes allow 'write back': what if, forecasting, budgeting & planning
  • 98. 'New' kid on the block: SAP HANA
  • 99. The promises of the Cloud ⇨ “Utility computing” ⇨ Unlimited capacity ⇨ Pay as you go/by the sip ⇨ Lower costs ⇨ Always up to date ⇨ Invisible OS ⇨ Security ⇨ Safety
  • 100. Cloud still getting Hotter (Source: IBM CIO Survey 2011)
  • 101. Types of Cloud Solutions ⇨ Virtualization ⇨ IaaS (Infrastructure) ⇨ PaaS (Platform) ⇨ SaaS (Software) ⇨ Value Added
  • 102. Cloud Cost Components ⇨ Storage ⇨ Bandwidth ⇨ SLA/Service ⇨ CPU power ⇨ Memory ⇨ Data transfer ⇨ Requests
  • 104. The trouble with Cloud DWH ⇨ DWH aaS vendors: e.g. 1010Data, Kognitio, Vertica, EMC/Greenplum ⇨ more will follow
  • 105. What about No Database at all? ⇨ Rick F. van der Lans ⇨ Key element: abstraction (de-coupling)
  • 106. Data Virtualization concept ⇨ Virtual DB ⇨ SQL, SOAP, REST, FILE, WS-* ⇨ Information Consumers
  • 107. Example: Composite 6 (architecture diagram, © 2011 Composite Software, Inc. / Composite Proprietary) ⇨ Diagram labels: Composite Information Server; Development Environment, Runtime Server Environment, Management Environment; Studio, Discovery, Active Cluster, Manager, Monitor, Performance Plus Adapters; Query Engine, Cost-based Optimizer, Rules-based Optimizer, Federation Engine, Caching, Security, Metadata Repository, Advanced Functions, Quality, Governance; Views, SQLScript (Database Centric); XQuery, Java, WSDL, SCA (Services Centric); Front-end Applications ⇨ Interfaces: Web Services (HTTP, REST, SOAP, JSON, XQuery), SQL (ODBC, JDBC, ADO.NET), Messaging (JMS), Java (POJO), Application APIs, MF Adapter, URI ⇨ Data sources: Applications, Big Data Stores, Excel, Flat Files, Mainframes, Messages, OLAP Cubes, RDBMS, Web Services, XML Documents
  • 110. Virtual vs Physical trade offs (Source: Mark Madsen)
  • 111. The Shootout! ⇨ Things to ask your (potential) vendor ⇨ References ⇨ Assist in a paid POC ⇨ License model & unit of cost: CPU, Core, Server, (raw) Data volume, Memory used ⇨ Free dev/test editions (only pay for production use) ⇨ Support options (updates only, mail/phone support, etc) ⇨ If migrating: trade in discount ⇨ Opt out/de-integration options
  • 112. Does your DB cover the Basics? ⇨ Full SQL 2003 support? ⇨ Easy backup/restore features? ⇨ Scaling up or out? ⇨ Failover & persistency? ⇨ External (management) Tool integration?
  • 113. Which deployment types? ⇨ On Premise ⇨ SaaS/Cloud (diagram labels: Software only, Appliance; Vendor/ISP, Customer)
  • 117. Beware of Benchmarks! ⇨ Differences in ⇨ # threads ⇨ # cores ⇨ # disks ⇨ # nodes ⇨ CPU generation/speed ⇨ 1. Always use a P.O.C. on your own data & query workload ⇨ 2. Don't trust the MQ's
  • 118. ⇨ Ongoing Market Consolidation ⇨ More additional/alternative storage engines ⇨ Hybrid Row/Column solutions ⇨ Every db will get In DB analytics ⇨ Every db will get Hadoop/MR extensions ⇨ Everything in-memory
  • 119. So what's the best database for BI?
  • 122. Web: www.tholis.com Email: jos<at>tholis.com Phone: +31-(0)6-51169606 Skype: tholis.jos LinkedIn: jvdongen Twitter: josvandongen IRC: _grumpy Jos van Dongen In BI since 1991 Principal Consultant Author/Speaker/Analyst Proud member of #BBBT

Editor's notes

  1. The original definition of business intelligence
  2. What most people in the BI/DWH department tend to forget is that BI is not about technology, cool dashboards or the fastest analytical database. Nor is it about building ETL flows and publishing 100’s of reports. It is about helping the business user and manager to make more insightful decisions. If a simple Excel spreadsheet gets you there: great! Unfortunately, things are usually more complex than that... Seminar Open Source BI November 2008 Tholis Consulting
  3. In order to deliver full business impact, business intelligence must shift from retrospective analysis by experts to mechanisms that make it fully operational in a business context e.g. automatically triggered by external events as well as driven by people making decisions. The former is action within processes, while the latter is more often action on processes. As core business processes become more service oriented, there is increased scope for injecting decision-driven services. The technology evolution of software architecture means we can mix BI and decision services with application services. This allows us to maintain both application-oriented and data-oriented architectures. If business intelligence is going to directly impact business processes then we need a closed loop system to evaluate and improve results on an ongoing basis. This is where the combination of performance management concepts, business process models and data all come together. Current waterfall methods of design and construction are inadequate because they don’t allow evolution in different areas at different speeds, nor do they take into account the service model architecture over the application function-centric architecture.
  4. Data warehouses usually follow a predictable evolution. After over 25 years, we have seen the “stages” companies go through on their path to enterprise data warehousing. Moving from Stage 1 (What Happened?) into Stage 2 (Why Did It Happen?) requires new capabilities for ad hoc analysis. Then as you evolve to Stage 3 (Predicting What Will Happen) you again grow in your platform and database requirements. As you cross the chasm into Stages 4 and 5 (Operational Intelligence) you require a platform capable of “active” analysis.
  5. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  9. Step one is ad-hoc analysis, most frequently done manually and not to a strict schedule. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis, model building. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  10. Prediction implies automation of processes; it is systematic. Monitor: passive monitoring, basic description of what’s going on. Identify: active monitoring, human intervention may be required, identify exceptions, alerts. Explore: examine exceptions, determine what happened, boundaries and data. Analyze: determine root causes, more detailed analysis, model building. Predict: model problems and processes, determine future outcomes. Prescribe: optimize, determine choices between options, define actions to take.
  11. The process requirement, and its lack in our environments, is coming back in BI, model and tool requirements.
  12. The warehouse concept is no longer a simple database-oriented model. It’s grown up into a large collection of data management, storage, processing and delivery components that must all work together. There have been many changes from the once per night batch oriented design, with a single data model capturing the entire enterprise, and 100% of the organization’s data readily available through a single user interface. We now have operational data stores and other staging areas to address mixed data latencies, different data types, the requirement to manage master data and clean up problems in operational data. In larger environments we’ve created warehouse-mart architectures and offloaded some of the processing or even refined the data further. Data, particularly in the case of planning, scenario modeling / what-if analysis, or scorecards, has writeback requirements. Data types are more varied and complex than SQL standard types. The new view: Data warehouse as a platform. This means meeting application needs as well as traditional BI workloads. We have to think in terms of data and decision services, as well as traditional query-response models. Access to both historical and current data. Multiple storage methods, possibly distributed. Multiple access methods. Data usage decoupled from the underlying platform. More fluid management of data, regardless of location.
  13. Any architecture now will have multiple repositories for data, multiple technologies to cope with the different needs. The primary technology classes line up like this. For most BI programs, the low hanging fruit has been picked. The BI market is changing and BI programs, skills and architectures need to change with it. That means learning about the storage and processing technologies and architectures, and how they can be put together.
  14. Lots of time is wasted on evaluating different solutions; by just taking what you already have (MySQL, SQL Server, Oracle, whatever), lots of time can be saved in your first (pilot) project. For bigger scale efforts: use 'Ready To Fly' solutions, either off-premise (Cloud based stuff) or on-premise (Appliances).
  19. Selecting any solution is a trade off between conflicting goals; often, high performance and low cost don’t go well together; requiring full auditability and real time data access at the same time can also cause problems. For any combination of factors, a decision has to be made what factor has the more weight in a selection process.