Common BI/Big Data
Challenges and Solutions
By Andriy Zabavskyy
& Serhiy Haziyev

January, 2013
SoftServe BI/Big Data Lunch and Learn Workshop in Utah
January 30, 2013

Common BI/Big Data Challenges and Solutions was presented by SoftServe experts
Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of
Software Architecture).
This was a complimentary workshop where attendees could learn,
network, and share knowledge during a lunch and education session.

About SoftServe Inc.
SoftServe, founded in 1993, is a leading global outsourced product and application
development company dedicated to empowering businesses worldwide by providing end-to-end capabilities from product concept to completion. Utilizing Product Development Services
2.0 (PDS 2.0), we deliver proactive solutions in the areas of SaaS/Cloud, Mobility, BI/Analytics
and UI/UX for industries including Healthcare, Retail, Manufacturing, Logistics, and
Infrastructure & Storage. SoftServe is a rapidly growing global company with 3,000
professionals and offices in North America, Western Europe, Russia and Ukraine.
Agenda
• Data Visualization
• Data Integration
• Data Warehousing
• Big Data
• Data Mining
Typical BI Solution
• Data Sources: OLTP (CRM, ERP, Finance), flat files, spreadsheets, legacy system
• Data Integration: ETL/ELT
• Data Warehouse: data warehouse, OLAP cubes
• Big Data
• Data Mining: predictive and prescriptive analytics
• Data Visualization and Analysis: reports, dashboards, BI tools
• Users: analysts
Dashboard & Scorecard
Problem:
▪ Single view from multiple sources
▪ Track performance against company targets
Solution:
▪ Dashboard
▪ KPI and Scorecards
Diagram: Client, Internet, Server Tier
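
As an illustration of tracking performance against company targets, here is a minimal scorecard sketch. The KPI names, figures, and the green/amber/red thresholds are hypothetical and are not taken from the workshop; it also assumes non-zero targets and actuals.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True  # churn-like metrics set this to False


def attainment(kpi: Kpi) -> float:
    """Return performance as a fraction of target (1.0 means on target)."""
    if kpi.higher_is_better:
        return kpi.actual / kpi.target
    return kpi.target / kpi.actual


def status(kpi: Kpi, green: float = 1.0, amber: float = 0.9) -> str:
    """Map attainment to a traffic-light status (thresholds are assumptions)."""
    ratio = attainment(kpi)
    if ratio >= green:
        return "green"
    if ratio >= amber:
        return "amber"
    return "red"


# Hypothetical scorecard rows, as if already consolidated from several sources.
scorecard = [
    Kpi("Monthly revenue (USD)", actual=950_000, target=1_000_000),
    Kpi("Customer churn (%)", actual=2.0, target=2.5, higher_is_better=False),
    Kpi("On-time delivery (%)", actual=80.0, target=95.0),
]

for kpi in scorecard:
    print(f"{kpi.name:<25} {attainment(kpi):6.1%}  {status(kpi)}")
```

A dashboard would render these statuses visually; the calculation behind each tile can stay this simple.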
Dashboard & Scorecard: Implementation
Software vendor offerings and the development efforts they imply:
• Boxed solutions from big players (e.g. SAS, SAP, IBI): customization
• Dashboard frameworks (e.g. Tableau, QlikView, JasperSoft): custom defined KPI, integration efforts
• Dashboard libs (JIDE libs): custom defined KPI and custom built dashboard framework
Dashboard & Scorecard: Highlights

• Adopting or customizing a ready-made line-of-business solution can be a painful, long, and costly process
• Not all dashboard solutions support multitenancy out of the box
Self Service BI
Problem:
▪ Give BI users the ability to explore and analyze data in a highly customizable manner
Solution:
▪ Expose a data model to users
▪ Give them a toolset with data exploration and analysis capabilities
Diagram: BI Users, Toolset, Data Model (OLAP, In-Memory, RDBMS/NoSQL)
Self Service BI: Implementation
• OLAP engines with proper OLAP
viewers
• BI tools with in-memory engines and
semantic/domain layers
• Report Authoring Tools :
– Microsoft Report Builder
– JasperServer Report Designer
Self Service BI: Traditional vs Agile BI Trade-off
Features
Time to Value
Self Service
Collaboration
Interactivity and UX

Customization
Data Quality
Pixel-perfect
Low cost solutions

Traditional

Agile
Self Service BI: Highlights

• Need to educate data consumers to properly use
SSBI tools
• The desktop tools of many SSBI vendors are often
more mature than their web tools
• In-memory capabilities are limited by RAM size
Data Integration Patterns
Scheduled

ETL

ELT

Replication

EAI
EII
Real-time
Message/Record based

Large data sets

Source: Microsoft EDW Architecture, Guidance and Deployment Best Practices
ELT
Problem:
• Efficiently processing very large volumes of data within ever-shortening processing windows
Solution:
• Perform transformation steps on the target platform
• Set-based processing
Diagram: sources feed a Data Warehouse made up of a Staging Layer and a Semantic Layer, through Extract, Load, and Transform steps
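
The ELT idea can be sketched with Python's built-in `sqlite3` module standing in for the target platform. The table names (`stg_orders`, `fact_daily_sales`) and the sample rows are hypothetical; the point is only that raw data is loaded unchanged into staging and then transformed by a single set-based SQL statement on the target.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (an assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT, order_date TEXT);
    CREATE TABLE fact_daily_sales (order_date TEXT, customer TEXT,
                                   total_amount REAL, order_count INTEGER);
""")

# Extract + Load: rows land in the staging layer exactly as they arrive.
raw_rows = [
    ("1", " alice ", "10.50", "2013-01-30"),
    ("2", "BOB", "20.00", "2013-01-30"),
    ("3", "Alice", "5.25", "2013-01-30"),
    ("4", "bob", "not-a-number", "2013-01-31"),  # dirty row, filtered below
]
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", raw_rows)

# Transform: cleansing, filtering, and aggregation in one set-based statement.
conn.execute("""
    INSERT INTO fact_daily_sales (order_date, customer, total_amount, order_count)
    SELECT order_date,
           LOWER(TRIM(customer)),
           SUM(CAST(amount AS REAL)),
           COUNT(*)
    FROM stg_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'
    GROUP BY order_date, LOWER(TRIM(customer))
""")

rows = conn.execute(
    "SELECT order_date, customer, total_amount, order_count "
    "FROM fact_daily_sales ORDER BY order_date, customer"
).fetchall()
for row in rows:
    print(row)
```

No record passes through application memory during the transform step; the database engine does the work, which is the trade-off the slides describe.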
ELT: Highlights

• Some data integration platforms have clearly separated
ETL and ELT components
• Consider usage of custom scripts native to target
platform vs. built-in DI component
ETL vs. ELT
ETL
Flow:
• A data pipeline is used
• Transformations are applied to the data one record at a time
• Intermediate data results are stored in memory
Advantages:
• Complex transformations
• Keeping intermediate results in memory is faster than persisting them to disk
Disadvantages:
• Large data sets could overwhelm the memory
• Updates are more efficient using set-based processing
ELT
Flow:
• Data is loaded into the destination server
• Set-based processing
• Transformations and lookups are done within SQL
Advantages:
• The power of the relational database system can be utilized for very large data sets
Disadvantages:
• Load on the RDBMS
• More disk activity
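
For contrast with set-based processing, the ETL flow described above (a pipeline that transforms one record at a time and keeps intermediate results in memory) might look like the following sketch. The CSV content, the field names, and the list used as the load target are all hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator

# A small in-memory CSV stands in for a source extract (hypothetical data).
SOURCE = io.StringIO(
    "order_id,customer,amount\n"
    "1, alice ,10.50\n"
    "2,BOB,20.00\n"
    "3,Alice,5.25\n"
)


def extract(handle) -> Iterator[dict]:
    """Yield source records one at a time."""
    yield from csv.DictReader(handle)


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Clean and type each record as it flows through the pipeline."""
    for record in records:
        yield {
            "order_id": int(record["order_id"]),
            "customer": record["customer"].strip().lower(),
            "amount": float(record["amount"]),
        }


def load(records: Iterable[dict], target: list) -> int:
    """Append transformed records to the target and return the row count."""
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


warehouse: list = []  # stand-in for the destination table
loaded = load(transform(extract(SOURCE)), warehouse)
print(loaded, warehouse)
```

Because the stages are generators, only one record is in flight at a time here; real ETL tools buffer more, which is where the memory pressure on large data sets comes from.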
Kimball’s Multidimensional EDW
Problem:
• Integrate and consolidate data
from heterogeneous sources
• Keep data history

Data Warehouse

Solution:
• Use multidimensional model to
store data
• Iterate by business lines
• Integrate by conformed
dimensions

Data Sources
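
Integration by conformed dimensions can be sketched as two fact tables from different business lines sharing one date dimension. The schema and figures below are hypothetical, and SQLite (via Python's `sqlite3`) is only a convenient stand-in for a warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed dimension shared by both business lines.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);

    -- Fact tables built in separate iterations, both keyed to dim_date.
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date, amount REAL);
    CREATE TABLE fact_support_tickets (date_key INTEGER REFERENCES dim_date, tickets INTEGER);

    INSERT INTO dim_date VALUES
        (20130130, '2013-01-30', '2013-01'),
        (20130131, '2013-01-31', '2013-01'),
        (20130201, '2013-02-01', '2013-02');
    INSERT INTO fact_sales VALUES (20130130, 100.0), (20130131, 50.0), (20130201, 70.0);
    INSERT INTO fact_support_tickets VALUES (20130130, 3), (20130201, 1), (20130201, 4);
""")

# Drill-across: aggregate each fact to the shared grain (month), then join.
drill_across = conn.execute("""
    SELECT s.month, s.sales, t.tickets
    FROM (SELECT d.month AS month, SUM(f.amount) AS sales
          FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
          GROUP BY d.month) AS s
    JOIN (SELECT d.month AS month, SUM(f.tickets) AS tickets
          FROM fact_support_tickets f JOIN dim_date d ON d.date_key = f.date_key
          GROUP BY d.month) AS t
      ON s.month = t.month
    ORDER BY s.month
""").fetchall()

for row in drill_across:
    print(row)
```

Aggregating each fact table before joining avoids double counting, and it works only because both facts agree on what a date and a month mean.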
Kimball vs. Inmon
Sources

Data Integration and Data Warehousing

3NF

Inmon Approach

Kimball Approach

Visualization
Kimball vs. Inmon
• Overall approach: Inmon is top-down; Kimball is bottom-up
• Data orientation: Inmon is subject- or data-driven; Kimball is process-oriented
• Data modeling: Inmon is traditional; Kimball is multidimensional
• Primary audience: Inmon targets IT professionals; Kimball targets end users
DWH: Implementation
• Traditional RDBMS

• Analytical Column-based RDBMS
DWH: Highlights

Implications of column-based storage:
– Additional columns vs. junk dimensions
– Update scenarios should be avoided where possible
– The partitioning scheme should be carefully designed
to support maintenance activities
Big Data

Big Data axis
Big Data: Hybrid Approach
Problem:
• Under big data circumstances:
– Flexible online analytics
– Access to the most detailed raw data
Solution:
• Analytical RDBMS for online analytics
• NoSQL DB as the source for the RDBMS and as the store for the most detailed raw data
Diagram: Source -> NoSQL -> RDBMS/DW -> Operational and Historical Analytics
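
A minimal sketch of the hybrid idea, under loud assumptions: a list of schemaless dictionaries stands in for the NoSQL store, in-memory SQLite stands in for the analytical RDBMS, and the clickstream events are invented for illustration.

```python
import sqlite3
from collections import Counter

# Stand-in for the NoSQL store: raw events with a flexible shape (hypothetical).
raw_events = [
    {"user": "u1", "page": "/home", "ts": "2013-01-30T10:00:00"},
    {"user": "u1", "page": "/pricing", "ts": "2013-01-30T10:01:00", "referrer": "ad"},
    {"user": "u2", "page": "/home", "ts": "2013-01-30T11:00:00"},
    {"user": "u2", "page": "/home", "ts": "2013-01-31T09:00:00"},
]

# Batch step: summarize raw events into a relational shape for online analytics.
daily_counts = Counter((event["ts"][:10], event["page"]) for event in raw_events)

dw = sqlite3.connect(":memory:")  # stand-in for the analytical RDBMS
dw.execute("CREATE TABLE page_views_daily (day TEXT, page TEXT, views INTEGER)")
dw.executemany(
    "INSERT INTO page_views_daily VALUES (?, ?, ?)",
    [(day, page, views) for (day, page), views in daily_counts.items()],
)

# Online analytics run against the compact relational summary...
top_pages = dw.execute(
    "SELECT page, SUM(views) AS total FROM page_views_daily "
    "GROUP BY page ORDER BY total DESC, page"
).fetchall()
print(top_pages)

# ...while the most detailed raw data stays available in the event store.
pricing_detail = [event for event in raw_events if event["page"] == "/pricing"]
print(pricing_detail)
```

The relational summary is what BI tools query interactively; the raw store is consulted only when an analyst needs the original detail.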
Big Data: Implementation
Sample of Hybrid Approach in HP Operational Analytics Architecture
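Big Data isn’t only Hadoop
Storage options compared: Tape Library, HDFS, Disk Array, Amazon S3, Amazon Glacier (grouped on the slide as Operation Storage and Retention Storage)
Requirements:
• Throughput (600 GB load time): Tape Library 140-500 MB/s (0.3-1.2 h); HDFS 10-30 MB/s (5.5-16 h); Disk Array 50-700 MB/s (0.25-4 h); Amazon S3/Glacier 2-40 MB/s (83 h)
• Max capacity: Tape Library 30-900 PB; HDFS 21+ PB; Disk Array 16 PB; Amazon S3/Glacier ~Unlimited
• Max file size: Tape Library ~Unlimited; HDFS ~Unlimited; Disk Array 4-16 TB (OS-limited); Amazon S3 5 TB; Amazon Glacier 40 TB
• Accessibility: Tape Library SAN; HDFS Java API, HTTP, NFS (MapR); Disk Array NFS, CIFS, SAN; Amazon S3/Glacier REST, SOAP
• Scalability: Tape Library adding cartridges; HDFS adding nodes; Disk Array adding disks; Amazon S3/Glacier pay-as-you-go
• Reliability: Tape Library redundancy; HDFS redundancy (MapR); Disk Array redundancy; Amazon S3/Glacier 99.99%
• Encryption: Tape Library Yes; HDFS Yes*; Disk Array Yes*; Amazon S3/Glacier Yes
• HIPAA compliance: Tape Library, HDFS, and Disk Array by datacenter; Amazon S3/Glacier by Amazon
• Random access, Parallel processing: cell values as extracted, column alignment not recoverable: ?, No, Yes, Yes, Yes, No, No, Yes, Yes**, No
• 100 TB cost: Tape Library $40-60K; HDFS $100-200K; Disk Array $80-400K; Amazon S3 $132-216K/year; Amazon Glacier $12-96K/year
• 1 PB cost: Tape Library $90-140K; HDFS $1-2M; Disk Array $0.5-4M; Amazon S3 $1.1-1.6M/year; Amazon Glacier $120-360K/year
• 15 PB cost: Tape Library $0.7-1.2M; HDFS $15-30M; Disk Array ~$18M; Amazon S3 $9.9-15M/year; Amazon Glacier $1.8-3.5M/year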
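
The load-time figures quoted for 600 GB follow from dividing the data size by the sustained throughput. The helper below is a sketch of that arithmetic; it assumes decimal units (1 GB = 1000 MB) and a constant transfer rate, neither of which the slides state.

```python
def load_time_hours(size_gb: float, throughput_mb_s: float, mb_per_gb: int = 1000) -> float:
    """Hours needed to move size_gb at a constant throughput in MB/s."""
    seconds = size_gb * mb_per_gb / throughput_mb_s
    return seconds / 3600


# Throughput ranges in MB/s, copied from the comparison above.
throughput_ranges = {
    "Tape Library": (140, 500),
    "HDFS": (10, 30),
    "Disk Array": (50, 700),
}

for storage, (slow, fast) in throughput_ranges.items():
    best = load_time_hours(600, fast)
    worst = load_time_hours(600, slow)
    print(f"{storage:<12} {best:5.2f} h to {worst:5.2f} h for 600 GB")
```

At the low end of the Amazon range, 2 MB/s, the same formula gives roughly 83 hours, which matches the figure quoted for that column.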
Big Data: Highlights

• Clickstream analysis is a classic use case
• Scheduled reports are well suited to Hadoop-based reporting
• The majority of Self Service BI tools need a relational
representation of the data
Prediction of Customer Loyalty
Problem:
• Predict customer loyalty and profitability
Solution:
• Logistic regression algorithm
• Support vector machines
Diagram: Historical Data -> DM Tool (Algorithm) -> Prediction
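
To make the logistic regression option concrete, here is a small from-scratch sketch trained with stochastic gradient descent. The two features (orders in the last year, complaints filed), the six training rows, and the hyperparameters are hypothetical; a real project would more likely use a statistical package or a ready data mining model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, learning_rate=0.05, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return weights, bias


def predict_loyal(weights, bias, features) -> float:
    """Return the estimated probability that a customer is loyal."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical historical data: (orders last year, complaints filed) -> loyal?
history = [(12, 0), (9, 1), (10, 0), (2, 3), (1, 2), (3, 4)]
loyal = [1, 1, 1, 0, 0, 0]

weights, bias = train(history, loyal)
print("frequent buyer:", round(predict_loyal(weights, bias, (11, 0)), 3))
print("unhappy buyer: ", round(predict_loyal(weights, bias, (1, 4)), 3))
```

The output is a probability rather than a hard label, so the business can choose its own cut-off for who counts as at risk.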
Recommendation System
Problem:
• Recommend to customers the most suitable goods
Solution:
• K-means clustering algorithm
• Collaborative filtering
Diagram: Historical Data -> DM Tool (Algorithm) -> Recommendation
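
Of the two techniques named, collaborative filtering is the easier to sketch briefly. The example below is user-based collaborative filtering with cosine similarity; the customers, products, and ratings are invented, and missing ratings are treated as zero, which is a simplifying assumption.

```python
import math
from collections import defaultdict

# Hypothetical purchase ratings (customer -> product -> rating from 1 to 5).
ratings = {
    "ann": {"laptop": 5, "mouse": 4, "monitor": 5},
    "bob": {"laptop": 4, "mouse": 5},
    "cat": {"mouse": 2, "desk": 5, "chair": 4},
    "dan": {"laptop": 5, "monitor": 4},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(customer: str, top_n: int = 3) -> list:
    """Rank products the customer has not rated by similarity-weighted ratings."""
    own = ratings[customer]
    scores = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == customer:
            continue
        similarity = cosine_similarity(own, other_ratings)
        for item, rating in other_ratings.items():
            if item not in own:
                scores[item] += similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("bob"))
```

Customers whose past ratings resemble the target customer's contribute most to the score, so their other purchases surface first.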
DM Models: Implementation

• Custom algorithm implementation
• Statistical packages like R
• Ready data mining model implementations
DM Models: Highlights

• The approach should be:
Problem -> Data Strategy -> Data Analysis
… and not vice versa
• DM algorithms should be carefully selected
• DM algorithms are highly dependent on the business
domain they are created for
SoftServe BI Maturity Model
Wisdom: improving the business
• Decision making (executives)
• Data mining, forecasting
Knowledge: gaining business insight
• Analytical reports (analysts)
• Dashboards, KPIs, scorecards, slice & dice, data warehouse, OLAP
Information: measuring and monitoring
• Consolidated reports (managers)
• Charts, parametrized reports, dedicated reporting database
Data: running the business
• Personal operational reports (workers, customers)
• Simple reports, OLTP or files
SoftServe BI/BigData Expertise
Big Data and NoSQL

Data Integration

Data Warehouse

BI Platforms
More Info about SoftServe BI Offerings

• http://www.softserveinc.com/en-us/services/software-architecture/
• http://www.softserveinc.com/en-us/services/bi-analytics/

 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

SoftServe BI/BigData Workshop in Utah

  • 1. Common BI/Big Data Challenges and Solutions By Andriy Zabavskyy & Serhiy Haziyev January, 2013
  • 2. SoftServe BI/Big Data Lunch and Learn Workshop in Utah, January 30, 2013. "Common BI/Big Data Challenges and Solutions" was presented by seasoned SoftServe experts Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture). This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session. About SoftServe Inc.: SoftServe, founded in 1993, is a leading global outsourced product and application development company dedicated to empowering businesses worldwide by providing end-to-end capabilities from product concept to completion. Utilizing Product Development Services 2.0 (PDS 2.0), we deliver proactive solutions in the areas of SaaS/Cloud, Mobility, BI/Analytics and UI/UX for industries including Healthcare, Retail, Manufacturing, Logistics, and Infrastructure & Storage. SoftServe is a rapidly growing global company with 3,000 professionals and offices in North America, Western Europe, Russia and Ukraine.
  • 4. Typical BI Solution Data Sources Data Integration OLTP: CRM, ERP, Finance Data Warehouse Data Mining Users Predictive Prescriptive Analytics Data Warehouse OLAP cubes Data Visualization and Analysis Flat files ETL/ELT Big Data Reports Dashboards Spreadsheets Legacy System BI Tools Analysts
  • 6. Dashboard & Scorecard Client Problem: ▪ Single view from multiple sources ▪ Track performance against company targets Internet Solution: ▪ Dashboard ▪ KPI and Scorecards Server Tier
  • 7. Dashboard & Scorecard: Implementation Software Vendors Offering Boxed solutions from big players Development Efforts Customization (e.g. SAS, SAP, IBI) Dashboard Frameworks (e.g. Tableau, QlikView, JasperSoft) Dashboard libs (JIDE libs) Custom defined KPI Integration Efforts Custom defined KPI & Custom built dashboard framework
  • 8. Dashboard & Scorecard: Highlights • Adopting or customizing a ready-made business-line solution can be a painful, long and costly process • Not all dashboard solutions support multitenancy out of the box
  • 9. Self Service BI Problem: ▪ Give ability for BI users to explore and analyze data in highly customizable manner BI Users Data Model Solution: Toolset ▪ Expose to users a data model ▪ Give a toolset with data exploring and analysis capabilities OLAP In-Memory RDBMS/ NoSQL
  • 10. Self Service BI: Implementation • OLAP engines with proper OLAP viewers • BI tools with in-memory engines and semantic/domain layers • Report Authoring Tools : – Microsoft Report Builder – JasperServer Report Designer
  • 11. Self Service BI: Traditional vs Agile BI. Trade-off features compared for Traditional and Agile BI: Time to Value; Self Service; Collaboration; Interactivity and UX; Customization; Data Quality; Pixel-perfect; Low cost solutions
  • 12. Self Service BI: Highlights • Need to educate data consumers to properly use SSBI tools • Desktop versions of many SSBI vendors are often more mature in comparison to Web tools • In-memory capabilities are limited by RAM size
  • 14. Data Integration Patterns Scheduled ETL ELT Replication EAI EII Real-time Message/Record based Large data sets Source: Microsoft EDW Architecture, Guidance and Deployment Best Practices
  • 15. ELT Problem: • Efficiently processing very large volumes of data within ever-shortening processing windows Solution: • Perform transformation steps on target platform • Set-based processing Diagram labels: Source, Source, Extract, Load, Staging Layer, Transform, Semantic Layer, Data Warehouse
  • 16. ELT: Highlights • Some data integration platforms have clearly separated ETL and ELT components • Consider usage of custom scripts native to target platform vs. built-in DI component
  • 17. ETL vs. ELT
      – Flow, ETL: data pipelines are used; transformations are applied to the data one record at a time; intermediate data results are stored in memory
      – Flow, ELT: data is loaded into the destination server; set-based processing; transformations and lookups are within the SQL
      – Advantages, ETL: complex transformations; keeping intermediate results in memory is faster than persisting to disk
      – Advantages, ELT: the power of the relational database system can be utilized for very large data sets; updates are more efficient using set-based processing
      – Disadvantages, ETL: large data sets could overwhelm the memory
      – Disadvantages, ELT: load on RDBMS; more disk activity
  • 19. Kimball’s Multidimensional EDW Problem: • Integrate and consolidate data from heterogeneous sources • Keep data history Data Warehouse Solution: • Use multidimensional model to store data • Iterate by business lines • Integrate by conformed dimensions Data Sources
  • 20. Kimball vs. Inmon Sources Data Integration and Data Warehousing 3NF Inmon Approach Kimball Approach Visualization
  • 21. Kimball vs. Inmon
      – Overall approach: Inmon top-down; Kimball bottom-up
      – Data orientation: Inmon subject- or data-driven; Kimball process oriented
      – Data modeling: Inmon traditional; Kimball multidimensional
      – Primary audience: Inmon IT professionals; Kimball end users
  • 22. DWH: Implementation • Traditional RDBMS • Analytical Column-based RDBMS
  • 23. DWH: Highlights Implications of column-based storage: – Additional columns vs. Junked dimensions – Update scenarios should be omitted where possible – Partitions scenario should be carefully established to support maintenance activities
  • 26. Big Data: Hybrid Approach Problem: • Under big data circumstances: – Flexible online analytics – Access to most detailed raw data Solution: • Analytical RDBMS for online analytics • NoSQL DB as source for RDBMS and most detailed raw data Diagram labels: Source, NoSQL, RDBMS/DW, Operational and Historical Analytics
  • 27. Big Data: Implementation Sample of Hybrid Approach in HP Operational Analytics Architecture
  • 28. Big Data isn’t only Hadoop. Storage requirements compared across Tape Library, HDFS, Disk Array, Amazon S3 and Amazon Glacier (grouped on the slide as Retention Storage and Operation Storage); values are listed in the order they appear in the transcript:
      – Throughput (600 GB load time): 140-500 MB/s (0.3-1.2 h); 10-30 MB/s (5.5-16 h); 50-700 MB/s (0.25-4 h); 2-40 MB/s (83 h)
      – Max capacity: 30-900 PB; 21+ PB; 16 PB; ~Unlimited
      – Max file size: ~Unlimited; ~Unlimited; 4-16 TB (OS limited); 5 TB; 40 TB
      – Accessibility: SAN; Java API, HTTP, NFS (MapR); NFS, CIFS, SAN; REST, SOAP
      – Scalability: Adding cartridges; Adding nodes; Adding disks; Pay-as-you-go
      – Reliability: Redundancy; Redundancy (MapR); Redundancy; 99.99%
      – Encryption: Yes; Yes*; Yes*; Yes
      – HIPAA compliancy: By datacenter; By datacenter; By datacenter; By Amazon; ?
      – Random access, Parallel processing: No Yes Yes Yes No No Yes Yes** No
      – 100 TB cost: $40-60K; $100-200K; $80-400K; $132-216K/year; $12-96K/year
      – 1 PB cost: $90-140K; $1-2M; $0.5-4M; $1.1-1.6M/year; $120-360K/year
      – 15 PB cost: $0.7-1.2M; $15-30M; ~$18M; $9.9-15M/year; $1.8-3.5M/year
  • 29. Big Data: Highlights • Clickstream analysis is a classic use case • Scheduled reports are well suited for Hadoop based reports • Majority of Self Service BI tools need relational representation of data
  • 31. Prediction of Customer Loyalty Problem: Prediction • Predict customer loyalty; profitability Solution: • Logistic regression algorithm • Support vector machines DM Tool Historical Data Algorithm
  • 32. Recommendation System Problem: Recommendation • Recommend to customers the most suitable goods Solution: DM Tool • K-means clustering algorithm • Collaborative filtering Historical Data Algorithm
  • 33. DM Models: Implementation • Custom algorithm implementation • Statistical packages like R • Ready data mining model implementations
  • 34. DM Models: Highlights • The approach should be: Problem -> Data Strategy -> Data analysis … and not vice versa • DM Algorithms should be carefully selected • DM Algorithms are highly dependent on business domain you create them for
  • 35. SoftServe BI Maturity Model
      – Wisdom: improving the business; decision making (executives); data mining, forecasting
      – Knowledge: gaining business insight; analytical reports (analysts); dashboards, KPIs, scorecards, slice & dice, data warehouse, OLAP
      – Information: measuring and monitoring; consolidated reports (managers); charts, parametrized reports, dedicated reporting database
      – Data: running the business; personal operational reports (workers, customers); simple reports, OLTP or files
  • 36. SoftServe BI/BigData Expertise Big Data and NoSQL Data Integration Data Warehouse BI Platforms
  • 37. More Info about SoftServe BI Offerings: http://www.softserveinc.com/en-us/services/software-architecture/ and http://www.softserveinc.com/en-us/services/bi-analytics/
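
The ETL vs. ELT contrast from slides 15 to 17 can be illustrated with a minimal sketch. It is not taken from the workshop: SQLite stands in for the target data warehouse platform, and the table names (`staging_sales`, `dw_sales_by_customer`), the sample rows and both function names are hypothetical. `run_etl` transforms records one at a time in application memory, while `run_elt` loads the raw rows into a staging table first and then transforms them on the target with a single set-based SQL statement.

```python
import sqlite3

# Hypothetical raw extract: (customer name, amount in cents); one amount is missing.
SAMPLE_ROWS = [(" acme", 1000), ("Acme ", 250), ("globex", 500), ("globex", None)]


def run_etl(rows):
    """ETL style: clean and aggregate record by record, holding results in memory."""
    totals = {}
    for customer, amount in rows:
        if amount is None:
            continue  # drop incomplete records before loading
        key = customer.strip().upper()
        totals[key] = totals.get(key, 0) + amount
    return sorted(totals.items())


def run_elt(rows):
    """ELT style: load raw rows into staging, then transform on the target in SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse platform
    # Load: raw data goes into the staging layer unchanged.
    conn.execute("CREATE TABLE staging_sales (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)
    # Transform: one set-based statement does the cleaning and aggregation.
    conn.execute(
        "CREATE TABLE dw_sales_by_customer (customer TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    conn.execute(
        """
        INSERT INTO dw_sales_by_customer (customer, total_cents)
        SELECT UPPER(TRIM(customer)), SUM(amount_cents)
        FROM staging_sales
        WHERE amount_cents IS NOT NULL
        GROUP BY UPPER(TRIM(customer))
        """
    )
    result = conn.execute(
        "SELECT customer, total_cents FROM dw_sales_by_customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_etl(SAMPLE_ROWS))
    print(run_elt(SAMPLE_ROWS))
```

Both functions return the same totals. The difference is where the work happens: in ETL the application's memory bounds the data set, while in ELT the load and the disk activity move to the database engine, which matches the trade-offs listed on slide 17.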

Editor's notes

  1. Split DW and BD
  2. Split DW and BD
  3. Split DW and BD