The document discusses using the Data Vault 2.0 methodology for agile data mining projects. It provides background on a customer segmentation project for a motor insurance company. The Data Vault 2.0 modeling approach is described as well as the CRISP-DM process model. An example is then shown applying several iterations of a decision tree model to a sample database, improving results with each iteration by adding additional attributes to the Data Vault 2.0 model and RapidMiner process. The conclusions state that Data Vault 2.0 provides a flexible data model that supports an agile approach to data mining projects by allowing incremental changes to the model and attributes.
3. INTRODUCTION
Agile Data Mining with Data Vault 2.0
12.02.2014 Agile Data Mining with Data Vault 2.0
4. TIMO CIRKEL
BI-Consultant
Certified Data Vault 2.0 Practitioner
Analysis Of Policyholders
Specialized in CRM, Software Development,
DWH Automation
Industries: Insurance, Energy
B. Sc. Business Informatics
5. MICHAEL OLSCHIMKE
Senior BI-Consultant
Certified Data Vault 2.0 Practitioner
Official Data Vault 2.0 Trainer in Europe
Associate Teacher, University of Hannover
Specializing in Data Vault 2.0, Data Mining,
CRM, project management
Industries: Insurance, Automotive, Retail,
Public Sector, Non-Profits
6. DÖRFFLER & PARTNER GMBH
• Medium-sized consulting firm
• Official partner of Dan Linstedt in Europe
• Consulting, Training, Implementation
• Industries:
• Insurance
• Automotive
• Banks
• Trade
• Pharmaceuticals
• Telecommunications
7. BACKGROUND
Agile Data Mining with Data Vault 2.0
8. DATA MINING PROJECT AT VGH
Motor insurance
Customer segmentation
A first data mining pilot, therefore:
No specific requirements
Vision is developed during the project
Agile project methodology
Close cooperation with the business
9. DATA MINING PROJECTS
• Extracting information and patterns from existing data
• Four (large) categories:
• Segmentation
• Classification
• Prediction
• Association
• Wide range of available algorithms and methods
"The term Data Mining ... describes the extraction of implicitly existing, non-trivial and useful knowledge from large, dynamic, relatively complexly structured data."
[Figure labels: Database, Application, User, Data mining techniques, Statements/rules & information, Data dictionary, Domain knowledge]
10. DATA VAULT 2.0 MODELING
[Figure labels: Surrogate Key, Business Keys, Foreign Keys, Descriptors]
Own representation based on Linstedt, 2014
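The key types named on this slide can be sketched as table definitions. The following is only an illustration using Python's built-in sqlite3 module: every table and column name (hub_customer, customer_hk, load_dts, record_source and so on), the sample key "C-1001", and the choice of an MD5 hash as surrogate key are assumptions made for this sketch and are not taken from the slides.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Derive a surrogate key by hashing the normalized business key(s)."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical schema: hubs carry business keys, links carry foreign keys,
# satellites carry descriptors. Names are illustrative only.
DDL = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,      -- surrogate key
    customer_key  TEXT NOT NULL UNIQUE,  -- business key
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_key   TEXT NOT NULL UNIQUE,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_product (
    customer_product_hk TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),  -- foreign key
    product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),    -- foreign key
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk    TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_dts       TEXT NOT NULL,
    record_source  TEXT NOT NULL,
    gender         TEXT,                 -- descriptor
    marital_status TEXT,                 -- descriptor
    PRIMARY KEY (customer_hk, load_dts)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Load one made-up customer into the hub and its satellite.
hk = hash_key("C-1001")
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
             (hk, "C-1001", "2014-02-12", "demo"))
conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
             (hk, "2014-02-12", "demo", "F", "M"))
```

Keeping descriptors out of the hub is what later allows new attributes to arrive as additional satellites without touching existing tables.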
11. DATA VAULT 2.0 METHODOLOGY
[Figure: Data Vault 2.0 Methodology, surrounded by Six Sigma, TQM, Scrum, CMMI, PMP, SDLC]
12. DATA VAULT 2.0 METHODOLOGY FOR DATA MINING
Advantages
• Agile project management for DWH projects
• Automation and generation
• Rapid adaptation to changes in the model
• Incremental build-out = incremental cost control
• Targeted delivery = two-week sprints
• Predictable and measurable results
Disadvantages
• Focus on loading of raw data and the production of information
• Not many data mining references
• Many concepts in the methodology are not applicable to data mining projects
• Difficult scaling of team sizes in data mining projects
13. CRISP-DM
Own representation based on Chapman et al., 2000
14. PROCESS MODEL
Process model: VGH customer segmentation
Lanes: ivv, KTC, D & P
[Flowchart, translated from German; the arrows between the steps are not recoverable from the extracted text]
Activities: Start; Identify relevant policyholder attributes; Create SQL query; Store data in Data Vault model; Extract data; Develop quality function; Research algorithms; Select algorithm; Run segmentation; Present result; End
Decisions (Yes/No): Formula OK?; Suitable algorithm found?; Result achieved?; Result OK?
16. EXAMPLE
Agile Data Mining with Data Vault 2.0
17. EXAMPLE
AdventureWorks database (simple example)
Scenario:
• Advertising campaign for a new bike
• Identification of the target group
Solution:
• Decision tree
• Identify relevant attributes in several iterations
Source: Lachev, 2005, p. 238ff
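One common way a decision tree learner judges whether an attribute is relevant is information gain: how much knowing the attribute reduces the uncertainty about the target. The sketch below illustrates that idea in plain Python. It is not the RapidMiner process used in the talk; the four customer records, the attribute names and the target column buys_bike are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up toy data: commute distance separates buyers perfectly, gender not at all.
customers = [
    {"commute": "0-1 Miles", "gender": "M", "buys_bike": "yes"},
    {"commute": "0-1 Miles", "gender": "F", "buys_bike": "yes"},
    {"commute": "10+ Miles", "gender": "M", "buys_bike": "no"},
    {"commute": "10+ Miles", "gender": "F", "buys_bike": "no"},
]

# Rank candidate attributes by how much they tell us about the target.
ranking = sorted(
    ["commute", "gender"],
    key=lambda a: information_gain(customers, a, "buys_bike"),
    reverse=True,
)
```

In an iterative project, a ranking like this can help decide which attributes to bring into the model in the next iteration.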
18. 10066 Records
Attributes: Marital Status, Gender, Yearly Income, Total Children, Education, Number Cars Owned, Commute Distance, Occupation, House Owner Flag, Age
19. ITERATION 1: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Category: Product Category
20. ITERATION 1: RAPIDMINER PROCESS
Data Gathering
Data preparation
Modeling
23. ITERATION 2: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Customer Income: Yearly Income
Sat Customer Children: Total Children
Sat Category: Product Category
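The step from iteration 1 to iteration 2 adds attributes as new satellites on the existing hub, so nothing already loaded has to change. A minimal sketch of that idea with Python's built-in sqlite3 follows. Table and column names, the sample values, and the simplification of leaving out load timestamps are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Iteration 1: hub plus one satellite (history columns omitted for brevity).
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk  TEXT PRIMARY KEY,
    customer_key TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk TEXT PRIMARY KEY REFERENCES hub_customer(customer_hk),
    gender      TEXT,
    age         INTEGER
);
INSERT INTO hub_customer VALUES ('hk1', 'C-1001');
INSERT INTO sat_customer VALUES ('hk1', 'F', 42);
""")
columns_before = [row[1] for row in conn.execute("PRAGMA table_info(sat_customer)")]

# Iteration 2: new attributes arrive as new satellites; existing tables stay as they are.
conn.executescript("""
CREATE TABLE sat_customer_income (
    customer_hk   TEXT PRIMARY KEY REFERENCES hub_customer(customer_hk),
    yearly_income REAL
);
CREATE TABLE sat_customer_children (
    customer_hk    TEXT PRIMARY KEY REFERENCES hub_customer(customer_hk),
    total_children INTEGER
);
INSERT INTO sat_customer_income VALUES ('hk1', 60000);
INSERT INTO sat_customer_children VALUES ('hk1', 2);
""")
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(sat_customer)")]

# The mining input is rebuilt by joining the additional satellites to the hub.
row = conn.execute("""
SELECT h.customer_key, s.gender, s.age, i.yearly_income, c.total_children
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
LEFT JOIN sat_customer_income i ON i.customer_hk = h.customer_hk
LEFT JOIN sat_customer_children c ON c.customer_hk = h.customer_hk
""").fetchone()
```

Only the extraction query grows between iterations; the iteration 1 satellite keeps the same columns before and after.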
24. ITERATION 2: RAPIDMINER PROCESS
Data Gathering
Data Preparation
Modeling
26. ITERATION 3: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Customer Income: Yearly Income
Sat Customer Children: Total Children
CSat Customer Distance: Commute Distance Miles
Sat Category: Product Category
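Iteration 3 introduces a computed satellite that holds a numeric Commute Distance Miles next to the raw Commute Distance. The slides do not state how the number is derived, so the rule below is purely an assumption for illustration: band labels are assumed to look like "2-5 Miles" or "10+ Miles", a closed band is mapped to its midpoint and an open band to its lower bound. The function and variable names are hypothetical as well.

```python
import re


def commute_distance_miles(band: str) -> float:
    """Map a distance band label to a number (assumed rule, see lead-in)."""
    closed = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*Miles\s*", band, flags=re.IGNORECASE)
    if closed:
        low, high = int(closed.group(1)), int(closed.group(2))
        return (low + high) / 2  # midpoint of a closed band
    open_ended = re.fullmatch(r"\s*(\d+)\s*\+\s*Miles\s*", band, flags=re.IGNORECASE)
    if open_ended:
        return float(open_ended.group(1))  # lower bound of an open band
    raise ValueError(f"unrecognized commute distance band: {band!r}")


# Raw satellite rows (made-up values) stay untouched.
raw_sat_customer = [
    {"customer_hk": "hk1", "commute_distance": "2-5 Miles"},
    {"customer_hk": "hk2", "commute_distance": "10+ Miles"},
]

# The computed satellite is derived from the raw rows and stored separately.
csat_customer_distance = [
    {
        "customer_hk": r["customer_hk"],
        "commute_distance_miles": commute_distance_miles(r["commute_distance"]),
    }
    for r in raw_sat_customer
]
```

Because the derived value lives in its own satellite, the rule can be changed or replaced in a later iteration while the raw data remains available for comparison.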
27. ITERATION 3: RAPIDMINER PROCESS
Data Gathering
Data Preparation
Modeling
29. CONCLUSIONS
Agile Data Mining with Data Vault 2.0
30. CONCLUSIONS
Data Vault is a flexible data model with good support for agile project methodology
Data Vault is not an additional hurdle in data mining projects
Additional attributes can be added incrementally at any time during the project
Business Vault: transparent data processing
31. FURTHER INFORMATION
Appears
2015
Available
www.doerffler.com www.datavault.de www.learndatavault.com
Appears
2015
32. Give us feedback
http://goo.gl/LGO4ze
Source: Vasilijonline.com
Editor's Notes
In these slides, only replace the logos. We have no time to try out or discuss a new design.
Comment briefly on the data mining project at VGH.
Point out the BI-Spektrum article.
Objectives of the project
Tools used, CRISP-DM used, etc.
If necessary, add more slides.
Name the insurance company?
No specific requirements
Attributes evolve over time
"Customer" was not exactly defined at first
Only private customers, or companies as well?
Policyholders or vehicle owners?
What kinds of contracts?
What are "good" customers?
Briefly explain hubs, links and satellites with VDV. Take a look in the Sources folder; there is material you can use.
We cannot present any data or findings from VGH.
Therefore we fall back on AdventureWorks.
Setup taken from the book.
Comment briefly on AdventureWorks DW.
Background information
Model of the relevant tables
25 attributes, 500k records
Comment on the first DV model.
Demo in RapidMiner
Also comment on measures (accuracy, or precision/recall).
Ideally, represent them graphically in RapidMiner.
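The measures named in the notes (accuracy, precision, recall) can be computed from the counts of correct and incorrect predictions. The sketch below shows the standard definitions in plain Python. The label values and the five example predictions are invented, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up example: did the tree predict the bike buyers correctly?
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
tp, fp, fn, tn = confusion_counts(actual, predicted)
```

Comparing these values across iterations gives a simple way to check whether an added attribute actually improved the model.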