The document discusses using the Data Vault 2.0 methodology for agile data mining projects. It provides background on a customer segmentation project for a motor insurance company, then describes the Data Vault 2.0 modeling approach and the CRISP-DM process model. An example applies several iterations of a decision tree model to the AdventureWorks sample database, improving the results in each iteration by adding attributes to the Data Vault 2.0 model and the RapidMiner process. The conclusions state that Data Vault 2.0 provides a flexible data model that supports an agile approach to data mining projects, because the model and its attributes can be changed incrementally.
3. INTRODUCTION
Agile Data Mining with Data Vault 2.0, 12.02.2014
4. TIMO CIRKEL
• BI Consultant
• Certified Data Vault 2.0 Practitioner
• Analysis of policyholders
• Specialized in CRM, software development, DWH automation
• Industries: insurance, energy
• B. Sc. Business Informatics
5. MICHAEL OLSCHIMKE
• Senior BI Consultant
• Certified Data Vault 2.0 Practitioner
• Official Data Vault 2.0 Trainer in Europe
• Associate Teacher, University of Hannover
• Specializing in Data Vault 2.0, data mining, CRM, project management
• Industries: insurance, automotive, retail, public sector, non-profits
6. DÖRFFLER & PARTNER GMBH
• Medium-sized consulting firm
• Official partner of Dan Linstedt in Europe
• Consulting, training, implementation
• Industries:
  • Insurance
  • Automotive
  • Banks
  • Trade
  • Pharmaceuticals
  • Telecommunications
7. BACKGROUND
8. DATA MINING PROJECT AT THE VGH
• Motor insurance
• Customer segmentation
• A first data mining pilot, therefore:
  • No specific requirements
  • Vision is developed during the project
  • Agile project methodology
  • Close co-operation with the business
9. DATA MINING PROJECTS
• Extracting information and patterns from existing data
• Four (large) categories:
  • Segmentation
  • Classification
  • Prediction
  • Association
• Wide range of available algorithms and methods
"The term Data Mining ... describes the extraction of implicitly existing, non-trivial and useful knowledge from large, dynamic, relatively complex structured data."
Figure: data mining overview (labels translated from German: database, application, user, data mining techniques, statements, rules & information, data dictionary, domain knowledge).
10. DATA VAULT 2.0 MODELING
Figure: Data Vault 2.0 model elements (labels: surrogate key, business keys, foreign keys, descriptors). Own representation based on Linstedt, 2014.
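As a rough illustration of the elements in the figure, the following Python sketch models a hub (a business key plus its surrogate key), a link (foreign keys to hubs) and a satellite (descriptors versioned by load date). It assumes the common Data Vault 2.0 convention that the surrogate key is an MD5 hash of the normalized business key; the helper `hash_key`, the class names and the sample key are hypothetical, and load date and record source columns on hubs and links are omitted for brevity.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hypothetical helper: MD5 over the trimmed, upper-cased business keys joined by ';'."""
    normalized = ";".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass
class Hub:
    """Hub: one entry per unique business key, identified by a hash-based surrogate key."""
    business_key: str
    hub_hash_key: str = field(init=False, default="")

    def __post_init__(self) -> None:
        self.hub_hash_key = hash_key(self.business_key)


@dataclass
class Link:
    """Link: relates hubs; its foreign keys are the hash keys of the related hubs."""
    business_keys: tuple[str, ...]
    hub_hash_keys: tuple[str, ...] = field(init=False, default=())
    link_hash_key: str = field(init=False, default="")

    def __post_init__(self) -> None:
        self.hub_hash_keys = tuple(hash_key(k) for k in self.business_keys)
        self.link_hash_key = hash_key(*self.business_keys)


@dataclass
class Satellite:
    """Satellite: descriptive attributes of a hub or link, versioned by load date."""
    parent_hash_key: str
    attributes: dict[str, object]
    load_date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical example: a customer hub entry and one satellite version for it.
customer = Hub("AW00011000")
sat = Satellite(customer.hub_hash_key, {"Gender": "M", "MaritalStatus": "M"})
```

Because the surrogate key is derived from the business key itself, hubs, links and satellites can be loaded independently without looking up sequence numbers, which is one reason the structure tolerates incremental change.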
11. DATA VAULT 2.0 METHODOLOGY
Figure: the Data Vault 2.0 methodology draws on Six Sigma, TQM, Scrum, CMMI, PMP and SDLC.
12. DATA VAULT 2.0 METHODOLOGY FOR DATA MINING
Advantages
• Agile project management for DWH projects
• Automation and generation
• Rapid adaptation to changes in the model
• Incremental build-out = incremental cost control
• Targeted delivery = two-week sprints
• Predictable and measurable results
Disadvantages
• Focus on loading raw data and producing information
• Few data mining references
• Many concepts in the methodology are not applicable to data mining projects
• Team sizes are difficult to scale in data mining projects
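"Automation and generation" usually means that repetitive Data Vault code, such as table DDL or load statements, is generated from metadata rather than written by hand. The sketch below is a minimal illustration of that idea, assuming a hypothetical metadata format, hypothetical table names and generic SQL column types; it is not the generator used in the project.

```python
def satellite_ddl(name: str, parent_hub: str, columns: dict[str, str]) -> str:
    """Generate CREATE TABLE DDL for a satellite from simple (hypothetical) metadata."""
    lines = [
        f"    {parent_hub}_hash_key CHAR(32) NOT NULL",  # MD5 hash key of the parent hub
        "    load_date TIMESTAMP NOT NULL",
        "    record_source VARCHAR(100) NOT NULL",
    ]
    # Descriptive attributes come straight from the metadata.
    lines += [f"    {col} {sql_type}" for col, sql_type in columns.items()]
    # One row per parent key and load date keeps the full history.
    lines.append(f"    PRIMARY KEY ({parent_hub}_hash_key, load_date)")
    return f"CREATE TABLE {name} (\n" + ",\n".join(lines) + "\n);"


# Hypothetical metadata: satellite name -> (parent hub, {column: SQL type}).
SATELLITES = {
    "sat_customer_income": ("hub_customer", {"yearly_income": "DECIMAL(12,2)"}),
    "sat_customer_children": ("hub_customer", {"total_children": "INTEGER"}),
}

ddl_script = "\n\n".join(
    satellite_ddl(name, hub, cols) for name, (hub, cols) in SATELLITES.items()
)
```

Adding an attribute then becomes a metadata change followed by regeneration, which fits the incremental, sprint-based delivery listed under the advantages.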
13. CRISP-DM
Figure: CRISP-DM process model. Own representation based on Chapman et al., 2000.
14. PROCESS MODEL
Figure: process model, VGH customer segmentation (participants: ivv, KTC, D & P).
Activities (translated from German):
• Start
• Determine relevant policyholder attributes
• Store data in the Data Vault model
• Create SQL query
• Extract data
• Develop quality function
• Research algorithms
• Select algorithm
• Run segmentation
• Present result
• End
Decision points, each with yes/no branches: Formula OK? Suitable algorithm found? Result achieved? Result OK?
15. RAPIDMINER
• Java-based data mining software
• One of the most widely used data mining tools
• Offers:
  • Environment for control flow
  • Large number of algorithms
  • Large choice of data sources
Figure: tool usage by respondent group (Overall, Corporate, Consultants, Academics, NGO/Gov't). © 2012 Rexer Analytics
16. EXAMPLE
17. EXAMPLE
• AdventureWorks database
• Scenario:
  • Advertising campaign for a new bike
  • Identification of the target group
• Solution:
  • Decision tree
  • Identify relevant attributes in several iterations
Simple example (Lachev, 2005, p. 238ff)
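To show how a decision tree singles out relevant attributes, the sketch below ranks candidate attributes by information gain, one common split criterion for decision trees (the talk does not state which criterion the RapidMiner process used). The rows are invented toy data, not AdventureWorks records, and the target name `BoughtBike` is hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: list[dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in target entropy when the rows are split by one categorical attribute."""
    groups: dict[str, list[str]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, attributes, target):
    """Attributes sorted by information gain, best first: the root split a tree would pick."""
    scores = [(a, information_gain(rows, a, target)) for a in attributes]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, not taken from AdventureWorks.
rows = [
    {"Gender": "M", "CommuteDistance": "0-1 Miles", "BoughtBike": "yes"},
    {"Gender": "F", "CommuteDistance": "0-1 Miles", "BoughtBike": "yes"},
    {"Gender": "M", "CommuteDistance": "10+ Miles", "BoughtBike": "no"},
    {"Gender": "F", "CommuteDistance": "10+ Miles", "BoughtBike": "no"},
]
ranking = rank_attributes(rows, ["Gender", "CommuteDistance"], "BoughtBike")
```

In this toy data CommuteDistance separates the classes perfectly (gain 1.0) while Gender carries no information (gain 0.0). A full tree repeats this choice on each subset; adding attributes in a later iteration simply extends the candidate list.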
18. 10066 records with the following attributes:
• Marital Status
• Gender
• Yearly Income
• Total Children
• Education
• Number Cars Owned
• Commute Distance
• Occupation
• House Owner Flag
• Age
19. ITERATION 1: DATA VAULT 2.0 MODEL
Figure: Data Vault 2.0 model, iteration 1.
• Hub Customer: Customer Key
• Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
• Sat Category: Product Category
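A minimal sketch of the iteration 1 model in SQLite, showing how the hub and its satellites are flattened into one mining table with a SQL query. It assumes that both satellites hang off Hub Customer, uses only a subset of the attributes, and relies on hypothetical table and column names, an MD5 hash-key rule and invented sample rows. The second `sat_customer` row for the same customer is a newer version, and the query picks the latest one.

```python
import hashlib
import sqlite3


def md5_key(business_key: str) -> str:
    """Hypothetical hash-key rule: MD5 of the trimmed, upper-cased business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hash_key TEXT PRIMARY KEY,
    customer_key      TEXT NOT NULL UNIQUE
);
CREATE TABLE sat_customer (
    customer_hash_key TEXT NOT NULL REFERENCES hub_customer,
    load_date         TEXT NOT NULL,
    gender            TEXT,
    marital_status    TEXT,
    commute_distance  TEXT,
    PRIMARY KEY (customer_hash_key, load_date)
);
CREATE TABLE sat_category (
    customer_hash_key TEXT NOT NULL REFERENCES hub_customer,
    load_date         TEXT NOT NULL,
    product_category  TEXT,
    PRIMARY KEY (customer_hash_key, load_date)
);
""")

# Hypothetical sample data.
for key in ("AW00011000", "AW00011001"):
    conn.execute("INSERT INTO hub_customer VALUES (?, ?)", (md5_key(key), key))
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
    [
        (md5_key("AW00011000"), "2014-01-01", "M", "M", "1-2 Miles"),
        (md5_key("AW00011000"), "2014-02-01", "M", "M", "2-5 Miles"),  # newer version
        (md5_key("AW00011001"), "2014-01-01", "M", "S", "0-1 Miles"),
    ],
)
conn.executemany(
    "INSERT INTO sat_category VALUES (?, ?, ?)",
    [
        (md5_key("AW00011000"), "2014-01-01", "Bikes"),
        (md5_key("AW00011001"), "2014-01-01", "Accessories"),
    ],
)

# Flatten hub + satellites, taking the latest version of each satellite row.
MINING_QUERY = """
SELECT h.customer_key, s.gender, s.marital_status, s.commute_distance, c.product_category
FROM hub_customer AS h
JOIN sat_customer AS s
  ON s.customer_hash_key = h.customer_hash_key
 AND s.load_date = (SELECT MAX(load_date) FROM sat_customer
                    WHERE customer_hash_key = h.customer_hash_key)
JOIN sat_category AS c
  ON c.customer_hash_key = h.customer_hash_key
 AND c.load_date = (SELECT MAX(load_date) FROM sat_category
                    WHERE customer_hash_key = h.customer_hash_key)
ORDER BY h.customer_key
"""
mining_rows = conn.execute(MINING_QUERY).fetchall()
```

The resulting rows are the kind of flat table a data mining tool such as RapidMiner would read in its data gathering step.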
20. ITERATION 1: RAPIDMINER PROCESS
Figure: RapidMiner process, iteration 1 (stages: data gathering, data preparation, modeling).
23. ITERATION 2: DATA VAULT 2.0 MODEL
Figure: Data Vault 2.0 model, iteration 2.
• Hub Customer: Customer Key
• Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
• Sat Customer Income: Yearly Income
• Sat Customer Children: Total Children
• Sat Category: Product Category
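Iteration 2 adds Yearly Income and Total Children as two new satellites. The sketch below, again SQLite with hypothetical names, placeholder hash keys (`hk1`, `hk2`) and invented values, shows that the new attributes arrive as new tables while the existing satellite stays unchanged; the mining query only gains LEFT JOINs. For brevity it ignores satellite history, which the iteration 1 query handles with a latest-load-date filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Iteration 1 structures (abbreviated): hub and the original customer satellite.
conn.executescript("""
CREATE TABLE hub_customer (customer_hash_key TEXT PRIMARY KEY, customer_key TEXT NOT NULL UNIQUE);
CREATE TABLE sat_customer (customer_hash_key TEXT NOT NULL, load_date TEXT NOT NULL,
                           gender TEXT, PRIMARY KEY (customer_hash_key, load_date));
INSERT INTO hub_customer VALUES ('hk1', 'AW00011000'), ('hk2', 'AW00011001');
INSERT INTO sat_customer VALUES ('hk1', '2014-01-01', 'M'), ('hk2', '2014-01-01', 'F');
""")

# Iteration 2: new attributes become new satellites; no existing table is altered.
conn.executescript("""
CREATE TABLE sat_customer_income (customer_hash_key TEXT NOT NULL, load_date TEXT NOT NULL,
                                  yearly_income REAL, PRIMARY KEY (customer_hash_key, load_date));
CREATE TABLE sat_customer_children (customer_hash_key TEXT NOT NULL, load_date TEXT NOT NULL,
                                    total_children INTEGER, PRIMARY KEY (customer_hash_key, load_date));
INSERT INTO sat_customer_income VALUES ('hk1', '2014-02-01', 90000.0);
INSERT INTO sat_customer_children VALUES ('hk1', '2014-02-01', 2), ('hk2', '2014-02-01', 0);
""")

# LEFT JOINs keep customers that have no row yet in a new satellite (their values are NULL).
rows = conn.execute("""
SELECT h.customer_key, s.gender, i.yearly_income, c.total_children
FROM hub_customer AS h
JOIN sat_customer AS s ON s.customer_hash_key = h.customer_hash_key
LEFT JOIN sat_customer_income AS i ON i.customer_hash_key = h.customer_hash_key
LEFT JOIN sat_customer_children AS c ON c.customer_hash_key = h.customer_hash_key
ORDER BY h.customer_key
""").fetchall()
```

Because existing tables and their load processes are untouched, an attribute can be added in one sprint without reworking what was delivered in the previous one.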
24. ITERATION 2: RAPIDMINER PROCESS
Figure: RapidMiner process, iteration 2 (stages: data gathering, preparation, modeling).
26. ITERATION 3: DATA VAULT 2.0 MODEL
Figure: Data Vault 2.0 model, iteration 3.
• Hub Customer: Customer Key
• Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
• Sat Customer Income: Yearly Income
• Sat Customer Children: Total Children
• Sat Category: Product Category
• CSat Customer Distance: Commute Distance Miles
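Iteration 3 adds CSat Customer Distance, presumably a computed satellite in the Business Vault that turns the categorical commute distance into a numeric Commute Distance Miles. The derivation rule below is an assumption, not the one from the project: it parses AdventureWorks-style bands such as "2-5 Miles" or "10+ Miles" and returns the band midpoint, or the lower bound for an open-ended band. The function names are hypothetical.

```python
import re

# Matches bands like "0-1 Miles", "5-10 Miles" and the open-ended "10+ Miles".
_BAND = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+)|\+)\s*Miles\s*$", re.IGNORECASE)


def commute_distance_miles(band: str) -> float:
    """Hypothetical business rule: midpoint of a closed band, lower bound of an open band."""
    m = _BAND.match(band)
    if m is None:
        raise ValueError(f"unrecognized commute distance band: {band!r}")
    low = float(m.group(1))
    if m.group(2) is None:  # open-ended band such as "10+ Miles"
        return low
    return (low + float(m.group(2))) / 2


def build_csat_customer_distance(sat_rows):
    """Derive computed-satellite rows from raw satellite rows (hash key, load date, band).

    Keeping the parent hash key and load date makes every derived value traceable
    back to the raw satellite row it came from.
    """
    return [(hk, load_date, commute_distance_miles(band)) for hk, load_date, band in sat_rows]


csat_rows = build_csat_customer_distance([("hk1", "2014-01-01", "5-10 Miles")])
```

Storing the derived value in its own satellite, rather than overwriting the raw attribute, keeps the business rule explicit and auditable, in line with the "transparent data processing" point in the conclusions.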
27. ITERATION 3: RAPIDMINER PROCESS
Figure: RapidMiner process, iteration 3 (stages: data gathering, preparation, modeling).
29. CONCLUSIONS
30. CONCLUSIONS
• Data Vault is a flexible data model with good support for agile project methodologies
• Data Vault is not an additional hurdle in data mining projects
• Additional attributes can be added incrementally at any time during the project
• Business Vault: transparent data processing
31. FURTHER INFORMATION
Forthcoming in 2015 | Available | Forthcoming in 2015
www.doerffler.com | www.datavault.de | www.learndatavault.com
32. GIVE US FEEDBACK
http://goo.gl/LGO4ze
Image source: vasilijonline.com
Notes
Only replace the logos in these slides. We have no time to try out or discuss a new design.
Briefly comment on the DM project at the VGH.
Point out the BI Spectrum article.
Objectives of the project.
Tools used. CRISP-DM used. Etc.
If necessary, open more slides.
Name the insurer?
No specific requirements
Attributes evolve over time
"Customer" is not precisely defined at first:
Only private clients or companies?
Policyholders or vehicle owners?
What kinds of contracts?
What makes a customer "good"?
Briefly explain hubs, links and satellites using the VDV example. Have a look in the Sources folder; you can use the material there.
We cannot present any data or findings from the VGH,
therefore we fall back on AdventureWorks.
Setup taken from the book.
Briefly comment on AdventureWorks DW.
Background Information
Model of the Relevant Tables
25 Attributes, 500k Records