Agile Data Warehouse
The Final Frontier
@tbunio
bornagainagilist.wordpress.com
www.protegra.com
Terry Bunio
Who Am I?
• Database Administrator
– Oracle, SQL Server, ADABAS
• Data Architect
– Investors Group, LPL Financial, Manitoba
Blue Cross, Assante Financial
• Agilist
– Innovation Gamer, Team Member, Project
Manager, PMO on SAP Implementation
Learning Objectives
• Learn about how a Data Warehouse Project
can be Agile
• Introduce Agile practices that can help to be
DWAgile
• Introduce DW practices that can help to be
DWAgile
What is Agile?
• Deliver as frequently as possible
• Minimize Inventory
– All work that doesn’t directly contribute to
delivering value to the client
– Typically value is realized by code
Enterprise Models
Spock Method
Visualization
Spectre of the Agility
Database/Data Warehouse Architecture
DWAgile Practices
Data Warehouse
• Definition
– “a database used for reporting and data analysis.
It is a central repository of data which is created
by integrating data from multiple disparate
sources. Data warehouses store current as well
as historical data and are commonly used for
creating trending reports for senior management
reporting such as annual and quarterly
comparisons.” – Wikipedia.org
Data Warehouse
• Can refer to:
– Reporting Databases
– Operational Data Stores
– Data Marts
– Enterprise Data Warehouse
– Cubes
– Excel?
– Others
Two sides of Database Design
Two design methods
• Relational
– “Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency. Normalization
usually involves dividing large tables into smaller (and less
redundant) tables and defining relationships between them.
The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table
and then propagated through the rest of the database via
the defined relationships.”
Two design methods
• Dimensional
– “Dimensional modeling always uses the concepts of facts
(measures), and dimensions (context). Facts are typically
(but not always) numeric values that can be aggregated,
and dimensions are groups of hierarchies and descriptors
that define the facts.”
Relational
• Relational Analysis
– Database design is usually in Third Normal
Form
– Database is optimized for transaction
processing. (OLTP)
– Normalized tables are optimized for
modification rather than retrieval
Normal forms
• 1st - Under first normal form, all occurrences of a
record type must contain the same number of fields.
• 2nd - Second normal form is violated when a non-key
field is a fact about a subset of a key. It is only
relevant when the key is composite.
• 3rd - Third normal form is violated when a non-key
field is a fact about another non-key field.
Source: William Kent - 1982
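To ground these rules, a small illustrative T-SQL sketch (table and column names are hypothetical, not from any project discussed here). City and Province are facts about PostalCode, a non-key field, so the first table violates third normal form; splitting it fixes that.

-- Violates 3NF: City and Province depend on PostalCode, not on ClientId
CREATE TABLE dbo.ClientDenormalized (
    ClientId    int          NOT NULL PRIMARY KEY,
    ClientName  varchar(100) NOT NULL,
    PostalCode  varchar(10)  NOT NULL,
    City        varchar(50)  NOT NULL,
    Province    varchar(50)  NOT NULL
);

-- Normalized: the dependent fields live in one table keyed by PostalCode,
-- so a city name can be corrected in exactly one place
CREATE TABLE dbo.PostalCode (
    PostalCode  varchar(10) NOT NULL PRIMARY KEY,
    City        varchar(50) NOT NULL,
    Province    varchar(50) NOT NULL
);
CREATE TABLE dbo.Client (
    ClientId    int          NOT NULL PRIMARY KEY,
    ClientName  varchar(100) NOT NULL,
    PostalCode  varchar(10)  NOT NULL
        REFERENCES dbo.PostalCode (PostalCode)
);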
Dimensional
• Dimensional Analysis
– Star Schema/Snowflake
– Database is optimized for analytical
processing. (OLAP)
– Facts and Dimensions optimized for retrieval
• Facts – Business events – Transactions
• Dimensions – context for Transactions
– Accounts
– Products
– Date
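To make the fact/dimension split concrete, a minimal star schema sketch in T-SQL (all names are hypothetical): the dimensions carry the context, and the fact records the business event with an aggregatable measure.

CREATE TABLE dbo.DimDate    (DateKey    int PRIMARY KEY, CalendarDate date         NOT NULL);
CREATE TABLE dbo.DimAccount (AccountKey int PRIMARY KEY, AccountName  varchar(100) NOT NULL);
CREATE TABLE dbo.DimProduct (ProductKey int PRIMARY KEY, ProductName  varchar(100) NOT NULL);

-- One row per business event (transaction), joined to its context
CREATE TABLE dbo.FactTransaction (
    DateKey    int NOT NULL REFERENCES dbo.DimDate    (DateKey),
    AccountKey int NOT NULL REFERENCES dbo.DimAccount (AccountKey),
    ProductKey int NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Amount     decimal(19,4) NOT NULL  -- the aggregatable measure
);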
Relational
Dimensional
Kimball-lytes
• Bottom-up - incremental
– Operational systems feed the Data
Warehouse
– Data Warehouse is a corporate dimensional
model that Data Marts are sourced from
– Data Warehouse is the consolidation of Data
Marts
– Sometimes the Data Warehouse is generated
from Subject area Data Marts
Inmon-ians
• Top-down
– Corporate Information Factory
– Operational systems feed the Data
Warehouse
– Enterprise Data Warehouse is a corporate
relational model that Data Marts are sourced
from
– Enterprise Data Warehouse is the source of
Data Marts
The gist…
• Kimball’s approach is easier to implement as
you are dealing with separate subject areas,
but can be a nightmare to integrate
• Inmon’s approach has more upfront effort to
avoid these consistency problems, but takes
longer to implement.
Spectre of the Agility
Incremental - Kimball
• In Segments
• Detailed Analysis
• Development
• Deploy
• Long Feedback loop
• Considerable changes
• Rework
• Defects
Waterfall - Inmon
• Detailed Analysis
• Large Development
• Large Deploy
• Long Feedback loop
• Extensive changes
• Many Defects
[Diagram: both paths converge on the Data Warehouse Project]
Popular Agile Data Warehouse Pattern
• Son’a method
– Analyze data requirements department by
department
– Create Reports and Facts and Dimensions for
each
– Integrate when you do subsequent
departments
The two problems
• Conforming Dimensions
– A Dimension conforms when it has
equivalent structure and content
– Is a client defined by Marketing the same as
Finance?
• Probably not
– If the Dimensions do not conform, this
severely hampers the Data Warehouse
The two problems
• Modeling the use of the data versus the data
– By using reporting needs as the primary
foundation for the data model, you are modeling
the use of the data rather than the data
– This will cause more rework in the future as the
use of the data is more likely to change than the
data itself.
Where is she?
Where is the true Agility?
• Iterations not Increments
• Brutal Visibility/Visualization
• Short Feedback loops
• Just enough requirements
• Working on enterprise priorities – not just for
an individual department
Fact
• True iterative development on a Data
Warehouse project is hard – perhaps harder
than a traditional Software Development
project
– ETL, Data Models, and Business Intelligence
stories can have a high impact on other
stories
– Can be difficult to create independent stories
– Stories can have many prerequisites
Fiction
• True iterative development on a Data
Warehouse project is impossible
– ETL, Data Models, and Business Intelligence
stories can be developed iteratively
– Independent stories can be developed
– Stories can have many prerequisites – but
this can be limited
Agile Mindset
• We need to apply an Agile Mindset to
Data Modelling
– What is just enough Data Modelling?
– And do no more…
Our Mission
• “Data... the Final Frontier. These are the
continuing voyages of the starship Agile.
Her on-going mission: to explore strange
new projects, to seek out new value and
new clients, to iteratively go where no
projects have gone before.”
The Prime Directive
The Prime Directive
• Is a vision or philosophy that binds the
actions of Starfleet
• Can a Data Warehouse project truly be
Agile without a Vision of either the Business
Domain or Data Domain?
– Essentially it is then just an Ad Hoc Data
Warehouse: separate components that may fit
together.
– How do we ensure we are working on the right
priorities for the entire enterprise?
Enterprise Data Model?
Torture
• Why does the creation of Enterprise Data
Models feel like torture?
– Interrogation
– Coercion
– Agreement on Excessive detail without direct
alignment to business value
Enterprise Models
Two new models
Agile Enterprise Normalized Data Model
• Confirms the major entities and the
relationships between them
– 30-50 entities
• Confirms the Data Domain
• Starts the definition of a Normalized Data
Model that will be refined over time
– Completed in 1 – 4 weeks
Agile Enterprise Normalized Data Model
• Is just enough to understand the data
domain so that the iterations can proceed
• Is not mapping all the attributes
– Is not BDUF
• Is an Information Map for the Data Domain
• Contains placeholders for refinement
– Like a User Story Map
Agile Enterprise Dimensional Data Model
• Confirms the Business Objects and the
relationships between them
– 10-15 entities
• Confirms the Business Domains
• Starts the definition of a Dimensional Data
Model that will be refined over time
– Completed in 1 – 2 weeks
Agile Enterprise Dimensional Data Model
• Is just enough to understand the business
domain so that the iterations can proceed
– And to validate the understanding of the data
domain
• Is not mapping all the attributes
– Is not BDUF
• Is an Information Map for the Business Domain
• Contains placeholders for refinement
– Like a User Story Map
Agile Information Maps
Agile Information Maps
• Agile Information Maps allow for:
– Efficient Navigation of the Data and Business
Domains
– Ability to set up ‘Neutral Zones’ for areas that
need more negotiation
– Visual communication of the topology of the
Data and Business Domains
• Easier and more accurate to validate than text
• ‘feels right’
Agile Information Maps
• Are:
– Our vision
– Our Maps for the Data and Business Domains
– A guide for our solution
– A way to minimize rework and refactoring
– Our Prime Directive
– Data Models
Kimball or Inmon?
Spock
• Hybrid approach
– It is only logical
– Needs of the many outweigh the needs of the
few – or the one
Spock Approach
[Diagram: the Business Domain Spike informs the Agile Normalized
Data Model, implemented as the ODS, and the DWAgile Dimensional
Data Model, implemented as Data Marts (DM)]
Spock Approach
• Business Domain Spike
• Agile Information Maps
– Agile Enterprise Normalized Data Model
– Agile Enterprise Dimensional Data Model
• Implement
– Operational Data Store
– Dimensional Data Warehouse
• Reporting can then be done from either
Business Domain Spike
• Needs to precede work on Agile Information
Maps
• Need to understand the business and
industry before you can create Data or
Business Information Maps
• Can take 1-2 weeks for an initial
understanding
– Constant refinement
Benefits of Spock Approach
• Agile Enterprise Normalized Data Model
– Validates knowledge of Data Domain
– Ensures later increments don’t uncover data
that was previously unknown and hard to
integrate
• Minimizes rework and refactoring
– True iterations
• Confirm at high level and then refine
Benefits of Spock Approach
• Agile Enterprise Dimensional Data Model
– Validates knowledge of Business Domain
– The process of ‘cooking down’ to a
Dimensional Model validates design and
identifies areas of inconsistencies or errors
• Especially true when you need to design how
changes and history will be handled
– True iterations
• Confirm at high level and then refine
Benefits of Spock Approach
• Operational Data Store
– Model data relationally to provide enterprise
level operational reports
– Consolidate and cleanse data before it is
visible to end-users
– Is used to refine the Agile Enterprise
Normalized Data Model
– Start creating reports to validate data model
immediately!
Benefits of Spock Approach
• Dimensional Data Warehouse
– Model data dimensionally to provide
enterprise level analytical reports
– Provide full historical data and context for
reports
– Is used to refine the Agile Enterprise
Dimensional Data Model
– Clients can start creating reports to validate
data model immediately!
Do we need an ODS and DW?
• Relational Analysis provides
– Validation of the Data domain
• Dimensional Analysis provides
– Validation of the Business domain
– Additional level of confirmation of the Data
domain as the relational model is translated
into a dimensional one
• Much easier for inconsistencies and errors to
hide in 300+ tables as opposed to 30+
Most Importantly..
• Operational Data Store
– Minimal Data Latency
– Current state
– Allows for efficient Operational Reporting
• Data Warehouse
– Moderate Data Latency
– Full history
– Allows for efficient Analytical Reporting
Agile Approach
• With an Agile approach you can deliver just
enough of an Operational Data Store or Data
Warehouse based on needs
– No longer do they need to be a huge deliverable
• Neither presumes a complete implementation
is required
• The Information Models allow for iterative
delivery of value
How do we work iteratively on
a Data Warehouse?
Increments versus iterations
• Increments
– Series by series – department by department
• Iterations
– Story by story – episode by episode
• Enterprise prioritization
– Work on the highest priority for the enterprise
– Not just within each series/department
Iterative Focus
• Instead of focusing on trying to have a
complete model, we focused on creating
processes that allow us to deliver changes
within 30 minutes from model to deployment
Captain, we need more Visualization!
The View Screen
The View Screen
• Enabled bridge-to-bridge communications
• Provided visual images in and around the
ship
– From different angles
– How did that work?
• Allowed for more understanding of the
situation
Visualization
Visualization
• Is required to:
– Report Project status
– Provide a visual report map
Kanban Board
• We used a standard Kanban board to track
stories as we worked on them
– These stories resulted in ETL, Data Model,
and Reporting tasks
– We had a Data Model/ETL board and a Report
board
– ETL and Data Model required a foundation
created by the Information Maps before we
could start on stories
Report Visualization
• We also used thermometer imagery to report
how we were progressing according to the
schedule
– Milestones were on the thermometer along
with the number of reports that we had
completed every day
Cardassian Union
Be careful how you spell that…
Data Modeling Union
• For too long the Data Modellers have not
been integrated with Software Developers
• Data Modellers have been like the
Cardassian Union, not integrated with the
Federation
Issues
• This has led to:
– Holy wars
– Each side expecting the other to follow their
schedule
– Lack of communication and collaboration
• Data Modellers need to join the ‘United
Federation of Projects’
How were we Agile?
Tools of the trade
Tools of the Trade
• Version Control and Refactoring
• Test Automation
• Communication and Governance
• Adaptability and Change Tolerance
• Assimilation
Version Control
Version Control
• If you don’t control versions, they will control
you
• Data Models must become integrated with
the source control of the project
– In the same repository as the project trunk
and branches
• You can’t just version a series of SQL files
separate from your data model
Our Version Experience
• We are using Subversion
• We are using Oracle Data Modeler as our
Modeling tool.
– It has very good integration with Subversion
– Our DBMS is SQL Server 2012
• Unlike other modeling tools, the data model
was able to be integrated in Subversion with
the rest of the project
ODM Shameless plug
• Free
• Subversion Integration
• Supports Logical and Relational data models
• Since it is free, the data models can be
shared and refined by all members of the
development team
• Currently on version 2685
How do we roll out versions?
• Create Data Model changes
• Use Red Gate SQL Compare to generate
alter script
– Generate a new DB and compare to the last
version to generate alter script
• 95% of changes deployed in less than 10
minutes
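For illustration only (the real deltas come out of SQL Compare, and these object names are hypothetical), a generated alter script for a small revision can be as simple as:

-- Delta between model revisions: one new nullable column
ALTER TABLE dbo.Claim
    ADD AdjudicationDate date NULL;
GO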
How do we roll out versions?
• We build on the Farley and Humble Blue-Green
Deployment model
– Blue – Current Version and Revision – Database
Name will be ‘ODS’
– Green – 1 Revision Old – Database Name will be
‘ODS-GREEN’
– Brown – 1 Major Version Old – Database Name
will be ‘ODS-BROWN’
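A minimal sketch of how the rotation could work at release time; the database names come from the slide, but the script itself is our illustration and assumes exclusive access and that any previous ‘ODS-BROWN’ has already been dropped (‘ODS-NEW’ is a hypothetical name for the freshly built database).

USE master;
GO
-- On a major version, the old Green copy retires to Brown...
ALTER DATABASE [ODS-GREEN] MODIFY NAME = [ODS-BROWN];
GO
-- ...the current Blue becomes Green...
ALTER DATABASE [ODS] MODIFY NAME = [ODS-GREEN];
GO
-- ...and the freshly deployed database takes over as Blue
ALTER DATABASE [ODS-NEW] MODIFY NAME = [ODS];
GO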
Versioning
• SQL Change scripts are generated for all
changes
• A full script is generated for every major
version
– A new folder is created for every major
version
– Major version folders are named after the
Greek alphabet (alpha, beta, gamma)
SQL Script version naming standards
• [revision number]-[ODS/DW]-[I/A][version number]-
[subversion revision number of corresponding Data
model].sql
– Revision number – auto-incrementing
– Version Number – A999
• Alphabetic character represents major version – corresponds
with folder named after greek alphabet
• 999 indicates minor versions
– subversion revision number of corresponding Data model – allows
for an exact synchronization between Data Model and SQL Scripts
• All objects are stored within one Subversion repository
– They all share the same revision numbering
SQL Script version naming standards
• For example:
– 0-ODS-I-A001-745.sql – initial db and table
creation for current ODS version (includes
reference data)
– 1-ODS-A-A001-1574.sql – revision 1 ODS alter
script that corresponds to data model subversion
revision 1574
– 2-ODS-A-A001-1590.sql - revision 2 ODS alter
script that corresponds to data model subversion
revision 1590
SQL Script error handling
• Validation is done to prevent
– Scripts being run out of sequence
– Revision being applied without addressing
required refactoring
– Scripts being run on any environment but the
Blue environment
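A sketch of the kind of guard each change script could begin with; the SchemaVersion table and its columns are our assumption, not the project’s actual design.

-- Guard: only run on Blue, and only in sequence
IF DB_NAME() <> 'ODS'
    RAISERROR('Change scripts may only run on the Blue database.', 16, 1);
ELSE IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE RevisionNumber = 1)
    RAISERROR('Revision 2 requires revision 1 to be applied first.', 16, 1);
ELSE
BEGIN
    -- ... schema changes for revision 2 go here ...
    INSERT INTO dbo.SchemaVersion (RevisionNumber, AppliedOn)
    VALUES (2, SYSUTCDATETIME());
END;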
But what about Refactoring?
• Having Agile Information Maps has
significantly reduced refactoring
– This was an entirely new data domain for the
team
• Using the Blue-Green-Brown deployment
model has simplified required refactoring
• We have used the methods described by
Scott Ambler on the odd occasion
Good Start
Create the plan for how you will refactor
Refactoring Experience
• We haven’t needed to refactor much
• Since we are iteratively refining, we haven’t
had to redefine much
– Just adding more detail
– Main Information Maps have held together
Test Automation
Test Automation
• The Enterprise was saved due to constantly
running tests on the warp engine
• Allowed for quick decision making
Automated Test Suite
• Leveraged the tSQLt Open Source
Framework
• Purchased SQL Test from Red Gate for an
enhanced interface
• Enhanced the framework to execute tests
from four custom tables we defined
Automated Test Suite
• Leveraged a Data Mapping spreadsheet that
the automated tests used
– Two database tables were loaded from the
spreadsheet
– Two additional tables contained ETL test
cases
– 13 Stored Procedures executed the tests
– 3300+ columns mapped!
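The four custom tables aren’t reproduced here, but a hypothetical shape for one of the mapping tables loaded from the spreadsheet gives the idea; every name below is our illustration.

CREATE TABLE dbo.DataMapping (
    MappingId    int         IDENTITY PRIMARY KEY,
    SourceTable  sysname     NOT NULL,
    SourceColumn sysname     NULL,       -- NULL for constant/null mappings
    TargetTable  sysname     NOT NULL,
    TargetColumn sysname     NOT NULL,
    MappingType  varchar(20) NOT NULL    -- e.g. Direct, Constant, Null, Transformed
);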
Table Tests
• TstTableCount: Compares record counts between source
data and target data.
• TstTableColumnDistinct: Compares counts on distinct values
of columns.
• TstTableColumnNull: Generates a report of all columns
where the contents of a field are all null.
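As a sketch of the check behind TstTableCount, here is a minimal tSQLt-style version, assuming hypothetical staging and target tables.

EXEC tSQLt.NewTestClass 'ETLTests';
GO
CREATE PROCEDURE ETLTests.[test claim counts match between staging and target]
AS
BEGIN
    -- Compare record counts between source data and target data
    DECLARE @Expected int = (SELECT COUNT(*) FROM stg.Claim);
    DECLARE @Actual   int = (SELECT COUNT(*) FROM dbo.Claim);
    EXEC tSQLt.AssertEquals @Expected, @Actual,
         'Record counts differ between source and target.';
END;
GO
EXEC tSQLt.Run 'ETLTests';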
Column Tests
• TstColumnDataMapping: Compares columns directly
assigned from a source column on a field by field basis for 5-10
rows in the target table.
• TstColumnConstantMapping: Compares columns assigned a
constant on a field by field basis for 5-10 rows in the target
table.
• TstColumnNullMapping: Compares columns assigned a Null
value on a field by field basis for 5-10 rows in the target table.
• TstColumnTransformedMapping: Compares transformed
columns on a field by field basis for 5-10 rows in the target
table.
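One way to express the field-by-field check behind TstColumnDataMapping over a small sample (all names hypothetical); any row returned is a mismatch for the mapped column.

WITH SampleRows AS (
    SELECT TOP (10) ClaimId, ClaimAmount
    FROM stg.Claim
    ORDER BY ClaimId
)
SELECT s.ClaimId,
       s.ClaimAmount AS SourceValue,
       t.ClaimAmount AS TargetValue
FROM SampleRows AS s
JOIN dbo.Claim AS t ON t.ClaimId = s.ClaimId
WHERE t.ClaimAmount <> s.ClaimAmount;  -- mismatched mapping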
Data Quality Tests
• TstInvalidParentFKColumn: Tests that an Invalid Parent FK
value results in the records being logged and bypassed. This
record will be added to the staging table to test the process.
• TstInvalidFKColumn: Tests that an Invalid FK value results in
the value being assigned a default value or Null. This record
will be added to the staging table to test the process.
• TstInvalidColumn: Tests that an Invalid value results in the
value being assigned a default value or Null. This record will be
added to the staging table to test the process.
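The load-time rule these tests verify can be sketched as a lookup that substitutes a default for an invalid foreign key; the -1 ‘Unknown’ member and all names are our assumption.

SELECT c.ClaimId,
       COALESCE(p.ProductKey, -1) AS ProductKey  -- -1 = default 'Unknown' member
FROM stg.Claim AS c
LEFT JOIN dbo.DimProduct AS p
       ON p.ProductCode = c.ProductCode;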
Process Integrity Tests
• TstRestartTask: Tests that a Task can be started from the
start and subsequent steps will run in sequence.
• TstRecoverTask: Tests that a Task can be re-started in the
middle, that records are processed correctly, and that subsequent
steps will run in sequence.
Interested?
• Leave me a business card and I’ll send you
the design document and stored procedures
Communication
Team Communication
• Frequent Data Model walkthroughs with
application teams
• Full access to the Data model through the
Data Modeling development tool
• Data Models posted in every room for
developers to mark up with suggestions
• Database deployment to play with for every
release
Client Communication
• Frequent Conceptual Data Model
walkthroughs with clients
– Includes presentation of scenarios with data
to confirm and validate understanding
• Collaboration on the iterative plan to ensure
they agree on the process and support it
Monthly Governance Meeting
– Visual Kanban boards reviewed
– Reports developed in the prior iterations were
demonstrated
– Business Areas were asked to submit a ranked
list of their top 10-20 data requirement/reports for
the next iteration.
Adaptability
Be Nimble
• Already discussed how we can roll out new
versions quickly
Change Tolerant Data Model
• Only add tables and columns when they are
absolutely required
• Leverage Data Domains so that attributes
are created consistently and can be changed
in unison
– Use limited number of standard domains
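In SQL Server, one hedged way to realize standard data domains is alias types that every matching attribute reuses, keeping attribute definitions consistent (type names are hypothetical; the deck’s domains live in the modeling tool, and this is only a database-side analogue).

CREATE TYPE dbo.MoneyAmount FROM decimal(19,4) NOT NULL;
CREATE TYPE dbo.ShortName   FROM varchar(50)   NOT NULL;
GO
-- Every money or short-name attribute reuses the same domain
CREATE TABLE dbo.Product (
    ProductId   int NOT NULL PRIMARY KEY,
    ProductName dbo.ShortName,
    UnitPrice   dbo.MoneyAmount
);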
Change Tolerant Data Model
• Data Model needs to be loosely coupled and
have high cohesion
– Need to model the data and business and not
the applications or reports!
Change Tolerant Data Model
• Don’t model the data according to the
application’s Object Model
• Don’t model the data according to source
systems
• These items will change more frequently
than the actual data structure
• Your Data Model and Object Model should
be different!
Assimilate
Assimilate
• Assimilate Version Control, Communication,
Adaptability, Refinement, and Re-Factoring
into core project activities
– Stand ups
– Continuous Integration
– Check outs and Check Ins
• Make them part of the standard activities –
not something on the side
Our experience
Our Mission
• These practices and methods are being
used to redevelop an entire Business
Intelligence platform for a major ‘Blue’ Health
Benefits company
– Operational and Analytical Reports
• 100+ integration projects
• SAP Claims solution
Our Mission
• Integration projects are being run Agile
• 100+ team members across all projects
• SAP project is being run in a more
traditional manner
– ‘big-bang’ SAP implementation
• I’m now also fulfilling the role of an Agile PMO
Our Challenge
• How can we deploy to production early and
often when the system is a ‘big-bang’
implementation?
– We were ready to deploy ahead of clients and
other projects
– We were dependent on other conversion
projects
Our Challenge
• We are now exploring alternate ways to
deploy to production before the ‘big-bang’
implementation
– To allow the clients to use the reports and
iteratively refine them and the solution
– Also allows our team to validate data integrity
and quality iteratively
– We are now executing iterations to make this
possible
Our BI Solution
• SQL Server 2012
– Integration Services
– Reporting Services
• SharePoint 2010 Foundation
– SharePoint Integrated Reporting Solution
Our team
• Integrated team of
– 2 enterprise DBAs from the ‘Blue’
– 5 Data Analysts/DBAs/SSIS/SSRS developers
• Governance team comprised of
– Business Areas
– Systems Areas
– Stakeholders
Current Stardate
• We have completed the initial ODS and DW
development – including ETL
• We have completed a significant revision of
ODS, DW, and ETL – without major issues
• We are now finishing Report development –
reports have required database changes and
ETL changes – but no major changes!
– 300+ reports developed
Summary
• Use Agile Enterprise Data Models to provide
the initial vision and allow for refinements
• Strive for Iterations over Increments
• Align governance and prioritization with
iterations
• Plan and Integrate processes for Versioning,
Test Automation, Adaptability, Refinement
What doesn’t change?
Leadership
Leadership
• “If you want to build a ship, don't drum up
people together to collect wood and don't
assign them tasks and work, but rather teach
them to long for the endless immensity of the
sea.” ~ Antoine de Saint-Exupery
Leadership
• “[A goalie's] job is to stop pucks, ... Well, yeah, that's
part of it. But you know what else it is? ... You're
trying to deliver a message to your team that things
are OK back here. This end of the ice is pretty well
cared for. You take it now and go. Go! Feel the
freedom you need in order to be that dynamic,
creative, offensive player and go out and score. ...
That was my job. And it was to try to deliver a
feeling.” ~ Ken Dryden
Three awesome books
The final frontier v3

Weitere ähnliche Inhalte

Was ist angesagt?

Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That WorkKeys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That WorkSenturus
 
Bi presentation to bkk
Bi presentation to bkkBi presentation to bkk
Bi presentation to bkkguest4e975e2
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Erik Fransen
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Mike Frampton
 
Role of Tableau on the Data Discovery Market
Role of Tableau on the Data Discovery MarketRole of Tableau on the Data Discovery Market
Role of Tableau on the Data Discovery MarketDmitry Anoshin
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellHPDutchWorld
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence ArchitecturePhilippe Julio
 
Data modelingzone geoffrey-clark-v2
Data modelingzone geoffrey-clark-v2Data modelingzone geoffrey-clark-v2
Data modelingzone geoffrey-clark-v2Geoffrey Clark
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-AshishGuleria
 
Data Warehouse Methodology
Data Warehouse MethodologyData Warehouse Methodology
Data Warehouse MethodologySQL Power
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligencePrithwis Mukerjee
 
Agile data warehouse
Agile data warehouseAgile data warehouse
Agile data warehouseDao Vo
 
Horizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsHorizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsKeyMark
 
Delivering fast, powerful and scalable analytics #OPEN18
Delivering fast, powerful and scalable analytics #OPEN18Delivering fast, powerful and scalable analytics #OPEN18
Delivering fast, powerful and scalable analytics #OPEN18Kangaroot
 
Douglas Briggs
Douglas BriggsDouglas Briggs
Douglas BriggsdaveGBE
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OnePanchaleswar Nayak
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewNagaraj Yerram
 
Best Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyBest Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyWhereScape
 

Was ist angesagt? (20)

Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That WorkKeys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
 
Bi presentation to bkk
Bi presentation to bkkBi presentation to bkk
Bi presentation to bkk
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2
 
Role of Tableau on the Data Discovery Market
Role of Tableau on the Data Discovery MarketRole of Tableau on the Data Discovery Market
Role of Tableau on the Data Discovery Market
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan Hartwell
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Data modelingzone geoffrey-clark-v2
Data modelingzone geoffrey-clark-v2Data modelingzone geoffrey-clark-v2
Data modelingzone geoffrey-clark-v2
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-
 
Data Warehouse Methodology
Data Warehouse MethodologyData Warehouse Methodology
Data Warehouse Methodology
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business Intelligence
 
Agile data warehouse
Agile data warehouseAgile data warehouse
Agile data warehouse
 
Horizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise SolutionsHorizons 2014 - Enterprise Solutions
Horizons 2014 - Enterprise Solutions
 
Delivering fast, powerful and scalable analytics #OPEN18
Delivering fast, powerful and scalable analytics #OPEN18Delivering fast, powerful and scalable analytics #OPEN18
Delivering fast, powerful and scalable analytics #OPEN18
 
Douglas Briggs
Douglas BriggsDouglas Briggs
Douglas Briggs
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
Best Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyBest Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse Quickly
 

Ähnlich wie The final frontier v3

Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015Terry Bunio
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsTerry Bunio
 
Lesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxLesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxcalf_ville86
 
Data Warehouse approaches with Dynamics AX
Data Warehouse  approaches with Dynamics AXData Warehouse  approaches with Dynamics AX
Data Warehouse approaches with Dynamics AXAlvin You
 
Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primerTerry Bunio
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...Fabio Fumarola
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Metadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachMetadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachRoland Bullivant
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)tafosepsdfasg
 

Ähnlich wie The final frontier v3 (20)

Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
Dimensional modeling primer - SQL Saturday Madison - April 11th, 2015
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling Topics
 
Lesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxLesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptx
 
Data Warehouse approaches with Dynamics AX
Data Warehouse  approaches with Dynamics AXData Warehouse  approaches with Dynamics AX
Data Warehouse approaches with Dynamics AX
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Dimensional modeling primer
Dimensional modeling primerDimensional modeling primer
Dimensional modeling primer
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Business analysis
Business analysisBusiness analysis
Business analysis
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
Metadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachMetadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approach
 
Big Data Modeling
Big Data ModelingBig Data Modeling
Big Data Modeling
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 

Mehr von Terry Bunio

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys roleTerry Bunio
 
Data modeling tips from the trenches
Data modeling tips from the trenchesData modeling tips from the trenches
Data modeling tips from the trenchesTerry Bunio
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourceTerry Bunio
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT FargoTerry Bunio
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonTerry Bunio
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back againTerry Bunio
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pmTerry Bunio
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agileTerry Bunio
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007Terry Bunio
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20Terry Bunio
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09Terry Bunio
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enoughTerry Bunio
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysTerry Bunio
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementationTerry Bunio
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project ManagerTerry Bunio
 

Mehr von Terry Bunio (20)

Uof m empathys role
Uof m empathys roleUof m empathys role
Uof m empathys role
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
 
Data modeling tips from the trenches
Data modeling tips from the trenchesData modeling tips from the trenches
Data modeling tips from the trenches
 
#YesEstimates
#YesEstimates#YesEstimates
#YesEstimates
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Ssrs and sharepoint there and back again - SQL SAT Fargo
Ssrs and sharepoint   there and back again - SQL SAT FargoSsrs and sharepoint   there and back again - SQL SAT Fargo
Ssrs and sharepoint there and back again - SQL SAT Fargo
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madison
 
SSRS and Sharepoint there and back again
SSRS and Sharepoint   there and back againSSRS and Sharepoint   there and back again
SSRS and Sharepoint there and back again
 
Role of an agile pm
Role of an agile pmRole of an agile pm
Role of an agile pm
 
Estimating 101
Estimating 101Estimating 101
Estimating 101
 
Introduction to lean and agile
Introduction to lean and agileIntroduction to lean and agile
Introduction to lean and agile
 
Pmi june 5th 2007
Pmi june 5th 2007Pmi june 5th 2007
Pmi june 5th 2007
 
Pmi sac november 20
Pmi sac november 20Pmi sac november 20
Pmi sac november 20
 
Iiba.november.09
Iiba.november.09Iiba.november.09
Iiba.november.09
 
Sdec11 when user stories are not enough
Sdec11 when user stories are not enoughSdec11 when user stories are not enough
Sdec11 when user stories are not enough
 
Sdec10 lean AMS
Sdec10 lean AMSSdec10 lean AMS
Sdec10 lean AMS
 
Sdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92daysSdec09 kick off to deployment in 92days
Sdec09 kick off to deployment in 92days
 
Sdec10 lean package implementation
Sdec10 lean package implementationSdec10 lean package implementation
Sdec10 lean package implementation
 
Role of an agile Project Manager
Role of an agile Project ManagerRole of an agile Project Manager
Role of an agile Project Manager
 

The final frontier v3

  • 1. Agile Data Warehouse The Final Frontier
  • 3. Who Am I? • Data Base Administrator – Oracle, SQL Server, ADABAS • Data Architect – Investors Group, LPL Financial, Manitoba Blue Cross, Assante Financial • Agilist – Innovation Gamer, Team Member, Project Manager, PMO on SAP Implementation
  • 4.
  • 5.
  • 6.
  • 7. Learning Objectives • Learn about how a Data Warehouse Project can be Agile • Introduce Agile practices that can help to be DWAgile • Introduce DW practices that can help to be DWAgile
  • 8. What is Agile? • Deliver frequently as possible • Minimize Inventory – All work that doesn’t directly contribute to delivering value to the client – Typically value is realized by code
  • 9. Enterpise Models Spock Method Visualization Spectre of the Agility Database/Data Warehouse Architecture DWAgile Practices
  • 10. Data Warehouse • Definition – “a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” – Wikipedia.org
  • 11. Data Warehouse • Can refer to: – Reporting Databases – Operational Data Stores – Data Marts – Enterprise Data Warehouse – Cubes – Excel? – Others
  • 12. Two sides of Database Design
  • 13. Two design methods • Relational – “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”.”
  • 14. Two design methods • Dimensional – “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts
  • 15. Relational • Relational Analysis – Database design is usually in Third Normal Form – Database is optimized for transaction processing. (OLTP) – Normalized tables are optimized for modification rather than retrieval
  • 16. Normal forms • 1st - Under first normal form, all occurrences of a record type must contain the same number of fields. • 2nd - Second normal form is violated when a non- key field is a fact about a subset of a key. It is only relevant when the key is composite • 3rd - Third normal form is violated when a non-key field is a fact about another non-key field Source: William Kent - 1982
  • 17. Dimensional • Dimensional Analysis – Star Schema/Snowflake – Database is optimized for analytical processing. (OLAP) – Facts and Dimensions optimized for retrieval • Facts – Business events – Transactions • Dimensions – context for Transactions – Accounts – Products – Date
  • 20.
  • 21. Kimball-lytes • Bottom-up - incremental – Operational systems feed the Data Warehouse – Data Warehouse is a corporate dimensional model that Data Marts are sourced from – Data Warehouse is the consolidation of Data Marts – Sometimes the Data Warehouse is generated from Subject area Data Marts
  • 22. Inmon-ians • Top-down – Corporate Information Factory – Operational systems feed the Data Warehouse – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from – Enterprise Data Warehouse is the source of Data Marts
  • 23. The gist… • Kimball’s approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate • Inmon’s approach has more upfront effort to avoid these consistency problems, but takes longer to implement.
  • 24. Spectre of the Agility
  • 25. Incremental - Kimball •In Segments •Detailed Analysis •Development •Deploy •Long Feedback loop •Considerable changes •Rework •Defects Waterfall - Inmon •Detailed Analysis •Large Development •Large Deploy •Long Feedback loop •Extensive changes •Many Defects Data Warehouse Project
  • 26.
  • 27. Popular Agile Data Warehouse Pattern • Son’a method – Analyze data requirements department by department – Create Reports and Facts and Dimensions for each – Integrate when you do subsequent departments
  • 28. The two problems • Conforming Dimensions – A Dimension conforms when it is in equivalent structure and content – Is a client defined by Marketing the same as Finance? • Probably not – If the Dimensions do not conform, this severely hampers the Data Warehouse
  • 29. The two problems • Modeling the use of the data versus the data – By using reporting needs as the primary foundation for the data model, you are modeling the use of the data rather than the data – This will cause more rework in the future as the use of the data is more likely to change than the data itself.
  • 31. Where is the true Agility? • Iterations not Increments • Brutal Visibility/Visualization • Short Feedback loops • Just enough requirements • Working on enterprise priorities – not just for an individual department
  • 32. Fact • True iterative development on a Data Warehouse project is hard – perhaps harder than a traditional Software Development project – ETL, Data Models, and Business Intelligence stories can have a high impact on other stories – Can be difficult to create independent stories – Stories can have many prerequisites
  • 33. Fiction • True iterative development on a Data Warehouse project is impossible – ETL, Data Models, and Business Intelligence stories can be developed iteratively – Independent stories can be developed – Stories can have many prerequisites – but this can be limited
  • 34. Agile Mindset • We need to implement an Agile Mindset to Data Modelling – What is just enough Data Modelling? – And do no more…
  • 35. Our Mission • “Data... the Final Frontier. These are the continuing voyages of the starship Agile. Her on-going mission: to explore strange new projects, to seek out new value and new clients, to iteratively go where no projects have gone before.”
  • 37.
  • 38.
  • 39. The Prime Directive • Is a vision or philosophy that binds the actions of Starfleet • Can an Data Warehouse project truly be Agile without a Vision of either the Business Domain or Data Domain? – Essentially it is then just an Ad Hoc Data Warehouse. Separate components that may fit together. – How do we ensure we are working on the right priorities for the entire enterprise?
  • 41. Torture • Why does the creation of Enterprise Data Models feel like torture? – Interrogation – Coercion – Agreement on Excessive detail without direct alignment to business value
  • 45. Agile Enterprise Normalized Data Model • Confirms the major entities and the relationships between them – 30-50 entities • Confirms the Data Domain • Starts the definition of a Normalized Data Model that will be refined over time – Completed in 1 – 4 weeks
  • 46. Agile Enterprise Normalized Data Model • Is just enough to understand the data domain so that the iterations can proceed • Is not mapping all the attributes – Is not BDUF • Is an Information Map for the Data Domain • Contains placeholders for refinement – Like a User Story Map
  • 47. Agile Enterprise Dimensional Data Model • Confirms the Business Objects and the relationships between them – 10-15 entities • Confirms the Business Domains • Starts the definition of a Dimensional Data Model that will be refined over time – Completed in 1 – 2 weeks
  • 48. Agile Enterprise Dimensional Data Model • Is just enough to understand the business domain so that the iterations can proceed – And to validate the understanding of the data domain • Is not mapping all the attributes – Is not BDUF • Is an Information Map for the Business Domain • Contains placeholders for refinement – Like a User Story Map
  • 50. Agile Information Maps • Agile Information Maps allow for: – Efficient Navigation of the Data and Business Domains – Ability to set up ‘Neutral Zones’ for areas that need more negotiation – Visual communication of the topology of the Data and Business Domains • Easier and more accurate to validate than text • ‘feels right’
  • 51. Agile Information Maps • Are – Our vision – Our Maps for the Data and Business Domains – A guide for our solution – Minimizes rework and refactoring – Our Prime Directive – Data Models
  • 53. Spock • Hybrid approach – It is only logical – Needs of the many outweigh the needs of the few – or the one
  • 54. Spock Approach Agile Normalized Data Model DM DM DM ODS DWAgile Dimensional Data Model Business Domain Spike
  • 55. Spock Approach • Business Domain Spike • Agile Information Maps – Agile Enterprise Normalized Data Model – Agile Enterprise Dimensional Data Model • Implement – Operational Data Store – Dimensional Data Warehouse • Reporting can then be done from either
  • 56. Business Domain Spike • Needs to precede work on Agile Information Maps • Need to understand the business and industry before you can create Data of Business Information Maps • Can take 1-2 weeks for an initial understanding – Constant refinement
  • 57. Benefits of Spock Approach • Agile Enterprise Normalized Data Model – Validates knowledge of Data Domain – Ensure later increments don’t uncover data that was previously unknown and hard to integrate • Minimizes rework and refactoring – True iterations • Confirm at high level and then refine
  • 58. Benefits of Spock Approach • Agile Enterprise Dimensional Data Model – Validates knowledge of Business Domain – The process of ‘cooking down’ to a Dimensional Model validates design and identifies areas of inconsistencies or errors • Especially true when you need to design how changes and history will be handled – True iterations • Confirm at high level and then refine
  • 59. Benefits of Spock Approach • Operational Data Store – Model data relationally to provide enterprise level operational reports – Consolidate and cleanse data before it is visible to end-users – Is used to refine the Agile Enterprise Normalized Data Model – Start creating reports to validate data model immediately!
  • 60. Benefits of Spock Approach • Dimensional Data Warehouse – Model data dimensionally to provide enterprise level analytical reports – Provide full historical data and context for reports – Is used to refine the Agile Enterprise Dimensional Data Model – Clients can start creating reports to validate data model immediately!
  • 61. Do we need an ODS and DW? • Relational Analysis provides – Validation of the Data domain • Dimensional Analysis provides – Validation of the Business domain – Additional level of confirmation of the Data domain as the relational model in translated into a dimensional one • Much easier for inconsistencies and errors to hide in 300+ tables as opposed to 30+
  • 62. Most Importantly.. • Operational Data Store – Minimal Data Latency – Current state – Allow for efficient Operational Reporting • Data Warehouse – Moderate Data Latency – Full history – Allows for efficient Analytical Reporting
  • 63. Agile Approach • With an Agile approach you can deliver just enough of an Operational Data Store or Data Warehouse based on needs – No longer do they need to be a huge deliverable • Neither presumes a complete implementation is required • The Information Models allow for iterative delivery of value
  • 64. How do we work iteratively on a Data Warehouse?
  • 65. Increments versus iterations • Increments – Series by series – department by department • Iterations – Story by story – episode by episode • Enterprise prioritization – Work on the highest priority for the enterprise – Not just within each series/department
  • 66.
  • 67. Iterative Focus • Instead of focusing on trying to have a complete model, we focused on creating processes that allow us to deliver changes within 30 minutes from model to deployment
  • 68. Captain, we need more Visualization!
  • 70. The View Screen • Enabled bridge to bridge communications • Provided visual images in and around the ship – From different angles – How did that work? • Allowed for more understanding of the situation
  • 72. Visualization • Is required to: – Report Project status – Provide a visual report map
  • 73. Kanban Board • We used a standard Kanban board to track stories as we worked on them – These stories resulted in ETL, Data Model, and Reporting tasks – We had a Data Model/ETL board and a Report board – ETL and Data Model required a foundation created by the Information Maps before we could start on stories
  • 74. • We also used thermometer imagery to report how we were progressing according to the schedule – Milestones were on the thermometer along with the number of reports that we had completed every day Report Visualization
  • 75.
  • 77. Be careful how you spell that…
  • 78. Data Modeling Union • For too long the Data Modellers have not been integrated with Software Developers • Data Modellers have been like the Cardassian Union, not integrated with the Federation
  • 79. Issues • This has led to: – Holy wars – Each side expecting the other to follow their schedule – Lack of communication and collaboration • Data Modellers need to join the ‘United Federation of Projects’
  • 80. How did we be Agile?
  • 81. Tools of the trade
  • 82. Tools of the Trade • Version Control and Refactoring • Test Automation • Communication and Governance • Adaptability and Change Tolerance • Assimilation
  • 84. Version Control • If you don’t control versions, they will control you • Data Models must become integrated with the source control of the project – In the same repository of project trunk and branches • You can’t just version a series of SQL files separate from your data model
  • 85. Our Version Experience • We are using Subversion • We are using Oracle Data Modeler as our Modeling tool. – It has very good integration with Subversion – Our DBMS is SQL Server 2012 • Unlike other modeling tools, the data model was able to be integrated in Subversion with the rest of the project
  • 86. ODM Shameless plug • Free • Subversion Integration • Supports Logical and Relational data models • Since it is free, the data models can be shared and refined by all members of the development team • Currently on version 2685
  • 87. How do we roll out versions? • Create Data Model changes • Use Red Gate SQL Compare to generate alter script – Generate a new DB and compare to the last version to generate alter script • 95% of changes deployed in less than 10 minutes
  • 88. How do we roll out versions? • We build on the Farley and Humble Blue- Green Deployment model – Blue – Current Version and Revision – Database Name will be ‘ODS’ – Green – 1 Revision Old – Database Name will be ‘ODS-GREEN’ – Brown – 1 Major Version Old – Database Name will be ‘ODS-BROWN’
  • 89. Versioning • SQL Change scripts are generated all changes • A full script is generated for every major version – A new folder is created for every major version – Major version folders and named after the greek alphabet. (alpha, beta, gamma)
  • 90. SQL Script version naming standards • [revision number]-[ODS/DW]-[I/A][version number]- [subversion revision number of corresponding Data model].sql – Revision number – auto-incrementing – Version Number – A999 • Alphabetic character represents major version – corresponds with folder named after greek alphabet • 999 indicates minor versions – subversion revision number of corresponding Data model – allows for a exact synchronization between Data Model and SQL Scripts • All objects are stored within one Subversion repository – They all share the same revision numbering
  • 91. SQL Script version naming standards • For example: – 0-ODS-I-A001-745.sql – initial db and table creation for current ODS version (includes reference data) – 1-ODS-A-A001-1574.sql – revision 1 ODS alter script that corresponds to data model subversion revision 1574 – 2-ODS-A-A001-1590.sql - revision 2 ODS alter script that corresponds to data model subversion revision 1590
  • 92. SQL Script error handling • Validation is done to prevent – Scripts being run out of sequence – Revision being applied without addressing required refactoring – Scripts being run on any environment but the Blue environment
  • 93. But what about Refactoring? • Having Agile Information Maps has significantly reduced refactoring – This was an entirely new data domain for the team • Using the Blue-Green-Brown deployment model has simplified required refactoring • We have used the methods described by Scott Ambler on the odd occasion
  • 95. Create the plan for how you will refactor
  • 96. Refactoring Experience • We haven’t needed to refactor much • Since we are iteratively refining, we haven’t had to redefine much – Just adding more detail – The main Information Maps have held together
  • 98. Test Automation • The Enterprise was saved by the constantly running tests on the warp engine • They allowed for quick decision making
  • 99. Automated Test Suite • Leveraged the open source tSQLt framework • Purchased SQL Test from Red Gate for an enhanced interface • Extended the framework to execute tests from four custom tables we defined
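A minimal tSQLt test as a sketch of the approach: only the framework calls are real, while the staging table, ODS table, and the ods.LoadClaim procedure are illustrative names, not the project's actual objects:

    EXEC tSQLt.NewTestClass 'TestODS';
    GO
    CREATE PROCEDURE TestODS.[test claim counts match after the ODS load]
    AS
    BEGIN
        -- replace the real tables with empty fakes so the test is isolated
        EXEC tSQLt.FakeTable 'stg.Claim';
        EXEC tSQLt.FakeTable 'ods.Claim';

        INSERT INTO stg.Claim (ClaimId) VALUES (1), (2), (3);

        EXEC ods.LoadClaim;  -- procedure under test (name assumed)

        DECLARE @expected int = (SELECT COUNT(*) FROM stg.Claim);
        DECLARE @actual   int = (SELECT COUNT(*) FROM ods.Claim);
        EXEC tSQLt.AssertEquals @expected, @actual;
    END;
    GO
    EXEC tSQLt.Run 'TestODS';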
  • 100. Automated Test Suite • Leveraged the Data Mapping spreadsheet to drive the automated tests – Two database tables were loaded from the spreadsheet – Two additional tables contained ETL test cases – 13 stored procedures executed the tests – 3300+ columns mapped!
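The actual table design shipped with the deck's design document; as a sketch of the shape such mapping metadata can take (all names and columns below are assumptions):

    -- one row per source-to-target table mapping, loaded from the spreadsheet
    CREATE TABLE dbo.TableMapping (
        TableMappingId int IDENTITY PRIMARY KEY,
        SourceTable    sysname NOT NULL,  -- schema-qualified source name
        TargetTable    sysname NOT NULL
    );

    -- one row per mapped column, also loaded from the spreadsheet
    CREATE TABLE dbo.ColumnMapping (
        ColumnMappingId int IDENTITY PRIMARY KEY,
        TableMappingId  int NOT NULL REFERENCES dbo.TableMapping,
        SourceColumn    sysname NULL,         -- NULL for constants and derived values
        TargetColumn    sysname NOT NULL,
        MappingRule     varchar(20) NOT NULL  -- 'direct', 'constant', 'null', 'transform'
    );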
  • 101. Table Tests • TstTableCount: Compares record counts between source data and target data. • TstTableColumnDistinct: Compares counts on distinct values of columns. • TstTableColumnNull: Generates a report of all columns whose contents are entirely null.
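A sketch of how a test like TstTableCount might walk the mapping metadata with dynamic SQL; dbo.TableMapping is the assumed metadata table sketched above, and dbo.TestResult is an assumed results table:

    DECLARE @src sysname, @tgt sysname, @sql nvarchar(max);
    DECLARE map_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT SourceTable, TargetTable FROM dbo.TableMapping;
    OPEN map_cur;
    FETCH NEXT FROM map_cur INTO @src, @tgt;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- record PASS/FAIL per table pair based on a simple count comparison
        SET @sql = N'INSERT INTO dbo.TestResult (TestName, TargetTable, Outcome)
                     SELECT ''TstTableCount'', ''' + @tgt + N''',
                            CASE WHEN (SELECT COUNT(*) FROM ' + @src + N') =
                                      (SELECT COUNT(*) FROM ' + @tgt + N')
                                 THEN ''PASS'' ELSE ''FAIL'' END;';
        EXEC sp_executesql @sql;
        FETCH NEXT FROM map_cur INTO @src, @tgt;
    END
    CLOSE map_cur;
    DEALLOCATE map_cur;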
  • 102. Column Tests • TstColumnDataMapping: Compares columns directly assigned from a source column on a field by field basis for 5-10 rows in the target table. • TstColumnConstantMapping: Compares columns assigned a constant on a field by field basis for 5-10 rows in the target table. • TstColumnNullMapping: Compares columns assigned a Null value on a field by field basis for 5-10 rows in the target table. • TstColumnTransformedMapping: Compares transformed columns on a field by field basis for 5-10 rows in the target table.
  • 103. Data Quality Tests • TstInvalidParentFKColumn: Tests that an invalid parent FK value results in the record being logged and bypassed. A test record is added to the staging table to exercise the process. • TstInvalidFKColumn: Tests that an invalid FK value results in the value being assigned a default value or Null. A test record is added to the staging table to exercise the process. • TstInvalidColumn: Tests that an invalid value results in the value being assigned a default value or Null. A test record is added to the staging table to exercise the process.
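As a sketch, the TstInvalidParentFKColumn pattern fits naturally into a tSQLt test: seed one bad row, run the load, and assert it was logged and bypassed. All object names below are assumptions, including the dbo.LoadErrorLog table:

    CREATE PROCEDURE TestODS.[test an invalid parent FK is logged and bypassed]
    AS
    BEGIN
        EXEC tSQLt.FakeTable 'stg.Claim';
        EXEC tSQLt.FakeTable 'ods.Claim';
        EXEC tSQLt.FakeTable 'dbo.LoadErrorLog';

        -- MemberId -1 has no parent row, so this claim should be rejected
        INSERT INTO stg.Claim (ClaimId, MemberId) VALUES (999, -1);

        EXEC ods.LoadClaim;  -- procedure under test (name assumed)

        DECLARE @loaded int = (SELECT COUNT(*) FROM ods.Claim WHERE ClaimId = 999);
        DECLARE @logged int = (SELECT COUNT(*) FROM dbo.LoadErrorLog);
        EXEC tSQLt.AssertEquals 0, @loaded;  -- bypassed, not loaded
        EXEC tSQLt.AssertEquals 1, @logged;  -- and logged exactly once
    END;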
  • 104. Process Integrity Tests • TstRestartTask: Tests that a Task can be started from the beginning and that subsequent steps will run in sequence. • TstRecoverTask: Tests that a Task can be re-started in the middle, that records are processed correctly, and that subsequent steps will run in sequence.
  • 105. Interested? • Leave me a business card and I’ll send you the design document and stored procedures
  • 107. Team Communication • Frequent Data Model walkthroughs with application teams • Full access to the Data model through the Data Modeling development tool • Data Models posted in every room for developers to mark up with suggestions • Database deployment to play with for every release
  • 108. Client Communication • Frequent Conceptual Data Model walkthroughs with clients – Includes presentation of scenarios with data to confirm and validate understanding • Collaboration on the iterative plan to ensure they agree on the process and support it
  • 109. Monthly Governance Meeting – Visual Kanban boards were reviewed – Reports developed in the prior iterations were demonstrated – Business Areas were asked to submit a ranked list of their top 10-20 data requirements/reports for the next iteration.
  • 111. Be Nimble • Already discussed how we can roll out new versions quickly
  • 112. Change Tolerant Data Model • Only add tables and columns when they are absolutely required • Leverage Data Domains so that attributes are created consistently and can be changed in unison – Use limited number of standard domains
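In the deck the domains live in the data modeling tool; inside SQL Server itself, user-defined data types are one way to get a similar effect. A sketch, with illustrative type and table names:

    -- define each domain once...
    CREATE TYPE dbo.ShortCode   FROM varchar(10)   NOT NULL;
    CREATE TYPE dbo.MoneyAmount FROM decimal(19,4) NULL;
    GO
    -- ...then reuse it everywhere, so attributes stay consistent across tables
    CREATE TABLE ods.Payment (
        PaymentStatusCode dbo.ShortCode,
        PaidAmount        dbo.MoneyAmount
    );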
  • 113. Change Tolerant Data Model • Data Model needs to be loosely coupled and have high cohesion – Need to model the data and business and not the applications or reports!
  • 114. Change Tolerant Data Model • Don’t model the data according to the application’s Object Model • Don’t model the data according to source systems • These items will change more frequently than the actual data structure • Your Data Model and Object Model should be different!
  • 116. Assimilate • Assimilate Version Control, Communication, Adaptability, Refinement, and Re-Factoring into core project activities – Stand ups – Continuous Integration – Check outs and Check Ins • Make them part of the standard activities – not something on the side
  • 118. Our Mission • These practices and methods are being used to redevelop an entire Business Intelligence platform for a major ‘Blue’ Health Benefits company – Operational and Analytical Reports • 100+ integration projects • SAP Claims solution
  • 119. Our Mission • The integration projects are being run Agile • 100+ team members across all projects • The SAP project is being run in a more traditional manner – a ‘big-bang’ SAP implementation • I’m now also fulfilling the role of an Agile PMO
  • 120. Our Challenge • How can we deploy to production early and often when the system is a ‘big-bang’ implementation? – We were ready to deploy ahead of clients and other projects – We were dependent on other conversion projects
  • 121. Our Challenge • We are now exploring alternate ways to deploy to production before the ‘big-bang’ implementation – To allow the clients to use the reports and iteratively refine them and the solution – Also allows our team to validate data integrity and quality iteratively – We are now executing iterations to make this possible
  • 122. Our BI Solution • SQL Server 2012 – Integration Services – Reporting Services • SharePoint 2010 Foundation – SharePoint Integrated Reporting Solution
  • 123. Our team • Integrated team of – 2 enterprise DBAs from the ‘Blue’ – 5 Data Analysts/DBAs/SSIS/SSRS developers • Governance team comprised of – Business Areas – Systems Areas – Stakeholders
  • 124. Current Stardate • We have completed the initial ODS and DW development – including ETL • We have completed a significant revision of ODS, DW, and ETL – without major issues • We are now finishing Report development – reports have required database changes and ETL changes – but no major changes! – 300+ reports developed
  • 125. Summary • Use Agile Enterprise Data Models to provide the initial vision and allow for refinements • Strive for Iterations over Increments • Align governance and prioritization with iterations • Plan and Integrate processes for Versioning, Test Automation, Adaptability, Refinement
  • 128. Leadership • “If you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” ~ Antoine de Saint-Exupéry
  • 129. Leadership • “[A goalie's] job is to stop pucks, ... Well, yeah, that's part of it. But you know what else it is? ... You're trying to deliver a message to your team that things are OK back here. This end of the ice is pretty well cared for. You take it now and go. Go! Feel the freedom you need in order to be that dynamic, creative, offensive player and go out and score. ... That was my job. And it was to try to deliver a feeling.” ~ Ken Dryden