5. Put your energy into the constraint
Top 5 constraints in IT
1. Dev environments setup
2. QA setup
3. Code Architecture
4. Development
5. Product management
- Gene Kim surveyed:
• 14,000 companies
• 100s of CIOs
7. Automation: Jenkins, TeamCity, Travis
Configuration: Chef, Puppet, Ansible, Vagrant
Compute Virtualization: VMware, OpenStack, Docker
Data Virtualization: ?
10. Data Management is not Agile
• 20% of SDLC time lost waiting for data
• 60% of dev/QA time consumed by data-related tasks
• Data management does not scale to Agile
- Infosys & Compuware
11. Data is the Constraint
• CIO Magazine survey: 60% of projects over schedule; 85% delayed waiting for data; only getting worse
• Gartner: "Data Doomsday": by 2017, 1/3 of IT in crisis
45. QA: Virtual Data
• Fast
• Parallel
• A/B testing
46. Physical Data: late-stage bugs
[Chart: cost to correct a bug by stage (Dev, QA, UAT, Production) rises steeply toward Production. Source: Barry Boehm, "Software Engineering Economics" (1981)]
[Chart: bugs discovered by stage (Dev, Testing, UAT, Production), legacy process]
47. Physical Data: find bugs fast
[Chart: the cost-to-correct curve by stage (Dev, QA, UAT, Production): bugs found early cost far less to fix]
48. The Impact: Shift Left in Quality
[Chart: bugs discovered by stage (Dev, Testing, UAT, Production), legacy vs. with Delphix: bug discovery shifts left, toward Dev and Testing]
51. Virtual Data: Fast Refresh
[Timeline: the legacy cycle alternates 8-hour data refreshes with 20-minute tests; with virtual data, 20-minute tests run back to back]
• Fast
• Full
• Fresh
• Efficient
67. Virtual Data: Audit (1/27/2016)
Production time flow: Prod → DVA instance → Live Archive
Live Archive data for years:
• Archive EBS R11 before upgrade to R12
• Sarbanes-Oxley
• Dodd-Frank
• Financial stress tests
72. Virtual Data: Federated
“I looked like a hero”
- Tony Young, CIO, Informatica
73. Use Case Summary
1. Development & QA: dev throughput increased by 2x
2. Production support: 30 days of history in the size of the source
3. Business continuity: 24x7 ETL & federated cloning
74. Automation: Jenkins, TeamCity, Travis
Configuration: Chef, Puppet, Ansible
Compute Virtualization: VMware, OpenStack, Docker
Data Virtualization: ?
75. Automation: Jenkins, TeamCity, Travis
Configuration: Chef, Puppet, Ansible
Compute Virtualization: VMware, OpenStack, Docker
Data Virtualization: Delphix
76. Virtual Data Quotes
• Projects: “12 months to 6 months.”
- New York Life
• Insurance product: “about 50 days ... to about 23 days”
- Presbyterian Health
• “Can't imagine working without it”
- State of California
77. Summary
• Problem: the data constraint
• Solution: data virtualization
Talking mainly about Delphix.
Which IT tasks have the most impact on company performance?
If you look at what's really impeding flow from development to operations to the customer, it's typically IT operations. Operations can never deliver environments on demand; you have to wait months or quarters to get a test environment. When that happens, terrible things happen: people actually hoard environments. They invite people onto their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn't actually achieve the goal.
"One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them."
One of the best predictors of DevOps performance is that IT operations can make environments available on demand to development and test, so that they can build and test the application in an environment that is synchronized with production.
Eliyahu Goldratt
IT bottlenecks
Setting Priorities
Company Goals
Defining Metrics
Fast Iterations
IT version of "The Goal" by E. Goldratt
"One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them."
Not enough resources
Contention on shared environments
Lack of enough environments
Late stage bug discovery
Faulty Data leading to bugs
Subsets
Synthetic data
Old data
Slow environment builds
Delays
Developers waiting
QA slow and expensive
Get the right data
To the right people
At the right time
Not sure if you've run into this, but I have personally experienced the following.
When I was talking to one group at eBay, that development group shared a single copy of the production database between the developers on the team. What this sharing of a single copy of production meant is that whenever a developer wanted to modify that database, they had to submit their changes to code review, and that code review took 1 to 2 weeks.
I don't know about you, but that kind of delay would stifle my motivation, and I have direct experience with the kind of disgruntlement it can cause. When I was last a DBA, all schema changes went through me. It took me about half a day to process schema changes. That delay was too much, so the developers unilaterally decided to go to an EAV (entity-attribute-value) schema, which meant they could add new fields without consulting me and without stepping on each other's feet. It also meant the SQL code was unreadable and performance was atrocious.
Besides creating developer frustration, sharing a database also makes refreshing the data difficult: it takes a while to refresh the full copy, and it takes even longer to coordinate a time when everyone stops using the copy so the refresh can happen. All this means the copy rarely gets refreshed and the data gets old and unreliable.
KLA-Tencor
Stateado
To circumvent the problems of sharing a single copy of production, many shops we talk to create subsets.
One company we talked to spends 50% of its time copying databases. They have to subset because there isn't enough storage, and the subsetting process constantly needs fixing and modification.
Now what happens when developers use subsets?
We talked to Presbyterian Healthcare, and they told us they spend 96% of their QA cycle time building the QA environment and only 4% actually running the QA suite. This happens for every QA run, meaning for every dollar spent on QA there is only 4 cents of actual QA value; the other 96% is infrastructure time and overhead.
What happens now in the industry? Typically the application development life cycle looks like this: we have a production database with production applications running on top of it, and we have developers either customizing that application or writing new functionality for it. We need copies of that data to make sure our code runs correctly when it gets to production. We have teams of people (DBAs, sysadmins, storage admins, etc.) making these copies. It's slow, tedious work to copy all this data, and all the while developers and QA testers are waiting for these copies.
Internet vs browser
Automate or die – the revolution will be automated
The worst enemy of companies today is thinking that they have the best processes that exist, that their IT organizations are using the latest and greatest technology, and that nothing better exists in the field. This mentality will undermine many companies.
http://www.kylehailey.com/automate-or-die-the-revolution-will-be-automated/
Data IS the constraint
Business skeptics are saying to themselves that data processes are just a rounding error in most of their project timelines, and that they are sure their IT has developed processes to fix that. That's the fundamental mistake. The very large and often hidden data tax lies in all the ways that we've optimized our software, data protection, and decision systems around the expectation that data is simply not virtual. The belief that there is no agility problem is part of the problem.
http://www.kylehailey.com/data-is-the-constraint/
Due to the constraints of building clone database environments, one ends up in the "culture of no", where developers stop asking for a copy of a production database because the answer is always "no". If developers need to debug an anomaly seen in production, or need to write a custom module that requires a copy of production, they know not to even ask and just give up.
Everyone stand up. Sit down if your QA data sets are less than:
• a week old
• a month old
• 6 months old
• a year old
• 2 years old
How long does it take a developer to get a copy of a database?
Time: how long to get or make a DB copy?
Dev? QA? DBA?
Old: how old is the data?
BI, DW
QA, Dev
Storage: how much storage is used?
Analysts: batch job windows, lockout periods?
Audits: can you support “?
The fastest query is the query never run.
Delphix radically changes this paradigm.
Delphix is software that we provide as a virtual machine OVA file that you spin up on any commodity Intel hardware. You give us any storage, and Delphix maps its own proprietary file system onto that storage.
Through the web UI you point us at any database or data source: Oracle, SQL Server, Sybase, PostgreSQL, flat files, etc. At link time we take one full copy; we only do it once and never again. We compress the data, so if the data is 3 TB on the source it will be about 1 TB on Delphix.
From then on we just pull in the changed blocks. With the changed blocks, Delphix builds up a timeline of data versions. The default window is 2 weeks, but you can configure it to be 2 months or 2 years. You can spin up a copy of the data, down to the second, at any point in the time window.
Now, with a few clicks of a mouse and in a few minutes, we can spin up copies on developer machines, QA machines, UAT, etc. When we make copies, no data is being moved; we just point the copies to data that already exists on Delphix. There is no data on the target machines; all the data is on Delphix, which looks like a NAS or NFS file server to the target machines. We give them a read-writable point-in-time snapshot of the data.
We also track all the block changes on the virtual databases. With block change tracking on the virtual databases we can do cool things like roll them back, branch them, version them, share them, and bookmark the data.
All this is super simple to run; Delphix can generally be run by a junior DBA in a quarter of their time.
The coolest thing, especially for a DevOps process, is the self-service interface for developers and testers, where they can refresh data from production, roll back changes, and bookmark and share data between dev and QA.
We can treat data the way we treat code.
In the physical database world, 3 clones take up 3x the storage. In the virtual world, 3 clones take up 1/3 the storage thanks to block sharing and compression.
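The block-sharing arithmetic above can be sketched with a toy copy-on-write model. This is purely illustrative (the class names and block counts are made up, and this is not Delphix's actual engine): each clone shares the baseline's blocks and stores only the blocks it overwrites, so N clones cost far less than N physical copies.

```python
# Toy copy-on-write model (illustrative only, not Delphix's engine):
# clones share the baseline's blocks and store only blocks they overwrite.

class Baseline:
    def __init__(self, blocks):
        self.blocks = dict(blocks)        # block_id -> data, stored once

class Clone:
    def __init__(self, baseline):
        self.baseline = baseline
        self.delta = {}                   # only changed blocks live here

    def read(self, block_id):
        # A changed block wins; otherwise fall through to the shared baseline.
        return self.delta.get(block_id, self.baseline.blocks.get(block_id))

    def write(self, block_id, data):
        self.delta[block_id] = data       # copy-on-write: baseline untouched

base = Baseline({i: f"block-{i}" for i in range(1000)})   # the one full copy
clones = [Clone(base) for _ in range(3)]                  # 3 "virtual" copies
clones[0].write(7, "patched")                             # one changed block

physical_blocks = len(base.blocks) + sum(len(c.delta) for c in clones)
print(physical_blocks)   # 1001 blocks back 3 clones, instead of 3000
```

The same sharing is why refreshes and rollbacks are cheap in this model: a rollback just discards a clone's delta, and reads fall back to the shared baseline.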
For example, StubHub went from 5 copies of production in development to 120, giving each developer their own copy. StubHub estimated a 20% reduction in bugs that made it to production.
Slowdowns mean bottlenecks.
Physically independent but logically correlated.
Cloning multiple source databases to the same point in time can be a daunting task. One example from our customers is Informatica, who had a project to integrate 6 databases into one central database. The project was estimated at 12 months, with much of that coming from trying to orchestrate getting copies of the 6 databases at the same point in time.
Like herding cats.
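Provisioning several sources to one consistent instant boils down to finding a timestamp that falls inside every source's retention window. A minimal sketch of that selection (the database names and windows here are hypothetical):

```python
# Hypothetical sketch: pick the latest instant covered by every source
# database's retention window, so all clones can be provisioned to the
# same point in time (physically independent, logically correlated).
from datetime import datetime

# Illustrative retention windows (oldest, newest) per source database
windows = {
    "orders":  (datetime(2016, 1, 1), datetime(2016, 1, 27)),
    "billing": (datetime(2016, 1, 5), datetime(2016, 1, 26)),
    "crm":     (datetime(2016, 1, 3), datetime(2016, 1, 25)),
}

def common_provision_point(windows):
    """Latest moment available in every window, or None if they don't overlap."""
    start = max(oldest for oldest, _ in windows.values())
    end = min(newest for _, newest in windows.values())
    return end if start <= end else None

point = common_provision_point(windows)
print(point)   # 2016-01-25 00:00:00 -> every source can be cloned to this instant
```

With physical copies, hitting one common instant across six databases means coordinating six restore jobs; with a timeline of versions per source, it is just this intersection plus six fast clones.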
Walmart.com
Informatica had a 12-month project to integrate 6 databases. After installing Delphix they did it in 6 months.
"I delivered this early. I generated more revenue. I freed up money and put it into innovation."
They won an award from Ventana Research for this project.
Unshackle yourself from massive infrastructure drag and bureaucratic quagmires, and put a jetpack on your IT organization and application development projects.
Moving the data IS the big gorilla. Eliminating the data tax is crucial to the success of your company. And, if huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy.
Virtual data uses a small footprint. A truly virtual data platform can deliver full-size datasets cheaper than subsets, can move the time or location pointer on its data very rapidly, and can store any version that's needed in a library at an unbelievably low cost. It can also massively improve application quality by making it reliable and dead simple to return one or many databases to a common baseline in a very short amount of time. Applications delivered with agile data can afford many more full-size virtual copies, eliminating the wait time, extra work, and side effects caused by sharing. With the cost of data falling so dramatically, businesses can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized, and servers that sit idle because it would take too long to rebuild them can now switch roles on demand.