6. The Phoenix Project
“Any improvement not made at the constraint
is an illusion.”
What is the constraint?
7. The Phoenix Project
“Any improvement not made at the constraint
is an illusion.”
What is the constraint?
“One of the most powerful things that IT can
do is get environments to development and QA
when they need it”
8. Problem in IT
I. Data Constraint strains IT
II. Data Constraint price is huge
III. Data Constraint: companies unaware
9. Problem in IT
CIO Magazine survey:
60% of projects over schedule
85% delayed waiting for data
Data is the Constraint
Current situation: only getting worse … Data Doomsday
10. I. Data Constraint strains IT
If you can't satisfy the business demands, then your process is broken.
21. I. Data constraint: Data floods company infrastructure
92% of the cost of business, in financial services, is "data"
www.wsta.org/resources/industry-articles
Most companies spend only 2-9% on IT
http://uclue.com/?xq=1133
Data management is the largest part of IT expense
Gartner: Data Doomsday
22. Data is the constraint
I. Data Constraint strains IT
II. Data Constraint price is huge
III. Data Constraint: companies unaware
24. Part II. Data constraint price is Huge
• Four Areas data tax hits
1. IT Capital resources
2. IT Operations personnel
3. Application Development
4. Business
25. Part II. Data constraint price is Huge
• Four Areas data tax hits
1. IT Capital resources
2. IT Operations personnel
3. Application Development
4. Business
26. II. Data constraint price is huge : 1. IT Capital
• Hardware
–Servers
–Storage
–Network
–Data center floor space, power, cooling
27. Part II. Data constraint price is Huge
• Four Areas data tax hits
1. IT Capital resources
2. IT Operations personnel
3. Application Development
4. Business
28. II. Data constraint price is huge : 2. IT Operations
• People
– DBAs
– SYS Admin
– Storage Admin
– Backup Admin
– Network Admin
• Hours: 1,000s per year just for DBAs
• $100s of millions for data center modernizations
29. Part II. Data constraint price is Huge
• Four Areas data tax hits
1. IT Capital resources
2. IT Operations personnel
3. Application Development
4. Business
30. II. Data constraint price is Huge : 3. App Dev
• Inefficient QA: Higher costs of QA
• QA Delays : Greater re-work of code
• Sharing DB Environments : Bottlenecks
• Using DB Subsets: More bugs in Prod
• Slow Environment Builds: Delays
"If you can't measure it, you can't manage it"
31. II. Data Tax is Huge : 3. App Dev
[Chart: long build time vs. QA test time]
96% of QA time was building the environment
$0.04 of every $1.00 went to actual testing vs. setup
32. II. Data Tax is Huge : 3. App Dev
Sprints 1-3: each sprint splits into "Build QA Env" and then "QA", so a bug introduced in the code early in a sprint is not found until QA finally runs.
[Chart: cost to correct rises steeply with the delay in fixing the bug]
Software Engineering Economics – Barry Boehm (1981)
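Boehm's curve can be illustrated with a toy model. The base cost and per-phase growth factor below are assumed for illustration only, not figures from the book; the point is the exponential shape.

```python
# Toy model of Boehm's cost-to-fix curve. BASE_COST and GROWTH are
# assumed illustrative values, not measured data from the book.
BASE_COST = 1.0   # relative cost to fix a bug in the phase it is introduced
GROWTH = 2.0      # assumed cost multiplier per phase of delay

def cost_to_fix(phases_delayed: int) -> float:
    """Relative cost to correct a bug found `phases_delayed` phases late."""
    return BASE_COST * GROWTH ** phases_delayed

# A bug caught 6 phases late costs 64x what it would have cost immediately:
print([cost_to_fix(d) for d in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```

Any growth factor greater than 1 gives the same qualitative result: the longer QA lags behind the sprint, the more expensive each escaped bug becomes.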
33. II. Data Tax is Huge : 3. App Dev
Full copies cause bottlenecks:
• Frustration waiting
• Old, unrepresentative data
34. II. Data Tax is Huge : 3. App Dev
subsets cause bugs
35. II. Data Tax is Huge : 3. App Dev
subsets cause bugs
The Production ‘Wall’
36. II. Data Tax is Huge : 3. App Dev
Developer asks for DB -> Manager approves -> DBA requests system -> System Admin sets up machine and requests storage -> Storage Admin allocates storage (takes snapshot) -> DBA sets up DB -> Developer gets access
3-6 Months to Deliver Data
37. II. Data Tax is Huge : 3. App Dev
Why are handoffs so expensive?
[Chart: elapsed time grows with each handoff: 1 hour -> 1 day -> 9 days]
38. II. Data Tax is Huge : 3. App Dev
Slow Environment Builds
Never enough environments
39. Part II. Data constraint price is Huge
• Four Areas data tax hits
1. IT Capital resources
2. IT Operations personnel
3. Application Development
4. Business
40. II. Data constraint price is Huge : 4. Business
Ability to capture revenue
• Business Intelligence
– Old data = less intelligence
• Business Applications
– Delays => Lost Revenue
46. III. Data Constraint companies unaware
#1 Biggest Enemy: IT departments believe
– they have the best processes
– they have the greatest technology
– "it's just the way it is"
47. III. Data Constraint companies unaware
Why do I need an iPhone ?
Don’t we already do that ?
48. III. Data Constraint companies unaware
• Ask Questions
– me: we provision environments in minutes for almost no extra storage.
– Customer: We already do that.
– me: How long does it take a developer to get an environment after they ask?
– Customer: 2-3 weeks.
– me: we do it in 2-3 minutes.
49. III. Data Constraint companies unaware
How to enlighten? Ask for metrics
– How old is data in
• BI and DW : ETL windows
• QA and Dev : how often refreshed
– How long does it take a developer to get a DB copy?
– How long does it take QA to set up an environment?
50. Data is the constraint
I. Data Constraint strains IT
II. Data Constraint price is huge
III. Data Constraint: companies unaware
62. Allocate Any Storage to Delphix
Allocate storage of any type
Pure Storage + Delphix:
Better performance for 1/10 the cost
63. One-time backup of source database
[Diagram: Production database copied once onto the Delphix file system; virtual instances are served from it]
Application stack data: support upcoming
64. DxFS (Delphix) compresses data
Data from the production database is compressed, typically to 1/3 its size, on the Delphix file system.
[Diagram: Production database -> compressed DxFS file system -> virtual instances]
65. Incremental forever change collection
Changes from the production database are:
• Collected incrementally, forever
• Purged once they age out of the time window
[Diagram: Production -> Delphix file system, holding a rolling time window of changes]
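The "incremental forever" idea above can be sketched in a few lines. This is a simplified illustration, not Delphix internals: the `Increment` structure and the 14-day window are assumptions; the point is that only changes are collected after the initial copy, and increments falling outside the time window are purged.

```python
from datetime import datetime, timedelta

# Simplified sketch of incremental-forever collection with a rolling
# retention window. Names and the 14-day window are illustrative only.

class Increment:
    def __init__(self, taken_at: datetime, changed_blocks: set):
        self.taken_at = taken_at
        self.changed_blocks = changed_blocks  # blocks changed since last sync

def purge_outside_window(increments, now, window=timedelta(days=14)):
    """Keep only increments inside the rolling time window."""
    cutoff = now - window
    return [inc for inc in increments if inc.taken_at >= cutoff]

now = datetime(2014, 6, 15)
increments = [
    Increment(datetime(2014, 5, 1), {1, 2, 3}),   # too old: purged
    Increment(datetime(2014, 6, 5), {2, 7}),      # inside window: kept
    Increment(datetime(2014, 6, 14), {7, 9}),     # inside window: kept
]
kept = purge_outside_window(increments, now)
print(len(kept))  # 2
```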
87. QA : Fast environments with Branching
QA environments are branched from Dev; Dev is provisioned from Source.
[Diagram: Source -> Dev -> QA (branched from Dev), each with its own instance]
88. QA : Fast environments with Branching
Build time shrinks:
1% of QA time was building the environment
$0.99 of every $1.00 went to actual testing vs. setup
[Chart: build time vs. QA test time, before and after]
89. QA : bugs found fast
[Chart comparison across Sprints 1-3: with slow builds, "Build QA Env" consumes most of each sprint, so a bug introduced in the code is not found until much later; with fast branched environments, QA runs within each sprint and the bug is found right away]
104. Business Intelligence: ETL and Refresh Windows
[Chart: ETL/refresh windows by year, 2011-2015, growing longer each year and spilling from overnight windows into business hours]
105. Business Intelligence: ETL and DW Refreshes
[Diagram: Prod instance feeding the DW & BI instance]
Data Guard – requires full refresh if used
Active Data Guard – read only, most reports don’t work
106. Business Intelligence: Fast Refreshes
• Collect only Changes
• Refresh in minutes
[Diagram: Prod instance feeding BI and DW instances; ETL runs 24x7]
116. Modernization: Auditing & Version Control
Data Control = Source Control for the Database
[Diagram: Dev/QA/UAT environments tracking production versions 2.6, 2.7, and 2.8 along the production time flow]
CIO, Insurance: 600 applications
CIO, Investment Banking: 180 applications
CIO, South America: 65 applications
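The "source control for the database" analogy can be made concrete with a minimal, hypothetical API. The class and method names below are illustrative only, not Delphix's actual interface: each virtual copy tracks which production version it came from, and can be branched, bookmarked, and rolled back like a code repository.

```python
# Hypothetical sketch of "data control": version-control-style
# operations on virtual database copies. All names are illustrative.

class VirtualDB:
    def __init__(self, source_version: str):
        self.source_version = source_version   # e.g. "2.6"
        self.bookmarks = {}                    # name -> version tag
        self.history = [source_version]

    def bookmark(self, name: str):
        """Tag the current state so it can be restored later."""
        self.bookmarks[name] = self.history[-1]

    def branch(self) -> "VirtualDB":
        """Create a new virtual copy sharing the current state."""
        return VirtualDB(self.history[-1])

    def refresh(self, new_version: str):
        """Move this copy forward to a newer production version."""
        self.history.append(new_version)

    def rollback(self, bookmark_name: str):
        """Return to a previously bookmarked state."""
        self.history.append(self.bookmarks[bookmark_name])

# A QA copy branched from Dev starts on the same version; Dev can then
# refresh to 2.7 and roll back without disturbing QA:
dev = VirtualDB("2.6")
dev.bookmark("pre-upgrade")
qa = dev.branch()
dev.refresh("2.7")
dev.rollback("pre-upgrade")
print(dev.history[-1], qa.source_version)  # 2.6 2.6
```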
117. Use Case Summary
1. Development
2. QA
3. Quality
4. Business Intelligence
5. Performance Acceleration
118. How expensive is the Data Constraint?
Measured before and after Delphix with Fortune 500 companies:
Median app dev throughput increased by 2x
119. How expensive is the Data Constraint?
• 10x faster financial close
• 9x faster BI refreshes
• 2x faster projects
• 20% fewer bugs
120. Agile Data Quotes
• “Allowed us to shrink our project schedule from 12
months to 6 months.”
– BA Scott, NYL VP App Dev
• "It used to take 50-some-odd days to develop an
insurance product, … Now we can get a product to the
customer in about 23 days.”
– Presbyterian Health
• “Can't imagine working without it”
– Ramesh Shrinivasan CA Department of General Services
122. Summary
• Problem: Data is the constraint
• Solution: Agile data is small & fast
• Results: Deliver projects
– Half the Time
– Higher Quality
– Increase Revenue
Kyle@delphix.com
kylehailey.com
slideshare.net/khailey
123. Future
Now
• Application Stack Cloning
• Cross Platform Cloning : UNIX -> Linux
• Postgres
Coming
• VM cloning
• Workflows
– Chef, Puppet, etc workflows for virtual data provisioning
• Developer workspaces
– Check out, check in, bookmark, tagging, rollback, refresh
• Secure Data
– Masking
• More Databases
– MySQL, Sybase, DB2, Hadoop, Mongo, Cassandra
• DR and HA
129. Five 200GB database copies can be cached with:
$1,000,000 : 1TB cache on a SAN, or
$6,000 : 200GB shared cache on Delphix
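The arithmetic behind the slide: with block sharing, the five copies reference the same cached blocks, so the cache only needs to hold one copy's worth of blocks. The sketch below shows the idealized case where the copies still fully share the source blocks.

```python
# Why five 200 GB copies can fit in a 200 GB shared cache: with block
# sharing, identical blocks are cached once no matter how many clones
# reference them. Idealized case: clones fully share the source blocks.
copies = 5
db_size_gb = 200

cache_without_sharing = copies * db_size_gb  # each copy cached separately
cache_with_sharing = db_size_gb              # shared blocks held once

print(cache_without_sharing, cache_with_sharing)  # 1000 200
```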
Editor's notes
I work for a company called Delphix. We write software that enables Oracle and SQL Server customers to copy their databases in 2 minutes with almost no storage overhead. We accomplish that by taking one initial copy and sharing the duplicate blocks across all the clones. People expect a vt100 interface and instead get an Apple-slick interface; people concerned about NFS performance should know we banged on it for 2 years. What is agile data? How does that change the industry? How do you get data where you need it, like Hadoop? Sure, file system snapshots exist, but they are only available to sites with NetApp or EMC. It can change your career: rock star DBA, DBA manager -> director, director -> VP, VP -> CTO.
If you look at what's really impeding flow from development to operations to the customer, it's typically IT operations. Operations can never deliver environments on demand; you have to wait months or quarters to get a test environment. When that happens, terrible things happen: people actually hoard environments. They invite people to their teams because they know those teams have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn't actually achieve the goal. "One of the best predictors of DevOps performance is that IT Operations can make environments available on-demand to Development and Test, so that they can build and test the application in an environment that is synchronized with Production." One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them. Eliyahu Goldratt. The Phoenix Project covers IT bottlenecks, setting priorities, company goals, defining metrics, and fast iterations: the IT version of "The Goal" by E. Goldratt.
Get the right data to the right people at the right time.
Businesses want data now and don't understand DBAs. Databases are getting bigger and harder to copy. Devs want more copies; reporting wants more copies; everyone has storage constraints. If you can't satisfy the business demands, your process is broken.
Moving the data IS the big gorilla. This gorilla of a data tax is hitting your bottom line hard.
There is probably nothing more onerous for a DBA than to hear "can you get me a copy of the production database for my project?" RMAN vs. Delphix: I was running out of space for the RMAN live demo! When moving data is too hard, the data in non-production systems such as reporting, development, or QA becomes older, and the older the data, the less actionable intelligence your BI or analytics can give you.
Example: some customers have over 1 petabyte of duplicate data (1,000 TB, i.e. 1,000,000 GB).
We know from our experience that there are some $1B+ data center consolidation price tags. Taking even 30% of the cost out of that, and cutting the timeline, is a strong and powerful way to improve margin. What about really big problems like consolidating data center real estate, or moving to the cloud? If you can non-disruptively collect the data, and easily and repeatedly present it in the target data center, you take huge chunks out of these migration timelines. Moreover, with data being so easy to move on demand, you neutralize the hordes of users who insist that there isn't enough time to do this, or that it's too hard, or too risky. Annual time spent copying databases can measure in the 1,000s of hours just for DBAs, not including all the other personnel required to supply the necessary infrastructure.
Data gets old because it is not refreshed. Instead of running 5 tests in two weeks (because it takes me 2 days to roll back after each of my 1-hour tests) and paying the cost of bugs slipping into production, what if I could run 15 tests in that same two weeks and have no bugs at all in production?
And they told us that they spend 96% of their QA cycle time building the QA environment, and only 4% actually running the QA suite. This happens for every QA suite, meaning that for every dollar spent on QA there was only 4 cents of actual QA value; 96% of the cost went to infrastructure time and overhead.
Because of the time required to set up QA environments, the actual QA test suites lag behind the end of a sprint or code freeze, meaning that the amount of time between the introduction of a bug in the code and the discovery of that bug increases. And the more time that goes by after the introduction of a bug, the more dependent code is written on top of it, increasing the amount of code rework required after the bug is finally found. In his seminal book "Software Engineering Economics", which some of you may be familiar with, author Barry Boehm introduced the computer world to the idea that the longer one delays fixing a bug in the application design lifecycle, the more expensive it is to fix that bug, and these costs rise exponentially the later the bug is addressed in the cycle.
Not sure if you've run into this, but I have personally experienced the following. When I was talking to one group at eBay, that development group shared a single copy of the production database between the developers on the team. What this sharing of a single copy of production meant is that whenever a developer wanted to modify that database, they had to submit their changes to code review, and that code review took 1 to 2 weeks. I don't know about you, but that kind of delay would stifle my motivation, and I have direct experience with the kind of disgruntlement it can cause. When I was last a DBA, all schema changes went through me. It took me about half a day to process schema changes. That delay was too much, so the developers unilaterally decided to go to an EAV (entity-attribute-value) schema, which meant they could add new fields without consulting me and without stepping on each other's feet. It also meant that the SQL code was unreadable and performance was atrocious. Besides creating developer frustration, sharing a database also makes refreshing the data difficult, as it takes a while to refresh the full copy, and it takes even longer to coordinate a time when everyone stops using the copy to make the refresh. All this means the copy rarely gets refreshed, and the data gets old and unreliable.
To circumvent the problems of sharing a single copy of production, many shops we talk to create subsets. One company we talked to spends 50% of its time copying databases; they have to subset because there isn't enough storage, and the subsetting process constantly needs fixing and modification. Now, what happens when developers use subsets?
Subsets instead of full database copies.
If Walmart in New York sold Lego Batman like hotcakes the morning it came out, wouldn't it be good to know at Walmart California? Week-old data happens when refreshes are too disruptive and limited to weekends.
You might be familiar with this cycle that we've seen in the industry: IT department budgets are constrained. When IT budgets are constrained, one of the first targets is reducing storage. As storage budgets are reduced, the ability to provision database copies and development environments goes down. As development environments become constrained, projects start to hit delays. As projects are delayed, the applications the business depends on to generate the revenue that pays for IT budgets are delayed, which reduces revenue as the business cannot access new applications, which in turn puts more pressure on the IT budget. It becomes a vicious circle.
Internet vs browser. Automate or die: the revolution will be automated. The worst enemy of companies today is thinking that they have the best processes that exist, that their IT organizations are using the latest and greatest technology, and that nothing better exists in the field. This mentality will be the undermining of many companies. http://www.kylehailey.com/automate-or-die-the-revolution-will-be-automated/ Data IS the constraint. Business skeptics are saying to themselves that data processes are just a rounding error in most of their project timelines, and that they are sure their IT has developed processes to fix that. That's the fundamental mistake. The very large and often hidden data tax lies in all the ways that we've optimized our software, data protection, and decision systems around the expectation that data is simply not agile. The belief that there is no agility problem is part of the problem. http://www.kylehailey.com/data-is-the-constraint/
Due to the constraints of building clone-copy database environments, one ends up in the "culture of no", where developers stop asking for a copy of a production database because the answer is "no". If the developers need to debug an anomaly seen in production, or need to write a custom module which requires a copy of production, they know not to even ask and just give up.
"The status quo is pre-ordained failure."
Internet vs browser; engine vs car.
How long does it take a developer to get a copy of a database?
Fastest query is the query not run
Source syncing*: initial backup once only; continual forever change collection; purging of old data. Storage (DxFS): shared blocks; snapshots, unlimited, storage agnostic; compression, typically 1/3, compressed on block boundaries (the overhead for compression is basically undetectable); shared data in memory, super caching*. Self-service automation: virtual database provisioning, rollback, refresh*, branching*, tagging*; mount files over NFS; init.ora, SID, database name, database unique name; security on who can see which source databases, how many clones they can make, and how much storage they can use.
Like the internet
In the physical database world, 3 clones take up 3x the storage. In the virtual world, 3 clones take up 1/3 the storage, thanks to block sharing and compression.
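The note's storage arithmetic can be sketched as follows. The ~3:1 compression ratio comes from the DxFS notes above; the 5% change rate per clone is an assumed illustrative value, since clones only consume new storage for blocks they change.

```python
# Physical vs. virtual storage for N clones. compress_ratio ~3:1 per the
# DxFS notes; change_rate (blocks each clone modifies) is an assumption.
def physical_storage_gb(n_clones: int, db_gb: float) -> float:
    return n_clones * db_gb                  # every clone is a full copy

def virtual_storage_gb(n_clones: int, db_gb: float,
                       compress_ratio: float = 3.0,
                       change_rate: float = 0.05) -> float:
    baseline = db_gb / compress_ratio        # one compressed source copy
    deltas = n_clones * db_gb * change_rate / compress_ratio
    return baseline + deltas                 # clones share the baseline

print(physical_storage_gb(3, 600))   # 1800
print(virtual_storage_gb(3, 600))    # 230.0
```

With these numbers, 3 physical clones need 1,800 GB while 3 virtual clones need about 230 GB, roughly the "1/3 the storage" the note describes.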
The software installs on any x86 hardware, uses any storage, and supports Oracle 9.2-12c (Standard Edition, Enterprise Edition, single instance and RAC) on AIX, Sparc, HPUX, and Linux; it also supports SQL Server.
EMC, NetApp, Fujitsu, or newer flash storage like Violin, Pure Storage, Fusion-io, etc.
Delphix does a one time only copy of the source database onto Delphix
Giving each developer their own copy
Requirements: fast data refresh, rollback. Data delivery takes 480 minutes out of a 500-minute test cycle (4% value); $0.04 of every $1.00 goes to actual testing vs. setup.
Multiple scripted dumps or RMAN backups are used to move data today. With application awareness, we only request change blocks—dramatically reducing production loads by as much as 80%. We also eliminate the need for DBAs to manage custom scripts, which are expensive to maintain and support over time.
Physically independent but logically correlated. Cloning multiple source databases at the same time can be a daunting task.
One example with our customers is Informatica, who had a project to integrate 6 databases into one central database. The project was estimated at 12 months, with much of that coming from trying to orchestrate getting copies of the 6 databases at the same point in time. Like herding cats.
Walmart.com. Informatica had a 12-month project to integrate 6 databases. After installing Delphix they did it in 6 months: "I delivered this early. I generated more revenue. I freed up money and put it into innovation." They won an award with Ventana Research for this project.
From our experience before and after with Fortune 500 companies
How big is the data tax? One way we can measure it is by looking at the improvements in project timelines at companies that have eliminated this data tax by implementing a data virtualization appliance (DVA) and creating an agile data platform (ADP). Agile data is data that is delivered to the exact spot it's needed, just in time and with much less time, cost, and effort. By looking at productivity rates after implementing an ADP compared to before, we can get an idea of the price of the data tax without an ADP. IT experts building mission-critical systems for Fortune 500 companies have seen real project returns averaging 20-50% productivity increases after having implemented an ADP. That's a big data tax to pay without an ADP. The data tax is real, and once you understand how real it is, you realize how many of your key business decisions and strategies are affected by the agility of the data in your applications. "It took us 50 days to develop an insurance product … now we can get a product to the customer in 23 days with Delphix."
Moral of this story: instead of dragging behind the enormous amounts of infrastructure and bureaucracy required to provide database copies, database virtualization eliminates the drag and provides power and acceleration to your company. Defining moment. Competitors. Services.
Moving the data IS the big gorilla. Eliminating the data tax is crucial to the success of your company. And, if huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy. Agile data – virtualized data – uses a small footprint. A truly agile data platform can deliver full size datasets cheaper than subsets. A truly agile data platform can move the time or the location pointer on its data very rapidly, and can store any version that’s needed in a library at an unbelievably low cost. And, a truly agile data platform can massively improve app quality by making it reliable and dead simple to return to a common baseline for one or many databases in a very short amount of time. Applications delivered with agile data can afford a lot more full size virtual copies, eliminating wait time and extra work caused by sharing, as well as side effects. With the cost of data falling so dramatically, business can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized – and servers that sit idle because it would just take too long to rebuild can now switch roles on demand.
One last thing: http://www.dadbm.com/wp-content/uploads/2013/01/12c_pluggable_database_vs_separate_database.png
250 PDBs x 200 GB = 50 TB. EMC sells 1 GB for $1,000; Dell sells 32 GB for $1,000. A terabyte of RAM on a Dell costs around $32,000; a terabyte of RAM on a VMAX 40k costs around $1,000,000.
http://www.emc.com/collateral/emcwsca/master-price-list.pdf These prices obtain on pages 897/898: a storage engine for a VMAX 40k with 256 GB RAM is around $393,000; a storage engine for a VMAX 40k with 48 GB RAM is around $200,000. So the cost of RAM here is $193,000 / 208 GB = $927 a gigabyte. That seems like a good deal for EMC, as Dell sells 32 GB RAM DIMMs for just over $1,000. So a terabyte of RAM on a Dell costs around $32,000, and a terabyte of RAM on a VMAX 40k costs around $1,000,000. 2) Most DBs have a buffer cache that is less than 0.5% (not 5%, 0.5%) of the datafile size.
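The per-gigabyte figure in the note above falls out of the difference between the two engine prices; the sketch below reproduces that arithmetic using the numbers quoted in the note.

```python
# Reproducing the RAM-cost arithmetic from the price-list note above.
vmax_engine_256gb = 393_000   # VMAX 40k engine with 256 GB RAM
vmax_engine_48gb = 200_000    # VMAX 40k engine with 48 GB RAM

# Marginal cost of RAM: price delta divided by capacity delta.
san_dollars_per_gb = (vmax_engine_256gb - vmax_engine_48gb) / (256 - 48)
print(round(san_dollars_per_gb))   # 928 (the note truncates to $927)

dell_dollars_per_gb = 1_000 / 32   # 32 GB DIMM for roughly $1,000

# Per-terabyte comparison: roughly $950k on the SAN vs ~$32k on a Dell.
print(round(san_dollars_per_gb * 1024), round(dell_dollars_per_gb * 1024))
```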