2. Database Cloning Challenge
Business want data now.
Business don’t understand DBAs.
Databases getting bigger & harder to copy.
Developers want more copies.
Reporting wants more copies.
Everyone has storage constraints.
If you can’t satisfy the business
demands your process is broken.
10. The Production
‘Wall’
Classic problem is that queries that
run fast on subsets hit the wall in
production.
Developers are unable to test against
all data
12. Shared access = Poor Productivity
Developers and
tester get frustrated
Databases become old
and unrepresentative
of production.
Requires complex
scheduling and
management
14. Physical Copies – Time & Space
consuming
Time consuming
Time to make copies, days to weeks
RMAN backup, archive logs, copy data over, recover
Meetings , days to weeks
System, Storage ,Database ,Network Admins
, manager coordination
Space consuming
12 devs x 10TB production = 120TB
20 report DBs x 40 TB = 800TB
Slow Clones mean bottlenecks
28. I. clonedb
II. Copy on Write Snapshots
a) EMC BCV
b) EMC SRDF or Recover Point
c) Vmware
III. Allocate on Write
a) Netapp (EMC VNX)
b) ZFS
c) DxFS
2. Thin Provision Cloning
33. I. clonedb
II. Copy on Write + snapshots
a) EMC BCV
b) EMC SRDF or Recover Point
c) Vmware
III. Allocate on Write
a) Netapp (EMC VNX)
b) ZFS
c) DxFS
2. Thin Provision Cloning
36. • Goal: backup
– Create BCV , take snaps of BCV
– Zone and mask LUN to target host
– Full copy of disk, snapshot, recover
• Snapshots
– 16 snapshots
– Zero or one snapshots of snapshots
• “Golden Copy”
– EMC uses a save area, area for changes to the snapshot
– Initial snapshot has to stay
II. Copy on Write a) EMC BCV
37. Non-Prod FilerEMC Filer
Production
Database
Database
Luns
Target A
Target B
Target C
Clone 1
Clone 2
Clone 3
Clone 4
Snapshots
Max 16
a) EMC
BCVIII. Allocate on Write
BCV
Day 1
snapshot
Snapshot
or break
Day 2
Timefinder needed for snapshots
on across multiple LUNs
38. • SRDF – Symetrix Remote Data Facility
– Stream Changes to a remote machine
• Recover Point
– Capture changes on wire
– Split changes send to remote site
II. Copy on Write b) EMC SRDF
41. II Copy on Write c) VMware
Data Director : Linked Clones
• Not support for Oracle databases
• Golden Copy : rebuild after 32 snapshots
• x86 host databases only
• Performance issues
– “Having several linked clones can affect the performance of the source database and the
performance of the linked clones.”
http://bit.ly/QOXbyE (on http://pubs.vmware.com )
– “If you are focused on performance, you should prefer a full clone over a linked clone.”
http://www.vmware.com/support/ws5/doc/ws_clone_typeofclone.html
– Performance worse with more snapshots
– Performance worse with more concurrent users
42. I. clonedb
II. Copy on Write
a) EMC BCV
b) EMC SRDF or Recover Point
c) Vmware
III. Allocate on Write
a) ZFS
b) Netapp (EMC VNX)
c) DxFS
2. Thin Provision Cloning
53. • Compression
• typically 3x
• Block sharing
• DxFS optimized for databases
• Write optimizations
• Space allocation and destroy
• Shared blocks in memory
c) DxFSIII. Allocate on Write
54. I. clonedb
II. Copy on Write Snapshots
a) EMC BCV
b) EMC SRDF or Recover Point
c) Vmware
III. Allocate on Write
a) Netapp (EMC VNX)
b) ZFS
c) DxFS
2. Thin Provision Cloning
62. a) Oracle 12c SMU
Oracle Snap Management Utility for ZFS Appliance
• Requires ZFS Appliance
• Supports Linux , Solaris 10+, Windows 2008+
• GUI
– snapshot source databases
– provision virtual databases
3. Database Virtualization
63.
64. Virtualization Layer
x86 hardware
Allocate
Storage
Any type
SMU
Choose your virtualization Layer:
• Delphix and Oracle SMU automated out of box
• Netapp and EMC SRDF/Recover Point require massive scripting
• Delphix and possibly SMU share blocks in memory
ZFS Storage Appliance
65. One time backup of source database
Database
Production
Instance
File system
RMAN APIs
67. Incremental forever change collection
Database
Production
Instance
File system
Changes are collected
automatically forever
Data older than retention
widow freed
72. Database
Virtualization
layer
3 clones of same source database 3 virtual clones of same source database
Virtualization layer orchestrates I/O
access between database and storage
96. A/B Testing
• Production vs Virtual
– invisible index on Prod
– Creating index on virtual
• Flashback vs Virtual
• Confidence testing
– change reporting SQL
– compare new code over last 30 days
97. 2: review QA
• Fork copies off of Dev to QA
• QA fork copies back to Dev
• Instant replay
– performance A/B
– Upgrade, patching
– Forensic analysis of Prod
118. Matrix of features
CloneDB ZFS
Appliance
Delphix Data
Director
NetApp EMC
Time Flow No Yes Yes No Yes No
Hardware
Agnostic
Yes No Yes Yes No No
Snapshots No Unlimited Unlimited 31 255 16
(96 read only)
Snapshots of
snapshots
No Unlimited Unlimited 30 255 1
Automated
Snapshots
No No Yes No Yes No
Automated
Provisioning
No No Yes No No No
Any O/S Yes Yes Yes No
x86 only
Yes Yes
Max size None None None ? 16-
100TB
?
119. Conclusion : Enterprise Solutions
• VMware Data Director
• EMC Timefinder
– offer limited ability to benefit from cloning
• Clonedb ***
– fast easy way to create many clones of the same copy
– limited to 11.2.0.2 and systems with sparse file system capability
– suffers the golden image problem and performance
• NetApp Flexclone, Snap Manager for Oracle
– offers a rolling solution
– limited database awareness
– file system clones
– limited snapshots
– Vendor lock-in
• Oracle ZFS Appliance
– Vendor Lock-in
• Delphix
– Agility : Automation, unlimited snapshots, clones, multi-database
120. Appendix
• CloneDB
– http://www.oracle-base.com/articles/11g/clonedb-11gr2.php
• ZFS
– http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf
• ZFS Appliance
– ZFS setup for Oracle http://www.oracle.com/technetwork/server-storage/solaris/config-solaris-zfs-wp-167894.pdf
– using RMAN http://www.oracle.com/technetwork/articles/systems-hardware-architecture/cloning-solution-353626.pdf
– using dataguard http://www.oracle.com/technetwork/database/features/availability/maa-db-clone-szfssa-172997.pdf
• Data Director
– http://pubs.vmware.com/datadirector-25/topic/com.vmware.ICbase/PDF/vfabric-data-director-25-administration-guide.pdf
p107 linked clone not support for Oracle
• EMC
– https://community.emc.com/servlet/JiveServlet/previewBody/11789-102-1-45992/h8728-snapsure-oracle-dnfs-wp.pdf
– http://www.centroid.com/knowledgebase/blog/cloning-oracle-databases-with-emc-snapview-and-recoverpoint
– http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:39960485246162 BCV and SRDF
• NetApp
– http://www.redbooks.ibm.com/redpapers/pdfs/redp4133.pdf
– OEM for Netapp http://www.youtube.com/watch?v=J7fnfLS5Dxg&feature=youtu.be
• Oracle SMU
– http://www.oracle.com/us/products/servers-storage/storage/nas/smu-bus-wp-final-1857065.pdf
– http://www.oracle.com/us/products/servers-storage/storage/nas/snap/smu-datasheet-1903178.pdf
– http://www.oracle.com/us/products/servers-storage/storage/nas/snap/overview/index.html
• Oracle OEM Snapshot Manager for Netapp
– http://www.youtube.com/watch?v=J7fnfLS5Dxg&feature=youtu.be
– Demo starts at 3:10 goes to 19:43
121. Other
• Can’t run multiple clones on same host with
ASM
https://forums.oracle.com/forums/thread.jsp
a?messageID=1810251
• EMC VNX
http://virtualgeek.typepad.com/virtual_geek/
2012/05/vnx-inyo-is-going-to-blow-some-
minds.html
122. NetApp Limits
Controller Size Limit
32 bit controllers 16TB
FAS3140/FAS3040/FAS3050 40TB
FAS3160/FAS3070 50TB
FAS6040/FAS3170 70TB
FAS6080 100TB
All sources have to be in the same aggregate to be
snapshot together.
Limit of 255 snapshots
snaps are limited to the same aggregate (storage pool)
Aggregates have size limits depending on controller
129. Application Life Cycle Management
Number of required clones change during Project and Extended Project Life Cycle
Projects with phased Go-Lives can have Implementation, Handover ‘Cut-Over’ and
Operations ‘RUN’ all running concurrently.
130. Development & maintenance is hard
User
Acceptance
Test
TestDevelopment
Production
Training &
Education
Production
Support
New release / upgrades lifecycle…
Bespoke, enhancements, configuration changes etc…
Production Support lifecycle…
Internal bug fixes, 3rd Party Patches, Data fixes etc..
User
Acceptance
Test
TestDevelopment
Training &
Education
Production
Support
Multiple Clones used by Production Support
Analysts, DBA team and for internal training
DBA/
Note: All environments periodically refreshed from production
131. Database’s not just used by developers
Data Migration Team
Data Mapping & Cleansing
Development of ETL
(Extraction, Transformation
& Load) Routines
…
User
Acceptance
Test
User
Acceptance
Test
System
Validation
Testing
Data
MigrationData
MigrationData
Migration
Technical Teams
System Integration
Enterprise Scheduling
Automated Testing
Performance Testing
…
Application Functional
Specialists
Development of System
Set-up
Business Process
Configuration
Sandbox
…
Business Process Owners
Application Functional
Process validation
CRP (Conference Room
Pilot)
…
User
Acceptance
Test
User
Acceptance
Test
User
Acceptance
Test
Automated
TestingAutomated
TestingAutomated
Testing
Kyle HaileyWork for a company called DelphixWe write software that enables companies toCopy their databases in 2 minutes with almost no storage overheadWe accomplish that by taking one initial copy and sharing the duplicate blocks Across all the clones
want data now.don’t understand DBAs.Db bigger and harder to copy.Devswant more copies.Reporting wants more copies.Everyone has storage constraints.If you can’t satisfy the business demands your process is broken
What are these technologiesWhat benefits and drawbacksTechnology is awesomeTechnology coming of age Coming of ageClonedb 3 pres @ OOWSMU just release month ago12c “clone” pluggable databases
Databases are largeMoving data around is hard work.Moving them takestime, resources, equipment and experience1) 10 years in support - No backups, hospital companies still don’t back up dev often dev is the new prod2) Full time DBA, half time copying3) OEM wanted me to do full phys cloning4) Was a consultant always wanted database to play on How many of you copy databases ? How much time does you spend on it?
Prod critical for businessPerformance of prod is top priorityProtect prod from load
2 options to create enough copies
Xxx spends 50% of time copying databases have to subset because not enough storagesubseting process constantly needs fixing modificationWhat happens when developers use subsets -- ****** -----
Stubhub (ebay) estimates that 20% of there production bugs arise from testing onSubsets instead of full database copies.
Wait orebay till next slide
Example at Ebay2 dozen developers have a massive shared copy of productionExample at DB3 development teams agree between themselves who is doing what testing this week as some runs destroy other teams data,If the database is shared it’s hard to get opportunity to refresh and a data get’s old
Only having enough equipment to support 2 or 3 environments causes massive delaysState of Colorado has a 100 projectsAnd they can only support 3 at a timeKLA tencor can only support 2 projects of a dozen
DB had databases which were not refreshed for 6 months+ due to refresh time and size
Slow downs mean bottlenecksThese bottlenecks cause failures in IT projectsI’m into eliminating bottlenecks (whether it is wait events, tuning sql or provisioning copies of dbs)
Development asks for a database it takes days or weeks.
90% of lost developer days at customer was due to waiting for environment builds
Happens both for dev and QA
Tightening constraining resourcesCascading affect on companies.The business doesn’t know or understand this DBA workDBAs are often the hardest resource for IT to justify because they are invisibleDBAs are already being asked to do a tremendous amountDBAs are often on call 24x7DBAs are foundational.
Delays cause failures*http://www.pcworld.com/article/246647/10_biggest_erp_software_failures_of_2011.html
MisguidedattributingRelax the constraints http://martinfowler.com/bliki/NoDBA.html
Fastest query is the query not run
Performance issuesSingle point in time
Performance issuesSingle point in time
Physical File System:Performance issues Multiple points in timeOccasional rebuild
Physical File System:Performance issues Multiple points in timeOccasional rebuild
Buisness Continuance Volume
Business Continuance Volume
Symetric Remote Data Facility
Physical File System:Performance issues Multiple points in timeOccasional rebuild
Oracle Database Cloning Solution Using Oracle Recovery Manager and Sun ZFS Storage Appliancehttp://www.oracle.com/technetwork/articles/systems-hardware-architecture/cloning-solution-353626.pdf
Oracle Database Cloning Solution Using Oracle Recovery Manager and Sun ZFS Storage Appliancehttp://www.oracle.com/technetwork/articles/systems-hardware-architecture/cloning-solution-353626.pdf
OverlayClonedb - 1 copy, performance issuesCopy on writeEMC - 16 points in time performance issuesVMware – not support with OracleAllocate on WriteZFS – manually config, performance issuesNetApp – 255 points in time , rolling window
Technology has existed 15+ years Why hasn’t there been more adoption ??
http://partners.netapp.com/go/techontap/empower-dba.html?fmt=printCreate Luns, aggr, snapshots, clonesMirroring filesystemsExporting file systemsMounting file systems
Requires expert storage admins specialized equipment scripting2-7 Days or 2-8 hours if everyone togetherCERN recently gave a presentation where they wrote almost 30,000 lines of code13k lines & 15k lines of PHP
Like the internetThin provisioning ultimately led to db virtualization, goes beyond thin provisioning eliminate the overhead of managingproviding significant agility gains
Like the internet
Automate and simplify
Saw it at OOW nothing till couple weeks agoCustomer had beta but wouldn’t allow us to take screen shots
I like at DelphixFrustratedSteve and Larry gave aweseom presentationsSteve jobs and Ellison ui combined forces now I have itDelphix GUI is what Oracle Enterprise Manager would look like if Apple had designed it
Development Provision and RefreshFullFreshFrequent (Many) Source control for code, data control for the database Data version per release version Federated cloning QA fork copies off to QA QA fork copies back to Dev Instant replay – set up and run destructive tests performance A/B Upgrade patching Recovery Backup 50 days in size of 1 copy, continuous data protection (use recent slide ob backup schedules full, incr,inrc,inrc, full) Restore logical recovery on prod logical recovery on Dev Debugging debug on clone instead of prod debug on data at the time of a problem Validate physical integrity (test for physical corruption)
In physical worldIf 3 Copies of a database
Fast = Fresh Full = Quality Many = jet pack on development-
Change mentality from few as possible to as many as accelerates the businessRemember Jinga ?
Self Service
Source Control for the database data
Physically independent but logically correlatedCloning multiple source databases at the same time can be a daunting task
One example with our customers is InformaticaWho had a project to integrate 6 databases into one central databaseThe time of the project was estimated at 12 monthsWith much of that coming from trying to orchestratingGetting copies of the 6 databases at the same point in timeLike herding cats
Informatical had a 12 month project to integrate 6 databases.After installing Delphix they did it in 6 months.I delivered this earlyI generated more revenueI freed up money and put it into innovationwon an award with Ventana Research for this project
Multiple scripted dumps or RMAN backups are used to move data today. With application awareness, we only request change blocks—dramatically reducing production loads by as much as 80%. We also eliminate the need for DBAs to manage custom scripts, which are expensive to maintain and support over time.
50 days of backups in the size of one backup
Development Provision and RefreshFullFreshFrequent (Many) Source control for code, data control for the database Data version per release version Federated cloning QA fork copies off to QA QA fork copies back to Dev Instant replay – set up and run destructive tests performance A/B Upgrade patching Recovery Backup 50 days in size of 1 copy, continuous data protection (use recent slide ob backup schedules full, incr,inrc,inrc, full) Restore logical recovery on prod logical recovery on Dev Debugging debug on clone instead of prod debug on data at the time of a problem Validate physical integrity (test for physical corruption)
If every MB was an Inch 300,000 customers 12 copies on average 100 GB avg size PB TB GB 300000*12*100 = 360,000,000 300000*1*.3*100 = 9,000,000 351 PB e p t g 1,191,290,000 feet to moon, 132,000,000 feet around the earthe p t g m k b 15,133,979,520 inches to the moone p t g m k b 351,000,000,00015,133,979,520 inches to the moone p t g m k b 35100000000015133979520 inches to the moon
HD 720TB down to 8TB ( create 19 x 36TB VDBs )
Informatica – finished 2x fasterStubhub - 2 x as many releases a yearKLA-Tencore- 5 x as many projectsQA/QualityStubhub - 20% less bugs in production, found full table scan that would have been missed on subsets
Moral of this storyInstead of dragging behind enormous amounts of infrastructureand bureaucracy required to provide database copiesUses db virteliminates the drag and provides power and acceleration To your companyDefining moment CompetitorsServices
Once Last Thinghttp://www.dadbm.com/wp-content/uploads/2013/01/12c_pluggable_database_vs_separate_database.png
250 pdb x 200 GB = 50 TBEMC sells 1GB$1000Dell sells 32GB $1,000.terabyte of RAM on a Dell costs around $32,000terabyte of RAM on a VMAX 40k costs around $1,000,000.
http://www.emc.com/collateral/emcwsca/master-price-list.pdf These prices obtain on pages 897/898:Storage engine for VMAX 40k with 256 GB RAM is around $393,000Storage engine for VMAX 40k with 48 GB RAM is around $200,000So, the cost of RAM here is 193,000 / 208 = $927 a gigabyte. That seems like a good deal for EMC, as Dell sells 32 GB RAM DIMMs for just over $1,000. So, a terabyte of RAM on a Dell costs around $32,000, and a terabyte of RAM on a VMAX 40k costs around $1,000,000.2) Most DBs have a buffer cache that is less than 0.5% (not 5%, 0.5%) of the datafile size.
reduces storagealleviates DBA of repetitive focus on innovationAccelerates DevelopmentEliminate bottleneck more code faster and of better quality
incremental forever collectionClones: fast to create, small foot print, can create from any point in time
Batch window under pressureProduction Support environments need to be refreshed daily.Support environments needs to be available from start of business day Refresh overruns impact Production support analystsTraining & Education environments need to be ‘rolled back’ after training course.Storage constraintsNot enough storage space to hold multiple copies on-line. Tape restores required for older clones.
Storage infrastructure and technical resources constraints directly impact project delivery. Storage availability, procurement and operational costs traditionally limits the number of OEBS clones possible. This impacts the number of parallel activities possible, resulting in extended project timescales.Availability of data center technical & DBA resources to provide ‘On-Demand’ clones Resulting in project delays whilst environments are being sourced.Technical teams need representative data volumes & Application set-up to undertake performance testing e.g.ASCP (Advance Supply Chain Planning), Period End / Year End processing etc..