RecoverGuard™ Confidence in Business Continuity
Confidentiality - Important
Continuity Software Inc.
High Availability and DR challenges today
Building the right infrastructure…
The Problem: Configuration Drift
Building the right infrastructure … is only the first step to true HA and DR
The Solution – RecoverGuard™
Complete HA/DR analytics solution: Availability Management and Data Protection
Recoverability & Availability Dashboard
RecoverGuard™ ticket
Sample finding
Sample #1: Partial Replication. Result: data loss. Applies to EMC SRDF and TimeFinder (BCV, Clone, Snap); HDS TrueCopy, HUR, ShadowImage, TrueImage; NetApp SnapMirror, Snapshot, SnapVault; CLARiiON MirrorView, SnapView; …
Sample #2: Synchronous Replication – RDF Group Replication Inconsistency. Result: data loss, increased time to recover. Applies to EMC SRDF.
Sample #3: Inconsistent Access to Storage by Cluster. Result: downtime, increased time to recover.
Sample #4: Tampering Risk. Result: DR failure and data corruption.
Sample #5: Local Replication with BCVs – Replication Age Inconsistency. Result: data corruption.
Sample #6: Configuration drifts between Production and DR. Result: increased time to recover.
Production host – Hardware: 8 x CPU 2.2 GHz, 32 GB RAM, 2 x HBA, 2 x NIC. Software: HP-UX 11.31, WebSphere, Java 1.5, EMC PowerPath 4.4. Kernel parameters: max user processes 8192, max # of semaphores 600.
DR host – Hardware: 2 x CPU 2.2 GHz, 8 GB RAM, 1 x HBA, 1 x NIC. Software: HP-UX 11.23, no WebSphere, Java 1.4.2, EMC PowerPath 3.0.5. Kernel parameters: max user processes 1024, max # of semaphores 128.
More differences in the areas of DNS, NTP, page files, Internet services, patches, etc.
Sample #7: Configuration drifts between Production and HA. Result: downtime, manual intervention needed to recover.
Hardware: 2 x HBA. Software: Microsoft .NET 2.0 SP2, Windows x64 SP1, Oracle MTS Recovery Service. DNS configuration: 192.168.68.50, 192.168.68.51, 192.168.2.50. Page files: 1 x 1 GB (C:), 1 x 4 GB (D:). Kernel parameters: number of open files 32767.
Sample #8: SAN I/O path – single point of failure. Result: reduced MTBF, downtime, sub-optimal performance.
Sample #9: Replica create time inconsistency. Result: file system not usable at the DR site.
Sample #10: Mixed storage types. Result: data loss.
Sample #11: Mixed RAID levels. Result: if RAID1 is needed, a data protection issue, reduced MTBF and sub-optimal performance; otherwise (if RAID5 is needed), a saving opportunity.
Sample #12: Cluster Node Configured to Mount on Boot. Result: potential data corruption.
How it works
Windows 2003 Server; Oracle 10g schema; IE6+ web client, Java 1.5+.
Storage arrays: SYMCLI/NaviCLI "proxy" for EMC Symmetrix / CLARiiON; StorageScope API for EMC ECC; HiCommand API for HDS HiCommand; SSH / Telnet for NetApp filers.
Hosts: SSH / WMI using valid user credentials.
Databases (e.g. DB2): JDBC using valid user credentials.
Support matrix (manual configuration required for some items)
For more information
Thank You


Editor's notes

1. Continuity Software offers a product that can find all of these problems as they happen, instead of waiting a full year for the next DR test (which means no protection during that year).
2. The RecoverGuard dashboard provides concise, valuable information about your DR coverage and status at a glance.
The top-left pane summarizes the last scan's coverage, identifying the hosts, databases, storage arrays and business services (or processes) scanned. It also points out which areas could not be reached, so the user can decide on the appropriate action. Clicking the pane reveals a detailed scan report, including scan history and statistics.
The middle-left pane provides a snapshot of the current business-service protection state, identifying risks to data and system availability as well as optimization opportunities. Clicking any business service reveals a more detailed view and allows easy navigation to specific gap tickets.
The bottom tabbed view displays the top five currently open tickets, as well as the top five most recently detected ones. Clicking a ticket opens the corresponding ticket details view (see examples in the next slides). Note that each ticket is ranked by its threat level. The threat-level computation weighs, among other considerations: the importance of the involved business service; the role of the resource identified by the ticket (for example, is it production data, a replica used for DR, or a replica used for QA? The risk is different in each case); and the technical severity of the identified gap (for example, is it a data incompleteness or inconsistency issue, or just a minor improvement opportunity?). As a result, the user can easily focus on the most important issues in an educated fashion.
The two charts on the right provide statistics and trend information about identified risks.
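To make the ranking idea concrete, here is a minimal sketch of how such a threat-level score could combine the three factors above. The field names, scales and weights are illustrative assumptions, not RecoverGuard's actual scoring model.

```python
# Illustrative only: a simplified threat-level ranking in the spirit of the
# dashboard description above. Field names and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Ticket:
    gap_id: str
    service_importance: int   # 1 (low) .. 5 (mission critical)
    resource_role: str        # "production", "dr-replica", "qa-replica", ...
    technical_severity: int   # 1 (minor improvement) .. 5 (data loss risk)

ROLE_WEIGHT = {"production": 1.0, "dr-replica": 0.8, "qa-replica": 0.3}

def threat_level(t: Ticket) -> float:
    """Combine business importance, resource role and technical severity."""
    return t.service_importance * ROLE_WEIGHT.get(t.resource_role, 0.5) * t.technical_severity

tickets = [
    Ticket("partial-replication", 5, "dr-replica", 5),
    Ticket("mixed-raid-levels", 3, "production", 2),
]
for t in sorted(tickets, key=threat_level, reverse=True):
    print(f"{t.gap_id}: threat level {threat_level(t):.1f}")
```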
3. The signature: Replication tree structure inconsistency.
The impact: In case of disaster, data will be lost. A production database, volume group, disk drive or file system is only partially replicated, so the data is not recoverable.
Technical details: In this example, the production database spans three storage volumes. The intent is to replicate these production storage volumes to the disaster recovery site; however, one production storage volume is missing an assigned replication device.
Can it happen to me? This is a very common gap found in the environments we have scanned. There are many reasons it can happen, and it is typically revealed only during an actual disaster. The most common cause is that a production storage volume is not added to the device group being replicated.
Relevant storage vendors: All. Relevant replication methods: All. Relevant operating systems: All. Relevant DBMS vendors: All.
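A minimal sketch of this class of check, assuming the scan has already produced a map of database volumes and their assigned replicas (the data structures and names are hypothetical, not RecoverGuard's internal model):

```python
# Illustrative check: every storage volume backing a production database
# must have a replica at the DR site. Data structures are hypothetical.
db_volumes = {
    "ORCL_PROD": ["symm-0A1", "symm-0A2", "symm-0A3"],
}
# Volume -> DR replica discovered by the scan (None = not replicated).
replica_of = {"symm-0A1": "symm-1B1", "symm-0A2": "symm-1B2", "symm-0A3": None}

for db, volumes in db_volumes.items():
    missing = [v for v in volumes if replica_of.get(v) is None]
    if missing:
        print(f"GAP: {db} is only partially replicated; no replica for {missing}")
```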
4. The signature: Replication inconsistency – different RDF groups.
The impact: If one RDF group falls out of sync with the other, the database at the disaster recovery site will be corrupt and cannot be recovered from the replication technology. Data will need to be restored from a recent backup, increasing the time to recover.
Technical details: The storage volumes used by the database belong to two different RDF groups. EMC states that this is not good practice unless the RDF groups are part of a consistency group. Each RDF group is associated with different replication adapters and potentially different network infrastructure, which can fail independently of the other RDF group, resulting in corrupted replicas at the disaster recovery site.
Can it happen to me? This is a common gap in large environments where multiple RDF groups are needed, and it is usually revealed only during a RecoverGuard scan. The most common occurrence comes from the provisioning process, when storage volumes from different RDF groups are provisioned to the same host and used by the same database. Provisioning tools do not alert on, or prevent, provisioning storage from two different RDF groups to the same host.
Relevant storage vendors: EMC. Relevant replication methods: All. Relevant operating systems: All. Relevant DBMS vendors: All.
Important note: A similar gap applies to all storage vendors when a replicated database or file system spans multiple arrays; replica data consistency is not ensured between arrays. This is a common gap in multi-array environments.
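A sketch of the corresponding check, under the assumption that the scan records which RDF group each replicated volume belongs to (names and structures are illustrative):

```python
# Illustrative check: all replicated volumes of one database should belong to
# the same RDF (or consistency) group. Names are hypothetical.
volume_rdf_group = {
    "symm-0A1": "RDF-12",
    "symm-0A2": "RDF-12",
    "symm-0A3": "RDF-14",   # provisioned later from a different group
}
db_volumes = {"ORCL_PROD": ["symm-0A1", "symm-0A2", "symm-0A3"]}

for db, volumes in db_volumes.items():
    groups = {volume_rdf_group[v] for v in volumes}
    if len(groups) > 1:
        print(f"GAP: {db} spans multiple RDF groups {sorted(groups)}; "
              "replica consistency is not guaranteed across groups")
```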
5. The signature: Inconsistent access to storage volumes by cluster nodes.
The impact: In case of fail-over or switch-over to the passive node, the data will not become available and service groups will fail to come online. The result: downtime.
Technical details: In this example, a database runs on the cluster's active node and is stored on three storage volumes. Only two of these three volumes are mapped (accessible) to the cluster's passive node.
Can it happen to me? A very common gap: when a new storage volume is needed, it is typically mapped only to the currently active node.
Relevant cluster software: All. Relevant storage vendors: All. Relevant replication methods: All. Relevant operating systems: All. Relevant DBMS vendors: All.
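The check itself reduces to comparing volume visibility across nodes; the following sketch uses hypothetical scan output:

```python
# Illustrative check: every volume used by a clustered database must be
# visible (mapped) to every cluster node, not just the active one.
cluster_nodes = ["node-a", "node-b"]
db_volumes = ["lun-101", "lun-102", "lun-103"]
visible_volumes = {
    "node-a": {"lun-101", "lun-102", "lun-103"},
    "node-b": {"lun-101", "lun-102"},          # lun-103 was never mapped
}

for node in cluster_nodes:
    missing = [v for v in db_volumes if v not in visible_volumes[node]]
    if missing:
        print(f"GAP: cluster node {node} cannot access {missing}; "
              "fail-over to this node would fail")
```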
6. The signature: In this example, a copy of production data is accessed by the designated standby but also, unintentionally, by an unauthorized host.
The impact: During a disaster, a race condition will develop, from which several unpleasant outcomes might arise.
Scenario 1: the unauthorized host might gain exclusive access to the erroneously mapped disk. In that case the designated standby cannot mount and use the file system. By the time the problem is isolated and fixed (which can take a long while), there is also the risk of the unauthorized host actually using the erroneously mapped disk, rendering recovery impossible.
Scenario 2: both the standby and the unauthorized host might get concurrent access to the disk. If the unauthorized host attempts to use the erroneously mapped disk, not only will the data get corrupted instantly, the now-active standby might unexpectedly crash.
Technical details: Scenario 1 occurs if the disk is configured for mutually exclusive access: the first host to attempt access gains exclusive access, locking the other out. Scenario 2 occurs if the disk is multi-homed, or not locked. Most file systems on the market were developed under the assumption that external modification of devices is not possible; this stems from the days when only DAS was used and remains mostly unchanged. Clustered file systems are vulnerable to the same threat: although they allow multiple hosts to access the same disk, they assume that any such host is actually part of the cluster and therefore behaves predictably. Some operating systems react violently to external tampering with their intrinsic data structures, which can result in a crash.
Can it happen to me? This is a very common gap, found in around 80% of the environments we have scanned. There are dozens of reasons it can happen, and with nearly every one of them it can remain dormant, only to be revealed during an actual disaster. Some examples:
- Some arrays default to mapping all devices to all available ports when installed out of the box. It is the end user's duty to restrict access by redefining the mapping on the array and using masking on SAN ports or host HBAs (or all of the above). It is easy to miss some spots. Furthermore, even if masking is applied successfully at a certain time, any maintenance on the unauthorized host, including moving it to another SAN port or replacing a failed HBA, might reintroduce an erroneous mapping.
- The erroneously mapped disk may actually have belonged to the unauthorized host in the past and then been reclaimed, neglecting to remove the mapping definition from the storage array.
- From time to time, extra mappings may be added to increase the performance or resiliency of access to the disk. If zoning and masking are not controlled and managed from a central point, one of the paths might go astray.
- Sometimes HBAs are replaced not because they are faulty but because greater capacity is required. If soft zoning is used and not updated accordingly, once such an old HBA is reused on a different host it may grant that host the access rights to the SAN devices allowed for the original host.
- Many other possibilities exist.
Relevant storage vendors: All. Relevant replication methods: All. Relevant operating systems: All. Relevant DBMS vendors: All.
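A simplified sketch of the mapping check implied here, assuming the scan yields, for each replica LUN, the set of hosts it is mapped to (all names are hypothetical):

```python
# Illustrative check: a DR replica LUN should be mapped only to its
# designated standby host. Any additional mapping is a tampering risk.
authorized_host = {"replica-lun-201": "standby-host"}
lun_mappings = {"replica-lun-201": {"standby-host", "qa-host-17"}}  # from the SAN scan

for lun, hosts in lun_mappings.items():
    intruders = hosts - {authorized_host[lun]}
    if intruders:
        print(f"GAP: {lun} is also mapped to unauthorized host(s) {sorted(intruders)}; "
              "risk of a race condition or corruption at fail-over time")
```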
7. The signature: Replication age inconsistency.
The impact: If point-in-time copy devices such as BCVs, clones or snap volumes are inconsistent with each other, then when the data is needed for recovery the copy is corrupt because its devices are out of sync. In an SRDF replication strategy, point-in-time copies safeguard against rolling disasters (cases where data corruption is replicated to the disaster recovery replica as well) and become the disk-based recovery.
Technical details: In this example, multiple point-in-time copy groups are associated with a volume group that contains three storage volumes for a production database. One device in each point-in-time group belongs to the wrong group, so the data contained across the device group would not be usable.
Can it happen to me? Environments relying on rolling or revolving point-in-time copies often have this gap, because the copies are not mounted and regularly used by other processes. The gap is created when one or more devices are referenced in the wrong split and establish scripts.
Relevant storage vendors: All. Relevant replication methods: All (every method can be used to create point-in-time copies). Relevant operating systems: All. Relevant DBMS vendors: All.
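A sketch of one way to flag such a gap, assuming split timestamps are available for each BCV in a copy group (illustrative data, not the product's actual logic):

```python
# Illustrative check: all BCVs in one point-in-time copy group should come
# from the same split operation. Names and timestamps are hypothetical.
from datetime import datetime

bcv_split_time = {
    "bcv-301": datetime(2009, 3, 1, 2, 0),
    "bcv-302": datetime(2009, 3, 1, 2, 0),
    "bcv-303": datetime(2009, 2, 28, 2, 0),   # referenced by the wrong split script
}
copy_group = ["bcv-301", "bcv-302", "bcv-303"]

times = {bcv_split_time[b] for b in copy_group}
if len(times) > 1:
    print("GAP: point-in-time copy group mixes splits from",
          sorted(t.isoformat() for t in times), "- the copy is not usable")
```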
8. The signature: Configuration drift between production and its standby DR host.
The impact: In the event of a disaster, fail-over to the DR server will not be successful. Manual intervention will be needed to install missing hardware and software, upgrade software and configure kernel parameters correctly. This typically means extended recovery time and an RTO violation, since identifying the configuration errors commonly takes days or even weeks.
Technical details: In this example, the DR server corresponding to a production host does not have enough resources to run the application with reasonable performance. A few products are missing on the DR server, others have lower versions than what is installed in production, and kernel parameters are configured with significantly lower values. Typically, applications depend on other products installed on the server and on kernel parameter configuration; for example, it is well known that Oracle is sensitive to the configuration of semaphore-related kernel parameters.
Can it happen to me? This is a very common gap in DR environments. Host configuration involves so many details that it is very difficult to keep a DR server fully synchronized with its production host at all times. Also, DR tests typically do not load the DR server with the expected production load, so these configuration issues go undetected.
Relevant operating systems: All (Windows, Solaris, HP-UX, AIX, Linux).
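Conceptually this is a keyed diff of two configuration snapshots; the sketch below uses a hypothetical subset of collected attributes:

```python
# Illustrative drift report between a production host and its DR standby.
# The configuration dictionaries stand in for whatever the host scan collects.
prod = {"cpu_count": 8, "ram_gb": 32, "os": "HP-UX 11.31",
        "powerpath": "4.4", "java": "1.5", "max_user_processes": 8192}
dr   = {"cpu_count": 2, "ram_gb": 8, "os": "HP-UX 11.23",
        "powerpath": "3.0.5", "java": "1.4.2", "max_user_processes": 1024}

for key in sorted(prod.keys() | dr.keys()):
    p, d = prod.get(key), dr.get(key)
    if p != d:
        print(f"DRIFT: {key}: production={p!r} DR={d!r}")
```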
9. The signature: Configuration drift between HA cluster nodes.
The impact: This varies with the specific drift, but can include a failure to fail over or switch over to the other node (causing downtime), or reduced performance after fail-over or switch-over, which will at best slow operations and at worst leave the node unable to carry the load.
Technical details: In this example, the passive node has no redundancy at the HBA level or in its DNS configuration, while the currently active node is configured with redundancy for both. A single HBA or single DNS server is a single point of failure, so after a fail-over or switch-over the applications running on this cluster will suffer from reduced availability/MTBF and more downtime. In addition, the passive node is configured with a significantly lower maximum number of open files, which may lead to application failures, and it has only 1 GB of swap while the active node is configured with an additional 4 GB, so after fail-over the applications may not have sufficient memory to run properly. Lastly, differences in installed products can have various impacts, depending on the product.
Can it happen to me? This situation occurs frequently in HA environments. Host configuration involves so many details that it is very difficult to keep an HA server fully synchronized with its production peer at all times.
Relevant operating systems: All (Windows, Solaris, HP-UX, AIX, Linux).
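The same idea applied to redundancy-sensitive items between cluster nodes; the thresholds and attribute names below are illustrative assumptions:

```python
# Illustrative redundancy checks between HA cluster nodes: items such as HBA
# count, DNS servers and swap should not silently diverge.
nodes = {
    "active":  {"hba_count": 2, "dns_servers": 2, "swap_gb": 5, "max_open_files": 32767},
    "passive": {"hba_count": 1, "dns_servers": 1, "swap_gb": 1, "max_open_files": 8192},
}
minimums = {"hba_count": 2, "dns_servers": 2}   # example redundancy floor

for name, cfg in nodes.items():
    for key, floor in minimums.items():
        if cfg[key] < floor:
            print(f"GAP: {name} node has {key}={cfg[key]} (single point of failure)")
if nodes["passive"]["swap_gb"] < nodes["active"]["swap_gb"]:
    print("GAP: passive node has less swap than the active node")
if nodes["passive"]["max_open_files"] < nodes["active"]["max_open_files"]:
    print("GAP: passive node allows fewer open files than the active node")
```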
10. The signature: Production data accessed with no redundant path.
The impact: A single array port mapping and a single path increase the chances that the storage volume becomes unavailable, which may mean reduced MTBF and frequent downtime. Also, any application using this storage volume may suffer from sub-optimal performance, since I/O load balancing is unavailable (there is a single path from the host to the storage array).
Technical details: In production environments it is considered best practice to configure multiple LUN maps (array port mappings) for a storage volume and multiple paths to it. In the example above, a database is stored on three storage volumes. Two of them are configured according to these best practices; a third volume, which was added recently, has only a single array port mapping and a single I/O path.
Can it happen to me? Yes. Production environments frequently see urgent requests, such as adding storage space to a specific business service. While handling such urgent matters, details such as redundancy in array port mappings and SAN I/O paths may be forgotten. After the change everything works, so the error goes unnoticed; the gap is discovered only when recovery is required.
Relevant storage vendors: All. Relevant HBA vendors: All. Relevant operating systems: All. Relevant DBMS vendors: All.
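A sketch of the redundancy check, assuming the scan counts I/O paths and array port mappings per volume (hypothetical data):

```python
# Illustrative check: each production volume should have at least two array
# port mappings and two host I/O paths.
volume_paths = {"lun-101": 2, "lun-102": 2, "lun-103": 1}   # host I/O paths
volume_ports = {"lun-101": 2, "lun-102": 2, "lun-103": 1}   # array port mappings

for lun in volume_paths:
    if volume_paths[lun] < 2 or volume_ports[lun] < 2:
        print(f"GAP: {lun} has a single I/O path or array port mapping "
              "(single point of failure, no load balancing)")
```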
11. The signature: In this example, a critical file system is stored on three SAN volumes. The data is periodically synchronized, but the copies are not all of exactly the same age.
The impact: In such a scenario the copy is likely to be corrupt and unusable. If the file system is busy or serves access to large files (database files usually meet both criteria), it is extremely likely to be corrupt.
Technical details: File systems have certain built-in self-correction mechanisms, targeted at overcoming slight differences resulting from pending writes that were not successfully flushed from memory to disk because of an abrupt shutdown (such as a power failure or a "blue screen"). These mechanisms are not designed to handle disks that appear to go back in time by minutes or hours, yet replicating disks at different points in time can easily produce such scenarios, which look completely "unnatural" to the operating system at the DR site. Journaled file systems do not help, because they either (a) journal only file system metadata, not the data itself, or (b) keep journal data spread on the disks themselves, which is prone to the same time-difference corruption.
Can it happen to me? This is one of the top five gaps found in even the most well-kept environments. There are dozens of reasons it can happen, and with nearly every one of them it is nearly impossible to tell that the problem has occurred: because the replication itself succeeds, there is no indication to the user that something is wrong. Some examples:
- All the disk syncs are correctly managed by one script, but another script runs afterwards, perhaps on a different host, with a stray mapping to one of the source disks.
- All the disks are added to one array consistency group (or device group) used to sync them simultaneously. The definition of the array consistency group is completely separate from the definition of the file system and the underlying logical volume and volume group, so it is easy to associate a disk newly added to the volume group on the host side with the wrong array consistency group. There are dozens of permutations and variations on the same theme.
- One of the disks is copied over a separate cross-array link. That link might be much busier and cause the sync (or mirror, or split, depending on the vendor terminology) to take more time.
Relevant storage vendors: All. Relevant operating systems: All. Relevant DBMS vendors: All.
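In sketch form, the check compares replica creation times across the volumes of one file system against a tolerance (the tolerance value and data structures are illustrative):

```python
# Illustrative check: replicas of the SAN volumes underlying one file system
# should all have been created at (nearly) the same point in time.
from datetime import datetime, timedelta

replica_created = {
    "replica-401": datetime(2009, 3, 1, 2, 0),
    "replica-402": datetime(2009, 3, 1, 2, 0),
    "replica-403": datetime(2009, 3, 1, 0, 15),  # synced over a busier link
}
tolerance = timedelta(minutes=1)

oldest, newest = min(replica_created.values()), max(replica_created.values())
if newest - oldest > tolerance:
    print(f"GAP: replica ages differ by {newest - oldest}; "
          "the copied file system is likely corrupt at the DR site")
```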
12. The signature: Mixed storage types.
The impact: If the disaster recovery replica is needed, it will be unusable, resulting in data loss. The production database or file system replication is incomplete or inconsistent and cannot be recovered from the replication technology; data will need to be restored at the disaster recovery site from a recent backup, increasing the time to recover.
Technical details: In this example, the production database spans three storage volumes. The intent is to replicate these production storage volumes to the disaster recovery site; however, one production storage volume is not of the same storage type: it is actually a local disk and is therefore not replicated. The result is an incomplete replica at the disaster recovery site.
Can it happen to me? This is a common gap in rapidly evolving environments with many teams involved in the provisioning process. The hand-offs between the storage, platform and database teams are complex, and mixed storage devices (local, EMC, NetApp, etc.) are often used to create the volume groups (Veritas or other LVM software) in which databases are created or extended.
Relevant storage vendors: All. Relevant operating systems: All. Relevant DBMS vendors: All.
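A sketch of the volume-group homogeneity check, with hypothetical device names and storage types:

```python
# Illustrative check: every device in a replicated volume group should be a
# SAN device of the same (replicable) storage type; a local disk that slipped
# into the group cannot be replicated.
volume_group = {
    "vg_oracle": [("symm-0A1", "EMC Symmetrix"),
                  ("symm-0A2", "EMC Symmetrix"),
                  ("c0t0d1",   "local disk")],   # added during an urgent extension
}

for vg, devices in volume_group.items():
    types = {storage_type for _, storage_type in devices}
    if len(types) > 1:
        print(f"GAP: {vg} mixes storage types {sorted(types)}; "
              "the DR replica of this volume group is incomplete")
```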
13. The signature: Mixed RAID types.
The impact: Mixing RAID types is far less critical than mixing storage types that require replication; the impact is potential performance issues and less-than-optimal storage utilization.
Technical details: In this example, the production file system contains three storage volumes. Two are on RAID1-protected storage and one is on RAID5-protected storage, and all are replicated to the disaster recovery site. In some cases the production volumes are of the same RAID type but the disaster recovery replica uses different RAID types, which would potentially perform quite differently from production.
Can it happen to me? This is a common gap when multiple RAID types are provisioned to the same host for databases, where RAID1 is used for logs and indexes and RAID5 for tablespaces, or when different storage tiers defined by RAID type are offered to the business.
Relevant storage vendors: All. Relevant operating systems: All. Relevant DBMS vendors: All.
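And the analogous RAID-level check, again with illustrative data:

```python
# Illustrative check: devices backing one file system or volume group should
# normally share the same RAID level (or match an intended tiering policy).
device_raid = {"lun-501": "RAID1", "lun-502": "RAID1", "lun-503": "RAID5"}

levels = set(device_raid.values())
if len(levels) > 1:
    print(f"FINDING: mixed RAID levels {sorted(levels)}: either a protection/"
          "performance issue (if RAID1 is required) or a saving opportunity "
          "(if RAID5 is sufficient)")
```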
14. The signature: A file system defined as a cluster mount resource is mounted automatically at boot.
The impact: Potential data corruption after fail-over, switch-over or node restart.
Technical details: In this example, the passive node is configured to automatically mount /d01 at boot. If the passive node is restarted, it will attempt to mount a file system that is already mounted on the currently active node. Data might then become corrupted, since a SAN LUN should typically be accessed by only one server at a time. Note that the opposite scenario is problematic as well: if the file system is configured to be mounted automatically at boot on the active node, the same risk exists after a fail-over or switch-over.
Can it happen to me? This is a very common gap in HA environments, because it is difficult to keep the server's configuration constantly in sync with the cluster configuration. The resulting configuration mismatches, such as the one described above, lead to data protection and availability vulnerabilities.
Relevant storage vendors: All. Relevant operating systems: All. Relevant DBMS vendors: All.
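A sketch of the cross-check between boot-time mounts and cluster-managed mount resources (mount points and structures are hypothetical):

```python
# Illustrative check: a file system managed as a cluster mount resource must
# not also be mounted automatically at boot on any node (e.g. via /etc/fstab).
cluster_mount_resources = {"/d01"}
boot_mounts = {
    "node-a": set(),
    "node-b": {"/d01"},     # added to fstab by mistake
}

for node, mounts in boot_mounts.items():
    conflicts = mounts & cluster_mount_resources
    if conflicts:
        print(f"GAP: {node} mounts {sorted(conflicts)} on boot although the "
              "cluster controls these mounts: risk of double mount and corruption")
```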
15. Notes: Basic support and gap detection is also provided for other clusters (HP ServiceGuard, Sun Cluster, Linux Cluster, Microsoft Cluster, Oracle RAC). Limited support for VMware FC; full support is planned for 2009. Support for IBM DS is planned for 2009. Support for EMC SAN Copy replication is planned for 2009. EMC Celerra is not supported.