# 
Multi-Site Perforce at NetApp 
Scott Stanford
# 
• Topology 
• Infrastructure 
• Backups & Disaster Recovery 
• Monitoring 
• Lessons Learned 
• Q&A
#
# 
[Diagram: traditional topology — central P4D in Sunnyvale, with Traditional Proxies in Boston, Pittsburg, RTP, and Bangalore]
• 1.2 TB database, mostly db.have 
• Average daily journal size 70 GB 
• Average of 4.1 million daily commands 
• 3722 users globally 
• 655 GB of depots 
• 254,000 clients, most with ~200,000 files 
• One Git-Fusion instance 
• 2014.1 version of Perforce 
• Environment has to be up 24x7x365
# 
[Diagram: Commit/Edge topology — Commit server in Sunnyvale; Edge servers in Sunnyvale, RTP, and Bangalore; Boston and Pittsburg Proxies off the RTP Edge; Traditional Proxies remain in Boston, Pittsburg, RTP, and Bangalore during the migration]
• Currently migrating from a traditional model to Commit/Edge servers 
• Traditional proxies will remain until the migration completes later this year 
• Initial Edge database is 85 GB 
• Major sites have an Edge server; others run a proxy off of the closest Edge (50 ms improvement)
#
# 
• All large sites have an Edge server; these were formerly proxies 
• High-performance SAN storage used for the database, journal, and log storage 
• Proxies have a P4TARGET of the closest Edge server (RTP) 
• All hosts deployed with an active/standby host pairing
# 
• Redundant connectivity to storage 
  – FC: redundant fabric to each controller and HBA 
  – SAS: each dual HBA connected to each controller 
• Filers have multiple redundant data LIFs 
• 2 x 10 Gig NICs, HA bond, for the network (NFS and p4d) 
• VIF for hosting the public IP / hostname 
  – Perforce licenses are tied to this IP
# 
Each Commit/Edge server is configured in a pair consisting of 
• A production host, controlled through a virtual NIC 
  – Allows a quick failover of the p4d without any DNS changes or changes to the users' environment 
• A standby host with a warm database or read-only replica 
• A dedicated SAN volume for low-latency database storage 
• Multiple levels of redundancy (network, storage, power, HBA) 
• A common init framework for all Perforce daemon binaries 
• A SnapMirrored volume hosting the infrastructure binaries & tools (Perl, Ruby, Python, P4, Git-Fusion, common scripts)
# 
• Storage devices used 
  – NetApp EF540 w/ FC for the Commit server 
    • 24 x 800 Gig SSD 
  – NetApp E5512 w/ FC or SAS for each Edge server 
    • 24 x 600 Gig 15k SAS 
  – All RAID 10 with multiple spare disks, XFS, dual controllers, and dual power supplies 
• Used for: 
  – Warm database or read-only replica on the standby host 
  – Production journal 
    • Hourly journal truncations, then copied to the filer 
  – Production p4d log 
    • Nightly log rotations, compressed and copied to the filer
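The nightly log rotation step can be sketched roughly as follows. The directory layout and file names are hypothetical stand-ins (a temp directory plays the part of the SAN log volume and the filer mount), not the actual NetApp paths:

```shell
#!/bin/sh
# Sketch of a nightly p4d log rotation: rotate the log, compress it,
# copy it to filer (NFS) storage, and verify the copy is intact.
# All paths are illustrative only.
set -e

LOGDIR=$(mktemp -d)            # stand-in for the local SAN log volume
FILER=$(mktemp -d)             # stand-in for the NFS filer mount
printf 'sample p4d log line\n' > "$LOGDIR/log"

STAMP=$(date +%Y%m%d)
mv "$LOGDIR/log" "$LOGDIR/log.$STAMP"      # rotate
gzip "$LOGDIR/log.$STAMP"                  # compress
cp "$LOGDIR/log.$STAMP.gz" "$FILER/"       # copy to the filer
gunzip -t "$FILER/log.$STAMP.gz"           # verify the compressed copy

echo "rotated: $FILER/log.$STAMP.gz"
```

On the real hosts the rotation would be driven from cron on the production host, with p4d told to reopen its log after the rename.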
# 
• NetApp cDOT clusters used at each site, FAS6290 or better 
• 10 Gig data LIF 
• Dedicated vserver for Perforce 
• Shared NFS volumes between production/standby pairs for longer-term storage, snapshots, and offsite copies 
• Used for: 
  – Depot storage 
  – Rotated journals & p4d logs 
  – Checkpoints 
  – Warm database 
    • Used for creating checkpoints, and for running the daemon if both hosts are down 
  – Git-Fusion homedir & cache, dedicated volume per instance
#
# 
Every hour: 
• Truncate the journal (p4d -jj) 
• Checksum the journal, copy it to NFS, and verify the checksums match 
• Create a SnapShot of the NFS volumes 
• Remove any old snapshots 
• Replay the journal on the warm SAN database 
• Replay the journal on the warm NFS database 
• Once a week, create a temporary snapshot of the NFS database and create a checkpoint (p4d -jd)
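The checksum-copy-verify portion of the hourly cycle can be sketched like this. `p4d -jj` itself needs a live server, so a stand-in journal file is used, and all paths are hypothetical:

```shell
#!/bin/sh
# Sketch of the hourly journal handling: after `p4d -jj` truncates the
# journal, checksum the rotated journal on the SAN, copy it to NFS,
# and compare checksums before trusting the copy. Paths are stand-ins.
set -e

SAN=$(mktemp -d)       # stand-in for the SAN journal volume
NFS=$(mktemp -d)       # stand-in for the NFS filer volume
printf '@pv@ ...journal records...\n' > "$SAN/journal.42"

sum_local=$(md5sum "$SAN/journal.42" | awk '{print $1}')
cp "$SAN/journal.42" "$NFS/"
sum_nfs=$(md5sum "$NFS/journal.42" | awk '{print $1}')

if [ "$sum_local" = "$sum_nfs" ]; then
    echo "journal.42 copy verified"
else
    echo "checksum mismatch for journal.42" >&2
    exit 1
fi
```

Only after the copy verifies would the snapshot/replay steps proceed, so a corrupt WAN or NFS copy never reaches the warm databases.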
# 
Warm database 
• Trigger on the Edge server events.csv changing 
• If it is a jj event, get the journals that may need to be applied: 
  – p4 journals -F "jdate>=(event epoch - 1)" -T jfile,jnum 
• For each journal, run a p4d -jr 
• Weekly checkpoint from a snapshot 

Read-only replica from Edge 
• Weekly checkpoint 
• Created with: 
  – p4 -p localhost:<port> admin checkpoint -Z 

[Flow: Commit server truncates → Edge server captures event in events.csv → Monit triggers backups on events.csv → determine which journals to apply → apply journals]
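A rough sketch of how the journal-selection step might derive the filter from an events.csv entry. The CSV layout shown (epoch,event-type) is a simplified assumption for illustration, not the exact server log format:

```shell
#!/bin/sh
# Sketch: read the newest event line, pull out its epoch timestamp,
# and build the `p4 journals` filter described above
# (jdate >= event epoch - 1). The CSV layout here is assumed.
set -e

EVENTS=$(mktemp)
# assumed format: epoch,event-type
printf '1400000000,jj\n' > "$EVENTS"

line=$(tail -n 1 "$EVENTS")
epoch=${line%%,*}
type=${line##*,}

if [ "$type" = "jj" ]; then
    since=$((epoch - 1))
    filter="jdate>=$since"
    # each journal returned would then be replayed with: p4d -jr <jfile>
    echo "p4 journals -F \"$filter\" -T jfile,jnum"
fi
```

The "epoch - 1" slack covers a journal whose timestamp lands just before the recorded event, at the cost of occasionally re-selecting an already-applied journal.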
# 
• New process for Edge servers to avoid WAN NFS mounts 
• For all the clients on an Edge server, at each site: 
  – Save the change output for any open changes 
  – Generate the journal data for the client 
  – Create a tarball of the open files 
  – Retain for 14 days 
• A similar process will be used by users to clone clients across Edge servers
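The tarball-and-retention part of that per-client process, sketched with stand-in data. On a real Edge server the file list would come from the Perforce server (the open changes and client journal data are omitted here), and the client name, paths, and retention script are hypothetical:

```shell
#!/bin/sh
# Sketch of archiving a client's open files for a 14-day retention
# window. The workspace contents are simulated; on a real system the
# open-file list would come from the Perforce server.
set -e

WS=$(mktemp -d)                     # stand-in client workspace
BACKUPS=$(mktemp -d)                # stand-in backup area
mkdir -p "$WS/src"
echo 'int main(void){return 0;}' > "$WS/src/main.c"

CLIENT=build-client                 # hypothetical client name
STAMP=$(date +%Y%m%d)
tar -C "$WS" -czf "$BACKUPS/$CLIENT.$STAMP.tar.gz" .

# expire archives older than the 14-day retention window
find "$BACKUPS" -name '*.tar.gz' -mtime +14 -delete

tar -tzf "$BACKUPS/$CLIENT.$STAMP.tar.gz" | grep -q 'src/main.c'
echo "archived $CLIENT"
```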
# 
• Snapshots: 
  – Main backup method 
  – Created and kept for: 
    • 4 hours, every 20 minutes (20 & 40 minutes past the hour) 
    • 8 hours, every hour (top of the hour) 
    • 3 weeks of nightly snapshots taken during backups (@ midnight PT) 
• SnapVault 
  – Used for online backups 
  – Created every 4 weeks, kept for 12 months 
• SnapMirrors 
  – Contain all of the data needed to recreate the instance 
  – Sunnyvale 
    • DataProtection (DP) mirror for data recovery 
    • Stored in the cluster 
    • Allows fast test instances to be created from production snapshots with FlexClone 
  – DR 
    • RTP is the Disaster Recovery site for the Commit server 
    • Sunnyvale is the Disaster Recovery site for the RTP and Bangalore Edge servers
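The snapshot cadence above could be expressed as a cron-style schedule; the script name and `--keep` flag are hypothetical stand-ins (on cDOT the retention would normally live in the SnapShot policy itself rather than a driver script):

```
# 20 & 40 past the hour — short-lived snapshots, kept 4 hours
20,40 * * * *  /usr/local/bin/p4-snapshot.sh --keep 4h    # hypothetical script
# top of the hour — hourly snapshots, kept 8 hours
0 * * * *      /usr/local/bin/p4-snapshot.sh --keep 8h
# midnight PT — nightly backup snapshots, kept 3 weeks
0 0 * * *      /usr/local/bin/p4-snapshot.sh --keep 21d
```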
#
# 
• Monit & M/Monit 
  – Monitors and alerts on: 
    • Filesystem thresholds (space and inodes) 
    • Specific processes, and file changes (timestamp/md5) 
    • OS thresholds 
• Ganglia 
  – Used for identifying host or performance issues 
• NetApp OnCommand 
  – Storage monitoring 
• Internal tools 
  – Monitor both the infrastructure and the end-user experience
# 
• Daemon that runs on each system and sends data to a single M/Monit instance 
• Monitors core daemons (Perforce and system): 
  ssh, sendmail, ntpd, crond, ypbind, p4p, p4d, p4web, p4broker 
• Able to restart or take actions when conditions are met (e.g. clean a proxy cache or purge it entirely) 
• Configured to alert on process-children thresholds 
• Dynamic monitoring tied into the init framework 
• Additional checks added for issues that have affected production in the past: 
  – NIC errors 
  – Number of filehandles 
  – Known patterns in the system log 
  – p4d crashes
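A minimal Monit configuration covering a few of the checks above might look like this; the paths, port, pidfile, and thresholds are illustrative assumptions, not NetApp's actual configuration:

```
# Restart p4d if its TCP port stops answering; alert before the
# database filesystem fills or on a crash signature in the log.
check process p4d with pidfile /var/run/p4d.pid
    start program = "/etc/init.d/p4d start"
    stop  program = "/etc/init.d/p4d stop"
    if failed port 1666 type tcp then restart
    if children > 500 then alert

check filesystem p4db with path /p4/db
    if space usage > 90% then alert
    if inode usage > 90% then alert

check file p4d_log with path /p4/logs/log
    if match "Server crashed" then alert
```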
# 
• Multiple Monit instances (one per host) report status to a single M/Monit instance 
• All alerts and rules are controlled through M/Monit 
• Provides the ability to remotely start/stop/restart daemons 
• Has a dashboard of all of the Monit instances 
• Keeps historical data of issues, both when found and when recovered
# 
• Collect historical data (depot, database, and cache sizes, license trends, number of clients and opened files per p4d) 
• Benchmarks collected every hour with the top user commands 
  – Alerts if a site is 15% slower than its historical average 
  – Runs against both the Perforce binary and internal wrappers
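The 15%-slower alert reduces to a simple ratio test; a minimal sketch with made-up timings (the real tool presumably times actual p4 commands and stores the history):

```shell
#!/bin/sh
# Sketch of the benchmark regression check: compare the latest timing
# of a command against its historical average and flag a site that is
# more than 15% slower. The numbers here are made up.
historical_avg=2.00     # seconds, from stored benchmark history
latest=2.40             # seconds, from this hour's run

alert=$(awk -v avg="$historical_avg" -v cur="$latest" \
    'BEGIN { print ((cur > avg * 1.15) ? "ALERT" : "OK") }')

echo "benchmark: $latest s vs avg $historical_avg s -> $alert"
```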
#
# 
• Faster performance for end users 
  – Most noticeable for sites with higher-latency WAN connections 
• Higher uptime, since an Edge can service some commands when the WAN or the Commit site is inaccessible 
• Much smaller databases: from 1.2 TB to 82 GB on a new Edge server 
• Automatic "backup" of the Commit server data through the Edge servers 
• Easily move users to new instances 
• Can partially isolate some groups from affecting all users
# 
• Helpful to disable csv log rotations for frequent journal truncations 
  – Set the dm.rotatelogwithjnl configurable to 0 
• Shared log volumes between multiple databases (warm or with a daemon) can cause interesting results with csv logs 
• Set global configurables where you can: monitor, rpl.*, track, etc. 
• Use multiple pull -u threads to ensure the replicas have warm copies of the depot files 
• Need rock-solid backups on all p4ds that hold client data 
  – Warm databases are harder to maintain with frequent journal truncations; there is no way to trigger on these events 
• Shelves are not automatically promoted 
• Users need to log in to each Edge server, or have their ticket file updated from existing entries 
• Adjusting the Perforce topology may have unforeseen side effects: pointing proxies at new P4TARGETs can increase load on the WAN, depending on the topology
# 
Scott Stanford 
sstanfor@netapp.com
# 
Scott Stanford is the SCM Lead for NetApp, where he also functions as a worldwide Perforce administrator and tool developer. Scott has twenty years of experience in software development, with thirteen years specializing in configuration management. Prior to joining NetApp, Scott was a Senior IT Architect at Synopsys.
# 
RESOURCES 
SnapShot: 
http://www.netapp.com/us/technology/storage-efficiency/se-technologies.aspx 
SnapVault & SnapMirror: 
http://www.netapp.com/us/products/protection-software/index.aspx 
Backup & Recovery of Perforce on NetApp: 
http://www.netapp.com/us/system/pdf-reader.aspx?pdfuri=tcm:10-107938-16&m=tr-4142.pdf 
Monit: 
http://mmonit.com/

Weitere ähnliche Inhalte

Was ist angesagt?

GlusterFS CTDB Integration
GlusterFS CTDB IntegrationGlusterFS CTDB Integration
GlusterFS CTDB Integration
Etsuji Nakai
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Community
 
Content Addressable NDN Repository - checkpoint
Content Addressable NDN Repository - checkpointContent Addressable NDN Repository - checkpoint
Content Addressable NDN Repository - checkpoint
Shi Junxiao
 

Was ist angesagt? (20)

100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vos
 
GlusterFS CTDB Integration
GlusterFS CTDB IntegrationGlusterFS CTDB Integration
GlusterFS CTDB Integration
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Evergreen Sysadmin Survival Skills
Evergreen Sysadmin Survival SkillsEvergreen Sysadmin Survival Skills
Evergreen Sysadmin Survival Skills
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage system
 
Bluestore
BluestoreBluestore
Bluestore
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
 
Content Addressable NDN Repository - checkpoint
Content Addressable NDN Repository - checkpointContent Addressable NDN Repository - checkpoint
Content Addressable NDN Repository - checkpoint
 
Block Storage For VMs With Ceph
Block Storage For VMs With CephBlock Storage For VMs With Ceph
Block Storage For VMs With Ceph
 
Accelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet ProcessingAccelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet Processing
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM servers
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageCeph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introduction
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph
 
VSPERF BEnchmarking the Network Data Plane of NFV VDevices and VLinks
VSPERF BEnchmarking the Network Data Plane of NFV VDevices and VLinksVSPERF BEnchmarking the Network Data Plane of NFV VDevices and VLinks
VSPERF BEnchmarking the Network Data Plane of NFV VDevices and VLinks
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storage
 

Andere mochten auch

How to solve misalignment lun netapp on linux servers by Ivan
How to solve misalignment lun netapp on linux servers by IvanHow to solve misalignment lun netapp on linux servers by Ivan
How to solve misalignment lun netapp on linux servers by Ivan
Ivan Silva
 
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
OpenStack
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
Fujitsu India
 

Andere mochten auch (19)

NetApp SAPPHIRE 2016 in SUSE booth: "Safeguarding HANA"
NetApp SAPPHIRE 2016 in SUSE booth: "Safeguarding HANA"NetApp SAPPHIRE 2016 in SUSE booth: "Safeguarding HANA"
NetApp SAPPHIRE 2016 in SUSE booth: "Safeguarding HANA"
 
Top Mandalay Bay Attractions
Top Mandalay Bay Attractions Top Mandalay Bay Attractions
Top Mandalay Bay Attractions
 
Geekiest Conference Quotes by NetApp Insight Attendees
Geekiest Conference Quotes by NetApp Insight AttendeesGeekiest Conference Quotes by NetApp Insight Attendees
Geekiest Conference Quotes by NetApp Insight Attendees
 
NetApp-ClusteredONTAP-Fall2012
NetApp-ClusteredONTAP-Fall2012NetApp-ClusteredONTAP-Fall2012
NetApp-ClusteredONTAP-Fall2012
 
SOFTBANK TELECOM Corp.
SOFTBANK TELECOM Corp.SOFTBANK TELECOM Corp.
SOFTBANK TELECOM Corp.
 
How to make sure the right quality is delivered by my translation vendor? (Ed...
How to make sure the right quality is delivered by my translation vendor? (Ed...How to make sure the right quality is delivered by my translation vendor? (Ed...
How to make sure the right quality is delivered by my translation vendor? (Ed...
 
VMware PEX Boot Camp - VMware View on NetApp: Technical Integration to Drive ...
VMware PEX Boot Camp - VMware View on NetApp: Technical Integration to Drive ...VMware PEX Boot Camp - VMware View on NetApp: Technical Integration to Drive ...
VMware PEX Boot Camp - VMware View on NetApp: Technical Integration to Drive ...
 
NetApp Insight 2015 Las Vegas Sponsors Guide
NetApp Insight 2015 Las Vegas Sponsors GuideNetApp Insight 2015 Las Vegas Sponsors Guide
NetApp Insight 2015 Las Vegas Sponsors Guide
 
How NetApp IT Integrates ServiceNow with OnCommand Insight (OCI)
How NetApp IT Integrates ServiceNow with OnCommand Insight (OCI)How NetApp IT Integrates ServiceNow with OnCommand Insight (OCI)
How NetApp IT Integrates ServiceNow with OnCommand Insight (OCI)
 
How to solve misalignment lun netapp on linux servers by Ivan
How to solve misalignment lun netapp on linux servers by IvanHow to solve misalignment lun netapp on linux servers by Ivan
How to solve misalignment lun netapp on linux servers by Ivan
 
Bringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherBringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack Together
 
VMware PEX Boot Camp - The Future Now: NetApp Clustered Storage and Flash for...
VMware PEX Boot Camp - The Future Now: NetApp Clustered Storage and Flash for...VMware PEX Boot Camp - The Future Now: NetApp Clustered Storage and Flash for...
VMware PEX Boot Camp - The Future Now: NetApp Clustered Storage and Flash for...
 
SCI Lab Test Validation Report: NetApp Storage Efficiency
SCI Lab Test Validation Report: NetApp Storage EfficiencySCI Lab Test Validation Report: NetApp Storage Efficiency
SCI Lab Test Validation Report: NetApp Storage Efficiency
 
10 Good Reasons: FlexPod
10 Good Reasons: FlexPod10 Good Reasons: FlexPod
10 Good Reasons: FlexPod
 
TVS for vROps - NetApp Storage
TVS for vROps - NetApp StorageTVS for vROps - NetApp Storage
TVS for vROps - NetApp Storage
 
FedRAMP Compliant FlexPod architecture from NetApp, Cisco, HyTrust and Coalfire
FedRAMP Compliant FlexPod architecture from NetApp, Cisco, HyTrust and CoalfireFedRAMP Compliant FlexPod architecture from NetApp, Cisco, HyTrust and Coalfire
FedRAMP Compliant FlexPod architecture from NetApp, Cisco, HyTrust and Coalfire
 
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
Addressing Issues of Risk & Governance in OpenStack without sacrificing Agili...
 
NetApp Product training
NetApp Product trainingNetApp Product training
NetApp Product training
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
 

Ähnlich wie Multi-Site Perforce at NetApp

Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01
Arunkumar Shanmugam
 

Ähnlich wie Multi-Site Perforce at NetApp (20)

LAB - Perforce Large Scale & Multi-Site Implementations
LAB - Perforce Large Scale & Multi-Site ImplementationsLAB - Perforce Large Scale & Multi-Site Implementations
LAB - Perforce Large Scale & Multi-Site Implementations
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
Training netbackup6x2
Training netbackup6x2Training netbackup6x2
Training netbackup6x2
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01
 
Firehose Engineering
Firehose EngineeringFirehose Engineering
Firehose Engineering
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkoo
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
MAGPI: Advanced Services: IPv6, Multicast, DNSSEC
MAGPI: Advanced Services: IPv6, Multicast, DNSSECMAGPI: Advanced Services: IPv6, Multicast, DNSSEC
MAGPI: Advanced Services: IPv6, Multicast, DNSSEC
 
Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets  Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets
 
CNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise ServicesCNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise Services
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Tuning Linux for MongoDB
Tuning Linux for MongoDBTuning Linux for MongoDB
Tuning Linux for MongoDB
 
CNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise ServicesCNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise Services
 
NFS(Network File System)
NFS(Network File System)NFS(Network File System)
NFS(Network File System)
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
 
Hadoop
HadoopHadoop
Hadoop
 

Mehr von Perforce

Mehr von Perforce (20)

How to Organize Game Developers With Different Planning Needs
How to Organize Game Developers With Different Planning NeedsHow to Organize Game Developers With Different Planning Needs
How to Organize Game Developers With Different Planning Needs
 
Regulatory Traceability: How to Maintain Compliance, Quality, and Cost Effic...
Regulatory Traceability:  How to Maintain Compliance, Quality, and Cost Effic...Regulatory Traceability:  How to Maintain Compliance, Quality, and Cost Effic...
Regulatory Traceability: How to Maintain Compliance, Quality, and Cost Effic...
 
Efficient Security Development and Testing Using Dynamic and Static Code Anal...
Efficient Security Development and Testing Using Dynamic and Static Code Anal...Efficient Security Development and Testing Using Dynamic and Static Code Anal...
Efficient Security Development and Testing Using Dynamic and Static Code Anal...
 
Understanding Compliant Workflow Enforcement SOPs
Understanding Compliant Workflow Enforcement SOPsUnderstanding Compliant Workflow Enforcement SOPs
Understanding Compliant Workflow Enforcement SOPs
 
Branching Out: How To Automate Your Development Process
Branching Out: How To Automate Your Development ProcessBranching Out: How To Automate Your Development Process
Branching Out: How To Automate Your Development Process
 
How to Do Code Reviews at Massive Scale For DevOps
How to Do Code Reviews at Massive Scale For DevOpsHow to Do Code Reviews at Massive Scale For DevOps
How to Do Code Reviews at Massive Scale For DevOps
 
How to Spark Joy In Your Product Backlog
How to Spark Joy In Your Product Backlog How to Spark Joy In Your Product Backlog
How to Spark Joy In Your Product Backlog
 
Going Remote: Build Up Your Game Dev Team
Going Remote: Build Up Your Game Dev Team Going Remote: Build Up Your Game Dev Team
Going Remote: Build Up Your Game Dev Team
 
Shift to Remote: How to Manage Your New Workflow
Shift to Remote: How to Manage Your New WorkflowShift to Remote: How to Manage Your New Workflow
Shift to Remote: How to Manage Your New Workflow
 
Hybrid Development Methodology in a Regulated World
Hybrid Development Methodology in a Regulated WorldHybrid Development Methodology in a Regulated World
Hybrid Development Methodology in a Regulated World
 
Better, Faster, Easier: How to Make Git Really Work in the Enterprise
Better, Faster, Easier: How to Make Git Really Work in the EnterpriseBetter, Faster, Easier: How to Make Git Really Work in the Enterprise
Better, Faster, Easier: How to Make Git Really Work in the Enterprise
 
Easier Requirements Management Using Diagrams In Helix ALM
Easier Requirements Management Using Diagrams In Helix ALMEasier Requirements Management Using Diagrams In Helix ALM
Easier Requirements Management Using Diagrams In Helix ALM
 
How To Master Your Mega Backlog
How To Master Your Mega Backlog How To Master Your Mega Backlog
How To Master Your Mega Backlog
 
Achieving Software Safety, Security, and Reliability Part 3: What Does the Fu...
Achieving Software Safety, Security, and Reliability Part 3: What Does the Fu...Achieving Software Safety, Security, and Reliability Part 3: What Does the Fu...
Achieving Software Safety, Security, and Reliability Part 3: What Does the Fu...
 
How to Scale With Helix Core and Microsoft Azure
How to Scale With Helix Core and Microsoft Azure How to Scale With Helix Core and Microsoft Azure
How to Scale With Helix Core and Microsoft Azure
 
Achieving Software Safety, Security, and Reliability Part 2
Achieving Software Safety, Security, and Reliability Part 2Achieving Software Safety, Security, and Reliability Part 2
Achieving Software Safety, Security, and Reliability Part 2
 
Should You Break Up With Your Monolith?
Should You Break Up With Your Monolith?Should You Break Up With Your Monolith?
Should You Break Up With Your Monolith?
 
Achieving Software Safety, Security, and Reliability Part 1: Common Industry ...
Achieving Software Safety, Security, and Reliability Part 1: Common Industry ...Achieving Software Safety, Security, and Reliability Part 1: Common Industry ...
Achieving Software Safety, Security, and Reliability Part 1: Common Industry ...
 
What's New in Helix ALM 2019.4
What's New in Helix ALM 2019.4What's New in Helix ALM 2019.4
What's New in Helix ALM 2019.4
 
Free Yourself From the MS Office Prison
Free Yourself From the MS Office Prison Free Yourself From the MS Office Prison
Free Yourself From the MS Office Prison
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Multi-Site Perforce at NetApp

# 
• Multiple levels of redundancy (Network, Storage, Power, HBA) 
• Common init framework for all Perforce daemon binaries 
• SnapMirrored volume used for hosting the infrastructure binaries & tools (Perl, Ruby, Python, P4, Git-Fusion, common scripts)
# 
• Storage devices used 
– NetApp EF540 w/ FC for the Commit server 
• 24 x 800 Gig SSD 
– NetApp E5512 w/ FC or SAS for each Edge server 
• 24 x 600 Gig 15k SAS 
– All RAID 10 with multiple spare disks, XFS, dual controllers, and dual power supplies 
• Used for: 
– Warm database or read-only replica on the standby host 
– Production journal 
• Hourly journal truncations, then copied to the filer 
– Production p4d log 
• Nightly log rotations, compressed and copied to the filer
# 
• NetApp cDOT clusters used at each site, with FAS6290 or better 
• 10 Gig data LIF 
• Dedicated vserver for Perforce 
• Shared NFS volumes between production/standby pairs for longer-term storage, snapshots, and offsite copies 
• Used for: 
– Depot storage 
– Rotated journals & p4d logs 
– Checkpoints 
– Warm database 
• Used for creating checkpoints, and to run the daemon if both hosts are down 
– Git-Fusion homedir & cache, with a dedicated volume per instance
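As an illustration only, a depot volume exported from the dedicated Perforce vserver might be mounted with an /etc/fstab entry like the one below. The vserver name, export path, mount point, and mount options are all assumptions for the sketch, not values from the deck or NetApp guidance:

```
# Hypothetical NFS mount for Perforce depot storage (all names are examples)
p4-vserver:/perforce_depots  /perforce/depots  nfs  rw,hard,tcp,vers=3,rsize=65536,wsize=65536  0 0
```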
# 
# 
• Truncate the journal 
• Checksum the journal, copy it to NFS, and verify the checksums match 
• Create a SnapShot of the NFS volumes 
• Remove any old snapshots 
• Replay the journal on the warm SAN database 
• Replay the journal on the warm NFS database 
• Once a week, create a temporary snapshot of the NFS database and create a checkpoint (p4d -jd) 
Hourly flow: p4d -jj → checksum journal on SAN → copy journal to NFS → compare checksums of local and NFS → create snapshot(s) → delete old snapshots → replay on warm standby → replay on warm NFS
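The checksum-copy-verify step of the hourly sequence above can be sketched in Python. This is a minimal illustration only: the paths, function names, and the choice of MD5 as the digest are assumptions, not the actual NetApp scripts.

```python
# Sketch of the "checksum the journal, copy to NFS, verify" backup step.
# Paths and the MD5 digest are illustrative assumptions.
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_journal_to_nfs(local_journal, nfs_dir):
    """Copy a rotated journal to NFS and verify the copy's checksum.

    Raises IOError if the NFS copy does not match the local journal.
    """
    nfs_copy = Path(nfs_dir) / Path(local_journal).name
    local_md5 = md5sum(local_journal)
    shutil.copy2(local_journal, nfs_copy)
    if md5sum(nfs_copy) != local_md5:
        raise IOError("checksum mismatch after copy: %s" % nfs_copy)
    return nfs_copy
```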
# 
Warm database 
• Trigger on the Edge server events.csv changing 
• If a jj event, get the journals that may need to be applied: 
– p4 journals -F "jdate>=(event epoch - 1)" -T jfile,jnum 
• For each journal, run a p4d -jr 
• Weekly checkpoint from a snapshot 
Read-only replica from the Edge 
• Weekly checkpoint 
– Created with: p4 -p localhost:<port> admin checkpoint -Z 
Flow: Commit server truncates → Edge server captures the event in events.csv → Monit triggers backups on events.csv → determine which journals to apply → apply journals
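The journal-selection step can be illustrated as a small Python function mirroring the `p4 journals -F "jdate>=(event epoch - 1)" -T jfile,jnum` filter above. The journal-record shape (jnum/jdate/jfile keys) is an assumption for the sketch, not the exact output format of the command:

```python
# Sketch of choosing which rotated journals to replay on the warm database,
# mirroring the `p4 journals -F` filter from the slide. Each journal record
# is assumed to be a dict with jnum, jdate (epoch seconds), and jfile keys.
def journals_to_apply(journals, event_epoch):
    """Return (jfile, jnum) pairs rotated at or after (event epoch - 1),
    oldest journal first, ready to feed to `p4d -jr`."""
    picked = [j for j in journals if j["jdate"] >= event_epoch - 1]
    picked.sort(key=lambda j: j["jnum"])
    return [(j["jfile"], j["jnum"]) for j in picked]
```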
# 
• New process for Edge servers to avoid WAN NFS mounts 
• For all the clients on an Edge server, at each site: 
– Save the change output for any open changes 
– Generate the journal data for the client 
– Create a tarball of the open files 
– Retained for 14 days 
• A similar process will be used by users to clone clients across Edge servers
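The per-client steps above can be sketched as a small Python helper that writes the saved change output next to a tarball of the opened files. The function name, file layout, and naming convention are illustrative assumptions; gathering the change spec and open-file list from `p4` is left out of the sketch:

```python
# Sketch of the per-client open-file backup: persist the saved change
# output and bundle the opened workspace files into a tarball.
# Names and layout are illustrative, not the actual NetApp process.
import tarfile
from pathlib import Path


def backup_open_files(client, change_spec_text, open_files, backup_dir):
    """Write <client>.change and <client>.tar.gz under backup_dir."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Save the `p4 change -o`-style output for any open changes.
    (backup_dir / ("%s.change" % client)).write_text(change_spec_text)
    # Tar up the opened files themselves.
    tar_path = backup_dir / ("%s.tar.gz" % client)
    with tarfile.open(tar_path, "w:gz") as tar:
        for f in open_files:
            tar.add(f, arcname=Path(f).name)
    return tar_path
```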
# 
• Snapshots 
– Main backup method 
– Created and kept as follows: 
• Every 20 minutes (20 & 40 minutes past the hour), kept for 4 hours 
• Every hour (top of the hour), kept for 8 hours 
• Nightly during backups (@midnight PT), kept for 3 weeks 
• SnapVault 
– Used for online backups 
– Created every 4 weeks, kept for 12 months 
• SnapMirrors 
– Contain all of the data needed to recreate the instance 
– Sunnyvale 
• DataProtection (DP) mirror for data recovery 
• Stored in the cluster 
• Allows fast test instances to be created from production snapshots with FlexClone 
– DR 
• RTP is the Disaster Recovery site for the Commit server 
• Sunnyvale is the Disaster Recovery site for the RTP and Bangalore Edge servers
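The snapshot retention schedule above can be expressed as a small pruning helper. The tier names and record shape are assumptions for the sketch; only the retention windows come from the slide:

```python
# Sketch of the retention policy: 20-minute snapshots kept 4 hours,
# hourly kept 8 hours, nightly kept 3 weeks. Tier names are made up.
RETENTION_SECS = {
    "20min":   4 * 3600,        # every 20 minutes, kept 4 hours
    "hourly":  8 * 3600,        # top of the hour, kept 8 hours
    "nightly": 21 * 24 * 3600,  # midnight PT, kept 3 weeks
}


def snapshots_to_delete(snapshots, now):
    """Given (name, tier, created_epoch) tuples, return the names of
    snapshots that have aged past their tier's retention window."""
    return [name for name, tier, created in snapshots
            if now - created > RETENTION_SECS[tier]]
```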
# 
# 
• Monit & M/Monit 
– Monitors and alerts on: 
• Filesystem thresholds, space and inodes 
• Specific processes, and file changes (timestamp/md5) 
• OS thresholds 
• Ganglia 
– Used for identifying host or performance issues 
• NetApp OnCommand 
– Storage monitoring 
• Internal tools 
– Monitor both the infrastructure and the end-user experience
# 
• Daemon that runs on each system and sends data to a single M/Monit instance 
• Monitors core daemons (Perforce and system): ssh, sendmail, ntpd, crond, ypbind, p4p, p4d, p4web, p4broker 
• Able to restart daemons or take other actions when conditions are met (e.g. clean a proxy cache, or purge it entirely) 
• Configured to alert on process-children thresholds 
• Dynamic monitoring tied into the init framework 
• Additional checks added for issues that have affected production in the past: 
– NIC errors 
– Number of filehandles 
– Known patterns in the system log 
– p4d crashes
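A Monit control-file fragment in the spirit of the checks above might look like the following. The pidfile path, init script locations, child-count threshold, and log pattern are illustrative assumptions, not the actual NetApp configuration:

```
# Hypothetical Monit checks for a p4d host (paths and thresholds are examples)
check process p4d with pidfile /perforce/p4d.pid
    start program = "/etc/init.d/p4d start"
    stop  program = "/etc/init.d/p4d stop"
    if children > 500 then alert

check file messages with path /var/log/messages
    # Alert on known-bad patterns that have bitten production before
    if match "Hardware Error" then alert
```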
# 
• Multiple Monit instances (one per host) communicate their status to a single M/Monit instance 
• All alerts and rules are controlled through M/Monit 
• Provides the ability to remotely start/stop/restart daemons 
• Has a dashboard of all of the Monit instances 
• Keeps historical data on issues, both when they were found and when they recovered
# 
• Collect historical data (depot, database, and cache sizes, license trends, number of clients and opened files per p4d) 
• Benchmarks collected every hour with the top user commands 
– Alerts if a site is 15% slower than its historical average 
– Runs against both the Perforce binary and the internal wrappers
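The benchmark alerting rule above reduces to a one-line comparison; this sketch assumes timings are plain seconds and that "historical average" is already computed (function and parameter names are made up):

```python
# Sketch of the alerting rule: flag a site when a benchmarked command runs
# more than 15% slower than its historical average (threshold per the slide).
def is_regression(current_secs, historical_avg_secs, threshold=0.15):
    """Return True when the current timing exceeds the historical
    average by more than the given fractional threshold."""
    return current_secs > historical_avg_secs * (1 + threshold)
```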
# 
# 
• Faster performance for end-users 
– Most noticeable for sites with higher-latency WAN connections 
• Higher uptime for services, since an Edge can service some commands when the WAN or the Commit site is inaccessible 
• Much smaller databases, from 1.2 Tb to 82 Gig on a new Edge server 
• Automatic “backup” of the Commit server data through the Edge servers 
• Easily move users to new instances 
• Can partially isolate some groups from affecting all users
# 
• Helpful to disable csv log rotations when journal truncations are frequent 
– Set the dm.rotatelogwithjnl configurable to 0 
• Shared log volumes with multiple databases (warm, or with a daemon) can cause interesting results with csv logs 
• Set global configurables where you can: monitor, rpl.*, track, etc. 
• Use multiple pull -u threads to ensure the replicas have warm copies of the depot files 
• Need rock-solid backups on every p4d that holds client data 
– Warm databases are harder to maintain with frequent journal truncations; there is no way to trigger on these events 
• Shelves are not automatically promoted 
• Users need to log in to each Edge server, or have their ticket file updated from existing entries 
• Adjusting the Perforce topology may have unforeseen side-effects; pointing proxies at new P4TARGETs can increase WAN load, depending on the topology
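Some of the configurables mentioned above are set with `p4 configure`. This fragment is illustrative only: the Edge server name and pull-thread count are made-up examples, and appropriate values depend on the site:

```
# Illustrative configurables from the lessons above (values are examples)
p4 configure set dm.rotatelogwithjnl=0            # keep csv logs out of hourly p4d -jj rotations
p4 configure set monitor=1                        # enable command monitoring globally
p4 configure set track=1                          # enable performance tracking globally
p4 configure set Edge-RTP#startup.2="pull -u -i 1"  # extra pull -u thread on a hypothetical Edge
```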
# 
Scott Stanford 
sstanfor@netapp.com
# 
Scott Stanford is the SCM Lead for NetApp, where he also functions as a worldwide Perforce administrator and tool developer. Scott has twenty years of experience in software development, with thirteen years specializing in configuration management. Prior to joining NetApp, Scott was a Senior IT Architect at Synopsys.
# 
RESOURCES 
SnapShot: http://www.netapp.com/us/technology/storage-efficiency/se-technologies.aspx 
SnapVault & SnapMirror: http://www.netapp.com/us/products/protection-software/index.aspx 
Backup & Recovery of Perforce on NetApp: http://www.netapp.com/us/system/pdf-reader.aspx?pdfuri=tcm:10-107938-16&m=tr-4142.pdf 
Monit: http://mmonit.com/