Infrastructure
Migrations
How many infrastructure migrations have I done? I’m
not sure. I stopped counting around 5.
One of the benefits of working for a small company
that’s growing quickly is that you get to experience a lot
of new things...and moving production and office
environments is one of them.

Thursday, August 2, 12
I am: Matt Simmons
• 10+ year sysadmin
• Small infrastructures
• 6+ infrastructure migrations
• http://www.standalone-sysadmin.com
You probably know this...

This is:
Infrastructure
Migrations

10,000ft view
• Pre-Planning
• Execution
• Post-Mortem


Like most things, 90% of the work is planning.
The other 90% is lifting heavy things.
There’s another 10-25% reserved for figuring out what went
wrong, and determining how to make it not happen again.
Considerations:
Types of Migrations

• Build in parallel
• Move Infrastructure
• Hybrid

You really, really want to build
in parallel. Sure it’s expensive,
but it means much, much
shorter periods of downtime.

Moving an infrastructure is hair-raising, because there are only a few
million things that can go wrong.

Most people will probably end up doing hybrid migrations,
where you build some of the new infrastructure, then
migrate some from the existing setup.
Watch out for things like IP addressing issues, and that
you’ve made the correct assumptions about rack space and
power requirements for the machines that are moving.
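The rack space and power assumptions can be checked on paper before anything is unbolted. This is only a minimal sketch: the machine names, rack-unit counts, and wattages are made up for illustration, and real figures would come from your own inventory and the datacenter's contract.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    rack_units: int  # height in U
    watts: int       # estimated peak draw


def check_rack(machines, free_units, power_budget_watts):
    """Return a list of problems; an empty list means the machines fit."""
    problems = []
    units = sum(m.rack_units for m in machines)
    watts = sum(m.watts for m in machines)
    if units > free_units:
        problems.append(f"need {units} rack units, only {free_units} free")
    if watts > power_budget_watts:
        problems.append(f"need {watts} W, budget is {power_budget_watts} W")
    return problems


# Hypothetical inventory for the machines that are moving.
moving = [
    Machine("db1", 2, 450),
    Machine("san1", 4, 800),
    Machine("web1", 1, 250),
]

for problem in check_rack(moving, free_units=6, power_budget_watts=2000):
    print(problem)
```

Running the same check per destination rack, rather than per datacenter, catches the case where everything fits in total but not where you planned to put it.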

And you don’t know scary until
you’re driving a U-Haul full of
servers across the Pennsylvania
Turnpike in the middle of a
rainstorm.
Considerations:
• Downtime Limits
• Uptime Requirements
• Service Window Length
Strangely enough, downtime limits and uptime
requirements aren’t the same.
Figure out what your uptime limits are according
to your user base’s expectations, then figure out
how much infrastructure needs to be running in
order to accommodate that. Good luck.


You might have a
maintenance window, where
downtime is planned and
doesn’t count against your
SLAs. If your migration can
fit within this, awesome
(hint: it can’t.)
So you need to figure out
what kind of downtime you
can afford, and remember to
schedule notices to your
customers far enough in
advance so that they aren’t
taken by surprise.
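To see why the migration rarely fits, it helps to turn an uptime requirement into a downtime budget. The sketch below assumes the requirement is expressed as an uptime percentage over a 30-day period; both the percentage and the period are assumptions, so substitute whatever your SLA actually says.

```python
def downtime_budget_minutes(uptime_percent, period_days=30):
    """Minutes of downtime allowed in the period at the given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def fits_budget(planned_outage_minutes, uptime_percent, period_days=30):
    """True if the planned outage stays within the downtime budget."""
    return planned_outage_minutes <= downtime_budget_minutes(uptime_percent, period_days)


# A hypothetical 99.9% target over 30 days, against a planned 4-hour move.
budget = downtime_budget_minutes(99.9)
print(f"Budget: {budget:.1f} minutes")
print("4-hour move fits:", fits_budget(240, 99.9))
```

If the planned outage does not fit, the options are the ones already on the table: build more in parallel, or negotiate a longer window and announce it early.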
Considerations:
Upstream Network Changes
I think I could do an entire
presentation where I just list all
of the problems that could happen
when network providers screw
things up.
Big ones to watch out for:

1. Is the test and turn-up date early
enough so that inevitable failures
don’t impact the go-live date?
2. Is the circuit exactly what
you ordered, and is what you
ordered exactly what you need?
3. Are cross-connects in the
datacenter ordered, and is the
datacenter networking team
working with the provider?
Considerations:
(Wo)man Power
You can’t lift all of the things you own.
You need friends to come help you move, right? And you
usually pay them beer and pizza for the effort.
Moving infrastructures is kind of like that, except
“money” typically substitutes for beer and pizza, and you
want to find people who are reasonably smart, because
you probably don’t own anything in your apartment that
costs as much as a high performance RAID array.


Figure out how many
people you need, then add
20% to cover the stuff
you didn’t think of.
Have another 10% at
home ready to come in if
the need arises.
Considerations:
How can we parallelize the work?
If you have teams, having them all work
independently but simultaneously is important,
so try not to have one team waiting around on
the result of another team. This is no different
than removing bottlenecks from a computing
infrastructure.

Establishing a Plan
Documentation shall set you free!

Build a checklist
Every good plan includes a checklist

• What needs to be done
• By whom?
• Where?
• In what order?
Build a checklist
Include all phases

• Off site prior
• On site prior
• On site during
• On site after
• Testing
• Signoff

Off site things before moves
are usually slow processes or
long-term changes that rely
on TTLs or human interaction
outside of your organization.
Build a checklist
Establish Dependencies
If item 23 relies on item 24 being done, then
it’s probably in the wrong place...
Figuring out all of these dependencies is like
untangling a knot. It’s slow, it’s difficult, and
when you’re done, no one seems to be as
appreciative of your hard work as you are.
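Once the dependencies are written down, a short script can do the untangling and point out items that sit in the wrong place. This is a sketch using Python's standard `graphlib` module (Python 3.9+); the task names and the dependency map are invented for illustration.

```python
from graphlib import TopologicalSorter


def misplaced_items(checklist, depends_on):
    """Return (item, dependency) pairs where the dependency is listed
    after the item that relies on it, or is missing from the checklist."""
    position = {item: i for i, item in enumerate(checklist)}
    return [
        (item, dep)
        for item in checklist
        for dep in sorted(depends_on.get(item, ()))
        if dep not in position or position[dep] > position[item]
    ]


def ordered_checklist(depends_on):
    """Return one valid order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical checklist: each key relies on the items in its set.
depends_on = {
    "cable servers": {"rack servers"},
    "rack servers": {"power down servers"},
}
checklist = ["power down servers", "cable servers", "rack servers"]

print(misplaced_items(checklist, depends_on))
print(ordered_checklist(depends_on))
```

A circular dependency surfaces as an exception here, which is a far cheaper place to find it than at 2 a.m. on the datacenter floor.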

Build a checklist
Build in checkpoints
Checkpoints are a great place
to stop all the teams at the
same time and make sure that
everyone’s on the same page.

Build a checklist
Include communication up-stream
Overcommunicate.
Keep your boss informed.
Keep your stakeholders informed.
If you have the kind of work
environment where your users
care, keep them informed.

Build a checklist
Multiple Checklists

• Per team?
• Per location?
• Per person?


If you’ve got multiple teams,
you are likely to need multiple
checklists.
Ditto if your locations are
farther apart.
If each person’s tasks are
complicated, give each person
an individual checklist, too.
Build a checklist
Schedule Breaks
Breaks are SO important.
You can’t work for 8 hours
without stopping to rest,
physically or mentally. Put
these into the schedule.

Change Management
Techniques
Establish tests for complicated steps
(or groups)
Would you build a new server then put it into
production without testing it?
Of course not.
Build tests to see if your work so far is correct. It can be
as simple as “at this point, LEDs 7, 8, and 9 should be green,
and LED 10 should be amber”.
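A test like that can be written down as expected state and compared against what the person at the rack actually sees. The sketch below is one possible shape for it; the indicator names and the dictionary format are assumptions, not part of any particular tool.

```python
def check_step(expected, observed):
    """Compare observed indicator states with the expected ones.

    Returns a list of human-readable failures; empty means the step passed.
    """
    failures = []
    for indicator, want in expected.items():
        got = observed.get(indicator, "missing")
        if got != want:
            failures.append(f"{indicator}: expected {want}, saw {got}")
    return failures


# Expected state for this step, as written on the checklist.
expected = {"LED 7": "green", "LED 8": "green", "LED 9": "green", "LED 10": "amber"}

# What was reported from the rack (hypothetical).
observed = {"LED 7": "green", "LED 8": "green", "LED 9": "green", "LED 10": "green"}

for failure in check_step(expected, observed):
    print(failure)
```

The same function works for anything that can be reduced to a name and a state, such as link lights, ping results, or service status.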

Change Management
Techniques
Establish roll-back procedures
Things happen. Stuff doesn’t
always go right.
Make sure your plan includes
when to roll-back and what
steps to take to do it.

Change Management
Techniques
Establish failure guidelines
Failures are inevitable.
Unhandled failures are
unnecessary though.
Know how to tell if
something has failed, and
know what to do about
it.


(What happens if...)

• ...a machine breaks?
• ...a router doesn’t boot?
• ...?
Identify Goods & Services
to be Purchased

These kinds of steps
require a lot of
planning, but more
planning just makes
the end result better.

• Cables of specific lengths, connectors, label
tape, velcro, rack shelves, etc

• Servers, routers, firmwares, licenses, etc
• Circuits, bandwidth, accounts, etc

Maintain
Communications
• Cellphones
• (at least one per team)
• 2-way radios
• (for lack of cellular service)
• Probably not IP phones

Cell reception in datacenters is
spotty. Using handheld 2-way
radios is much more reliable.

Don’t rely on your IP phone
infrastructure for critical
communications during
network outages.

Just don’t.

Find Warm Bodies
Figure out how many people you need.
Add 20% for good measure
Have 10% standing by
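The arithmetic is simple, but rounding matters when the team is small. The sketch below rounds both figures up and assumes the 10% standby is taken from the original estimate rather than the padded one; the slide does not say which, so treat that as an assumption.

```python
def staffing(needed, buffer_percent=20, standby_percent=10):
    """Return (people on site, people on standby), rounding up."""
    def ceil_percent(n, percent):
        # Integer ceiling of n * percent / 100, avoiding float rounding.
        return (n * percent + 99) // 100

    on_site = needed + ceil_percent(needed, buffer_percent)
    standby = ceil_percent(needed, standby_percent)
    return on_site, standby


on_site, standby = staffing(10)
print(f"On site: {on_site}, standing by: {standby}")
```

For an estimate of 7 people this gives 9 on site and 1 standing by, because a fraction of a person still means one more phone call.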

Establish Roles
Zone: “Your job is to stay at this rack,
pulling things out in the order
prescribed by the checklist, and to
load them on the cart once removed”

Man to Man: “Your job is to cart
these servers to the truck, and once
the number of servers in the truck
matches the number prescribed by the
checklist, to drive the truck to the
new datacenter, and assist in loading
the servers onto the cart for the next
zone man”

• Zone
• Man to Man
• Point Guard
...and so on, as required by your migration.


Point Guard: “Your job is to act as
the communications hub, the
person to verify that checkpoints
happen on schedule, and
that things are correct, as well
as to finalize sign-off and handoff once we’re done”
Communicate
the plan
Default to being too communicative
Have your point guard
annoy people with the
number of email updates.

Communicate
the plan
Get clearance from the stake-holders
Before ever starting
work, make sure that
everyone is on board
with the migration plan,
and that everyone has
agreed and signed off.

Communicate
the plan
Alert users multiple times

• Well in advance

(so long term projects aren’t scheduled)

• A week before
(so short-term pushes aren’t interrupted)

• Immediately before

(so last minute issues don’t compound)
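The three notices can be put on the calendar as soon as the migration date is fixed. In this sketch, "well in advance" is assumed to mean 30 days and "immediately before" is assumed to mean the day before; neither number comes from the slides, so adjust both to your users.

```python
from datetime import date, timedelta


def notice_dates(migration_day, advance_days=30):
    """Return (send date, label) pairs for the three user notices."""
    return [
        (migration_day - timedelta(days=advance_days), "well in advance"),
        (migration_day - timedelta(days=7), "a week before"),
        (migration_day - timedelta(days=1), "immediately before"),
    ]


# Hypothetical migration date.
schedule = notice_dates(date(2012, 9, 15))
for send_on, label in schedule:
    print(send_on.isoformat(), label)
```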

Communicate
the plan
Give everyone the information they need

• Checklists
• Plan document
• Contact Information

I actually got to the point where every person involved
in the migration got a personalized envelope.
The contents were the checklist relevant to their job,
the diagrams of what the rack looked like before, what
the new racks were supposed to look like, and the
contact information for all of the other team members.

...and has signed off on it

Executing the plan
I love it when a plan comes together...

Executing the plan
Verify all goods were purchased
Doing inventory sucks, but
not having enough ethernet
cables that reach to the
switch sucks more...

Executing the plan
Clear personal schedules
“oh, that was this weekend? Crap, man,
I’m sorry. I have to go drink beer with
my other friends and have a good
weekend. Maybe next time, brah”

Executing the plan
Complete off-site checklist items
Verify that everyone at
both sites knows what’s
happening, when, and is
on board. Make sure the
datacenter has people on
hand to help who are
capable of helping.

Executing the plan
Show up early
...because something won’t be right.

Executing the plan
Verify assigned roles
Ask for questions
...and ask each person.
Make sure that they
know how to get ahold of
you and the point guard.

Executing the plan
Step through the list

Executing the plan
Verify completeness with each team

Executing the plan
Perform on-site and off-site post-complete items

Executing the plan
Go have a beer.
Seriously, celebrate
completing the task with
the team. I didn’t always
get to do this, and I’m still
sorry about it today.

Executing the plan
Complete post-mortem according to schedule
During the next workweek, complete the post-mortem and identify
what went wrong as well
as what went right.
You can’t replicate success
and eliminate failure
unless you identify them.

Dealing with problems
Yes, you will have problems...

Dealing with problems
Two big take-aways:
1) Problems are
inevitable because they
are a condition of the
infrastructure, and
they arise from its
inherent complexity.
2) It’s not possible to
eliminate all failures,
but it’s desirable to
minimize them, and to
try to eliminate
repeating the same
failure by improving the
process and design.


Problems are inevitable
(It’s not “if”, it’s “when”)
Read “The Field Guide to Understanding
Human Error” by Sidney Dekker
http://amzn.to/QFpcqY

During my talk, I gave
far more discussion on
this topic than I’m going
to give here.
Dealing with problems
• Identify & Acknowledge the problem
• Don’t punish the reporter
• Follow the failure guidelines
• Roll-back if necessary & reschedule
Post-mortem
• What went wrong?
• Why?
• The ‘Five Whys’
• What went right?
• What have we learned?
Thanks for your time.
I hope you were able to
get something out of it.

Infrastructure
Migrations
If you have questions,
feel free to contact me

@standaloneSA

standalone.sysadmin@gmail.com

Infrastructure Migration

  • 1. Infrastructure Migrations How many infrastructure migrations have I done? I’m not sure. I stopped counting around 5. One of the benefits of working for a small company that’s growing quickly is that you get to experience a lot of new things...and moving production and office environments is one of them. Thursday, August 2, 12
  • 2. I am: Matt Simmons • 10+ year sysadmin • Small infrastructures • 6+ infrastructure migrations • http://www.standalone-sysadmin.com You probably know this...
  • 4. 10,000ft view • Pre-Planning • Execution • Post-Mortem Like most things, 90% of the work is planning. The other 90% is lifting heavy things. There’s another 10-25% reserved for figuring out what went wrong, and determining how to make it not happen again.
  • 5. Considerations: Types of Migrations • Build in parallel • Move Infrastructure • Hybrid You really, really want to build in parallel. Sure it’s expensive, but it means much, much shorter periods of downtime. Moving an infrastructure is hair-raising, because there are only a few million things that can go wrong. Most people will probably end up doing hybrid migrations, where you build some of the new infrastructure, then migrate some from the existing setup. Watch out for things like IP addressing issues, and that you’ve made the correct assumptions about rack space and power requirements for the machines that are moving. And you don’t know scary until you’re driving a U-Haul full of servers across the Pennsylvania Turnpike in the middle of a rainstorm.
  • 6. Considerations: • Downtime Limits • Uptime Requirements • Service Window Length Strangely enough, downtime limits and uptime requirements aren’t the same. Figure out what your uptime limits are according to your user base’s expectations, then figure out how much infrastructure needs to be running in order to accommodate that. Good luck. You might have a maintenance window, where downtime is planned and doesn’t count against your SLAs. If your migration can fit within this, awesome (hint: it can’t). So you need to figure out what kind of downtime you can afford, and remember to schedule notices to your customers far enough in advance so that they aren’t taken by surprise.
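The downtime arithmetic behind slide 6 can be sketched roughly as follows. This is a minimal illustration, not something from the talk: the 99.9% target, the 30-day period, the 4-hour outage, and the function names are all assumptions, and planned maintenance windows that don't count against an SLA would be handled separately.

```python
from datetime import timedelta


def downtime_budget(uptime_pct: float, period: timedelta) -> timedelta:
    """Return the downtime allowed within `period` for an uptime target in percent."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period * ((100 - uptime_pct) / 100)


def fits_budget(planned_outage: timedelta, uptime_pct: float, period: timedelta) -> bool:
    """True if the planned outage is no longer than the allowed downtime."""
    return planned_outage <= downtime_budget(uptime_pct, period)


# Assumed example numbers: a 99.9% target over 30 days allows roughly 43 minutes.
budget = downtime_budget(99.9, timedelta(days=30))
print(budget)

# An assumed 4-hour move does not fit, so the difference has to be negotiated.
print(fits_budget(timedelta(hours=4), 99.9, timedelta(days=30)))
```

Running the numbers early makes it obvious how far the real migration overshoots the budget, which is the conversation to have with stakeholders before scheduling customer notices.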
  • 7. Considerations: Upstream Network Changes I think I could do an entire presentation where I just list all of the problems that could happen when network providers screw things up. Big ones to watch out for: 1. Is the test and turn-up date early enough so that inevitable failures don’t impact the go-live date? 2. Is the circuit exactly what you ordered, and is what you ordered exactly what you need? 3. Are cross-connects in the datacenter ordered, and is the datacenter networking team working with the provider?
  • 8. Considerations: (Wo)man Power You can’t lift all of the things you own. You need friends to come help you move, right? And you usually pay them beer and pizza for the effort. Moving infrastructures is kind of like that, except “money” typically substitutes for beer and pizza, and you want to find people who are reasonably smart, because you probably don’t own anything in your apartment that costs as much as a high performance RAID array. Figure out how many people you need, then add 20% to cover the stuff you didn’t think of. Have another 10% at home ready to come in if the need arises.
  • 9. Considerations: How can we parallelize the work? If you have teams, having them all work independently but simultaneously is important, so try not to have one team waiting around on the result of another team. This is no different than removing bottlenecks from a computing infrastructure.
  • 10. Establishing a Plan Documentation shall set you free!
  • 11. Build a checklist Every good plan includes a checklist • What needs to be done • By whom? • Where? • In what order?
  • 12. Build a checklist Include all phases • Off site prior • On site prior • On site during • On site after • Testing • Signoff Off site things before moves are usually slow processes or long-term changes that rely on TTLs or human interaction outside of your organization.
  • 13. Build a checklist Establish Dependencies If item 23 relies on item 24 being done, then it’s probably in the wrong place... Figuring out all of these dependencies is like untangling a knot. It’s slow, it’s difficult, and when you’re done, no one seems to be as appreciative of your hard work as you are.
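One way to untangle the knot from slide 13 is to write the dependencies down and let a topological sort propose an order. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The checklist items and the helper names are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical checklist: each item maps to the items that must be done first.
depends_on = {
    "notify users": set(),
    "verify backups": set(),
    "power down servers": {"notify users", "verify backups"},
    "unrack servers": {"power down servers"},
    "load truck": {"unrack servers"},
}


def ordered_checklist(depends_on):
    """Return the items in an order where every item follows its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the list of items that form the cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def misplaced_items(checklist, depends_on):
    """Return (item, dependency) pairs where the dependency is listed after the item."""
    position = {item: i for i, item in enumerate(checklist)}
    return [
        (item, dep)
        for item in checklist
        for dep in depends_on.get(item, ())
        if dep in position and position[dep] > position[item]
    ]


for number, item in enumerate(ordered_checklist(depends_on), start=1):
    print(number, item)
```

`misplaced_items` is the "item 23 relies on item 24" check: run it against a hand-written checklist and any pair it returns is in the wrong place.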
  • 14. Build a checklist Build in checkpoints Checkpoints are a great place to stop all the teams at the same time and make sure that everyone’s on the same page.
  • 15. Build a checklist Include communication up-stream Overcommunicate. Keep your boss informed. Keep your stakeholders informed. If you have the kind of work environment where your users care, keep them informed.
  • 16. Build a checklist Multiple Checklists • Per team? • Per location? • Per person? If you’ve got multiple teams, you are likely to need multiple checklists. Ditto if your locations are farther apart. If each person’s tasks are complicated, give each person an individual checklist, too.
  • 17. Build a checklist Schedule Breaks Breaks are SO important. You can’t work for 8 hours without stopping to rest, physically or mentally. Put these into the schedule.
  • 18. Change Management Techniques Establish tests for complicated steps (or groups) Would you build a new server then put it into production without testing it? Of course not. Build tests to see if your work so far is correct. It can be as simple as “at this point, LEDs 7, 8, and 9 should be green, and LED 10 should be amber”.
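The LED test from slide 18 can be written down as data, so the person at the rack only has to compare what they see against what the plan expects. This is a minimal sketch. The function name and the idea of recording observations in a dictionary are assumptions, and the observed values would come from whoever is standing in front of the hardware.

```python
def check_step(expected: dict, observed: dict) -> list:
    """Return human-readable mismatches; an empty list means the step passed."""
    problems = []
    for name, want in expected.items():
        got = observed.get(name, "missing")
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    return problems


# The expectation quoted on the slide.
expected_leds = {
    "LED 7": "green",
    "LED 8": "green",
    "LED 9": "green",
    "LED 10": "amber",
}

# Hypothetical observation where LED 10 is not in the expected state.
observed_leds = {"LED 7": "green", "LED 8": "green", "LED 9": "green", "LED 10": "green"}

for problem in check_step(expected_leds, observed_leds):
    print(problem)
```

A non-empty result is the signal to stop and consult the failure guidelines rather than carry on to the next checklist item.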
  • 19. Change Management Techniques Establish roll-back procedures Things happen. Stuff doesn’t always go right. Make sure your plan includes when to roll-back and what steps to take to do it.
  • 20. Change Management Techniques Establish failure guidelines Failures are inevitable. Unhandled failures are unnecessary though. Know how to tell if something has failed, and know what to do about it. (What happens if...) • ...a machine breaks? • ...a router doesn’t boot? • ...?
  • 21. Identify Goods & Services to be Purchased These kinds of steps require a lot of planning, but more planning just makes the end result better. • Cables of specific lengths, connectors, label tape, velcro, rack shelves, etc • Servers, routers, firmwares, licenses, etc • Circuits, bandwidth, accounts, etc
  • 22. Maintain Communications • Cellphones (at least one per team) • 2-way radios (for lack of cellular service) • Probably not IP phones Cell reception in datacenters is spotty. Using handheld 2-way radios is much more reliable. Don’t rely on your IP phone infrastructure for critical communications during network outages. Just don’t.
  • 23. Find Warm Bodies Figure out how many people you need. Add 20% for good measure. Have 10% standing by.
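The staffing rule on slide 23 is simple arithmetic, sketched below. Two things here are assumptions rather than anything stated in the talk: both percentages are taken from the base estimate, and fractional people are rounded up. The function name is hypothetical.

```python
def staffing(needed: int, buffer_pct: int = 20, standby_pct: int = 10):
    """Return (people on site, people on standby) for a base headcount estimate."""
    if needed < 0:
        raise ValueError("needed must not be negative")
    # Integer ceiling division avoids floating-point surprises when rounding up.
    extra = -(-needed * buffer_pct // 100)
    standby = -(-needed * standby_pct // 100)
    return needed + extra, standby


# Assumed example: a base estimate of 8 people.
on_site, standby = staffing(8)
print(f"{on_site} on site, {standby} standing by")
```

Rounding up is deliberate: with small teams, 20% of the estimate is often less than one person, and the point of the buffer is to have a whole extra person available.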
  • 24. Establish Roles • Zone • Man to Man • Point Guard ...and so on, as required by your migration. Zone: “Your job is to stay at this rack, pulling things out in the order prescribed by the checklist, and to load them on the cart once removed” Man to Man: “Your job is to cart these servers to the truck, and once the number of servers in the truck matches the number prescribed by the checklist, to drive the truck to the new datacenter, and assist in loading the servers onto the cart for the next zone man” Point Guard: “Your job is to act as the communications hub, the person to verify that checkpoints happen on schedule, and that things are correct, as well as to finalize sign-off and handoff once we’re done”
  • 25. Communicate the plan Default to being too communicative Have your point guard annoy people with the number of email updates.
  • 26. Communicate the plan Get clearance from the stake-holders Before ever starting work, make sure that everyone is on board with the migration plan, and that everyone has agreed and signed off.
  • 27. Communicate the plan Alert users multiple times • Well in advance (so long term projects aren’t scheduled) • A week before (so short-term pushes aren’t interrupted) • Immediately before (so last minute issues don’t compound)
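The three notices on slide 27 can be turned into concrete send times once the migration start is fixed. In this sketch only the "a week before" offset comes from the slide. The six-week and one-hour lead times, the start date, and the function name are placeholder assumptions to adjust for your own users.

```python
from datetime import datetime, timedelta


def notice_schedule(
    start: datetime,
    long_lead: timedelta = timedelta(weeks=6),   # assumed meaning of "well in advance"
    final_lead: timedelta = timedelta(hours=1),  # assumed meaning of "immediately before"
):
    """Return (label, send time) pairs for the three user notices, earliest first."""
    return [
        ("well in advance", start - long_lead),
        ("a week before", start - timedelta(weeks=1)),
        ("immediately before", start - final_lead),
    ]


# Hypothetical migration start time.
migration_start = datetime(2030, 1, 12, 22, 0)
for label, when in notice_schedule(migration_start):
    print(f"{when:%Y-%m-%d %H:%M}  {label}")
```

These dates belong on the off-site checklist, since sending notices is exactly the kind of slow, human-dependent step that has to happen long before anyone touches a rack.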
  • 28. Communicate the plan Give everyone the information they need • Checklists • Plan document • Contact Information I actually got to the point where every person involved in the migration got a personalized envelope. The contents were the checklist relevant to their job, the diagrams of what the rack looked like before, what the new racks were supposed to look like, and the contact information for all of the other team members. ...and has signed off on it
  • 29. Executing the plan I love it when a plan comes together...
  • 30. Executing the plan Verify all goods were purchased Doing inventory sucks, but not having enough ethernet cables that reach to the switch sucks more...
  • 31. Executing the plan Clear personal schedules “oh, that was this weekend? Crap, man, I’m sorry. I have to go drink beer with my other friends and have a good weekend. Maybe next time, brah”
  • 32. Executing the plan Complete off-site checklist items Verify that everyone at both sites knows what’s happening, when, and is on board. Make sure the datacenter has people on hand to help who are capable of helping.
  • 33. Executing the plan Show up early ...because something won’t be right.
  • 34. Executing the plan Verify assigned roles Ask for questions ...and ask each person. Make sure that they know how to get ahold of you and the point guard.
  • 35. Executing the plan Step through the list
  • 36. Executing the plan Verify completeness with each team
  • 37. Executing the plan Perform on-site and off-site post-complete items
  • 38. Executing the plan Go have a beer. Seriously, celebrate completing the task with the team. I didn’t always get to do this, and I’m still sorry about it today.
  • 39. Executing the plan Complete post-mortem according to schedule During the next workweek, complete the postmortem and identify what went wrong as well as what went right. You can’t replicate success and eliminate failure unless you identify them.
  • 40. Dealing with problems Yes, you will have problems...
  • 41. Dealing with problems Problems are inevitable (It’s not “if”, it’s “when”) Two big take-aways: 1) Problems are inevitable because they are a condition of the infrastructure, and they arise from its inherent complexity. 2) It’s not possible to eliminate all failures, but it’s desirable to minimize them, and to try to eliminate repeating the same failure by improving the process and design. Read “The Field Guide to Understanding Human Error” by Sidney Dekker http://amzn.to/QFpcqY During my talk, I gave far more discussion on this topic than I’m going to give here.
  • 42. Dealing with problems • Identify & Acknowledge the problem • Don’t punish the reporter • Follow the failure guidelines • Roll-back if necessary & reschedule
  • 43. Post-mortem • What went wrong? • Why? • The ‘Five Whys’ • What went right? • What have we learned?
  • 44. Infrastructure Migrations Thanks for your time. I hope you were able to get something out of it. If you have questions, feel free to contact me: @standaloneSA, standalone.sysadmin@gmail.com