3/3/12                                                         No Title




     Care and Feeding of Large Web Applications
     by Perrin Harkins

     So, you launched your website. Congratulations!

     And then there were a bunch of quick fixes. And you started getting traffic so you had to add more
     machines. And some more developers. And more features to keep your new users happy. And suddenly
     you find yourself spending all your time doing damage control on a site that seems to have taken on a life of
     its own and you can't make a new release because the regression testing alone would take three years.

     Usually, this is the part where everyone starts clamoring for a rewrite, and the CEO contemplates firing
     your ass and bringing in an army of consultants to rewrite it all in the flavor of the month.

     How can we avoid this mess? How can we create a web development process that is sustainable for years
     and doesn't hold back development?

     Backstory
     There's more than one way to do it, but I'll tell you how my team did it, at a small startup company called Plus
     Three. Let me give you a few stats about our project:

             About 2.5 years of continuous development

             2 - 5 developers on the team during that time

             65,000+ lines of Perl code

             1600+ lines of SQL

             (Computed with David Wheeler's SLOCCount program)

             Plenty of HTML, CSS, and JavaScript too

             6000+ automated tests in 78 files

             169 CPAN modules

     It's a big system, built to support running websites for political campaigns and non-profit membership
     organizations. Some of the major components are a content management system, an e-commerce system
     with comprehensive reporting, a data warehouse with an AJAX query builder GUI, a large-scale e-mail
     campaign system, a variety of user-facing web apps, and an asynchronous job queue.

     This talk isn't meant to be about coding style, which I've discussed in some previous talks, but I'll give you
     the 10,000 foot overview:

             Object-oriented

             MVC-ish structure with the typical breakdown into controller classes, database classes, and templates.


             (Not very pure MVC, but that's a whole separate topic.)

             Our basic building blocks were CGI::Application, Class::DBI, and HTML::Template.

     Ok, that's the software. How did we keep it under control?

     Deployment
     Let's dive right in by talking about the hardest thing first: deployment. So hard to get right, but so rarely
     discussed and so hard to generalize. Everyone ends up with solutions that are tied very closely to their own
     organization's quirks.

     The first issue here is how to package a release. We used plain old .tar.gz files, built by a simple script after
     pulling a tagged release from our source control system. We tried to always release complete builds, not
     individual files. This is important in order to be sure you have a consistent production system that you can
     rebuild from scratch if necessary. It's also important for setting up QA testing. If you just upload a file here
     and there (or worse, vi a file in production!), you get yourself in a bad state where your source control no
     longer reflects what's really live and your testing misses things because of it. We managed to stick to the
     "full build release" rule, outside of dire emergencies.

     Like most big Perl projects we used a ton of CPAN modules. The first advice you'll get about how to install
     them is "just use the CPAN shell," possibly with a bundle or Task file. This is terrible advice.

     The most obvious problem with it is that as the number of CPAN modules increases, the probability of one
     of them failing to install via the CPAN shell for some obscure and irrelevant reason approaches 1.

     The second most obvious problem is that you don't want to install whatever the latest version of some
     module happens to be -- you want to install the specific version that you've been developing with and that
     you tested in QA. There might be something subtly different about the new version that will break your site.
     Test it first.

     Let me lay out the requirements we had for a CPAN installer:

             Install specific versions.

             Install from local media. Sometimes a huge CPAN download is not convenient.

             Handle versions with local patches. We always submitted our patches, but sometimes we couldn't
             afford to wait for a release that included them.

             Fully automated. That means that modules which ask pesky questions during install must be handled
             in some way. I'm looking at you, WWW::Mechanize.

             Install into a local directory. We don't want to put anything in the system directories because we want
             to be able to run multiple versions of our application on one machine, even if they require different
             versions of the same module.

             Skip the tests. I know this sounds like blasphemy, but bear with me. If you have a cluster of identical
             machines, running all the module tests on all of them is a waste of time. And the larger issue is that
             CPAN authors still don't all agree on what the purpose of tests is. Some modules come with tests that
             are effectively useless or simply fail unless you set up test databases or jump through similar hoops.


     Our solution to the installation problem was to write an automated build system that builds all the modules it
     finds in the src/ directory of our release package. (Note that this means we can doctor one of those modules
     if we have to.) We used the Expect module (which is included and bootstrapped at the beginning of the
     build) and gave it canned answers for the modules with chatty install scripts. We also made it build some
     non-CPAN things we needed: Apache with mod_perl, and the SWISH-E search engine. If we could have
     bundled Perl and MySQL too, that would have been ideal.

     Why bundle the dependencies? Why not just use whatever apache binary we find lying around? In short,
     we didn't want to spend all of our time troubleshooting insane local configurations and builds where
     someone missed a step. A predictable runtime environment is important.

     To stress that point a little more, if your software is an internal application that's going to be run on
     dedicated hardware, you can save yourself a lot of trouble by only supporting very specific configurations.
     Just as an example, only supporting one version of one operating system cuts down the time and resources
     you need for QA testing. To this end, we specified exact versions of Perl, MySQL, Red Hat Linux, and a
     set of required packages and install options in addition to the things we bundled in our releases.

     That was the theory anyway. Reality intruded a bit here in the form of cheap legacy hardware that would
     work with some versions of Red Hat and not others. If we had a uniform cluster of hardware, we could
     have gone as far as creating automated installs, maybe even network booting, but the best we were able to
     do was keep our list of supported OS versions down to a handful. This is also a place where human nature
     can become a problem. If you have a separate sysadmin group, they can get territorial when developers try
     to dictate details of the OS to install. But that's another separate topic.

     The automated build worked out very well. Eventually though, as we added more modules, the builds
     started taking longer than we would have liked. Remember, we built them on every machine. Not the most
     efficient thing to do.

     The obvious next step would be binary distributions, possibly using RPMs, or just tarballs. Not trivial, but
     not too bad if you can insist on one version of Perl and one hardware architecture. If we were only
     concerned about distributing the CPAN modules, it might be possible to use something existing like PAR.

     If you're interested in seeing this build system, the Krang CMS (which we used) comes with a version of it,
     along with a pretty nice automated installer that checks dependencies and can be customized for different
     OSes. (http://krangcms.com/) You could probably make your own for the CPAN stuff using CPANPLUS,
     but you'd still need to do the Expect part and the non-CPAN builds.

     QA
     Upgrades
     We didn't automate upgrades enough. Changes on a production system are tense for everyone, and it's much
     better to have them automated so that you can fully test them ahead of time and make the actual work to be
     done in the upgrade process as dumb as possible. We didn't fully automate this, but we did fully automate
     one of the crucial parts of it: data and database schema upgrades.

     Our procedure was pretty simple, and coincidentally similar to the Ruby on Rails schema upgrade
     approach. We kept the current schema version number in the database and the code version number in the
     release package, and when we ran our upgrade utility it would look for any upgrade scripts with versions
     between the one we were on and the one we wanted to go to. For example, when going from version 2.0 to
     3.0, it would look in the upgrade/ directory (also in our install bundle), find scripts named V2.1 and V3.0,
     and run them in order. Usually they just ran SQL scripts, but sometimes we needed to do some things in
     perl as well.
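The version-walking logic can be sketched in a few lines. This is an illustration in Python, not the real (Perl) upgrade utility, and it uses bare numbers where real code would parse version strings properly:

```python
# Sketch of the upgrade-script selection described above (illustrative only).

def scripts_to_run(installed, target, available):
    """Return upgrade scripts for versions in (installed, target], in order.

    `available` maps a version number to its script name, e.g. {2.1: "V2.1"}.
    """
    versions = sorted(v for v in available if installed < v <= target)
    return [available[v] for v in versions]

# Going from 2.0 to 3.0 finds V2.1 and V3.0 and runs them in that order.
scripts = scripts_to_run(2.0, 3.0, {2.1: "V2.1", 3.0: "V3.0", 1.0: "V1.0"})
print(scripts)  # ['V2.1', 'V3.0']
```

Because the selection is driven entirely by the stored schema version and the scripts shipped in the release, the same upgrade can be rehearsed on QA and replayed identically in production.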

     Our SQL upgrade scripts were written by hand. I tried a couple of schema diffing utilities but they were
     pretty weak. They didn't pick up things like changes in default value for a column, or know what to do with
     changes in foreign keys. Maybe someday someone will make a good one. Even then, it will still require
     some manual intervention when columns and tables get renamed, or a table gets split into multiple tables.

     One cool thing we discovered recently is a decent way to test these upgrades on real data. We always set up
     a QA server with a copy of the current version of the system, and then try our upgrade procedure and
     continue with testing. This works fine except that when you fix a bug and need to do it again, it takes
     forever to set it up again. We tried VMware snapshots, but the disk performance for Linux on VMware
     was so poor that we had to abandon it. Backups over the network seemed like they would take a long time
     to restore. Then we tried LVM, the Linux Logical Volume Manager. It let us take a snapshot just before the upgrade
     test, and then roll back to it almost instantly.
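For reference, the snapshot-and-rollback cycle might look like the following. The volume names are hypothetical, and since these commands need root on a real volume group, the sketch is a dry run that prints each command instead of executing it:

```shell
#!/bin/sh
# Hypothetical LVM snapshot workflow for upgrade testing.
# Dry run: the run() wrapper echoes commands; drop it to execute for real.
run() { echo "+ $*"; }

# 1. Snapshot the database volume just before the upgrade test.
run lvcreate --snapshot --size 10G --name pre_upgrade /dev/vg0/mysql_data

# 2. Run the upgrade and QA testing against the live volume.

# 3. To retry, merge the snapshot back, reverting the volume almost instantly.
run lvconvert --merge vg0/pre_upgrade
```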

     Time-travel bug

     Plugin System
     Harder than it sounds

     Simple factory works for most things

     Configuration
     The trouble with highly configurable software is that someone has to configure it. Our configuration options
     expanded greatly as time went on, and we had to devise ways to make configuring it easier.

     We started with a simple config file containing defaults and comments, like the one that comes with
     Apache. In fact it was very much like that one because we used Config::ApacheFormat.

     In the beginning, this worked fine. Config::ApacheFormat supplied a concept of blocks that inherit from
     surrounding blocks, so that if you have a block for each server and a parameter that applies to all of them,
     you can put it outside of those blocks and avoid repeating it. You can even override that parameter in the
     one server that needs something different.
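For illustration, an Apache-format file using that block inheritance might look like this (the server names and parameters are invented, not the project's real configuration):

```apache
# SMTPServer set at the top level applies to every block below.
SMTPServer localhost

<Server www.example.org>
    DomainName www.example.org
</Server>

<Server shop.example.org>
    DomainName shop.example.org
    SMTPServer mail.example.org   # overrides the shared default
</Server>
```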

     As the number of parameters grew, we realized a few things:

             People will ignore configuration options they don't understand. The expectation is that if the server
             starts, it must be okay.

             A few lines of comments in a config file is pretty weak documentation.

             Long config files full of things that you hardly ever need to change are pointless and look daunting.

     To deal with these problems, we started making extensive use of default values, so that things that didn't
     usually get changed could be left out of the file. We ended up creating a fairly complex config system in
     order to keep the file short. It does things like default several values based on one setting, e.g. setting the
     domain name for a server allows it to default the cookie domain, the e-mail account to use as the From
     address on site-related mail, etc.
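A toy version of that defaulting scheme, with invented key names (the real system's keys and rules are more involved than this):

```python
# Illustrative cascade of defaults from one setting; key names are made up.

def with_defaults(conf):
    conf = dict(conf)
    domain = conf["DomainName"]
    # Anything not set explicitly is derived from the domain name.
    conf.setdefault("CookieDomain", "." + domain)
    conf.setdefault("FromAddress", "noreply@" + domain)
    return conf

print(with_defaults({"DomainName": "example.org"})["FromAddress"])
# noreply@example.org
```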

     Of course this created the need to see what all of the values were defaulting to, so we wrote a config
     dumper utility.

     By the time we were done, we had moved to a level where using one of the complex config modules like
     Config::Scoped probably would have been a better choice than maintaining our own. Well, Config::Scoped
     still scares me, but something along those lines.

     Testing
     You all know the deal with testing. You have to have it. It's your only hope of being able to change the
     code later without breaking everything. This point became very clear to me when I did a couple of big
     refactorings and the test suite found all kinds of problems I missed on my own.

     For any large application, you'll probably end up needing some local test libraries that save setup work in
     your test scripts. Ours had functions for doing common things like getting a WWW::Mechanize object all
     logged in and ready to go.

     When you're testing a large database-driven application, you need some strategies for generating and
     cleaning up test data. We created a module for this called Arcos::TestData. (Arcos is the name of the
     project.) The usage is like this:

         my $creator = Arcos::TestData->new();
         END { $creator->cleanup() }

         # create an Arcos::DB::ContactInfo
         my $contact_info = $creator->create_contact_info();

         # create one with some values specified
         my $aubrey = $creator->create_contact_info(first_name => 'George',
                                                    occupation => 'housecat');

     This one is simple, but some of them will create a whole tree of dependent objects with default values to
     avoid needing to code all that in your test. When the END block runs, it deletes all the registered objects in
     reverse order, to avoid referential integrity problems.
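The same pattern can be sketched in Python, purely as an illustration (the real Arcos::TestData was Perl, and all the names here are invented):

```python
# Toy version of the create-and-register pattern: created rows are
# remembered in order and deleted in reverse on cleanup, so child rows
# go away before the parent rows they reference.

class TestDataCreator:
    def __init__(self, delete_fn):
        self._delete = delete_fn   # e.g. issues a DELETE against the test DB
        self._rows = []

    def create(self, table, **values):
        row = (table, values)
        self._rows.append(row)
        return row

    def cleanup(self):
        for table, values in reversed(self._rows):
            self._delete(table, values)
        self._rows = []

deleted = []
creator = TestDataCreator(lambda table, values: deleted.append(table))
creator.create("contact_info", first_name="George")
creator.create("order", contact_id=1)   # references the contact row
creator.cleanup()
print(deleted)  # ['order', 'contact_info'] -- the child row goes first
```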

     This seemed very clever at the time. However, after a while there were many situations that required special
     handling, like web-based tests that cause objects to be created by another process. We had solutions for
     each one, but they took programmer time, and at this point I think it might have been smarter to simply wipe
     the whole schema at the end of a test script. We could have just truncated all the non-lookup tables pretty
     quickly.

     We got a lot of mileage out of Test::WWW::Mechanize.

     Test::Class helps similar classes

     Testing web interfaces - Mech tricks - Selenium

     Smolder

     Testing difficult things



     Code Formatting
     This was the first project I worked on where we had an official Perl::Tidy spec and we all used it. Can I just
     say it was awesome? That's all I wanted to say about it. Developers who worked on Perl::Tidy, you have
     my thanks.

     Version Control
     A couple of years ago, only crackpots had opinions about version control. CVS was the only game in town.
     These days, there are several good open source choices, and everyone wants to tell you about their favorite
     and why yours is crap.

     I'm not going to go into the choice of tools too much here. You can fight that out amongst yourselves. We
     used Subversion, but I'll try to talk about the theory without getting bogged down in the mechanics.

     Most projects need at least two branches: one for maintenance of the release currently in production, and
     one for new development. Most of you are familiar with this from open source projects.

     Here are the main ideas we used for source control:

             The main branch is for new development, but must be stable. Code should not be checked in until
             all tests pass. (But more about that later.)

             When you make a release of the main branch, tag it. That means tagging the whole branch at that
             point. Example: tag release 2.0. The main branch is now for development of 3.0.

             For each main branch release, make a maintenance branch from the point where you tagged it.
             Example: make a "2.x" branch for fixing bugs that show up in production.

             When you make a bug fix release from a maintenance branch, tag the branch and then merge all
             changes since the last release on that branch to the main branch. This is the only merging ever done
             and it's always a merge of changes from one sequentially numbered tag to the next and into the main
             branch. Example: tag the 2.x branch bug fix release as 2.1. Merge all changes from 2.0 to 2.1 to the
             main development branch.
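In Subversion terms, the scheme above might look like the following. The repository URL and version numbers are illustrative, and the commands are echoed as a dry run rather than executed:

```shell
#!/bin/sh
# Sketch of the tag/branch/merge workflow (dry run: run() echoes commands).
run() { echo "+ $*"; }
REPO=http://svn.example.com/myapp

# Release 2.0: tag the main branch, then cut a maintenance branch from it.
run svn copy "$REPO/trunk" "$REPO/tags/2.0" -m "Tag release 2.0"
run svn copy "$REPO/tags/2.0" "$REPO/branches/2.x" -m "2.x maintenance branch"

# Bug fix release: tag 2.1 on the branch, then merge 2.0..2.1 to the main
# branch (the only merging ever done, always tag-to-tag into trunk).
run svn copy "$REPO/branches/2.x" "$REPO/tags/2.1" -m "Tag bug fix release 2.1"
run svn merge "$REPO/tags/2.0" "$REPO/tags/2.1" trunk-working-copy
```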

     This is about as simple as you can make it, and it worked very well for us for a long time. Eventually
     though, we discovered situations that didn't fit nicely. One of these was that sometimes there was a period
     of a few days during QA where part of the team would still be working on bug fixes on the development
     branch while others were ready to move on to working on features for the next major release. You can't do
     both in the same place. One solution is to create the maintenance branch at that point, for doing the final
     pre-release bug fixes, and let the main branch open up for major new development. It's a bad sign if you
     need to do this often. Usually the team should be sharing things evenly enough to make it unnecessary.

     Another problem, although less frequent than you might expect, is keeping the development branch stable at
     all times. Some changes are too big to be done safely as a single commit. At that point it becomes necessary
     to make a feature branch, working on it until the new feature is stable and all tests are passing again, and
     then merging it back to the main development branch.

     Beware of complicated merging, whether your tools support it well or not. A web app is not the Linux
     kernel. If you find yourself needing to do bidirectional merges or frequent repeated merges to the point
     where you have trouble keeping track of what's been merged, you may need to take a look at your process
     and see if there's some underlying reason. Maybe the source control system is being used as a substitute for
     basic personal communication on your team, or has become a battleground for warring factions. Some
     problems are easier to solve by talking to your co-workers than by devising a complex branching scheme.




 
PVS-Studio confesses its love for Linux
PVS-Studio confesses its love for LinuxPVS-Studio confesses its love for Linux
PVS-Studio confesses its love for Linux
 
DevOps demystified
DevOps demystifiedDevOps demystified
DevOps demystified
 
Deploying systems using AWS DevOps tools
Deploying systems using AWS DevOps toolsDeploying systems using AWS DevOps tools
Deploying systems using AWS DevOps tools
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And Scalability
 
ServerTemplate Deep Dive
ServerTemplate Deep DiveServerTemplate Deep Dive
ServerTemplate Deep Dive
 
Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh VariaCloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
 
Function as a Service
Function as a ServiceFunction as a Service
Function as a Service
 
Mage Titans USA 2016 - Jonathan Bownds - Magento CI and Testing
Mage Titans USA 2016 - Jonathan Bownds - Magento CI and Testing Mage Titans USA 2016 - Jonathan Bownds - Magento CI and Testing
Mage Titans USA 2016 - Jonathan Bownds - Magento CI and Testing
 
System design for Web Application
System design for Web ApplicationSystem design for Web Application
System design for Web Application
 
DevOps and Build Automation
DevOps and Build AutomationDevOps and Build Automation
DevOps and Build Automation
 

More from Perrin Harkins

PyGotham 2014 Introduction to Profiling
PyGotham 2014 Introduction to ProfilingPyGotham 2014 Introduction to Profiling
PyGotham 2014 Introduction to ProfilingPerrin Harkins
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsPerrin Harkins
 
Efficient Shared Data in Perl
Efficient Shared Data in PerlEfficient Shared Data in Perl
Efficient Shared Data in PerlPerrin Harkins
 
Choosing a Templating System
Choosing a Templating SystemChoosing a Templating System
Choosing a Templating SystemPerrin Harkins
 
Scaling Databases with DBIx::Router
Scaling Databases with DBIx::RouterScaling Databases with DBIx::Router
Scaling Databases with DBIx::RouterPerrin Harkins
 
Care and Feeding of Large Web Applications
Care and Feeding of Large Web ApplicationsCare and Feeding of Large Web Applications
Care and Feeding of Large Web ApplicationsPerrin Harkins
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance TipsPerrin Harkins
 
The Most Common Template Toolkit Mistake
The Most Common Template Toolkit MistakeThe Most Common Template Toolkit Mistake
The Most Common Template Toolkit MistakePerrin Harkins
 

More from Perrin Harkins (9)

PyGotham 2014 Introduction to Profiling
PyGotham 2014 Introduction to ProfilingPyGotham 2014 Introduction to Profiling
PyGotham 2014 Introduction to Profiling
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applications
 
Efficient Shared Data in Perl
Efficient Shared Data in PerlEfficient Shared Data in Perl
Efficient Shared Data in Perl
 
Choosing a Templating System
Choosing a Templating SystemChoosing a Templating System
Choosing a Templating System
 
Scaling Databases with DBIx::Router
Scaling Databases with DBIx::RouterScaling Databases with DBIx::Router
Scaling Databases with DBIx::Router
 
Low-Maintenance Perl
Low-Maintenance PerlLow-Maintenance Perl
Low-Maintenance Perl
 
Care and Feeding of Large Web Applications
Care and Feeding of Large Web ApplicationsCare and Feeding of Large Web Applications
Care and Feeding of Large Web Applications
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance Tips
 
The Most Common Template Toolkit Mistake
The Most Common Template Toolkit MistakeThe Most Common Template Toolkit Mistake
The Most Common Template Toolkit Mistake
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Care and feeding notes

This talk isn't meant to be about coding style, which I've discussed in some previous talks, but I'll give you the 10,000-foot overview: an object-oriented, MVC-ish structure with the typical breakdown into controller classes, database classes, and templates.
(Not very pure MVC, but that's a whole separate topic.) Our basic building blocks were CGI::Application, Class::DBI, and HTML::Template.

Ok, that's the software. How did we keep it under control?

Deployment

Let's dive right in by talking about the hardest thing first: deployment. So hard to get right, but so rarely discussed and so hard to generalize. Everyone ends up with solutions that are tied very closely to their own organization's quirks.

The first issue here is how to package a release. We used plain old .tar.gz files, built by a simple script after pulling a tagged release from our source control system. We tried to always release complete builds, not individual files. This is important in order to be sure you have a consistent production system that you can rebuild from scratch if necessary. It's also important for setting up QA testing. If you just upload a file here and there (or worse, vi a file in production!), you get yourself into a bad state where your source control no longer reflects what's really live and your testing misses things because of it. We managed to stick to the "full build release" rule, outside of dire emergencies.

Like most big Perl projects, we used a ton of CPAN modules. The first advice you'll get about how to install them is "just use the CPAN shell," possibly with a bundle or Task file. This is terrible advice. The most obvious problem with it is that as the number of CPAN modules increases, the probability of one of them failing to install via the CPAN shell for some obscure and irrelevant reason approaches 1. The second most obvious problem is that you don't want to install whatever the latest version of some module happens to be -- you want to install the specific version that you've been developing with and that you tested in QA. There might be something subtly different about the new version that will break your site. Test it first.
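The packaging step can be sketched as a short shell script. This is a minimal sketch, not the actual Plus Three build script; the project name, repository URL, and tree layout are invented, and the source-control export is shown as a comment so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch of a "complete build" packaging script. In real life the tree
# would come from a tagged release in source control, e.g.:
#   svn export http://svn.example.com/arcos/tags/2.1 arcos-2.1
# Here we fake the exported tree so the script is self-contained.
set -e
VERSION=2.1
PKG="arcos-$VERSION"

mkdir -p "$PKG/lib" "$PKG/src" "$PKG/upgrade"
echo 'package Arcos; 1;' > "$PKG/lib/Arcos.pm"

# Always ship the whole tree as one tarball -- never individual files.
tar czf "$PKG.tar.gz" "$PKG"
tar tzf "$PKG.tar.gz"
```

The point is that the tarball is always built from a tag, never assembled by hand, so QA tests exactly what production will run.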
Let me lay out the requirements we had for a CPAN installer:

        Install specific versions.

        Install from local media. Sometimes a huge CPAN download is not convenient.

        Handle versions with local patches. We always submitted our patches, but sometimes we couldn't afford to wait for a release that included them.

        Fully automated. That means that modules which ask pesky questions during install must be handled in some way. I'm looking at you, WWW::Mechanize.

        Install into a local directory. We don't want to put anything in the system directories, because we want to be able to run multiple versions of our application on one machine, even if they require different versions of the same module.

        Skip the tests. I know this sounds like blasphemy, but bear with me. If you have a cluster of identical machines, running all the module tests on all of them is a waste of time. And the larger issue is that CPAN authors still don't all agree on what the purpose of tests is. Some modules come with tests that are effectively useless, or that simply fail unless you set up test databases or jump through similar hoops.
Our solution to the installation problem was to write an automated build system that builds all the modules it finds in the src/ directory of our release package. (Note that this means we can doctor one of those modules if we have to.) We used the Expect module (which is included and bootstrapped at the beginning of the build) and gave it canned answers for the modules with chatty install scripts. We also made it build some non-CPAN things we needed: Apache and mod_perl, and the SWISH-E search engine. If we could have bundled Perl and MySQL too, that would have been ideal.

Why bundle the dependencies? Why not just use whatever Apache binary we find lying around? In short, we didn't want to spend all of our time troubleshooting insane local configurations and builds where someone missed a step. A predictable runtime environment is important.

To stress that point a little more: if your software is an internal application that's going to be run on dedicated hardware, you can save yourself a lot of trouble by only supporting very specific configurations. Just as an example, only supporting one version of one operating system cuts down the time and resources you need for QA testing. To this end, we specified exact versions of Perl, MySQL, Red Hat Linux, and a set of required packages and install options, in addition to the things we bundled in our releases.

That was the theory, anyway. Reality intruded a bit here in the form of cheap legacy hardware that would work with some versions of Red Hat and not others. If we had a uniform cluster of hardware, we could have gone as far as creating automated installs, maybe even network booting, but the best we were able to do was keep our list of supported OS versions down to a handful. This is also a place where human nature can become a problem: if you have a separate sysadmin group, they can get territorial when developers try to dictate details of the OS to install. But that's another separate topic.
The automated build worked out very well. Eventually, though, as we added more modules, the builds started taking longer than we would have liked. Remember, we built them on every machine -- not the most efficient thing to do. The obvious next step would be binary distributions, possibly using RPMs, or just tarballs. Not trivial, but not too bad if you can insist on one version of Perl and one hardware architecture. If we were only concerned about distributing the CPAN modules, it might be possible to use something existing like PAR.

If you're interested in seeing this build system, the Krang CMS (which we used) comes with a version of it, along with a pretty nice automated installer that checks dependencies and can be customized for different OSes. (http://krangcms.com/) You could probably make your own for the CPAN stuff using CPANPLUS, but you'd still need to do the Expect part and the non-CPAN builds.

QA

Upgrades

We didn't automate upgrades enough. Changes on a production system are tense for everyone, and it's much better to have them automated so that you can fully test them ahead of time and make the actual work to be done in the upgrade process as dumb as possible. We didn't fully automate this, but we did fully automate one of the crucial parts of it: data and database schema upgrades.

Our procedure was pretty simple, and coincidentally similar to the Ruby on Rails schema upgrade approach. We kept the current schema version number in the database and the code version number in the release package, and when we ran our upgrade utility it would look for any upgrade scripts with versions between the one we were on and the one we wanted to go to. For example, when going from version 2.0 to
3.0, it would look in the upgrade/ directory (also in our install bundle), find scripts named V2.1 and V3.0, and run them in order. Usually they just ran SQL scripts, but sometimes we needed to do some things in Perl as well.

Our SQL upgrade scripts were written by hand. I tried a couple of schema-diffing utilities, but they were pretty weak. They didn't pick up things like changes in the default value for a column, or know what to do with changes in foreign keys. Maybe someday someone will make a good one. Even then, it will still require some manual intervention when columns and tables get renamed, or a table gets split into multiple tables.

One cool thing we discovered recently is a decent way to test these upgrades on real data. We always set up a QA server with a copy of the current version of the system, and then try our upgrade procedure and continue with testing. This works fine, except that when you fix a bug and need to do it again, it takes forever to set it up again. We tried VMware snapshots, but the disk performance for Linux on VMware was so poor that we had to abandon it. Backups over the network seemed like they would take a long time to restore. Then we tried LVM, the Linux Logical Volume Manager. It let us take a snapshot just before the upgrade test, and then roll back to it almost instantly.

        Time-travel bug

Plugin System

        Harder than it sounds

        Simple factory works for most things

Configuration

The trouble with highly configurable software is that someone has to configure it. Our configuration options expanded greatly as time went on, and we had to devise ways to make configuring it easier. We started with a simple config file containing defaults and comments, like the one that comes with Apache. In fact, it was very much like that one, because we used Config::ApacheFormat. In the beginning, this worked fine.
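The version-ordered upgrade runner can be sketched in a few lines of shell. The upgrade/ layout and script names follow the description above, but the script contents here are invented stand-ins, and the version comparison is deliberately simplistic (it treats versions as decimal numbers, which real tooling shouldn't).

```shell
#!/bin/sh
# Sketch of the upgrade runner. CURRENT would come from a version number
# stored in the database; TARGET from the release package.
set -e
CURRENT=2.0
TARGET=3.0

# Fake upgrade scripts so the sketch is self-contained. The real ones
# mostly ran SQL, with the occasional bit of Perl.
mkdir -p upgrade
echo 'echo "applying V2.1"' > upgrade/V2.1
echo 'echo "applying V3.0"' > upgrade/V3.0

# Run every script with CURRENT < version <= TARGET, in order.
: > upgrade.log
for script in $(ls upgrade | sort -t V -k 2 -g); do
  ver=${script#V}
  newer=$(printf '%s\n%s\n' "$CURRENT" "$ver" | sort -g | tail -n 1)
  in_range=$(printf '%s\n%s\n' "$ver" "$TARGET" | sort -g | head -n 1)
  if [ "$newer" = "$ver" ] && [ "$ver" != "$CURRENT" ] && [ "$in_range" = "$ver" ]; then
    sh "upgrade/$script" | tee -a upgrade.log
  fi
done
```

Starting from 2.0 with a target of 3.0, this applies V2.1 and then V3.0, skipping anything at or below the current version.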
Config::ApacheFormat supplied a concept of blocks that inherit from surrounding blocks, so that if you have a block for each server and a parameter that applies to all of them, you can put it outside of those blocks and avoid repeating it. You can even override that parameter in the one server that needs something different.

As the number of parameters grew, we realized a few things:

        People will ignore configuration options they don't understand.

        Expectations are that if the server starts, it must be okay.

        A few lines of comments in a config file is pretty weak documentation.

        Long config files full of things that you hardly ever need to change are pointless and look daunting.

To deal with these problems, we started making extensive use of default values, so that things that didn't usually get changed could be left out of the file. We ended up creating a fairly complex config system in order to keep the file short. It does things like defaulting several values based on one setting, e.g. setting the domain name for a server allows it to default the cookie domain, the e-mail account to use as the From
address on site-related mail, etc. Of course, this created the necessity to see what all of the values were defaulting to, so a config dumper utility was created. By the time we were done, we had moved to a level where using one of the complex config modules like Config::Scoped probably would have been a better choice than maintaining our own. Well, Config::Scoped still scares me, but something along those lines.

Testing

You all know the deal with testing. You have to have it. It's your only hope of being able to change the code later without breaking everything. This point became very clear to me when I did a couple of big refactorings and the test suite found all kinds of problems I missed on my own.

For any large application, you'll probably end up needing some local test libraries that save setup work in your test scripts. Ours had functions for doing common things like getting a WWW::Mechanize object all logged in and ready to go.

When you're testing a large database-driven application, you need some strategies for generating and cleaning up test data. We created a module for this called Arcos::TestData. (Arcos is the name of the project.) The usage is like this:

    my $creator = Arcos::TestData->new();
    END { $creator->cleanup() }

    # create an Arcos::DB::ContactInfo
    my $contact_info = $creator->create_contact_info();

    # create one with some values specified
    my $aubrey = $creator->create_contact_info(
        first_name => 'George',
        occupation => 'housecat',
    );

This one is simple, but some of them will create a whole tree of dependent objects with default values, to avoid needing to code all that in your test. When the END block runs, it deletes all the registered objects in reverse order, to avoid referential integrity problems. This seemed very clever at the time. However, after a while there were many situations that required special handling, like web-based tests that cause objects to be created by another process.
We had solutions for each one, but they took programmer time, and at this point I think it might have been smarter to simply wipe the whole schema at the end of a test script. We could have just truncated all the non-lookup tables pretty quickly.

We got a lot of mileage out of Test::WWW::Mechanize.

        Test::Class helps similar classes

        Testing web interfaces - Mech tricks - Selenium

        Smolder

        Testing difficult things
Code Formatting

This was the first project I worked on where we had an official Perl::Tidy spec and we all used it. Can I just say it was awesome? That's all I wanted to say about it. Developers who worked on Perl::Tidy, you have my thanks.

Version Control

A couple of years ago, only crackpots had opinions about version control. CVS was the only game in town. These days, there are several good open source choices, and everyone wants to tell you about their favorite and why yours is crap. I'm not going to go into the choice of tools too much here; you can fight that out amongst yourselves. We used Subversion, but I'll try to talk about the theory without getting bogged down in the mechanics.

Most projects need at least two branches: one for maintenance of the release currently in production, and one for new development. Most of you are familiar with this from open source projects. Here are the main ideas we used for source control:

        The main branch is for new development, but must be stable. Code should not be checked in until all tests pass. (But more about that later.)

        When you make a release of the main branch, tag it. That means tagging the whole branch at that point. Example: tag release 2.0. The main branch is now for development of 3.0.

        For each main branch release, make a maintenance branch from the point where you tagged it. Example: make a "2.x" branch for fixing bugs that show up in production.

        When you make a bug fix release from a maintenance branch, tag the branch and then merge all changes since the last release on that branch to the main branch. This is the only merging ever done, and it's always a merge of changes from one sequentially numbered tag to the next, into the main branch. Example: tag the 2.x branch bug fix release as 2.1. Merge all changes from 2.0 to 2.1 to the main development branch.

This is about as simple as you can make it, and it worked very well for us for a long time.
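To make the recipe concrete, here is one release cycle spelled out as Subversion commands. The repository URL is invented, and the script only prints the commands rather than touching a repository, so treat it as a sketch of the flow, not tested tooling.

```shell
#!/bin/sh
# Prints the svn commands for one release cycle of the scheme above.
# REPO is a made-up URL; this is a dry run, nothing is executed.
REPO=http://svn.example.com/arcos
run() { echo "$@" | tee -a release-commands.txt; }
: > release-commands.txt

# 1. Tag release 2.0 from the main development branch (trunk).
run svn copy "$REPO/trunk" "$REPO/tags/2.0" -m "Tag release 2.0"

# 2. Make the maintenance branch from the same point.
run svn copy "$REPO/tags/2.0" "$REPO/branches/2.x" -m "Maintenance branch for 2.x"

# 3. Later: tag the bug-fix release on the maintenance branch...
run svn copy "$REPO/branches/2.x" "$REPO/tags/2.1" -m "Tag bug-fix release 2.1"

# 4. ...and merge everything between the two tags back to trunk
#    (run from a working copy of trunk).
run svn merge "$REPO/tags/2.0" "$REPO/tags/2.1" .
```

Because every merge goes from one numbered tag to the next, there's never any doubt about what has already been merged.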
Eventually, though, we discovered situations that didn't fit nicely. One of these was that sometimes there was a period of a few days during QA where part of the team would still be working on bug fixes on the development branch while others were ready to move on to working on features for the next major release. You can't do both in the same place. One solution is to create the maintenance branch at that point, for doing the final pre-release bug fixes, and let the main branch open up for major new development. It's a bad sign if you need to do this often; usually the team should be sharing things evenly enough to make it unnecessary.

Another problem, although less frequent than you might expect, is keeping the development branch stable at all times. Some changes are too big to be done safely as a single commit. At that point it becomes necessary to make a feature branch, working on it until the new feature is stable and all tests are passing again, and then merging it back to the main development branch.

Beware of complicated merging, whether your tools support it well or not. A web app is not the Linux kernel. If you find yourself needing to do bidirectional merges or frequent repeated merges to the point
where you have trouble keeping track of what's been merged, you may need to take a look at your process and see if there's some underlying reason. Maybe the source control system is being used as a substitute for basic personal communication on your team, or has become a battleground for warring factions. Some problems are easier to solve by talking to your co-workers than by devising a complex branching scheme.