14. The right optimisation is NOT EVIL
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
— Donald Knuth (1974)
26. Reduce Risk: Product Spikes
Product Spike
● A discovery story used to analyse or answer a question
● Yes - further define the story and continue
● No - save the analysis
● Time-boxed
● Quantified against our goals
35. The “Pain” ratio
Find out what percentage of users will experience a given “percentile” time in a user session
p = probability that a single page exceeds the percentile time (e.g. 0.05 for the 95th percentile)
n = average number of pages per session
1 - (1 - p)^n
36. The “Pain” ratio example
Example scenario:
Average number of pages per session = 20 pages
95th percentile total page time for your site = 6 s
1 - (1 - 0.05)^20 = 1 - 0.358 = 64.2%
64% chance that a user will hit a 6 s page
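The “pain” ratio is easy to compute directly; a minimal sketch of the formula above:

```python
def pain_ratio(p, n):
    """Probability that a session of n pages hits at least one page
    slower than the chosen percentile time.
    p = probability a single page exceeds the percentile time
        (e.g. 0.05 for the 95th percentile)
    n = average number of pages per session
    """
    return 1 - (1 - p) ** n

# 95th percentile (p = 0.05), 20 pages per session:
print(round(pain_ratio(0.05, 20), 3))  # 0.642
```

The intuition: each page independently has a (1 - p) chance of being fast, so a whole session of n pages is only fast with probability (1 - p)^n.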
I’m John Clegg
I come from an Ops background
I’ve been involved with building and scaling websites for a long time
I now work at Xero, a company that makes global accounting software
My talk is about “Performance as a feature”
This talk is really about how to make performance “thinking” part of the dev and Ops culture
It’s also the story of my team, the Impossible Mission Force (IMF): the performance and scalability team at Xero.
We’re not a team that focuses exclusively on fixing slow web pages
Our mission is to:
Get the right tooling in place
Create standard metrics for the business
Educate and train teams
Assist teams with learning perf tooling
Our goal is that once the mission is complete, we’ll “self-destruct”
Research shows users hate slow pages
57% of users will abandon a site after 3 seconds
What most users actually do is hit CTRL-T and open something else!
All the stats show us that faster pages = more conversions
In the world of cloud infrastructure
saving ms = saving $$$
We made a change to one of our most popular pages and saved 1.3 seconds = 41,422 minutes of server time saved EVERY day
Most NZ sites are not mobile friendly
Slow websites = higher data usage = $$ cost and a bad user experience
We’ll figure it out when it’s a problem...
Performance tuning happens as the last step.
Or you simply run out of time in the rush to get features out the door.
Sometimes you can’t do that because you need infrastructure changes, and those can take time
Non-functionals like security and performance are often ignored or low priority
The eternal push for features
By the time feature usage ramps up, the team has moved on to the next feature
(Performance as part of V2 of a product)
Minimal metrics, or not the right metrics.
Or customers are telling you that you are slow
You’ve not taken account of product growth
Your metrics become a sea of data and you find it hard to spot issues.
Internal processes need to change when the number of devs increases and teams become distributed (i.e. new offices)
Get data and metrics for your site
We delivered a “State of the Nation” performance report for the business.
Put it in terms that the business can understand
In customer terms - the number of customers who experience a problem every day. Percentages can mask the “real” impact
Customer support terms - e.g. tickets
Seconds of customer experience wasted
One of our favourites
We need to show the progress of what we’re doing
Understand the investment in building metrics and tooling.
We’re always thinking about ROI
We have to be careful we don’t get trapped looking for the perfect solution
There is always low-hanging fruit, and then optimisations get harder and longer
Figure out how you can deliver incremental improvement
Proof-of-concept spikes -
help the business reduce risk and help teams understand effort
Teams need time and resource allocated to measure & test properly
Part of “feature signoff”
In practice this is something that can be measured and tested throughout the development process.
Customers and product needs change
You have to scale your performance metrics and testing to cater to the changes
We want to know where teams were at with performance thinking
So we started with what do they know about their pages
We asked the teams a simple question
How fast are your pages in production?
We got mixed results: some teams knew and some teams didn’t
Who was looking after features that didn’t have active teams?
We realised we needed to surface better metrics to teams
Make all data available and shareable - Datadog + Sumo
Train teams on how to use it - what to look for.
We made templates; teams add application-specific metrics to our templates
We live in data and metrics - knowing what’s important is important
Synthetic vs Real user metrics
I.e. medians, averages, and percentiles can be misleading
Median, average, 95th percentile
Think about worst cases and outliers
“How NOT to Measure Latency” - Gil Tene
Convert metrics to # customers affected
We converted some metrics to a simple traffic light
E.g. page response time to % of customers affected (how many customers are affected - State of the Nation report)
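A traffic-light conversion can be as simple as bucketing a page's percentile response time against agreed thresholds. A sketch, with illustrative thresholds (the actual cut-offs were not stated in the talk):

```python
def traffic_light(p95_ms, green_max=1000, amber_max=3000):
    """Map a page's 95th-percentile response time (in ms) onto a simple
    red/amber/green status. The 1 s and 3 s thresholds here are
    illustrative assumptions, not Xero's real ones."""
    if p95_ms <= green_max:
        return "green"
    if p95_ms <= amber_max:
        return "amber"
    return "red"

print(traffic_light(800))   # green
print(traffic_light(6000))  # red
```

The point is to hide statistical detail behind a status anyone in the business can read at a glance.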
Find out how your code is running on your stack
Application performance monitoring (APM) tools
Dev perspective - tools to help isolate and identify problems
We’ve found the best ROI for these tools is when you are delivering new features and triaging problems
Simplify what’s needed for a team to get started: simple templates and training
On their own environments
We consciously decided NOT to have a dedicated environment for performance testing
You need the ability to test before and after feature changes.
Create a simple template to test before and after and to be able to compare results
It’s important that you can identify subcomponents, e.g. Ajax calls, to isolate potential changes
Performance testing should be a part of the build process.
Devs need to be “flagged” early on that there are performance issues
Feature flagging
Not only the ability to turn a feature on/off
Limit to:
Internal users
A subset of users
Percentages
This enables the devs and business to gain confidence in the quality and perf of a feature
The Scientist pattern
Popularised by GitHub
Run two code paths. Log the results of the second code path
This enables devs to test in production and check results
Really helps with edge cases
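The core of the pattern fits in a few lines. A minimal sketch (GitHub's actual Scientist library also randomises run order, times both paths, and publishes results):

```python
import logging

def experiment(control, candidate, *args):
    """Run the trusted control path and the new candidate path,
    log any mismatch, and always return the control result."""
    expected = control(*args)
    try:
        observed = candidate(*args)
        if observed != expected:
            logging.warning("mismatch: control=%r candidate=%r",
                            expected, observed)
    except Exception:
        # A broken candidate must never break production.
        logging.exception("candidate code path raised")
    return expected

# Hypothetical example: an old and a refactored total calculation.
old_total = lambda items: sum(items)
new_total = lambda items: sum(sorted(items))
print(experiment(old_total, new_total, [3, 1, 2]))  # 6
```

Because the control result is always returned, users are never exposed to the candidate's bugs; mismatches surface the edge cases in logs.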
We don’t know what we don’t know
Facets - Making the training really approachable
Two-phased approach
1st - Introductory: low barrier to entry, practical.
2nd - Workshops to work on their own problems,
aimed at QAs + senior devs
Assist the teams and try not to do the work for them
Attend team reviews, be part of technical kick-off discussions
Promote early discussion of getting performance metrics and testing
Celebrate the wins
Speed Demon award - a 2 kg pack of Jet Planes
The stick - putting warnings into builds and eventually failing builds???
Convert metrics to # customers affected
E.g. page response time to % of customers affected (how many customers are affected - State of the Nation report)
Metrics
Performance testing
GitHub - the story isn’t complete until it’s fast
Performance becomes part of the code-quality discussion
It’s one of the criteria for pull requests
Criteria for build success.
Get them the tools
Metrics
Tools
Training
Carrot and Stick
Prove it’s a problem
Show ROI
Quantify investment
Prove it’s a problem
Show ROI
Reduce risk
Spikes, feature flagging, scientist