New Relic - May 2015 Meetup @ thetrainline

trainline Engineering

Improving customer happiness by Paul Kiddie (@pkiddie) for the New Relic May 2015 meetup hosted by thetrainline

Improving customer happiness
at thetrainline with New Relic
Paul Kiddie @pkiddie
@ttl_engineering

What we do
Train companies
Small business
Mobile Apps
Consumer Website
Services

Some vitals
• ~40 Environments
• over 1000 servers
• over 100 products
• Windows/.NET
• New Relic .NET agent / Server Monitor
• Automation is key!

Before New Relic
• Application errors logged to disk
• Production support team look at logs
– After production issue identified from customer
reports
– After platform release to check change in patterns
• Ad-hoc and reactive
• Errors difficult to reproduce as usually
hours/days after the event and out of context

Introducing New Relic at thetrainline
• Zero capital outlay, subscription model, up and
running in an hour
• Identified a product: leisure website
• Continuous delivery pipeline with blue/green
deployments to all environments
• Needed solution for continuous monitoring

Introducing New Relic at thetrainline
• New Relic agent / server monitor part of
webserver recipe
• Deployed with high security enabled
• Out of the box
– Near-real time error logging / alerting
– Application / end-user performance
– Deployment markers
– User funnels
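
As a rough illustration of the deployment markers listed above, a delivery pipeline could record one after each blue/green switch by posting to the New Relic REST API v2 deployments endpoint. This is a minimal sketch, not thetrainline's pipeline code: the function name, the application id, the API key and the revision string are all hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def deployment_marker_request(application_id, api_key, revision, description=""):
    """Build (but do not send) a POST that records a deployment marker."""
    # Payload shape assumed from the REST API v2 deployments endpoint.
    body = {"deployment": {"revision": revision, "description": description}}
    return urllib.request.Request(
        f"https://api.newrelic.com/v2/applications/{application_id}/deployments.json",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
request = deployment_marker_request(42, "YOUR_API_KEY", "build-17", "blue/green switch")
# urllib.request.urlopen(request) would actually send it.
```

Building the request separately from sending it keeps the pipeline step easy to test without network access.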

Immediate value
• Error rate as a team key performance
indicator
• Drive down error rate through weekly health
checks
• Remediate top three errors by adding directly
to dev team backlog
• Stack traces visible and actionable by
developers without further analysis

https://api.newrelic.com/v2/applications/{application_id}/metrics/data.json

Taking it further
• Roll out New Relic across all machines in all
environments
– New machines created by Chef automation install
New Relic by default
– Else use SCCM to manage installation
Application/server monitoring built in and
zero effort for dev teams

Taking it further
Custom attributes
• Mimic high security mode in newrelic.config
– Create and deploy Chocolatey package through Chef /
SCCM
• Observations:
– New Relic .NET agent doesn’t check in to verify
highSecurity setting matches once it has started
<highSecurity enabled="true" />

More value…
• Use custom attributes to augment Transaction
and PageView events with more information
to form other business metrics.
• Phoenix’s real-time payments dashboard
– Spread of payment methods
– Effect of payment outages

Users of New Relic at thetrainline
• Monitoring/Production Support for near
real time running health of system
• Product owners home in and use funnels to
prioritise product spend
• Developers get rapid feedback on new
features
• Management get a holistic view of the
system through the map feature

What we’d like to see
• JavaScript errors in Insights
• Better JavaScript stack traces
• Per application retention period in Insights
• .NET async support

What’s next
• More custom attributes!
• Develop and run Node web apps in
production
– use New Relic node.js agent
– different deployment model, bundle agent/config
with the app
• Monitoring RabbitMQ instances

Editor's Notes

Well, firstly I’d like to welcome you all to thetrainline offices. I’m Paul Kiddie, a web developer on the Tango team at thetrainline. This evening we’ll be talking about how we’ve used New Relic to improve customer happiness, and how we use the insights it gives us to home in on the things that matter. You can get me on @pkiddie, or the trainline engineering Twitter account at @ttl_engineering.
So, to set the scene a little, these are our core business areas. Our development teams are aligned to each of these business areas. We work with several train companies to provide the booking engine for them. We have an apps team responsible for the iOS and Android mobile apps. We also provide personalised booking engines for business, and underpinning all these are a set of core platform services. I work on the consumer website [CLICK], as part of the Tango team.
Just to give you an idea of the size of our infrastructure, these are some vitals. We have 40 environments including test environments, totalling 1000 servers, across which there are approximately 100 distinct deployable applications and services. Our stack is primarily .NET on Windows, so we’ve been using the .NET agent and server monitor for Windows. With a sizable estate we love automation for testability and repeatability.
So, the state of the world before New Relic was a dark place. Our applications logged errors to disk, but unless a production issue was identified (most likely by customers spotting it) or a platform release had just gone out, these logs were left on disk and provided little or no value. This was very reactive: the lead time for analysis was hours or days after the event, usually out of context, and so errors were difficult for developers to reproduce and investigate.
Picking New Relic was an easy choice for us, since there was zero capital outlay to get us up and running, which we managed within an hour in our test environments. We introduced New Relic by taking a product, the main website at www.thetrainline.com, ripping it out of the legacy platform release cycle and building a continuous delivery pipeline with blue/green deployments. But this required a solution to provide us with continuous monitoring, especially during switches to new builds of the website, to spot any problems before the customer does, by freeing errors from the logs on disk.
Through infrastructure automation the New Relic .NET agent and the server monitor were made part of a default application server build. We deployed with high security on to provide some guarantees that no sensitive data would be sent to New Relic, and we’ve been enjoying the benefits since: alerting for JavaScript/app errors for immediate rollback or fix then re-deploy; performance metrics around business logic and end user speed; deployment markers (which appear as vertical lines on most of New Relic’s graphs); and funnels.
Like this one, which lets us home in on the parts of the booking flow we should be paying most attention to. Note the Insights query actually uses our own session identifier, passed in as a custom attribute. This is so we can guarantee we can reason over a user’s journey, since the New Relic session might be blocked by default browser policies, for example on iOS devices, where you’d get a new session per pageview.
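
To make the funnel idea concrete, here is a minimal sketch of how such an Insights (NRQL) funnel query could be assembled around a custom session attribute. The attribute name `ttlSessionId`, the step labels and the URL fragments are hypothetical and do not come from the talk; only the general `funnel(...)` form over `PageView` events and the standard `pageUrl` attribute are assumed from NRQL itself.

```python
def build_funnel_nrql(session_attribute, steps, since="1 week ago"):
    """Build an NRQL funnel query over PageView events.

    steps is a list of (label, url_fragment) pairs in journey order.
    """
    clauses = ", ".join(
        f"WHERE pageUrl LIKE '%{fragment}%' AS '{label}'"
        for label, fragment in steps
    )
    return f"SELECT funnel({session_attribute}, {clauses}) FROM PageView SINCE {since}"


# Hypothetical attribute name and URL fragments, for illustration only.
query = build_funnel_nrql(
    "ttlSessionId",
    [("Search", "/search"), ("Basket", "/basket"), ("Payment", "/payment")],
)
print(query)
```

Keying the funnel on an application-owned identifier rather than the browser session is what lets each step count the same visitor even when the browser hands out a new session per page view.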
So now we’ve got these metrics, we use some of them as key performance indicators for the team. For example, we have a target error rate to achieve, set per quarter. We have weekly health checks where the New Relic data takes centre stage. We’ve hooked into the API to get end-of-week error rates over the last six months (to give values like the ones in the weekly email) and plotted them. We then take the top three errors and add them to our backlog. These backlog items contain a link back to New Relic with a stack trace [most of the time], so developers get a head start and are able to FOCUS on the fix rather than doing manual, repetitive analysis.
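
A minimal sketch of the kind of API call described above, using the metrics/data.json endpoint shown on the slide. The metric names (`Errors/all`, `HttpDispatcher`), the value names, the response shape and the error-rate formula (errors divided by web requests) are assumptions based on the public REST API v2, not details given in the talk; the application id and the sample payload are made up. A real request would also need an `X-Api-Key` header.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.newrelic.com/v2/applications"


def metrics_url(application_id, start, end):
    """Build the metrics/data.json URL for one summarised time window."""
    params = [
        ("names[]", "Errors/all"),
        ("names[]", "HttpDispatcher"),
        ("values[]", "error_count"),
        ("values[]", "call_count"),
        ("from", start),
        ("to", end),
        ("summarize", "true"),
    ]
    return f"{API_ROOT}/{application_id}/metrics/data.json?{urlencode(params)}"


def error_rate(payload):
    """Return errors as a percentage of web requests from a decoded response."""
    totals = {}
    for metric in payload["metric_data"]["metrics"]:
        for timeslice in metric["timeslices"]:
            for key, value in timeslice["values"].items():
                name = (metric["name"], key)
                totals[name] = totals.get(name, 0) + value
    calls = totals.get(("HttpDispatcher", "call_count"), 0)
    errors = totals.get(("Errors/all", "error_count"), 0)
    return 100.0 * errors / calls if calls else 0.0


# Made-up payload in the assumed response shape, for illustration only.
sample = {
    "metric_data": {
        "metrics": [
            {"name": "Errors/all", "timeslices": [{"values": {"error_count": 12}}]},
            {"name": "HttpDispatcher", "timeslices": [{"values": {"call_count": 4800}}]},
        ]
    }
}
url = metrics_url(123, "2015-05-04T00:00:00+00:00", "2015-05-11T00:00:00+00:00")
rate = error_rate(sample)
```

Calling `metrics_url` once per week and feeding each decoded response to `error_rate` would give one point per week to plot.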
And this is the result of our work. Unfortunately, the graph doesn’t go back any further than this otherwise you’d be seeing a headline error rate of 0.5%...
We’ve been happily using New Relic in the Tango team for months and months, so the next step was to roll it out across the entire estate. We’re big fans of Chocolatey for managing package installs in Windows, so the Chef recipes that provision our servers install our New Relic Chocolatey package by default. For legacy boxes (which we’re in the process of replacing with automated provisioning) we’re using Microsoft’s System Center to manage its install. Either way, dev teams now get the benefit of application and server monitoring with zero effort. We’re poking in a custom application name for each of our products at deployment time so we don’t have applications reporting in as “My Application”. Now that we have it rolled out, it also means we can begin to take advantage of the map feature and view our web and service tiers.
To get more out of New Relic we’ve been taking advantage of custom attributes. We were running in high security mode across the entire estate and we needed to find a controlled way to disable it whilst avoiding any monitoring outage. But we still wanted the assurances that high security offers, so we mimic what high security mode does within the newrelic.config (as we’re using the .NET agent). This config is part of our Chocolatey package and is rolled out through our Chef recipes during automated provisioning, or SCCM otherwise. We then promoted the changes environment by environment. One observation we made was that running .NET agents don’t currently verify high security once they’ve started: they don’t check in periodically. This allowed us to decouple the disabling of high security at New Relic’s end from the high security setting in the config, since the config changes required an iisreset.
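
Because the setting lives in a config file that is rolled out by automation, a provisioning run could check it before and after each promotion. The sketch below is an assumption-laden illustration, not part of the talk: the function name is hypothetical, the sample XML is a cut-down stand-in for a real newrelic.config, and the `urn:newrelic-config` namespace is assumed (the check ignores namespaces either way).

```python
import xml.etree.ElementTree as ET


def high_security_enabled(config_xml):
    """Return True if the config contains <highSecurity enabled="true" />."""
    root = ET.fromstring(config_xml)
    for element in root.iter():
        # Strip any XML namespace so the check works with or without xmlns.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "highSecurity":
            return element.get("enabled", "false").strip().lower() == "true"
    return False


# Cut-down, hypothetical config documents for illustration only.
secured = '<configuration xmlns="urn:newrelic-config"><highSecurity enabled="true" /></configuration>'
relaxed = '<configuration xmlns="urn:newrelic-config"><highSecurity enabled="false" /></configuration>'

print(high_security_enabled(secured))
print(high_security_enabled(relaxed))
```

The same check would apply to a per-application config bundled with an app, where there is no single server-level file to audit.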
By using custom attributes we are able to augment the existing events in Insights (PageViews and Transactions) with extra information. This information can form the basis of business metrics. One good example of this is Phoenix’s real-time payments dashboard. The Phoenix team are responsible for delivering a lot of the platform services the website relies on. This shows us the revenue we are delivering and other, more insightful information around the use of different payment methods, information that wasn’t available before, at least not in near-real time.
These headline figures provide glanceable information, and allow us to ask questions like: if we improve error rates across the tiers, how do we affect our transaction rates? A similar dashboard to this allows us to assess the effect of a payment outage on the business.
We have a breadth of users of New Relic at thetrainline. Monitoring and production support use it to monitor the running health of the system. They’ve created Insights dashboards reporting page-by-page performance, to detect performance issues quickly and drill down further. Product owners can use the funnels view to determine which parts of the booking flow need attention and prioritise where development effort should go. Developers get rapid feedback on new features. Management get a holistic view of thetrainline’s systems through custom Insights dashboards and the map feature.
So just some of what we’d like to see and start doing is: reporting JavaScript errors in Insights. We’ve seen some progress there, as one of the latest agent updates gave us app errors in there, which means we can start using Insights rather than the API to plot trends over time. Our JavaScript is minified but we do offer source maps, and we’d love New Relic to be able to use these to provide more context on JavaScript errors. Several of our apps are more critical than others, so tuning our data retention in Insights per application would be great. Most importantly, a lot of our code is moving to use async/await in .NET, and whilst there are workarounds, New Relic doesn’t natively support it.
So, what’s next for us at thetrainline? Teams are just getting started with what custom attributes can offer. As a company, we’re going more polyglot, and in the web team we’ll be running Node apps in production and taking full advantage of the New Relic node.js agent. This has a different deployment model to the .NET agent (which is a server-level install). Instead, the node.js agent is installed per application, so this will provide some challenges in assuring that the high security settings are right in config. Many of our services are moving to use RabbitMQ, which means we need to understand at a glance the state of the system. We’re hoping New Relic can help here too! Thanks for listening!
