Hear from AppDynamics client Equinor, at AppD Global Tour Stockholm, on why Application Performance Monitoring was needed in their enterprise organisation.
9. - Is application X available?
- Is Equinor complying with regulations?
- Is the PnL ready for the new trading day?
- Is everything okay? Or is there something that requires me to take action?
- While we have increasingly complex applications, do they support employee productivity?
12. By using an APM suite we will, in risk management terms:
Reduce the probability of incidents with real-time alerts
Reduce the impact of incidents with early identification of the root cause in order to improve Mean Time To Repair (MTTR).
13. By knowing what normal looks like and using the policy-based capabilities of an APM suite for 24/7 operations, we can automatically:
1. Auto-remediate based on policies
2. Notify external providers
3. Create Incidents
16. Capture infrastructure events that are important to application/IT Solution performance
Relationship between infrastructure events and applications/IT events automatically correlated in our CMDB
Urgency of the corresponding alert is reflected to production support via real-time dashboards
19. Payment based on alignment to monitoring thresholds
Fixed price for Incident Handling
Agreed price for Request Handling
Set up contracts where sub-vendors are given rewards…
…in order to prevent incidents and to ensure mutual business outcomes.
20. We aim to share APM dashboards with different vendors:
Established Collaboration Rooms
Vendor Scorecards
Statoil changed its name to Equinor on May 16th
Equinor at a glance
We are a Norwegian-based energy company with operations in more than 30 countries. Since 1972 we have explored, developed and produced oil and gas on the Norwegian continental shelf, where we are a leading operator. From the early nineties we have built a global business, with strongholds in Europe, Africa, North America and Brazil. We have developed a portfolio of new energy solutions, currently delivering wind power to 650,000 British households:
Energising the lives of 170 million people every day
World’s largest offshore operator
20,500 employees
I am responsible for piloting the use of APM for the business area Marketing, Midstream and Processing (MMP)
MMP maximizes value across the oil, gas and electricity value chains
Transportation
Refining and processing
Marketing and trading
We in IT are focusing on utilising APM to manage the IT applications supporting Transportation and Marketing & trading for the commodities Gas and Electricity
As you see from the Key Facts listed
We are not only marketing and trading Equinor's own volumes but also act on behalf of the Norwegian state’s direct financial interest.
Equinor ranks as one of the world’s largest crude oil traders on a net basis
We are the EU’s second-largest supplier of natural gas
Here you see the strategic agenda for MMP
(Animation on click) We are currently in the final sprint implementing APM. In a project together with AppDynamics, we are focusing on setting up metrics for IT applications supporting flow assurance and value creation:
Flow assurance – Transport/Scheduling is all about reputation: we need to deliver as agreed (time and place) and according to the rules of the external operator of the gas pipelines and electricity networks. We would also like to avoid penalties.
Value creation: When trading and scheduling gas and electricity, we need to know both our physical and financial position. We need to manage risk. And, as those of you working in the Financial Sector know, we also need to be regulatory compliant (EU, US).
In this presentation, I will share with you our three-step journey
(Animation for each of the steps)
For each of these steps, understanding what normal looks like is key
Image source: https://www.gettyimages.co.uk/license/527343219
Image source: https://www.gettyimages.co.uk/license/669260208
Robin’s “my story”:
Last spring, the VP of IT was invited to a top management meeting in MMP. They challenged him about poor regulation in an important IT delivery related to End of Day and a daily Profit and Loss report.
Guess who had just recently taken on the responsibility in IT for producing this critical cross-application report? Yes, it was me.
Guess who was invited to a meeting with the VP and asked to fix “the problem” and ensure that PnL was ready before each trading day?
I started to look for something that could give us, as the IT vendor, better control
Eureka! - APM
Key to us is to understand “normal”
We are moving away from an application-centric IT Operation (e.g. is application X available?) towards IT Solutions (data flow between internal and external applications), where the set of applications supporting a critical business function is essential, e.g.:
is Equinor regulatory compliant?
is the PnL ready before new trading day?
We need transparency both for users in the business and responsible IT Production Line, e.g:
Is everything okay, or is there something that requires me to take action?
At the same time as we aim for ease of use to increase employee productivity, the application complexity is exploding
Peter Drucker is also the one who stated: The best way to predict the future is to create it.
By using an APM suite we will, in risk management terms:
Use real-time alerts as soon as functionality breaks down or response time exceeds established thresholds, to reduce the probability of incidents and avoid unexpectedly slow response times (repeated slowness, not only application downtime and unavailability, affects how users see a service).
Early identification of the root cause will let us go directly into resolution mode and thereby reduce the impact of an incident and improve Mean Time To Repair (MTTR).
Tired of Watermelon KPIs
“the war of the innocents”
Reduce probability of undesirable events:
Proactive alerting based on dynamic baselines
Tight integration with ServiceNow for events and incident handling - act
Proactively optimise applications based on scorecards, remove bottlenecks - tune
In addition:
Dashboards for Dev, Ops and Business – based on the same data, with ability to perform drill-downs to code-line/DB
Granular role-based access control (RBAC)
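To illustrate what "proactive alerting based on dynamic baselines" can mean in practice, here is a minimal sketch. It assumes a simple rolling mean plus standard deviation as the baseline; this is an illustration only and not how AppDynamics computes its baselines. The class name `DynamicBaseline` and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class DynamicBaseline:
    """Rolling baseline over recent response times (hypothetical helper)."""

    def __init__(self, window=20, sigmas=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # recent "normal" samples
        self.sigmas = sigmas                 # how far from normal triggers an alert
        self.min_samples = min_samples       # samples needed before alerting

    def check(self, response_ms):
        """Return True if response_ms is abnormally slow compared to the baseline."""
        alert = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            # Floor the deviation at 1 ms so a perfectly flat history
            # does not alert on trivial jitter.
            sd = max(stdev(self.samples), 1.0)
            alert = response_ms > mu + self.sigmas * sd
        if not alert:
            # Only normal samples update the baseline, so an ongoing
            # incident does not become the new "normal".
            self.samples.append(response_ms)
        return alert


baseline = DynamicBaseline()
for ms in [100, 102, 98, 101, 99, 103, 97, 100]:
    baseline.check(ms)
print(baseline.check(500))  # far above the learned normal
print(baseline.check(105))  # within normal variation
```

The point of the sketch is that the threshold follows what normal looks like for this application, instead of being a fixed number someone picked once.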
Reduce impact of undesirable events:
Rapid Root Cause Analysis and troubleshooting
Reduce Mean Time To Repair (MTTR) by up to 90%!
Single pane of glass – faster and easier collaboration, faster service restoration
Code-level drill-down (from Web-UI) - share findings with production line members or product/service vendor
Policy-based auto-remediation
In addition:
Policy-based ServiceNow ticket creation
When we know what normal looks like, we can, by utilising the policy-based capabilities of an APM suite for 24/7 operation, automatically:
restart (auto-remediate) services that have stopped based on policies
notify external providers (SaaS or others) of issues and incidents by e-mail to their service desk
create incidents in our Service Management system with the correct priority, so assignment groups handling infrastructure, integrations or messaging can respond
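The three automatic actions above can be sketched as a small policy table. This is a minimal illustration, assuming events arrive as simple records; all names (`Event`, `Policy`, `evaluate`, the source and kind labels) are hypothetical, and the actions only return a description of what would be done rather than calling any real APM or Service Management API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    source: str        # hypothetical labels: "internal-service", "saas", "infrastructure", ...
    kind: str          # e.g. "service_stopped", "slow_response"
    name: str          # affected service, provider or component
    priority: int = 3  # 1 = highest


@dataclass
class Policy:
    description: str
    matches: Callable[[Event], bool]
    action: Callable[[Event], str]


def restart_service(e: Event) -> str:
    return f"restart {e.name}"


def notify_provider(e: Event) -> str:
    return f"email service desk of {e.name}"


def create_incident(e: Event) -> str:
    return f"create P{e.priority} incident for {e.name}"


POLICIES: List[Policy] = [
    Policy("auto-remediate stopped internal services",
           lambda e: e.source == "internal-service" and e.kind == "service_stopped",
           restart_service),
    Policy("notify external providers of issues",
           lambda e: e.source == "saas",
           notify_provider),
    Policy("create incidents for the responsible assignment group",
           lambda e: e.source in {"infrastructure", "integration", "messaging"},
           create_incident),
]


def evaluate(event: Event) -> List[str]:
    """Return the actions of every policy that matches the event."""
    return [p.action(event) for p in POLICIES if p.matches(event)]


print(evaluate(Event("internal-service", "service_stopped", "pnl-report")))
print(evaluate(Event("infrastructure", "disk_full", "db-host-01", priority=2)))
```

Keeping the policies as data rather than as scattered scripts is one way to make the rules visible and reviewable by both IT and the vendors involved.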
Building metrics:
By building our own data-centric metrics, we are also trying to capture tacit knowledge, the knowledge in the heads of key personnel. This is even better than documentation such as runbooks – you have built the knowledge into your APM system.
Digitalisation of IT Operations (repeat):
restart (auto-remediate) services that have stopped based on policies
notify external providers (SaaS or others) of issues and incidents by e-mail to their service desk
create incidents in our Service Management system with the correct priority, so assignment groups handling infrastructure, integrations or messaging can respond
Why not try to capture tacit knowledge and build it into your APM monitors?
Through my position as an IT leader delivering to MMP, I have an application-centric view, trying to ensure that critical functions are working => happy business and users
Events in infrastructure that impact an application or IT Solution are important to capture.
The relation between an infrastructure component and an IT Solution needs to be reflected in our CMDB.
If there is a correlation between an infrastructure event and an alert/error for an IT Solution:
The Service Offering (the urgency) needs to be reflected
Status of the infrastructure incident needs to be visible for the Production line operating the IT Solutions (visible in the dashboard)
To act upon an alert you need to understand its criticality (Service Offering); if it is an infrastructure event you need to know which applications might be affected (CMDB), and you need to know the status of the incident (is someone working to solve it? expected resolution time?). Trust is about transparency and communication. The IT unit responsible towards affected users needs facts.
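The lookup described above (criticality from the Service Offering, affected applications from the CMDB, status from the incident) can be sketched as follows. This is a minimal illustration with in-memory dictionaries standing in for the real systems; the component names, solution names, urgency labels and the `correlate` function are all hypothetical sample data, not Equinor's actual configuration or any ServiceNow API.

```python
# Hypothetical CMDB relation: infrastructure component -> IT Solutions depending on it
CMDB = {
    "db-host-01": ["PnL reporting", "Gas scheduling"],
    "mq-broker-02": ["Gas scheduling"],
}

# Hypothetical Service Offering: IT Solution -> urgency
SERVICE_OFFERING_URGENCY = {
    "PnL reporting": "high",
    "Gas scheduling": "critical",
}

# Hypothetical open infrastructure incidents: component -> status
OPEN_INCIDENTS = {
    "db-host-01": "in progress",
}


def correlate(component):
    """List affected IT Solutions with urgency and incident status for a dashboard."""
    status = OPEN_INCIDENTS.get(component, "no incident registered")
    return [
        {
            "solution": solution,
            "urgency": SERVICE_OFFERING_URGENCY.get(solution, "unknown"),
            "incident_status": status,
        }
        for solution in CMDB.get(component, [])
    ]


for row in correlate("db-host-01"):
    print(row)
```

Each row answers the three questions a production line needs before acting: what is affected, how urgent is it, and is someone already working on it.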
Question to the audience:
Do any of you use ServiceNow or a similar Service Management tool to correlate events in infrastructure with problems in an application?
App Centric View: We should use the application as the window. This does not mean that we do not monitor infrastructure, but we should look at APM as the window to infrastructure.
Data lake: We should collect as much data as possible so we can create a data lake where we use analytics to identify the correlations.
Establish a strategy: Using APM as a cornerstone of a well-thought-out Enterprise Monitoring strategy will allow us to support the speed of development without compromising end-user experience.
Creating a culture of continuous improvement focusing on availability – not on fix on failure
Question to the audience:
Do any of you use APM as a foundation for vendor performance management?
We aim to share the APM dashboards across vendors (therefore they need to be owned by us)
We believe a daily stand-up can be a useful tool
We have established “collaboration rooms” where vendors are invited
We believe in incremental improvement (not a “fix on failure” culture) and aim to use vendor scorecards, e.g:
I used the APM dashboard with drill-down in my two last vendor meetings
Grady Booch (born February 27, 1955) is an American software engineer, best known for developing the Unified Modeling Language (UML) with Ivar Jacobson and James Rumbaugh. He is recognized internationally for his innovative work in software architecture, software engineering, and collaborative development environments.
To secure ownership, enable value creation and vendor independent solutions:
Bullet 1-5 (animation)
Based on our internal learning, feedback from Gartner and dialogue with other companies, the recommendations shared with you are seen as the foundation for an implementation in Equinor (not only MMP). We decided on April 26th to establish a corporate project.