Lessons Learned Monitoring Production


As a growing company, Wix has tried many monitoring solutions; some worked better than others. In this talk we go over the lessons we learned at Wix about what to monitor and how to monitor production systems, when to trigger alerts, and when not to. We also cover some of the tools we use and some of the tools we built to help us sleep better at night while doing 400 deployments to production every month. http://www.youtube.com/watch?v=OLPA2KOWJ8I


Red Alert Or False Alarm
Monitoring Production Systems
Aviran Mordo
Head Of Back-End Engineering @ Wix
@aviranm
http://www.linkedin.com/in/aviran
http://www.aviransplace.com
01:21

Wix in Numbers
• 40,000,000 users
– Adding over 1,000,000 new users each month
• Static storage is over 200TB of data
– Adding over 1TB of files every day
• 3 Data centers + 2 Clouds (Google AE, Amazon)
– Around 300 servers
• 400 Deployments a month (Continuous Delivery)
• Over 100,000,000 Server API calls per day
• Over 450 people work at Wix
– ~ 150 people in R&D

Pingdom
Pros
• 24 / 7 uptime monitoring
• Different geo locations
Cons
• No early warning – alerts only when the site is down
• Does not tell you what the problem is
• Does not monitor APIs

Keynote
Pros
• Transaction monitoring from a real user's perspective
• Supports Flash
• Different geo locations
Cons
• Flows must be recorded manually
• Does not monitor internal servers

Monitor Hardware and OS
Pros
• Monitor machine health
• Built-in integration with Graphite
• Custom checks
Cons
• Monitor at the OS level, not application level*
• Does not know when there is a problem with the application (the

Server Logs
Pros
• Verbose and flexible
Cons
• Too much information
• Hard to read, not friendly to developers
• Pinpointing the problem takes a long time
• Server cluster need log

Log collections
• Client & server logs are collected with Flume and Syslog-ng
• Storm + Esper analyze log events and feed Graphite
• Stored in Hadoop + HBase for in-depth analysis
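
A minimal sketch of the "analyze log events and feed Graphite" step, assuming Graphite's Carbon plaintext listener on its default port 2003. The host name, the metric path, and the toy error counter are hypothetical stand-ins; this is not the Storm + Esper pipeline from the talk, only an illustration of what ends up on the wire.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Render one sample in Graphite's plaintext format: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def count_errors(log_lines):
    """Toy stand-in for the stream analysis step: count lines containing ' ERROR '."""
    return sum(1 for line in log_lines if " ERROR " in line)


def send_metrics(lines, host="graphite.example.com", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (host is a placeholder)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    logs = [
        "2013-01-01 10:00:00 INFO request served",
        "2013-01-01 10:00:01 ERROR upstream timeout",
    ]
    # Print instead of sending so the sketch runs without a Graphite server.
    print(format_metric("example.service.errors", count_errors(logs)), end="")
```

Keeping the formatting separate from the socket call makes the metric naming easy to test without a running Graphite instance.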

Self Reporting Framework
• Automatic method-level performance reporting
• Custom metering
• Exception classifications
• 4 severity levels (Recoverable, Warning, Error, Fatal)
• Business Exceptions
• System Exceptions
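
One way the exception classification above could be modeled, as a hedged sketch. The four severity names and the business/system split come from the slide; the class names, the numeric ordering of severities, the default for unclassified exceptions, and the metric path layout are all assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Names are from the slide; the increasing numeric order is an assumption.
    RECOVERABLE = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


class ClassifiedException(Exception):
    """Hypothetical base class carrying a kind and a severity."""
    kind = "system"

    def __init__(self, message, severity=Severity.ERROR):
        super().__init__(message)
        self.severity = severity


class BusinessException(ClassifiedException):
    kind = "business"


class SystemException(ClassifiedException):
    kind = "system"


def classify(exc):
    """Return (kind, severity name); unclassified exceptions default to system/ERROR."""
    if isinstance(exc, ClassifiedException):
        return exc.kind, exc.severity.name
    return "system", Severity.ERROR.name


def metric_path(service, exc):
    """Build a hierarchical metric name such as 'editor.exceptions.business.warning'."""
    kind, severity = classify(exc)
    return f"{service}.exceptions.{kind}.{severity.lower()}"


if __name__ == "__main__":
    print(metric_path("editor", BusinessException("quota exceeded", Severity.WARNING)))
```

Counting exceptions per kind and severity, rather than as one total, lets an alert fire on a rise in fatal system errors while ignoring recoverable business errors.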

App-Info Monitoring
• Expose via API as JSON
• Collect metrics via Nagios / Graphite
• Nagios alerts based on app-info metrics
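
A rough sketch of the flow above: a service exposes its metrics as JSON, and a Nagios-style check reads one metric and maps it to a plugin exit code. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention; the JSON layout, the metric name, and the thresholds are hypothetical, since the slides do not show the actual app-info format.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def app_info_json(metrics):
    """Serialize metrics the way a hypothetical app-info endpoint might."""
    return json.dumps({"app": "example-service", "metrics": metrics}, sort_keys=True)


def check_metric(app_info_body, name, warn, crit):
    """Return (exit_code, message) for one metric, higher values being worse."""
    try:
        value = json.loads(app_info_body)["metrics"][name]
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, f"UNKNOWN - metric {name} not found"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name}={value}"
    if value >= warn:
        return WARNING, f"WARNING - {name}={value}"
    return OK, f"OK - {name}={value}"


if __name__ == "__main__":
    body = app_info_json({"errors_per_min": 12})
    code, message = check_metric(body, "errors_per_min", warn=5, crit=10)
    print(message)
    raise SystemExit(code)
```

Returning UNKNOWN when the metric is missing keeps a broken endpoint from being reported as healthy.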

App-Info Monitoring
Pros
• Detailed and easy view of a server
• Almost no need to look at logs
Cons
• Coarse-grained information for an overview
• Too much information

Graphite
• All systems feed Graphite with metrics (Nagios, App-info, Storm)
• Nagios queries Graphite and triggers alerts
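
The "query Graphite and trigger alerts" step might look roughly like the sketch below. It builds a URL for Graphite's render API with `format=json` and evaluates the returned series, where each datapoint is a `[value, timestamp]` pair and missing values are null. The base URL, target name, and threshold are placeholders, and the HTTP fetch itself is left out so the example runs offline.

```python
from urllib.parse import urlencode


def render_url(base_url, target, window="-5min"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url}/render?{query}"


def latest_value(render_response):
    """Return the most recent non-null value of the first series that has one."""
    for series in render_response:
        for value, _timestamp in reversed(series["datapoints"]):
            if value is not None:
                return value
    return None


def should_alert(render_response, threshold):
    """Alert only when there is data and the latest value exceeds the threshold."""
    value = latest_value(render_response)
    return value is not None and value > threshold


if __name__ == "__main__":
    # Hand-written sample in the shape of a render API JSON response.
    sample = [{"target": "example.service.errors",
               "datapoints": [[1.0, 1000], [5.0, 1060], [None, 1120]]}]
    print(render_url("http://graphite.example.com", "example.service.errors"))
    print(should_alert(sample, threshold=3))
```

Skipping null datapoints matters because the newest bucket is often still empty when the check runs.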

Graphite
Pros
• Numerous formulas available
• Share graphs
• Easy to create new graphs
Cons
• Not a dashboard (you can build a dashboard on top of it)
• Design data schema (hierarchy) in

New Relic
Pros
• Easy to use – developer friendly
• Service level overview (both cluster and single server)
• Customizable dashboards
• JVM profiler on production
• Code instrumentation
• Real User Monitoring

New Relic
Cons
• No distributed transaction trace for specific server
• No exception classification
• A lot of false alarms due to misbehaving bots
• False alarms for low throughput services
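
The low-throughput false alarms are easy to reproduce: with only four requests in a window, two failures already look like a 50% error rate. A common guard, sketched here as an assumption and not as something New Relic or Wix is known to do, is to evaluate the error rate only once a minimum number of requests has been seen. The function name and both default thresholds are hypothetical.

```python
def error_rate_alert(requests, errors, max_error_rate=0.05, min_requests=100):
    """Return True when the error rate is too high AND the sample is large enough."""
    if requests == 0 or requests < min_requests:
        # Too little traffic to judge; staying quiet avoids a false alarm.
        return False
    return errors / requests > max_error_rate


if __name__ == "__main__":
    print(error_rate_alert(requests=4, errors=2))       # low traffic: no alert
    print(error_rate_alert(requests=1000, errors=100))  # 10% of 1000: alert
```

The trade-off is that a service with very little traffic can fail completely without tripping this rule, so it usually needs a separate availability check.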

Aviran Mordo
@aviranm
http://www.linkedin.com/in/aviran
http://www.aviransplace.com
http://www.slideshare.net/aviranwix/monitoring-production


CNIC Information System with Pakdata Cf In Pakistan

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

presentation ICT roal in 21st century education

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

Ransomware_Q4_2023. The report. [EN].pdf

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER

Exploring the Future Potential of AI-Enabled Smartphone Processors

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

Artificial Intelligence Chap.5 : Uncertainty

ICT role in 21st century education and its challenges

Architecting Cloud Native Applications

Axa Assurance Maroc - Insurer Innovation Award 2024

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...

Apidays New York 2024 - The value of a flexible API Management solution for O...

Lessons Learned Monitoring Production

1. Red Alert Or False Alarm
Monitoring Production Systems
Aviran Mordo
Head Of Back-End Engineering @ Wix
@aviranm
http://www.linkedin.com/in/aviran
http://www.aviransplace.com

2. About Wix

3. Wix in Numbers
• 40,000,000 users
– Adding over 1,000,000 new users each month
• Static storage is over 200TB of data
– Adding over 1TB of files every day
• 3 Data centers + 2 Clouds (Google AE, Amazon)
– Around 300 servers
• 400 Deployments a month (Continuous Delivery)
• Over 100,000,000 Server API calls per day
• Over 450 people work at Wix
– ~ 150 people in R&D

7. End user monitoring

8. Pingdom
Pros
• 24/7 uptime monitoring
• Different geo locations
Cons
• No early warning – alerts only when the site is down
• Does not tell you what the problem is
• Does not monitor the API

9. Keynote
Pros
• Transaction monitoring from a real user's perspective
• Supports Flash
• Different geo locations
Cons
• Flows must be recorded manually
• Does not monitor internal servers

10. Monitor Hardware and OS
Pros
• Monitors machine health
• Built-in integration with Graphite
• Custom checks
Cons
• Monitors at the OS level, not the application level*
• Does not know when there is a problem with the application (the

11. Look inside the application

12. Server Logs
Pros
• Verbose and flexible
Cons
• Too much information
• Hard to read, not friendly to developers
• Pinpointing the problem takes a long time
• Server cluster need log

13. Log collections
• Client & Server logs are collected with Flume and Syslog-ng
• Storm + Esper analyzes log events and feeds Graphite
• Store in Hadoop+HBase for in-depth analysis
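
The "analyze log events and feed Graphite" step can be pictured as a windowed count over the event stream. The sketch below is a plain-Python stand-in, not Storm or Esper code; the event shape `(timestamp_seconds, severity)` and the function name are assumptions made for illustration.

```python
from collections import Counter


def count_per_minute(events):
    """Group (timestamp_seconds, severity) log events into per-minute counts.

    Hypothetical event shape; a real pipeline would parse these fields
    out of the collected log lines.
    """
    counts = Counter()
    for timestamp, severity in events:
        # Truncate the timestamp to the start of its one-minute window.
        minute = int(timestamp) // 60 * 60
        counts[(minute, severity)] += 1
    return dict(counts)
```

Each `(minute, severity)` count could then be emitted as one data point to the metrics store.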

14. Self Reporting Framework
• Automatic method level performance reporting
• Custom metering
• Exception classifications
• 4 severity levels (Recoverable, Warning, Error, Fatal)
• Business Exceptions
• System Exceptions
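
A minimal sketch of what exception classification along these lines might look like. The four severity names and the business/system split come from the slide; every class and function name below is hypothetical, and defaulting unclassified exceptions to system/ERROR is an assumption, not something the talk states.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    """The four severity levels from the slide, least to most severe."""
    RECOVERABLE = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


class ReportedException(Exception):
    """Hypothetical base class: each reported exception carries a severity."""
    kind = "system"

    def __init__(self, message, severity):
        super().__init__(message)
        self.severity = severity


class BusinessException(ReportedException):
    kind = "business"


class SystemException(ReportedException):
    kind = "system"


def classify(exc):
    """Return (kind, severity); unclassified exceptions default to system/ERROR."""
    if isinstance(exc, ReportedException):
        return exc.kind, exc.severity
    return "system", Severity.ERROR


def count_by_class(exceptions):
    """Count exceptions per 'kind.severity' key, ready to report as metrics."""
    counts = Counter()
    for exc in exceptions:
        kind, severity = classify(exc)
        counts[f"{kind}.{severity.name.lower()}"] += 1
    return dict(counts)
```

Counting per class rather than per exception type keeps the number of metrics small enough to alert on.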

15. App-Info

16. App-Info Monitoring
• Expose via API as JSON
• Collect Metrics via Nagios / Graphite
• Nagios alerts based on app-info metrics
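
As a rough illustration of alerting on app-info metrics, the sketch below serializes metrics to JSON and evaluates one of them the way a Nagios plugin would. The JSON layout, metric names, and function names are assumptions; only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the standard Nagios plugin convention.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def app_info_snapshot(metrics):
    """Serialize in-process metrics to a JSON body (hypothetical layout)."""
    return json.dumps({"metrics": metrics}, sort_keys=True)


def check_metric(app_info_json, name, warn, crit):
    """Compare one app-info metric against warning and critical thresholds."""
    value = json.loads(app_info_json)["metrics"][name]
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name}={value}"
    if value >= warn:
        return WARNING, f"WARNING - {name}={value}"
    return OK, f"OK - {name}={value}"
```

A real check script would fetch the JSON over HTTP, print the message, and exit with the returned code.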

17. App-Info Monitoring
Pros
• Detailed and easy view of a server
• Almost no need to look at logs
Cons
• Coarse-grained information for an overview
• Too much information

18. Graphite
• All systems feed Graphite with metrics (Nagios, App-info, Storm)
• Nagios queries Graphite and triggers alerts
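
Both directions of this flow can be sketched in a few lines. Graphite's plaintext protocol takes one `<path> <value> <timestamp>` line per data point (Carbon listens for it on port 2003 by default), and the render API returns JSON series whose `datapoints` are `[value, timestamp]` pairs. The metric path, threshold, and function names below are illustrative assumptions; the sketch only formats and parses, leaving the network calls out.

```python
import json


def graphite_line(path, value, timestamp):
    """Format one data point for Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def latest_value(render_json):
    """Return the newest non-null value of the first series in a render response."""
    series = json.loads(render_json)
    # Recent buckets are often null because they have not been filled yet.
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None


def should_alert(render_json, threshold):
    """Alert when the latest known value exceeds the threshold."""
    value = latest_value(render_json)
    return value is not None and value > threshold
```

Skipping trailing nulls matters in practice: treating an unfilled bucket as zero would hide a real problem, and treating it as an error would raise a false alarm.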

19. Graphite
Pros
• Numerous formulas available
• Share graphs
• Easy to create new graphs
Cons
• Not a dashboard (you can build a dashboard on top of it)
• Design data schema (hierarchy) in

21. New Relic
Pros
• Easy to use – developer friendly
• Service level overview (both cluster and single server)
• Customizable dashboards
• JVM profiler on production
• Code instrumentation
• Real User Monitoring

22. New Relic
Cons
• No distributed transaction trace for specific server
• No exception classification
• A lot of false alarms due to misbehaving bots
• False alarms for low throughput services
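
The low-throughput problem is easy to see with numbers: one failure out of two requests is a 50% error rate, yet it says little about service health. One common mitigation, which the slides do not describe, is to require a minimum request count before evaluating the rate. The function name and default thresholds below are assumptions chosen for illustration.

```python
def error_rate_alert(errors, requests, max_error_rate=0.05, min_requests=50):
    """Alert on error rate only when traffic is high enough for the rate to mean something."""
    if requests < min_requests:
        # Too few samples: a single failure would dominate the rate.
        return False
    return errors / requests > max_error_rate
```

With these defaults, 1 error in 2 requests stays quiet while 10 errors in 100 requests fires.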

23. What’s Next

25. We Are Hiring

26. Aviran Mordo
@aviranm
http://www.linkedin.com/in/aviran
http://www.aviransplace.com
http://www.slideshare.net/aviranwix/monitoring-production

Editor's Notes

Today I’m going to tell you how we grew our monitoring operations with the growth of the company.
*Monitor applications via log parsing* Removing servers, changing the topology of servers
Logs usually don’t work
