Scaling Teams, Processes and Architectures

•

21 gefällt mir•41,221 views

Generic presentation about scalability challenges. First London Scalability Meetup. Quick overview of the DataSift architecture.

Technologie Business

Lorenzo Alberton
@lorenzoalberton

Scaling Teams,
Processes and
Architectures
Managing growth

London Scalability Group,
Innovation Warehouse, 16th April 2012
1

Scalability Is About...

People

Processes Technology

2

People
Stafﬁng, Roles, Management, Teams

3

Stafﬁng

Never compromise.

Only hire people smarter than you.

http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

Stafﬁng

Hire people who can ﬁt
the company culture.

Promote fun in your
working environment.

http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

Stafﬁng

Beware of
toxic people

http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

Team Size and Structure
Micromanaging managers Poor communication
too small Overworked team members Low morale too big
Can’t accomplish much Low productivity

5

Why are processes critical?
Improve management of teams and employees
Standardise actions in repetitive tasks
Reduce mundane decisions to focus on grander ideas
Allow the team to react quickly to crisis
Determine system capacity and scalability needs

7

Determining Headroom

Capacity

Current Load

8

Determining Headroom
Why?
Capacity

Planning
annual
budget

Hiring plan
Current Load

Prioritisation
8

Controlling Change: Determine Risk

http://dilbert.com/strips/comic/2008-05-08/ 9

Risk Management
Risk is cumulative

Determine limits and tolerance
10

Load / Stress Testing
Load testing
- identify, document and eliminate
bottlenecks through a strict controlled
process of measurement and analysis
- measure system’s response and stability
- verify the app can meet the desired
performance objectives (SLA)

Stress testing
- determine the app’s stability when
subjected to above-normal loads
- verify the app’s behaviour when close
to the breaking point
- test the application recoverability
(negative testing)

11

Barrier Conditions

Code reviews
Manual and automated QA processes
Performance and stress testing
Release documentation checks (runbook)
Dev, Test, Stage and Live environments
Instrumentation checks

Protection from signiﬁcant failures
12

Technology
Architecting Scalable Solutions

13

Architectural Principles

+1
N + 1 design

14

Architectural Principles

+1
N + 1 design for rollback

14

Architectural Principles

+1
N + 1 design for rollback to be disabled

14

Architectural Principles

+1
N + 1 design for rollback to be disabled

to be
monitored

14

Architectural Principles

+1
N + 1 design for rollback to be disabled

to be for multiple
monitored live sites

14

Architectural Principles

+1
N + 1 design for rollback to be disabled

to be for multiple use mature
monitored live sites technology

14

Architectural Principles

+1
N + 1 design for rollback to be disabled

to be for multiple use mature
monitored live sites technology

asynchronous
design
14

Architectural Principles

+1
N + 1 design for rollback to be disabled

to be for multiple use mature
monitored live sites technology

asynchronous stateless
design systems
14

Stateless, Asynchronous Systems

http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg 15

Fault Isolative Structures
Increase availability
Limit impact of
failures
Easier debugging

16

Fault Isolative Structures
Increase availability
Limit impact of
failures
Easier debugging

First

16

Fault Isolative Structures
Increase availability
Limit impact of
failures
Easier debugging
Functions
causing
repetitive
problems
First

16

Fault Isolative Structures
Increase availability
Limit impact of
failures
Easier debugging
Functions Natural layout
causing or topology
repetitive of the site
problems
First

16

Caching for Performance and Scale
Object Caches

Usually serialized
(marshalling /
unmarshalling)

get() / set() /
replace()

APC, Memcached

17

Caching for Performance and Scale
Object Caches Application Caches

Usually serialized Proxy caches
(marshalling /
Reverse proxy
unmarshalling)
caches

get() / set() / HTTP headers
replace()

ISP/Uni proxies
APC, Memcached Squid, Varnish,
mod_cache

17

Caching for Performance and Scale
Object Caches Application Caches CDNs

Usually serialized Proxy caches Multiple locations
(marshalling / / backbones
Reverse proxy
unmarshalling)
caches

get() / set() / HTTP headers CNAME entries
replace()

ISP/Uni proxies Akamai, Coral,
APC, Memcached Squid, Varnish,
Limelight...

mod_cache

17

Managing “Big Data”

storage costs
people and software
power and space
processing power
backup time and costs

18

Managing “Big Data”
The more storage

...the more
storage management
storage costs
people and software
power and space
processing power
backup time and costs

18

Monitoring: Measure Everything

1. Is there a problem? User experience / Business metrics monitors

2. Where is the problem? System monitors (threshold - variance)

3. What is the problem? Application monitors

19

Monitoring: Measure Everything

StatsD

1. Is there a problem? User experience / Business metrics monitors

2. Where is the problem? System monitors (threshold - variance)

3. What is the problem? Application monitors

Keep Signal vs. Noise ratio high
19

DataSift Architecture
Some Architecture Pr0n

20

DataSift Architecture

http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

DataSift Architecture

SOA - loosely coupled,
independently scalable
services. Simple APIs

http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

DataSift Architecture

SOA - loosely coupled,
independently scalable
services. Simple APIs

example

http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

Our Stack
Languages: C++, PHP, Java, Scala, Ruby, Node.JS
Storage: MySQL, HBase
Cache: Memcached, APC, Redis
Queues: ZeroMQ, Kafka, Redis
Development/Deployment: GIT, Jenkins CI, RPM, Chef
Monitoring: StatsD + Graphite, Zenoss

23

Messaging
ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast)

Internal communication: pass messages to the next processing
stage, control events, monitoring

Kafka/Redis: PUSH-PULL with persistence

Internal message / workload buffering and distribution

Node.js: WebSockets / HTTP Streaming

Message delivery (output)

24

0mq PUSH-PULL (workload distribution)

Consumer 1

Consumer 2

Consumer 3

[Round-Robin-ish]

25

0mq PUB-SUB (High Availability)

Listener 1

Publisher 1

Listener 2

Publisher 2
Listener 3

[Broadcast] [Dynamic Subscriptions]

26

0mq PUB-SUB (High Availability)

DC 1
Publisher 1

Publisher 2

DC 2

27

Internal “Firehose”

Publishers Subscribers

Alice’s John’s
Y Z timeline Inbox
X
subscribe
to topic X

Data Bus
subscribe
to topic Y

System Fred’s Tech
Monitor Followers Blog Feed

28

Instrumentation

https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd 29

We’re Hiring!

http://datasift.com/whoweare/jobs
30

References
M. L. Abbot, M. T. Fisher,
“The Art Of Scalability”,
Addison Wesley
http://theartofscalability.com/

http://www.slideshare.net/quipo/the-art-of-scalability-managing-
growth
http://www.slideshare.net/postwait/scalable-internet-architecture
http://bit.ly/IJKwuc
http://agile.dzone.com/news/approaches-organizational
https://bitly.com/vCSd49

31

Lorenzo Alberton
@lorenzoalberton

Thank you!
lorenzo@alberton.info
http://www.alberton.info/talks

Questions?

32

Weitere ähnliche Inhalte

Was ist angesagt?

Test Driven Development & CI/CD

Shanmuga S Muthu

Agile methodology

Payod Soni

Agile software development

Rajesh Piryani

The Paved Road at Netflix

Dianne Marsh

Scrum In 15 Minutes

Srikanth Shreenivas

Agile vs Traditional Project Management

Saqib Javed John

Agile management, or agile process management, or simply agile refers to an iterative, incremental method of managing the design and build activities of engineering, information technology and other business areas that aim to provide new product or service development in a highly flexible and interactive manner; an example is its application in Scrum, an original form of agile software development.

Agile Project Management

Syed Zaid Irshad

Virtualization Vs. Containers

actualtechmedia

Agile project management

eng100

Overview of agile methodology

Lee D Clemons MBA, PMP, CSM

In a world of rapid changes and increasing uncertainties, organisations have to continuously adapt and evolve to remain competitive and excel in the market. In such a dynamic business landscape organisations need to design for adaptability. Combining different perspectives and techniques from business strategy (Wardley Mapping), software architecture and design (Domain-Driven Design), and team organisation (Team Topologies) provides a powerful toolset to design, build and evolve adaptive systems and team structures for a fast flow of change. This talk illustrates the concepts, connects the dots between these three perspectives, and demonstrates how these techniques help to evolve a fictitious legacy system for a fast flow of change. The journey describes the approach to evolve: * From functional silo teams to cross-functional autonomous stream-aligned teams and platform teams * From a monolithic big ball of mud to a modular, loosely coupled system * From running on-premises infrastructure to cloud-hosted services

Architecture for Flow w/ Wardley Mapping, Domain-Driven Design, and Team Topo...

Susanne Kaiser

Agile Project Management

Abdullah Khan

DevOps: Infrastructure as Code

Julio Aziz Flores Casab

Showcase development processes and methods with our content ready Devops PowerPoint Presentation Slide. Focus on rapid application delivery using our visually appealing development and operations PPT visuals. The operating system PowerPoint complete deck comprises self-explanatory and editable PowerPoint templates such as need for DevOps, best practices, criteria for choosing a pilot project, DevOps goals, timeline for DevOps transformation, current state future state, 30-60-90 day plan, roadmap for DevOps, transformation post successful DevOps Implementation, RACI matrix, dashboard to name a few. Users can easily customize all the templates as per their specific project needs. Furthermore, you can also use this IT operations management presentation deck to encourage your team to adopt DevOps culture practices and tools. Demonstrate DevOps goals like Increase automation and standardize the process, reduce cost effort & time to market and so on. Download our system development lifecycle PowerPoint templates to present ways to make improved products faster for greater client satisfaction. Handle deficiencies with our DevOps Powerpoint Presentation Slides. Initiate action to acquire desired assets. https://bit.ly/3y8q8NC

DevOps Powerpoint Presentation Slides

SlideTeam

Agile Methodology PPT

Mohit Kumar

by Brad Appleton, Agile Day Chicago 2018, October 26 2018; This presentation gives a comprehensive introduction to DevOps, for Agile development practitioners. In 2018, there are many misunderstandings about Agile & DevOps and how they relate to one another. Too many think of Agile (development) as primarily "Scrum", and that DevOps is Continuous Integration & Delivery (both of which are wrong). This presentation describes the meaning, origin & history of DevOps from an Agile development perspective.

DevOps - an Agile Perspective (at Scale)

Brad Appleton

**Watch the full webinar at Codefresh.io/events! You have finally split your big monolith into microservices. Now what? How do you validate a more complex application? And how do you make it scale? Instead of having one CI/CD pipeline, you have multiple. And as the number of microservices increases so does the number of pipelines. Managing pipelines for microservice applications can quickly get out of hand, especially when you try to reuse common pipeline parts between different applications. In this webinar, we will see how you can create CI/CD pipelines designed specifically for microservices and how you can reuse the same pipeline across different applications.

CICD Pipelines for Microservices Best Practices

Codefresh

Agile Scrum Methodology

Rajeev Misra

Introduction to Agile software testing

KMS Technology

There are many different approaches to how you let your microservices communicate between one another. Be it asynchronous or synchronous, choreographed or orchestrated, eventual consistent or distributedly transactional, fault tolerant or just a mess! In this session I will provide an overview on different concepts of microservice communication and their pros & cons. On the way I'll try to throw in some anecdotes, success stories and failures I learned from so that you can hopefully take something home with you.

Communication in a Microservice Architecture

Per Bernhardt

Was ist angesagt? (20)

Test Driven Development & CI/CD

Agile methodology

Agile software development

The Paved Road at Netflix

Scrum In 15 Minutes

Agile vs Traditional Project Management

Agile Project Management

Virtualization Vs. Containers

Agile project management

Overview of agile methodology

Architecture for Flow w/ Wardley Mapping, Domain-Driven Design, and Team Topo...

Agile Project Management

DevOps: Infrastructure as Code

DevOps Powerpoint Presentation Slides

Agile Methodology PPT

DevOps - an Agile Perspective (at Scale)

CICD Pipelines for Microservices Best Practices

Agile Scrum Methodology

Introduction to Agile software testing

Communication in a Microservice Architecture

Andere mochten auch

Scalable Architectures - Taming the Twitter Firehose

Lorenzo Alberton

At a certain scale, millions of events happen every second, and all of them are important to evaluate the health of the system. If not handled correctly, such a volume of information can overwhelm both the infrastructure that needs to support them, and people who have to make a sense out of thousands of signals and make decisions upon them, fast. By understanding how our rational mind works, how people process information, we can present data so it's more evident and intuitive. This talk will explain how to collect useful metrics, and to create the perfect monitoring dashboard to organise and display them, letting our intuition operate automatically and quickly, and saving attention and mental effort to activities that demand it.

Monitoring at scale - Intuitive dashboard design

Lorenzo Alberton

Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees

Lorenzo Alberton

The ability to grow (and shrink) according to the needs and the available resources is an essential part of designing applications. In this talk we'll cover the fundamental elements of scalability, including aspects involving people, processes and technology. With sound and proven principles and some advice on how to shape your organisation, set the right processes and design your application, this session is a must-see for developers and technical leads alike.

The Art of Scalability - Managing growth

Lorenzo Alberton

Despite the NoSQL movement trying to flag traditional databases as a dying breed, the RDBMS keeps evolving and adding new powerful weapons to its arsenal. In this talk we'll explore Common Table Expressions (SQL-99) and how SQL handles recursion, breaking the bi-dimensional barriers and paving the way to more complex data structures like trees and graphs, and how we can replicate features from social networks and recommendation systems. We'll also have a look at window functions (SQL:2003) and the advanced reporting features they make finally possible.

Graphs in the Database: Rdbms In The Social Networks Age

Lorenzo Alberton

NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them, as in which situations they work better than a Relational Database, and how to choose one over another. This talk will give an overview of the NoSQL landscape and a classification for the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, the strengths and the drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).

NoSQL Databases: Why, what and when

Lorenzo Alberton

Storing tree structures in a bi-dimensional table has always been problematic. The simplest tree models are usually quite inefficient, while more complex ones aren't necessarily better. In this talk I briefly go through the most used models (adjacency list, materialized path, nested sets) and introduce some more advanced ones belonging to the nested intervals family (Farey algorithm, Continued Fractions, and other encodings). I describe the advantages and pitfalls of each model, some proprietary solutions (e.g. Oracle's CONNECT BY) and one of the SQL Standard's upcoming features, Common Table Expressions.

Trees In The Database - Advanced data structures

Lorenzo Alberton

Andere mochten auch (7)

Scalable Architectures - Taming the Twitter Firehose

Monitoring at scale - Intuitive dashboard design

Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees

The Art of Scalability - Managing growth

Graphs in the Database: Rdbms In The Social Networks Age

NoSQL Databases: Why, what and when

Trees In The Database - Advanced data structures

Ähnlich wie Scaling Teams, Processes and Architectures

Value driven continuous delivery

Gabriel Prat

OSSCube - Zend Webinar

OSSCube

Jax Sql Saturday Scrum presentation #130

Christopher Daily

Agile values

DUONG Trong Tan

Amy.stapleton

NASAPMC

Agile developers create their own identity by Ajay Danait

Xebia IT Architects

Introduction To Scrum

Dave Neuman

Estimation Agile Projects

Ram Srivastava

Practices of an agile developer

DUONG Trong Tan

Scrum managing through complexity

Pierre E. NEIS

Bob Galen shares real-world stories where he’s seen “effectively partnered” teams and Product Owners truly deliver balanced value for their business stakeholders. In this session he’ll show you how story mapping and release planning can truly set the stage for effective team workflow—establishing a “Big Picture” for everyone to shoot for. How establishing shared goals, both at the iteration and release levels, truly cements the partnership between team and Product Owner. And finally, how setting a tempo of regular, focused backlog grooming sessions establishes a mechanism for the team and Product Owner to explore well-nuanced and high value backlogs.

The Essential Product Owner - Partnering with the team

Cprime

APO 2.0

Computer Aid, Inc

Agile

Jeff Bollinger

Introduction to Agile

Richard Cheng

AMI Presentation

Computer Aid, Inc

What is this thing called Agile?

John Goodpasture

Large Scale Software Project

Return on Intelligence

Agile intro module 1

André Heijstek

Consistently delivering and maintaining well performing applications doesn't just happen, it requires a solid architecture, sound development, continual attention, diligence and expertise. It also requires appropriate testing, not simply of release-candidate builds, but of designs, units, integrations, and physical components... both during development and in production. The question is, how can a team accomplish all of that under all of today's pressure to deliver quickly and cheaply? Join Scott Barber for this Keynote Address to hear about what successful organizations are doing to consistently deliver well performing applications, to learn the underlying principles and practices that enable those organizations to create, test, and maintain those well performing applications without breaking either the budget or the schedule, and what the key items are that virtually every team can implement right away, to dramatically improve the consistency and overall performance of their applications.

Iqnite keynote

Scott Barber

Randy Clark - a Black Belt and PowerSteering Software's Director of Six Sigma - shares best practice advice for how Process Excellence leaders can guarantee repeated victory with project replication, the process of taking successful projects and duplicating the results in other locations or business areas. View the slides to learn valuable tips to help your organization repeat the projects proven to yield financial touchdowns, significantly increase project completion percentages, and eliminate time consuming and costly fumbles.

Six Sigma Project Replication Webinar Slides

PowerSteering Software

Ähnlich wie Scaling Teams, Processes and Architectures (20)

Value driven continuous delivery

OSSCube - Zend Webinar

Jax Sql Saturday Scrum presentation #130

Agile values

Amy.stapleton

Agile developers create their own identity by Ajay Danait

Introduction To Scrum

Estimation Agile Projects

Practices of an agile developer

Scrum managing through complexity

The Essential Product Owner - Partnering with the team

APO 2.0

Agile

Introduction to Agile

AMI Presentation

What is this thing called Agile?

Large Scale Software Project

Agile intro module 1

Iqnite keynote

Six Sigma Project Replication Webinar Slides

Kürzlich hochgeladen

Abhishek Deb(1), Mr Abdul Kalam(2) M. Des (UX) , School of Design, DIT University , Dehradun. This paper explores the future potential of AI-enabled smartphone processors, aiming to investigate the advancements, capabilities, and implications of integrating artificial intelligence (AI) into smartphone technology. The research study goals consist of evaluating the development of AI in mobile phone processors, analyzing the existing state as well as abilities of AI-enabled cpus determining future patterns as well as chances together with reviewing obstacles as well as factors to consider for more growth.

Exploring the Future Potential of AI-Enabled Smartphone Processors

debabhi2

Accelerating FinTech Innovation: Unleashing API Economy and GenAI Vasa Krishnan, Chief Technology Officer - FinResults Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

apidays

The action of the next cyber saga takes place in the mystical lands of the Asia-Pacific region, where the main characters began their digital activities in the middle of 2021 and qualitatively strengthened it in 2022. Corporate espionage, document theft, audio recordings, and data leaks from messaging platforms were all a matter of one day for Dark Pink. Their geographical focus may have started in the Asia-Pacific region, but their ambitions knew no bounds, targeting a European government ministry in a bold move to expand their portfolio. Their victim profile was as diverse as a UN meeting, targeting military organizations, government agencies, and even a religious organization. Because discrimination is not a fashionable agenda. In the world of cybercrime, they serve as a reminder that sometimes the most serious threats come in the most unassuming packages with a pink bow.

Cyberprint. Dark Pink Apt Group [EN].pdf

Overkill Security

Effective data discovery is crucial for maintaining compliance and mitigating risks in today's rapidly evolving privacy landscape. However, traditional manual approaches often struggle to keep pace with the growing volume and complexity of data. Join us for an insightful webinar where industry leaders from TrustArc and Privya will share their expertise on leveraging AI-powered solutions to revolutionize data discovery. You'll learn how to: - Effortlessly maintain a comprehensive, up-to-date data inventory - Harness code scanning insights to gain complete visibility into data flows leveraging the advantages of code scanning over DB scanning - Simplify compliance by leveraging Privya's integration with TrustArc - Implement proven strategies to mitigate third-party risks Our panel of experts will discuss real-world case studies and share practical strategies for overcoming common data discovery challenges. They'll also explore the latest trends and innovations in AI-driven data management, and how these technologies can help organizations stay ahead of the curve in an ever-changing privacy landscape.

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

TrustArc

AXA XL - Insurer Innovation Award Americas 2024

The Digital Insurer

Axa Assurance Maroc - Insurer Innovation Award 2024

The Digital Insurer

presentation ICT roal in 21st century education

jfdjdjcjdnsjd

Scaling API-first – The story of a global engineering organization Ian Reasor, Senior Computer Scientist - Adobe Radu Cotescu, Senior Computer Scientist - Adobe Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

apidays

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER

MadyBayot

When you’re building (micro)services, you have lots of framework options. Spring Boot is no doubt a popular choice. But there’s more! Take Quarkus, a framework that’s considered the rising star for Kubernetes-native Java. It always depends on what's best for your situation, but how to choose the best solution if you're comparing 2 frameworks? Both Spring Boot and Quarkus have their positives and negatives. Let us compare the two by live coding a couple of common use cases in Spring Boot and Quarkus. After this talk, you’ll be ready to get started with Quarkus yourself, and know when to select Quarkus or Spring Boot.

Spring Boot vs Quarkus the ultimate battle - DevoxxUK

Jago de Vreede

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...

Zilliz

ICT role in 21st century education and its challenges

rafiqahmad00786416

MS Copilot expands with MS Graph connectors

Nanddeep Nachan

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Edi Saputra

Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows. We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases. This video focuses on the deployment of external web forms using Jotform for Bonterra Impact Management. This solution can be customized to your organization’s needs and deployed to support the common use cases below: - Intake and consent - Assessments - Surveys - Applications - Program registration Interested in deploying web form automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Jeffrey Haguewood

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Martijn de Jong

Tracing the root cause of a performance issue requires a lot of patience, experience, and focus. It’s so hard that we sometimes attempt to guess by trying out tentative fixes, but that usually results in frustration, messy code, and a considerable waste of time and money. This talk explains how to correctly zoom in on a performance bottleneck using three levels of profiling: distributed tracing, metrics, and method profiling. After we learn to read the JVM profiler output as a flame graph, we explore a series of bottlenecks typical for backend systems, like connection/thread pool starvation, invisible aspects, blocking code, hot CPU methods, lock contention, and Virtual Thread pinning, and we learn to trace them even if they occur in library code you are not familiar with. Attend this talk and prepare for the performance issues that will eventually hit any successful system. About authorWith two decades of experience, Victor is a Java Champion working as a trainer for top companies in Europe. Five thousands developers in 120 companies attended his workshops, so he gets to debate every week the challenges that various projects struggle with. In return, Victor summarizes key points from these workshops in conference talks and online meetups for the European Software Crafters, the world’s largest developer community around architecture, refactoring, and testing. Discover how Victor can help you on victorrentea.ro : company training catalog, consultancy and YouTube playlists.

Finding Java's Hidden Performance Traps @ DevoxxUK 2024

Victor Rentea

DBX First Quarter 2024 Investor Presentation

Dropbox

The microservices honeymoon is over. When starting a new project or revamping a legacy monolith, teams started looking for alternatives to microservices. The Modular Monolith, or 'Modulith', is an architecture that reaps the benefits of (vertical) functional decoupling without the high costs associated with separate deployments. This talk will delve into the advantages and challenges of this progressive architecture, beginning with exploring the concept of a 'module', its internal structure, public API, and inter-module communication patterns. Supported by spring-modulith, the talk provides practical guidance on addressing the main challenges of a Modultith Architecture: finding and guarding module boundaries, data decoupling, and integration module-testing. You should not miss this talk if you are a software architect or tech lead seeking practical, scalable solutions. About the author With two decades of experience, Victor is a Java Champion working as a trainer for top companies in Europe. Five thousands developers in 120 companies attended his workshops, so he gets to debate every week the challenges that various projects struggle with. In return, Victor summarizes key points from these workshops in conference talks and online meetups for the European Software Crafters, the world’s largest developer community around architecture, refactoring, and testing. Discover how Victor can help you on victorrentea.ro : company training catalog, consultancy and YouTube playlists.

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

Victor Rentea

Following the popularity of "Cloud Revolution: Exploring the New Wave of Serverless Spatial Data," we're thrilled to announce this much-anticipated encore webinar. In this sequel, we'll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support for these new formats, including COGs, COPC, FlatGeoBuf, GeoParquet, STAC, and ZARR. Building on the foundation laid by industry leaders Michelle Roby of Radiant Earth and Chris Holmes of Planet in the first webinar, this second part offers an in-depth look at the real-world application and behind-the-scenes dynamics of these cutting-edge formats. We will spotlight specific use-cases and workflows, showcasing their efficiency and relevance in practical scenarios. Discover the vast possibilities each format holds, highlighted through detailed discussions and demonstrations. Our expert speakers will dissect the key aspects and provide critical takeaways for effective use, ensuring attendees leave with a thorough understanding of how to apply these formats in their own projects. Elevate your understanding of how FME supports these cutting-edge technologies, enhancing your ability to manage, share, and analyze spatial data. Whether you're building on knowledge from our initial session or are new to the serverless spatial data landscape, this webinar is your gateway to mastering cloud-native formats in your workflows.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Safe Software

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

Cyberprint. Dark Pink Apt Group [EN].pdf

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

AXA XL - Insurer Innovation Award Americas 2024

Axa Assurance Maroc - Insurer Innovation Award 2024

presentation ICT roal in 21st century education

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER

Spring Boot vs Quarkus the ultimate battle - DevoxxUK

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...

ICT role in 21st century education and its challenges

MS Copilot expands with MS Graph connectors

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Finding Java's Hidden Performance Traps @ DevoxxUK 2024

DBX First Quarter 2024 Investor Presentation

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Scaling Teams, Processes and Architectures

1. Lorenzo Alberton @lorenzoalberton Scaling Teams, Processes and Architectures Managing growth London Scalability Group, Innovation Warehouse, 16th April 2012 1

2. Scalability Is About... People Processes Technology 2

3. People Stafﬁng, Roles, Management, Teams 3

4. Stafﬁng Never compromise. Only hire people smarter than you. http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

5. Stafﬁng Hire people who can ﬁt the company culture. Promote fun in your working environment. http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

6. Stafﬁng Beware of toxic people http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4

7. Team Size and Structure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity 5

8. Team Size and Structure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity CTO functional PM PM PM Designer Developer Tester Designer Developer Tester Designer Developer Tester Designer Developer Tester Designers Developers Testers 5

9. Team Size and Structure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity CTO functional matrix PM PM PM Proj 1 PM Designer Developer Tester Proj 2 PM Designer Developer Tester Proj 3 PM Designer Developer Tester Proj 4 PM Designer Developer Tester Designers Developers Testers 5

10. Processes 6

11. Why are processes critical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs 7

12. Why are processes critical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge 7

13. Why are processes critical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount 7

14. Why are processes critical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process 7

15. Why are processes critical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process right time 7

16. Determining Headroom Capacity Current Load 8

17. Determining Headroom Why? Capacity Planning annual budget Hiring plan Current Load Prioritisation 8

18. Controlling Change: Determine Risk http://dilbert.com/strips/comic/2008-05-08/ 9

19. Controlling Change: Determine Risk http://dilbert.com/strips/comic/2008-05-08/ 9

20. Risk Management Risk is cumulative Determine limits and tolerance 10

21. Load / Stress Testing Load testing - identify, document and eliminate bottlenecks through a strict controlled process of measurement and analysis - measure system’s response and stability - verify the app can meet the desired performance objectives (SLA) Stress testing - determine the app’s stability when subjected to above-normal loads - verify the app’s behaviour when close to the breaking point - test the application recoverability (negative testing) 11

22. Barrier Conditions Code reviews Manual and automated QA processes Performance and stress testing Release documentation checks (runbook) Dev, Test, Stage and Live environments Instrumentation checks Protection from signiﬁcant failures 12

23. Technology Architecting Scalable Solutions 13

24. Architectural Principles 14

25. Architectural Principles +1 N + 1 design 14

26. Architectural Principles +1 N + 1 design for rollback 14

27. Architectural Principles +1 N + 1 design for rollback to be disabled 14

28. Architectural Principles +1 N + 1 design for rollback to be disabled to be monitored 14

29. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple monitored live sites 14

30. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology 14

31. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous design 14

32. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless design systems 14

33. Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless buy when design systems non core 14

34. Stateless, Asynchronous Systems http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg 15

35. Fault Isolative Structures 16

36. Fault Isolative Structures Increase availability Limit impact of failures Easier debugging 16

37. Fault Isolative Structures Increase availability Limit impact of failures Easier debugging First 16

38. Fault Isolative Structures Increase availability Limit impact of failures Easier debugging Functions causing repetitive problems First 16

39. Fault Isolative Structures Increase availability Limit impact of failures Easier debugging Functions Natural layout causing or topology repetitive of the site problems First 16

40. Caching for Performance and Scale 17

41. Caching for Performance and Scale Object Caches Usually serialized (marshalling / unmarshalling) get() / set() / replace() APC, Memcached 17

42. Caching for Performance and Scale Object Caches Application Caches Usually serialized Proxy caches (marshalling / Reverse proxy unmarshalling) caches get() / set() / HTTP headers replace() ISP/Uni proxies APC, Memcached Squid, Varnish, mod_cache 17

43. Caching for Performance and Scale Object Caches Application Caches CDNs Usually serialized Proxy caches Multiple locations (marshalling / / backbones Reverse proxy unmarshalling) caches get() / set() / HTTP headers CNAME entries replace() ISP/Uni proxies Akamai, Coral, APC, Memcached Squid, Varnish, Limelight... mod_cache 17

44. Managing “Big Data” storage costs people and software power and space processing power backup time and costs 18

45. Managing “Big Data” The more storage ...the more storage management storage costs people and software power and space processing power backup time and costs 18

46. Managing “Big Data” The more storage ...the more storage management storage costs people and software power and space processing power backup time and costs Evaluate data retention policy Consider multi-tiered storage Distribute data/ work (Hadoop, M/R) 18

47. Monitoring: Measure Everything 19

48. Monitoring: Measure Everything 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors 19

49. Monitoring: Measure Everything 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors Keep Signal vs. Noise ratio high 19

50. Monitoring: Measure Everything StatsD 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors Keep Signal vs. Noise ratio high 19

51. DataSift Architecture Some Architecture Pr0n 20

52. DataSift Architecture http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

53. DataSift Architecture SOA - loosely coupled, independently scalable services. Simple APIs http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

54. DataSift Architecture SOA - loosely coupled, independently scalable services. Simple APIs example http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21

55. SOA - Scale Each Component 22

56. Our Stack Languages: C++, PHP, Java, Scala, Ruby, Node.JS Storage: MySQL, HBase Cache: Memcached, APC, Redis Queues: ZeroMQ, Kafka, Redis Development/Deployment: GIT, Jenkins CI, RPM, Chef Monitoring: StatsD + Graphite, Zenoss 23

57. Our Stack Languages: C++, PHP, Java, Scala, Ruby, Node.JS Storage: MySQL, HBase Cache: Memcached, APC, Redis Queues: ZeroMQ, Kafka, Redis Development/Deployment: GIT, Jenkins CI, RPM, Chef Monitoring: StatsD + Graphite, Zenoss Secret recipe: amazing people and working environment 23

58. Messaging ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast) Internal communication: pass messages to the next processing stage, control events, monitoring Kafka/Redis: PUSH-PULL with persistence Internal message / workload buffering and distribution Node.js: WebSockets / HTTP Streaming Message delivery (output) 24

59. 0mq PUSH-PULL (workload distribution) Consumer 1 Consumer 2 Consumer 3 [Round-Robin-ish] 25

60. 0mq PUB-SUB (High Availability) Listener 1 Publisher 1 Listener 2 Publisher 2 Listener 3 [Broadcast] [Dynamic Subscriptions] 26

61. 0mq PUB-SUB (High Availability) DC 1 Publisher 1 Publisher 2 DC 2 27

62. Internal “Firehose” Publishers Subscribers Alice’s John’s Y Z timeline Inbox X subscribe to topic X Data Bus subscribe to topic Y System Fred’s Tech Monitor Followers Blog Feed 28

63. Instrumentation https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd 29

64. We’re Hiring! http://datasift.com/whoweare/jobs 30

65. References M. L. Abbot, M. T. Fisher, “The Art Of Scalability”, Addison Wesley http://theartofscalability.com/ http://www.slideshare.net/quipo/the-art-of-scalability-managing- growth http://www.slideshare.net/postwait/scalable-internet-architecture http://bit.ly/IJKwuc http://agile.dzone.com/news/approaches-organizational https://bitly.com/vCSd49 31

66. Lorenzo Alberton @lorenzoalberton Thank you! lorenzo@alberton.info http://www.alberton.info/talks Questions? 32

Hinweis der Redaktion

\n
Let&#x2019;s start by focusing on the true foundation: people and process, without which true scalability cannot be built.\nPeople are the most important element of scalability, as without people there are no processes and no technology.\n
\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions).\nLook for solid software engineers with good understanding of CS topics, and exceptional devops. Create fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke\nFocus and dedication.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
Overlapping responsibilities create wasted effort and value-destroying conflicts.\nKey scale-related responsibilities for any organisation include:\n- setting measurable goals; - staffing the team with the appropriate skills; - defining and implementing a scalable architecture.\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
There&#x2019;s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person&#x2019;s primary discipline).\n
\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
Processes serve 3 general purposes:\n- they augment the management of our teams and employees\n- they standardise employee&#x2019;s actions while performing repetitive tasks\n- they free employees up from daily mundane decisions to concentrate on grander ideas.\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
In order to navigating your way out of the woods, you need to know the point from which you are starting.\nHeadroom = amount of free capacity that exists within your system before you start having problems such as degradation of performance or an outage. Because your app is a system that involves many different components as a db, a firewall, application servers, in order to truly understand headroom you need to first understand the headroom of each of these.\n1) Identify major components. 2) Identify responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution\n
The intent of change management is to limit the impact of changes by controlling them through their release into the production environment and logging them as they are introduced to production. Gut feeling / finger in the air: thanks to experience + innate ability: fast, but not accurate.\nSemaphore: Assign a risk level of green/yellow/red to each small component, then assign an overall colour: methodical, repeatable, documentable, no longer relying on a single person, accurate. Better: Failure Mode and Effect Analysis.\n\n\n
Risk is cumulative. You might want to establish some limits to the amount of risk that you are willing to allow at a particular time of the day or customer volume.\nAlso consider the human factor, i.e. the level of risk tolerance that a person can have within a certain time frame.\n\n
The purpose of load testing is to identify, document and eliminate bottlenecks in the system through a strict controlled process of measurement and analysis. Load testing is the process of putting load or user demand on a system to measure its response and stability, to verify that the app can meet the desired performance objectives (SLA: service level agreement).\n Establish success criteria (concurrent usage, response time, ...)\n Establish the test environment (as close as possible to the production environment)\n Define the tests (Pareto rule 20% - 80%) to cover different things (endurance, most used, most visible, different components)\n Identify what needs to be monitored / what data needs to be collected\n Run, Analyse, Report to Engineers\n Repeat Tests and Analysis\nStress testing is a process used to determine an application&#x2019;s stability when subjected to above-normal loads, to verify the behaviour when close to the breaking point of the application.\nPositive testing is where the load is progressively increased to overwhelm the system&#x2019;s resources.\nNegative testing takes away resources such as memory, threads, connections, testing the application recoverability.\nThe way to insure that the headroom calculations remain accurate is to conduct performance testing on all your releases to insure you are not introducing unexpected load increases.\n
Good processes for the promotion of systems into the production environment have the capability of protecting you from significant failures. Developing effective barrier conditions and coupling them with a process and capability to roll back production changes are necessary components within any highly available service and are critical to the success of your scalability goals.\n
\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
Synchronous calls, if used excessively or incorrectly cause undue burden on the system and prevent it from scaling.\nSystems designed to interact synchronously have a higher failure rate than asynchronous ones. Their ability to scale is tied to the slowest system in the chain of communications. It&#x2019;s better to use callbacks, and timeouts to recover gracefully should they not receive responses in a timely fashion.\nSynchronisation is when two or more pieces of work must be in a specific order to accomplish a task. Asynchronous coordination between the original method and the invoked method requires a mechanism that the original method determines when or if a called method has completed executing (callbacks). Ensure they have a chance to recover gracefully with timeouts should they not receive responses in a timely fashion.\nA related problem is stateful versus stateless applications. An application that uses state relies on the current condition of execution as a determinant of the next action to be performed. \nThere are 3 basic approaches to solving the complexities of scaling an application that uses session data: 1) Avoidance (using no sessions or sticky sessions) avoid replication: Share-nothing architecture; 2) Decentralisation (store session data in the browser&#x2019;s cookie or in a db whose key is referenced by a hash in the cookie); 3) Centralisation (store cookies in the db / memcached).\n\n
You must be able to isolate and limit the effects of failures within any system, by segmenting the components. Decouple decouple decouple! A swim lane represent both a barrier and a guide (ensure that swimmers don&#x2019;t interfere with each other. Help guide the swimmer toward their objective with minimal effort). AKA Shard.\nThey increase availability by limiting the impact of failures to a subset of functionality, make incidents easier to detect, identify and resolve. The fewer the things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. They should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication. When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally consider the natural layout or topology of the site for opportunities to swim lanes (e.g. customer boundaries within an app / environment. If you have a tenant who is very busy, assign it a swim lane; other tenants with a low utilisation can be all put into another swim lane).\n
You must be able to isolate and limit the effects of failures within any system, by segmenting the components. Decouple decouple decouple! A swim lane represent both a barrier and a guide (ensure that swimmers don&#x2019;t interfere with each other. Help guide the swimmer toward their objective with minimal effort). AKA Shard.\nThey increase availability by limiting the impact of failures to a subset of functionality, make incidents easier to detect, identify and resolve. The fewer the things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. They should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication. When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally consider the natural layout or topology of the site for opportunities to swim lanes (e.g. customer boundaries within an app / environment. If you have a tenant who is very busy, assign it a swim lane; other tenants with a low utilisation can be all put into another swim lane).\n
You must be able to isolate and limit the effects of failures within any system, by segmenting the components. Decouple decouple decouple! A swim lane represent both a barrier and a guide (ensure that swimmers don&#x2019;t interfere with each other. Help guide the swimmer toward their objective with minimal effort). AKA Shard.\nThey increase availability by limiting the impact of failures to a subset of functionality, make incidents easier to detect, identify and resolve. The fewer the things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. They should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication. When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally consider the natural layout or topology of the site for opportunities to swim lanes (e.g. customer boundaries within an app / environment. If you have a tenant who is very busy, assign it a swim lane; other tenants with a low utilisation can be all put into another swim lane).\n
You must be able to isolate and limit the effects of failures within any system, by segmenting the components. Decouple decouple decouple! A swim lane represent both a barrier and a guide (ensure that swimmers don&#x2019;t interfere with each other. Help guide the swimmer toward their objective with minimal effort). AKA Shard.\nThey increase availability by limiting the impact of failures to a subset of functionality, make incidents easier to detect, identify and resolve. The fewer the things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. They should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication. When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally consider the natural layout or topology of the site for opportunities to swim lanes (e.g. customer boundaries within an app / environment. If you have a tenant who is very busy, assign it a swim lane; other tenants with a low utilisation can be all put into another swim lane).\n
You must be able to isolate and limit the effects of failures within any system, by segmenting the components. Decouple decouple decouple! A swim lane represent both a barrier and a guide (ensure that swimmers don&#x2019;t interfere with each other. Help guide the swimmer toward their objective with minimal effort). AKA Shard.\nThey increase availability by limiting the impact of failures to a subset of functionality, make incidents easier to detect, identify and resolve. The fewer the things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. They should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication. When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally consider the natural layout or topology of the site for opportunities to swim lanes (e.g. customer boundaries within an app / environment. If you have a tenant who is very busy, assign it a swim lane; other tenants with a low utilisation can be all put into another swim lane).\n
What is the best way to handle large volumes of traffic? Answer: &#x201C;Establish the right organisation, implement the right processes and follow the right architectural principles&#x201D;. Correct, but the best way is not to have to handle it at all. The key to achieving this is through pervasive use of caching. The cache hit ratio is important to understand its effectiveness. The cache can be updated/refreshed via a batch job or on a cache-miss. If the cache is filled, some algorithms (LRU, MRU...) will decide on which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:\n- Object caches: used to store objects for the app to be reused, usually serialized objects. The app must be aware of them. Layer in front of the db / external services. Marshalling is a process where the object is transformed into a data format suitable for transmitting or storing.\n- Application caches: A) Proxy caches, usually implemented by ISPs, universities or corporations; it caches for a limited number of users and for an unlimited number of sites. B) Reverse proxy caches (opposite): it caches for an unlimited number of users and for a limited number of applications; the configuration of the specific app will determine what can be cached. HTTP headers give much control over caching (Last-Modified, Etag, Cache-Control).\n- Content Delivery Networks: they speed up response time, off load requests from your application&#x2019;s origin server, and usually lower costs. The total capacity of the CDN&#x2019;s strategically placed servers can yield a higher capacity and availability than the network backbone. The way it works is that you place the CDN&#x2019;s domain name as an alias for your server by using a canonical name (CNAME) in your DNS entry\n
What is the best way to handle large volumes of traffic? Answer: &#x201C;Establish the right organisation, implement the right processes and follow the right architectural principles&#x201D;. Correct, but the best way is not to have to handle it at all. The key to achieving this is through pervasive use of caching. The cache hit ratio is important to understand its effectiveness. The cache can be updated/refreshed via a batch job or on a cache-miss. If the cache is filled, some algorithms (LRU, MRU...) will decide on which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:\n- Object caches: used to store objects for the app to be reused, usually serialized objects. The app must be aware of them. Layer in front of the db / external services. Marshalling is a process where the object is transformed into a data format suitable for transmitting or storing.\n- Application caches: A) Proxy caches, usually implemented by ISPs, universities or corporations; it caches for a limited number of users and for an unlimited number of sites. B) Reverse proxy caches (opposite): it caches for an unlimited number of users and for a limited number of applications; the configuration of the specific app will determine what can be cached. HTTP headers give much control over caching (Last-Modified, Etag, Cache-Control).\n- Content Delivery Networks: they speed up response time, off load requests from your application&#x2019;s origin server, and usually lower costs. The total capacity of the CDN&#x2019;s strategically placed servers can yield a higher capacity and availability than the network backbone. The way it works is that you place the CDN&#x2019;s domain name as an alias for your server by using a canonical name (CNAME) in your DNS entry\n
What is the best way to handle large volumes of traffic? Answer: &#x201C;Establish the right organisation, implement the right processes and follow the right architectural principles&#x201D;. Correct, but the best way is not to have to handle it at all. The key to achieving this is through pervasive use of caching. The cache hit ratio is important to understand its effectiveness. The cache can be updated/refreshed via a batch job or on a cache-miss. If the cache is filled, some algorithms (LRU, MRU...) will decide on which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:\n- Object caches: used to store objects for the app to be reused, usually serialized objects. The app must be aware of them. Layer in front of the db / external services. Marshalling is a process where the object is transformed into a data format suitable for transmitting or storing.\n- Application caches: A) Proxy caches, usually implemented by ISPs, universities or corporations; it caches for a limited number of users and for an unlimited number of sites. B) Reverse proxy caches (opposite): it caches for an unlimited number of users and for a limited number of applications; the configuration of the specific app will determine what can be cached. HTTP headers give much control over caching (Last-Modified, Etag, Cache-Control).\n- Content Delivery Networks: they speed up response time, off load requests from your application&#x2019;s origin server, and usually lower costs. The total capacity of the CDN&#x2019;s strategically placed servers can yield a higher capacity and availability than the network backbone. The way it works is that you place the CDN&#x2019;s domain name as an alias for your server by using a canonical name (CNAME) in your DNS entry\n
\n
\n
\n
\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
\n
\n
\n
\n
Use queues and workers to make processes asynchronous, distribute data to parallel workers. \n
happy to talk about any of them\n
\n
\n
\n
listeners can only subscribe to one or more topics. Different output channels.\nZeroMQ v3: filtering done on the publisher side\n
An interesting idea if you have a highly dynamic site / service, with each update affecting several other users / pages, is to have an internal data bus that carries all the information, with updates labelled with topics, and all the services/users subscribing to the relevant topics.\n
We collect millions of events every second.\nThe importance of people: devops who know what to monitor, how, how to use and write tools, and have 100% dedication.\nWe use different technologies. It&#x2019;s very easy to set up a new ZeroMQ listener.\nWe use StatsD (from Flickr / Etsy), Zenoss, Graphite\n
shameless plug\n
\n
\n

Scaling Teams, Processes and Architectures

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (7)

Ähnlich wie Scaling Teams, Processes and Architectures

Ähnlich wie Scaling Teams, Processes and Architectures (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Scaling Teams, Processes and Architectures

Hinweis der Redaktion