15. Service-Oriented Architectures
[Diagram: Service A · Service B · Service C]
Loose coupling - separation of responsibilities
http://en.wikipedia.org/wiki/Service-oriented_architecture
16. Service-Oriented Architectures
[Diagram: Consumer → Service A · Service B · Service C]
Loose coupling - separation of responsibilities
Separate consumers from service implementation
http://en.wikipedia.org/wiki/Service-oriented_architecture
17. Service-Oriented Architectures
[Diagram: Consumer · Consumer → Proxy Cache → Service A · Service B · Service C]
Loose coupling - separation of responsibilities
Separate consumers from service implementation
Aggressive caching at application level
http://en.wikipedia.org/wiki/Service-oriented_architecture
18. Service-Oriented Architectures
[Diagram: Orchestrator → Service A · Service B · Service C]
Loose coupling - separation of responsibilities
Separate consumers from service implementation
Aggressive caching at application level
Orchestration of distinct units accessible over a network
http://en.wikipedia.org/wiki/Service-oriented_architecture
19. Service-Oriented Architectures
[Diagram: Orchestrator → Service A · Service B · Service C, exchanging JSON / XML / Thrift]
Loose coupling - separation of responsibilities
Separate consumers from service implementation
Aggressive caching at application level
Orchestration of distinct units accessible over a network
Communication via a well-defined, interoperable format
http://en.wikipedia.org/wiki/Service-oriented_architecture
22. Independent Horizontal Scaling
[Diagram: Orchestrator → Service A; Orchestrator → Load Balancer → Service B (nodes B1, B2)]
Load balancing - multiple nodes
23. Independent Horizontal Scaling
[Diagram: Rev. Proxy → Service A; Orchestrator → Load Balancer → Service B (nodes B1, B2)]
Better single-node performance with application-level caching
Load balancing - multiple nodes
24. Cell Architectures
N + 1 design: ensure that everything you develop has at least one additional instance of that system in the event of failure.
Multiple live nodes: have multiple live, isolated nodes of the same type to distribute the load.
http://highscalability.com/blog/2012/5/9/cell-architectures.html
25. Cardinality of Nodes on Each Service
[Diagram: per-service node counts, ranging from 2 to 60+ nodes per service]
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p
30. Caching with Varnish
No special directives are required to cache normal requests.
Just use the defaults, and set Cache-Control headers.
<?php
$ttl = 300; // cache for 5 minutes
$ts = new DateTime('@' . (time() + $ttl));
header("Expires: " . $ts->format(DateTime::RFC1123));
header("Cache-Control: max-age=$ttl, must-revalidate");
?>
Warning: by default, pages with cookies are not cached.
35. Service Host Discovery - Config Mgr
GET /configuration/<servicename>/hosts
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
{
  "service": "<servicename>",
  "hosts": [
    "10.0.1.33:80",
    "10.0.1.34:80"
  ],
  "base_path": "/svc/xyz/"
}
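A consumer can fetch this document and pick a host at random for simple client-side balancing. A minimal sketch of the parsing step, assuming the JSON shape shown above (`pickHost` is a hypothetical helper, not part of any library):

```php
<?php
// Hypothetical helper: parse the config manager's JSON response
// and pick one "host:port" entry at random.
function pickHost(string $json): string
{
    $config = json_decode($json, true);
    $hosts  = $config['hosts'];
    return $hosts[array_rand($hosts)];
}

// In a real consumer, $json would come from the discovery endpoint, e.g.
// $json = file_get_contents('http://configmgr/configuration/mysvc/hosts');
$json = '{"service":"mysvc","hosts":["10.0.1.33:80","10.0.1.34:80"],"base_path":"/svc/xyz/"}';
$host = pickHost($json);
// $host is now one of the advertised "host:port" entries
```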
36. Service Host Discovery - Zookeeper
ZooKeeper is a centralized service for maintaining configuration
information, naming, providing distributed synchronization, and
providing group services. http://zookeeper.apache.org/
<?php
$zk = new Zookeeper();
$zk->connect('localhost:2181');
// server (registration)
$params = array(array(
    'perms'  => Zookeeper::PERM_ALL,
    'scheme' => 'world',
    'id'     => 'anyone'
));
if (!$zk->exists('/services/mysvc/host')) {
    $zk->create('/services', 'config for internal services', $params);
    $zk->create('/services/mysvc', 'config for mysvc', $params);
    $zk->create('/services/mysvc/host', 'http://my.site.com', $params);
}
38. Service Host Discovery - Zookeeper
<?php
$zk = new Zookeeper();
$zk->connect('localhost:2181');
// client
$host = $zk->get('/services/mysvc/host');
...
39. SOA - Scale Each Component
http://www.thisnext.com/item/647CD0BE/Matryoshkas-Nesting-Dolls
41. SOA - Scale Each Component
SOA: independently scalable services.
Example of distributing processing load:
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p
43. Workers for sharing processing load
Distribute processing load among workers.
Lightweight orchestration; heavy lifting in separate, asynchronous processes.
45. Scale all things!
Example of scaling large data volumes:
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p
47. In Case of “Big Data”...
With lots of data, move the processing logic itself to the storage nodes (I/O is expensive).
Map/Reduce, parallel processing.
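The idea can be illustrated with PHP's own map and reduce primitives. A toy word-count sketch (in a real cluster, each "map" would run on the storage node holding its shard, and only the small partial counts would travel over the network):

```php
<?php
// Toy map/reduce word count. The point: ship the computation to the
// data, and move only the compact partial results between nodes.
function mapShard(string $shard): array
{
    // Map: count words locally on one shard.
    return array_count_values(str_word_count(strtolower($shard), 1));
}

function reduceCounts(array $partials): array
{
    // Reduce: merge the partial counts into a global tally.
    $total = [];
    foreach ($partials as $counts) {
        foreach ($counts as $word => $n) {
            $total[$word] = ($total[$word] ?? 0) + $n;
        }
    }
    return $total;
}

$shards   = ['to be or not to be', 'be quick'];
$partials = array_map('mapShard', $shards);   // runs per storage node
$counts   = reduceCounts($partials);          // merges small summaries
// $counts['be'] === 3, $counts['to'] === 2
```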
49. Messaging
ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast).
Internal communication: pass messages to the next processing stage in the pipeline, control events, monitoring.
Very high throughput. Socket library.
Kafka/Redis: PUSH-PULL with persistence.
Internal message / workload buffering and distribution.
Node.js: WebSockets / HTTP Streaming.
Message delivery (output).
52. Message queues as Buffers (Decoupling)
[Diagram: Producer → queue → Consumer]
Unpredictable load spikes → load normalisation / smoothing
Batching ⇒ higher throughput
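Batching trades a little latency for throughput: a consumer drains several queued items per round trip instead of one. A minimal sketch of the batching step with plain arrays (`array_chunk` stands in for reading N items off the queue at once):

```php
<?php
// Minimal batching sketch: instead of handling queued items one by one,
// drain them in fixed-size batches to amortise per-round-trip overhead.
function makeBatches(array $queued, int $batchSize): array
{
    return array_chunk($queued, $batchSize);
}

$queued  = ['m1', 'm2', 'm3', 'm4', 'm5'];
$batches = makeBatches($queued, 2);
// $batches === [['m1', 'm2'], ['m3', 'm4'], ['m5']]
// Each batch is then processed (and acknowledged) in a single step.
```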
54. Redis Processing Queue
<?php // producer(s)
$redis = new Redis();
$redis->connect('127.0.0.1', 6379, 1.5); // timeout: 1.5 seconds
...
// push items to the queue as they are produced
$redis->lPush('queue:xyz', $item);
...
<?php // consumer(s)
...
while (true) {
    // read items off the queue as they become available;
    // block for up to 2 seconds (timeout)
    // brPop returns array(key, value), or an empty array on timeout
    $item = $redis->brPop('queue:xyz', 2);
    ...
}
https://github.com/nicolasff/phpredis
https://github.com/chrisboulton/php-resque
60. ZeroMQ Producer (PUSH)
<?php
$context = new ZMQContext();
$producer = $context->getSocket(ZMQ::SOCKET_PUSH);
$producer->bind('tcp://*:5555');
// send tasks to workers.
foreach ($tasks as $task) {
// Blocking operation until the message
// is received by one (and only one) worker
$producer->send($task);
}
...
http://zguide.zeromq.org/php:all
61. ZeroMQ Consumers (PULL)
<?php
$context = new ZMQContext();
$worker = $context->getSocket(ZMQ::SOCKET_PULL);
$worker->connect('tcp://myhost:5555');
// process tasks forever
while (true) {
// receive a message (blocking operation)
$task = $worker->recv();
...
}
62. 0mq PUSH-PULL (Mux)
[Diagram: Producer 1 pushes R1, R2, R3; Producer 2 pushes R4; Producer 3 pushes R5, R6; the Consumer pulls with fair-queuing: R1, R4, R5, R2, R6, R3]
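A ZeroMQ PULL socket fair-queues across its connected producers, i.e. it round-robins over their pending messages. A minimal sketch of that interleaving in plain PHP (no sockets involved), reproducing the ordering in the diagram:

```php
<?php
// Fair-queuing sketch: interleave pending messages from several
// producers round-robin, the way a PULL socket drains its peers.
function fairQueue(array $producers): array
{
    $out = [];
    while (array_filter($producers)) {        // while any producer has messages
        foreach ($producers as &$queue) {
            if ($queue) {
                $out[] = array_shift($queue); // one message per producer per round
            }
        }
        unset($queue);
    }
    return $out;
}

$producers = [
    ['R1', 'R2', 'R3'],  // Producer 1
    ['R4'],              // Producer 2
    ['R5', 'R6'],        // Producer 3
];
// Round 1: R1, R4, R5; round 2: R2, R6; round 3: R3
// fairQueue($producers) === ['R1', 'R4', 'R5', 'R2', 'R6', 'R3']
```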
69. Internal “Firehose”
[Diagram: publishers X, Y, Z → Data Bus → subscribers (Alice’s timeline, John’s Inbox, System Monitor, Fred’s Followers, Tech Blog Feed), each subscribing to topic X or topic Y]
70. Internal “Firehose”
Publishers: data feeds, user-generated content, system events, ...
[Diagram: publishers X, Y, Z → Data Bus → subscribers, each subscribing to topic X or topic Y]
71. Internal “Firehose”
Subscribers: applications, services, monitors, routers, repeaters, ...
[Diagram: publishers X, Y, Z → Data Bus → subscribers, each subscribing to topic X or topic Y]
72. Internal “Firehose”
Everyone is connected to the data bus; there is no directed graph.
[Diagram: publishers X, Y, Z → Data Bus → subscribers, each subscribing to topic X or topic Y]
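ZeroMQ PUB-SUB, one natural transport for such a bus, delivers a message to a subscriber whenever the subscription string is a prefix of the message's topic. A minimal sketch of that matching rule in plain PHP (no actual sockets; subscriber names are illustrative):

```php
<?php
// Prefix-matching sketch: a PUB-SUB bus delivers a message to every
// subscriber whose subscription string is a prefix of the message topic.
function interestedSubscribers(string $topic, array $subscriptions): array
{
    $hit = [];
    foreach ($subscriptions as $subscriber => $prefix) {
        if (strncmp($topic, $prefix, strlen($prefix)) === 0) {
            $hit[] = $subscriber;
        }
    }
    return $hit;
}

$subscriptions = [
    'alice_timeline' => 'X',  // topic X and anything under it
    'john_inbox'     => 'Y',
    'system_monitor' => '',   // empty prefix: receives everything
];
// interestedSubscribers('X.user.update', $subscriptions)
//   === ['alice_timeline', 'system_monitor']
```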
78. Monitoring: Measure Everything
StatsD
1. Is there a problem? User experience / business metrics monitors.
2. Where is the problem? System monitors (threshold - variance).
3. What is the problem? Application monitors.
Keep the signal-to-noise ratio high.
82. StatsD + Graphite
Example
StatsD: Node.js daemon. Listens for messages over a UDP port and extracts metrics, which are dumped to Graphite for further processing and visualisation.
Graphite: real-time graphing system. Data is sent to carbon (the processing back-end), which stores it in Graphite’s database. Data is visualised via Graphite’s web interface.
83. StatsD Metrics
; statsd.ini
[statsd]
host = yourhost
port = 8125

<?php
$statsTypePrefix = 'workerX.received.type.';
$statsTimeKey = 'workerX.processing_time';
while (true) {
    $batch = $worker->getBatchOfWork();
    foreach ($batch as $item) {
        // time how long it takes to process this item...
        $time_start = microtime(true);
        // ... process item here ...
        $time = (int)(1000 * (microtime(true) - $time_start));
        StatsD::timing($statsTimeKey, $time); // time in ms
        // count items by type
        StatsD::increment($statsTypePrefix . $item['type']);
    }
}
https://github.com/etsy/statsd/
88. Look! Rib cages! Network Load Viz
Not enough! Contextualise metrics.
http://www.network-weathermap.com/ http://cacti.net
89. Cacti + WeatherMap
Example
Cacti: network graphing solution harnessing the power of RRDTool’s data storage and graphing functionality. Provides a fast poller, graph templating, and multiple data-acquisition methods.
Weathermap: Cacti plugin to integrate network maps into the Cacti web UI. Includes a web-based map editor.
94. Monitoring Reporting Guidelines
Make the subtle obvious
Make the complex/busy simple/clean
Group information by context
Detect anomalies/deviation from norm
Turn raw numbers into graphs
Appeal to intuition
97. Lorenzo Alberton
@lorenzoalberton
Thank you!
lorenzo@alberton.info
http://www.alberton.info/talks
https://joind.in/6372
Speaker notes
I’m Lorenzo; I’m Italian but live in the UK. I’ve been working on several large-scale websites like the BBC, Channel 5, Ladbrokes, iPlayer. I spent the past two years as Chief Architect at DataSift, a hot big-data startup.
I’m going to introduce DataSift to explain what we do and how we do it. Don’t worry, this is not a sales pitch; I’m just using DataSift as an example of how to build a scalable architecture based on lessons learnt in the past.
Some architecture porn.
Sources are Twitter, Facebook, YouTube, Flickr, boards, forums, etc. News agencies: Thomson Reuters, Associated Press, Al-Jazeera, NYT, Chicago Tribune, etc. Data normalisation + augmentation: make data rich and structured. Language detection, demographics (gender detection), trends analysis, sentiment analysis, influence ranking, topic analysis, entities.
2nd stage: the core filtering engine. A scalable, highly parallel, custom-built C++ virtual machine. It can process thousands of incoming messages per second, and thousands of custom filters.
Web site, public API, output streams (HTTP Streaming, WebSockets), buffered streams (batches of messages), and finally...
...storage. We record everything in our Hadoop cluster (historical access, analytics). We also have watchdogs to keep track of usage limits, licenses, etc.
I’m going to give you some numbers to give you a sense of the scale we’re operating at. Between 3 and 9K/sec, depending on the time of the day.
Now, everyone here has heard about service-oriented architectures, but I’m going to share some of the lessons I learnt in the past on how to scale a platform, which helped me design and scale DataSift and other large enterprise sites before it.
The first characteristic of a SOA is having several loosely-coupled services. Separate consumers from service implementation. Orchestration of distinct units accessible over a network. Communication with data in a well-defined, interoperable format.
Having decoupled services means you can scale each one horizontally. If a service is under heavy load, on fire, you can add more nodes of the same kind to keep the service up, without having to duplicate the entire monolithic platform.
Avoid failover (hot-swap) configurations: they don’t work well and usually involve downtime or data loss. Cells provide a unit of parallelization that can be adjusted to any size as the user base grows. Cells are added incrementally as more capacity is required. Cells isolate failures: one cell failure does not impact other cells. Cells provide isolation, as the storage and application horsepower to process requests is independent of other cells. Cells enable nice capabilities like the ability to test upgrades, implement rolling upgrades, and test different versions of software. Cells can fail, be upgraded, and be distributed across datacenters independently of other cells.
As an example, this is the current cardinality of servers we have for each service. Each box in the diagram has between 2 and 60+ nodes.
Let’s have a look at how to practically implement load balancing and application caching.
You can buy a hardware appliance (excellent, expensive), or use software like HAProxy. Set the service nodes as backend servers. HAProxy will do health checks and reroute the traffic to the healthy nodes.
Use a random director to have weights (send more load to a more powerful machine). The random director uses a random number to seed the backend selection. The client director picks a backend based on the client’s identity; you can set the VCL variable client.identity to identify the client by picking up the value of a session cookie or similar. The hash director picks a backend based on the URL hash value (req.hash). The fallback director picks the first backend that is healthy; it considers them in the order in which they are listed in its definition.
It works out of the box: just set Cache-Control headers. It supports ETags to cache several versions of the same page for different customers. Edge-Side Includes.
We’ve seen some characteristics of service-oriented architectures: what they are and why they are useful. There’s another incredibly important defining characteristic of SOAs: the API, i.e. the contract between any two services. It’s a software-to-software interface, not a user interface.
Keep it simple: RESTful verbs, actions on resources, simple data structures in the exchange format. Define the action, the endpoint, the parameters, the response. Reserve an endpoint for a description of the service’s API. Use the response to generate API docs, and feed it to a test console as configuration.
I recommend a tool that really brings your API docs alive. Mashery I/O Docs: an example of working documentation. Define an API for all services (internal AND external). Reserve an endpoint to describe the API for the service itself. RESTful; personal preference for a plain-text format (XML or JSON).
Reserve the root endpoint (or a /discovery or /self endpoint) for a description of the service’s API. Bonus: if the response is in the Mashery I/O Docs format, you can have a web interface to document and test the API.
Instead of hard-coding the configuration of all the services everywhere, expose the configuration via a separate service.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It looks like a distributed file system: each node can have children and properties. Each service can register itself at startup and become available to receive requests.
The consumer simply reads the properties of a node (file/path).
As we saw, each component should be able to scale horizontally.
There are two possible problems: when the processing itself is expensive, and when there’s too much data.
Internally, use queues and workers to make processes asynchronous and distribute data to parallel workers. curl_multi, low timeouts.
Don’t move the data to the processing nodes: I/O is very expensive.
2nd part of the talk: moving data around (communication across services). Asynchronous communication, decoupling (buffers), load balancing, distribution, high throughput; in-memory, persistent, distributed.
At DataSift we use different messaging systems, depending on volume, destination, and communication type.
Source/sink, producer/consumer. Asynchronous communication, decoupling (buffers), load balancing, distribution, high throughput; in-memory, persistent, distributed.
We’ve seen simple buffering; let’s now see a few more useful patterns. The first example shows how to move from one processor to several nodes, to distribute the data and process it in parallel. PUSH-PULL is an efficient pattern for workload distribution.
Workload distribution with workers.
You can also invert producers and consumers and have a multiplexer joining messages coming from several nodes back into a single one.
The second pattern shows how to distribute data in a non-exclusive way: each consumer gets a copy of the same data; items are not removed from the queue when one consumer gets them. The producer doesn’t need to know who’s listening; it doesn’t need a registry of addresses of connected consumers. Mongrel2.
You can also broadcast to different datacenters. Listeners can subscribe to one or more topics. Different output channels. ZeroMQ v3: filtering is done on the publisher side.
Broadcasting.
An interesting idea if you have a highly dynamic site / service, with each update affecting several other users / pages, is to have an internal data bus that carries all the information, with updates labelled with topics, and all the services/users subscribing to the relevant topics.\nThumbler: internal firehose. Each service subscribes to interesting events.\n
Statistics are better than logs. At certain volumes, logs are just noise (and a waste of space); make your application dynamically configurable so logging can be turned on only when strictly necessary. StatsD / Graphite.
Monitor everything. Set alerts based on deviation from the norm, not just on absolute thresholds.
Logging at scale is useless: too much noise. Instrumentation is essential.
You need to identify bottlenecks quickly or suffer prolonged and painful outages. The question "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question, "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?", addresses the people and processes that allowed the event you just had, and every other event for which you didn't have appropriate monitoring.
Designing to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, answer the question "Is there a problem?" with user-experience and business-metric monitors (lower click-through rate, shopping-cart abandonment rate, ...). Then identify WHERE the problem is with system monitors (the weakness here is that they usually rely on threshold alerts, i.e. checking whether something is behaving outside our expectations, rather than alerting when it is performing significantly differently than in the past). Finally, identify WHAT the problem is with application monitoring.
Not all monitoring data is valuable; too much of it only creates noise while wasting time and resources. It's advisable to save only a summary of the reports over time, to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.
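Alerting on "performing significantly differently than in the past" can be sketched as a simple deviation-from-recent-history check. This is illustrative only; production systems typically use more robust baselines (e.g. seasonally adjusted ones):

```python
import math

def deviation_alert(history, value, k=3.0, min_samples=10):
    """Flag `value` if it deviates more than `k` standard deviations
    from the recent history -- alerting on "different from the past"
    rather than on a fixed absolute threshold."""
    if len(history) < min_samples:
        return False  # not enough history to know what "normal" is
    mean = sum(history) / len(history)
    variance = sum((x - mean) ** 2 for x in history) / len(history)
    std = math.sqrt(variance)
    if std == 0:
        return value != mean
    return abs(value - mean) > k * std


# Usage: a steady click-through rate around 10%, then a sudden drop.
ctr_history = [0.10, 0.11, 0.09, 0.10, 0.10, 0.11, 0.09, 0.10, 0.11, 0.10]
```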
We collect millions of events every second.
The importance of people: devops engineers who know what to monitor and how, who can use and write the tools, and who are 100% dedicated. Useful: mobile-phone apps receiving alerts from Zenoss.
We use different technologies. It's very easy to set up a new ZeroMQ listener.
We use StatsD (from Flickr/Etsy), Zenoss, and Graphite.
Here's a photo of our monitoring wall. We even have emergency lighting with a siren, triggered by Zenoss alerts.
With the Etsy library you can sample the sending rate; transport is UDP. We created a wrapper that buffers and aggregates stats in memory for a while and then flushes them at regular intervals, saving a LOT of bandwidth.
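A buffering wrapper of this kind can be sketched as follows. This is an illustrative reconstruction, not the actual wrapper described above; `transport` stands in for whatever sends the UDP packet:

```python
import time
from collections import defaultdict

class BufferedStats:
    """Sketch of a StatsD client wrapper that aggregates counters in
    memory and flushes them as one batch at a fixed interval, instead
    of sending one UDP packet per event."""

    def __init__(self, transport, flush_interval=10.0, clock=time.monotonic):
        self._transport = transport      # callable taking the payload string
        self._interval = flush_interval
        self._clock = clock
        self._counters = defaultdict(int)
        self._last_flush = clock()

    def incr(self, metric, n=1):
        self._counters[metric] += n
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        if self._counters:
            # One packet carrying many metrics, StatsD counter syntax.
            payload = "\n".join(
                f"{m}:{v}|c" for m, v in sorted(self._counters.items())
            )
            self._transport(payload)
            self._counters.clear()
        self._last_flush = self._clock()


# Usage: three increments become a single two-line payload on flush.
sent = []
stats = BufferedStats(sent.append, flush_interval=999)
stats.incr("hits")
stats.incr("hits")
stats.incr("errors")
stats.flush()
```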
Monitor at the application level, the system level, and the infrastructure level. Heatmap of every link in the pipeline (physical and logical). Network rib-cages like this one are NOT ENOUGH! You want to contextualise the metrics you receive.
+ Cacti
\n
When you process real-time data in a complex pipeline made of several stages, you need a way of immediately telling IF there is a problem and WHERE it is. You don't have time to debug; you need to SEE.
Measure throughput and latency.
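Per-stage throughput and latency measurement can be sketched like this (the class and stage names are illustrative):

```python
import time
from collections import defaultdict

class PipelineMonitor:
    """Toy per-stage instrumentation: count events and accumulate
    processing time for each stage, so a dashboard can show WHERE a
    pipeline is slowing down."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.counts = defaultdict(int)
        self.seconds = defaultdict(float)

    def timed(self, stage, func):
        """Wrap a stage function so every call is measured."""
        def wrapper(item):
            start = self._clock()
            result = func(item)
            self.seconds[stage] += self._clock() - start
            self.counts[stage] += 1
            return result
        return wrapper

    def report(self):
        return {
            stage: {"events": self.counts[stage],
                    "avg_latency_s": self.seconds[stage] / self.counts[stage]}
            for stage in self.counts
        }


# Usage: instrument a hypothetical "parse" stage and run three events.
monitor = PipelineMonitor()
parse = monitor.timed("parse", lambda text: text.upper())
outputs = [parse(w) for w in ["a", "b", "c"]]
```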
Information density is important, but don't overdo it: keep the signal-to-noise ratio high.
Use colours. Cognitive process: let the visual cortex do the work. Normalise.
Intuition is involuntary, fast, effortless, invisible.
Attention is voluntary, slow, difficult, visible.
\n
happy to talk about any of them\n
- N+1 design: ensure that everything you develop has at least one additional instance of that system in the event of failure.
- Designing the capability to roll back into an app helps limit the scalability impact of any given release.
- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting/containing the impact of offending features or functionality.
- Design to be monitored: you want your system to identify when it's performing differently than it normally does, in addition to telling you when it's not functioning properly.
- Design for multiple live sites: it usually costs less than operating a hot site plus a cold disaster-recovery site.
- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.
- Asynchronous design: asynchronous systems tend to be more fault-tolerant under extreme load.
- Stateless systems (if necessary, store state with the end users).
- Buy when non-core.
- Scale out, not up (with commodity hardware; horizontal splits in terms of data, transactions, and customers).
- Design for any technology, not for a specific product/vendor.
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it&#x2019;s performing differently than it normally operates in addition to telling you when it&#x2019;s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
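The "design to disable features" principle above can be sketched as a minimal feature-flag registry. This is an illustrative sketch with hypothetical names; real systems typically back the flags with a config service or database so operators can flip them without a redeploy.

```python
# Minimal feature-flag registry: a flag can be flipped off at runtime
# to contain a misbehaving feature without rolling back the release.
# (Illustrative sketch; class and flag names are hypothetical.)

class FeatureFlags:
    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def is_enabled(self, name):
        # Unknown flags default to off: the safe choice for new features.
        return self._flags.get(name, False)

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False


flags = FeatureFlags({"new_checkout": True})

def checkout(cart):
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"   # graceful degradation path

print(checkout([]))                 # new checkout flow
flags.disable("new_checkout")       # ops flips the switch, no redeploy
print(checkout([]))                 # legacy checkout flow
```

The point is that the degradation path is designed in from the start: disabling the offending feature is a configuration change, not a release.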
Synchronous calls, if used excessively or incorrectly, place undue burden on the system and prevent it from scaling. Systems designed to interact synchronously have a higher failure rate than asynchronous ones, and their ability to scale is tied to the slowest system in the chain of communications. It's better to use callbacks, with timeouts so callers can recover gracefully should they not receive responses in a timely fashion.
Synchronisation is when two or more pieces of work must happen in a specific order to accomplish a task. Asynchronous coordination between the original method and the invoked method requires a mechanism by which the original method determines when, or if, a called method has completed executing (callbacks). Ensure callers have a chance to recover gracefully with timeouts should they not receive responses in a timely fashion.
A related problem is stateful versus stateless applications. An application that uses state relies on the current condition of execution as a determinant of the next action to be performed.
There are 3 basic approaches to solving the complexities of scaling an application that uses session data: 1) Avoidance (use no sessions, or sticky sessions, to avoid replication: share-nothing architecture); 2) Decentralisation (store session data in the browser's cookie, or in a db whose key is referenced by a hash in the cookie); 3) Centralisation (store sessions in the db / memcached).
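The timeouts-with-graceful-recovery pattern described above can be sketched with Python's asyncio; a minimal illustration, not tied to any particular framework, where the delays and the fallback value are invented for the example:

```python
import asyncio

async def call_service(delay):
    # Stand-in for a remote call whose latency we don't control.
    await asyncio.sleep(delay)
    return "real result"

async def resilient_call(delay, timeout=0.1):
    # Bound the wait: if the downstream service is slow, recover
    # gracefully with a fallback instead of stalling the caller.
    try:
        return await asyncio.wait_for(call_service(delay), timeout)
    except asyncio.TimeoutError:
        return "fallback result"

fast = asyncio.run(resilient_call(0.01))   # completes within the timeout
slow = asyncio.run(resilient_call(0.5))    # times out, degrades gracefully
print(fast, "/", slow)
```

The caller's latency is now bounded by its own timeout, not by the slowest system in the chain.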
You must be able to isolate and limit the effects of failures within any system by segmenting its components. Decouple, decouple, decouple! A swim lane represents both a barrier and a guide (ensure that swimmers don't interfere with each other; help guide the swimmer toward their objective with minimal effort). AKA shard.
Swim lanes increase availability by limiting the impact of failures to a subset of functionality, and they make incidents easier to detect, identify and resolve. The fewer things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. Lines of communication should not cross lane boundaries, and should always move in the direction of the communication flow. When designing swim lanes, always address the transactions making the company money first (e.g. Search & Browse vs Shopping Cart); then move functions causing repetitive problems into their own swim lanes; finally, consider the natural layout or topology of the site for swim-laning opportunities (e.g. customer boundaries within an app / environment: if you have a tenant who is very busy, assign it its own swim lane; other tenants with low utilisation can all be put into another swim lane).
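The tenant-based swim-laning described above can be sketched as a simple routing table. The tenant and lane names are hypothetical, and in production this routing would live in a load balancer or service discovery layer rather than in application code:

```python
# Route each tenant to a swim lane: a busy tenant gets a dedicated lane,
# low-utilisation tenants share a default lane. (Illustrative sketch.)

DEDICATED_LANES = {
    "big_tenant": "lane-big",   # high-utilisation tenant, isolated
}
DEFAULT_LANE = "lane-shared"    # everyone else shares a lane

def lane_for(tenant):
    return DEDICATED_LANES.get(tenant, DEFAULT_LANE)

print(lane_for("big_tenant"))   # lane-big
print(lane_for("small_shop"))   # lane-shared
```

A failure inside `lane-big` now affects only the busy tenant; the shared lane keeps serving everyone else.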
What is the best way to handle large volumes of traffic? Answer: "Establish the right organisation, implement the right processes and follow the right architectural principles". Correct, but the best way is not to have to handle the traffic at all, and the key to achieving this is pervasive use of caching. The cache hit ratio is the key metric for understanding a cache's effectiveness. The cache can be updated / refreshed via a batch job or on a cache miss. If the cache is full, an eviction algorithm (LRU, MRU, ...) decides which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:
- Object caches: used to store objects for the app to reuse, usually serialised objects. The app must be aware of them. They layer in front of the db / external services. Marshalling is the process of transforming an object into a data format suitable for transmission or storage.
- Application caches: A) Proxy caches, usually run by ISPs, universities or corporations: they cache for a limited number of users across an unlimited number of sites. B) Reverse proxy caches (the opposite): they cache for an unlimited number of users across a limited number of applications; the configuration of the specific app determines what can be cached. HTTP headers give much control over caching (Last-Modified, ETag, Cache-Control).
- Content Delivery Networks: they speed up response times, offload requests from your application's origin server, and usually lower costs. The total capacity of the CDN's strategically placed servers can yield higher capacity and availability than the network backbone. The way it works is that you place the CDN's domain name as an alias for your server by using a canonical name (CNAME) in your DNS entry.
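An object cache with LRU eviction and a hit-ratio counter, as described above, can be sketched in a few lines. This is an illustrative in-process sketch; a real deployment would use memcached or a similar shared cache, with the `loader` standing in for the db or external-service call:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny object cache: evicts the least-recently-used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key, loader):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)      # mark as recently used
            return self.data[key]
        self.misses += 1
        value = loader(key)                 # cache miss: fetch from origin
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least-recently-used
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2)
load = lambda k: f"value-of-{k}"  # stand-in for a db / service call
cache.get("a", load)   # miss
cache.get("a", load)   # hit
cache.get("b", load)   # miss
cache.get("c", load)   # miss -> evicts "a"
print(cache.hit_ratio())
```

Tracking the hit ratio from day one makes it obvious whether the cache is actually absorbing traffic or just adding a layer.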