AEM Architecture slides for Sydney Adobe Meetup 16/9/2015
Discusses architectural components, considerations and philosophies to consider when designing for an AEM implementation
1. AEM Architecture
Adobe Meet-up : Sydney : 16th September 2015
Purpose of these slides:
To describe common AEM architecture options, outline
the various pros and cons, and provide some best-practice
recommendations for new and existing implementations.
Michael Henderson BAS, BSc(Hons)
Technical Director, NSW
BizTECH Enterprise Solutions
Mobile: +61 430 758 026
Email: mhenderson@btes.com.au
Website: www.btes.com.au
2. Agenda
Part 1: What to do when setting up Author (1 or more), Author-
Dispatcher (1 or more), Publish (1 or more), Dispatcher (1 or
more), CDN (there or not), Clustering/DB (there or not).
Part 2: Connectivity between the components; pointing out
different configurations, advantages/disadvantages, things to
consider, things to think about.
Part 3: Architectural philosophies like KISS, HA, Performance,
Scalability, etc.
3. Part 1 – Architecture Elements
Basic Architecture
What is an AEM instance?
AEM Repositories
Performance: Oak vs MongoDB
Author: Configuration Options
Publish: Configuration Options
Dispatcher: Configuration Options
CDN: Configuration Options
Recommended HA Architecture
4. Basic Architecture (1)
Author
Where all the authoring goodness happens
Publish
Where all the public goodness gets formulated
Dispatcher
Where public goodness gets secured and cached
Author-dispatcher
Where some authoring security takes place
CDN
Where you gain some localised caching and/or security
6. What is an AEM instance?
The architecture layers that make up AEM:
OSGi
JCR
Sling
AEM
7. AEM Repositories
AEM 5.x and older
JCR2, but who really cares anymore. If you do? Let it go.
AEM 6.0
AEM supports JCR2, JCR3, MongoDB
MongoDB introduced as clustering repository
AEM 6.1
AEM supports JCR3, DB2, MongoDB, Oracle
Clustering repository options expanded
Experimental
Support for MySQL, MariaDB and MS SQL Server
Coming in next release?
10. Performance: Oak vs MongoDB (1)
[Bar chart: Oak vs MongoMK, axis 0 to 14; lower is better]
11. Performance: Oak vs MongoDB (2)
[Bar chart: Oak vs MongoDB, axis 0 to 18; lower is better]
12. Performance: Oak vs MongoDB (3)
Summary:
Oak is by far the best performer
Use Oak unless you NEED to go DB-backed
Notes:
In these slides, Oak = TarMK = JCR3 = Jackrabbit 3 (MongoMK is Oak running on MongoDB)
I haven’t seen any performance stats for the other DBs
Performance Reference:
http://www.slideshare.net/mmarth/aem-hub-oak-02-full
13. Author: Configuration Options (1)
1. Single Author: JCR (1x) : Not recommended
Where you have only one Author instance
No redundancy. If Author fails, go to backup (lose data)
2. Active/Standby: JCR (2x) : Recommended
Where one Author instance deals with all traffic
The standby Author instance stays synchronised
Referred to as “Cold Standby” (although not cold)
If Author fails, can start “standby” as master instance (no data loss)
3. Active/Active: DB (2+)
Where any Author instance deals with traffic
The Author instances are synchronised via a shared DB
Cannot run on JCR; must run on DB (DB2, Mongo, Oracle)
If one Author fails, can create new instance (or recover) and add to
the “cluster” (no author outage)
Ensure DB is clustered, so it’s not a single point of failure
Can run DataStore on shared disk or S3 (Amazon) for better
performance
14. Author: Configuration Options (2)
What configuration option should you use?
Really simple question to ask yourself:
“Can a single Author instance sustain all the required
author traffic?”
Answers:
Yes = Recommend: Active/Standby (JCR)
No = Recommend: Active/Active (DB)
15. Author: Configuration Options (3)
What can you do to help the Author instance load?
Ensure the project code is efficient
Ensure the Author is running on disk with high IOPS (e.g. an
SSD or striped high-IO disks)
Are there tasks or processes running on the Author that
could be offloaded to another server?
Is the Author server as big as it can be?
16. Publish: Configuration Options (1)
1. Single Publish: JCR (1x) : Not recommended
Where you have only one Publish instance
No redundancy. Failure: Go to backup or rebuild, re-publish (outage)
2. TarMK Farm: JCR (2+) : Recommended
Active/Active configuration
Where any Publish instance deals with traffic
The Publish instances are synchronised via separate replication queues
on Author
If one Publish fails, can “rebuild” from existing Publish instance or Gold
instance; or go to backup or rebuild, re-publish
3. Cluster: DB (2+)
Active/Active configuration
Where any Publish instance deals with traffic
The Publish instances are synchronised via a shared DB
If one Publish fails, can create new instance (or recover) and add to the
“cluster” (no re-publish required)
Ensure DB is clustered, so it’s not a single point of failure
Note: Cannot do rolling deployments
17. Publish: Configuration Options (2)
What configuration option should you use?
One or two questions to ask yourself:
1. “Do you require the public to submit something and have
it displayed on the website? (e.g. Social Communities)”
Answers:
No = Recommend: TarMK Farm (JCR)
Yes = Ask yourself the 2nd Question
2. “Is there a business requirement for very fast display of
the submitted content with no moderation?”
Answers:
No = Recommend: TarMK Farm (JCR)
Yes = Recommend: Cluster (DB)
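The two questions above amount to a small decision tree. A minimal sketch of it follows; the function and parameter names are hypothetical, and the returned labels are simply the slide's own recommendations.

```python
def recommend_publish_topology(public_submissions: bool,
                               fast_unmoderated_display: bool = False) -> str:
    """Map the slide's two questions to its recommended Publish setup."""
    # Question 1: does the public submit content that the site then displays?
    if not public_submissions:
        return "TarMK Farm (JCR)"
    # Question 2: must that content appear very quickly, with no moderation?
    if fast_unmoderated_display:
        return "Cluster (DB)"
    return "TarMK Farm (JCR)"
```

Only the combination "public submissions, shown very fast, no moderation" leads to the DB-backed cluster; every other answer stays on the TarMK Farm.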
18. Publish: Configuration Options (3)
Why are these questions important? Why should you care?
When your implementation requires publicly submitted content to appear on your
website, you need the information synchronised (somehow) across all of your
Publish instances.
These questions seek to determine what architecture you’re going to implement
for synchronising across the Publish instances, namely:
1. Reverse replicate to Author and then replicate to all Publish instances
2. Store in shared DB and trigger invalidation across all Dispatchers
If you need to moderate the social posts, moderation is best done on the
Author. Even automated moderation should occur on the Author, as it’s highly
likely that an author will want to deal with failures such as false positives or
false negatives
Consider all the pros and cons of your architecture, e.g.:
Don’t run a slower DB architecture unless you have good reasons
Ask yourself whether you’re ok to give up rolling (canary) deployments
Design a way to invalidate the pages on the dispatchers when you need to
19. Dispatcher: Configuration Options (1)
Web server plugin that caches files to improve website
performance and applies some security rules
Runs on: Apache httpd, Microsoft IIS, Oracle iPlanet
Goal is to cache everything for as long as possible
Operates on a publish/unpublish trigger, so cached objects
only get invalidated when they need to be (as opposed to a TTL,
which expires content whether it needs to be or not)
Doesn’t cache everything, so design your implementation so
that everything, or at least as much as possible, can be cached
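The difference between trigger-based invalidation and a TTL can be shown with a toy in-memory cache. This is only an illustration of the idea, not how the Dispatcher is implemented (it caches files on the web server's disk); the class and method names are hypothetical.

```python
class InvalidationCache:
    """Toy cache: an entry lives until a publish/unpublish event removes it."""

    def __init__(self, render):
        self._render = render      # callable path -> content; stands in for Publish
        self._store = {}
        self.render_calls = 0      # how often we had to go back to "Publish"

    def get(self, path):
        # Serve from cache when possible; otherwise render once and keep it.
        if path not in self._store:
            self.render_calls += 1
            self._store[path] = self._render(path)
        return self._store[path]

    def invalidate(self, path):
        # Called on a publish/unpublish event for this path.
        self._store.pop(path, None)
```

However long the gap between requests, the page is rendered again only after invalidate() has been called for it; a TTL cache would re-render on expiry even if nothing had changed.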
20. Dispatcher: Configuration Options (2)
How can you cache as much as possible, for as long as possible?
Avoid use of querystrings or ignore irrelevant ones
http://domain/path/to/page.html?name=value
Make use of selectors instead (so long as the variants are not infinite)
http://domain/path/to/page.value.html
http://domain/path/to/page.name-value.html
Do work in Apache before sending to the dispatcher mod
Apply rewrite rules prior
Apply redirects prior, and allow vanity URLs with the new Dispatcher feature
Apply SSI, ESI or SDI directives prior
Split pages into different paths if caching policies are different
Use SSI, ESI or SDI directives, so all fragments can be cached
independently, or at least so the dynamic activity is minimised
Avoid use of “Dispatcher: no-cache” directives (for obvious reasons)
Ignore the authorisation header if you can
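The querystring-to-selector rewrite shown in the example URLs above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, the name-value selector format follows the slide's second example, and values are not sanitised (a value containing a dot or slash would need handling in a real implementation).

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def query_to_selectors(url: str) -> str:
    """Rewrite page.html?name=value as page.name-value.html."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query)
    if not pairs:
        return url  # nothing to rewrite
    # Split "/path/to/page.html" into "/path/to/page" and ".html".
    base, ext = posixpath.splitext(parts.path)
    selectors = ".".join(f"{name}-{value}" for name, value in pairs)
    new_path = f"{base}.{selectors}{ext}"
    # Drop the query string; keep scheme, host and fragment.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))
```

The resulting URL has no querystring, so each variant is an ordinary file path that can be cached; this only makes sense when the set of possible values is bounded.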
21. CDN: Configuration Options
Caches the files closer to where the users are
Can also provide additional protection layers (DDoS, WAF, etc)
Great for (tagged) website assets like: images, CSS, JS, etc.
Ensure you utilise an etag, checksum or equivalent on the filename. This
ensures that if the file is updated, it generates a new filename and is
therefore distinguishable from the old version
Not so useful for HTML pages or fragments
Not all CDNs have a detailed API to flush selected objects and, when they
do, the flushes can take a while to take effect, so you might have to rely
on TTLs instead
If you do want to use a CDN API for invalidation, then you need to write a
custom replicator for this. If so, where will it fire? (Author or Publish?)
Ensure you don’t introduce a race condition
Caution: CDNs can be used as a Band-Aid for poor implementation design
and/or poor utilisation of the Dispatcher layer, so know why you’re going to
use one
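The "checksum on the filename" advice for website assets can be sketched as follows. This is a minimal illustration, assuming a SHA-256 digest truncated to eight hex characters; the function name, digest choice and length are all assumptions, not something the slides or AEM prescribe.

```python
import hashlib
import posixpath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content digest before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    base, ext = posixpath.splitext(path)
    # "/css/app.css" becomes "/css/app.<digest>.css"
    return f"{base}.{digest}{ext}"
```

Because the name changes whenever the content changes, the CDN can cache each version with a long TTL and never needs to be flushed for these assets.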
22. Recommended HA Architecture
Ensure you don’t have any single point of failure and
potential data loss if anything fails
Avoid using a DB (especially in Publish)
Avoid using a CDN for page (HTML) caching
23. Part 2 - Connectivity
Connection: Dispatcher to Author or Publish
Connection: Author to Publish
Connection: Publish to Author
Connection: Publish to Dispatcher
Connection: AEM to CDN
24. Connection: Dispatcher to Author or Publish (1)
Dispatcher defines what server it talks to via the
/renders section in the dispatcher.any configuration file
/renders {
  /0001 {
    /hostname "<publish1-name-or-ip>"
    /port "<publish1-port>"
  }
  # optional from here on...
  /0002 {
    /hostname "<publish2-name-or-ip>"
    /port "<publish2-port>"
  }
}
25. Connection: Dispatcher to Author or Publish (2)
When you specify multiple renders, page build requests are
either shared equally between them or sent to the best-performing
render, based on any categories defined in the /statistics
section of the dispatcher.any file
/statistics
{
  /categories
  {
    /search { /glob "*search.html" }
    /html { /glob "*.html" }
    /others { /glob "*" }
  }
}
Tip: If you don’t have multiple renderers, don’t collect stats
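How a request ends up in one of the /statistics categories can be sketched as below. This assumes categories are tried in the order listed and the first match wins, which is my understanding of the Dispatcher behaviour; Python's fnmatch is only an approximation of the Dispatcher's glob matching, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Same names and patterns as the /categories example above, in the same order.
CATEGORIES = [
    ("search", "*search.html"),
    ("html", "*.html"),
    ("others", "*"),
]


def categorise(uri: str, categories=CATEGORIES) -> str:
    """Return the first category whose glob matches the URI."""
    for name, pattern in categories:
        if fnmatchcase(uri, pattern):
            return name
    return "uncategorised"
```

Under that assumption the order matters: if "html" were listed before "search", search pages would never get their own statistics.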
26. Connection: Author to Publish (1)
Author to Publish is for publishing or unpublishing content.
Messages are sent via a standard connection called “replicator”
Replicators operate over a point-to-point architecture and
maintain a single queue per replicator
On the Author, create a replicator per Publish instance when
using a single Publish or a TarMK Farm
When using a DB, consider how you are going to replicate the
information without creating a dependency on one Publish node
or issuing duplicate replication messages (load balancer?)
Queues may get processed at different times, which can be a
good thing, e.g. when Publish is down or busy.
Note the potential effect on the Dispatcher when this happens!
27. Connection: Author to Publish (2)
[Screenshots: Replicator Settings, Transport, Proxy, Extra Options]
28. Connection: Publish to Author (1)
Publish to Author is for content that is submitted by the
public and chosen to be stored within Author and then
possibly later to be presented out within the website on each
Publish instance.
Messages are sent via a connection called “reverse replicator”
Reverse replicators operate over a point-to-point architecture
and maintain a single queue per replicator
Content is pulled by Author, so the connection is initiated
(typically) from a more secure zone to a less secure zone.
This is network best practice.
Polling frequency is set to 30s by default
29. Connection: Publish to Author (2)
Two Parts:
1. Publish (outbox)
[Screenshots: Reverse Replicator Settings, Transport, Proxy, Extra Options]
30. Connection: Publish to Author (3)
Two Parts:
2. Author (pull)
[Screenshots: Reverse Replicator Settings, Transport, Proxy, Extra Options]
31. Connection: Publish to Dispatcher (1)
Publish to Dispatcher is to tell the Dispatcher what
items have changed, so it can follow its rules to
invalidate what it needs to.
Messages are sent via a connection called “dispatcher
flush”
Dispatcher Flushes operate over a point-to-point
architecture and maintain a single queue per replicator
Any dispatcher that can send traffic to this Publish
instance should have a dispatcher flush connection
established
32. Connection: Publish to Dispatcher (2)
[Screenshots: Dispatcher Flush Settings, Transport, Proxy, Extra Options]
33. Load Balancer
M:N Dispatcher to Publish
Requests go nicely through a load balancer
Invalidations can’t go through a load balancer
Needs point-to-point connection from each Publish to each
Dispatcher
[Diagram: Publish Tier (P1, P2, P3, P4) and Dispatcher Tier (D1, D2, D3, D4); Requests pass through the LB, Invalidation is point-to-point]
34. Paired Dispatcher to Publish
Each Dispatcher has an assigned Publish instance
Requests go only to the assigned Publish instance
Invalidations go only to the assigned Dispatcher instance
If either instance in a pair fails, both must be made inaccessible
Benefit: easier auto-scaling (each pair is an independent module)
[Diagram: Publish Tier (P1, P2, P3, P4) paired one-to-one with Dispatcher Tier (D1, D2, D3, D4); Requests and Invalidation stay within each pair]
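The two topologies differ in how many dispatcher flush connections they need. A minimal sketch, using the P1 to P4 and D1 to D4 labels from the diagrams; the function names are hypothetical.

```python
from itertools import product


def flush_connections_m_to_n(publishers, dispatchers):
    """M:N: every Publish needs a flush connection to every Dispatcher."""
    return list(product(publishers, dispatchers))


def flush_connections_paired(publishers, dispatchers):
    """Paired: each Publish flushes only its own assigned Dispatcher."""
    if len(publishers) != len(dispatchers):
        raise ValueError("paired topology needs one Dispatcher per Publish")
    return list(zip(publishers, dispatchers))
```

With four instances per tier, the M:N layout needs 16 flush connections and the paired layout needs 4; adding a fifth pair adds one connection in the paired layout but nine in the M:N layout.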
35. Connection: AEM to CDN
AEM to CDN is for when you want to use a CDN API to flush
objects that have been cached there and you don’t want to
wait for a TTL
Although the CDN may have a single entry point, the message
will need to be configured as a replicator on one (or more) of
your AEM instances (Author or Publish)
Flush from Author and run the risk of a race condition (caused
by a Publish instance that was slow to process the message)
Flush from Publish and you will have to choose:
1. Send from just one Publish instance, introducing a possible single
point of failure
2. Send from all your Publish instances, introducing a duplication of
flush messages for the same action
Flush from a custom controller app, but unless you check the
processing queue of all your Publish instances, you may still
run the risk of a race condition
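The "custom controller app" option can avoid the race condition by flushing the CDN only once every Publish instance has processed its queue. The sketch below is a hypothetical illustration: get_pending and flush are placeholders for whatever your monitoring and CDN API provide, not real AEM or CDN calls.

```python
def safe_to_flush_cdn(pending_by_publish) -> bool:
    """True only when every Publish instance has nothing left to process."""
    return all(pending == 0 for pending in pending_by_publish.values())


def flush_when_ready(get_pending, flush, max_checks=10, wait=lambda: None):
    """Poll the Publish queues; flush the CDN once they are all empty.

    get_pending: callable returning {publish_name: pending_item_count}
    flush:       callable that performs the CDN flush
    wait:        callable run between checks (e.g. a sleep in real use)
    """
    for _ in range(max_checks):
        if safe_to_flush_cdn(get_pending()):
            flush()
            return True
        wait()
    return False  # gave up: at least one Publish never caught up
```

A False return means the flush was never sent, so the caller still has to decide whether to retry, alert, or fall back on the TTL.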
36. Part 3 – Architecture Principles
KISS
HA
Performance
Scalability
Code Debt
37. KISS – Keep it simple, stupid
Design principle coined by the US Navy in 1960
Key philosophy of this principle being:
“Most systems work best if they are kept simple rather than made
complicated”
Often as architects and developers we can get led astray from
“keeping it simple” by cool tech or trends in the market
When adopting “cool tech” or “trendy tech” into an
implementation, which may seem like a great idea at the
time, if not aligned to the core product architecture and its
future roadmap, it can make the implementation unstable or
not able to be upgraded later on
E.g. it’s generally not a good idea to put “frameworks” within
“frameworks”
38. HA – Highly Available
Fact: Hardware and software fail from time to time
This principle is about ensuring that the architecture does not
become unavailable if one component fails
This is generally aimed at the public delivery side, but it is also
important internally: if systems are down, people can’t do
their job
Ensure that every part of the core systems can continue to
operate if one host/application/tool fails
Think about all core (and dependent) areas: Author-
Dispatcher, Author, Publish, Dispatcher, Load Balancers,
Firewalls, LDAP, Databases, Email servers, Networks,
Switches, Cables, Internet Providers, Backend applications or
systems, etc.
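The effect of removing a single point of failure can be put in numbers with the standard availability formulas. This is a rough sketch that assumes component failures are independent, which is often not true in practice; the function names and the 99% figure used below are illustrative only.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant instances, where any one can serve."""
    return 1 - (1 - a) ** n


def serial_availability(tiers) -> float:
    """Availability of a chain of tiers that must all be up."""
    result = 1.0
    for a in tiers:
        result *= a
    return result
```

For example, two instances that are each 99% available give about 99.99% together, while a chain of two single 99% components drops to about 98%, which is why every tier in the list above needs its own redundancy.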
39. Performance
Fact: People don’t like slow websites
One of the funniest architect statements I’ve ever heard is:
“You don’t need to cache anything if the servers are fast enough to handle it”
Performance should be a core design consideration from Day 1
and beyond the implementation going live
Websites get more popular, more websites get added to the
system, traffic has peaks and troughs, there are press releases,
product releases, social or environmental events, and seasonal
activity. Servers fail or need patching and sometimes people
hack or attack your environment
Having your site perform as well as it can is not just about
saving hardware or licence costs (important as that is); it’s about making
your implementation more resilient and more pleasant for end users
40. Scalability
The ability to react to market demand and scale the
environment to keep meeting it
One of the beautiful aspects of the AEM architecture is the
modularity of the components. This feature provides a
fantastic platform to support elastic architecture; one that
can automatically scale up or down
With the introduction of virtual servers many years ago and
now with cloud infrastructure, we can tap into available
resources and scale to meet demand if/when required
By creating pigeon-pairs of Dispatcher & Publish, you have a
modular, self-contained architecture that can be easily scaled
up or down.
41. Code Debt
The art of creating unnecessary or convoluted code that you or
someone else needs to look after
Don’t create code that has already been implemented for you:
Examples: sling or acs-commons
Sling: http://sling.apache.org/index.html
ACS Commons: http://adobe-consulting-services.github.io/acs-aem-commons/
Don’t over-engineer your solution
Build what is required now, not what might be needed in the future
Don’t overcomplicate something that can be implemented more
simply
Consider that someone may need to look after your code
Provide useful comments and appropriate debug statements
43. Summary
We’ve talked about the various components that
typically make up an AEM implementation architecture
We’ve talked about how each of these components
connect to each other and what to think about
We’ve talked about key architecture principles to
consider