The document provides an overview and roadmap of the Alfresco platform. It discusses ongoing projects to improve scalability, simplify upgrades, separate Share, and consolidate/expand APIs and SDKs. It details how Alfresco tested a deployment with 1.2 billion documents on AWS using 10 nodes and 20 Solr shards, indexing in 5 days. It recommends sharding for performance and operations. The roadmap targets releasing improvements in early 2016 with Alfresco.next and ongoing strategic work in 2016 on REST APIs, modularity, and Share releases.
201511 - Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Boston
1. Alfresco Platform Update & Roadmap
Gabriele Columbro
Sr. Product Manager, API / SDK / Platform
@mindthegabz
Alfresco Day Boston, November 2015
2. 2
A look at today’s presentation agenda
Alfresco Platform Roadmap
Platform Vision
What problems the Alfresco
Platform solves today and will help solve, and
for which personas
Platform projects
Overview of current ongoing platform
initiatives
3. 3
Extreme Scalability
Proving Alfresco at Cloud scale and
providing tools & reference point for real
life implementations
Upgrade Task Force
Simplification of the customer
maintenance lifecycle, in response to
overwhelming customer validation
Share separation
Effects of the Share separation and Core
platform modularization
Dev Platform & SDK
Consolidate & expand APIs / extension points to ensure the
longevity of Alfresco application development and greatly
simplify SDK-based Alfresco development
4. 4
A look at today’s presentation agenda
Alfresco Platform Roadmap
What’s in it & when?
When can you expect release of the
ongoing projects, what are backlog and
horizon 2 projects
Conclusions and QA
Recap of the platform lifecycle makeover
and open discussion
5. Vision for the Alfresco Platform
Objectives and guiding forces driving development of the Alfresco Platform
6. 6
Build an open and scalable platform to power the rapid development and
deployment of hybrid content centric applications in the Alfresco
extended ecosystem
Platform Vision
7. 7
Technology & market innovation driving Alfresco Platform strategy
Driving Forces
Hybrid ECM Innovate at Cloud speed Think Big Customer driven
Platform and solutions should be
able to run on premise, on cloud
or both
Deliver innovation to the on
premise and cloud products with
agility typical of pure SaaS players
Enable the scaling of people,
processes and products
Customer feedback, research,
validation, pretotyping at the core of
ideation and decision making process
8. 8
Key platform improvements research has uncovered
Customer data Driven
• Backwards Compatibility: Java Modules
• Improve content reindexing
• Backwards Compatibility: Share Extensions
• Modules Isolation
• In place upgrade: SP & HF
• Lack of Zero downtime upgrades
• Backwards Compatibility: Remote Applications
Ranks shown on the slide: #3 #1 #5 #2 #4 #6 #7
10. 10
Platform Investments
An end to end Platform lifecycle makeover
Deployment Testing Release Integration Maintenance
Standard Dev Env
Share Separation
API BCKs
Xtreme scalability
Share separation API compatibility
JAR modules
Modules isolation
Dev Docs / Samples
Solr Sharding
Suite installers
In-place SP & HF
API Compatibility
Share separation
Development
11. Alfresco reaches the 1B document mark on AWS
• 10 Alfresco 5.1 nodes, 20 Solr 4 nodes in Sharding mode, 1 Aurora DB
• Loaded 1B documents at 1000 docs / sec – 86M per day
• Indexed 1B documents in 5 days – > 2000 docs / sec
• No degradation in ingestion or content access upon content growth
• Tested up to 500 Share concurrent users and 200 CMIS concurrent sessions
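The load and indexing rates quoted above follow from simple arithmetic. A minimal sketch using only the figures on this slide; the function names are illustrative and not part of any Alfresco tool:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def docs_per_day(docs_per_second: float) -> float:
    """Daily volume sustained at a constant per-second rate."""
    return docs_per_second * SECONDS_PER_DAY


def average_rate(total_docs: float, days: float) -> float:
    """Average docs/sec needed to process total_docs within the given days."""
    return total_docs / (days * SECONDS_PER_DAY)


# Loading at 1000 docs/sec gives 86.4M docs per day (quoted as 86M).
load_per_day = docs_per_day(1000)

# Indexing 1B docs in 5 days needs roughly 2,315 docs/sec (quoted as > 2000).
index_rate = average_rate(1_000_000_000, 5)

print(f"{load_per_day:,.0f} docs/day, {index_rate:,.0f} docs/sec")
```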
“We applaud Alfresco’s ability to leverage Amazon Aurora to
address business requirements of the modern digital enterprise,
and enable a more agile and cost-effective content
deployments.”
Anurag Gupta, Vice President, Database Services, Amazon Web Services, Inc. –
2015 October 6th
Highlights
Press release
12. 12
Benchmark Results
Introducing the Extreme Scalability benchmark
• Repository Layout
– 10k sites; 2 levels deep; 10 folders per level; 1000 files per folder
– 100 KB avg plain text files with varying content complexity (for indexing purposes)
– Default content model
• Scenarios
– Share interaction (Enterprise Collaboration)
• First focused on the Repository, no Search
• Then with Search, including Solr4 Sharding
– CMIS interaction (Headless Content Platform)
• Transactional Metadata Query testing
• AWS Fully cloud environment (provisioned by chef-alfresco)
– Alfresco 5.1 + Share 5.1 (development code, unreleased)
– AWS EC2 / Aurora (MySQL compatible and Alfresco supported)
– Ephemeral for Index storage / EBS for content storage (spoofed)
13. 13
Cloudstack
1.2B documents execution environment
• UI Test x 20 (m3.2xlarge): simulates 500 users
– Selenium / Firefox
– 1 hour constant load
– 10 sec think time
• ELB
• Alfresco x 10 (c3.2xlarge): Alfresco Repo and Share
• Solr x 20 (m3.2xlarge): Sharded Solr 4
• Aurora x 1 (db.r3.xlarge)
• EBS
• Ingestion (in place)
• Repository size
– sites: 10,804
– folders: 1,168,206
– files: 1,168,206,000
– transactions: 15,475,064
– dbSize: 3,185 GB
14. 14
Cloud scale testing
How did we test it?
• Repository Loaded using
bm-dataload (with file
spoofing option)
• 1B document benchmark
AKA BM-0004 - Testing
Repository Limits, based on
bm-share
• Scalability & Sizing testing
on Enterprise Collaboration
Scenario (bm-share) and
Headless Content Platform
(bm-cmis)
https://wiki.alfresco.com/wiki/Benchmark_Testing_with_Alfresco
https://github.com/derekhulley/alfresco-benchmark
[Diagram: benchmark framework. Components shown: Benchmark Server (Tomcat 7; REST API, UI, Services), MongoDB (config data), MongoDB (test data), Benchmark Drivers xN (Tomcat 7; Extras: Selenium; Test Services, REST API), Load Balancer, Servers / APIs.]
15. 15
Benchmark Results
Getting to 1B documents
• Ingestion
– With 10 nodes, 1000 documents / second (3.6 million per hour, 86M per day, 12 days for the full repo) – spoofed
content comparable to in-place BFSIT loading
– Load rate consistent even beyond 1B documents
– Throughput grew linearly by adding ingestion nodes (100 docs / sec per node)
– Adding additional loading nodes likely to raise ingestion throughput – as Aurora was only at 50% CPU
• Indexing
– Index distributed over 20 Alfresco Index Servers, sharding on ACLs (good for site based repository), with
Alfresco dedicated tracking instance
– Each shard holds approx (in excess of) 50M nodes
– Re-Indexing completed in about 5 days (each node tracks a sub-set of the 1B)
– Dynamic sharding autoconfiguration (5.1 feature)
NOTE: requires Alfresco tracking nodes to be in the cluster
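Sharding on ACLs means that documents sharing an ACL land on the same shard, which suits site-based repositories where a site's content tends to share permissions. A minimal sketch of the idea, assuming a plain modulo over a numeric ACL id. This is an illustration only, not Alfresco's actual routing code, and `shard_for_acl` and the sample data are hypothetical:

```python
from collections import Counter

NUM_SHARDS = 20  # shard count used in the benchmark


def shard_for_acl(acl_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the ACL id (assumed scheme: simple modulo)."""
    return acl_id % num_shards


# Hypothetical documents as (doc_id, acl_id) pairs.
docs = [(1, 101), (2, 101), (3, 102), (4, 140), (5, 102)]

# Documents 1 and 2 share ACL 101, so they are placed on the same shard.
placement = {doc_id: shard_for_acl(acl_id) for doc_id, acl_id in docs}

# Number of documents per shard, useful for spotting skew.
per_shard = Counter(placement.values())

print(placement)
print(per_shard)
```

With a scheme like this, balance depends on how evenly documents are spread across ACLs: one very large site with a single ACL would concentrate on one shard.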
16. 16
The following information is based on a development version of the unreleased Alfresco 5.1.
Performance data is provisional and subject to change based on testing the final Alfresco 5.1 release.
17. 17
Benchmark Results
Testing Alfresco on 1B docs
• Repository Only (500 Share users) test
– Sub-second login times and good, linear responses for other actions
• Open Library: 4.5s / Page Results: 1s / Navigate to Site: 2.3s
– CPU loads:
• Database: 8-10% / Alfresco (each of 10 nodes): 25-30%
• Shows room for growth up to 1000 concurrent users
• Repository + Search (100 Share users)
– Metadata and full text search ~ 5s (on 1B documents)
– 1.2 searches / sec hitting the 20 shards
• TMDQ queries (database only, no index) via CMIS
– IN_FOLDER (sorted, limited) ~ 160ms at CMIS interface
– CMIS:NAME (=, LIKE) ~ 20ms at CMIS interface
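The two TMDQ query shapes timed above can be written in the CMIS query language roughly as follows. This is a sketch only: the slides do not give the exact statements used in the benchmark, the folder id and name pattern are placeholders, and the helper names are hypothetical.

```python
def escape(value: str) -> str:
    """Escape backslashes and single quotes for a CMIS string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def in_folder_query(folder_id: str) -> str:
    """Direct children of one folder, sorted by name (IN_FOLDER, sorted).

    CMIS QL has no LIMIT clause; the "limited" part is normally applied
    through the query service's maxItems paging parameter.
    """
    return (
        "SELECT * FROM cmis:document "
        f"WHERE IN_FOLDER('{escape(folder_id)}') "
        "ORDER BY cmis:name"
    )


def name_query(pattern: str, exact: bool = False) -> str:
    """Match on cmis:name with '=' (exact) or LIKE (wildcards % and _)."""
    op = "=" if exact else "LIKE"
    return f"SELECT * FROM cmis:document WHERE cmis:name {op} '{escape(pattern)}'"


print(in_folder_query("folder-id-placeholder"))
print(name_query("report%"))
print(name_query("report.txt", exact=True))
```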
18. 18
Recommendations
Lessons Learned
• A single Alfresco repository can grow to 1B documents on AWS without notable issues, especially
with a scalable DB like AWS Aurora
• As for the index, Shard, Shard, Shard
– Shard to cope with content growth
• Single Solr instance tuned for 50M docs / 32GB
– Shard for performance / SLA
• Improve performance of search on large scale repositories to hit SLA requirements
– Shard for operational reasons
• Improve reindexing time (1B docs re-index in 5 days with 20 shards)
NOTE: Sharding adds the cost of post-ranking results across shards. Use it judiciously.
• No indications of any size-related bottlenecks with 1.1 Billion Documents
• DB Indexes optimized (no index scans) even at a 3.2TB Aurora DB
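As a rough sizing aid, the shard count can be estimated from the repository size and a per-shard document budget. A minimal sketch, assuming the per-shard figures from these slides (50M docs as the single-instance tuning point, 80M as the tested upper bound); the function names are illustrative, and real sizing should follow the Alfresco sizing collateral:

```python
import math


def shards_needed(total_docs: int, capacity_per_shard: int) -> int:
    """Smallest shard count that keeps every shard within its budget."""
    return math.ceil(total_docs / capacity_per_shard)


def load_per_shard(total_docs: int, num_shards: int) -> float:
    """Average documents per shard, assuming an even distribution."""
    return total_docs / num_shards


REPO_DOCS = 1_168_206_000  # file count from the benchmark environment

print(shards_needed(REPO_DOCS, 50_000_000))  # shards at a 50M budget
print(shards_needed(REPO_DOCS, 80_000_000))  # shards at an 80M budget
print(f"{load_per_shard(REPO_DOCS, 20):,.0f}")  # average load with 20 shards
```

The benchmark's 20 shards sit between the two estimates, at roughly 58M documents per shard, which is consistent with the "in excess of 50M" figure reported for indexing.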
19. 19
5.1
Key Alfresco 5.1 scalability items to look forward to
• Alfresco Solr Sharding
– On ACL
– Tested up to 80M documents per shard and 20 shards
• Improved Transactional metadata queries
– Boolean, Double and OR construct
• Easy deployment and scaling in AWS using provisioning technologies like chef-alfresco
• Alfresco support for Amazon Aurora (also available in Alfresco 5.0)
• Updated field collaterals
– Scalability Blueprint for Alfresco 5.1
– Sizing Guide for Alfresco 5.1
– AWS Reference architecture, implementation guide and CloudFormation template for Alfresco 5.0 and 5.1
21. 21
Enabling seamless maintenance for Alfresco
Upgrade Task Force
1. In place application of SP & HF (not major and minor upgrades, for now)
2. Separation of Share and Platform releases for independent consumption (and definition
of a clear compatibility matrix)
3. Consolidation of Public API Lifecycle to ensure high longevity customizations (no need for
re-test)
NOTE: Not tied to Alfresco 5.1, the update assistant will be released for earlier versions
22. 22
Effects on the product lifecycle
Share / Platform separation
• Dev: Platform and Share can be built and developed independently
• Release: Platform and Share can be released independently (or together)
• Install: Suite and independent installers for Alfresco and Share
• Maintain: Consume new version of Platform & Share independently
23. 23
Modularizing the platform
Breaking the monolith
Alfresco Platform
Core set of functionalities exposing
extension points including Java and
ReST APIs
Transformation services
Can be scaled independently using the
transformation server or in MM for
video transformations
Share services
(New!) Subset of platform functionalities now
extracted in a separate module (AMP)
following the Share release lifecycle
Search services
Can be scaled independently as it relies on
Solr4 standalone (with Replication and
Sharding support)
24. 24
Share separation takeaways
1. Share (only) releases will now contain a share-services.amp which contains Share
specific backing APIs
2. Platform (only) releases will no longer contain Share specific Java services
3. All-in-one installers (Share + Alfresco + AMPs) will be produced
4. Compatibility between Share & Alfresco is driven by the Java (not ReST) APIs
compatibility policy (wait for it…in the next slides!)
5. Expect more frequent Share releases on prem (quarterly) and on cloud
What you need to know!
25. 25
Alfresco for the Developers
1. Comprehensive set of content management & workflow Java and ReST APIs
2. Modular UI framework for building custom business solutions
3. De facto standard based and enterprise ready SDKs for web and mobile development
What’s great about Alfresco Dev Platform
26. 26
Multiple ways Alfresco helps you achieve your custom solutions
The Alfresco Developer conundrum
Compatibility
Dev Env
Compatibility
Aikau based
Dev Env
ReST - Strategic
Java - Tactical
28. 28
Developer platform consolidation
1. Documentation of supported Platform, Share and ReST extension points
Move old webscript ReST API to Limited Support
Invest in the new Alfresco ReST API V1
Clearly identify and document supported Java and Share extension points
2. API lifecycle, support and Backward compatibility
In process - Major version support
ReST - Independently versioned and inherently backward compatible
3. Customer success driven tactical investments on the Java platform & modules
JAR simple module support (for Alfresco and Share)
Physical isolation of modules without need to modify Alfresco (immutable)
Share modules support and reporting
Ongoing activities targeting Alfresco.next
31. 31
So what about compatibility?
1. Major version for Platform and Share extensions (modules)
Your custom module built on 5.1 Public API will work throughout the whole 5.x
Alfresco modules can be compatible for a major version
2. ReST API version driven support for integrations (standalone apps)
Not bound to the Alfresco version
Clear rules for versioning of ReST APIs
Supported until v+2 is released or 1y after v+1 is released (whichever comes first)
For internal and external Alfresco extensions and integrations
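The ReST API support window described above can be expressed as a small date calculation. A minimal sketch, assuming support for version v ends at the earlier of "v+2 released" and "one year after v+1 released"; the function names and the release dates in the example are hypothetical:

```python
from datetime import date
from typing import Optional


def one_year_after(d: date) -> date:
    """Same calendar day one year later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def end_of_support(
    next_release: Optional[date],       # release date of v+1, if released
    next_next_release: Optional[date],  # release date of v+2, if released
) -> Optional[date]:
    """End of support for version v; None means no end date is known yet."""
    candidates = []
    if next_release is not None:
        candidates.append(one_year_after(next_release))
    if next_next_release is not None:
        candidates.append(next_next_release)
    return min(candidates) if candidates else None


# Hypothetical dates: v+1 ships 2016-03-01, v+2 ships 2016-09-01.
print(end_of_support(date(2016, 3, 1), date(2016, 9, 1)))
print(end_of_support(date(2016, 3, 1), None))
```

In the first call v+2 arrives before the one-year mark, so its release date ends support; in the second, only the one-year rule applies.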
32. 32
Alfresco SDK
What’s out already
Alfresco SDK 2.1.0 - Compatible with 5.0, with hot reloading (Platform & Share)
Alfresco SDK 2.1.1 - Multiple bug-fixes, backward compatible
Together with Alfresco.next
Fully supported, easily forkable and complete set of samples on alfresco-sdk-samples
(on GitHub)
Improved hot reloading
Customer value driven prioritization of public GitHub issues. Request
enhancements at https://github.com/Alfresco/alfresco-sdk/issues
Making Alfresco development even more productive, safe and fun
34. 34
Information provided in the following slide is roadmap information and therefore subject to change in content, timelines and context.
35. 35
Platform release targets
1. Target: Alfresco.next —> Early 2016
Both Platform and Share
Includes all major Developer Platform improvements
Solr sharding and scalability collaterals
Full revamp of developer documentation
2. Post Alfresco.next —> 2016
Share can follow a more frequent release schedule
Strategic improvements in the ReST API (vs Java), functionally and non functionally
More modularization, for agility and scalability purposes
37. 37
Take-aways
1. API Lifecycle
Fundamental to avoid dependency hell
Clear, documented, easy to use and supported extension points
Key factor to drive seamless upgrades
2. Developer Platform
Jar modules
Share modules support and reporting
3. Extreme Scalability
Solr Sharding
TMDQ improvements
New collaterals for sizing, scalability and reference architectures
4. Share lifecycle separation
5. Upgrade task force
What you really need to remember about today’s session
38. 38
WHAT WHY WHERE WHEN WHO HOW
Any questions?
Feel free to send your feedback at gabriele.columbro@alfresco.com or
@mindthegabz