2. 2
Discovering the 2 in Search Services 2.0
Tech Talk Live
• Solr Core and Solr Schema
• Security, Performance and Precision
• Enterprise Enhancements
• One more thing...
• Q&A
14th October 2020
5. 5
Solr Content Store Removal
[Diagram: architecture before and after]
Search Services 1.4
• ACS Repository: Content Store, DB
• Search Services: Content Store, Solr Index
Search Services 2.0
• ACS Repository: Content Store, DB
• Search Services: Solr Index
COMMUNITY
6. 6
Solr Content Store Removal Benefits
Removed custom code
9,311 lines of code removed
https://github.com/Alfresco/SearchServices/blob/master/search-services/alfresco-search/doc/architecture/solr-content-store-removal/00001-solr-content-store-removal.md
Helps leverage built-in Solr features
It's now possible to make use of built-in Solr features
(e.g. replication and backups)
Reduces I/O work
Particularly in systems with replication
Reduced disk usage
Search Services version             1.4      2.0
Index size (bytes per doc)          1        3,000
Content store size (bytes per doc)  40,000   0
COMMUNITY
7. 7
Solr Content Store Removal Reindex
• Moving data from the content store to the index requires a reindex
Reindexing with sharding: Demo later
For more information see:
https://github.com/aborroy/solr-sharding-reindex
For more information about reindexing see:
https://www.alfresco.com/events/webinars/tech-talk-live-reindexing-large-repositories
COMMUNITY
TTL
#120
8. 8
Solr Content Store Removal Impact
● More efficient replication as we're now using the default Solr
mechanism
○ Docker-compose example available at
https://github.com/aborroy/search-services-replication
● Now using atomic updates instead of removing and
recreating documents
○ To achieve this we enabled the SOLR Transaction Log
● Review your backup and restore procedures, as the folder
$SOLR_HOME/contentstore is not created anymore
$ du -h /opt/alfresco-search-services/data/alfresco
4.7M ./index
8.5M ./tlog
4.0K ./snapshot_metadata
COMMUNITY
FTSSTATUS
9. 9
Full information for a document can still be recovered
by using Solr queries.
Solr Content Store Removal Impact
http://127.0.0.1:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DBID:563
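The query above can also be assembled in a script. This is a minimal sketch that only builds the URL and sends no request; the host, core name and DBID are the ones on the slide, and the helper name document_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; adjust it for your own deployment.
SOLR_BASE = "http://127.0.0.1:8983/solr"


def document_query_url(dbid: int, core: str = "alfresco") -> str:
    """Build a Solr select URL returning all fields of the document with this DBID.

    Hypothetical helper: it only assembles the URL, it does not call Solr.
    """
    params = {
        "fl": "*,[cached]",  # all fields plus the [cached] entry shown on the slide
        "indent": "on",
        "q": f"DBID:{dbid}",
    }
    # Keep the characters used on the slide readable instead of percent-encoding them.
    return f"{SOLR_BASE}/{core}/select?{urlencode(params, safe='*,[]:')}"


print(document_query_url(563))
```

The safe argument is only there so that the result reads like the URL on the slide; a percent-encoded query string is equivalent.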
COMMUNITY
10. 10
New Destructured Date Fields
Solr schema simplification solrhome/core/conf/schema.xml
Improved storage of DATE fields
quarter
day_of_month
day_of_year
day_of_week
COMMUNITY
11. 11
New fields *_unit_of_time_* can be used to build queries
Get all the documents created in 2020
SOLR FTS
Nb. CMIS is also supported, but not for this example:
● cm:created is not supported as cm:auditable aspect is not exposed for CMIS protocol
New Destructured Date Fields
COMMUNITY
17. 17
Asynchronous Actions and Maintenance
[Diagram: SearchServices Administrator, Maintenance Queue, Commit Tracker, Index]
t1 Retry
t2 Reindex
t3 Purge
t4 Fix
t5 Dequeues scheduled work
t6 Index management
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
18. 18
The FIX tool finds transactions and ACL change sets which are mismatched between the DB and Solr
It adds them to be reindexed on the next maintenance cycle performed by the CommitTracker
FIX Tool
{
"responseHeader": {
"QTime": 1,
"status": 0
},
"action": {
"status": "scheduled",
"txToReindex": [1, 2],
"aclChangeSetToReindex": [3, 4]
}
}
Old Response Shape
● “status” is always scheduled
● Only two error categories
● Each category contains the corresponding
transaction identifiers
COMMUNITY
19. 19
{
"responseHeader": {
// As before
},
"action": {
"dryRun": true,
"status": "notScheduled",
"txToReindex": {
"txInIndexNotInDb": {
"192": 282, // Tx 192 is associated with 282 nodes
"827": 99 // Tx 827 is associated with 99 nodes
},
"duplicatedTxInIndex": {...},
"missingTxInIndex": {...}
},
"aclChangeSetToReindex": {
// Very similar to txToReindex, but for ACLs
}
}
}
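A report in this shape is easy to summarise per category. The sketch below is only an illustration: the sample data copies the two transactions shown above, and the categories elided on the slide are left as empty placeholders. The names sample_tx_to_reindex and summarise are hypothetical.

```python
from typing import Dict, Tuple

# Sample shaped like the "txToReindex" section of the response above.
# The two categories elided on the slide ({...}) are left empty as placeholders.
sample_tx_to_reindex: Dict[str, Dict[str, int]] = {
    "txInIndexNotInDb": {"192": 282, "827": 99},
    "duplicatedTxInIndex": {},
    "missingTxInIndex": {},
}


def summarise(section: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return (number of transactions, total associated nodes) per category."""
    return {
        category: (len(transactions), sum(transactions.values()))
        for category, transactions in section.items()
    }


for category, (tx_count, node_count) in summarise(sample_tx_to_reindex).items():
    print(f"{category}: {tx_count} transactions, {node_count} nodes")
```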
FIX Tool New Features
● dryRun (defaults to true): If true the output report is
generated, but no reindex work is scheduled.
● fromTxCommitTime: The lower bound (the minimum
transaction commit time) of the target transactions
that you want to check or fix.
● toTxCommitTime: The upper bound (the maximum
transaction commit time) of the target transactions
that you want to check or fix.
● maxScheduledTransactions: The maximum number
of transactions that will be scheduled. The default is
500 but this can be overridden in solrcore.properties.
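The parameters above can be combined into one request URL. This is a sketch under assumptions, not a reference for the API: it assumes the FIX endpoint is called as action=FIX on the same admin URL pattern used elsewhere in this deck, and that commit times are passed through as given. The helper name fix_url is hypothetical, and no request is sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: same admin URL pattern as the other admin examples in this deck.
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def fix_url(
    dry_run: bool = True,
    core: Optional[str] = None,
    from_tx_commit_time: Optional[int] = None,
    to_tx_commit_time: Optional[int] = None,
    max_scheduled_transactions: Optional[int] = None,
) -> str:
    """Assemble a FIX request URL; only parameters that were set are included."""
    # Assumption: the action name is "FIX"; dryRun defaults to true as on the slide.
    params = {"action": "FIX", "dryRun": str(dry_run).lower()}
    optional = {
        "core": core,
        "fromTxCommitTime": from_tx_commit_time,
        "toTxCommitTime": to_tx_commit_time,
        "maxScheduledTransactions": max_scheduled_transactions,
    }
    # Skip anything left unset so that server-side defaults apply.
    params.update({name: str(value) for name, value in optional.items() if value is not None})
    return f"{ADMIN_URL}?{urlencode(params)}"


print(fix_url())  # report only, nothing is scheduled
print(fix_url(dry_run=False, core="alfresco", max_scheduled_transactions=100))
```

Leaving dry_run at its default mirrors the tool's own default, so a report is produced before any reindex work is scheduled.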
COMMUNITY
20. 20
Enable/Disable Indexing
Motivation: Disable indexing in order to cancel a huge maintenance load
• Enable / disable indexing on a specific core or on all master/standalone cores
• MetadataTracker, ContentTracker, CascadeTracker, AclTracker are affected
• CommitTracker, ModelTracker, ShardStatePublisher are not affected
• When disabled, some admin endpoints (e.g. PURGE,INDEX) won’t execute
• When disabled, the FIX endpoint will be forced to run in dryRun mode
• If indexing is disabled in the middle of a tracking process, trackers will be set to rollback mode
• Commands are idempotent
• For more information see https://issues.alfresco.com/jira/browse/SEARCH-2330
Examples:
Enable indexing on all master/standalone cores
http://localhost:8983/solr/admin/cores?action=enable-indexing
Enable indexing on a specific core (master or standalone)
http://localhost:8983/solr/admin/cores?action=enable-indexing&core=alfresco
COMMUNITY
21. 21
FIX Tool Demo
Postman Collection containing the example requests used in the demo
https://www.getpostman.com/collections/4c2fbe407a0134729546
COMMUNITY
23. 23
● Communication between the Repository and SOLR (for searching and indexing) can be protected using mTLS with client authentication [1]
● A new password handling mechanism has been introduced from ASS 2.0 / ACS 6.2.N [2]:
○ Configuration moves from property files with passwords in plain text to JVM system properties
○ The old way of configuring should still work for backwards compatibility, but is discouraged for security reasons
[2] ACS 6.2.N is not released yet!
New mTLS Configuration
[1] https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-1-is-coming-with-mutual-tls-authentication-by-default/ba-p/287905
COMMUNITY
24. 24
alfresco-ssl-generator: command-line tool to generate self-signed certificates (classic and current formats)
https://github.com/Alfresco/alfresco-ssl-generator
alfresco-solr-docker-mtls: sample configuration (repo using classic and solr using current)
https://github.com/aborroy/alfresco-solr-docker-mtls
Additional resources
Installing and configuring Search Services with mutual TLS using the distribution zip
https://docs.alfresco.com/search-community/tasks/solr-install.html
Alfresco mTLS Configuration Deep Dive
https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-mtls-configuration-deep-dive/ba-p/296422
New mTLS Configuration
COMMUNITY
26. 26
Trackers Reworking
The Transaction Batch Size for nodes and ACLs affects performance only until the
maximum number for your deployment is reached. Beyond that point, increasing
the batch size produces no further change in performance
alfresco.transactionDocsBatchSize (default 2000)
alfresco.changeSetAclsBatchSize (default 500)
Increasing the Node Batch Size can improve performance up to an
optimal point for your deployment. Beyond that point, a larger batch
size penalises performance
alfresco.nodeBatchSize (default 100)
alfresco.cascade.tracker.nodeBatchSize (default 10)
alfresco.contentUpdateBatchSize (default 2000)
alfresco.aclBatchSize (default 100)
Increasing the maximum number of Parallel Threads improves performance
until the maximum number for the deployment is reached.
alfresco.metadata.tracker.maxParallelism (default 32)
alfresco.cascade.tracker.maxParallelism (default 32)
alfresco.content.tracker.maxParallelism (default 32)
alfresco.acl.tracker.maxParallelism (default 32)
[Charts 1, 2 and 3: Execution Time against Parameter Size, with HOTSPOT markers]
solrcore.properties
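To make the two kinds of parameter concrete: a batch size controls how many items go into one unit of work, and a maximum parallelism caps how many units run at once. The sketch below is a generic illustration of that idea, not how the trackers are implemented; index_batch is a stand-in and every name in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def batches(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_batch(node_ids: Sequence[int]) -> int:
    """Stand-in for real indexing work; here it only counts the nodes."""
    return len(node_ids)


def run(node_ids: Sequence[int], node_batch_size: int, max_parallelism: int) -> List[int]:
    """Process every batch using at most max_parallelism worker threads."""
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        return list(pool.map(index_batch, batches(node_ids, node_batch_size)))


# 250 hypothetical node ids, using the default values quoted on the slide.
sizes = run(list(range(250)), node_batch_size=100, max_parallelism=32)
print(sizes)  # [100, 100, 50]
```

With only three batches, 29 of the 32 permitted threads have nothing to do, which is one way to picture why raising a limit stops helping once the deployment's own maximum is reached.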
COMMUNITY
27. 27
FTS operator = has changed behaviour in 2.0.0
● Detailed information is available in https://hub.alfresco.com/t5/alfresco-content-services-blog/exact-term-queries-in-search-services-2-0/ba-p/302200
● Thanks @AFaust for noticing this issue: https://issues.alfresco.com/jira/browse/SEARCH-2461
Exact Search
COMMUNITY
29. 29
In previous releases, Shard State was communicated to the repository as part of the retrieval of
information from the Metadata Tracker.
That could cause problems when the Metadata Tracker cycle takes a long time to execute.
A new Shard State Publisher tracker has been added to report the state to the repository on
a regular basis.
The configuration for this tracker includes the following property:
alfresco.nodestate.tracker.cron
If this property is not specified, the default cron is applied:
alfresco.cron=0/10 * * * * ? *
ShardState Tracker
solrcore.properties
ENTERPRISE
Sharding
30. 30
DB_ID_RANGE Sharding
• When a shard goes down, search can now be restored more quickly
For more details see MNT-21591
ACS Node 1
ACS Node 2
SOLR Shard 1
DB_ID_RANGE
SOLR Shard 2
DB_ID_RANGE
Replica 1
Replica 2
ACS (alfresco-global.properties):
search.solrShardRegistry.shardInstanceTimeoutInSeconds = 30
(Historically, a value closer to 300 seconds was recommended)
InsightEngine (solrcore.properties):
alfresco.nodestate.tracker.cron=0/10 * * * * ? *
This should be more frequent than the value set in ACS
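The relationship between the two settings can be checked with a few lines. In a Quartz cron expression the first field is seconds, and 0/10 means "starting at second 0, every 10 seconds". The sketch below is a simplification that interprets only that seconds field; the function name max_gap_seconds is hypothetical.

```python
def max_gap_seconds(cron: str) -> int:
    """Longest gap between two firings of a Quartz cron expression.

    Simplification: only the seconds field is interpreted, as "*", a single
    value, or "start/step"; every other field must be a wildcard.
    """
    seconds_field, *other_fields = cron.split()
    if any(field not in ("*", "?") for field in other_fields):
        raise ValueError("this sketch only handles wildcard minute/hour/day fields")
    if seconds_field == "*":
        return 1
    start, _, step = seconds_field.partition("/")
    firings = list(range(int(start), 60, int(step))) if step else [int(start)]
    # Include the wrap-around to the first firing of the next minute.
    wrapped = firings + [firings[0] + 60]
    return max(later - earlier for earlier, later in zip(wrapped, wrapped[1:]))


cron = "0/10 * * * * ? *"  # alfresco.nodestate.tracker.cron
timeout = 30               # search.solrShardRegistry.shardInstanceTimeoutInSeconds
print(max_gap_seconds(cron) < timeout)  # True
```

With the values on the slide the shard state is published every 10 seconds, well inside the 30 second timeout configured in ACS.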
ENTERPRISE
Sharding
31. 31
Solr Sharding Reindex
When re-indexing a live Alfresco Repository with SOLR Sharding and
solr.useDynamicShardRegistration enabled, the new SOLR Shard Indexer services should be
configured with the Alfresco NodeState Tracker off.
With this approach, the SOLR Indexer services are not registered in the live Alfresco Repository as
available SOLR Shards, so the live system can operate normally.
Sharding Reindex (Demo)
https://github.com/aborroy/solr-sharding-reindex
This configuration uses two Docker Compose templates:
● living is an ACS server running 2 SOLR Shards configured with DB_ID
method and Alfresco Search Services 1.4.3
● indexer is an Indexer service running 2 SOLR Shards configured with
DB_ID method and Alfresco Search Services 2.0.0.1
ENTERPRISE
Sharding
32. 32
● Improved SOLR JDBC support
● Added support for Excel and Tableau to Alfresco Search and Insight Engine using an ODBC driver
provided by a third-party company, CData
○ Download the driver from https://www.cdata.com/drivers/alfresco/
Alfresco
REPOSITORY
BI Tool Support
ENTERPRISE
BI Tools
33. 33
Improvements to SQL Support (JDBC & ODBC)
• Support for Date Functions in SELECT Clause
• Support for Date Functions in WHERE Clause
• Support for Date Functions in GROUP BY Clause
• Support for SQL avg(field) with multiple GROUP BY
• Support for Date Functions in ORDER BY Clause
• Support SQL TIMESTAMP format
• Support for CAST AS TIMESTAMP function
• Support for QUARTER function
• Support for DAYOFMONTH, DAYOFWEEK, DAYOFYEAR functions
• Support for TIMESTAMPADD(timeUnit, integer, datetime) function
ENTERPRISE
BI Tools
34. 34
JDBC Driver with DBVisualizer (Demo)
ENTERPRISE
BI Tools
Alfresco
REPOSITORY
>> Working JDBC Client sample is available in https://github.com/aborroy/solr-jdbc-client
35. 35
CDATA ODBC installation
The driver is simple to install on your machine by following the steps on this page:
http://cdn.cdata.com/help/SJF/odbc/
Installation and setup is a simple two-step process, performed on the end user’s machine
1. Install the driver
2. Configure the ODBC data source
Configuration is fully documented by CData.
ENTERPRISE
BI Tools
36. 36
ODBC for Tableau
• Tableau can connect to your data source and present the results in a table.
• Results can be displayed by using the table directly or by entering a custom SQL query that returns exactly
what the user wants to see.
• Tableau consists of worksheets where we can build views of our data using fields and graphs.
• Each worksheet builds the results of one query through the use of the fields.
• Results can be visualised as pie charts, bar charts, stacked bar charts, continuous line graphs and many more.
• We can edit our results by applying filters within Tableau on our selected fields.
• Tableau can create dashboards that gather the related queries from each sheet in one place.
• Results can be previewed on different devices such as desktop and tablet.
ENTERPRISE
BI Tools
37. 37
ODBC for Excel
• Simply start by doing a data dump into Excel.
• Connecting to the ODBC source is similar to Tableau: you can view all the results from the
table or provide a custom SQL query.
• Excel gives a preview of the results before displaying them on a different sheet.
• You can filter the data from the preview by clicking the ‘transform’ button and then
filtering your data as required.
• You can use native Excel functionality on your chosen dataset without relying heavily on SQL, in contrast to
using Zeppelin.
ENTERPRISE
BI Tools
38. 38
Supported Stack
Server OS
• Linux (Red Hat Enterprise v7.6 x64)
• CentOS 7 x64
• Ubuntu 18.04
• SUSE 12.0 SP1 x64
• Windows Server 2012 R2 (x64)
• Windows Server 2016
Solr
• Solr 6.6.5
Java
• OpenJDK 11.0.8
• Oracle JDK 11.0.1
Alfresco Content Services
• Alfresco Enterprise Edition (ACS) 6.2
• Alfresco Community Edition 201911 GA
COMMUNITY
ENTERPRISE
Release notes
https://hub.alfresco.com/t5/alfresco-content-services-blog/search-services-2-0-0-release/ba-p/301308
39. 39
2.0.0.0
2.0.0.1
shared.properties
• Suggestable Properties and Cross Locale fields
• This may have an impact in the SOLR index
• Spellcheck and Tokenisation work by default
2.0.x
• Settings changed back to commented out by default, as in previous versions
2.0.0.1
COMMUNITY
ENTERPRISE
42. 42
Index Checker Tool
https://github.com/AlfrescoLabs/index-checker
Simple report
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
Count SOLR documents = 814
Count DB nodes = 815
The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
Count SOLR permissions = 58
Count DB permissions = 58
>> Available from Search Services 1.4.3
43. 43
Index Checker Tool
Detailed report
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=false
Count SOLR documents = 814
Count DB nodes = 815
The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
TYPE {http://www.alfresco.org/model/content/1.0}category: DbIds present in DB but missed in SOLR [212, 213]
SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
TYPE {http://www.alfresco.org/model/content/1.0}content: DbIds present in SOLR but missed in DB [584]
Count SOLR permissions = 58
Count DB permissions = 58
Batches of 1,000 elements
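The detailed report boils down to a two-way comparison of identifiers per node type. The sketch below shows that comparison with plain sets. It is not the tool's implementation and it leaves out batching; the id lists are made up so that the result mirrors the report above, and compare_ids is a hypothetical name.

```python
from typing import Dict, Iterable, List


def compare_ids(db_ids: Iterable[int], solr_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Return the DbIds present on one side but missing on the other."""
    db, solr = set(db_ids), set(solr_ids)
    return {
        "in_db_not_in_solr": sorted(db - solr),
        "in_solr_not_in_db": sorted(solr - db),
    }


# Made-up ids per node type, chosen to reproduce the differences in the report.
categories = compare_ids(db_ids=[211, 212, 213], solr_ids=[211])
contents = compare_ids(db_ids=[583], solr_ids=[583, 584])

print(categories["in_db_not_in_solr"])  # [212, 213]
print(contents["in_solr_not_in_db"])    # [584]
```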
44. 44
Fix actions
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=true
Count SOLR documents = 814
Count DB nodes = 815
...
No Database Rows Were Harmed in the Fixing of This Solr Index
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
Count SOLR documents = 815
Count DB nodes = 815
Index Checker Tool
>> Watch the live demo at https://youtu.be/YU-WyNgCH2U