Synchronizing a search application across multiple clusters is a complex challenge, and the solution evolves with our tools (Solr and Fusion AppStudio). Paul Anderson discusses how Dynatrace's cluster synchronization strategy changed over the last two years to ensure that customers worldwide have a consistent search experience. The talk focuses on two Solr features, CDCR and Streaming Expressions, explaining what each does well, where it falls down, and where it needs to improve. Paul also covers how to modify your index pipelines and signal aggregations to support cluster synchronization.
Speakers: Paul Anderson & Daniel Drahnak, Dynatrace
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ... (Lucidworks)
Running SolrCloud in Public Cloud is the future. This presentation and the code that will be contributed back to the community will allow such clusters to be highly efficient, scalable and elastic. Attendees will understand the challenges and potential of sharing index data between servers.
Speakers: Ilan Ginzburg & Yonik Seeley, Salesforce
Search at Twitter: Presented by Michael Busch, Twitter (Lucidworks)
Twitter processes over 500 million tweets per day and more than 2 billion search queries per day. The company uses a search architecture based on Lucene with custom extensions. This includes an in-memory real-time index optimized for concurrency without locks, and a schema-based document factory. Future work includes support for parallel index segments and additional Lucene features.
Twitter provides a platform for user-generated content in the form of short messages called tweets. It handles a massive volume of data, with over 230 million tweets and 2 billion search queries per day. Twitter has developed a customized search and indexing system to handle this scale. It uses a modular system that is scalable, cost-effective, and allows for incremental development. The system includes components for crawling Twitter data, preprocessing and aggregating tweets, building an inverted index, and distributing the index across server machines for low-latency search.
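The core of the pipeline described above is the inverted index. As a rough illustration (pure Python with made-up documents, not Twitter's actual code), an inverted index maps each term to the ids of the documents containing it, so a conjunctive query reduces to intersecting posting lists:

```python
# Minimal inverted-index sketch: index a few short documents, then answer
# an AND-semantics query by intersecting the per-term posting sets.
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict of doc_id -> text. Returns term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "new tweet about search", 2: "search at twitter", 3: "another tweet"}
index = build_inverted_index(docs)
```

Real-time systems like Twitter's additionally append to these posting lists concurrently and serve queries from partitioned, replicated copies of the index.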
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs (Lucidworks)
This document discusses Solr distributed indexing at WalmartLabs. It describes customizing an existing MapReduce indexing tool to index large XML files in a distributed manner across multiple servers. Key points covered include using two custom utilities for index generation and merging, experiments showing indexing is CPU-bound while merging is I/O-bound, and lessons learned around data locality and using n-way merging of shards for best performance. Solutions discussed include dedicating an indexing Hadoop cluster to improve I/O speeds for merging indexes.
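The n-way merge idea mentioned in the lessons learned can be sketched abstractly (illustrative only; the talk's tooling merges Lucene index segments, not Python lists): merging all sorted per-shard runs in a single heap-driven pass avoids the repeated I/O of pairwise merging.

```python
# Merge any number of sorted per-shard term runs in one pass using a heap,
# rather than merging shards two at a time.
import heapq

def n_way_merge(sorted_runs):
    """Merge sorted iterables into one sorted list in a single pass."""
    return list(heapq.merge(*sorted_runs))

shards = [["apple", "cherry"], ["banana", "cherry"], ["apple", "durian"]]
```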
This talk is aimed at understanding how ranking of documents works in Solr and ways to improve the relevancy of your search application.
The first part of the talk will cover how a user query gets parsed in Solr and the default scoring which comes with it.
The second part of the talk covers how to customize scoring to work better with your dataset by experimenting with the available similarity implementations and writing your own similarity implementation.
Finally, I will talk about adding different relevancy signals into your ranking algorithm and customizing results for your top N queries.
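For context on the "default scoring" the first part of the talk covers, a simplified sketch of Lucene's classic TF-IDF similarity looks like this (norms, coord, and queryNorm factors are omitted here for brevity):

```python
# Simplified classic (TF-IDF) scoring in the style of Lucene's practical
# scoring function: tf = sqrt(freq), idf = 1 + ln(N / (df + 1)), and idf
# is squared in the per-term score.
import math

def tf(freq):
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def term_score(freq, num_docs, doc_freq):
    return tf(freq) * idf(num_docs, doc_freq) ** 2
```

Customizing scoring, as the second part of the talk discusses, amounts to replacing functions like these with a similarity better suited to your dataset.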
This document discusses Flipkart's use of Solr indexes to organize product data and search. It describes how Flipkart moved from indexing all data in a single CMS to a more distributed approach using services and streams to index static vs. dynamic data separately. It also discusses challenges with partial document updates in Lucene and how Flipkart leveraged updatable docvalues and value sources to integrate real-time signals for ranking and filtering.
The document provides an overview of the Spark framework for lightning fast cluster computing. It discusses how Spark addresses limitations of MapReduce-based systems like Hadoop by enabling interactive queries and iterative jobs through caching data in-memory across clusters. Spark allows loading datasets into memory and querying them repeatedly for interactive analysis. The document covers Spark's architecture, use of resilient distributed datasets (RDDs), and how it provides a unified programming model for batch, streaming, and interactive workloads.
The document describes Twitter's search architecture. It discusses how Twitter uses modified versions of Lucene called Earlybird to build real-time and archive search indexes. The real-time indexes are partitioned and replicated across clusters. New tweets are continuously added and searchable with low latency. Archive indexes contain older tweets on HDFS and are optimized for throughput over low latency. The system uses an analyzer to preprocess tweets before indexing and a service called the Blender to merge search results.
Building a real time big data analytics platform with Solr - Trey Grainger
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot-faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You'll also get a sneak peek at some new faceting capabilities just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
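To make the pivot-faceting concept concrete, here is an illustrative sketch (pure Python, not Solr code) of what a pivot facet like `facet.pivot=state,city` computes: nested value counts over two fields.

```python
# Compute nested facet counts: documents per outer-field value, with a
# nested count per inner-field value -- the shape a pivot facet returns.
from collections import Counter, defaultdict

def pivot_facet(docs, outer_field, inner_field):
    pivot = defaultdict(Counter)
    for doc in docs:
        pivot[doc[outer_field]][doc[inner_field]] += 1
    return {outer: dict(inner) for outer, inner in pivot.items()}

jobs = [
    {"state": "GA", "city": "Atlanta"},
    {"state": "GA", "city": "Atlanta"},
    {"state": "GA", "city": "Savannah"},
    {"state": "IL", "city": "Chicago"},
]
facets = pivot_facet(jobs, "state", "city")
```

Solr performs the same aggregation inside the index, which is what makes the real-time analytics described above possible at scale.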
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise (Lucidworks)
This document discusses relevance in information retrieval systems. It begins with definitions of relevance and how relevance is measured. It then covers similarity functions like TF-IDF and BM25 that are used to calculate relevance scores. Configuration options for similarity in Solr are presented, including setting similarity globally or per field. The edismax query parser is described along with parameters that impact relevance. Methods for evaluating relevance through testing and analysis are provided. Finally, examples of applying relevance techniques to real systems are briefly outlined.
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined’s integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined’s deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
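One of the basic object-detection concepts the talk covers is intersection-over-union (IoU), the standard measure of overlap between a predicted and a ground-truth bounding box. A self-contained sketch for axis-aligned boxes given as (x1, y1, x2, y2):

```python
# IoU of two axis-aligned boxes: area of the overlap rectangle divided by
# the area of the union. Returns 0.0 for disjoint boxes.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

IoU drives both evaluation (a detection "counts" above an IoU threshold) and the matching step in training pipelines such as DETR's set-based loss.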
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis (Citus Data)
Postgres relies heavily on an extension ecosystem, but that ecosystem is almost 100% dependent on C, which shuts out developers, libraries, and ideas from the world of Postgres. postgres-extension.rs changes that by supporting development of extensions in Rust. Rust is a memory-safe language that integrates nicely in any environment, has powerful libraries, a vibrant ecosystem, and a prolific developer community.
Rust is a unique language because it supports high-level features but all the magic happens at compile-time, and the resulting code is not dependent on an intrusive or bulky runtime. That makes it ideal for integrating with postgres, which has a lot of its own runtime, like memory contexts and signal handlers. postgres-extension.rs offers this integration, allowing the development of extensions in rust, even if deeply-integrated into the postgres internals, and helping handle tricky issues like error handling. This is done through a collection of Rust function declarations, macros, and utility functions that allow rust code to call into postgres, and safely handle resulting errors.
This document provides an excerpt from the book "Spark: The Definitive Guide" which introduces some of the core concepts of Apache Spark. It discusses Spark's basic architecture including the driver program, executors, and cluster managers. It also covers Spark applications, DataFrames, transformations and actions. Finally, it provides a sample end-to-end example reading CSV flight data to demonstrate these concepts.
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak... (Citus Data)
One of the strongest features of any database is its extensibility, and PostgreSQL comes with a rich extension API. It allows you to define new functions, types, and operators. It even allows you to modify some of its core parts, like the planner, executor, or storage engine. You read that right: you can even change the behavior of the PostgreSQL planner. How cool is that?
Such freedom in extensibility created strong extension community around PostgreSQL and made way for a vast amount of extensions such as pg_stat_statements, citus, postgresql-hll and many more.
In this tutorial, we will look at how you can create your own PostgreSQL extension. We will start with more common tasks like defining new functions and types, then gradually explore less-known parts of PostgreSQL's extension API, like the C-level hooks that let you change the behavior of the planner, executor, and other core parts of PostgreSQL. We will see how to code, debug, compile, and test our extension. After that, we will also look into how to package and distribute our extension for other people to use.
To get the most out of the tutorial, C and SQL knowledge is helpful. Some knowledge of PostgreSQL internals would also be useful, but we will cover the necessary details, so it is not required.
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
This talk describes how you can practically apply some of Lucene 4's new features (such as flexible indexing, scoring improvements, column-stride fields) to improve your search application.
The talk will give a brief description of these new features along with example use-cases you can try yourself. We'll cover how you can configure Solr to:
Set up the schema to use Pulsing or Memory codec for a primary key field
Not use a separate spellcheck index, controlling character-level swaps from the query processor
Sorting with a different locale
Per-field similarity configurations, such as using a non-vector-space algorithm
Suneel Marthi - Deep Learning with Apache Flink and DL4J (Flink Forward)
http://flink-forward.org/kb_sessions/deep-learning-with-apache-flink-and-dl4j/
Deep Learning has become very popular over the last few years in areas such as image recognition, fraud detection, and machine translation, and it has proved very useful for handling unstructured data and extracting value from it. A big challenge in building deep learning models has been the high cost of training them. With the recent advent of distributed frameworks like Apache Flink and Apache Spark, it is faster to train deep learning models in parallel on modern platform architectures. In this talk, we'll show how to use Apache Flink Streaming with the open source deep learning framework DeepLearning4j to perform large-scale deep learning model training. We will show a demo of a recurrent neural net that is trained for language modeling and have it generate text.
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets... (Databricks)
The landscape of security threats an enterprise faces is vast. It is imperative for an organization to know when one of the machines within the network has been compromised. One layer of detection can take advantage of the DNS requests made by machines within the network. A request to a Command & Control (CNC) domain can act as an indication of compromise. It is thus advisable to find these domains before they come into play. The team at Akamai aims to do just that.
In this session, Aminov will share Akamai’s experience in porting their PoC detection algorithms, written in Python, to a reliable production-level implementation using Scala and Apache Spark. He will specifically cover their experience regarding an algorithm they developed to detect botnet domains based on passive DNS data. The session will also include some useful insights Akamai has learned while handing out solutions from research to development, including the transition from small-scale to large-scale data consumption, model export/import using PMML and sampling techniques. This information is valuable for researchers and developers alike.
Feature Extraction for Large-Scale Text CollectionsSease
Feature engineering is a fundamental but poorly documented component in LTR search applications.
As a result, there are still few open access software packages that allow researchers and practitioners to easily simulate a feature extraction pipeline and conduct experiments in a lab setting.
This talk introduces Fxt, an open-source framework to perform efficient and scalable feature extraction. Fxt may be integrated into complex, high-performance software applications to help solve a wide variety of text-based machine learning problems.
The talk details how we built and documented a reproducible feature extraction pipeline with LTR experiments using the ClueWeb09B collection.
This LTR dataset is publicly available.
We’ll also discuss some of the benefits (feature extraction efficiency, model interpretation) of having open access tooling in this area for researchers and practitioners alike.
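To give a flavor of what a feature extraction pipeline produces, here is a hypothetical sketch (this is not Fxt's API) of a few classic query-document features used in LTR datasets:

```python
# Extract simple LTR features for one query-document pair: document length,
# number of matched query terms, fraction of the query covered, and the
# summed term frequency of the matched terms.
def extract_features(query_terms, doc_terms):
    overlap = [t for t in query_terms if t in doc_terms]
    return {
        "doc_len": len(doc_terms),
        "matched_terms": len(overlap),
        "query_coverage": len(overlap) / len(query_terms),
        "sum_tf": sum(doc_terms.count(t) for t in overlap),
    }

doc = "the quick brown fox jumps over the lazy dog".split()
feats = extract_features(["quick", "dog", "cat"], doc)
```

A production extractor computes hundreds of such features per pair, efficiently and over web-scale collections, which is the gap Fxt aims to fill.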
SparkR Under the Hood with Hossein Falaki (Databricks)
SparkR is a new and evolving interface to Apache Spark. It offers a wide range of APIs and capabilities to data scientists and statisticians. Because Spark is a distributed system with a JVM core, some R users find SparkR errors unfamiliar. In this talk we will show what goes on under the hood when you interact with SparkR. We will look at SparkR architecture, performance bottlenecks, and API semantics. Equipped with those, we will show how some common errors can be eliminated. I will use debugging examples based on our experience with real SparkR use cases.
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index instead of having to generate and supply external training sets. Building on the introduction the focus will be on extensions of the Lucene Classification module that come in Lucene 6.0 and the Lucene Classification module's incorporation in to Solr 6.1. These extensions will allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields etc. The Solr ClassificationUpdateProcessor will be explored, such as how it works, and how to use it including basic and advanced features like multi class support and classification context filtering. The presentation will include practical examples and real world use cases.
Grant Ingersoll presented on using Apache Solr and Apache Spark for data engineering. He discussed how Solr can be used for indexing and searching large amounts of data, while Spark enables large-scale processing on the indexed data. Lucidworks' Fusion product combines Solr and Spark capabilities to allow search-driven applications and machine learning on indexed content.
It is one thing to write an Apache Spark application that gets you to an answer. It’s another thing to know you used all the tricks in the book to make you run, run as fast as possible. This session will focus on those tricks.
Discover patterns and approaches that may not be apparent at first glance, but that can be game-changing when applied to your use cases. You'll learn about nested types, multi-threading, skew, reducing, cartesian joins, and fun stuff like that.
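One of the classic tricks for handling skew is key salting. The idea, sketched here in pure Python rather than Spark: split a hot key across N sub-keys so partial aggregates spread over N workers, then strip the salt and combine the partials.

```python
# Two-stage skewed aggregation: stage 1 counts by (key, salt) so a hot key
# is spread across num_salts buckets; stage 2 strips the salt and merges.
import random
from collections import Counter

def salted_counts(records, num_salts=4, seed=0):
    rng = random.Random(seed)
    partial = Counter((key, rng.randrange(num_salts)) for key in records)
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count
    return final

records = ["hot"] * 1000 + ["rare"] * 3
```

In Spark the same pattern is two `reduceByKey` stages, with the salt appended to the key in the first stage.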
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ... (Databricks)
This document provides an overview and summary of BigDL, an open source distributed deep learning library for Apache Spark. It describes how BigDL allows users to run deep learning on Spark by supporting common deep learning frameworks and algorithms. Specific capabilities and examples discussed include using BigDL to run Deep Speech 2 for speech recognition on LibriSpeech data and using BigDL to run Faster R-CNN and SSD for object detection on PASCAL VOC data. Performance comparisons show BigDL achieving comparable or better results than other frameworks.
Finite-State Queries in Lucene:
* Background, improvement/evolution of MultiTermQuery API in 2.9 and Flex
* Implementing existing Lucene queries with NFA/DFA for better performance: Wildcard, Regex, Fuzzy
* How you can use this Query programmatically to improve relevance (I'll use an English test collection/English examples)
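As a toy illustration of the automaton idea behind queries like Wildcard: compile the pattern into NFA states and advance a state set over the term in one pass, with no backtracking, the same way Lucene runs an automaton against the term dictionary. (Pure Python sketch, not Lucene's implementation.)

```python
# Match a wildcard pattern ('?' = any one char, '*' = any run of chars)
# against a term by simulating NFA states, where a state is a position in
# the pattern. '*' states can be skipped via an epsilon closure.
def wildcard_match(pattern, term):
    def closure(states):
        out = set(states)
        changed = True
        while changed:
            changed = False
            for p in list(out):
                if p < len(pattern) and pattern[p] == "*" and p + 1 not in out:
                    out.add(p + 1)  # '*' may match the empty string
                    changed = True
        return out

    states = closure({0})
    for ch in term:
        nxt = set()
        for p in states:
            if p == len(pattern):
                continue
            if pattern[p] == "*":
                nxt.add(p)            # '*' consumes ch and stays put
            elif pattern[p] == "?" or pattern[p] == ch:
                nxt.add(p + 1)        # literal or '?' consumes ch, advances
        states = closure(nxt)
        if not states:
            return False
    return len(pattern) in states
```

Each term costs one linear scan regardless of how many wildcards the pattern contains, which is the performance win over naive expansion.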
Quick overview of other Lucene features in development, such as:
* Flexible Indexing
* "More-Flexible" Scoring: challenges/supporting BM25, more vector-space models, field-specific scoring, etc.
* Improvements to analysis
Bonus:
* Lucene / Solr merger explanation and future plans
About the presenter:
Robert Muir is a super-active Lucene developer. He works as a software developer for Abraxas Corporation. Robert received his MS in Computer Science from Johns Hopkins and BS in CS from Radford University. For the last few years Robert has been working on foreign language NLP problems - "I really enjoy working with Lucene, as it's always receptive to better int'l/language support, even though everyone seems to be a performance freak... such a weird combination!"
Building, Debugging, and Tuning Spark Machine Learning Pipelines - Joseph Bradl... (Spark Summit)
This document discusses Spark ML pipelines for machine learning workflows. It begins with an introduction to Spark MLlib and the various algorithms it supports. It then discusses how ML workflows can be complex, involving multiple data sources, feature transformations, and models. Spark ML pipelines allow specifying the entire workflow as a single pipeline object. This simplifies debugging, re-running on new data, and parameter tuning. The document provides an example text classification pipeline and demonstrates how data is transformed through each step via DataFrames. It concludes by discussing upcoming improvements to Spark ML pipelines.
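The pipeline abstraction the document describes can be sketched in miniature (pure Python, not the Spark ML API): each stage exposes fit/transform, and the pipeline fits and chains them so the whole workflow is one object.

```python
# Minimal fit/transform pipeline: a tokenizer stage feeding a count
# vectorizer stage, chained so fit() trains each stage on the output of
# the previous one.
class Tokenizer:
    def fit(self, data):
        return self
    def transform(self, data):
        return [text.lower().split() for text in data]

class CountVectorizer:
    def fit(self, token_lists):
        self.vocab = sorted({t for tokens in token_lists for t in tokens})
        return self
    def transform(self, token_lists):
        return [[tokens.count(t) for t in self.vocab] for tokens in token_lists]

class Pipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit(self, data):
        for stage in self.stages:
            stage.fit(data)
            data = stage.transform(data)
        return self
    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

pipe = Pipeline([Tokenizer(), CountVectorizer()]).fit(["Spark ML", "ML pipelines"])
```

Re-running on new data is then a single `transform` call, and parameter tuning can swap stages or parameters without rewriting the workflow, which is exactly the convenience the talk highlights.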
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin... (DataStax Academy)
The audience will participate in a live, interactive demo that generates high-quality recommendations using the latest Spark-Cassandra integration for real time, approximate, and advanced analytics including machine learning, graph processing, and text processing.
This document summarizes Apache Flink community updates from June 2015. It discusses the 0.9.0 release of Apache Flink, an open source platform for scalable batch and stream data processing. Key points include the addition of two new committers, blog posts and workshops promoting Flink, and various conference and meetup talks about Flink occurring that month. It encourages registration for the Flink Forward conference in October 2015.
Exploring Direct Concept Search - Steve Rowe, Lucidworks
This document discusses direct concept search using word embeddings. It describes mapping query and index terms to vector representations in a conceptual space to improve recall by expanding queries with related concepts. Word2vec is used to generate 127-dimensional word embeddings from Wikipedia text. The embeddings are indexed in Lucene to enable nearest neighbor search. Queries are expanded by searching for terms nearest to query terms in the embedding space. While building high-dimensional point indexes is slow in Lucene, this approach demonstrates the potential of using word embeddings for query expansion in information retrieval.
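The nearest-neighbor step behind this query expansion can be shown in a toy version (pure Python with tiny made-up vectors rather than real word2vec output): rank candidate terms by cosine similarity to the query term's embedding.

```python
# Rank terms by cosine similarity to a query vector and keep the top k,
# the core of embedding-based query expansion.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_terms(query_vec, embeddings, k=2):
    """embeddings: dict of term -> vector. Returns the k most similar terms."""
    ranked = sorted(embeddings, key=lambda t: cosine(query_vec, embeddings[t]),
                    reverse=True)
    return ranked[:k]

embeddings = {
    "queen": [0.9, 0.8, 0.1],
    "king": [1.0, 0.7, 0.0],
    "banana": [0.0, 0.1, 1.0],
}
```

At index scale this brute-force scan is replaced by a nearest-neighbor structure, which is where the Lucene point-index approach described above comes in.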
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm... (Lucidworks)
Solr Compute Cloud (SC2) is an elastic Solr infrastructure that allows for dynamic provisioning of Solr clusters on demand. This allows each search pipeline or job to have its own isolated cluster, improving stability, throughput, and cost optimization. The key benefits of SC2 are pipeline isolation, dynamic scaling, production cluster safeguards, and built-in high availability and disaster recovery features through technologies like the Solr HAFT service.
Solr Compute Cloud - An Elastic SolrCloud Infrastructure - Nitin S
Scaling search platforms for serving hundreds of millions of documents with low latency and high throughput workloads at an optimized cost is an extremely hard problem. BloomReach has implemented Sc2, which is an elastic Solr infrastructure for Big Data applications, supporting heterogeneous workloads and hosted in the cloud. It dynamically grows/shrinks search servers to provide application and pipeline level isolation, NRT search and indexing, latency guarantees, and application-specific performance tuning. In addition, it provides various high availability features such as differential real-time streaming, disaster recovery, context aware replication, and automatic shard and replica rebalancing, all with a zero downtime guarantee for all consumers. This infrastructure currently serves hundreds of millions of documents in millisecond response times with a load ranging in the order of 200-300K QPS.
This presentation will describe an innovate implementation of scaling Solr in an elastic fashion. It will review the architecture and take a deep dive into how each of these components interact to make the infrastructure truly elastic, real time, and robust while serving latency needs.
Building a real time big data analytics platform with solrTrey Grainger
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot-faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You’ll also get a sneak peak at some new faceting capabilities just wrapping up development including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
This document discusses relevance in information retrieval systems. It begins with definitions of relevance and how relevance is measured. It then covers similarity functions like TF-IDF and BM25 that are used to calculate relevance scores. Configuration options for similarity in Solr are presented, including setting similarity globally or per field. The edismax query parser is described along with parameters that impact relevance. Methods for evaluating relevance through testing and analysis are provided. Finally, examples of applying relevance techniques to real systems are briefly outlined.
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined’s integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined’s deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisCitus Data
Postgres relies heavily on an extension ecosystem, but that is almost 100% dependent on C; which cuts out developers, libraries, and ideas from the world of Postgres. postgres-extension.rs changes that by supporting development of extensions in Rust. Rust is a memory-safe language that integrates nicely in any environment, has powerful libraries, a vibrant ecosystem, and a prolific developer community.
Rust is a unique language because it supports high-level features but all the magic happens at compile-time, and the resulting code is not dependent on an intrusive or bulky runtime. That makes it ideal for integrating with postgres, which has a lot of its own runtime, like memory contexts and signal handlers. postgres-extension.rs offers this integration, allowing the development of extensions in rust, even if deeply-integrated into the postgres internals, and helping handle tricky issues like error handling. This is done through a collection of Rust function declarations, macros, and utility functions that allow rust code to call into postgres, and safely handle resulting errors.
This document provides an excerpt from the book "Spark: The Definitive Guide" which introduces some of the core concepts of Apache Spark. It discusses Spark's basic architecture including the driver program, executors, and cluster managers. It also covers Spark applications, DataFrames, transformations and actions. Finally, it provides a sample end-to-end example reading CSV flight data to demonstrate these concepts.
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Citus Data
One of the strongest features of any database is its extensibility and PostgreSQL comes with a rich extension API. It allows you to define new functions, types, and operators. It even allows you to modify some of its core parts like planner, executor or storage engine. You read it right, you can even change the behavior of PostgreSQL planner. How cool is that?
Such freedom in extensibility created strong extension community around PostgreSQL and made way for a vast amount of extensions such as pg_stat_statements, citus, postgresql-hll and many more.
In this tutorial, we will look at how you can create your own PostgreSQL extension. We will start with more common stuff like defining new functions and types but gradually explore less known parts of the PostgreSQL's extension API like C level hooks which lets you change the behavior of planner, executor and other core parts of the PostgreSQL. We will see how to code, debug, compile and test our extension. After that, we will also look into how to package and distribute our extension for other people to use.
To get the most out of the tutorial, C and SQL knowledge would be beneficial. Some knowledge of PostgreSQL internals would also help, but we will cover the necessary details, so it is not required.
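For reference, the packaging step the tutorial mentions follows a fixed layout: a control file describing the extension plus a versioned SQL script, installed into Postgres's share/extension directory. A minimal illustrative pair (the extension name and function are hypothetical):

```
# my_ext.control -- metadata read by CREATE EXTENSION
comment = 'toy example extension'
default_version = '1.0'
relocatable = true
```

```sql
-- my_ext--1.0.sql: objects created when you run CREATE EXTENSION my_ext;
CREATE FUNCTION add_one(integer) RETURNS integer
    AS $$ SELECT $1 + 1 $$
    LANGUAGE SQL IMMUTABLE;
```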
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
This talk describes how you can practically apply some of Lucene 4's new features (such as flexible indexing, scoring improvements, column-stride fields) to improve your search application.
The talk will give a brief description of these new features and some example use cases you can try yourself in and around the features now available in Lucene 4. We'll cover how you can configure Solr to:
Set up the schema to use Pulsing or Memory codec for a primary key field
Skip a separate spellcheck index by controlling character-level swaps from the query processor
Sort with a different locale
Use per-field similarity configurations, such as a non-vector-space algorithm
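As a concrete illustration of two of those items, per-field postings formats and similarities are declared directly in schema.xml in Solr 4.x; the field type names below are made up:

```xml
<!-- schema.xml (Solr 4.x): per-field codec and similarity, illustrative names -->
<fieldType name="id_memory" class="solr.StrField"
           postingsFormat="Memory"/>   <!-- Memory postings format for a primary-key field -->

<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory"/>  <!-- non-vector-space scoring -->
</fieldType>
```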
Suneel Marthi - Deep Learning with Apache Flink and DL4JFlink Forward
http://flink-forward.org/kb_sessions/deep-learning-with-apache-flink-and-dl4j/
Deep Learning has become very popular over the last few years in areas such as image recognition, fraud detection, and machine translation, and it has proved to be very useful for handling unstructured data and extracting value from it. A big challenge in building deep learning models has been the high cost of training them. With the advent of distributed frameworks like Apache Flink and Apache Spark, it's faster to train deep learning models in parallel on modern platform architectures. In this talk, we'll show how to use Apache Flink Streaming with the open source deep learning framework DeepLearning4j to perform large-scale deep learning model training. We will show a demo of a recurrent neural net that is trained for language modeling and have it generate text.
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...Databricks
The landscape of security threats an enterprise faces is vast. It is imperative for an organization to know when one of the machines within the network has been compromised. One layer of detection can take advantage of the DNS requests made by machines within the network. A request to a Command & Control (CNC) domain can act as an indication of compromise. It is thus advisable to find these domains before they come into play. The team at Akamai aims to do just that.
In this session, Aminov will share Akamai's experience in porting their PoC detection algorithms, written in Python, to a reliable production-level implementation using Scala and Apache Spark. He will specifically cover their experience with an algorithm they developed to detect botnet domains based on passive DNS data. The session will also include some useful insights Akamai has learned while handing off solutions from research to development, including the transition from small-scale to large-scale data consumption, model export/import using PMML, and sampling techniques. This information is valuable for researchers and developers alike.
Feature Extraction for Large-Scale Text CollectionsSease
Feature engineering is a fundamental but poorly documented component in LTR search applications.
As a result, there are still few open access software packages that allow researchers and practitioners to easily simulate a feature extraction pipeline and conduct experiments in a lab setting.
This talk introduces Fxt, an open-source framework to perform efficient and scalable feature extraction. Fxt may be integrated into complex, high-performance software applications to help solve a wide variety of text-based machine learning problems.
The talk details how we built and documented a reproducible feature extraction pipeline with LTR experiments using the ClueWeb09B collection.
This LTR dataset is publicly available.
We’ll also discuss some of the benefits (feature extraction efficiency, model interpretation) of having open access tooling in this area for researchers and practitioners alike.
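To make "feature extraction" concrete, here is a toy pure-Python sketch of the kind of query-document features an LTR pipeline computes; Fxt itself does this efficiently at collection scale, and the feature names below are illustrative:

```python
import math

# Toy LTR feature extraction: for each (query, document) pair, emit a
# small feature vector of the kind an LTR pipeline computes at scale.
def extract_features(query, doc):
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    tf = {t: d_terms.count(t) for t in q_terms}
    return {
        "sum_tf": sum(tf.values()),                  # raw term-frequency mass
        "doc_len": len(d_terms),                     # input to length normalization
        "coverage": sum(1 for t in q_terms if tf[t] > 0) / len(q_terms),
        "sum_log_tf": sum(math.log(1 + c) for c in tf.values()),
    }

feats = extract_features("apache lucene", "lucene is a search library from apache apache")
print(feats["coverage"])  # both query terms appear in the document → 1.0
```

A real pipeline would add corpus-level statistics (IDF, BM25 scores, field-specific variants) computed from the inverted index.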
Spark r under the hood with Hossein FalakiDatabricks
SparkR is a new and evolving interface to Apache Spark. It offers a wide range of APIs and capabilities to data scientists and statisticians. Because Spark is a distributed system with a JVM core, some R users find SparkR errors unfamiliar. In this talk we will show what goes on under the hood when you interact with SparkR. We will look at SparkR architecture, performance bottlenecks, and API semantics. Equipped with those, we will show how some common errors can be eliminated. I will use debugging examples based on our experience with real SparkR use cases.
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index, instead of having to generate and supply external training sets. Building on the introduction, the focus will be on the extensions of the Lucene Classification module that arrive in Lucene 6.0 and the module's incorporation into Solr 6.1. These extensions allow you to classify at the document level with individual field weighting, numeric field support, lat/lon fields, etc. We will explore the Solr ClassificationUpdateProcessor: how it works and how to use it, including basic and advanced features like multi-class support and classification context filtering. The presentation will include practical examples and real-world use cases.
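As a rough sketch of the Solr side, the processor is wired into an update chain in solrconfig.xml; the parameter names below follow the Solr reference guide for the ClassificationUpdateProcessorFactory, while the field names are illustrative:

```xml
<!-- solrconfig.xml: classify each incoming document before it is indexed -->
<updateRequestProcessorChain name="classification">
  <processor class="solr.ClassificationUpdateProcessorFactory">
    <str name="inputFields">title,content</str>  <!-- fields used as classification input -->
    <str name="classField">category</str>        <!-- field receiving the assigned class -->
    <str name="algorithm">knn</str>              <!-- knn or bayes -->
    <str name="knn.k">20</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```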
Grant Ingersoll presented on using Apache Solr and Apache Spark for data engineering. He discussed how Solr can be used for indexing and searching large amounts of data, while Spark enables large-scale processing on the indexed data. Lucidworks' Fusion product combines Solr and Spark capabilities to allow search-driven applications and machine learning on indexed content.
It is one thing to write an Apache Spark application that gets you to an answer. It's another thing to know you used all the tricks in the book to make it run as fast as possible. This session will focus on those tricks.
Discover patterns and approaches that may not be apparent at first glance, but that can be game-changing when applied to your use cases. You'll learn about nested types, multithreading, skew, reducing, cartesian joins, and fun stuff like that.
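One of the classic skew tricks is key salting: a hot key would otherwise send all of its records to one partition, so you append a random salt to spread it over several buckets. A pure-Python illustration of the idea (not Spark code; in a real join the small side is replicated once per salt value):

```python
import random
from collections import Counter

# Key salting: spread a hot key ("us") over SALT sub-keys so no single
# partition receives all of its records.
random.seed(0)  # deterministic for the illustration
SALT = 4
records = [("us", i) for i in range(1000)] + [("nz", i) for i in range(10)]

def salted_key(key):
    return f"{key}#{random.randrange(SALT)}"

partitions = Counter(salted_key(k) for k, _ in records)
hot_buckets = [p for p in partitions if p.startswith("us#")]
print(len(hot_buckets))  # the hot "us" key now spans all 4 salt buckets
```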
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Databricks
This document provides an overview and summary of BigDL, an open source distributed deep learning library for Apache Spark. It describes how BigDL allows users to run deep learning on Spark by supporting common deep learning frameworks and algorithms. Specific capabilities and examples discussed include using BigDL to run Deep Speech 2 for speech recognition on LibriSpeech data and using BigDL to run Faster R-CNN and SSD for object detection on PASCAL VOC data. Performance comparisons show BigDL achieving comparable or better results than other frameworks.
Finite-State Queries in Lucene:
* Background, improvement/evolution of MultiTermQuery API in 2.9 and Flex
* Implementing existing Lucene queries with NFA/DFA for better performance: Wildcard, Regex, Fuzzy
* How you can use this Query programmatically to improve relevance (I'll use an English test collection/English examples)
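The automaton idea behind those query implementations can be sketched in a few lines: compile a wildcard pattern into an NFA over characters and run terms through it. Lucene 4 does this far more efficiently, intersecting the automaton with the term dictionary, but the core mechanics look like this toy sketch:

```python
# Toy NFA simulation for wildcard patterns ('?' = any one char, '*' = any run).
# An NFA state is an index into the pattern; we track the set of live states.
def closure(states, pattern):
    # epsilon-closure: a '*' may match the empty string, so also skip past it
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        if s < len(pattern) and pattern[s] == "*" and s + 1 not in seen:
            seen.add(s + 1)
            stack.append(s + 1)
    return seen

def wildcard_match(pattern, term):
    states = closure({0}, pattern)
    for ch in term:
        nxt = set()
        for s in states:
            if s >= len(pattern):
                continue
            p = pattern[s]
            if p == "*":
                nxt.add(s)            # '*' consumes ch and stays on the star
            elif p == "?" or p == ch:
                nxt.add(s + 1)        # literal or single-char wildcard advances
        states = closure(nxt, pattern)
    return len(pattern) in states     # accept if the whole pattern is consumed

print(wildcard_match("te?t*", "testing"))  # → True
print(wildcard_match("te?t*", "team"))     # → False
```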
Quick overview of other Lucene features in development, such as:
* Flexible Indexing
* "More-Flexible" Scoring: challenges/supporting BM25, more vector-space models, field-specific scoring, etc.
* Improvements to analysis
Bonus:
* Lucene / Solr merger explanation and future plans
About the presenter:
Robert Muir is a super-active Lucene developer. He works as a software developer for Abraxas Corporation. Robert received his MS in Computer Science from Johns Hopkins and BS in CS from Radford University. For the last few years Robert has been working on foreign language NLP problems - "I really enjoy working with Lucene, as it's always receptive to better int'l/language support, even though everyone seems to be a performance freak... such a weird combination!"
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...Spark Summit
This document discusses Spark ML pipelines for machine learning workflows. It begins with an introduction to Spark MLlib and the various algorithms it supports. It then discusses how ML workflows can be complex, involving multiple data sources, feature transformations, and models. Spark ML pipelines allow specifying the entire workflow as a single pipeline object. This simplifies debugging, re-running on new data, and parameter tuning. The document provides an example text classification pipeline and demonstrates how data is transformed through each step via DataFrames. It concludes by discussing upcoming improvements to Spark ML pipelines.
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...DataStax Academy
The audience will participate in a live, interactive demo that generates high-quality recommendations using the latest Spark-Cassandra integration for real time, approximate, and advanced analytics including machine learning, graph processing, and text processing.
This document summarizes Apache Flink community updates from June 2015. It discusses the 0.9.0 release of Apache Flink, an open source platform for scalable batch and stream data processing. Key points include the addition of two new committers, blog posts and workshops promoting Flink, and various conference and meetup talks about Flink occurring that month. It encourages registration for the Flink Forward conference in October 2015.
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
This document discusses direct concept search using word embeddings. It describes mapping query and index terms to vector representations in a conceptual space to improve recall by expanding queries with related concepts. Word2vec is used to generate 127-dimensional word embeddings from Wikipedia text. The embeddings are indexed in Lucene to enable nearest neighbor search. Queries are expanded by searching for terms nearest to query terms in the embedding space. While building high-dimensional point indexes is slow in Lucene, this approach demonstrates the potential of using word embeddings for query expansion in information retrieval.
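The nearest-neighbor step can be sketched directly: given word vectors, expand a query term with its closest neighbors by cosine similarity. A toy pure-Python version with hand-made 3-d vectors standing in for learned embeddings (the talk's were 127-dimensional, trained on Wikipedia):

```python
import math

# Toy query expansion via nearest neighbors in a word-embedding space.
vectors = {
    "dog":   (0.90, 0.10, 0.00),
    "puppy": (0.85, 0.15, 0.05),
    "cat":   (0.70, 0.30, 0.00),
    "car":   (0.00, 0.10, 0.95),
}

def cosine(a, b):
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def expand(term, k=1):
    # Rank every other vocabulary term by similarity to `term`.
    v = vectors[term]
    ranked = sorted((w for w in vectors if w != term),
                    key=lambda w: cosine(v, vectors[w]), reverse=True)
    return ranked[:k]

print(expand("dog"))  # → ['puppy']
```

The indexing approach in the talk replaces this linear scan with Lucene's high-dimensional point structures so the neighbor search scales to a real vocabulary.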
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
Solr Compute Cloud (SC2) is an elastic Solr infrastructure that allows for dynamic provisioning of Solr clusters on demand. This allows each search pipeline or job to have its own isolated cluster, improving stability, throughput, and cost optimization. The key benefits of SC2 are pipeline isolation, dynamic scaling, production cluster safeguards, and built-in high availability and disaster recovery features through technologies like the Solr HAFT service.
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
Scaling search platforms for serving hundreds of millions of documents with low latency and high throughput workloads at an optimized cost is an extremely hard problem. BloomReach has implemented SC2, an elastic Solr infrastructure for Big Data applications, supporting heterogeneous workloads and hosted in the cloud. It dynamically grows/shrinks search servers to provide application and pipeline level isolation, NRT search and indexing, latency guarantees, and application-specific performance tuning. In addition, it provides various high availability features such as differential real-time streaming, disaster recovery, context aware replication, and automatic shard and replica rebalancing, all with a zero downtime guarantee for all consumers. This infrastructure currently serves hundreds of millions of documents in millisecond response times with loads on the order of 200-300K QPS.
This presentation will describe an innovate implementation of scaling Solr in an elastic fashion. It will review the architecture and take a deep dive into how each of these components interact to make the infrastructure truly elastic, real time, and robust while serving latency needs.
The document discusses Solr Compute Cloud (SC2), an elastic Solr infrastructure developed by BloomReach to address challenges of scaling search platforms for big data applications. SC2 dynamically provisions Solr clusters in the cloud for pipelines and indexing jobs, providing isolation. It ensures latency guarantees, dynamic scaling, high availability and disaster recovery. SC2 addresses issues BloomReach faced with a shared cluster approach like throughput limitations, stability problems and indexing challenges.
This document discusses how to download and play the mobile game Subway Surfers on a personal computer. It describes using BlueStacks, an Android emulator, to install and run the game normally played on phones and tablets. BlueStacks allows users to access Google Play to download Subway Surfers and other Android apps. Once installed through BlueStacks, the game can be played offline on a PC like a mobile game, allowing users to enjoy Subway Surfers on a larger screen without being limited to a phone.
EclipseCon 2016 - OCCIware : one Cloud API to rule them allMarc Dutoo
This document provides an overview of OCCIware, a project that aims to create a cloud consumer platform using the Open Cloud Computing Interface (OCCI) standard. It discusses the need for such a platform given the fragmented state of existing cloud solutions. OCCIware takes a model-driven engineering approach, using Eclipse modeling tools to generate an OCCI extension, designer, and runtime configuration from a domain model. The document demonstrates using these tools to model a Linked Data application and deploy its configuration to Docker. Upcoming work on OCCIware includes improving existing generators, integrating additional capabilities like simulation, and contributing back to the OCCI standard.
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open WideOCCIware
Hear hear, dev & ops alike - ever got bitten by the fragmentation of the Cloud space at deployment time, by AWS vs Azure, OpenShift vs Heroku? In a word, ever dreamt of configuring your Cloud application at once, along with both its VMs and its database? Well, the extensible Open Cloud Computing Interface (OCCI) REST API (see http://occi-wg.org/) allows just that, by addressing the whole XaaS spectrum.
And now, OCCI is getting power-boosted by Eclipse Modeling and formal foundations. Enter Cloud Designer and the other outputs of the OCCIware project (see http://www.occiware.org): multiple visual representations, one per Cloud layer and technology; XaaS Cloud extension model validation, documentation, and ops scripting generation; simulation and decision-making comparison; connectors that bring those models to life by getting their status from common Cloud services; runtime middleware, deployed, monitored, administered. And tackling the very interesting challenge of modeling a meta-API in EMF's metamodel, while staying true to EMF, Eclipse tools, and the OCCI standard.
Featuring Eclipse Sirius, Acceleo generators, and EMF at runtime. Coming soon to a new Eclipse Foundation project near you, if you'd like.
This talk includes a demonstration of the Docker connector and of how to use Cloud Designer to configure a simple Cloud application's deployment on the Roboconf PaaS system and OpenStack infrastructure.
"Introducing Distributed Tracing in a Large Software System", Kostiantyn Sha...Fwdays
Software systems are growing in size and complexity when the business is growing, and sometimes it is hard to figure out what is going on. Various teams make different changes for different business capabilities. Distributed Tracing is a useful way to look under the hood and see for yourself what operations are being performed, what services are used in a certain use case, and how performant are they. In this talk, I will present what Distributed Tracing is and how we introduced it into our software system with some tips and tricks on what you should focus on if you want to do the same.
What's Next in OpenStack? A Glimpse At The RoadmapShamailXD
YouTube Recording: https://www.youtube.com/watch?v=cCdqOxD5G0M
Whether you are a newbie to OpenStack looking at building your first cloud or an experienced operator with years of OpenStack success behind you, you've probably spent some time wondering what to expect from the OpenStack project over the next several releases. Will it finally support that new capability you've been waiting for? Should you plan for an upgrade in the next 6 months? While the development community is always working on and planning new features, it takes a lot of time on IRC to get a complete view across the different projects. The OpenStack Product WG spent time this cycle working with the project teams and PTLs to understand their priorities for the next several OpenStack releases. Where we have always had an understanding of what's to come in the next release, we're hoping to present a long-term view of the future landscape of OpenStack. In this session, we'll present our findings across the different projects in an effort to give users a glimpse into the OpenStack roadmap.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix use the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
The document provides an overview of distributed systems patterns and practices. It discusses why distributed systems are used to solve problems like single points of failure and elastic demand. Common distributed system patterns are explained, including leader-follower models, data replication across nodes, and handling failures. Specific distributed systems like Zookeeper, HDFS and Cassandra are described as examples of implementing patterns like quorum management and consistent hashing for replicated data.
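Consistent hashing, mentioned above, can be sketched minimally: place nodes on a hash ring and route each key to the first node clockwise, so adding or removing a node only remaps keys in one arc. A toy sketch (real systems such as Cassandra also use virtual nodes for better balance):

```python
import bisect
import hashlib

# Minimal consistent-hash ring: nodes are placed on a ring by hash, and a
# key is routed to the first node clockwise from its own hash position.
def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)
        self.keys = [hv for hv, _ in self.ring]

    def node_for(self, key):
        # first node whose hash is >= the key's hash, wrapping around the ring
        i = bisect.bisect(self.keys, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
print(owner in {"node-a", "node-b", "node-c"})  # → True, and stable across calls
```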
Operational Visibiliy and Analytics - BU SeminarCanturk Isci
The document discusses building operational visibility and analytics directly into cloud platforms. It describes an agentless system crawler that can provide deep visibility into cloud instances without requiring any action from end users. The crawler collects various system data which is then analyzed to provide operational insights and solve real-world problems. Specific applications discussed include vulnerability advising, configuration analysis, and license discovery. The goal is to design monitoring and analytics that are seamlessly integrated and optimized for cloud environments.
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Dan Cundiff
A presentation titled "Splunk All the Things: Our First 3 Months Monitoring Web Service APIs" that Dan Cundiff and Eric Helgeson from Target Corporation gave at Splunk .conf2012.
BloomReach developed an elastic Solr infrastructure called Solr Compute Cloud (SC2) to address the challenges of scaling their search platform. SC2 allows search pipelines and indexing jobs to dynamically provision isolated Solr clusters from an API to run in, improving throughput, stability and availability. It utilizes a Solr HAFT service to replicate data between clusters and provide disaster recovery by cloning clusters. This elastic approach isolates workloads, allows individual scaling and prevents performance issues caused by shared clusters.
The document describes an Android application developed for the Remote Triggered Laboratory project. It provides a brief overview of the app's objectives, development tools used, and design structure. The app was created to act as an interface between users and experiments on mobile devices. It communicates with a LabVIEW server application through the SCCT library. The app's code is organized modularly into packages for each experiment. Future improvements could include adding a login system and developing a hybrid version using PhoneGap and HTML5 for increased flexibility.
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
Invited lecture at the University of Trieste.
The lecture covers (briefly) the data streaming processing paradigm, research challenges related to distributed, parallel, and deterministic streaming analysis, and the research of the DCS (Distributed Computing and Systems) group at Chalmers University of Technology.
Production Readiness Strategies in an Automated WorldSean Chittenden
This document discusses strategies for making a software service production ready. It begins by outlining the typical software life cycle from idea to production. It then discusses some of the organizational prerequisites needed for a production service, including standardized terminology, naming conventions, and rules for incident response. The document also provides examples of what to include in a production readiness checklist, such as an overview of the service, its consumers, release process, health metrics, and quality metrics.
Druid is a high-performance, column-oriented distributed data store that is widely used at Oath for big data analysis. Druid has a JSON schema as its query language, making it difficult for new users unfamiliar with the schema to start querying Druid quickly. The JSON schema is designed to work with the data ingestion methods of Druid, so it can provide high-performance features such as data aggregations in JSON, but many are unable to utilize such features because they are not familiar with the specifics of how to optimize Druid queries. However, most new Druid users at Yahoo are already very familiar with SQL, and the queries they want to write for Druid can be converted to concise SQL.
We found that our data analysts wanted an easy way to issue ad-hoc Druid queries and view the results in a BI tool in a way that's presentable to nontechnical stakeholders. In order to achieve this, we had to bridge the gap between Druid, SQL, and our BI tools such as Apache Superset. In this talk, we will explore different ways to query a Druid datasource in SQL and discuss which methods were most appropriate for our use cases. We will also discuss our open source contributions so others can utilize our work. GURUGANESH KOTTA, Software Dev Eng, Oath and JUNXIAN WU, Software Engineer, Oath Inc.
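The gap the talk describes is easy to see side by side: a native Druid timeseries query next to the SQL most analysts would rather write (datasource and column names are made up):

```json
{
  "queryType": "timeseries",
  "dataSource": "pageviews",
  "granularity": "day",
  "aggregations": [{"type": "longSum", "name": "views", "fieldName": "views"}],
  "intervals": ["2018-01-01/2018-02-01"]
}
```

```sql
SELECT FLOOR(__time TO DAY) AS "day", SUM(views) AS views
FROM pageviews
WHERE __time >= TIMESTAMP '2018-01-01' AND __time < TIMESTAMP '2018-02-01'
GROUP BY 1
```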
Similar to Synchronizing Clusters in Fusion: CDCR and Streaming Expressions (20)
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they’re looking for quickly and easily. If the sought-after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized ExperiencesLucidworks
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods. More data driven, more focused on the intricacies of communities they serve and more open and collaborative to make informed recommendations a reality. Whether it's social populations, NIBRS or organization improvement that's the driver, the IT requirement is largely the same. Provide 360 access to large volumes of siloed data to gain a full 360 understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
Policing in the next decade is anticipated to be very different from historical methods. More data driven, more focused on the intricacies of communities they serve and more open and collaborative to make informed recommendations a reality. Whether it's social populations, NIBRS or organization improvement that's the driver, the IT requirement is largely the same. Provide 360 access to large volumes of siloed data to gain a full 360 understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
The technology needs of an intelligent police force.
How a Global Search improves an officer's interaction with existing data.
Featuring
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
This document provides a framework for prioritizing onsite search problems and key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends prioritizing fixing searches that yield no results, improving relevance of results, and reducing false positives. The most essential KPIs to measure include query latency, throughput, result relevance through click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
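Among the KPIs listed, NDCG is the one that usually needs unpacking; it can be computed in a few lines. A minimal sketch (the relevance grades are invented, and the ideal ranking here is taken over the same truncated list):

```python
import math

# NDCG@k: discounted cumulative gain of the ranked results, normalized by
# the DCG of the ideal (relevance-sorted) ordering of the same results.
def dcg(rels):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(rels, k=None):
    rels = rels[:k] if k else rels
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# Relevance grades (0-3) of the top results as ranked by the engine:
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # → 0.972
```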
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
Before COVID-19, almost 80% of the US workforce worked service in jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-sara bradley, chef and proprietor, freight house
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover off:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
The top challenges and aspirations European business and technology leaders are solving using AI and search technology
Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
What technology buyers should look for when evaluating AI and search solutions
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
This document introduces Fusion 5.1 and its new capabilities for integrating with data science tools like Tensorflow, Scikit-Learn, and Spacy.
It provides an overview of Fusion's capabilities for understanding content, users, and delivering insights at scale. The document then demonstrates Fusion's Jupyter Notebook integration for reading and writing data and running SQL queries.
Finally, it shows how Fusion integrates with Seldon Core to easily deploy machine learning models with tools like Tensorflow and Scikit-Learn. A live demo is provided of deploying a custom model and using it in Fusion's query and indexing pipelines.
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Webinar: Building a Business Case for Enterprise SearchLucidworks
The document discusses building a business case for enterprise search. It notes that 85% of information is unstructured data locked in various locations and applications. Many knowledge workers spend a significant portion of their day searching across multiple systems for information. The rise of unstructured data and AI capabilities can help organizations unlock value from their information assets. Effective enterprise search powered by AI can provide real-time intelligence, personalized information, and more efficient research to help knowledge workers.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Synchronizing Clusters in Fusion: CDCR and Streaming Expressions
1.
2. STAY CONNECTED
Twitter @activate_conf
Facebook @activateconf
#Activate19
Log in to wifi, follow Activate on social media,
and download the event app where you can
submit an evaluation after the session
WIFI NETWORK: Activate2019
PASSWORD: Lucidworks
DOWNLOAD THE ACTIVATE 2019 MOBILE APP
Search Activate2019 in the App/Play store
Or visit: http://crowd.cc/activate19
3. Today’s speaker…
Who is the hippie mad scientist giving this talk?
PAUL ANDERSON
Information Architect
Dynatrace
Synchronizing Clusters in Fusion: CDCR and Streaming
Expressions
Synchronizing a search application across multiple clusters is a complex
challenge and the solution evolves with our tools (Solr and Fusion
AppStudio). Paul Anderson discusses how Dynatrace's cluster
synchronization strategy changed over the last two years to ensure that
customers worldwide have a consistent search experience. The talk
focuses on two Solr features, CDCR, and Streaming Expressions,
explaining what they do well, where they fall down, and where they need
to improve. Paul also covers how to modify your index pipelines and
signal aggregations to support cluster synchronization.
4. Dynatrace is software intelligence built
for the enterprise cloud
Go beyond APM with the Dynatrace all-in-one platform
Software Intelligence Platform:
• Application performance monitoring
• Cloud infrastructure monitoring
• AIOps
• Digital experience management
5. Dynatrace is the clear leader
• #1 in Gartner APM: highest ability to execute and furthest completeness of vision
• #1 Ecosystem
• 25 major releases per year
• 2,000+ employees
6. Agenda
• Why multiple clusters?
• What needs to be synced?
• How to sync?
• How to monitor the sync?
• Q & A
8. Multiple clusters enhance performance
Search Apdex with one cluster in US
Apdex is an open standard for measuring performance of software applications in computing (https://en.wikipedia.org/wiki/Apdex).
Search Apdex with two clusters, US and EU
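Apdex is simple enough to compute by hand; a minimal sketch of the standard formula (satisfied requests count fully, tolerating requests count half, frustrated requests count zero):

```python
def apdex(satisfied, tolerating, total):
    """Apdex = (satisfied + tolerating/2) / total, per the open Apdex standard."""
    return (satisfied + tolerating / 2) / total

# 700 satisfied, 200 tolerating, 100 frustrated out of 1000 samples
score = apdex(satisfied=700, tolerating=200, total=1000)
print(score)  # 0.8
```

Comparing per-cluster scores like this is what the one-cluster versus two-cluster comparison on this slide summarizes.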
9. Multiple clusters support failover
[Diagram: incoming search traffic is routed between the us-east-1 and eu-west-1 region datacenters, so either one can take over if the other fails.]
11. First, the obvious stuff…
• Infrastructure (same for on-premises or in the cloud)
• Installed applications
– Java
– Fusion
• Your Fusion application
– Some cluster-specific differences in fusion.properties, solrconfig, etc.
• With the obvious out of the way… what about collection data?
12. Your search data…
• Search index
– Can’t you just index independently in each cluster?
– Sure, but indexing is expensive
– A recent test showed that the only slow search requests occurred during a crawl.
– Syncing the index is preferable
• Signal data
– Really? Why?
13. Why sync signals?
Signals power several aspects of relevance boosting
• They provide click counts for determining popularity
– Perhaps a slight impact, based on locale-based differences in user preference
• They can power re-ranking algorithms
• They can serve as the ground truth for our learning-to-rank models
• If you want consistent results across clusters, signals should be synced
14. Why sync signals?
Personal boosting example
[Diagram: the personal boosting example, with incoming search traffic split between the us-east-1 and eu-west-1 region datacenters.]
15. Search sync to-do list
• Main document collection (search index)
• Signals
– In both directions (Yikes!)
• Anything else?
– User permission data? Transient, short lived cache, no big advantage.
– System logs? Heck no, that needs to remain unique.
– Aggregated signal data? Easy to regenerate, no big advantage.
– Bueller?
17. How to sync the search index?
Three options: none of them perfect…
• Use Solr’s Cross Data Center Replication service (aka. CDCR)
– Configure one cluster as the source, the other as the target.
– Crawl in the source cluster, all changes (adds, updates, deletes) replicated to target.
• Set up separate crawl schedules in each cluster
– Doing each crawl twice; not ideal.
– Shouldn’t crawl in both clusters at the same time (leave one fully available for queries).
– More chance for minor differences in search index.
• What about streaming expressions?
– Negative; can’t handle deletions.
• Which one to pick…?
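For reference, unidirectional CDCR is configured in solrconfig.xml. A minimal sketch of the source-cluster side follows; the ZooKeeper addresses and collection name are placeholders, and the target cluster needs its own /cdcr handler plus a cdcr-processor-chain (see the Solr Reference Guide for your version):

```xml
<!-- Source cluster solrconfig.xml: replicate my_collection to the target cluster -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">eu-zk1:2181,eu-zk2:2181/solr</str>
    <str name="source">my_collection</str>
    <str name="target">my_collection</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">60000</str>
  </lst>
</requestHandler>
```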
18. Isn’t CDCR the better option?
It can be, but…
• We used unidirectional CDCR with great success for over a year
• We introduced new datasources with different update logic with no crawl DB
– Delete existing docs first…
– Then recrawl the same docs (single fast REST call)
• CDCR stopped working... and never worked correctly again
• Suggestion:
– Try CDCR in a test environment (two test clusters)
– If it works, try it in production…
– …but have a crawl schedule for each cluster ready to go.
19. Search index recommendation
• Make crawl schedules for your clusters, even if you plan to try CDCR first
– Avoid crawl schedule overlap between clusters for maximum performance
– A traffic policy based on latency and cluster health is a really good idea
• If CDCR fails, enable the crawl jobs in the target cluster
• More about the future of CDCR later
20. How to sync signals?
• That depends on your Fusion version…
21. Signal sync in Fusion 3.x
Nothing but click signals to worry about…
[Diagram: EU and US Twigkit/Appkit front ends and other web properties (API) send click signals through the US _signals_ingest index pipeline into the US signals collection (Solr). Unidirectional CDCR replicates signal data to the EU signals collection (Solr), and each cluster runs its own signals aggregation job into its own aggregated-signals collection. Annotation: "Tried this instead."]
22. But I thought CDCR was bad?
• Signals collections are all about adding
– Never an update
– Rarely a delete (periodic history cleanup for GDPR)
• In this scenario, we found that CDCR can be relatively stable
23. Signal sync in Fusion 4.x
Click signals and response signals and session signals, oh my…
[Diagram: in Fusion 4.x, EU and US Appkit/Appstudio front ends and other web properties (API) feed each cluster's own query pipeline and _signals_ingest index pipeline, writing into that cluster's signals collection (Solr). Each cluster also runs its own session rollup job and signals aggregation job into its own aggregated-signals collection. The two signals collections would be linked by bidirectional CDCR (?), flagged as the challenging part.]
24. Perfect job for bidirectional CDCR?
Sadly, no.
• Bidirectional CDCR was designed for:
– Easy failover,…
– …without having to edit your Solrconfig.
– Source and target can swap their behavior automatically
• Activate 2018 discussions…
– Since signals are always additive…
– …and neither cluster will create the same signal ID…
– …it… should… work!
• But it doesn’t
25. Bidirectional CDCR’s fatal flaw
• The source/target swap logic is not very fault tolerant.
• In a test environment, it works quite nicely.
• In a production environment, under load, it quietly stops working,…
– Replicating in one direction only,…
– And starts accumulating tlog files like a banshee.
26. The verdict on CDCR
• Simple unidirectional implementations can work, but…
• It’s too fragile
• It fails for unexplained reasons
• It doesn’t support the Solr authentication or authorization plugins
• Any bidirectional implementation is bound to fail
• Solr committers admit that it has serious design flaws
• I can’t recommend it right now.
• Just say no… for now
27. Hope for CDCR
Dying, but due for resuscitation
• A band of Lucidworks developers want to champion fixes for CDCR
• They need your help to gather buy-in from the rest of the Solr project
• If you need/want CDCR to be righteous, let the Solr project know
29. What are streaming expressions?
• The gospel according to the Solr doc:
Streaming expressions are a suite of functions that can be combined to
perform many different parallel computing tasks.
• How many of you are already using streaming expressions?
• What do we need for our signal sync scenario?
– We need one streaming expression to push new signals from the US cluster to the EU cluster.
– We need one streaming expression to push new signals from the EU cluster to the US cluster.
• We’re going to leverage:
– A Topic stream source, nested inside…
– An Update stream decorator, nested inside…
– A Daemon stream decorator.
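Assembled, the nesting above looks roughly like the following sketch, run on the US cluster to push its signals to the EU cluster. This is illustrative only: the daemon id and runInterval are arbitrary, and the zkHost parameter on update() pointing at the remote cluster is an assumption to verify against the Streaming Expressions documentation for your Solr version.

```
daemon(id="signals_to_eu",
  runInterval="60000",
  update(my_signals,
    batchSize=250,
    zkHost="eu-zk1:2181,eu-zk2:2181/solr",
    topic(stream_checkpoints,
      my_signals,
      q="cluster:useast1",
      fl="*",
      id="signals_topic")))
```

Once started via the /stream request handler, the daemon re-runs the inner expression on each interval, and the topic's stored checkpoints ensure each signal is pushed only once.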
30. Configuring the Topic
First, a supporting requirement
• The topic is a stream source, in this case, a query into the signals collection
• We only want to query signals that originated in the US
– Remember, we’re going to create another streaming expression to send EU signals to the US
– We don’t want the new signals being passed back and forth forever
• In Fusion, create a field in the _signals_ingest index pipeline that records the originating cluster.
– useast1
– euwest1
– From a source control standpoint, this is a difference between the clusters.
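Logic-wise, the pipeline change only has to stamp one field. A generic Python sketch of the idea (not actual Fusion pipeline code; the field name and values are from the slide):

```python
CLUSTER_NAME = "useast1"  # set to "euwest1" in the EU cluster's pipeline config

def stamp_cluster(signal: dict) -> dict:
    """Tag an incoming signal with the cluster it originated in, so the
    topic query (q="cluster:useast1") only picks up locally created signals."""
    signal.setdefault("cluster", CLUSTER_NAME)
    return signal

doc = stamp_cluster({"type": "click", "query": "dashboards"})
print(doc["cluster"])  # useast1
```

Using a set-if-absent write matters: a signal replicated in from the other cluster keeps its original tag, which is exactly what stops signals from ping-ponging between clusters forever.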
31. Configuring the Topic
Second, another supporting requirement
• Session signals (rollup aggregation) are written back into the signals collection
• If you want them synced, they need to have a cluster field value
– From a source control standpoint, this is a global change common to all clusters.
WITH session_agg AS (
  SELECT COUNT(1) AS activity_count,
         MIN(timestamp_tdt) AS start,
         MAX(timestamp_tdt) AS end,
         timediff(MAX(timestamp_tdt), MIN(timestamp_tdt), "MINUTES") AS duration,
         'session' AS type,
         first(user_id) AS user,
         first(cluster) AS cluster,
         session_keywords(query) AS keywords,
         session
  FROM ${inputCollection}
  WHERE timestamp_tdt IS NOT NULL
    AND type != 'session'
    AND session IS NOT NULL
    AND session NOT IN (SELECT session FROM ${inputCollection} WHERE type = 'session' AND session IS NOT NULL)
  GROUP BY session
  HAVING timediff(current_timestamp(), MAX(timestamp_tdt), "SECONDS") >= ${elapsedSecsSinceLastActivity}
      OR timediff(current_timestamp(), MIN(timestamp_tdt), "SECONDS") >= ${elapsedSecsSinceSessionStart})
SELECT activity_count, start, end, duration, type, user, cluster, keywords, session FROM session_agg
32. Configuring the Topic
• A collection to store the stream progress
(checkpoints)
• The collection to query
• The query to execute (cluster specific)
• The fields to return (all)
• An ID to associate with the stored
checkpoints
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
The expression itself, in order of parameters
33. Configuring the Topic
• A collection to store the checkpoints for the
stream
• You'll need to create this collection before
you start the streaming expression
daemon.
• I used a two-shard, single-replica
collection for this purpose, but it could
just as easily have been single-shard with
two replicas.
• In this example, the collection is
stream_checkpoints. If you have multiple
streams configured, you'll want a more
descriptive name.
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
Checkpoints collection, part 1
34. Configuring the Topic
• The checkpoint “document” in the
stream_checkpoints collection is shown
to the right.
• The id for the topic (signals_topic)
identifies the checkpoints, so you can use
a single checkpoints collection to store all
the checkpoints for the topics in your
cluster.
• One checkpoint is stored per shard and it
is the version number of the last
processed document; version numbers
always go up.
{
"id":"signals_topic",
"checkpoint_ss":[
"shard2~1643423444105166848",
"shard1~1643424248539119616"
],
"_version_":1643424257580990464
}
Checkpoints collection, part 2
35. Configuring the Topic
• The target collection for the query:
my_signals
• No zkHost string is necessary since the
target collection is local.
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
Target collection
36. Configuring the Topic
• The query to run against the target
collection
• Query should leverage the new cluster
field we added to:
– The _signals_ingest index pipeline
– The session rollup aggregation job
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
Query to execute
37. Configuring the Topic
• The fields to return in the results.
– For signals, return them all: "*".
• Note that we don't have to exclude the
_version_ field. After the push to the
target cluster, the same signal will have
the same ID in both clusters but different
_version_ field values.
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
Fields to return
38. Configuring the Topic
• A name for the signals topic, used to
identify its checkpoints in the checkpoints
collection.
• It appears in the checkpoints collection as
seen below.
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
Topic ID
{
"id":"signals_topic",
"checkpoint_ss":[
"shard2~1643423444105166848",
"shard1~1643424248539119616"
],
"_version_":1643424257580990464
}
39. One note about Topics
• In the Solr doc on topics, you'll notice the following warning:
The topic function should be considered in beta until SOLR-8709 is
committed and released.
• This has to do with the possibility of out-of-order version numbers that
would make the topic miss certain documents because a new document
appeared with a lower version number than in the checkpoint for the last
execution.
• My spy network of Solr committers reports that several efforts have been
made to break topic with out-of-order version numbers, but nobody has
been successful.
• In other words, nothing to see here…
Solr doc needs an update…
40. Configuring the Update
This sets the target of the results returned by
the topic, which we want to be the same
collection in the target cluster.
• The collection to write to (my_signals)
• The batch size.
• The zkhost string for the target cluster,
including solr path
• The topic expression we created earlier.
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
The expression itself, in order of parameters
41. Configuring the Update
• The collection to write to (my_signals)
• The collection must already exist in the
target cluster
• Unlike CDCR configurations, you're not
required to maintain the same number of
shards and replicas for this collection
across clusters.
– I do anyway, but you don't have to.
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
Target collection
42. Configuring the Update
• The number of documents in each batch
sent.
• Signals are small, so I set it to 500.
• I like to set this in concert with the
runInterval (described later) to, if
possible, process all new signals in a
single batch.
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
Batch size
43. Configuring the Update
• The zkHost string of the target cluster,
including any solr path specification.
• Make sure you open up the Zookeeper
port 9983 between your clusters.
Note: zkhost shown on multiple lines for
convenience; don’t break it up.
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
zkHost string
44. Configuring the Daemon
A daemon decorator that wraps the update
decorator and topic stream source
• Give the daemon an ID
• How often to run?
• Whether to keep running
• The update and topic we configured earlier
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
)
The expression itself, in order of parameters
45. Configuring the Daemon
• The id of the daemon: signals_daemon
• This name will appear in subsequent
action list requests that report on the
status of each daemon (below).
• If you have multiple daemons, you would
want a more descriptive name.
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
)
Daemon ID
{
"result-set":{
"docs":[{
"startTime":1567272917772,
"stopTime":0,
"id":"signals_daemon",
"state":"TIMED_WAITING",
"iterations":888643}
,{
"EOF":true}]}}
46. Configuring the Daemon
• The run interval in milliseconds: 10000
• This is how often the daemon will run the
topic query and send the results to the
specified target in the update decorator.
• I like to set this in concert with the
batchSize to, if possible, process all
new signals in a single batch.
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
)
Run interval
47. Configuring the Daemon
• Whether the daemon terminates: false
• If true, the daemon will stay resident, but
will only run the topic query and send
results once.
• To keep running at the interval, set this to
false.
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,
10.123.1.8:9983,
10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
)
Termination(?)
48. Starting the Daemon
curl http://localhost:8983/solr/stream_daemon_host/stream -d 'expr=
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
update(my_signals,
batchSize=500,
zkHost="10.123.1.7:9983,10.123.1.8:9983,10.123.1.9:9983/lwfusion/4.1.2/solr",
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic"
)
)
)
'
The entire request, as run from the instance hosting the daemon
49. Starting the Daemon
• Make a request to the stream API for a given collection and attach the full
daemon expression as a payload.
– The entire request must be one line (mind your whitespace).
• The collection (stream_daemon_host) is where the daemon will be created
as a new thread for that collection.
What are we actually doing?
curl http://localhost:8983/solr/stream_daemon_host/stream -d 'expr=
daemon(id="signals_daemon",
runInterval="10000",
terminate="false",
…
)
'
50. Starting the Daemon
• The temptation is to use the same collection that you're querying.
• If you only specify a collection name, Solr will randomly select a specific
shard and replica for the daemon thread to attach to but it won't tell you
which one.
• When you subsequently make a request for daemon status via a stream
action list, Solr will, again, randomly select a shard/replica and send the
request to that shard/replica.
• Consequently, you can successfully start a daemon and then be unable to
find it.
Avoiding the vanishing daemon, part 1
51. Starting the Daemon
• Possible solution: specify a specific shard/replica in your request
– Just hope that you don't delete/re-create that replica later.
• Better solution: Create a single-shard, single-replica collection exclusively to
host the daemon (create it before you run the daemon).
– That way, all requests to the daemon host collection are consistent.
• I usually create this collection on the same instance in the cluster where I'll
be running the daemon.
• I like to put the daemon start code in a shell script and then run that script
from our Jenkins pipeline during builds.
Avoiding the vanishing daemon, part 2
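Those setup steps can be sketched as a small shell script. This is a minimal sketch, not the author's actual script: the host, port, and collection name follow the slides, and `build_create_url` is a helper introduced here for illustration.

```shell
#!/bin/sh
# Create the single-shard, single-replica collection that will host the
# streaming expression daemon thread. Run once, before starting the
# daemon, on the instance where the daemon will live.

# Compose a Collections API CREATE request for a 1-shard, 1-replica collection
build_create_url() {
  echo "http://$1/solr/admin/collections?action=CREATE&name=$2&numShards=1&replicationFactor=1"
}

URL=$(build_create_url "localhost:8983" "stream_daemon_host")
# curl "$URL"   # uncomment to run against a live cluster
echo "$URL"
```

With a dedicated host collection created this way, every subsequent stream request (start, status, self-heal) can target `stream_daemon_host` and land on the same shard/replica.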
53. Checking the search index sync
• Monitor the document counts per cluster
– OK: Entire collection
– Better: By datasource
• I wrote a shell script that:
– Performs a query on each datasource in each cluster: _lw_data_source_s:my_datasource_name
– Parses out the numFound number
– Adds all the counts to a row in a CSV file that matches an Excel report spreadsheet we have
• Backlog item: Extend this process to a report in our Business Intelligence system
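A minimal sketch of that count-check script follows. The hosts, collection, and datasource names are placeholders (the actual values aren't in the slides), and the numFound parsing assumes Solr's default JSON response format.

```shell
#!/bin/sh
# Per-datasource document count check across clusters, appended as a CSV row.
# Hosts, collection, and datasource names below are illustrative placeholders.
CLUSTERS="10.123.1.7:8983 10.45.2.11:8983"
DATASOURCES="docs_site blog_crawl support_kb"

# Pull the numFound value out of a Solr JSON response body
extract_numfound() {
  echo "$1" | sed -n 's/.*"numFound":\([0-9]*\).*/\1/p'
}

collect_counts() {
  row="$(date +%F)"
  for ds in $DATASOURCES; do
    for host in $CLUSTERS; do
      resp=$(curl -s -m 10 "http://${host}/solr/my_collection/select?q=_lw_data_source_s:${ds}&rows=0&wt=json")
      row="${row},$(extract_numfound "$resp")"
    done
  done
  echo "$row" >> counts.csv   # one row per run, matching the report layout
}

# collect_counts   # uncomment to run against live clusters
```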
54. Checking the signal sync
• Monitor the signal counts in each cluster
– Timing of the daemon intervals means the total is rarely exactly equal
• That shell script (previous slide) also captures signal counts
Keep one eye on the collection…
55. Checking the signal sync
• Monitor the daemon status with a Stream API call
http://host-name-or-ip:8983/solr/stream_daemon_host/stream?action=list
• If you're running it on the local instance:
http://localhost:8983/solr/stream_daemon_host/stream?action=list
Keep the other eye on the streaming expression daemons…
56. Healthy daemon response
• The id is the daemon ID from your
daemon decorator fields
• The startTime and stopTime are UNIX
epoch dates (down to milliseconds)
• A stopTime of 0 means the daemon is still
active and running
• The state can be WAITING (no new
signals) or TIMED_WAITING (between
intervals)
• The iterations are the number of
documents sent.
{
"result-set":{
"docs":[{
"startTime":1567272866535,
"stopTime":0,
"id":"signals_daemon",
"state":"WAITING",
"iterations":549636}
,{
"EOF":true}]}}
57. Terminated daemon response
• A state of TERMINATED, combined with a
non-zero stopTime, means the daemon
has failed for some reason.
• We’ll talk about responding to this status a
little later.
{
"result-set":{
"docs":[{
"startTime":1564408742092,
"stopTime":1566732689909,
"id":"signals_daemon",
"state":"TERMINATED",
"iterations":2340808}
,{
"EOF":true}]}}
59. Typical daemon startup activity
• Starting state:
– There are existing documents (signals) in the collection
– There are no existing checkpoints in the checkpoint collection
• Daemon actions:
– Set the checkpoints to NOW
– Wait for the next interval
60. Normal daemon process interval
• Starting state:
– There are existing documents (signals) in the collection
– There are existing checkpoints in the checkpoint collection
– There are new signals added since the last checkpoint
• Daemon actions:
– Query for new documents (signals) added since the last checkpoint
– Send those documents to the target cluster
– Update the checkpoints to the last signal sent (tracked by shard)
– Wait for the next interval
61. Daemon restart activity
• Starting state:
– There are existing documents (signals) in the collection
– There are existing checkpoints in the checkpoint collection
– There are new signals added (very likely) since the last checkpoint
• Daemon actions:
– Query for new documents (signals) added since the last checkpoint
– Send those documents to the target cluster
– Update the checkpoints to the last signal sent (tracked by shard)
– Wait for the next interval
This is what happens after a daemon failure and you restart the daemon
62. Daemon bootstrap copy
• Starting state:
– There are existing documents (signals) in the collection
• Intervention actions:
– Delete any stored checkpoints for this topic in the checkpoint collection (if they exist)
– Start the daemon with an extra initialCheckpoint=0 parameter in the topic (below)
• Daemon actions:
– Query for all documents (signals) in the collection
– Send those documents to the target cluster
– Update the checkpoints to the last signal sent (tracked by shard)
– Wait for the next interval
• Bulk copy is faster (and arguably safer) than this
Process to copy entire collection to the target cluster
topic(stream_checkpoints,
my_signals,
q="cluster:useast1",
fl="*",
id="signals_topic",
initialCheckpoint=0
)
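The intervention above can be sketched in shell. The endpoints and expression follow the slides; the two helper function names are introduced here, and the expression payload is collapsed to one line per the whitespace warning on slide 49.

```shell
#!/bin/sh
# Bootstrap-copy intervention: clear this topic's stored checkpoints, then
# start the daemon with initialCheckpoint=0 so the topic replays everything.
SOLR="http://localhost:8983/solr"

# Step 1: delete the stored checkpoint document (its id matches the topic id)
clear_checkpoints() {
  curl "${SOLR}/stream_checkpoints/update?commit=true" \
    -H 'Content-Type: application/json' \
    -d '{"delete":{"id":"signals_topic"}}'
}

# Step 2: start the daemon with initialCheckpoint=0 added to the topic
start_bootstrap_daemon() {
  curl "${SOLR}/stream_daemon_host/stream" -d 'expr=daemon(id="signals_daemon",runInterval="10000",terminate="false",update(my_signals,batchSize=500,zkHost="10.123.1.7:9983,10.123.1.8:9983,10.123.1.9:9983/lwfusion/4.1.2/solr",topic(stream_checkpoints,my_signals,q="cluster:useast1",fl="*",id="signals_topic",initialCheckpoint=0)))'
}

# clear_checkpoints && start_bootstrap_daemon   # run against a live cluster
```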
63. Streaming expressions Rock!
Any downsides?
• Streaming expressions do not yet support:
– Solr authentication plugin
– Solr authorization plugin
65. Daemons occasionally fail
• Streaming expression daemon failures are not very common.
– Five (5) failures in five (5) months
• All our failures have been due to Zookeeper connection timeouts.
– We doubled our ZK timeout from 30 to 60 seconds
– This helped reduce failures, though 60 seconds seems excessive
• The process for recovering from a failure is really easy:
– Restart the daemon
– The persisted checkpoints allow updates to continue from where the topic left off
66. And they can self-heal…
• Run a self-heal shell script as a cron job on the same instance as the daemon
– Do this in each cluster
• The script:
– Runs a test to see if the daemon is TERMINATED
– If it is not, log an OK message
– If it is TERMINATED, log the failure, and restart the daemon
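A minimal sketch of such a self-heal script, suitable for a cron entry. The log path and the daemon start script location are assumptions; the status check matches the stream `action=list` responses shown earlier.

```shell
#!/bin/sh
# Cron self-heal: check the daemon status and restart it if TERMINATED.
STATUS_URL="http://localhost:8983/solr/stream_daemon_host/stream?action=list"
LOG="/var/log/signals_daemon_selfheal.log"          # assumed log location
START_SCRIPT="/opt/scripts/start_signals_daemon.sh" # assumed start script

# Succeeds (exit 0) if the status JSON reports a TERMINATED daemon
is_terminated() {
  echo "$1" | grep -q '"state":"TERMINATED"'
}

check_and_heal() {
  status=$(curl -s -m 10 "$STATUS_URL")
  if is_terminated "$status"; then
    echo "$(date): daemon TERMINATED, restarting" >> "$LOG"
    sh "$START_SCRIPT"   # checkpoints let it resume where it left off
  else
    echo "$(date): daemon OK" >> "$LOG"
  fi
}

# check_and_heal   # invoke from cron, e.g.: */5 * * * * /opt/scripts/selfheal.sh
```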
69. Adding another cluster
Three region datacenters (us-east-1, eu-west-1, ap-southeast-1) synced in a
ring; each daemon forwards its own cluster's signals plus those received from
its upstream neighbor:
• us-east-1 → eu-west-1: q="cluster:useast1 OR cluster:apsoutheast1"
• eu-west-1 → ap-southeast-1: q="cluster:euwest1 OR cluster:useast1"
• ap-southeast-1 → us-east-1: q="cluster:apsoutheast1 OR cluster:euwest1"
(With only two clusters, the queries were simply q="cluster:useast1" and
q="cluster:euwest1".)
70. Summary
• Sync search index with crawl schedules in each cluster
– Until CDCR 2 comes out
• Sync signals with streaming expression daemons
– Unless you have to use the Solr authentication or authorization plugins
Could use a new cluster on the Pacific Rim, perhaps Singapore.
On August 31, 2019, the AWS North Virginia datacenter (us-east-1 region), which isn't very far from where we are sitting, had a power outage. Just like clockwork, several backup generators engaged to keep the datacenter alive, but one of them failed about an hour after the incident began, taking down a portion of the datacenter. As a result, several well known services, such as Twitter, Reddit, and Sling, experienced partial service outages. The Dynatrace search service also runs in the North Virginia datacenter and it, too, was impacted. But since we had a failover traffic policy in place, all incoming search requests were automatically routed to the Europe datacenter and our customers performing searches were blissfully unaware that anything was amiss.
Without the second cluster, customers performing searches would encounter never-ending spinning wheels and there would be much weeping and gnashing of teeth.
The temptation is to use the same collection that you're querying, but multi-shard and multi-replica collections can complicate this a bit. If you only specify a collection name, Solr will randomly select a specific shard and replica for the daemon thread to attach to but it won't tell you which one. When you subsequently make a request for daemon status via a stream action list, Solr will, again, randomly select a shard/replica and send the request to that shard/replica. Consequently, you can successfully start a daemon and subsequently can't find it. To avoid this, you can specify a specific shard/replica in your request, and hope that you don't delete/re-create that replica later. My strategy is to create a single shard, single replica collection exclusively to host the daemon. That way, all requests to the daemon host collection are consistent. For consistency, I usually create this collection on the same instance in the cluster where I'll be running the daemon.