SOLR CDCR on AWS
Nishant S Karve
SearchStax
• Introductions
• Disaster Recovery (DR), RTO and RPO
• Apache Solr and State of CDCR
• Test scenarios and Assumptions
• CDCR architecture, cross-region VPC peering, and SOLR configuration
• Demonstrate CDCR
• Observations
• Questions
Agenda
Sr. Consultant with SearchStax (formerly known as Measured
Search) since December 2015.

Serving as a Search Engineer with Allstate Insurance Company in
their SOLR Enterprise Search team.

Previous clients include United Airlines and US Bank.

Extensive experience with middleware applications such as
TIBCO, IBM WebSphere, etc.

SearchStax company information: https://www.searchstax.com/

The company was named one of the “Top 20 Open Source
Software Solutions for 2017” by CIOReview Magazine.
Nishant Karve
About Me
Why do we need DR plans?
1. You’re only as strong as your weakest link.
An ideal disaster recovery plan would place your production
servers in a top tier data center with no single point of failure on
the power and network connections.

2. Customer retention is costly after a disaster.
While on average it’s much cheaper to retain a customer than to
acquire a new one, re-acquiring an old customer after an IT
disaster is very expensive.

3. Customers expect perfection.
With ever increasing competition and the varied choices
available to the customer, we are nearing a phase where the
customer expects perfection from your online service.

4. Machines and hardware fail.

With the highly distributed nature of computing it’s quite
obvious that machines will fail. Fried motherboards, faulty
network switches, corrupted hard drives all contribute to a
disaster.
Any talk about disaster recovery is incomplete without discussing RPO and RTO.

RPO (Recovery Point Objective): Focuses on data and your company’s loss tolerance in relation to
your data. It is determined by looking at the time between backups and the amount of data that could
be lost in between your backups.

RTO (Recovery Time Objective): The target time you set for the recovery of your IT and business
activities after a disaster has struck. The goal here is to calculate how quickly you need to recover,
which can then dictate the type of DR classification (Tier 0 - Tier 7, where Tier 0 indicates no off-site data and
hence possibly no recovery).

While they may be different, both metrics need to be considered to develop an effective DR plan.
[Timeline diagram: last known good copy of data → disaster → DR initiated → data restored → normal business resumed; RPO spans the window of potential data loss before the disaster, RTO spans the recovery window after it]
RTO and RPO
What is Apache Solr?
Solr is the popular, blazing-fast, open source
enterprise search platform built on Apache
Lucene™
[Use-case collage: finding tickets, finding a job, finding restaurants/services; Enterprise Search, Media Search, Retail Customer Search, Fraud Analytics, Publishing, Recruiting, Travel, Research, Business Intelligence]
Where is Apache Solr Used?
CDCR Introduction
• Disasters strike without notice. IT companies prepare by keeping a redundant copy of their
database(s) on one or more secondary sites, possibly far away from the primary.
• Cross data center replication, also known as XDCR, is about dual data center writes to ensure business
continuity during a disaster.
• DR plans are extremely important for a consistent user experience and customer retention. Customer
retention is easier than acquiring new customers.
• CDCR can also be used for replicating a subset of your Production data to provide a production-like test
environment for your developers and testers.
CDCR prior to out-of-the-box support in Apache Solr
[Diagram: the client APPLICATION writes each update twice, (1) to DC1 and (2) to DC2, with each data center holding Partitions 1-4]
1. The onus on the application to write to both data centers is
taken away.

2. Synchronization of data happens out of the box between
two Solr clusters.

3. Bi-directional replication is supported with minimal configuration
changes.
4. Multiple collections can also be replicated.
5. Data transfer is asynchronous.
CDCR Support in Apache Solr post 6.6.x
• Test A: CDCR on AWS across two regions (Virginia and Ohio)
using AWS-provided inter-region VPC peering.

• Test B: CDCR on AWS within the same region but across different
availability zones.

• CDCR on premises: out of scope for this discussion; however,
the solution works just as well in an on-premises DC.
CDCR Scenarios
Several assumptions were made while testing CDCR on AWS. They are outlined as follows.

• SOLR 7.2 was used for evaluation.

• Bidirectional CDCR, which is a new offering in SOLR 7.2, was tested.

• A single-node SOLR cluster with an external Zookeeper was used in each region for cross-region CDCR.
For the test within the same region, two separate clusters were used.

• VPC peering is not available between all AWS regions. Inter-region VPC peering is available in AWS US East
(N. Virginia), US East (Ohio), US West (Oregon) and EU (Ireland). Virginia and Ohio were used as the two data
centers, with VPC peering enabled between them.

• No performance test was done around the CDCR process. 10,000 documents were indexed using curl
and basic out-of-the-box settings for CDCR.

• Custom VPCs were created with different CIDR ranges since the default VPC CIDR ranges in Virginia,
Ohio and Ireland overlapped. As per AWS, you cannot peer two VPCs if their CIDR ranges overlap.
Assumptions
Test A- Cross region CDCR-AWS Setup
For bidirectional CDCR to work across regions, the two regions' VPCs have to be peered. VPC peering
is required so that data can flow through the CDCR replicator across the network. This
section describes the AWS setup required to VPC-peer two regions.
AWS Setup
• Step 1: Create a VPC (CIDR 10.0.0.0/16), a public subnet, attach an Internet gateway to the
VPC and spin up two EC2 instances in the Virginia region. One EC2 instance will host the
Zookeeper on 2181 and the other EC2 instance will host the SOLR instance on 8983.
• Step 2: Create a VPC (CIDR 20.0.0.0/16), a public subnet, attach an Internet gateway to the
VPC and spin up two EC2 instances in the Ohio region. One EC2 instance will host the
Zookeeper on 2181 and the other EC2 instance will host the SOLR instance on 8983.
• Step 3: Create security groups in each region to allow inbound traffic on the following ports.
Port 22: For administering the EC2 instance. For Production, ensure that you allow traffic only
from the IP addresses that are authorized to install SOLR and Zookeeper on the EC2 instance.
Port 2181: For Zookeeper traffic
Port 8983: For SOLR traffic
All TCP ports (0-65535) for ephemeral traffic originating from the security group itself.
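A minimal AWS CLI sketch of the Step 3 ingress rules for the Virginia security group. The security group ID and the administrator IP are placeholders; the Zookeeper and SOLR ports are opened to the peered Ohio CIDR (20.0.0.0/16).

# Placeholder security group ID and admin IP; adjust per region
SG=sg-0123456789abcdef0
# Port 22 only from the administrator's IP
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 22 --cidr 203.0.113.10/32 --region us-east-1
# Zookeeper and SOLR traffic from the peered Ohio CIDR
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 2181 --cidr 20.0.0.0/16 --region us-east-1
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 8983 --cidr 20.0.0.0/16 --region us-east-1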
VPC Peering..continued
• From the Virginia VPC, generate a VPC peering request.
• Head to the Ohio VPC and accept the peering request.
• Once the peering request is accepted, a peering connection ID will be generated. Use this ID in each
region's route table along with the remote CIDR range. This will ensure that cross-region traffic is accepted
from the IPs within that CIDR range.
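A hedged AWS CLI sketch of the three steps above; the VPC, route table and peering connection IDs are placeholders returned by your own account.

# From the Virginia (us-east-1) side: request peering with the Ohio (us-east-2) VPC
aws ec2 create-vpc-peering-connection --vpc-id vpc-virginia0000000 \
  --peer-vpc-id vpc-ohio00000000000 --peer-region us-east-2 --region us-east-1
# From the Ohio side: accept the request (use the pcx-... ID returned above)
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-0000000000000000 \
  --region us-east-2
# Point the remote CIDR at the peering connection in each region's route table
aws ec2 create-route --route-table-id rtb-ohio00000000000 \
  --destination-cidr-block 10.0.0.0/16 --vpc-peering-connection-id pcx-0000000000000000 --region us-east-2
aws ec2 create-route --route-table-id rtb-virginia0000000 \
  --destination-cidr-block 20.0.0.0/16 --vpc-peering-connection-id pcx-0000000000000000 --region us-east-1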
VPC Peering..continued
• Following is the route table for the Ohio VPC. 10.0.0.0/16 is the CIDR for the Virginia region.
• Following is the route table for the Virginia VPC. 20.0.0.0/16 is the CIDR for the Ohio region.
Testing Cross region setup
• Before you start installing SOLR and Zookeeper on the EC2 instances, ensure that you are able to ping the
private IP addresses of each EC2 instance across regions. This will save you a lot of troubleshooting headaches
later on. Please note: since the security group does not have a rule to allow inbound ICMP traffic, ping
may not work. Allow ICMP traffic for testing and disable the rule once the test is complete.
• It’s critical that all EC2 instances can see each other for CDCR to succeed. For on-premises networks,
ensure all firewalls, ACLs and other settings are in place before proceeding with the CDCR setup.
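A small sketch of the temporary ICMP rule and the ping test, assuming a placeholder security group ID and a placeholder Ohio private IP.

# Temporarily allow ICMP from the Ohio CIDR, test, then revoke the rule
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol icmp --port -1 --cidr 20.0.0.0/16 --region us-east-1
ping 20.0.1.25   # private IP of an Ohio EC2 instance (placeholder)
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol icmp --port -1 --cidr 20.0.0.0/16 --region us-east-1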
Duplicate the following on both regions.
1) Ensure that Java 1.8 is installed on both machines
2) Install SOLR 7.2 on one EC2 instance. Ensure that the /etc/hosts file on the EC2 instance reflects the
public IP of the machine.
3) Install Zookeeper 3.4.6 on another EC2 instance.
4) Start the Zookeeper instance on port 2181. If you would like to start the zookeeper on a different port,
adjust the security groups accordingly.
5) Start the SOLR instance in cloud mode (SolrCloud) on port 8983, pointing it at the Zookeeper on port
2181. If you would like to start the SOLR instance on a different port, adjust the security groups
accordingly. CDCR does not work if the SOLR deployment mode is standalone.
Install SOLR and Zookeeper
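For reference, a sketch of steps 4 and 5 on the two EC2 instances; the install paths and the Zookeeper private IP are placeholders.

# On the Zookeeper EC2 instance (Zookeeper 3.4.6 extracted to ~/zookeeper-3.4.6)
~/zookeeper-3.4.6/bin/zkServer.sh start
# On the SOLR EC2 instance (SOLR 7.2 extracted to ~/solr-7.2.0): start in SolrCloud
# mode on port 8983, pointing at the external Zookeeper on 2181
~/solr-7.2.0/bin/solr start -cloud -p 8983 -z 10.0.1.15:2181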
CDCR setup
For the test, default configuration files were used for creating a core. The following modifications were made to
solrconfig.xml to enable CDCR. At a very high level, the idea is to allow the
Virginia cluster's zookeeper to talk to the Ohio cluster's zookeeper and vice versa.
For the sake of this conversation, let’s assume that the Virginia DC is our Source DC for indexing and the
Ohio DC is the target. As per the official SOLR documentation only one DC can act as a source for
indexing documents at a given time. If for any reason a decision is made to flip the primary data center,
then the new source for indexing will be the Ohio data center. For search queries, both data centers can
be used at the same time. The setup is more like an Active-Passive setup than an active-active DR cluster.
[Diagram: Virginia DC (SOURCE: SOLR + ZKHost) PUSHes updates to Ohio DC (TARGET: SOLR + ZKHost)]
Virginia DC setup..solrconfig.xml
• Make the following changes in the solrconfig.xml for the Virginia cluster. This enables CDCR from the Virginia
cluster to the Ohio cluster through the VPC-peered network.
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost"><<OhioZK:2181>></str>
<str name="source">music</str> <!-- Source collection in Virginia -->
<str name="target">music</str> <!-- Target collection in Ohio -->
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst> <!-- Missing in the documentation -->
</requestHandler>
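Once solrconfig.xml is edited, the configset still has to be uploaded to Zookeeper and the collection created on each cluster. A sketch for the Virginia side, assuming the configset lives in ./music_conf; repeat on Ohio against its own Zookeeper and SOLR host.

# Upload the CDCR-enabled configset to the local Zookeeper
~/solr-7.2.0/bin/solr zk upconfig -n music -d ./music_conf -z 10.0.1.15:2181
# Create the "music" collection (single shard, single replica for this test)
curl "http://<<VirginiaSOLR:8983>>/solr/admin/collections?action=CREATE&name=music&numShards=1&replicationFactor=1&collection.configName=music"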
Virginia DC setup. continued
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
</updateHandler>
• Make the following changes in the solrconfig.xml for the Ohio cluster. This enables CDCR from the Ohio
cluster back to the Virginia cluster through the VPC-peered network.
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost"><<VirginiaZK:2181>></str>
<str name="source">music</str> <!-- Source collection in Ohio -->
<str name="target">music</str> <!-- Target collection in Virginia -->
</lst>
<!-- The replicator and updateLogSynchronizer sections mirror the Virginia configuration -->
</requestHandler>
Ohio DC Setup..solrconfig.xml
Ohio DC Setup.continued
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section -->
</updateLog>
</updateHandler>
Ensure that openSearcher in the solrconfig.xml is set to true if you want to make the documents
searchable once they are committed to the index.

• Data sourced from https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking/data.

• This dataset contains the daily ranking of the 200 most-listened-to songs in 53 countries in 2017 and
2018 by Spotify users. I have chosen 10,000 such records for the demo.
track: Name of the song
artist: Artist who performed it
streams: Total number of streams
url: URL on Spotify
era: Release date
region: Region code
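A sketch of how such a sample can be indexed into the source cluster with curl; the CSV file name is a placeholder.

curl "http://<<VirginiaSOLR:8983>>/solr/music/update?commit=true" \
  -H 'Content-Type: application/csv' --data-binary @spotify_top200_sample.csv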
Enabling CDCR
• In the previous slides we prepared the configuration files for CDCR. SOLR offers a CDCR API to
interact with the CDCR handler added in the solrconfig.xml. This CDCR API allows us to set the
direction of replication, check the status of CDCR, disable the buffer, and check CDCR logs and
errors. Let’s enable CDCR from Virginia to Ohio.
• http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=DISABLEBUFFER
http://<<OhioSOLR:8983>>/solr/music/cdcr?action=DISABLEBUFFER
http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=START
• Nothing else needs to be done in the target data center.
This will set the Virginia cluster as the Primary data center where the indexing queries should go.
Once this cluster receives the updates, they will be forwarded to the Ohio data center through the
“replica” element setting in the solrconfig.xml. The search queries can occur on both data centers.
• Documents indexed in the Virginia cluster should then be available in the Ohio cluster. The replication
speed is largely dependent on the AWS backbone (due to VPC peering). I have noticed that a document is
available within 100 ms in the other region. I replicated 10,000 documents using curl and
didn’t see any performance degradation.
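Two further CDCR API calls that are handy right after starting replication, plus a quick document count on the target; host names are the same placeholders used above.

# Replication status and pending queue sizes on the source
curl "http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=STATUS"
curl "http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=QUEUES"
# Document count on the target after indexing on the source
curl "http://<<OhioSOLR:8983>>/solr/music/select?q=*:*&rows=0"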
CDCR tips
While CDCR is a great way to ensure that you always have a backup of your primary DC, it does
come with some limitations.
• CDCR is unlikely to be satisfactory for bulk operations.
• CDCR works robustly when the Source and Target data centers have the same number of shards in
the collection.
• Running CDCR with the indexes on HDFS is currently not supported; there is an open JIRA
issue for this.
• Configuration files are not automatically synched between data centers as previously mentioned.
• Always stop the CDCR process if your backup data center is going to be out of service for an
indefinite amount of time.
CDCR Tweaks
While the documentation clearly states that CDCR indexing operations should occur only on one data center
at a time (Active-Passive), I tried enabling CDCR in both directions.
Disclaimer: Before you implement this in your production cluster, make sure that you understand the full
implications of enabling CDCR across both regions. I tested with around 10,000 documents that were
indexed at the same time on both clusters. I was able to see 20,000 documents in each
cluster, indicating that the data was successfully replicated to either cluster. This demonstrates that an
active-active setup is possible and works well. However, a proper performance test should be conducted
with your use case to guarantee safe operation.
To enable CDCR on both clusters perform the following actions
On Virginia cluster
http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=DISABLEBUFFER
http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=START
On Ohio cluster
http://<<OhioSOLR:8983>>/solr/music/cdcr?action=DISABLEBUFFER
http://<<OhioSOLR:8983>>/solr/music/cdcr?action=START
Index a document on both data centers and they should get replicated across.
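To confirm the bidirectional test, compare the document counts on the two clusters; with 10,000 documents indexed on each side, both counts should converge on 20,000.

curl "http://<<VirginiaSOLR:8983>>/solr/music/select?q=*:*&rows=0"
curl "http://<<OhioSOLR:8983>>/solr/music/select?q=*:*&rows=0"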
Why stop here?
• The current flavor of CDCR supports replicating data to one or more target data centers. This
opens up a plethora of opportunities for interesting setups. Here is what I tried
[Diagram: Virginia Data center (Production Primary) and Ohio Data Center (Production Secondary) are VPC-peered with each other and with an Ireland Data Center OR an On-Premise Data center; Virginia's replicator lists zkHost (Ohio) and zkHost (Ireland) as REPLICA 1 and REPLICA 2, Ohio's lists zkHost (Virginia) and zkHost (Ireland); clients use the Ireland or on-premise cluster for research and analytics]
When indexing occurs in Virginia, it’s replicated to the
Ohio cluster and the Ireland cluster.
In the event of an outage the CDCR direction is
flipped from Ohio to Virginia using the CDCR API.
The data indexed from Ohio to Virginia is available in
Ireland as well.
The data in Ireland can be used by your Research
and Analytics group.
In rare scenarios it can also be used as a secondary
backup.
Setup
The setup to replicate data to two data centers from a
source data center is pretty straightforward.
Repeat the replica element in the source data center's
solrconfig.xml for each cluster you want the data synced to.
If the CDCR direction is flipped and Ohio becomes
the new primary, the requirement is to sync data to
Virginia and Ireland.
Hence the solrconfig.xml files for Virginia and Ohio
have the zookeeper settings for Ireland as well.
The Ireland cluster is used purely as either a second
backup cluster or a cluster for
research and analytics.
Ensure that you create the appropriate collections on
each data center.
VPC peering needs to be done between Virginia and
Ohio, Ohio and Ireland, and Virginia and Ireland.
Ensure that the CIDR block ranges in all three VPCs
don’t overlap before peering them.
Virginia solrconfig.xml
<lst name="replica">
<str name="zkHost"><<OhioZK:2181>></str>
<str name="source">music</str>
<str name="target">music</str>
</lst>
<lst name="replica">
<str name="zkHost"><<IrelandZK:2181>></str>
<str name="source">music</str>
<str name="target">music</str>
</lst>
Ohio solrconfig.xml
<lst name="replica">
<str name="zkHost"><<VirginiaZK:2181>></str>
<str name="source">music</str>
<str name="target">music</str>
</lst>
<lst name="replica">
<str name="zkHost"><<IrelandZK:2181>></str>
<str name="source">music</str>
<str name="target">music</str>
</lst>
Example of a sync between two data centers
and your corporate data center.
Few pointers
There are certain rules to follow while replicating data across regions.
1. The three peered VPC regions cannot replicate data in a transitive fashion. If Virginia is peered to
Ohio, Ohio is peered to Ireland, and Ireland in turn is peered to Virginia, data added in Virginia
will not end up back in Virginia again. Cross-region replicated data only moves a single hop. In
the above scenario, if data is added to the Virginia cluster it will end up in the directly peered
regions. This data will not be forwarded to any other regions that are peered downstream of
the direct ones.
2. In the scenario below, documents indexed in Virginia will not end up in California or Ireland via
Ohio.
[Diagram: an index update (Doc A) goes to Virginia, which is VPC-peered with Ohio and replicates there; Ohio is in turn VPC-peered with Ireland and California, but Doc A is not forwarded to those regions]
Few pointers…continued
• Cross region VPC peering is constrained to the regions where cross region VPC peering is offered via AWS.
Third party tools can be used to extend the peering regions.
• The CIDR blocks for the regions you want to peer should not overlap. This is to ensure that the IP addresses
provided by AWS do not conflict across the regions.
• In the configuration I tried, I deployed the SOLR instances in a public subnet. The instances can instead be deployed
to a private subnet fronted by a NAT gateway, with an EC2 instance in the public subnet handling the
user queries. This shields the SOLR instances from the internet.
• The indexes can be stored on S3 as well to take advantage of the multi-level replication and data lifecycle
management. However, doing so would introduce latencies when the EC2 connection tries to reach out to S3
for indexing/querying data.
• Cross region replication can also be achieved by taking a snapshot of the EBS volume on the primary and
copying the image to the other region. However, this will not provide a real time index copy for DR reasons.
• For the example scenario I have used a standard SSD volume. In production scenarios, it’s highly
recommended to use Provisioned IOPS SSD for better performance. If your index/search requirements are
high, use EBS-backed EC2 instances. For cross-region replication use network-optimized EC2 instances
for high-volume queries.
• The SOLR configuration files for your core should be stored in a versioning system, e.g. GitHub. The
configuration files have to be applied separately on each DC. To update the configuration, stop indexing on
the secondary and ensure that you enable the BUFFER on the primary. Apply the configuration changes
to the secondary. Once the secondary is up it will consume all the accrued updates from the Primary
BUFFER. Move all indexing to the secondary and then follow the same steps to update the primary. It all
depends on the type of configuration changes that are pushed.
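A hedged sketch of the buffer step described in the last bullet, using the same placeholder host names as earlier.

# 1. Buffer updates on the primary so nothing is lost while the secondary is down
curl "http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=ENABLEBUFFER"
# 2. Apply the configuration changes to the secondary and restart it; it will then
#    consume the accrued updates
# 3. Return the primary to normal forwarding
curl "http://<<VirginiaSOLR:8983>>/solr/music/cdcr?action=DISABLEBUFFER"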
Test- B: CDCR on AWS: Same region; different AZ
• This test was conducted on a single region but different availability zones. US-EAST-1 (N. Virginia) was
chosen for this test.
• 4 EC2 (2 SOLR and 2 ZK) instances were deployed in a public subnet. 2 instances were deployed in
AZ1 and the other 2 in AZ2.
• Configuration changes are exactly the same as TEST A.
• Since the communication between all the servers occurs within the same region, no additional changes are
needed in the route tables.
• Data was easily synced across the SOLR instances, which mimicked Data center 1 and Data
center 2.
Cluster consistency
• Data synched between 2 regions is eventually consistent. From my tests I observed that data is
synched immediately within the 2 peered regions. However, I tested with a smaller volume of data.
During peak production volumes, when indexing and querying is occurring on the primary cluster,
it’s imperative that there is some kind of check to ensure that the data is synched correctly to your
backup SOLR cluster.
• When a document is indexed to the Primary cluster, SOLR uses the _version_ field to generate a
unique version number for the document. This version number is indexed, along with other data
elements in the document, to the Primary data center. The same version number is used while
synching the data to the secondary data center. SOLR does not generate a new version number for
the same document while updating the index to the secondary data center.
• To ensure cluster consistency, one can build a small utility to keep a check of these version
numbers across the two clusters. If DOC A is indexed in the primary cluster with VERSION 1, then
the same document will get indexed to the secondary cluster with the same version number. This
utility will compare the version numbers for the two documents. If they match, then the document is
in sync. If the version number in the Primary cluster is higher than the one in secondary then the
secondary has not received the update yet (eventual consistency). In this case the utility needs to
execute the same comparison on the document after some time.
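A minimal sketch of such a version-comparison check using SOLR's real-time get handler; the host names and the document id are placeholders.

DOC_ID=some-doc-id
curl "http://<<VirginiaSOLR:8983>>/solr/music/get?id=${DOC_ID}&fl=_version_"
curl "http://<<OhioSOLR:8983>>/solr/music/get?id=${DOC_ID}&fl=_version_"
# If the two _version_ values match, the document is in sync; if the primary's value
# is higher, repeat the comparison after a short delay (eventual consistency)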
Thank you!
Nishant Karve - Sr. Search Consultant
email: nishant@searchstax.com
twitter: @karvenishant
