Building a Large Scale SEO/SEM 
Application with Apache Solr 
Rahul Jain 
Freelance Big-data/Search Consultant 
@rahuldausa 
dynamicrahul2020@gmail.com
About Me… 
• Freelance Big-data/Search Consultant based out of Hyderabad, India 
• Provide Consulting services and solutions for Solr, Elasticsearch and other Big data 
solutions (Apache Hadoop and Spark) 
• Organizer of two Meetup groups in Hyderabad 
• Hyderabad Apache Solr/Lucene 
• Big Data Hyderabad
What I am going to talk about
Share our experience of working on Search in this application …
• The issues we faced and the lessons learned
• How we do Database Import, Batch Indexing… 
• Techniques to Scale and improve Search latency 
• The System Architecture 
• Some tips for tuning Solr 
• Q/A
What does the Application do 
§ Keyword Research and Competitor Analysis Tool for SEO (Search Engine Optimization) and SEM (Search Engine Marketing) professionals
§ End users search for a keyword or a domain and get all the insights about it.
§ Aggregates data for the top 50 results of Google and Bing across 3 countries for 80 million+ keywords.
§ Provides key metrics such as keywords, CPM (cost per mille), CPC (cost per click), competitors' details, etc.
[Diagram: Data Collection (web crawling, ad networks, APIs, databases) feeding Data Processing & Aggregation]
*All trademarks and logos belong to their respective owners.
Technology Stack
High level Architecture 
[Diagram components: Internet; Load Balancer (HAProxy); Web Server Farm (PHP app, Nginx); App Server (Tomcat); Search Head (Solr); Apache Solr servers; Managed Cache; Cache Cluster (Redis); Database (MySQL); Cluster Manager (custom, using ZooKeeper). Request steps are numbered 1-8 and include "IDs lookup cache" and "Fetch cluster mapping for which month's cluster".]
Search Head:
• Is a Solr server that does not contain any data.
• Makes a distributed search request and aggregates the search results.
• Also works as a load balancer for search queries.
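A Search Head that holds no data and fans a query out to other cores is what Solr's distributed search provides through the `shards` request parameter. The sketch below only builds such a request URL; the host names, core names and field name are hypothetical, and the slides do not show the exact parameters the application sends.

```python
from urllib.parse import urlencode


def distributed_query_url(search_head, shard_cores, query, rows=10):
    """Build a /select URL that asks the search head to query the given cores."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # Solr's distributed-search parameter: comma-separated host:port/solr/core entries.
        "shards": ",".join(shard_cores),
    }
    return f"{search_head}/select?{urlencode(params)}"


# Hypothetical hosts and cores, for illustration only.
url = distributed_query_url(
    "http://search-head-us:8983/solr/head",
    ["solr1:8983/solr/keywords_us_1", "solr2:8983/solr/keywords_us_2"],
    "keyword:solr",
)
print(url)
```

The search head merges the per-shard responses, so the caller sees a single result list.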
Search - Key challenges 
§ After processing, we have ~40 billion records every month in the MySQL database, including
§ 80+ million keywords
§ 110+ million domains
§ 1 billion+ URLs
§ Multiple keywords for a single URL and vice versa
§ Multiple tables, varying in size from 50 million to 12 billion records
§ Search is a core functionality, so all records (selected fields) must be indexed in Solr
§ Page load time (including all 24 widgets, max) < 1.5 sec (critical)
§ But… we need to load this data only once every month for all countries, so we can do batch indexing, and as this data never changes, we can apply caching.
Making Data Import and Batch Indexing Faster
Data Import from MySQL to Solr 
• Solr's DataImportHandler is awesome but quickly becomes pretty slow for large volumes
• We wrote a custom Data Importer that reads (pulls) documents from the database and pushes them (asynchronously) into Solr.
[Diagram: Database, Data Importer (custom), and three Solr servers]
Table layout: ID (primary/unique key with index) plus columns, e.g. 1: Record1, 2: Record2, …, 5000: Record 5000, *6000: Record 6000, …, n: Record n (*non-sequential ID)
• The importer fetches 10 batches at a time from the MySQL database, each having 2k records (Batch 1-2000, Batch 2001-4000, …). Each call is stateless.
• Rather than using the database's "limit" function, it queries by a range of IDs (primary key):
“select * from table t where ID>=1 and ID<=2000”
“select * from table t where ID>=2001 and ID<=4000”
• The importer combines these database batches into a bigger batch (10k documents) and flushes it to selected Solr servers asynchronously in a round-robin fashion.
Downside:
• We must have a primary key, and creating it in the database can be slow.
• This approach requires more calls if the IDs are not sequential.
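The ID-range batching and round-robin flushing described above can be sketched roughly as follows. This is an illustration under assumptions, not the author's importer: the function names and the table name are hypothetical, the work is done synchronously, and a real importer would use bound query parameters and send each flush to Solr asynchronously.

```python
from itertools import cycle


def id_ranges(first_id, last_id, batch_size=2000):
    """Yield inclusive (start, end) primary-key ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield start, end
        start = end + 1


def range_query(table, start, end):
    """Build the range query text (illustration only; use bound parameters in practice)."""
    return f"SELECT * FROM {table} WHERE id >= {start} AND id <= {end}"


def assign_flushes(doc_batches, servers, flush_size=10_000):
    """Group fetched DB batches into bigger batches and pair each with a server, round-robin."""
    targets = cycle(servers)
    buffer = []
    for batch in doc_batches:
        buffer.extend(batch)
        while len(buffer) >= flush_size:
            yield next(targets), buffer[:flush_size]
            buffer = buffer[flush_size:]
    if buffer:  # flush whatever is left at the end
        yield next(targets), buffer


for start, end in id_ranges(1, 4500):
    print(range_query("keywords", start, end))
```

Because each range is derived only from the IDs, any call can be retried or run on another worker without shared state. Gaps in the ID sequence simply produce smaller (or empty) batches, which is the extra-calls downside noted above.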
Batch Indexing
Indexing all tables into a single big index
• All tables in the same index, distributed across multiple Solr cores and Solr servers (Java processes)
• Commit every 120 million records or every 15 minutes, whichever comes first
• Disabled soft commit and updates (overwrite=false), as each call to addDocument calls updateDocument under the hood
• But still… indexing was slow (because it ran sequentially for all tables) and we had to stop it after 2 days.
• Search was also awfully slow (on the order of minutes)
[Screenshot annotations: “From cache, after warm-up”; “Bunch of shards, ~100”]
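The "commit every 120 million records or every 15 minutes, whichever comes first" rule can be expressed as a small policy object. This is a sketch under assumptions: the class and method names are hypothetical, and the indexer is expected to issue the actual Solr commit itself and then call `committed()`.

```python
import time


class CommitPolicy:
    """Decide when a hard commit is due: by document count or by elapsed time."""

    def __init__(self, max_docs=120_000_000, max_seconds=15 * 60, clock=time.monotonic):
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self.clock = clock  # injectable so the policy can be tested without waiting
        self.pending = 0
        self.last_commit = clock()

    def record(self, n_docs):
        """Register n_docs newly added documents; return True if a commit is due."""
        self.pending += n_docs
        elapsed = self.clock() - self.last_commit
        return self.pending >= self.max_docs or elapsed >= self.max_seconds

    def committed(self):
        """Reset the counters after the caller has issued the commit."""
        self.pending = 0
        self.last_commit = self.clock()
```

The time threshold is only checked when `record()` is called, which is sufficient for a batch loader that adds documents continuously.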
Creating a Distributed Index for each table 
How many shards?
• Each table has a varying number of records, from 50 million to 12 billion
• If we choose 100 million per shard (core), then for 12 billion records we need to query 120 shards: awfully slow.
• On the other hand, if we choose 500 million per shard, a table with 500 million records will have only 1 shard: high latency, high memory usage (heap) and no distributed search*.
• Hybrid approach: determine the number of shards based on the maximum number of records in the table.
• Benchmarked to find the sweet spot for maximum documents (records) per shard with the best search latency
• Commit at the end for each table.
Records/Max Shards Table
Max number of records in table | Max number of shards (cores) allowed
<100 million | 1
100-300 million | 2
<500 million | 4
<1 billion | 6
1-5 billion | 8
>5 billion | 16
* Distributed search improves latency but may not always be faster, as search latency is limited by the time taken by the last shard to respond.
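The records-to-shards table can be turned into a lookup. The tier values come from the slide; how the exact boundaries are treated (for example whether exactly 300 million records falls into the 2-shard or the 4-shard tier) is an assumption, since the slide does not say.

```python
MILLION = 1_000_000
BILLION = 1_000_000_000

# (exclusive upper bound on records, max shards allowed), following the slide's tiers.
SHARD_TIERS = [
    (100 * MILLION, 1),
    (300 * MILLION, 2),
    (500 * MILLION, 4),
    (1 * BILLION, 6),
    (5 * BILLION, 8),
]
MAX_SHARDS_LARGEST_TIER = 16


def max_shards(record_count):
    """Return the maximum number of shards (cores) allowed for a table of this size."""
    for upper_bound, shards in SHARD_TIERS:
        if record_count < upper_bound:
            return shards
    return MAX_SHARDS_LARGEST_TIER


largest = 12 * BILLION
print(largest // (100 * MILLION))       # shards needed at a fixed 100 million per shard
print(max_shards(largest))              # shards allowed under the hybrid scheme
print(largest // max_shards(largest))   # resulting records per shard
```

For the 12-billion-record table this caps the fan-out at 16 shards of about 750 million records each, instead of 120 shards at a fixed 100 million per shard.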
It worked fine, but one day suddenly…
java.lang.OutOfMemoryError: Java heap space
• All Solr servers crashed.
• We restarted them, but they kept crashing randomly every other day
• Took a heap dump and realized that it was due to the Field Cache
• Found a solution: Doc Values. We have not had this issue again to date.
Doc Values (Life Saver) 
• Disk-based field data, a.k.a. Doc Values
• Document-to-value mapping built at index time
• Stores field values on disk (or in memory) in a column-stride fashion
• Better compression than the Field Cache
• Replacement for the Field Cache (not completely)
• Quite suitable for custom scoring, sorting and faceting
References: 
• Old article (but Good): http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/ 
• https://cwiki.apache.org/confluence/display/solr/DocValues 
• http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/doc-values.html
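Doc values are switched on per field with the `docValues` attribute in the Solr schema. As a hedged illustration, the snippet below builds the JSON body for a Schema API `add-field` command; whether that API is available depends on the Solr version and on using a managed schema (with a classic schema.xml the same attribute is set on the field definition). The field name and type are hypothetical.

```python
import json


def add_docvalues_field(name, field_type="string"):
    """Return a Schema API 'add-field' command for a field with doc values enabled."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": False,
            "docValues": True,  # build the column-stride structure at index time
        }
    }


# Hypothetical field used for sorting or faceting.
payload = json.dumps(add_docvalues_field("domain"))
print(payload)
```

Changing `docValues` on an existing field requires reindexing, because the structure is built at index time.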
Scaling and Making Search Faster…
Partitioning 
• 3-level partitioning: by month, country and table name
• Each month has its own cluster and a Cluster Manager.
• Latency and throughput are a trade-off; you can't optimize for both at the same time.
[Diagram components: Internet; Load Balancer; Web Server Farm; App Servers; Master Cluster Manager; Search Heads per country (US, UK, AU); Solr cores on Node 1 … Node n. Cluster 1 is for one month (e.g. Jan); Cluster 2 is for another month (e.g. Feb). Each cluster has an active (A) and a passive (P) Cluster Manager with real-time sync.]
• Fetch the cluster mapping and make a request to the Search Head with the respective Solr cores for that country, month and table.
• 1 user: 24 UI widgets, 24 Ajax requests, 41 search requests
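The month/country/table partitioning amounts to a lookup from a three-part key to the list of Solr cores to query. The mapping below is entirely hypothetical (the slides do not show the Cluster Manager's data model); it only illustrates the shape of the lookup.

```python
# Hypothetical cluster mapping: (month, country, table) -> Solr cores holding that partition.
CLUSTER_MAPPING = {
    ("2014-01", "US", "keywords"): ["solr1:8983/solr/kw_us_1", "solr2:8983/solr/kw_us_2"],
    ("2014-01", "UK", "keywords"): ["solr3:8983/solr/kw_uk_1"],
    ("2014-02", "US", "keywords"): ["solr7:8983/solr/kw_us_1"],
}


def cores_for(month, country, table, mapping=CLUSTER_MAPPING):
    """Return the cores to query for one partition, or raise if none are registered."""
    try:
        return mapping[(month, country, table)]
    except KeyError:
        raise LookupError(f"no cores registered for {month}/{country}/{table}") from None


# The resulting list can be joined into the value of Solr's `shards` parameter.
shards_param = ",".join(cores_for("2014-01", "US", "keywords"))
print(shards_param)
```

Keeping the mapping outside Solr means a whole month can be swapped in or out by publishing a new mapping, without touching the query code.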
Index Optimization Strategy 
• Running optimization on 200+ Solr cores is very time consuming
• Solr cores with a bigger index size (~70GB) have 2500+ segments due to the higher merge factor used while indexing.
• It can't be run in parallel on all cores on a single machine, as it depends heavily on CPU and disk I/O
• Optimizing large segments down to a very small number is very time consuming and can take up to 3x the index size on disk
• On the other hand, a small number of segments improves performance drastically, so we need to strike a balance.
[Diagram labels: Node 1 with Solr cores; Staging Cluster Manager]
*As per our observation for our data, the optimization process takes ~42-46 seconds per 1GB. We need to do it for 4.6TB (including all boxes), the total Solr index size for a single month.
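To get a feel for the scale of those figures, here is a back-of-the-envelope calculation. It assumes 1TB = 1000GB and a single sequential optimization stream; the real wall-clock time depends on how many cores and machines run in parallel, which this sketch ignores.

```python
def optimize_hours(index_gb, seconds_per_gb):
    """Total optimization time in hours if every GB were processed one after another."""
    return index_gb * seconds_per_gb / 3600


INDEX_SIZE_GB = 4600  # 4.6TB, assuming 1TB = 1000GB

for seconds_per_gb in (42, 46):
    hours = optimize_hours(INDEX_SIZE_GB, seconds_per_gb)
    print(f"{seconds_per_gb} s/GB -> {hours:.1f} hours")
```

So one full sequential pass would take roughly 54 to 59 hours, which is why parallelism and an iterative approach matter here.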
Solr Optimizer:
• Fetches the cluster mapping (list of all cores)
• Once optimization and cache warm-up are done, pushes the cluster mapping to the Production Cluster Manager, making all indices live
Optimizing a Solr core into a very small number of segments takes a huge amount of time, so we do it iteratively.
Algo:
• Choose max 3 cores on a machine to optimize in parallel. Start with the smallest index.
• Index size, number of docs → determine max segments allowed → reduce segments to 0.90x in each run → current segments after optimization
[Diagram labels: Production Cluster Manager; Node 2 with Solr cores]
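The iterative reduction can be sketched as a schedule of segment-count targets, each 0.90 times the previous count, stopping at the maximum allowed. Stopping at that maximum, the function names and the example core URL are assumptions; Solr's update handler does accept `optimize=true` together with a `maxSegments` value, which is what the helper URL uses.

```python
def segment_targets(current_segments, max_allowed, factor=0.90):
    """Return successive segment-count targets, shrinking by `factor` on each run."""
    targets = []
    while current_segments > max_allowed:
        # Never go below the allowed maximum; int() floors the scaled count.
        next_target = max(int(current_segments * factor), max_allowed)
        targets.append(next_target)
        current_segments = next_target
    return targets


def optimize_url(core_url, max_segments):
    """Build the update-handler URL for one optimization run (hypothetical core URL)."""
    return f"{core_url}/update?optimize=true&maxSegments={max_segments}"


for target in segment_targets(2500, 2000):
    print(optimize_url("http://solr1:8983/solr/core1", target))
```

Each run merges only a fraction of the segments, so a single run stays short and the process can be paused between runs.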
Finally after optimization and cache warm-up… 
A shard looks like this.
[Screenshot annotation: max segments after optimization]
External Caching 
• In distributed search, a repeated query still has to hit all Solr servers, even though the result is served from Solr's cache. This increases search latency and lowers throughput.
• Solution: cache the most frequently accessed query results in the app layer (LRU-based eviction)
• We use Redis for caching
• Results of all complex aggregation queries, once fetched from multiple Solr servers, are served from the cache on subsequent requests.
Why Redis…
• Advanced in-memory key-value store
• Insanely fast
• Response time on the order of 5-10ms
• Provides cache behavior (set, get) with advanced data structures like hashes, lists, sets, sorted sets, bitmaps, etc.
• http://redis.io/
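The app-layer caching described above follows the cache-aside pattern: look up a key derived from the query, and only go to Solr on a miss. The sketch below uses a small in-memory stand-in so it runs without a Redis server; it relies only on `get` and `setex`, which the redis-py client also provides. The key scheme, the 30-day TTL and all names are assumptions, and LRU eviction itself would be configured on the Redis server rather than in this code.

```python
import hashlib
import json


class InMemoryCache:
    """Stand-in with the two calls used below (get, setex); the TTL is ignored here."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = value


def cache_key(params):
    """Derive a stable key from the query parameters."""
    canonical = json.dumps(params, sort_keys=True)
    return "search:" + hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def cached_search(cache, params, run_query, ttl_seconds=30 * 24 * 3600):
    """Return the cached result if present; otherwise run the query and cache it."""
    key = cache_key(params)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = run_query(params)
    cache.setex(key, ttl_seconds, json.dumps(result))
    return result
```

Usage: pass any callable that performs the distributed Solr query as `run_query`; the second call with the same parameters is answered from the cache and never reaches Solr.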
Hardware 
• We use bare-metal, dedicated servers for Solr for the reasons below
1. Performance gain (with virtual servers, performance dropped by ~18-20%)
2. Better value of computing power per $ spent
• 2.6GHz, 32 cores (4x8 core), 384GB RAM, 6TB SAS 15k (RAID10)
• 2.6GHz, 16 cores (2x8 core), 192GB RAM, 4TB SAS 15k (RAID10)
• Since the index size is 4.6TB/month, we want bigger RAM so that more data is held in the disk cache.
SSD vs SAS
1. SSD: peak indexing rate (MySQL to Solr): 330k docs/sec (each doc: ~100-125 bytes)
2. SAS 15k: 182k docs/sec (dropped by ~45%)
3. SAS 15k is considerably cheaper than SSD for bigger disks.
4. We are using SAS 15k as it is cost effective, but we plan to move to SSD in the future.
Conclusion : Key takeaways 
General: 
• Understand the characteristics of the data and partition it well.
Cache:
§ Spend time analyzing cache usage and tune the caches. Serving from cache is 10x-50x faster.
§ Always use a filter query (fq) wherever possible, as it improves performance thanks to the filter cache.
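As an illustration of the filter query tip, the sketch below keeps the user's text in `q` and moves the restricting clauses into separate `fq` parameters, each of which Solr can cache independently in its filter cache. The field names (`country`, `month`) are hypothetical; the slides do not show the actual schema.

```python
from urllib.parse import urlencode


def select_params(text_query, country, month, rows=10):
    """Encode a query whose restricting clauses are sent as separate fq parameters."""
    return urlencode([
        ("q", text_query),
        ("fq", f"country:{country}"),  # cached on its own in the filter cache
        ("fq", f"month:{month}"),      # reused by every query for the same month
        ("rows", rows),
    ])


print(select_params("keyword:solr", "US", "2014-01"))
```

Splitting the filters this way lets a filter such as `country:US` be reused across different text queries, instead of caching only the exact combination.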
GC:
§ Keep the JVM heap size low (in proportion to the machine's RAM), leaving enough RAM for the kernel, as a bigger heap will lead to frequent GC. A 4GB to 8GB heap is quite a good range, but we use 12GB/16GB.
§ Keep an eye on garbage collection (GC) logs, especially for full GCs.
Tuning Params:
§ Don't use soft commit if you don't need it, especially in batch loading
§ Always explore tuning Solr for high performance, e.g. ramBufferSizeMB, mergeFactor and HttpShardHandler's various configurations.
§ Use hashes in Redis to minimize memory usage.
Read the whole experience for more detail: 
http://rahuldausa.wordpress.com/2014/05/16/real-time-search-on-40-billion-records-month-with-solr/
Thank you! 
Twitter: @rahuldausa 
dynamicrahul2020@gmail.com 
http://www.linkedin.com/in/rahuldausa

JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Deep Dive into DynamoDB
Deep Dive into DynamoDBDeep Dive into DynamoDB
Deep Dive into DynamoDB
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectTableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 

Mehr von Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mehr von Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Kürzlich hochgeladen

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 

Kürzlich hochgeladen (20)

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 

Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rahul Jain

  • 2. Building a Large Scale SEO/SEM Application with Apache Solr. Rahul Jain, Freelance Big-data/Search Consultant. @rahuldausa, dynamicrahul2020@gmail.com
  • 3. About Me
      • Freelance Big-data/Search consultant based in Hyderabad, India
      • Provides consulting services and solutions for Solr, Elasticsearch and other big data technologies (Apache Hadoop and Spark)
      • Organizer of two Meetup groups in Hyderabad: Hyderabad Apache Solr/Lucene and Big Data Hyderabad
  • 4. What I am going to talk about
      Sharing our experience of working on search in this application:
      • The issues we faced and the lessons learned
      • How we do database import and batch indexing
      • Techniques to scale and improve search latency
      • The system architecture
      • Some tips for tuning Solr
      • Q/A
  • 5. What does the Application do
      • Keyword research and competitor analysis tool for SEO (Search Engine Optimization) and SEM (Search Engine Marketing) professionals
      • End users search for a keyword or a domain and get all the insights about it
      • Aggregates data for the top 50 results of Google and Bing across 3 countries for 80 million+ keywords
      • Provides key metrics such as keywords, CPM (cost per mille), CPC (cost per click), competitors' details etc.
      [Diagram labels: Web crawling, Ad Networks APIs, Databases, Data Collection, Data Processing & Aggregation]
      *All trademarks and logos belong to their respective owners.
  • 7. High level Architecture
      Search Head:
      • Is a Solr server which does not contain any data
      • Makes a distributed search request and aggregates the search results
      • Also works as a load balancer for search queries
      [Diagram components: Internet, Load Balancer (HAProxy), Web Server Farm with PHP App (Nginx), App Server (Tomcat), Managed Cache, Cache Cluster (Redis), Database (MySQL), Cluster Manager (custom, using ZooKeeper), Search Head (Solr), Apache Solr nodes. Steps are numbered 1-8; step labels include "IDs lookup", "Cache" and "Fetch cluster mapping for which month's cluster".]
  • 8. Search - Key challenges
      • After processing we have ~40 billion records every month in the MySQL database, including:
          • 80+ million keywords
          • 110+ million domains
          • 1 billion+ URLs
      • Multiple keywords for a single URL and vice versa
      • Multiple tables with sizes varying from 50 million to 12 billion records
      • Search is a core functionality, so all records (selected fields) must be indexed in Solr
      • Page load time (including all 24 widgets, max) must be < 1.5 sec (critical)
      • But we need to load this data only once every month for all countries, so we can do batch indexing, and as this data never changes, we can apply caching
  • 9. Making Data Import and Batch Indexing Faster
  • 10. Data Import from MySQL to Solr
      • Solr's DataImportHandler is awesome but quickly becomes pretty slow for large volumes
      • We wrote a custom Data Importer that reads (pulls) documents from the database and pushes them (asynchronously) into Solr
      • Rather than using the database's "limit" function, it queries by range of IDs (primary key), e.g. "select * from table t where ID>=1 and ID<=2000", then "select * from table t where ID>=2001 and ID<=4000"
      • The importer fetches 10 batches at a time from the MySQL database, each having 2k records. Each call is stateless
      • The importer combines these database batches into a bigger batch (10k documents) and flushes it to selected Solr servers asynchronously in a round-robin fashion
      Downsides:
      • The table must have a primary key, and creating it in the database can be slow
      • This approach requires more calls if the IDs are not sequential
      [Diagram: database table (ID as primary/unique key with index; records 1, 2, ... 5000, *6000, ... n) read as Batch 1-2000, Batch 2001-4000, ... by the custom Data Importer, which pushes to several Solr servers. *Non-sequential ID]
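The range-based batching and round-robin flushing described on this slide can be sketched roughly as follows. This is only an illustration, not the speaker's importer: the function names, the `id` column name and the server labels are assumptions, and the actual database reads and asynchronous Solr writes are left out.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

DB_BATCH_SIZE = 2_000   # rows per database call (figure from the slide)
FLUSH_SIZE = 10_000     # documents per flush to Solr (figure from the slide)


def id_ranges(min_id: int, max_id: int, size: int = DB_BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) primary-key ranges covering min_id..max_id."""
    start = min_id
    while start <= max_id:
        end = min(start + size - 1, max_id)
        yield start, end
        start = end + 1


def range_query(table: str, start: int, end: int) -> str:
    """Build the stateless range query for one database batch.

    The table name is assumed to come from trusted configuration,
    and the key column is assumed to be called `id`.
    """
    return f"SELECT * FROM {table} WHERE id >= {start} AND id <= {end}"


def assign_flushes(num_docs: int, servers: List[str],
                   flush_size: int = FLUSH_SIZE) -> List[Tuple[str, int]]:
    """Split num_docs into flushes and assign each one to a server in round-robin order."""
    targets = cycle(servers)
    plan = []
    remaining = num_docs
    while remaining > 0:
        n = min(flush_size, remaining)
        plan.append((next(targets), n))
        remaining -= n
    return plan
```

Because each range is computed from the key bounds alone, a failed call can be retried without any cursor state. Gaps in the ID sequence simply produce smaller (or empty) batches, which is the extra-calls downside the slide mentions.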
  • 12. Indexing All tables into a Single Big Index
      • All tables in the same index, distributed over multiple Solr cores and Solr servers (Java processes)
      • Commit every 120 million records or every 15 minutes, whichever is earlier
      • Disabled soft commit and updates (overwrite=false), as each call to addDocument calls updateDocument under the hood
      • But still, indexing was slow (because it ran sequentially for all tables) and we had to stop it after 2 days
      • Search was also awfully slow (on the order of minutes)
      [Figure labels: "From cache, after warm-up"; "Bunch of shards ~100"]
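The "whichever is earlier" commit rule on this slide could look something like the sketch below. The class and method names are hypothetical, and the caller is assumed to pass in the current time and to issue the actual Solr commit itself whenever `record` returns True.

```python
class CommitPolicy:
    """Decide when to commit: after max_docs documents or max_seconds, whichever comes first."""

    def __init__(self, max_docs: int = 120_000_000, max_seconds: float = 15 * 60):
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self.docs_since_commit = 0
        self.last_commit_at = 0.0  # assumed to be on the same clock as `now`

    def record(self, num_docs: int, now: float) -> bool:
        """Count newly indexed documents and return True when a commit is due."""
        self.docs_since_commit += num_docs
        due = (
            self.docs_since_commit >= self.max_docs
            or now - self.last_commit_at >= self.max_seconds
        )
        if due:
            # Reset both counters so the next window starts from this commit.
            self.docs_since_commit = 0
            self.last_commit_at = now
        return due
```

Passing the clock in as an argument keeps the policy deterministic, so it can be checked without waiting for real time to pass.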
  • 13. Creating a Distributed Index for each table
      How many shards?
      • Tables have varying numbers of records, from 50 million to 12 billion
      • If we choose 100 million per shard (core), then for 12 billion records we need to query 120 shards, which is awfully slow
      • On the other hand, if we choose 500 million per shard, a table with 500 million records will have only 1 shard: high latency, high memory usage (heap) and no distributed search*
      • Hybrid approach: determine the number of shards based on the max number of records in the table
      • Benchmarked to find the sweet spot for max documents (records) per shard with the best search latency
      • Commit at the end for each table
      Records / max shards table:
          Max number of records in table | Max number of shards (cores) allowed
          <100 million                   | 1
          100-300 million                | 2
          <500 million                   | 4
          <1 billion                     | 6
          1-5 billion                    | 8
          >5 billion                     | 16
      * Distributed search improves latency but may not always be faster, as search latency is limited by the time taken by the last shard to respond.
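As a rough illustration, the records-to-shards table above can be written as a lookup function. The slide does not say which row applies exactly at a boundary (for example at 500 million or 1 billion records), so the boundary handling below is an assumption, as are the function names.

```python
import math

MILLION = 1_000_000
BILLION = 1_000_000_000


def max_shards(record_count: int) -> int:
    """Return the max number of shards (cores) allowed for a table of this size.

    Thresholds follow the slide's table; behaviour exactly at a
    boundary value is an assumption, since the slide leaves it open.
    """
    if record_count < 100 * MILLION:
        return 1
    if record_count <= 300 * MILLION:
        return 2
    if record_count < 500 * MILLION:
        return 4
    if record_count < BILLION:
        return 6
    if record_count <= 5 * BILLION:
        return 8
    return 16


def shards_at_fixed_size(record_count: int, docs_per_shard: int) -> int:
    """Number of shards needed if every shard holds a fixed number of documents."""
    return math.ceil(record_count / docs_per_shard)
```

For the largest table in the talk, a fixed 100 million documents per shard gives `shards_at_fixed_size(12 * BILLION, 100 * MILLION)` = 120 shards, while the hybrid table caps it at `max_shards(12 * BILLION)` = 16.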
  • 14. It worked fine but one day suddenly... java.lang.OutOfMemoryError: Java heap space
      • All Solr servers crashed
      • We restarted them, but they kept crashing randomly every other day
      • Took a heap dump and realized that it was due to the Field Cache
      • Found a solution: doc values, and we have never had this issue again to date
  • 15. Doc Values (Life Saver)
      • Disk-based field data, a.k.a. doc values
      • Document-to-value mapping built at index time
      • Stores field values on disk (or in memory) in a column-stride fashion
      • Better compression than the Field Cache
      • Replacement for the Field Cache (not completely)
      • Quite suitable for custom scoring, sorting and faceting
      References:
      • Old article (but good): http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
      • https://cwiki.apache.org/confluence/display/solr/DocValues
      • http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/doc-values.html
  • 16. Scaling and Making Search Faster…
  • 17. Partitioning • 3-level partitioning: by Month, Country and Table name • Each month has its own cluster and a Cluster Manager. • Latency and throughput are a tradeoff; you can't have both at the same time. [Diagram: Internet, Load Balancer, Web Server Farm, App Servers, Master Cluster Manager, Search Heads (US, UK, AU), and Solr nodes 1..n. The App Server fetches the cluster mapping and makes a request to the Search Head with the respective Solr cores for that Country, Month and Table. Cluster 1 serves one month (e.g. Jan), Cluster 2 another month (e.g. Feb). Each cluster has an Active (A) and a Passive (P) Cluster Manager with real-time sync. 1 user: 24 UI widgets, 24 Ajax requests, 41 search requests.]
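The lookup described on slide 17 (fetch the cluster mapping, then query the Search Head with the cores for a given Month, Country and Table) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the host names, core names, the in-memory `CLUSTER_MAPPING` dictionary and the `build_search_url` helper are hypothetical, and the real Cluster Manager is a custom ZooKeeper-based service. The only Solr feature relied on is the standard `shards` request parameter for distributed search.

```python
from urllib.parse import urlencode

# Hypothetical cluster mapping: (month, country, table) -> Solr cores.
# In the talk this mapping is served by the Cluster Manager.
CLUSTER_MAPPING = {
    ("2014-01", "US", "keywords"): [
        "node1:8983/solr/keywords_us_1",
        "node2:8983/solr/keywords_us_2",
    ],
}

# Hypothetical Search Head per country (a Solr server that holds no data).
SEARCH_HEADS = {"US": "http://searchhead-us:8983/solr/head"}


def build_search_url(month: str, country: str, table: str, query: str) -> str:
    """Build a distributed-search URL for one partition."""
    cores = CLUSTER_MAPPING[(month, country, table)]
    params = urlencode({"q": query, "shards": ",".join(cores)})
    return f"{SEARCH_HEADS[country]}/select?{params}"


if __name__ == "__main__":
    print(build_search_url("2014-01", "US", "keywords", "domain:example.com"))
```

Keeping the partition key as a tuple makes the three partitioning levels explicit and means a query only ever touches the cores for one month, country and table.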
  • 18. Index Optimization Strategy • Running optimization on 200+ Solr cores is very time consuming • Solr cores with a bigger index size (~70GB) have 2500+ segments due to the higher merge factor used while indexing. • It can't be run in parallel on all cores of a single machine, as it depends heavily on CPU and disk I/O • Optimizing large segments down to a very small number is very time consuming and can take up to 3x the index size on disk • On the other hand, a small number of segments improves performance drastically, so a balance is needed. • Optimizing a Solr core into a very small number of segments takes a huge amount of time, so we do it iteratively. Algo: choose max 3 cores on a machine to optimize in parallel, starting with the smallest index. Determine the max segments allowed from the index size and number of docs, then reduce segments to 0.90x the current count in each run. [Diagram: the Solr Optimizer fetches the cluster mapping (list of all cores) from the Staging Cluster Manager and works through the Solr cores on Node 1, Node 2, ... Once optimization and cache warm-up are done, it pushes the cluster mapping to the Production Cluster Manager, making all indices live.] * As per our observation for our data, the optimization process takes ~42-46 seconds for 1GB. We need to do it for 4.6TB (including all boxes), the total Solr index size for a single month.
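The iterative approach on slide 18 can be sketched as follows. This is a hedged illustration, not the Solr Optimizer from the talk: the function names are hypothetical, how "max segments allowed" is derived from index size and doc count is not given on the slide and is therefore just an input here, and the time estimate assumes 1TB = 1000GB with all cores optimized one after another.

```python
def segment_schedule(current_segments: int, max_segments_allowed: int,
                     keep_percent: int = 90) -> list[int]:
    """Segment targets for successive optimize runs (each ~0.90x the last)."""
    if max_segments_allowed < 1 or not 0 < keep_percent < 100:
        raise ValueError("invalid limits")
    targets = []
    segments = current_segments
    while segments > max_segments_allowed:
        # Integer math avoids float rounding; never go below the allowed max.
        segments = max(max_segments_allowed, segments * keep_percent // 100)
        targets.append(segments)
    return targets


def next_batch(index_size_gb_by_core: dict[str, float],
               max_parallel: int = 3) -> list[str]:
    """Pick at most `max_parallel` cores on a machine, smallest index first."""
    ordered = sorted(index_size_gb_by_core, key=index_size_gb_by_core.get)
    return ordered[:max_parallel]


def sequential_hours(total_gb: float, seconds_per_gb: float) -> float:
    """Rough total optimize time if every GB were processed one after another."""
    return total_gb * seconds_per_gb / 3600


if __name__ == "__main__":
    print(segment_schedule(2500, 2000))
    print(next_batch({"core_a": 70.0, "core_b": 12.5, "core_c": 30.0, "core_d": 5.0}))
    print(sequential_hours(4600, 42), sequential_hours(4600, 46))
```

Each target in the schedule would be passed as the max-segments argument of one optimize run, so no single run has to merge the whole index at once.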
  • 19. Finally, after optimization and cache warm-up… A shard looks like this. [Figure: max segments after optimization]
  • 20. External Caching • In distributed search, a repeated query still has to hit all Solr servers, even though the result is served from Solr's cache. This increases search latency and lowers throughput. • Solution: cache the most frequently accessed query results in the app layer (LRU-based eviction) • We use Redis for caching • Results of all complex aggregation queries, once fetched from multiple Solr servers, are served from cache on subsequent requests. Why Redis… • Advanced in-memory key-value store • Insanely fast • Response time on the order of 5-10ms • Provides cache behavior (set, get) with advanced data structures such as hashes, lists, sets, sorted sets, bitmaps etc. • http://redis.io/
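The app-layer caching idea on slide 20 can be illustrated with a small in-process LRU cache. This is only a stand-in under stated assumptions: the talk uses Redis, not a Python dictionary, and the class `LRUQueryCache`, the helper `cached_search` and the `run_distributed_search` callback are hypothetical names introduced for this sketch.

```python
from collections import OrderedDict


class LRUQueryCache:
    """Tiny in-process LRU cache, standing in for the Redis cache cluster."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used


def cached_search(cache, query_key, run_distributed_search):
    """Serve a repeated query from cache; hit the Solr servers only on a miss."""
    result = cache.get(query_key)
    if result is None:
        result = run_distributed_search(query_key)
        cache.put(query_key, result)
    return result


if __name__ == "__main__":
    cache = LRUQueryCache(capacity=2)
    fake_search = lambda q: {"query": q, "hits": 42}  # placeholder backend
    print(cached_search(cache, "keyword:solr", fake_search))
    print(cached_search(cache, "keyword:solr", fake_search))  # served from cache
```

The point of caching above the Search Head is that a hit avoids the fan-out to every shard entirely, which Solr's own per-core caches cannot do.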
  • 21. Hardware • We use bare-metal, dedicated servers for Solr for the following reasons: 1. Performance gain (with virtual servers, performance dropped by ~18-20%) 2. Better value of computing power per $ spent • 2.6GHz, 32 cores (4x8 core), 384GB RAM, 6TB SAS 15k (RAID10) • 2.6GHz, 16 cores (2x8 core), 192GB RAM, 4TB SAS 15k (RAID10) • Since the index size is 4.6TB/month, we want bigger RAM so that more data fits in the disk cache. SSD vs SAS 1. SSD: peak indexing rate (MySQL to Solr): 330k docs/sec (each doc: ~100-125 bytes) 2. SAS 15k: 182k docs/sec (dropped by ~45%) 3. SAS 15k is considerably cheaper than SSD for bigger disks. 4. We are using SAS 15k because it is cost effective, but plan to move to SSD in future.
  • 22. Conclusion: Key Takeaways General: • Understand the characteristics of the data and partition it well. Cache: • Spend time analyzing cache usage and tune the caches. A cache hit is 10x-50x faster. • Use a Filter Query (fq) wherever possible, as it improves performance through the Filter Cache. GC: • Keep the JVM heap small (in proportion to the machine's RAM) and leave enough RAM for the kernel, as a bigger heap leads to frequent GC. A 4GB to 8GB heap is a good range, but we use 12GB/16GB. • Keep an eye on garbage collection (GC) logs, especially Full GC. Tuning Params: • Don't use soft commit if you don't need it, especially in batch loading • Always explore tuning Solr for high performance, e.g. ramBufferSize, MergeFactor and HttpShardHandler's various configurations. • Use hashes in Redis to minimize memory usage. Read the whole experience for more detail: http://rahuldausa.wordpress.com/2014/05/16/real-time-search-on-40-billion-records-month-with-solr/
  • 23. Thank you! Twitter: @rahuldausa dynamicrahul2020@gmail.com http://www.linkedin.com/in/rahuldausa