Full text search for lazy guys
STARRING APACHE SOLR
Agenda
• Introduction
• FTS solutions
• FTS patterns
• Apache Solr
• Architecture
• Client libraries
• Data treatment pipeline
• Index modeling
• Ingestion
• Searching
• Demo 1
• Solr in clustered environment
• Architecture
• Indexing
• Querying
• Demo 2
• Advanced Solr
• Cool features overview
• Performance tuning
• Q&A sessions
FTS solutions attributes
1. Search by content of documents rather than by attributes
2. Read-oriented
3. Flexible data structure
4. A dedicated, tailored index is built and then used for search
5. The index contains unique terms and their positions across all documents
6. Indexer takes into account language-specific nuances like stop words, stemming,
shingling (word-grams, common-grams)
FTS architectures
[Diagram: two common approaches – an FTS index built inside the RDBMS next to the attribute columns (Id, Price, Weight, Description), versus a dedicated FTS server holding its own index]
FTS usage patterns
1. Spell checking
2. Full text search
3. Highlighting
FTS usage patterns
1. Suggestions
2. Faceted search
3. Paging
Market leaders
FTS scope
Q&A
Solr
• True open source (under Apache) full text search engine
• Built over Lucene
• Multi-language support
• Rich document parsing (rtf, pdf, …)
• Various client APIs
• Versatile query language
• Scalable
• Full of additional features
Well-known Solr users
and many others in https://wiki.apache.org/solr/PublicServers
Architecture
Client access
1. Main REST API
• Common operations
• Schema API
• Rebalance/collection API
• Search API
• Faceted API
2. Native JAVA client SolrJ
3. Client bindings like Ruby, .Net, Python, PHP, Scala – see
https://wiki.apache.org/solr/IntegratingSolr +
https://wiki.apache.org/solr/SolPython
4. Parallel SQL (via REST and JDBC)
Inverted index
Index modeling
Choose Solr mode:
1. Schema
2. Schema-less
Define field attributes:
1. Indexed (query, sort, facet, group by, provide query suggestions for, execute
function)
2. Stored – all fields which are intended to be shown in a response
3. Mandatory
4. Data type
5. Multivalued
6. Copy field (calculated)
Choose a field to act as the unique key (<uniqueKey>)
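For illustration only – a minimal schema.xml sketch covering these attributes (field names and types are assumptions, not taken from the slides):
<!-- unique key field: required, indexed and stored -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- searchable text field -->
<field name="internal_name" type="text_general" indexed="true" stored="true"/>
<!-- multivalued field -->
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- copy field: collect searchable text into one catch-all field -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="internal_name" dest="text"/>
<uniqueKey>id</uniqueKey>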
Field data types
1. Dates
2. Strings
3. Numeric
4. Guid
5. Spatial
6. Boolean
7. Currency, etc.
Real-life schema
Text processing
Intended to normalize differences between terms so that queries match the documents a user actually means
Text processing
Set of filters to get desired results
Text processing
Set of filters to get desired results
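As a rough sketch (the exact chain is not shown on the slide), such a filter set is declared on a field type in schema.xml, e.g. a tokenizer followed by stop-word removal, lowercasing and stemming:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drop stop words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- normalize case -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- reduce words to their stems (developers -> develop) -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>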
Ingestion
Transaction management
1. Solr doesn’t immediately expose newly indexed data, nor does it immediately remove deleted data
2. A commit/rollback has to be issued
Commit types:
1. Soft
Data is indexed in memory
2. Hard
Data is flushed to the hard drive
Risks:
1. Commits are slow
2. Many simultaneous commits could lead to Solr exceptions (too many commits)
<h2>HTTP ERROR: 503</h2>
<pre>Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.</pre>
3. The commit command works at the instance level, not per user
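One common way to limit manual commits is to let Solr commit automatically; a minimal solrconfig.xml sketch (the interval values are assumptions) could look like:
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk, but don't open a new searcher every time -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: make new documents visible to searches more frequently -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>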
Transaction log
Intention:
1. Recovery/durability
2. Nearly-Real-Time (NRT) update
3. Replication for Solr cloud
4. Atomic document update, in-place update (syntax is different)
5. Optimistic concurrency
The transaction log can be enabled in solrconfig.xml:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
Atomic update example:
{"id":"mydoc",
"price":{"set":99},
"popularity":{"inc":20},
"categories":{"add":["toys","games"]},
"promo_ids":{"remove":"a123x"},
"tags":{"remove":["free_to_try"," on_sale"]}
}
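Optimistic concurrency (point 5 above) relies on the mandatory _version_ field: if a supplied _version_ no longer matches the stored one, the update is rejected with a conflict. A hedged example (URL and values are illustrative):
curl http://192.168.77.65:8983/solr/single-core/update?commit=true -H 'Content-type:application/json' -d '
[
{"id" : "mydoc",
"price" : 99,
"_version_" : 1486581355536973824
}
]'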
Data modification REST API
The REST API accepts:
1. JSON objects
2. XML update commands
3. CSV
Solr UPDATE = UPSERT if schema.xml defines a <uniqueKey>
Data modification REST API
curl http://192.168.77.65:8983/solr/single-core/update?commit=true -H 'Content-type:application/json' -d '
[
{"id" : "3",
"internal_name" : "post 2"
},
{"id" : "1",
"internal_name" : "post 1"
}
]'
Data.xml
<add>
<doc>
<field name='id'>8</field>
<field name='internal_name'>test1</field>
</doc>
<doc>
<field name='id'>9</field>
<field name='internal_name'>test6</field>
</doc>
</add>
curl -X POST 'http://192.168.77.65:8983/solr/single-core/update?commit=true&wt=json' -H 'Content-Type:text/xml' -d @data.xml
Delete.xml
<delete>
<id>11604</id>
<id>11603</id>
</delete>
Delete_with_query.xml
<delete>
<query>id:[1 TO 85]</query>
</delete>
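These delete files can be posted through the same update endpoint as data.xml above (host and core name as in the earlier examples):
curl -X POST 'http://192.168.77.65:8983/solr/single-core/update?commit=true' -H 'Content-Type:text/xml' -d @delete.xml
curl -X POST 'http://192.168.77.65:8983/solr/single-core/update?commit=true' -H 'Content-Type:text/xml' -d @delete_with_query.xml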
Post utility
1. Java-written utility
2. Intended to load files
3. Works extremely fast
4. Loads csv, json
5. Loads files by mask or file-by-file
bin/post -c http://localhost:8983/cloud tags*.json
ISSUE: doesn’t work with Solr Cloud
Data import handler
1. Solr loads data itself
2. DIH can access JDBC, Atom/RSS, HTTP, XML and SMTP data sources
3. A delta approach can be implemented (separate statements for new, updated and deleted data)
4. Loading progress can be tracked
5. Various transformations can be applied inside (regexp, conversion, JavaScript)
6. Custom data source loaders can be implemented in Java
7. Web console to run/monitor/modify
Data import handler
How to implement:
1. Create data config
<dataConfig>
<dataSource name="jdbc" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost/db"
user="admin" readOnly="true" autoCommit="false" />
<document>
<entity name="artist" dataSource="jdbc" pk="id"
query="select *from artist a"
transformer="DateFormatTransformer"
>
<field column="id" name="id"/>
<field column="department_code" name="department_code"/>
<field column="department_name" name="department_name"/>
<field column = "begin_date" dateTimeFormat="yyyy-MM-dd" />
</entity>
</document>
</dataConfig>
2. Publish in solrconfig.xml
<requestHandler name="/jdbc"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">jdbc.xml</str>
</lst>
</requestHandler>
DIH can be started via a REST call
curl http://localhost:8983/cloud/jdbc -F command=full-import
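The import runs asynchronously, so its progress (the busy/idle responses on the next slides) can be polled with the status command, and a delta run can be triggered the same way (assuming the handler is registered as /jdbc):
curl http://localhost:8983/cloud/jdbc -F command=status
curl http://localhost:8983/cloud/jdbc -F command=delta-import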
Data import handler
In process:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">jdbc.xml</str>
</lst>
</lst>
<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:1:15.460</str>
<str name="Total Requests made to DataSource">39547</str>
<str name="Total Rows Fetched">59319</str>
<str name="Total Documents Processed">19772</ str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2010-10-03 14:28:00</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</ str>
</response>
Data import handler
After import:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">jdbc.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">2118645</str>
<str name="Total Rows Fetched">3177966</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2010-10-03 14:28:00</str>
<str name="">Indexing completed. Added/Updated: 1059322 documents. Deleted 0
documents.</str>
<str name="Committed">2010-10-03 14:55:20</str>
<str name="Optimized">2010-10-03 14:55:20</str>
<str name="Total Documents Processed">1059322</str>
<str name="Time taken ">0:27:20.325</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the
future.</str>
</response>
Search
Search
Search types
Fuzzy
Developer~ Developer~1 Developer~4
It matches developer, developers, development, etc.
Proximity
“solr search developer”~ “solr search developer”~1
It matches: solr search developer, solr senior developer
Wildcard
Deal* Com*n C??t
Need *xed? Add ReversedWildcardFilterFactory.
Range
[1 TO 25] {23 TO 50} {23 TO 90]
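Assuming the collection is named cloud, as in the other examples, a fuzzy query could be sent like this:
curl "http://localhost:8983/solr/cloud/select?q=internal_name:developer~1&wt=json"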
Search characteristics
1. Similarity
2. Term frequency
Similarity could be changed via boosting:
q=title:(solr for developers)^2.5 AND description:(professional)
q=title:(java)^0.5 AND description:(professional)^3
Search result customization
Field list
/query?=&fl=id, genre /query?=&fl=*,score
Sort
/query?=&fl=id, name&sort=date, score desc
Paging
select?q=*:*&sort=id&fl=id&rows=5&start=5
Transformers
[docid] [shard]
Debugging
/query?=&fl=id&debug=true
Format
/query?=&fl=id&wt=json /query?=&fl=id&wt=xml
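Putting several of these options together (the collection name cloud and the field names are assumptions), a request might look like:
curl "http://localhost:8983/solr/cloud/select?q=*:*&fl=id,internal_name,score&sort=score+desc&rows=10&start=0&wt=json&debug=true"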
Search queries examples
Parameter style
curl "http://localhost:8983/cloud/ query?q=heroy&fq=inStock:true"
JSON API
$ curl http://localhost:8983/cloud/query -d '
{
query:"hero"
"filter" : "inStock:true"
}'
Response
{
"responseHeader":{
"status":0,
"QTime":2,
"params":{
"json":"n{n query:"hero" "filter" : "inStock:true" n}"}},
"response":{"numFound":1,"start":0,"docs":[
{
"id":"book3",
"author":"Brandon Sanderson",
"author_s":"Brandon Sanderson",
"title":["The Hero of Aages"],
"series_s":"Mistborn",
"sequence_i":3,
"genre_s":"fantasy",
"_version_":1486581355536973824
}]
}
}
Q&A
SolrCloud
Alex's turn!
Advanced Solr
1. Streaming language
Special language tailored mostly for Solr Cloud, parallel processing, map-reduce style
approach. The idea is to process and return big datasets.
Commands like: search, jdbc, intersect, parallel, or, and
2. Parallel query
JDBC/REST to process data in SQL style. Works on many Solr nodes in MPP style.
curl --data-urlencode 'stmt=SELECT to, count(*) FROM collection4 GROUP BY to ORDER BY count(*) desc LIMIT 10'
http://localhost:8983/solr/cloud/sql
3. Graph functions
Graph traversal, aggregations, cycle detection, export to GraphML format
4. Spatial queries
There is a Location field data type. It permits spatial conditions like filtering by distance (circle, square, sphere), etc.
&q=*:*&fq=(state:"FL" AND city:"Jacksonville")&sort=geodist()+asc
5. Spellchecking
It can be based on the current index, another index, a file, or word breaks. There are many options for what to return: most similar, more popular, etc.
http://localhost:8983/solr/cloud/spell?df=text&spellcheck.q=delll+ultra+sharp&spellcheck=true
6. Suggestions
http://localhost:8983/solr/cloud/a_term_suggest?q=sma&wt=json
7. Highlighter
Marks matching fragments in found documents
http://localhost:8983/solr/cloud/select?hl=on&q=apple
8. Facets
Arrangement of search results into categories based on indexed terms, with statistics. Can be done by values, ranges, dates, intervals, heatmaps
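As a small illustration of field faceting (the field name genre_s is taken from the sample response earlier; the collection name is an assumption):
curl "http://localhost:8983/solr/cloud/select?q=*:*&rows=0&facet=true&facet.field=genre_s&wt=json"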
Performance tuning: Cache
Be aware of Solr cache types:
1. Filter cache
Holds unordered document identifiers associated with filter queries that have
been executed (only if fq query parameter is used)
2. Query result cache
Holds ordered document identifiers resulting from queries that have been
executed
3. Document cache
Holds Lucene document instances for access to fields marked as stored
Identify most suitable cache class
1. LRUCache – least recently used entries are evicted first; tracks access recency
2. FastLRUCache – the same, but cache cleanup can run in a separate thread
3. LFUCache – least frequently used are evicted first, track usage count
Play with auto-warm
<filterCache class="solr.FastLRUCache" size="512" initialSize="100" autowarmCount="10"/>
Be aware of how auto-warm works internally – it doesn’t delete data; the cache is repopulated completely
Performance tuning: Memory
Leave enough OS memory for disk caching
Estimate the Java heap size for Solr properly – use
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_0/dev-tools/size-estimator-lucene-solr.xls
Performance tuning: Schema design
1. Try to decrease the number of stored fields; mark fields as indexed only where possible
2. If fields are used only to be returned in search results, make them stored only
Performance tuning: Ingestion
1. Send documents in bulk rather than one per request
2. If you use SolrJ, use the ConcurrentUpdateSolrServer class
3. Disable ID uniqueness checking
4. Identify the proper mergeFactor + maxSegments for Lucene segment merging
5. Issue OPTIMIZE after huge bulk loads
6. If you use DIH, try not to use transformers – push that work down to the DB level in SQL
7. Configure AUTOCOMMIT properly
Performance tuning: Search
1. Choose the appropriate query parser based on the use case
2. Use Solr pagination to return data without long waits
3. If you return a huge data set, use Solr cursors rather than deep pagination (see the cursorMark sketch after this list)
4. Use the fq clause to speed up queries with an equality condition – no time is spent on scoring and the results are put in the filter cache
5. If you have a lot of stored fields but queries don’t show all of them, use field lazy loading
<enableLazyFieldLoading>true</enableLazyFieldLoading>
6. Use shingling to make phrase search faster
<filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
<filter class="solr.CommonGramsQueryFilterFactory" words="commongrams.txt" ignoreCase="true"/>
Q&A
THANK YOU
AND
WE ARE HIRING!
Alexander Tokarev
Senior Developer, DataArt
atokarev@dataart.com
Editor's notes

  1. Hello. My name is Alex, and today we are going to tell you about full text search solutions. The project I’m currently working on doesn’t use a dedicated search server, and that leads to some issues. We decided to check how we could address them using tailored software. To build a POC we chose Apache Solr and would like to share our experience with you. Alex, on behalf of the DevOps team, will show us how to achieve fault tolerance and scalability.
  2. I plan to have intermediate breaks for small Q&A sessions.
  3. What distinguishes FTS solutions from other databases? Do you know what stemming is? It is word normalization, i.e. drive, drove and driven will all be written as drive. Consider the text "The quick brown fox jumped over the lazy dog". The use of shingling in a typical configuration would yield the indexed terms (shingles) "the quick", "quick brown", "brown fox", "fox jumped", "jumped over", "over the", "the lazy", and "lazy dog" in addition to all of the original nine terms. Common-grams is a more selective variation of shingling that only shingles when one of the consecutive words is in a configured list. Given the preceding sentence and an English stop word list, the indexed terms would be "the quick", "over the", "the lazy", and the original nine terms.
  4. There are two common approaches: the FTS index is created inside the main database, or a dedicated FTS server is used. Which solution is better? It depends on your tasks, performance and scalability requirements. Obviously, FTS servers offer a rich function set but require hardware, administration and development overhead. We will concentrate on the dedicated FTS server.
  5. Although FTS solutions look like they are intended for content search only, the spectrum of their usage patterns is rather broad.
  6. Pay attention that the figures are calculated by the faceted search engine. Suggestions can be tailored for a particular user. All these patterns are exposed via the FTS API, which permits reusing them without wasting time.
  7. Please note that Lucene and Xapian are sets of libraries. Elasticsearch and Solr, for instance, are based on Lucene.
  8. Full text search is rather sophisticated across the enterprise because it affects all aspects of a system. We will look into some of these aspects during the last part of our presentation. Any questions before we move to the Apache Solr world?
  9. It is worth mentioning that initially it was a full text search engine – now I would rather call it a search engine.
  10. Solr is a J2EE application which, as I mentioned, uses the Lucene library. The storage layer keeps metadata and the inverted index in a file store; Solr can also be configured to store them on HDFS. Components: container, Lucene, DIH (imports data from external sources), Velocity templates (the UI of the Solr admin tool), request handlers (process user requests: search, schema management, etc.).
  11. Solr has a REST API for the main operations like search and indexing. Solr developers state there are several groups of APIs. The main idea was that the Solr API should be transparent enough to work without any additional payload – only by URI (in contrast to Elastic) – but queries become more complicated and the URI looks unreadable. SolrJ is included in the Solr distribution.
  12. This is the main structure. Please note that stemming and stop words aren’t applied. As you can see, it stores positions as well. This is done for phrase queries like “New Car”.
  13. Data types
  14. Show a real schema.
  15. Let’s have a look at the contents of an ideal inverted index.
  16. The ASCII-folding filter removes accented characters; the first filter removes repeated letters like “cofeeeeee”. Why isn’t the synonym filter linked? It is actually applied at query time rather than at indexing time.
  17. Let’s have a look at the contents of an ideal inverted index.
  18. Rollback + NRT + soft/hard commit. Point 3 means that if one user issues a commit, the changes of other users will be committed as well. There are also autoCommit and commitWithin – they specify a time frame.
  19. Point 4 – update only a small part of the document rather than reindex it entirely; without it, the whole document has to be reloaded for an update. In-place updates work only for docValues fields. Point 5 is based on the mandatory _version_ field.
  20. Ordinal, json, xml, csv, rtf, csv
  21. My favorite feature
  22. Data import handler
  23. Data import handler
  24. Data import handler
  25. Query parsers
  26. Pay attention to the searcher – it reads a read-only snapshot of the Lucene index. Once we commit, the searcher is reopened, which leads to cache invalidation. The searcher uses a query parser; there are three of them, but we will concentrate on the most commonly used Lucene query parser.
  27. ~ specifies the number of replacements, the so-called edit distance. Proximity is the same as fuzzy, but the edit distance is in terms of words. Please note that we don’t consider function usage, cross-index and cross-document joins, or faceting.
  28. About boosting, relevancy, similarity.
  29. Only stored fields are returned. To load huge datasets, so-called cursors are used – out of scope here. Pay attention to the score – it is the search relevancy measure. You can manage it via boosting.
  30. We will look at more examples in the demo, including debug output.
  31. These features are chosen by my own interest. Solr has some advanced features which are outside the presentation but should be mentioned. Streams use a tailored lightweight JSON format for decent volumes of data (source, decorator, evaluator). Points 2 and 3 are based on point 1; point 3 is used for recommendation engines. Point 8 is the most complicated: two APIs and a lot of performance tricks.
  32. The Administration Console reports cache statistics (Plugin/Stats | Cache). There are additional caches which are out of our control – the field cache and the field value cache. There is also an interface to implement your own caching strategy as well as warm-up. The document cache should be sized larger than max results * max concurrent queries being executed by Solr, to prevent documents from being re-fetched during a query.
  33. ConcurrentUpdateSolrServer uses many threads to connect to Solr, as well as compression, to deliver documents faster. Remove the QueryElevationComponent from solrconfig.xml. The more static your content is (that is, the less frequently you need to commit data), the lower the merge factor you want. Check the number of segments in the Overview screen’s Statistics section. Avoid unnecessary term vectors, docValues, etc.