2. 2
MeetUp May 8th 2011
– Welcome
• Background for the MeetUp
– (Commercial break)
– Round of introductions
– Wishes for the MeetUp group (discussion)
– Lightning talks, 10 min each (approx. 18:30-19:00)
• Sture Svensson: "Querying Solr in various ways"
• Jan Høydahl: "What can I do with SolrCloud today"
• NN?
– Formal end (approx. 19:15)
– Mingling...
3. 3
Scaling & HA (redundancy)
– Index up to 25-100 million documents on a single server*
• Scale linearly by adding servers (shards)
– Query up to 50-1000 QPS on a single server
• Scale linearly by adding servers (replicas)
– Add redundancy or backup through extra replicas
– Built-in software Load Balancer, auto failover
– Indexing redundancy is not available out of the box
• But it is possible to have every row do both indexing and search
– High Availability for config/admin using Apache ZooKeeper
(TRUNK)
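The scaling rules on this slide reduce to simple arithmetic: shards grow with document count, rows (replicas) grow with query load. The sketch below is a rough sizing aid only. The function name `plan_cluster` and the workload numbers are hypothetical, and it assumes every query fans out to all shards in a row, so one row sustains roughly one server's QPS.

```python
import math

def plan_cluster(total_docs, peak_qps, docs_per_server, qps_per_server):
    """Return (shards, rows, servers) for a grid of shards x rows.

    Assumption: a distributed query touches every shard in one row,
    so a whole row delivers about qps_per_server.
    """
    shards = math.ceil(total_docs / docs_per_server)  # split the index
    rows = math.ceil(peak_qps / qps_per_server)       # copy it for QPS
    return shards, rows, shards * rows

# Hypothetical workload: 180M documents, 400 QPS peak,
# with each server handling 50M documents and 150 QPS.
shards, rows, servers = plan_cluster(180_000_000, 400, 50_000_000, 150)
print(shards, rows, servers)
```

Adding one more row beyond the computed minimum is the simplest way to get the redundancy mentioned above, since the cluster then survives the loss of a server without dropping below the required QPS.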
5. 5
Replication
– Goals:
• Increase QPS capacity
• High availability of search
– Replication adds another "search row"
– Done as a PULL: the slave fetches the index from the master
– ReplicationHandler is configured in solrconfig.xml
http://wiki.apache.org/solr/SolrReplication
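Because replication is a pull, it is the slave that gets asked to fetch. The sketch below only builds URLs for the ReplicationHandler HTTP commands described on the wiki page above (`indexversion`, `fetchindex`, `details`). It assumes the handler is registered under the name `/replication`; the host names and the helper `replication_url` are hypothetical.

```python
from urllib.parse import urlencode

def replication_url(host, port, command, core=None, **params):
    """Build a ReplicationHandler URL (assumes handler name /replication)."""
    path = "/solr/" + (core + "/" if core else "") + "replication"
    query = urlencode({"command": command, **params})
    return f"http://{host}:{port}{path}?{query}"

# Ask the master which index version it currently offers.
print(replication_url("master-host", 8983, "indexversion"))
# Tell a slave to pull the latest index from its configured master now.
print(replication_url("slave-host", 8983, "fetchindex"))
# Inspect replication status on the slave.
print(replication_url("slave-host", 8983, "details"))
```

In normal operation the slave polls on its own schedule; forcing `fetchindex` by hand is mainly useful when testing a new search row.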
6. 6
Sharding
– Goals:
• Split an index too large for one box into smaller chunks
• Lower HW footprint by smart partitioning of data
– News search: One shard for last month, one shard per year
• Lower latency by having smaller index per node
– A shard is a core which participates in a collection
• Shards A and B may thus be on different or same host
• Shards A and B should, but need not, share a schema
– Shard distribution must be done by client application,
adding documents to correct shard based on some policy
• Most common policy is hash-based distribution
• May also be date based or whatever client chooses
– Work under way to add shard distribution natively to Solr,
see SOLR-2358
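Since the client chooses the shard, the two policies mentioned above can be sketched in a few lines. This is an illustration only: the function names, the use of MD5 as a stable hash, the 30-day cutoff, and the shard labels are all assumptions, not something Solr prescribes.

```python
import hashlib
from datetime import date

def hash_shard(doc_id: str, num_shards: int) -> int:
    """Hash-based policy: map a document id to a shard index.

    Uses MD5 rather than Python's built-in hash(), which is not
    stable across processes for strings.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def date_shard(published: date, today: date) -> str:
    """Date-based policy (news search): one recent shard, one per year."""
    if (today - published).days <= 30:
        return "last-month"
    return f"year-{published.year}"

print(hash_shard("listing-4711", 2))
print(date_shard(date(2011, 5, 1), date(2011, 5, 8)))
print(date_shard(date(2009, 3, 1), date(2011, 5, 8)))
```

With modulo hashing, changing the number of shards moves most documents to a different shard, so the shard count is best fixed up front or changed together with a full reindex.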
7. 7
Solr Cloud
– Solr Cloud is the popular name for an initiative to make Solr
more easily scalable and manageable in a distributed world
– Enables centralized configuration and cluster status
monitoring
– Solr TRUNK contains the first features
• Apache ZooKeeper support, including built-in ZK
• Support for easy distrib=true query (by means of ZK)
• NOTE: Still experimental, work in progress
– Expected features to come
• Auto index shard distribution using ZK
• Tools to manage the config in ZK
• Easy addition of row/shard through API
– NOTE: We do not know when SolrCloud will be included in
a released version of Solr. If you need it, use TRUNK
http://wiki.apache.org/solr/SolrCloud
8. 8
Solr Cloud...
– Setting up SolrCloud for our YP example
• We'll set up a 4-node cluster on our laptops using four
instances of Jetty, on different ports
• We'll have 2 shards, each with one replica
• We'll index 5000 listings to each shard
• And finally do distributed queries
• For convenience, we'll use the ZK shipping with Solr
– Bootstrapping ZooKeeper to create a config "yp-conf"
• java -Dbootstrap_confdir=./solr/conf
-Dcollection.configName=yp-conf -DzkRun -jar start.jar
– Starting the other Jetty nodes
• java -Djetty.port=<port> -DhostPort=<port>
-DzkHost=localhost:9983 -jar start.jar
– ZooKeeper admin
• http://localhost:8983/solr/yp/admin/zookeeper.jsp
http://wiki.apache.org/solr/SolrCloud
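The two command templates above can be expanded for all four nodes. The sketch below only prints the command lines; it does not start anything. The helper `start_commands` is hypothetical, the ports are the ones used in the 2x2 setup of this deck, and it assumes each command is run from its own copy of the Solr/Jetty directory, with the first node on Jetty's default port 8983 running the embedded ZooKeeper on 9983.

```python
def start_commands(ports, zk_host="localhost:9983", conf_name="yp-conf"):
    """Return one java command line per node, first node first."""
    cmds = []
    for i, port in enumerate(ports):
        if i == 0:
            # First node: uploads the config to ZooKeeper and runs the
            # embedded ZooKeeper (-DzkRun); it uses the default Jetty port.
            cmds.append("java -Dbootstrap_confdir=./solr/conf "
                        f"-Dcollection.configName={conf_name} "
                        "-DzkRun -jar start.jar")
        else:
            # Remaining nodes: own Jetty port, pointed at the first node's ZK.
            cmds.append(f"java -Djetty.port={port} -DhostPort={port} "
                        f"-DzkHost={zk_host} -jar start.jar")
    return cmds

for cmd in start_commands([8983, 7973, 6963, 5953]):
    print(cmd)
```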
9. 9
Solr Cloud...
– Solr Cloud will resolve all shards and replicas in a
collection based on what is configured in solr.xml
– Querying /solr/yp/select?q=foo&distrib=true on this core will
cause SolrCloud to resolve the core name to its collection,
"yp-cloud", and then distribute the request to each of the
shards which are members of that collection
– Often, the core name and collection name will be the same
– SolrCloud will load balance between replicas within the
same shard
http://wiki.apache.org/solr/SolrCloud
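From the client's point of view, a distributed query is an ordinary select request with one extra parameter. The sketch below builds that URL; the helper name `distributed_query_url` is hypothetical, and the host, port, and core name follow the YP example in this deck.

```python
from urllib.parse import urlencode

def distributed_query_url(host, port, core, q):
    """Build a select URL that asks SolrCloud to fan the query out."""
    params = urlencode({"q": q, "distrib": "true"})
    return f"http://{host}:{port}/solr/{core}/select?{params}"

# Any node in the collection can receive the request.
print(distributed_query_url("localhost", 8983, "yp", "foo"))
```

The client does not list shards or replicas anywhere; that lookup is what SolrCloud takes over via ZooKeeper.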
10. 10
Solr Cloud, 2x2 setup
– localhost:8983
• Run ZK: localhost:9983
• Core: yp
• Shard: A (master)
• Collection: yp-collection
– localhost:7973
• Run ZK: no
• -DzkHost=localhost:9983
• Core: yp
• Shard: B (master)
• Collection: yp-collection
– localhost:6963
• Run ZK: no
• -DzkHost=localhost:9983
• Core: yp
• Shard: A (replica)
• Collection: yp-collection
– localhost:5953
• Run ZK: N/A
• -DzkHost=localhost:9983
• Core: yp
• Shard: B (replica)
• Collection: yp-collection
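The 2x2 layout can be written down as a small data structure, which makes the idea of load balancing between replicas of the same shard concrete. This is a client-side imitation for illustration, not how SolrCloud is implemented: the round-robin strategy and the names `TOPOLOGY` and `make_picker` are assumptions.

```python
import itertools

# Shard name -> nodes holding a copy of that shard (master first).
TOPOLOGY = {
    "A": ["localhost:8983", "localhost:6963"],
    "B": ["localhost:7973", "localhost:5953"],
}

def make_picker(topology):
    """Return a function that picks one node per shard, round-robin."""
    cyclers = {shard: itertools.cycle(nodes)
               for shard, nodes in topology.items()}

    def pick():
        # Every query needs exactly one copy of each shard.
        return {shard: next(cycler) for shard, cycler in cyclers.items()}

    return pick

pick = make_picker(TOPOLOGY)
print(pick())  # first query goes to the two masters
print(pick())  # second query goes to the two replicas
```

Each call returns a complete "row" of the grid, so losing one node still leaves the other copy of its shard available for queries.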