Presented by Ben Brown, Software Architect, Cerner Corporation
Our team made their first foray into Solr building out Chart Search, an offering on top of Cerner's primary EMR to help make search over a patient's chart smarter and easier. After bringing on over 100 client hospitals and indexing many tens of billions of clinical documents and discrete results we've (thankfully) learned a couple of things.
The traditional approach, with document IDs hashed over many shards and no easily accessible source of truth, doesn't make for a flexible index.
Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, how we take an existing index and expand the number of shards on the fly, and several other fancy features we enabled.
1. Brahe - Flexible Indexing At Scale
Ben Brown
Software Architect, Cerner Corporation
2. Who I Am
• Ben Brown
Software Architect
• Cerner
Healthcare IT Company
• Semantic Solutions
Team of 10
Search Services
Fun Stuff
NLP, Medical Ontologies, ML
11. Query Touch Points
One User Action ~ 4 Queries
35 Shards - 432 Touch Points
140 Shards - 1692 Touch Points
• Works, but not efficient
• Chance for variance killing performance
• Failure is a massive config headache
12. Growth
• Hashed ID does not play well with resizing
• Deploy Again
• Reindex Everything
Document Hash modulo Shard Count
Doc One: Hash(abc123) = 15
Doc Two: Hash(efg456) = 8
Doc Three: Hash(hij789) = 7
3 Shards
Doc One -> Shard 0
Doc Two -> Shard 2
Doc Three -> Shard 1
4 Shards
Doc One -> Shard 3
Doc Two -> Shard 0
Doc Three -> Shard 3
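The remapping on this slide can be reproduced in a few lines. This is a minimal sketch that takes the three hash values exactly as the slide gives them (15, 8, 7). The names `shard_for` and `assignments` are made up for illustration and are not from the talk.

```python
# Hash values are copied from the slide; a real system would compute them
# from the document ID.
DOC_HASHES = {"Doc One": 15, "Doc Two": 8, "Doc Three": 7}


def shard_for(doc_hash: int, shard_count: int) -> int:
    """Route a document by hash modulo shard count."""
    return doc_hash % shard_count


def assignments(shard_count: int) -> dict:
    """Map every document to its shard for a given shard count."""
    return {doc: shard_for(h, shard_count) for doc, h in DOC_HASHES.items()}


three = assignments(3)
four = assignments(4)

# Documents whose shard changes when the index grows from 3 to 4 shards.
moved = [doc for doc in DOC_HASHES if three[doc] != four[doc]]

print("3 shards:", three)
print("4 shards:", four)
print("moved:", moved)
```

In this example every document lands on a different shard after the resize, which is why growing a hash-routed index means reindexing everything.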
13. We Have a Problem
Painful Growth
Lots of Deploys
Variance Risk
Image: http://bit.ly/Y7oBD6
14. What Would Be Better?
Load Balance at the Client
Automated Failover
Easy Deployments
Simplified Splitting
Minimized Touch Points
Disconnected Stages
16. Why HBase?
Lexically organized keys
Efficient key range scans
Efficient time based scans
We're pretty good at operating it
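Why lexically ordered keys make range scans cheap can be shown without HBase at all. The sketch below is an in-memory stand-in, not the HBase client API: a sorted Python list plays the role of the table. The row-key format (`patient id|document id`) is an assumption for illustration; the talk does not state the actual key design.

```python
import bisect

# Hypothetical row keys, kept in lexical order as a sorted key store would.
ROW_KEYS = sorted([
    "patient-0001|doc-a",
    "patient-0001|doc-b",
    "patient-0002|doc-a",
    "patient-0003|doc-a",
    "patient-0003|doc-b",
])


def range_scan(keys, start, stop):
    """Return keys in [start, stop), located by binary search on sorted keys."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return keys[lo:hi]


def prefix_scan(keys, prefix):
    """Keys sharing a prefix are contiguous, so a prefix scan is a range scan."""
    return range_scan(keys, prefix, prefix + "\uffff")


print(prefix_scan(ROW_KEYS, "patient-0001|"))
print(range_scan(ROW_KEYS, "patient-0002", "patient-0004"))
```

Because related rows sit next to each other, a shard can be described by a start and stop key rather than by a hash bucket.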
17. Coordinate With ZooKeeper
|-- Index name
|-- Version
|-- Solr Schema/Config
|-- Table Name + Connection Info
|-- Shard Number
|-- Shard Boundary Info
|-- Replica Number
|-- Ephemeral Claim
|-- Solr Connection Info
|-- Ephemeral Online
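One way to read the tree above is as a set of znode paths. The transcript has lost the slide's indentation, so the nesting used here (index, then version, then shard, then replica) is an assumption, and every path segment name (`/indexes`, `shards`, `replicas`, `claim`, and so on) is hypothetical. The sketch only builds path strings and does not talk to ZooKeeper.

```python
def znode_paths(index, version, shard, replica, root="/indexes"):
    """Build hypothetical znode paths for one replica of one shard."""
    base = f"{root}/{index}/{version}"
    shard_path = f"{base}/shards/{shard}"
    replica_path = f"{shard_path}/replicas/{replica}"
    return {
        # Per index version
        "config": f"{base}/config",            # Solr schema/config
        "table": f"{base}/table",              # table name + connection info
        # Per shard
        "boundaries": f"{shard_path}/boundaries",
        # Per replica
        "claim": f"{replica_path}/claim",      # ephemeral: a node owns this slot
        "solr": f"{replica_path}/solr",        # Solr connection info
        "online": f"{replica_path}/online",    # ephemeral: ready for queries
    }


paths = znode_paths("chart", "v1", shard=0, replica=1)
for name, path in paths.items():
    print(f"{name:10s} {path}")
```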
18. Custom Core Admin
Works with ZooKeeper for the claim process
Creates the Solr core after a successful claim
Controls pulling data from HBase
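The claim step can be sketched with an in-memory stand-in for ephemeral nodes. This is not the talk's implementation and not the ZooKeeper API. `FakeCoordinator` and `claim_first_free` are hypothetical names, and the only behavior modeled is "create the node if absent, and drop it when the owning session ends".

```python
class FakeCoordinator:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self._nodes = {}  # path -> owning session id

    def create_ephemeral(self, path, session):
        """Create the node if absent; return True only for the creator."""
        if path in self._nodes:
            return False
        self._nodes[path] = session
        return True

    def expire(self, session):
        """Drop every node owned by a session, as happens when it ends."""
        self._nodes = {p: s for p, s in self._nodes.items() if s != session}


def claim_first_free(coord, session, replica_paths):
    """Claim the first unowned replica slot; return its path or None."""
    for path in replica_paths:
        if coord.create_ephemeral(path + "/claim", session):
            return path  # a core would be created and loaded at this point
    return None


coord = FakeCoordinator()
slots = ["/shards/0/replicas/0", "/shards/0/replicas/1"]
print(claim_first_free(coord, "node-a", slots))
print(claim_first_free(coord, "node-b", slots))
print(claim_first_free(coord, "node-c", slots))  # nothing left to claim
coord.expire("node-a")                           # node-a goes away
print(claim_first_free(coord, "node-c", slots))  # its slot is free again
```

Because the claim is ephemeral, a slot held by a failed node frees itself, which is one route to the automated failover the earlier slide asks for.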
50. Coordinate With ZooKeeper
|-- Index name
|-- Version
|-- Solr Schema/Config
|-- Table Name + Connection Info
|-- Shard Number
|-- Shard Boundary Info
|-- Replica Number
|-- Ephemeral Claim
|-- Solr Connection Info
|-- Ephemeral Online
51. Queries
• Client inspects ZooKeeper
• Finds online nodes
o Only for the keyspace it cares about
o Issues distributed queries if necessary
• Balances in the Client
• Retries if queries fail
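The client-side steps above can be sketched as follows. The shard snapshot, host names, and key boundaries are invented for illustration, and `send` is a caller-supplied function standing in for whatever actually issues the Solr request. None of this is the talk's client code.

```python
import random

# Hypothetical snapshot of what a client might read from the coordination
# tree: each shard covers keys in [start, stop) and lists its online replicas.
SHARDS = [
    {"start": "", "stop": "h", "online": ["solr-a:8983", "solr-b:8983"]},
    {"start": "h", "stop": "p", "online": ["solr-c:8983"]},
    {"start": "p", "stop": None, "online": ["solr-d:8983", "solr-e:8983"]},
]


def shards_for_range(shards, start, stop):
    """Select only the shards whose key range overlaps [start, stop)."""
    hits = []
    for shard in shards:
        below = shard["stop"] is not None and shard["stop"] <= start
        above = stop is not None and shard["start"] >= stop
        if not below and not above:
            hits.append(shard)
    return hits


def query_with_retry(shard, send, rng=random):
    """Try the shard's online replicas in random order until one answers."""
    replicas = list(shard["online"])
    rng.shuffle(replicas)  # simple client-side load balancing
    last_error = None
    for replica in replicas:
        try:
            return send(replica)
        except ConnectionError as err:
            last_error = err  # fall through to the next replica
    raise RuntimeError("no online replica answered") from last_error


def fake_send(replica):
    """Stand-in for a real request; always succeeds."""
    return f"results from {replica}"


targets = shards_for_range(SHARDS, "a", "c")
print([query_with_retry(shard, fake_send) for shard in targets])
```

A query confined to one key range touches a single shard here, and only a range spanning several shards needs a distributed query.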
52. End Thoughts
• Keep things simple
• Disconnect your stages
• Keep your touchpoints at a minimum
• Organize your data around your queries
• Use what you’re good at