Brahe mass scale flexible indexing


Presented by Ben Brown, Software Architect, Cerner Corporation.

Our team made its first foray into Solr building out Chart Search, an offering on top of Cerner's primary EMR that makes search over a patient's chart smarter and easier. After bringing on over 100 client hospitals and indexing many tens of billions of clinical documents and discrete results, we've (thankfully) learned a couple of things. The traditional approach of hashing document IDs across many shards, with no easily accessible source of truth, doesn't make for a flexible index. Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, take an existing index and expand the number of shards on the fly, and several other fancy features we enabled.


Brahe - Flexible Indexing At Scale
Ben Brown
Software Architect, Cerner Corporation

Who I Am
• Ben Brown
Software Architect
• Cerner
Healthcare IT Company
• Semantic Solutions
Team of 10
Search Services
Fun Stuff
NLP, Medical Ontologies, ML

Chart Search
Taking This
Photo: http://bit.ly/Y7kTJt

Chart Search
Turning It Into This

Chart Search Does
• Faceting
• NLP
• Semantic Concept Markup
Makes for a heavy record
(Especially on Solr 1.4)

Where We Started
Started Major Engineering in 2009
IBM Dev Works: http://ibm.co/14ZrtqX

Scale
• Clusters partitioned by client
• Raw and processed data in HDFS
• All processing & indexing done through MapReduce

Shard Size
Limiting Factor ~26 Million Discrete Results Per Shard
Average of 35 Shards Per Client
Range 5 to 140

Query Touch Points
One User Action ~ 4 Queries
35 Shards - 432 Touch Points
140 Shards - 1692 Touch Points
• Works, but not efficient
• Chance for variance killing performance
• Failure is a massive config headache
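
The slide does not say how the touch point counts are derived. As a rough check, both figures happen to equal 4 queries × 3 × (shards + 1). The sketch below only reproduces that fit; the function name and the factor of 3 are assumptions read off the two numbers, not something the talk states.

```python
def touch_points(shards: int, queries_per_action: int = 4, per_node_factor: int = 3) -> int:
    """Touch points for one user action under an assumed fit to the slide's numbers."""
    # Assumption: the count grows with (shards + 1), scaled by a constant factor of 3
    # per query. This matches both slide figures but is not documented in the talk.
    return queries_per_action * per_node_factor * (shards + 1)


average_client = touch_points(35)
largest_client = touch_points(140)
print(average_client)  # 432
print(largest_client)  # 1692
```

Whatever the exact derivation, the count grows linearly with the shard count, which is why minimizing touch points becomes a goal later in the talk.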

Growth
• Hashed ID does not play well with resizing
• Deploy Again
• Reindex Everything
Document Hash modulo Shard Count
Doc One: Hash(abc123) = 15
Doc Two: Hash(efg456) = 8
Doc Three: Hash(hij789) = 7
3 Shards
Doc One -> Shard 0
Doc Two -> Shard 2
Doc Three -> Shard 1
4 Shards
Doc One -> Shard 3
Doc Two -> Shard 0
Doc Three -> Shard 3
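
The resizing problem above can be reproduced in a few lines. This is a minimal sketch using the three hash values from the slide; the helper names are illustrative and not part of the original system.

```python
# Hash values taken from the slide.
hashes = {"Doc One": 15, "Doc Two": 8, "Doc Three": 7}


def shard_for(doc_hash: int, shard_count: int) -> int:
    """Route a document by its hash modulo the number of shards."""
    return doc_hash % shard_count


def placements(shard_count: int) -> dict:
    """Map every document to its shard for a given shard count."""
    return {doc: shard_for(h, shard_count) for doc, h in hashes.items()}


with_three = placements(3)
with_four = placements(4)
moved = [doc for doc in hashes if with_three[doc] != with_four[doc]]

print(with_three)  # {'Doc One': 0, 'Doc Two': 2, 'Doc Three': 1}
print(with_four)   # {'Doc One': 3, 'Doc Two': 0, 'Doc Three': 3}
print(moved)       # ['Doc One', 'Doc Two', 'Doc Three']
```

Adding a single shard moves every document in this example, which is why growing the index meant deploying again and reindexing everything.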

We Have a Problem
Painful Growth
Lots of Deploys
Variance Risk
Image: http://bit.ly/Y7oBD6

What Would Be Better?
Load Balance at the Client
Automated Failover
Easy Deployments
Simplified Splitting
Minimized Touch Points
Disconnected Stages

Solution
Shift Master to HBase
Image: http://bit.ly/ZXO2na

Why HBase?
Lexically organized keys
Efficient key range scans
Efficient time based scans
We're pretty good at operating it
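
To see why lexically ordered keys help with splitting, here is a small sketch of range-based shard assignment. It assumes each shard owns a contiguous key range defined by sorted boundary keys; the talk only says that keys are lexically organized, so the boundary values and function name are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def shard_range(row_key: str, boundaries: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return the [low, high) key range that owns row_key; None means unbounded."""
    i = bisect_right(boundaries, row_key)
    low = boundaries[i - 1] if i > 0 else None
    high = boundaries[i] if i < len(boundaries) else None
    return (low, high)


# Hypothetical boundaries: three shards, then the first shard is split at "d".
before_split = ["h", "p"]
after_split = ["d", "h", "p"]

for key in ("abc123", "efg456", "hij789"):
    print(key, shard_range(key, before_split), shard_range(key, after_split))
```

In this sketch only keys in the split range get a new owner ("abc123" and "efg456"), while "hij789" stays in the range ("h", "p"). A new shard for a range can then be filled with a key range scan over the source of truth rather than a full reindex.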

Coordinate With ZooKeeper
|-- Index name
|-- Version
|-- Solr Schema/Config
|-- Table Name + Connection Info
|-- Shard Number
|-- Shard Boundary Info
|-- Replica Number
|-- Ephemeral Claim
|-- Solr Connection Info
|-- Ephemeral Online

Custom Core Admin
Works with ZooKeeper for the claim process
Creates the Solr core after a claim succeeds
Controls pulling data from HBase

Claim Process
Image: http://bit.ly/Or317R
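
The claim diagram is not reproduced in this transcript, so the following is only a rough sketch of how a claim could work. It uses an in-memory registry as a stand-in for ZooKeeper ephemeral nodes; the class, method names, and the claim order are all assumptions, not the talk's implementation.

```python
from typing import Dict, Optional, Tuple


class ClaimRegistry:
    """In-memory stand-in for ephemeral claim nodes (hypothetical, not ZooKeeper)."""

    def __init__(self) -> None:
        self._claims: Dict[Tuple[int, int], str] = {}

    def try_claim(self, shard: int, replica: int, node: str) -> bool:
        """Claim a (shard, replica) slot; fails if another node already holds it."""
        slot = (shard, replica)
        if slot in self._claims:
            return False
        self._claims[slot] = node
        return True

    def release(self, node: str) -> None:
        """Drop all claims of a node, mimicking ephemeral nodes vanishing with a session."""
        self._claims = {slot: owner for slot, owner in self._claims.items() if owner != node}


def claim_first_free(registry: ClaimRegistry, node: str, shards: int, replicas: int) -> Optional[Tuple[int, int]]:
    """Claim the first free slot, covering every shard once before adding replicas."""
    for replica in range(replicas):
        for shard in range(shards):
            if registry.try_claim(shard, replica, node):
                # In the talk's design, core creation and the HBase pull follow the claim.
                return (shard, replica)
    return None


registry = ClaimRegistry()
slot_a = claim_first_free(registry, "solr-a", shards=2, replicas=2)
slot_b = claim_first_free(registry, "solr-b", shards=2, replicas=2)
slot_c = claim_first_free(registry, "solr-c", shards=2, replicas=2)
registry.release("solr-a")  # simulate solr-a going offline
slot_d = claim_first_free(registry, "solr-d", shards=2, replicas=2)
print(slot_a, slot_b, slot_c, slot_d)  # (0, 0) (1, 0) (0, 1) (0, 0)
```

Because a claim disappears when its owner goes away, a fresh node can pick up the vacated slot without any configuration change, which is one way to read the automated failover goal listed earlier.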

Queries
• Client inspects ZooKeeper
• Finds online nodes
o Only for the keyspace it cares about
o Issues distributed queries if necessary
• Balances in the Client
• Retries if queries fail
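
The client-side steps above can be sketched as follows. The shard list stands in for what a client would read from ZooKeeper, and `send` is a caller-supplied function; both are assumptions for illustration, as are the replica names.

```python
import random
from typing import Callable, List, Optional


class Shard:
    """A key range [low, high) and the replicas currently marked online (hypothetical)."""

    def __init__(self, low: Optional[str], high: Optional[str], online: List[str]) -> None:
        self.low, self.high, self.online = low, high, online


def overlapping(shards: List[Shard], start: str, stop: str) -> List[Shard]:
    """Keep only shards whose key range intersects the queried range [start, stop)."""
    return [s for s in shards
            if (s.high is None or start < s.high) and (s.low is None or stop > s.low)]


def query(shards: List[Shard], start: str, stop: str, send: Callable[[str], str]) -> List[str]:
    """Query one online replica per relevant shard, retrying on another replica on failure."""
    results = []
    for shard in overlapping(shards, start, stop):
        candidates = list(shard.online)
        random.shuffle(candidates)  # balance load in the client
        for replica in candidates:
            try:
                results.append(send(replica))
                break
            except ConnectionError:
                continue  # retry on the next online replica
        else:
            raise RuntimeError("no online replica answered for a required shard")
    return results


def flaky_send(replica: str) -> str:
    """Test double: replica 'a1' always fails, every other replica echoes its name."""
    if replica == "a1":
        raise ConnectionError(replica)
    return replica


cluster = [Shard(None, "h", ["a1", "a2"]), Shard("h", "p", ["b1"]), Shard("p", None, ["c1"])]
single_shard = query(cluster, "a", "c", flaky_send)
two_shards = query(cluster, "a", "j", flaky_send)
print(single_shard)  # ['a2']
print(two_shards)    # ['a2', 'b1']
```

A query confined to one key range touches a single shard, and a distributed query is issued only when the range spans several shards.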

Closing Thoughts
• Keep things simple
• Disconnect your stages
• Keep your touch points to a minimum
• Organize your data around your queries
• Use what you’re good at

CONTACT
Ben Brown
http://linkd.in/ZZIBK4
@b_brown
ENGINEERING BLOG
https://engineering.cerner.com/
WE’RE HIRING!
http://www.cerner.com/About_Cerner/Careers/
