SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer
@patriciagorla
@o19s

#CASSANDRAEU

CASSANDRASUMMITEU
About Me
• Solr
• Cassandra
• Information retrieval

Paul Hostetler - phostetler.com

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?

All ancient Greek philosophers?

Coordinates of Stagira?

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

All ancient Greek philosophers?

Coordinates of Stagira?
select coord
where name = “Stagira”;

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

Complex
All ancient Greek philosophers?
create index on tag;
select *
where tag = “Greek philosophy”;

Coordinates of Stagira?
select coord
where name = “Stagira”;

#CASSANDRAEU

All cities within 100km of Stagira
???

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
q=Aristotle&fl=birthPlace

All ancient Greek philosophers?
q=Greek philosophy

Coordinates of Stagira?
q=Stagira&fl=point

All cities within 100km of Stagira
q=*:*&fq={!geofilt pt=40.530,
23.752 sfield=point d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search
• Google Site Search
• MySQL ‘like’ statements

Seth Casteel - littlefriendsphoto.com

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

and many more!

CASSANDRASUMMITEU
Inverted Index
[1] Pleasure in the job puts perfection in the work.
[2] Education is the best provision for the journey to old age.
[3] If some animals are good at hunting and others are suitable
for hunting, then the gods must clearly smile on hunting.
[4] It is the mark of an educated mind to be able to entertain a
thought without absorbing it.

Term

Freq

Documents

education

[2] [4]

hunting

3

[3]

perfection
#CASSANDRAEU

2
1

[1]
CASSANDRASUMMITEU
Defining Field Types
<fieldType name="text_general" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory” ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
#CASSANDRAEU

CASSANDRASUMMITEU
Index-side Analysis

Punctuation
Stop Words
Lowercase
Stemming

#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream

CASSANDRASUMMITEU
Query-side Analysis

Punctuation
Stop Words
Lowercase
Stemming
Synonyms
#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream
hope
wake dream
desire awake wish
CASSANDRASUMMITEU
Faceting
facet_fields: {
tags: [hunting: 1, education: 2, work: 2],
locations: [Stagira: 5, Chalcis: 3]
}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Distribution
• Pay Google more $$
• MySQL ‘like’ shards
• Master/Slave replication
• SolrCloud

#CASSANDRAEU

CASSANDRASUMMITEU
“Distributed search is hard.”

#CASSANDRAEU

CASSANDRASUMMITEU
Solr + Cassandra: Datastax Enterprise
• Full-text search
• Tokenization
• Stemming
• Date ranges
• Aggregation
• High Availability
• Distributed Nature

#CASSANDRAEU

CASSANDRASUMMITEU
Examining DBpedia.org Datasets
Person
<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .
<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .
<http://dbpedia.org/ontology/birthPlace> Stagira
<http://dbpedia.org/ontology/deathPlace> Chalcis .

Place
<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .
<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .
<http://dbpedia.org/resource/Stagira#long> "23.7947222" .
<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

#CASSANDRAEU

CASSANDRASUMMITEU
“Love is a single soul
inhabiting two bodies.”
#CASSANDRAEU

CASSANDRASUMMITEU
Querying for data
curl http://localhost:8983/solr/solr.person/q=Aristotle

#CASSANDRAEU

CASSANDRASUMMITEU
Filtering by location
curl http://localhost:8983/solr/solr.location/select?q=*:*
&spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point
d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
“There is no great genius without a
mixture of madness.”
#CASSANDRAEU

CASSANDRASUMMITEU
Schema.xml
Unified Schema
<fields>
<field name="id" type="string" />
<field name="name" type="text" />
<dynamicField name="*Date" type="date" />
<dynamicField name="*Place" type="text" />
<dynamicField name="*Point" type="location" />
<dynamicField name="*_tag"

type="text" />

</fields>

#CASSANDRAEU

CASSANDRASUMMITEU
Upload to DSE, Create Core
curl http://localhost:8983/solr/resource/solr.location/schema.xml 
--data-binary @solr/location_schema.xml 
-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml 
--data-binary @solr/location_solrconfig.xml 
-H 'Content-type:text/xml; charset=utf-8 '
curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location

#CASSANDRAEU

CASSANDRASUMMITEU
Cassandra Schema
Location Schema

gc_grace_seconds=864000 AND

cqlsh:solr> DESC COLUMNFAMILY location;

read_repair_chance=0.100000 AND

CREATE TABLE location (

replicate_on_write='true' AND

id text PRIMARY KEY,
"_docBoost" text,
"_dynFld" text,
location text,

populate_io_cache_on_flush='false' AND
compaction={'class':
'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression':
'SnappyCompressor'};

name text,
solr_query text,
tags text
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND

CREATE INDEX
solr_location__docBoost_index ON location
("_docBoost");
...
CREATE INDEX
solr_location_solr_query_index ON
location (solr_query);

dclocal_read_repair_chance=0.000000 AND
#CASSANDRAEU

CASSANDRASUMMITEU
What Changes
Solr

Cassandra

• No multiValued fields

• No composite columns

• No JOIN*

• No counter columns

#CASSANDRAEU

CASSANDRASUMMITEU
Bringing it All Together
•

Fault tolerant, available
search

#CASSANDRAEU

CASSANDRASUMMITEU
“Thank you.”

#CASSANDRAEU

CASSANDRASUMMITEU
THANK YOU

pgorla@opensourceconnections.com
@patriciagorla
@o19s
All information, including slides, are on http://github.com/pgorla/million-books
#CASSANDRAEU

CASSANDRASUMMITEU

Weitere ähnliche Inhalte

Mehr von DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mehr von DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Kürzlich hochgeladen

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Kürzlich hochgeladen (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

C* Summit EU 2013: One Million Books: Adventures in Discoverability with Cassandra and Solr

  • 1. Adventures in Discoverability with C* and Solr Patricia Gorla, Systems Engineer @patriciagorla @o19s #CASSANDRAEU CASSANDRASUMMITEU
  • 2. About Me • Solr • Cassandra • Information retrieval Paul Hostetler - phostetler.com #CASSANDRAEU CASSANDRASUMMITEU
  • 3. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? All ancient Greek philosophers? Coordinates of Stagira? All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 4. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? select birthPlace where name = “Aristotle”; All ancient Greek philosophers? Coordinates of Stagira? select coord where name = “Stagira”; All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 5. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? select birthPlace where name = “Aristotle”; Complex All ancient Greek philosophers? create index on tag; select * where tag = “Greek philosophy”; Coordinates of Stagira? select coord where name = “Stagira”; #CASSANDRAEU All cities within 100km of Stagira ??? CASSANDRASUMMITEU
  • 6. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? q=Aristotle&fl=birthPlace All ancient Greek philosophers? q=Greek philosophy Coordinates of Stagira? q=Stagira&fl=point All cities within 100km of Stagira q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 7. Approaches to Search • Google Site Search • MySQL ‘like’ statements Seth Casteel - littlefriendsphoto.com #CASSANDRAEU CASSANDRASUMMITEU
  • 8. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU CASSANDRASUMMITEU
  • 9. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU and many more! CASSANDRASUMMITEU
  • 10. Inverted Index [1] Pleasure in the job puts perfection in the work. [2] Education is the best provision for the journey to old age. [3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting. [4] It is the mark of an educated mind to be able to entertain a thought without absorbing it. Term Freq Documents education [2] [4] hunting 3 [3] perfection #CASSANDRAEU 2 1 [1] CASSANDRASUMMITEU
  • 11. Defining Field Types <fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> </fieldType> #CASSANDRAEU CASSANDRASUMMITEU
  • 12. Index-side Analysis Punctuation Stop Words Lowercase Stemming #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream CASSANDRASUMMITEU
  • 13. Query-side Analysis Punctuation Stop Words Lowercase Stemming Synonyms #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream hope wake dream desire awake wish CASSANDRASUMMITEU
  • 14. Faceting facet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3] } #CASSANDRAEU CASSANDRASUMMITEU
  • 15. Approaches to Distribution • Pay Google more $$ • MySQL ‘like’ shards • Master/Slave replication • SolrCloud #CASSANDRAEU CASSANDRASUMMITEU
  • 16. “Distributed search is hard.” #CASSANDRAEU CASSANDRASUMMITEU
  • 17. Solr + Cassandra: Datastax Enterprise • Full-text search • Tokenization • Stemming • Date ranges • Aggregation • High Availability • Distributed Nature #CASSANDRAEU CASSANDRASUMMITEU
  • 18. Examining DBpedia.org Datasets Person <http://xmlns.com/foaf/0.1/name> "Aristotle"@en . <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person . <http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en . <http://dbpedia.org/ontology/birthPlace> Stagira <http://dbpedia.org/ontology/deathPlace> Chalcis . Place <http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> . <http://dbpedia.org/resource/Stagira#lat> "40.5916667" . <http://dbpedia.org/resource/Stagira#long> "23.7947222" . <http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en . #CASSANDRAEU CASSANDRASUMMITEU
  • 19. “Love is a single soul inhabiting two bodies.” #CASSANDRAEU CASSANDRASUMMITEU
  • 20. Querying for data curl http://localhost:8983/solr/solr.person/q=Aristotle #CASSANDRAEU CASSANDRASUMMITEU
  • 21. Filtering by location curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 22. “There is no great genius without a mixture of madness.” #CASSANDRAEU CASSANDRASUMMITEU
  • 23. Schema.xml Unified Schema <fields> <field name="id" type="string" /> <field name="name" type="text" /> <dynamicField name="*Date" type="date" /> <dynamicField name="*Place" type="text" /> <dynamicField name="*Point" type="location" /> <dynamicField name="*_tag" type="text" /> </fields> #CASSANDRAEU CASSANDRASUMMITEU
  • 24. Upload to DSE, Create Core curl http://localhost:8983/solr/resource/solr.location/schema.xml --data-binary @solr/location_schema.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml --data-binary @solr/location_solrconfig.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location #CASSANDRAEU CASSANDRASUMMITEU
  • 25. Cassandra Schema Location Schema gc_grace_seconds=864000 AND cqlsh:solr> DESC COLUMNFAMILY location; read_repair_chance=0.100000 AND CREATE TABLE location ( replicate_on_write='true' AND id text PRIMARY KEY, "_docBoost" text, "_dynFld" text, location text, populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; name text, solr_query text, tags text ) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.010000 AND caching='KEYS_ONLY' AND comment='' AND CREATE INDEX solr_location__docBoost_index ON location ("_docBoost"); ... CREATE INDEX solr_location_solr_query_index ON location (solr_query); dclocal_read_repair_chance=0.000000 AND #CASSANDRAEU CASSANDRASUMMITEU
  • 26. What Changes Solr Cassandra • No multiValued fields • No composite columns • No JOIN* • No counter columns #CASSANDRAEU CASSANDRASUMMITEU
  • 27. Bringing it All Together • Fault tolerant, available search #CASSANDRAEU CASSANDRASUMMITEU
  • 29. THANK YOU pgorla@opensourceconnections.com @patriciagorla @o19s All information, including slides, are on http://github.com/pgorla/million-books #CASSANDRAEU CASSANDRASUMMITEU