DataStax: An Introduction to DataStax Enterprise Search

An Introduction to DSE Search
Caleb Rackliffe
Software Engineer
caleb.rackliffe@datastax.com
@calebrackliffe

What problem were we trying to solve?

4
SELECT * FROM customers WHERE country LIKE '%land%';

5
What about secondary indexes?

Why not just create your own secondary index
implementation that supports wildcard queries?

Why did we build something new?

10
Application
DataStax Driver Solr Client

12
Application
Consistency
Cost
Complexity

14
partitioning
multi-DC
replication
geospatial
wildcards
monitoring
C* field type support (UDT, Tuple, collections)
security
live indexing
sorting
faceting
fault-tolerant distributed search
caching
text analysis
grouping
automatic index updates
JVM
CQL
repair

15
Application
Consistency
Complexity
Cost

Creating a Solr Core
bash$ dse cassandra -s
cqlsh> CREATE KEYSPACE test
WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr':1};
cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY,
fullname text,
address_ map<text, text>);
bash$ dsetool create_core test.user generateResources=true
Start a node…
Create a table…
Create the core…

bash$ dsetool get_core_schema test.user
<?xml version="1.0" encoding="UTF-8" standalone=“no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TextField" name="text">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="org.apache.solr.schema.StrField" name="string"/>
</types>
<fields>
<field indexed="true" name="username" stored="true" type="string"/>
<field indexed="true" name="fullname" stored="true" type="text"/>
<dynamicField indexed="true" name="address_*" stored="true" type="string"/>
</fields>
<uniqueKey>fullname</uniqueKey>
</schema>
The Schema

Insert Rows (…and Index Documents)
cqlsh:test> INSERT INTO user(username, fullname, address)
VALUES('sbtourist', 'Sergio Bossa', {'address_home' : 'UK', 'address_work' : 'UK'});
VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});
VALUES('thegrinch', 'Sven Delmas', {'address_home':'US','address_work':'HQ'});
…and that’s it. No ETL. No writing to a second datastore.

Wildcards
cqlsh:test> SELECT username, address
FROM user
WHERE solr_query='{"q":"address_home:U*"}';
username | address
-----------+----------------------------------------------------
sbtourist | {‘address_home': 'UK', ‘address_work': 'UK'}
thegrinch | {‘address_home': 'US', ‘address_work': 'HQ'}
(2 rows)

Sorting and Limits
FROM user
WHERE solr_query=‘{"q":"*:*", "sort":"address_home desc"}';
username | address
-----------+----------------------------------------------------
thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
bereng | {'address_home': 'ES', 'address_work': 'ES'}
(3 rows)
FROM user
WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}'
LIMIT 1;
username | address
-----------+----------------------------------------------------
thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
(3 rows)

Faceting
cqlsh:test> SELECT *
FROM user
WHERE solr_query='{"q":"*:*", "facet":{"field" : "address_work"}}';
facet_fields
--------------------------------------------
{"address_work" : {"ES" : 1 , "HQ" : 1 , "UK" : 1}}
(1 rows)

Partition Restrictions
cqlsh:test> CREATE TABLE event(sensor_id bigint,
recording_time timestamp,
description text,
PRIMARY KEY(sensor_id, recording_time));
…
cqlsh:test> SELECT recording_time, description
FROM test.event
WHERE sensor_id = 2314234432
AND solr_query=‘description:unremarkable’;

What do the internals look like?

26
Buffered
Searchable
Durable
Memory
Disk

27
Buffered
Searchable
Durable
Memory
Disk

28
RAMBuffer
Segment
Segment
Memory
Disk
Segment Segment
Buffered
Searchable
Durable
Soft Commit
Hard Commit

Replica Selection
A
A
RF=2
shards: A-E
B
B CC D
D E
E
coordinator1
2
34
5
Healthy Unhealthy

What happens if a shard query fails?

Failover: Phase 1
4 nodes
RF = 2
shards: A-D
no vnodes
1
2
3
4

Failover: Phase 2
4 nodes
RF = 2
shards: A-D
no vnodes
1
2
3
4

Failover: Phase 3
4 nodes
RF = 2
shards: A-D
no vnodes
1
2
3
4

Search + Analytics: Explicit Predicate Pushdown
bash$ dse spark
scala> val table = sc.cassandraTable("wiki","solr")
scala> val result = table.select("id","title")
.where(“solr_query=‘body:dog'")
.collect

DataStax: An Introduction to DataStax Enterprise Search

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (20)

Ähnlich wie DataStax: An Introduction to DataStax Enterprise Search

Ähnlich wie DataStax: An Introduction to DataStax Enterprise Search (20)

Mehr von DataStax Academy

Mehr von DataStax Academy (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

DataStax: An Introduction to DataStax Enterprise Search

Hinweis der Redaktion