5. Agenda
● Introduction
● Why a single DBMS is not enough
● What makes a DBMS
● Different flavors of DBMS
● Top picks
6. Why one DBMS is not enough
"If you feel things are not efficient in your code, it is likely that you are suffering from a poor data structure choice/design" ~ Anonymous
7. Why one DBMS is not enough
● Different data structures
● Different access patterns
● Different consistency and durability requirements
● Different scaling needs
● Different budgets
● Theoretical fundamentalism
8. Why one DBMS is not enough
A more concrete example
OLAP vs OLTP
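The contrast can be sketched in a few lines. A minimal example, using an in-memory SQLite database and a hypothetical `sales` table: the OLTP query is a short, indexed point read on one row; the OLAP query is a full-scan aggregation over every row.

```python
import sqlite3

# Hypothetical "sales" table, just to contrast the two access patterns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, "EU", 10.0), (2, "US", 20.0), (3, "EU", 5.0)])

# OLTP: short, indexed point read touching a single row.
row = con.execute("SELECT amount FROM sales WHERE id = 2").fetchone()

# OLAP: full-scan aggregation touching every row.
totals = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

The same schema serves both queries here, but at scale the two workloads pull storage layout, indexing, and buffering in opposite directions, which is exactly why one DBMS is often not enough.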
14. What makes a DBMS: General
● Licensing
● Language support
● OS support
● Community & workforce
● Tools ecosystem
15. ● Data Architecture
○ Logical data model
○ Physical data model
● Standards adherence (where defined)
● Atomicity
● Consistency
● Isolation
● Durability
● Referential integrity
● Transactions
● Locking
● Crash recovery
● Unicode support
What makes a DBMS: Fundamental Features
16. ● Interface / connectors / protocols
● Sequences / auto-incrementals / atomic counters
● Conditional entry updates
● MapReduce
● Compression
● In-memory
● Availability
● Concurrency handling
● Scalability
● Embeddable
● Backups
What makes a DBMS: Fundamental Features cont.
17. ● CRUD
● Union
● Intersect
● JOIN (inner, outer)
● Inner selects
● Merge joins
● Common Table Expressions
● Windowing Functions
● Parallel Query
● Subqueries
● Aggregation
● Derived tables
What makes a DBMS: querying capabilities
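Two of the capabilities listed above, Common Table Expressions and windowing functions, can be demonstrated together. A sketch using Python's `sqlite3` (window functions require SQLite >= 3.25) with a hypothetical table `t`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 10)])

rows = con.execute("""
    WITH totals AS (                                 -- Common Table Expression
        SELECT grp, SUM(val) AS total FROM t GROUP BY grp
    )
    SELECT grp, total,
           RANK() OVER (ORDER BY total DESC) AS rnk  -- window function
    FROM totals
    ORDER BY rnk
""").fetchall()
```

Whether a given DBMS supports these at all, and how completely, is precisely the kind of checklist item this slide is about.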
18. ● Cursors
● Triggers
● Stored procedures
● Functions
● Views
● Materialized views
● Virtual columns
● UDF
● XML/JSON/YAML support
What makes a DBMS: programmatic capabilities
19. ● Database size (sum of all table sizes)
● Number of Tables
● Individual Table Size
● Variable length column size
● Row width
● Row columns count
● Row count
● Column name
● Blob size
● Char
● Numeric
● Date (min / max)
What makes a DBMS: sizing limits
20. ● B-Tree
● Full text indexing
● Hash
● Bitmap
● Expression
● Partials
● Reverse
● GiST
● GIS indexing
● Composite keys
● Graph support
What makes a DBMS: indexing
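Why stores bother offering both hash and B-Tree indexes can be shown with a toy sketch: a hash index gives O(1) point lookups but no useful ordering, while an ordered index (here a sorted list standing in for a B-Tree) supports range scans.

```python
import bisect

# Hypothetical rows keyed by id.
records = {101: "a", 7: "b", 55: "c", 23: "d"}

# Hash index: O(1) point lookups, no useful key ordering.
hash_index = dict(records)

# Ordered index (B-Tree stand-in): sorted keys support range scans.
ordered_keys = sorted(records)

def range_scan(lo, hi):
    # Find all rows whose key falls in [lo, hi] via binary search.
    i = bisect.bisect_left(ordered_keys, lo)
    j = bisect.bisect_right(ordered_keys, hi)
    return [records[k] for k in ordered_keys[i:j]]
```

Bitmap, expression, partial, GiST and GIS indexes are further variations on the same trade: pay at write time to make a specific read shape cheap.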
22. Partitioning
● Range
● Hash
● Range+hash
● List
● Expression
● Sub-partitioning
Sharding
● By key
● By table
What makes a DBMS: scalability
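Sharding by key can be sketched in a few lines: hash the key, take it modulo the shard count, and route reads and writes to that shard. This is a deliberately naive 4-shard layout; real systems add virtual nodes and rebalancing so that adding a shard does not remap almost every key.

```python
import hashlib

N_SHARDS = 4
shards = [dict() for _ in range(N_SHARDS)]  # stand-ins for 4 separate servers

def shard_for(key):
    # Stable hash -> shard number; md5 used only for deterministic spread.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

for i in range(100):
    put(f"user:{i}", i)
```

Partitioning (range, hash, list, expression) applies the same idea within one server, usually to prune which partition a query must touch.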
23. ● Integer
● Floating point
● Decimal
● String
● Binary
● Date/time
● Boolean
● Set
● Enumeration
● Blob
● Clob
● JSON/XML/YAML (as native types)
What makes a DBMS: supported data types
24. ● Authentication methods
● Access Control Lists
● Pluggable Authentication Modules support
● Encryption at-rest
● Encryption over the wire
● User proxy
What makes a DBMS: security features
25. ● Data organization model: unstructured, semi-structured, structured
● Data model (schema) stability: Static? Stable? Dynamic? Highly dynamic?
● Writes: append-only; append-mostly; updates-only; updates-mostly
● Reads: full scans; range scans; multi-range scans; point reads
● Reads by age: new only; new mostly; old only; old mostly; whole range
● Reads by complexity: simple; related; deeply-nested relations; …
What makes a DBMS: workload
26. ACID vs BASE
● Atomic
● Consistent
● Isolated
● Durable
● Basic Availability
● Soft-state
● Eventual Consistency
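The A in ACID is easy to demonstrate. A minimal sketch with SQLite, whose connection context manager commits on success and rolls back on any exception: a transfer that fails halfway leaves no half-done debit behind.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
con.commit()

try:
    with con:  # transaction: commit on success, rollback on exception
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
        raise RuntimeError("crash mid-transfer")  # simulated failure
        con.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'b'")
except RuntimeError:
    pass

balances = dict(con.execute("SELECT name, balance FROM accounts"))
# Atomicity: the debit above was rolled back, so "a" still has 100.
```

A BASE system makes the opposite bet: accept the write now, let replicas converge later, and push conflict handling onto the application.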
36. Relational Databases: JOINs
SELECT o.order_id AS order_no,
       CONCAT(c.customer_name, ' (', c.customer_email, ')') AS customer,
       GROUP_CONCAT(i.item_name) AS items,
       SUM(i.item_price) AS total
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.order_id
JOIN items AS i ON i.item_id = oi.item_id
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY o.order_id, customer
37. Relational Databases: good use cases
● Highly-structured data with complex querying needs
● Projects that need very high data durability and guarantees of database-level
consistency and integrity
● Simple projects with limited data growth and limited amount of entities
● Projects that require PCI DSS, HIPAA or similar security compliance
● Analysis of portions of larger BigData stores
● Projects where duplicated data volumes would be a problem
38. Relational Databases: bad use cases
● Unstructured data
● Deep Hierarchies / Nested -> XML
● Deep recursion
● Ever-growing datasets; Projects that are basically logging data
● Projects recording time-series
● Reporting on massive datasets
39. Relational Databases: bad use cases
● Projects supporting extreme concurrency
● Projects supporting massive data intake
● Queues
● Cache storage
42. ● Well known / mature / extensive documentation
● GPLv2 + commercial license for OEMs, ISVs and VARs
● Client libraries for about every programming language
● Many different engines
● SQL/ACID impose scalability limits
● Asynchronous / Semi-synchronous / Virtually synchronous replication
● Can be AP or CP depending on replication model
Relational Databases: MySQL
43. PROs
● Open source
● Mature and ubiquitous
● ACID
● Choice of AP or CP
● Highly available
● Abundant tooling and expertise
● General purpose; likely good to start anything you want
CONs
● Difficult to shard
● Replication issues
● Not 100% standard compliant
● Storage engines impose limitations
● General purpose; no silver-bullet solutions for scaling!
45. ● Mature / adequate documentation
● PostgreSQL License (similar to BSD/MIT)
● Client libraries for about every programming language
● Highly Standards Compliant
● SQL/ACID impose scalability limits
● Asynchronous / Semi-synchronous replication
● Virtually synchronous replication via 3rd party
● Can be AP or CP depending on replication model
Relational Databases: PostgreSQL
46. PROs
● Open source
● Mature and stable
● ACID
● Lots of advanced features
● Vacuum
CONs
● Difficult to shard
● Operations feel like an afterthought
● Less forgiving
● Vacuum
57. K/V Stores - Good Use Cases
● Lots of data
○ Usually easily horizontally scalable
● Object cache in front of RDBMS
○ Memcached, anyone?
● High concurrency
○ Very simple locking model
● Massive small-data intake
● Simple data access patterns
○ CRUD on PK access
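"CRUD on PK access" is the whole interface. A minimal in-process sketch of what memcached or Redis do over the network (minus expiry, eviction, and replication):

```python
# Hypothetical in-process K/V store: every operation is keyed by primary key.
store = {}

def create(key, value):
    store[key] = value               # PUT / SET

def read(key, default=None):
    return store.get(key, default)   # GET

def update(key, value):
    store[key] = value               # overwrite in place

def delete(key):
    store.pop(key, None)             # DEL

create("user:42", {"name": "Mary"})
update("user:42", {"name": "Mary", "plan": "pro"})
```

Because every operation touches exactly one key, there is nothing to lock beyond that key and nothing that prevents spreading keys across machines, which is where the scalability and concurrency bullets above come from.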
63. K/V Stores - Bad Use Cases
● Durability and consistency*
● Complex data access patterns*
● Non-PK access*
● Operations*
○ Complex systems fail in complex ways
81. Columnar Data Layout
● Row-oriented Read Approach
[Diagram: each memory page holds whole rows, e.g. (10, Smith, Bob, 40000), (12, Jones, Mary, 50000), (11, Johnson, Cathy, 44000); reading one column means stepping through every row]
82. Columnar Data Layout
● Column-oriented Read Approach
[Diagram: each memory page holds one column's values contiguously, e.g. (10, 12, 11, 22), (Smith, Jones, Johnson, …); reading one column touches only that column's pages]
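The two layouts can be sketched side by side. Using the example rows from the slides: computing `SUM(salary)` over the row layout reads every field of every row, while the column layout reads exactly the one array the query needs.

```python
# Same three employee rows in both layouts (hypothetical data from the slides).
rows = [(10, "Smith", "Bob", 40000),
        (12, "Jones", "Mary", 50000),
        (11, "Johnson", "Cathy", 44000)]

# Column-oriented: one contiguous array per attribute.
columns = {
    "id":      [r[0] for r in rows],
    "surname": [r[1] for r in rows],
    "name":    [r[2] for r in rows],
    "salary":  [r[3] for r in rows],
}

# SUM(salary), row layout: walks all 12 fields to use 3 of them.
row_sum = sum(r[3] for r in rows)

# SUM(salary), column layout: reads exactly the 3 fields needed.
col_sum = sum(columns["salary"])
```

On disk the gap is larger than this toy suggests: column pages compress far better (similar values sit together) and irrelevant columns are never fetched at all.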
83. Columnar Databases - Considerations
● Buffering and compression can help to reduce the impact of writes, but they should still be avoided when possible
○ Usually, an ETL process should be put in place to prepare data for analysis in a column-based format
● Covering indexes in row-based stores could provide similar benefits, but only up to a point → index maintenance work can become too expensive
● Column-based stores are self-indexing and more disk-space efficient
● SQL can be used for most column-based stores
87. ● Suitable for read-mostly or read-intensive, large data repositories
● Good for full table / large range reads
● Good for unstructured problems where “good” indexes are hard to forecast
● Good for re-creatable datasets
● Good for structured data
Columnar Database - Good use cases
92. ● Not good for “SELECT *” queries or queries fetching most of the columns
● Not good for writes
● Not good for mixed read/write
● Bad for unstructured data
Columnar Database - Bad use cases
103. Graph Databases - Good Use Cases
● Highly Connected Data
○ Network & IT Operations, Recommendations, Fraud Detection, Social Networking, Identity & Access Management, Geo Routing, Insurance Risk Analysis, Counter Terrorism
● Millions or Billions of Records
○ Relational databases can also solve this problem at a smaller scale
● Re-Creatable Data Set
○ Keep as much as possible outside of the critical path
● Structured Data
○ You cannot graph a relationship unless you can define it
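What "highly connected data" buys you can be sketched with a friends-of-friends query, which is just a breadth-first traversal over an adjacency list; graph databases make exactly this kind of hop-by-hop walk index-free and cheap. The names and edges here are hypothetical.

```python
from collections import deque

# Hypothetical social graph as an adjacency list.
edges = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": [], "dan": ["eve"], "eve": []}

def within_hops(start, max_hops):
    """Everyone reachable from `start` in at most `max_hops` edge traversals."""
    seen, frontier = {start}, deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                frontier.append((nxt, depth + 1))
    return reached
```

In a relational store the same query becomes one self-JOIN per hop, which is why "relational databases can also solve this problem at a smaller scale" but fall over as depth and fan-out grow.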
109. Graph Databases - Bad Use Cases
● Unstructured Data
○ You cannot graph a relationship if you cannot define it
● Non-Connected Data
○ Graphiness is important here
● Highly Concurrent RW Workloads
○ Performance breaks down
● Anything in the Critical OLTP Path*
○ I'm not only talking about writes here
● Ever-Growing Data Set
123. PROs
● Solves a very specific (and hard) data problem
● Learning curve not bad for developer usage
● Data analysts’ dream
CONs
● Very little operational expertise for hire
● Little community and virtually no tooling for administration and operations
● Big mismatch in paradigm vs RDBMS; hard to switch for DBAs
● Hard/Expensive to scale horizontally
● Writes are computationally expensive
127. Time Series - Good Use Cases
● Uh … Time Series Data
● Write-mostly (95%+) - Sequential Appends
● Rare updates, rarer still to the distant past
● Deletes occur at the opposite end (the beginning)
● Data does not fit in memory
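The access pattern above, appends at one end and retention-driven deletes at the other, can be sketched with a deque; the one-hour retention window is a hypothetical parameter.

```python
from collections import deque

points = deque()    # (timestamp, value), kept in arrival order
RETENTION = 3600    # hypothetical: keep one hour of data

def append(ts, value):
    # New points only ever go on the right ...
    points.append((ts, value))
    # ... and deletes only ever happen at the oldest (left) end.
    while points and points[0][0] < ts - RETENTION:
        points.popleft()

for t in range(0, 7200, 600):   # two hours of samples, one every 10 minutes
    append(t, t * 0.1)
```

Because writes and deletes each touch only one end of the structure, time-series stores can organize data into time-ordered blocks and drop whole expired blocks at once instead of deleting row by row.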
133. Time Series - Bad Use Cases
● Uh … Not Time Series Data
● Small data
134. Example Time Series Databases
● InfluxDB
● Graphite
● OpenTSDB
● Blueflood
● Prometheus
141. PROs
● Solves a very specific (big) data problem
● Well-defined and finite data access patterns
CONs
● Terrible query semantics
155. Document Stores: MongoDB
● Sharding and replication for dummies!
● Pluggable storage engines for distinct workloads
○ Different locking behaviors
● Excellent compression options with PerconaFT, RocksDB, WiredTiger
● On-disk encryption (Enterprise Advanced)
● In-memory storage engine (Beta)
● Connectors for all major programming languages
● Sharding- and replica-aware connectors
● Geospatial functions
● Aggregation framework
● … a lot more, except being transactional
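The document model and the shape of an aggregation pipeline can be sketched in pure Python. This mimics the flow of MongoDB's $match/$group stages over schemaless documents, not its actual API; the collection and fields are hypothetical.

```python
# Hypothetical document collection: dicts with no enforced schema.
docs = [
    {"_id": 1, "city": "NYC", "qty": 5},
    {"_id": 2, "city": "NYC", "qty": 10},
    {"_id": 3, "city": "SF",  "qty": 2},
]

def match(docs, pred):
    # Analogue of a $match stage: keep documents satisfying a predicate.
    return [d for d in docs if pred(d)]

def group_sum(docs, key, field):
    # Analogue of a $group stage summing `field` per `key` value.
    out = {}
    for d in docs:
        out[d[key]] = out.get(d[key], 0) + d[field]
    return out

by_city = group_sum(match(docs, lambda d: d["qty"] > 1), "city", "qty")
```

Each stage feeds the next, which is what lets the server split pipeline work across shards before merging results.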
170. Document Stores: Couchbase
● MongoDB - more or less
● Global Secondary Indexes (Multi Dimensional Scaling) are exciting: they produce localized secondary indexes for low-latency queries
● Drop-in replacement for Memcache
173. Document Stores: Couchbase > Use Cases
● Internet of Things (direct or indirect receiver/pipeline)
● Mobile data persistence via Couchbase Mobile, e.g. field devices with unstable connections and local/close-proximity ingestion points
● Distributed K/V store
183. Fulltext Search: Elasticsearch
● Lucene based
● RESTful interface - JSON in, JSON out
● Flexible schema
● Automatic sharding and replication (NDB-like)
● Reasonable defaults
● Extension model
● Written in Java; JVM limitations apply, e.g. GC
● ELK - Elasticsearch+Logstash+Kibana
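What "Lucene based" means in practice is an inverted index: a map from term to the set of documents containing it, so a word query is a set lookup and an AND query is a set intersection. A toy sketch with hypothetical documents:

```python
from collections import defaultdict

docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick dog"}

# Build the inverted index: term -> set of document ids (posting set).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search_all(*terms):
    # AND query: intersect the posting sets of every term.
    sets = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

Real Lucene adds analyzers (tokenization, stemming), per-term scoring, and segment merging on top, but the core lookup is this structure.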
191. Fulltext Search: Elasticsearch > Use Cases
● Logs Analysis - ELK Stack, e.g. Netflix
● Full Text search, e.g. GitHub, Wikipedia, StackExchange, etc.
● https://www.elastic.co/use-cases
○ Sentiment analysis
○ Personalized experience
○ etc.
194. ● Lucene based
● Quite cryptic query interface - Innovator’s Dilemma
● Support for SQL-based queries as of 6.1
● Structured schema; data types need to be predefined
● Written in Java; JVM limitations apply, e.g. GC
● Near real-time indexing - DIH
● Rich document handling - PDF, doc[x]
● SolrCloud support for sharding and replication
Fulltext Search: Solr
202. ● Search and Relevancy
○ https://www.percona.com/live/data-performance-conference-2016/sessions/solr-how-index-10-billion-phrases-mysql-and-hbase
● Recommendation Engine
● Spatial Search
Fulltext Search: Solr > Use Cases
205. ● Structured data
● MySQL protocol - SphinxQL
● Durable indexes via binary logs
● Real-time indexes via MySQL queries
● Distributed index for scaling
● No native support for replication; workarounds e.g. via rsync
● Very good documentation
● Fastest full indexing/reindexing [?]
Fulltext Search: Sphinx Search
213. ● Real-time full text + basic geo functions
● The above as a dependency, or with simplified access via SphinxQL or even the Sphinx storage engine for MySQL
Fulltext Search: Sphinx Search > Use Cases