Unifying Search Engine and NoSQL DBMS with a Universal Index
Chris Biow, MarkLogic Federal CTO
Agenda
  MarkLogic: company
  MarkLogic Server: DBMS product
  Search: indexing text
  Query: indexing XML structure and semantics
  Scalars: indexing ranges, analyzing aggregations
  Geo: indexing spatial locations
  Alerting: indexing queries, performing reverse query
  Strange Loop: composing forward and reverse query
  MVCC: temporizing the DBMS
  No-compromise lock-free queries
  LSM (Log Structured Merge) Trees
  No-compromise ingest, query, search
  Examples: BusinessWeek.com, WarriorGateway.org, USAF KM
About MarkLogic
MarkLogic Corporation makes a purpose-built database for unstructured information.
  >200 customers, >170 employees
  HQ: San Carlos, CA
  Lead investor: Sequoia Capital
  2008: Top 5 fastest growing technology companies in Silicon Valley (Deloitte)
  2009, 2010: Best DBMS (SIIA CODiE). Previously best Search, CMS.
What is MarkLogic Server?
  A hybrid (integrated parts)
  Special-purpose DBMS for XML, with enterprise expectations
    ACID transactions
    DBA, backup, replication
  Search engine kernel, with enterprise expectations
    Full text
    Faceted navigation, at massive scale
    Boolean, proximity, stemming, tokenization, decompounding, case, diacritics, …
  Application server
    HTTP
    XCC Java/.NET
    WebDAV
MarkLogic as Special DBMS
  Not relational (RDBMS)
  XML
    The only data model required
    Schema agnostic
    Text a first-class citizen among data types
  XQuery (in place of SQL)
  Search engine algorithms for many DB queries
    O(1) initial lookup in number of docs
    O(log(n)) in range indexing
  Very low DBA overhead (0.5 FTE / 100 hosts)
    5-minute install
    5-minute scale-out
  Database and search engine are the same
MarkLogic as NoSQL DBMS
  XQuery in place of SQL!
    Extensions: cts:search() / xdmp:document-insert()
  NoSQL categories [per AKF Partners]
    Key -> value store: URI -> document (XML, JSON, text, binary)
    Extensible record store: Extensible Markup Language
    Document store: XML documents, natch
  Differentiators
    ACID transactions in LAN cluster
    Ad hoc XQuery
    XML declares what is to be indexed, independently for each document
    DBMS and search engine are the same
MarkLogic as Special Search Engine
  Understands document structure
  Transactional: high CRUD load
  Unicode
  Holds the documents
    Update / reindexing
    Delivery
  Geospatial: box, point/radius, polygon
  Alerting: profiles, alerts, filters, tipping, selectors, “triggers,” …
  Analytics: facets, co-occurrence, word lexicons, …
  Everything composes (e.g. geo-alerting, geo-text-data, search-alerting)
  Processing near the data
    Relational joins and inferencing
  Database and search engine are the same
[Figure: example XML tree: a timeline element containing message elements with @id attributes (3, 5) and status text (“Oh boy…”, “Testing XProc”); legend: element node, attribute node, text node]
MarkLogic as Special App Server
  Native HTTP(S) server
    RESTful
    XML by default
    Transform to HTML
    Transform to PDF, MS Office, etc.
    PKI with no dependencies
    Optionally with external Auth
  HTTP(S) client
  XCC Java / .NET server
    Similar to JDBC / ADO.NET
  WebDAV
    Folder on the user’s desktop
[Figure: RESTful architecture: URL Rewriter (Routes.xqy); Resources user/ (user.xqy) and notes/ (note.xqy) with Get, Put, Post, Delete; Representations .json.xqy, .xml.xqy, … with get]
MarkLogic at Scale
  Scale up: typically 1-2TB+ XML per server
  Scale out: low hundreds(++) of servers in a cluster
  Commodity hardware
    Typically ~$15K HW budget per server
    2-CPU x 6-core/hyperthreaded
    32+ GB RAM
    3x disk: local mount with failover
  OS
    Linux RHEL 5
    Solaris 10
    Windows 2003/8 (XP/Vista/7 for dev)
Collapsing the Stack
  The extended stack (language: output)
    Data (CSV: data)
    DBMS (SQL: result sets)
    Search Engine (Search languages: result page)
    Web service (Java: XML, JSON)
    Application server (Ruby: HTML)
  MarkLogic
    Data (XML: xml)
    DBMS, Search, Service, App (XQuery, XQuery, XQuery, XQuery)
Data Model
A database for unstructured (and semi-structured) information
XML Data Model
[Figure: tree diagrams of a Document and an fpML Trade; node labels include Title, Author (First, Last), Metadata, Section, Product, ID, Cashflow, TradeLeg, Event]
Example Document
[Figure: page layout of a document with Title, Author, Abstract, Sections (one continued), Metadata, Footer]
Serialized as XML
<article>
  <title>A Relational Model of Data for Large Shared Data Banks</title>
  <author><first-name>Edgar</first-name><last-name>Codd</last-name></author>
  <abstract>
    Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . .
  </abstract>
  <body>
    <section>
      <section> … has values which uniquely identify each element ... </section>
    </section>
    <section>… version of <product>IMS</product> provides the user . . . </section>
  </body>
  <metadata><vol>13</vol><number>6</number><year>1970</year></metadata>
</article>
Target Query Classes
Performance is optimized for these kinds of queries:
  Full-Text Search
    Find all documents that contain the phrase “uniquely identify”.
  XML Structure
    Find all articles that have an abstract.
  XML Semantics
    Find all documents that mention the product “IMS”.
  Aggregate Queries
    How many articles that contain “data base” were written in each of the last 5 decades?
  All of the above . . . at the same time
    Count all articles that contain “data” in the title and mention the product “IMS” in a section, grouping by year.
1) Full-text Search
Find all documents that contain the phrase “uniquely identify”
UNIVERSAL INDEX
  Term                   Document References
  “which”                123, 127, 129, 152, 344, 791 . . .
  “uniquely”             122, 125, 126, 129, 130, 167 . . .
  “identify”             123, 126, 130, 142, 143, 167 . . .
  “each”                 123, 130, 131, 135, 162, 177 . . .
  “uniquely identify”    126, 130, 167, 212, 219, 377 . . .
  “which uniquely”       . . .
  “identify each”        . . .
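As a rough illustration of the slide above, the following Python sketch builds a toy index whose terms are single words and adjacent two-word phrases, then resolves a phrase by looking up its term list. It is not MarkLogic's implementation; the function names (`tokenize`, `build_index`, `search_phrase`) and the sample documents are assumptions made for the example, with document ids borrowed from the slide.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (raw.strip(".,;:!?\"'()").lower() for raw in text.split())
    return [word for word in words if word]


def build_index(docs):
    """Map every word and every adjacent two-word phrase to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for word in words:
            index[word].add(doc_id)
        for first, second in zip(words, words[1:]):
            index[f"{first} {second}"].add(doc_id)
    return index


def search_phrase(index, phrase):
    """Intersect the term lists of the phrase's two-word pieces.

    For phrases longer than two words this yields candidates only; a real
    engine would still have to confirm word positions.
    """
    words = tokenize(phrase)
    if not words:
        return []
    terms = [f"{a} {b}" for a, b in zip(words, words[1:])] or words
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample documents (ids reused from the slide for familiarity).
docs = {
    126: "has values which uniquely identify each element",
    127: "which values identify each row uniquely",
    130: "keys uniquely identify each tuple",
}
index = build_index(docs)
```

Here `search_phrase(index, "uniquely identify")` is answered by one term-list lookup, without scanning any document text. Document 127 contains both words but not the phrase, so it is absent from the phrase's term list.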
2) XML Structure
Find all articles that have an abstract
UNIVERSAL INDEX
  Term                   Document References
  “which”                123, 127, 129, 152, 344, 791 . . .
  “uniquely”             122, 125, 126, 129, 130, 167 . . .
  “identify”             123, 126, 130, 142, 143, 167 . . .
  “each”                 123, 130, 131, 135, 162, 177 . . .
  “uniquely identify”    126, 130, 167, 212, 219, 377 . . .
  <article>              . . .
  <article>/<abstract>   . . .
3) XML Semantics
Find all documents that mention the product “IMS”
UNIVERSAL INDEX
  Term                     Document References
  “which”                  123, 127, 129, 152, 344, 791 . . .
  “uniquely”               122, 125, 126, 129, 130, 167 . . .
  “identify”               123, 126, 130, 142, 143, 167 . . .
  “each”                   123, 130, 131, 135, 162, 177 . . .
  “uniquely identify”      126, 130, 167, 212, 219, 377 . . .
  <article>                . . .
  <article>/<abstract>     . . .
  <product>IMS</product>   . . .
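The structure and semantics slides treat element names, parent/child relationships, and element values as ordinary index terms alongside words. The Python sketch below illustrates that idea under stated assumptions: the term shapes (`("element", …)`, `("child", …)`, `("value", …)`), the function names, and the three sample documents are all invented for the example and do not reflect MarkLogic's internal term format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def structure_terms(root):
    """Yield index terms for element names, parent/child pairs, and leaf element values."""
    yield ("element", root.tag)
    for parent in root.iter():
        text = (parent.text or "").strip()
        if text and len(parent) == 0:
            # Leaf element with text, e.g. <product>IMS</product>
            yield ("value", parent.tag, text)
        for child in parent:
            yield ("element", child.tag)
            yield ("child", parent.tag, child.tag)


def index_documents(docs):
    """Build one term -> doc-id-set map covering structure and value terms."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for term in structure_terms(ET.fromstring(xml_text)):
            index[term].add(doc_id)
    return index


# Hypothetical sample documents.
docs = {
    126: "<article><title>A Relational Model</title><abstract>Future users</abstract>"
         "<body><section>version of <product>IMS</product> provides</section></body></article>",
    127: "<article><title>Notes</title><body><section>no product here</section></body></article>",
    130: "<memo><abstract>Short</abstract></memo>",
}
index = index_documents(docs)

articles_with_abstract = index[("child", "article", "abstract")]
mentions_ims = index[("value", "product", "IMS")]
```

Because every term resolves to a set of document ids, a structural constraint and a value constraint compose with word terms by plain set intersection.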
4) Aggregates
How many of the articles that contain “data base” were written in each of the last 5 decades?
UNIVERSAL INDEX
  Term                     Document References
  “which”                  123, 127, 129, 152, 344, 791 . . .
  “uniquely”               122, 125, 126, 129, 130, 167 . . .
  “identify”               123, 126, 130, 142, 143, 167 . . .
  “each”                   123, 130, 131, 135, 162, 177 . . .
  “data base”              126, 130, 167, 212, 219, 377 . . .
  <article>                . . .
  <article>/<abstract>     . . .
  <product>IMS</product>   . . .
Range indexes: YEAR, Volume
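One way to picture the aggregate query is a term list intersected with a range index on year. The Python sketch below is an illustration only: the `RangeIndex` class, the `count_by_decade` helper, and the year assigned to each document id are assumptions, not data from the slides.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class RangeIndex:
    """Sorted (value, doc_id) pairs so a value range is located by binary search."""

    def __init__(self, values_by_doc):
        self.by_doc = dict(values_by_doc)
        self.entries = sorted((value, doc_id) for doc_id, value in values_by_doc.items())
        self.keys = [value for value, _ in self.entries]

    def docs_between(self, low, high):
        """Return ids of docs whose value v satisfies low <= v <= high."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return {doc_id for _, doc_id in self.entries[start:end]}


def count_by_decade(term_list, year_index):
    """Group the docs in a term list by the decade of their indexed year."""
    return Counter(
        (year_index.by_doc[doc_id] // 10) * 10
        for doc_id in term_list
        if doc_id in year_index.by_doc
    )


# Hypothetical years for the doc ids shown on the slide.
year_index = RangeIndex({126: 1970, 130: 1974, 167: 1985, 212: 1999, 219: 2003, 377: 2008, 400: 1991})
data_base_docs = {126, 130, 167, 212, 219, 377}  # term list for "data base"

decades = count_by_decade(data_base_docs, year_index)
eighties_and_nineties = year_index.docs_between(1980, 1999) & data_base_docs
```

The counting never touches document content: it works from the term list and the typed values held in the range index.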
5) All Of The Above
Count all articles that contain “data” in the title and mention the product IMS in a section, grouping by year.
The Universal Index
UNIVERSAL INDEX (alongside the Range Indexes)
  Term                     Term List (Document References)
  “which”                  123, 127, 129, 152, 344, 791 . . .
  “uniquely”               122, 125, 126, 129, 130, 167 . . .
  “identify”               123, 126, 130, 142, 143, 167 . . .
  “each”                   123, 130, 131, 135, 162, 177 . . .
  “data base”              126, 130, 167, 212, 219, 377 . . .
  <article>                . . .
  <article>/<abstract>     . . .
  <product>IMS</product>   . . .
Additional Uses For Universal Index
  Directories
    Exclusive, hierarchical, analogous to file system
    URI: /some/directory/hierarchy/me.xml
  Collections
    Set-based, N:M relationship
    Document URI : Collection URI
  Security
    Invisible to your app
    Document: Role, action
The Universal Index
UNIVERSAL INDEX (alongside the Range Indexes)
  Term                          Term List (Document References)
  “which”                       123, 127, 129, 152, 344, 791 . . .
  “uniquely”                    122, 125, 126, 129, 130, 167 . . .
  “identify”                    123, 126, 130, 142, 143, 167 . . .
  “each”                        123, 130, 131, 135, 162, 177 . . .
  “data base”                   126, 130, 167, 212, 219, 377 . . .
  <article>                     . . .
  <article>/<abstract>          . . .
  <product>IMS</product>        . . .
  Directory(“/articles”)        . . .
  Collection(Red)               . . .
  Role:Editor + Action:Read     . . .
Universal Index: Schema Agnostic
XML is self-describing
<article>
  <title>MarkLogic Server: . . .</title>
  <author>
    <first-name>Dale</first-name>
    <last-name>Kim</last-name>
  </author>
  <abstract>
    . . . .<company>Mark Logic</company>
  </abstract>
  <body>
    <section>
      <section>. . .</section>
    </section>
    <section>. . . index . . . </section>
  </body>
  <copyright>Copyright© . . . </copyright>
</article>
Load As Is
XML is self-describing. No schema needed!
[Figure: the article XML drawn as a node tree: <article> with children <title> ("MarkLogic Server: . . ."), <author> (<first-name> "Dale", <last-name> "Kim"), <abstract> (<company> "MarkLogic"), <body> (nested <section> elements, " . . . index . . . "), and <copyright>]
Spatial Indexing
Points are ordered in latitude-major order; special scan operators apply geospatial query constraints.
[Figure: GEOSPATIAL INDEX listing document ids (113, 123, 126, 127, 130, 167, ...) against latitude/longitude points such as 0,0; -10,10.5; 0,-12; 0,-29; 0,-124; 0,-167; 10.1,35.553]
Spatial Query
  Data examples
    Latitude / Longitude
    Any other pair (e.g. volume / price)
  Query types
    Point (exact value)
    Point-Radius (circle)
    Lat/Lon bound (Mercator “rectangle”)
    Polygon (10K+ vertices)
  Composition with…
    Full Text
    XML structure
    XML semantics
    Other range indexes (e.g. temporal)
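A point-radius query over a latitude-ordered point index can be sketched as follows. This is an illustrative Python model, not MarkLogic's scan operators: the `PointIndex` class, the haversine distance, and the Earth radius constant are assumptions chosen for the example, while the coordinates are the San Ramon and San Carlos points used in the carpool slides.

```python
import math
from bisect import bisect_left, bisect_right

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an assumption for this sketch


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


class PointIndex:
    """Points kept sorted by latitude first, so a latitude band is found by binary search."""

    def __init__(self, points_by_doc):
        self.entries = sorted((lat, lon, doc_id) for doc_id, (lat, lon) in points_by_doc.items())
        self.lats = [entry[0] for entry in self.entries]

    def within_circle(self, center, radius_miles):
        """Scan only the latitude band that can contain matches, then filter by distance."""
        lat_margin = math.degrees(radius_miles / EARTH_RADIUS_MILES)
        start = bisect_left(self.lats, center[0] - lat_margin)
        end = bisect_right(self.lats, center[0] + lat_margin)
        return sorted(
            doc_id
            for lat, lon, doc_id in self.entries[start:end]
            if haversine_miles(center, (lat, lon)) <= radius_miles
        )


points = PointIndex({
    "passenger-start": (37.739976, -121.915821),  # near San Ramon
    "san-carlos": (37.507363, -122.247119),
})
san_ramon = (37.751658, -121.898387)
nearby = points.within_circle(san_ramon, 5)
```

The binary search narrows the scan to a latitude band before any distance is computed, which is the benefit of keeping the points in latitude-major order.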
Query Registration
  Canonicalize a cts:query()
  Hash for an ID
  Resolve the query
  Cache the term list in memory
  Reuse as materialized sub-query
  (AKA Topics, Concepts, Macros, etc.)
  (This is not alerting)
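The registration steps above can be modeled in a few lines of Python. This is a sketch under assumptions: queries are nested tuples such as `("and", "data", ("or", "web", "size"))`, the `QueryRegistry` class and its methods are invented names, and SHA-1 over a JSON serialization stands in for whatever hashing the server actually uses.

```python
import hashlib
import json


def canonicalize(query):
    """Normalize a query so that equivalent and/or expressions serialize identically."""
    if isinstance(query, str):
        return query.lower()
    op, *args = query
    return [op] + sorted((canonicalize(arg) for arg in args), key=json.dumps)


class QueryRegistry:
    def __init__(self, index):
        self.index = index        # term -> set of doc ids
        self.cache = {}           # query id -> resolved term list
        self.resolutions = 0      # how many times a query was actually resolved

    def resolve(self, canonical):
        """Evaluate a canonical query against the index (at least one operand assumed)."""
        if isinstance(canonical, str):
            return set(self.index.get(canonical, set()))
        op, *args = canonical
        lists = [self.resolve(arg) for arg in args]
        return set.intersection(*lists) if op == "and" else set.union(*lists)

    def register(self, query):
        """Canonicalize, hash for an id, resolve once, and cache the term list."""
        canonical = canonicalize(query)
        query_id = hashlib.sha1(json.dumps(canonical).encode()).hexdigest()
        if query_id not in self.cache:
            self.cache[query_id] = self.resolve(canonical)
            self.resolutions += 1
        return query_id

    def registered(self, query_id):
        """Reuse the cached term list as a materialized sub-query."""
        return self.cache[query_id]


registry = QueryRegistry({"data": {126, 130, 167}, "web": {130, 219}, "size": {167}})
first_id = registry.register(("and", "data", ("or", "web", "size")))
second_id = registry.register(("and", ("or", "size", "web"), "DATA"))
```

Both registrations produce the same id because canonicalization removes differences in operand order and case, so the second call reuses the cached term list instead of resolving again.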
Registered Query
UNIVERSAL INDEX (alongside the Range Indexes)
  Term                                      Term List (Document References)
  “which”                                   123, 127, 129, 152, 344, 791 . . .
  “uniquely”                                122, 125, 126, 129, 130, 167 . . .
  “identify”                                123, 126, 130, 142, 143, 167 . . .
  “each”                                    123, 130, 131, 135, 162, 177 . . .
  “data base”                               126, 130, 167, 212, 219, 377 . . .
  <article>                                 . . .
  <article>/<abstract>                      . . .
  <product>IMS</product>                    . . .
  Directory(“/articles/”)                   . . .
  Collection(Red)                           . . .
  Role:Editor + Action:Read                 . . .
  cts:query(<cts:word-query><cts:text>…)    . . .
Query Indexing
  “Alerting”: real-time search, selectors, tippers, standing queries, filters, “triggers*”, content-based routing, stream DBMS, etc.
  Search(query, Index[docs]) -> docs
  Alert(doc, Index[queries]) -> queries
  Queries are XML documents
    XML serialization of cts:query()
  First step is O(1) in number of queries
  O(n) in results returned
  cts:reverse-query()
  * MarkLogic also has pre- and post-commit DBMS triggers, which are unrelated
[Figure: example alert: “New patient matching your study profile”]
The Reverse Index
REVERSE INDEX (Query -> Document References)
  Queries
    year >= 1970 and (“data” and “size”)
    year > 2003 and (“data” and “web”)
    year < 2000 and (“data” and “web”)
    (2000 <= year <= 2010) and “web”
  Document references: 437, 562, 597, 623, . . .
[Figure: the queries decomposed into unified expression trees that share the word terms “data”, “size”, “web” and the range predicates year >= 1970, year < 2000, year >= 2000, year > 2003, year <= 2010]
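A reverse index stores queries and is probed with a document. The Python sketch below illustrates the idea using the four queries from the slide; the `ReverseIndex` class, the restriction to "all words plus an inclusive year range", and the pairing of ids 437, 562, 597, 623 with the queries in listed order are all assumptions made for the example.

```python
from collections import defaultdict


class ReverseIndex:
    """Stored queries indexed by their words, so a document can look up matching queries."""

    def __init__(self):
        self.queries = {}                 # query id -> (required words, (low, high) year bounds)
        self.by_word = defaultdict(set)   # word -> ids of queries requiring that word

    def add(self, query_id, words, year_range=(None, None)):
        self.queries[query_id] = (frozenset(words), year_range)
        for word in words:
            self.by_word[word].add(query_id)

    def matches(self, doc_words, doc_year):
        """Find candidate queries by term lookup, then verify each candidate fully."""
        doc_words = set(doc_words)
        candidates = set()
        for word in doc_words:
            candidates |= self.by_word.get(word, set())
        hits = []
        for query_id in sorted(candidates):
            words, (low, high) = self.queries[query_id]
            if not words <= doc_words:
                continue
            if low is not None and doc_year < low:
                continue
            if high is not None and doc_year > high:
                continue
            hits.append(query_id)
        return hits


reverse = ReverseIndex()
reverse.add(437, {"data", "size"}, (1970, None))   # year >= 1970 and ("data" and "size")
reverse.add(562, {"data", "web"}, (2004, None))    # year > 2003 and ("data" and "web")
reverse.add(597, {"data", "web"}, (None, 1999))    # year < 2000 and ("data" and "web")
reverse.add(623, {"web"}, (2000, 2010))            # (2000 <= year <= 2010) and "web"

alerts_2005 = reverse.matches({"data", "web", "index"}, 2005)
```

The candidate step depends on the number of words in the incoming document rather than the number of stored queries, which mirrors the "first step is O(1) in number of queries" claim on the previous slide.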
Alerting in Composition
  Scalar
    Query on scalar data with range queries
    Alert on range data with scalar reverse-queries
  Geospatial
    Query on point data with box, circle, polygon query constraint
      Compose with text, structure, XML-semantic query
    Alert on box, circle, polygon data with point reverse-query
      Compose with text, structure, XML-semantic data
  Search
    [Forward-]Query composes with Boolean operations (AND, OR, NOT, (()()(()())))
    Reverse- and forward-query compose (AND, OR, NOT, (()()(()())))
    Why would you ever want to do that?
Search Composed with Alerting
  In Soviet Russia, the document searches YOU!
    If you express yourself as XML
  Documents are XML
    Elements
      Text
      Typed data
      Serialized query against documents [cts:query()]
    Attributes
      Typed data
  Composition
    Boolean (AND) incorporating [forward-]query and reverse-query
Matchmaking
  Constraints upon each other
  Matching pairs or one-to-many
  Examples
    Suitable date: man / woman mixed pool
    Employment: job / resume
    Medication: patient / drug
    Search security: document / user
    Battle: target / shooter
    Carpool ride: driver / rider
Carpool
  Driver
    Non-smoking woman driving from San Ramon to San Carlos, leaving at 8AM, listens to rock, pop, hip-hop, wants $10 for gas
    Requires female passenger within five miles of start and end
  Passenger
    Woman will pay up to $20
    From: 3001 Summit View Dr, San Ramon, CA 94582
    To: 400 Concourse Drive, Belmont, CA 94002
    Requires non-smoking car
    Won’t listen to country music
Driver
let $from := cts:point(37.751658, -121.898387) (: San Ramon :)
let $to := cts:point(37.507363, -122.247119) (: San Carlos :)
return xdmp:document-insert(
  "/driver.xml",
  <driver>
    <from>{$from}</from>
    <to>{$to}</to>
    <when>2010-01-20T08:00:00-08:00</when>
    <gender>female</gender>
    <smoke>no</smoke>
    <music>rock, pop, hip-hop</music>
    <cost>10</cost>
    <preferences>
      {
        (: the driver's requirements, stored in the document as a serialized query :)
        cts:and-query((
          cts:element-value-query(xs:QName("gender"), "female"),
          cts:element-geospatial-query(xs:QName("from"), cts:circle(5, $from)),
          cts:element-geospatial-query(xs:QName("to"), cts:circle(5, $to))
        ))
      }
    </preferences>
  </driver>)
Passenger
xdmp:document-insert(
  "/passenger.xml",
  <passenger>
    <from>37.739976,-121.915821</from>
    <to>37.53244,-122.270969</to>
    <gender>female</gender>
    <preferences>
      {
        (: the passenger's requirements, stored in the document as a serialized query :)
        cts:and-query((
          cts:not-query(cts:element-word-query(xs:QName("music"), "country")),
          cts:element-range-query(xs:QName("cost"), "<=", 20),
          cts:element-value-query(xs:QName("smoke"), "no"),
          cts:element-value-query(xs:QName("gender"), "female")
        ))
      }
    </preferences>
  </passenger>)
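Driver
(: I'm the driver, find me passengers :)
let $me := fn:doc("/driver.xml")/driver
for $match in cts:search(/passenger,
  cts:and-query((
    (: forward query: passengers who satisfy my preferences :)
    cts:query($me/preferences/element()),
    (: reverse query: passengers whose stored preferences I satisfy :)
    cts:reverse-query($me)
  )))[1 to 3]
return fn:base-uri($match)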
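Passenger
(: I'm a passenger, find me a driver :)
let $me := fn:doc("/passenger.xml")/passenger
for $match in cts:search(/driver,
  cts:and-query((
    (: forward query: drivers who satisfy my preferences :)
    cts:query($me/preferences/element()),
    (: reverse query: drivers whose stored preferences I satisfy :)
    cts:reverse-query($me)
  )))
return fn:base-uri($match)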
Search and Alerting Composed
  XML data typing expectations
    In both directions
  Arbitrary schema
    At each document
    In each query
    Don’t [have to] declare anything
    You choose the logic for empty data
    Specify typed range indexes for scalar, geo, faceting
  Search engine speed and scalability
    O(1) term lookup
      Word, structure, values
      Query sub-expression
    O(log(n)) range lookup and term list intersection
    Shared-nothing (sharded) query evaluation
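The two-way matching in the carpool example can be modeled outside the database to show the logic being composed. The Python sketch below is illustrative only: the constraint triples, the `satisfies` and `mutual_matches` helpers, and the second driver are assumptions, and the geospatial constraints from the slides are left out to keep the example short. A missing field is treated as a failed constraint, which is one possible choice for the "logic for empty data".

```python
import operator

# Supported constraint operators; "not-contains" checks list membership.
OPS = {
    "==": operator.eq,
    "<=": operator.le,
    "not-contains": lambda have, want: want not in have,
}


def satisfies(data, preferences):
    """True when every (field, op, value) constraint holds; a missing field fails."""
    return all(
        field in data and OPS[op](data[field], value)
        for field, op, value in preferences
    )


def mutual_matches(me, candidates):
    """Keep candidates that match my preferences AND whose preferences match me."""
    return [
        name
        for name, other in candidates.items()
        if satisfies(other["data"], me["preferences"])    # forward query
        and satisfies(me["data"], other["preferences"])   # reverse query
    ]


driver_a = {
    "data": {"gender": "female", "smoke": "no", "music": ["rock", "pop", "hip-hop"], "cost": 10},
    "preferences": [("gender", "==", "female")],
}
driver_b = {  # hypothetical second driver who smokes
    "data": {"gender": "female", "smoke": "yes", "music": ["pop"], "cost": 5},
    "preferences": [("gender", "==", "female")],
}
passenger = {
    "data": {"gender": "female"},
    "preferences": [
        ("music", "not-contains", "country"),
        ("cost", "<=", 20),
        ("smoke", "==", "no"),
        ("gender", "==", "female"),
    ],
}

drivers_for_passenger = mutual_matches(passenger, {"driver_a": driver_a, "driver_b": driver_b})
```

In the slides the same AND of forward and reverse conditions is evaluated against indexes rather than by looping over candidates, so this loop shows only the matching rule, not the performance characteristics.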
Document Security
  Document queries the user
    Rules for who can see me, the document
  Open, ad-hoc security [lack of] model
    Each document declares any rules it wants
    “user eye color not blue”
  Descriptive model/schema if desired
    Extensible without changing DBMS schema
Medication
  Patient
    Diagnosis
    Background
    Idiopathic history
    Vital statistics
    Treatment baseline
  Drug
    Therapeutics
    Side effects
    Interactions
    Contraindications
What is a strange loop?
  Not mere hierarchies of abstraction
    Abstraction is simplification and distortion
    Useful if bounded and well-ordered
    Problems if poorly bounded: Watch Ourselves
  Strange loops are disorderings of the hierarchy: heterarchy
    Hofstadter: “a paradoxical level-crossing feedback loop”
    Good strange loops gracefully accommodate the disordering
  Established examples in Computer Science
    Gödel’s Incompleteness Theorems
    Self-compiling languages
Strange Loop: Fwd∘Rev Query
  Queries are an abstraction over the data
    /myroot//foo[@bar=7]/bash
  Reverse-query
    Indexed data are [serialized] queries.
    Documents are the queries.
    Queries can still be abstractions over a stream of docs.
  Composing forward and reverse query
    Strange Loop!
[Figure: Escher, Drawing Hands]
Strange Loop: XQuery throughout the Stack
  Query language is an abstraction over the data
  But the query language is re-used in the application
    The [No]SQL is the PL
  Declarative query becomes functional programming
    Creeping
    Lazy evaluation
    Parallelization
    Discarding unneeded work
    Schrödinger’s tuple
  Not religious: side effects available when required
    DBMS transactions
    xdmp:set()
[Figure: Escher, Waterfall]
Strange Loop: XQuery on XQuery In SVN Project organization No search Dependency tracking Import requires absolute paths Namespace prefix conflicts Surprise modules (functx) XQuery (with cts: extensions) to discover and automate imports
The Composable, Universal Index Full text XML structure XML semantics Range indexes Range queries Aggregations Co-occurrence Spatial indexes Query indexes
Database of documents Stored in partitions Database Partition3 Partition2 Partition1 Databases
Simple Architecture Host partition1 partition2 partition3
Shared Nothing Architecture Host 1 Host 2 partition1 partition2 partition3
Bi-Directionally Scalable Architecture Host 1 Host 3 Host 2 Host 4 Host 5 Host 6 Host k partition1 partition2 partition3 partitionm partition4
Multiversion Concurrency Control /articles/codd.xml /articles/codd.xml Document Document Title Title Author Author Metadata Metadata Section Section Last Last Year ∞ 628 ∞ 523 First First 628 ∞ Section Section Section Section Section Section Section Section Section Section c Creation Timestamp d Deleted Timestamp
Multiversion Concurrency Benefits High Throughput Queries don’t require locks Queries and Updates do not conflict ACID Cluster consistency: 2-phase commit Zero-latency ingestion and Indexing Append Only Ingest/update rates of ~400GB per partition per day /articles/codd.xml Document Title Author Metadata Section Last Year First 628 ∞ Section Section Section Section Section
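The creation and deletion timestamps shown on the MVCC slides can be illustrated with a small sketch. It assumes a single logical clock and an in-memory list of versions; `VersionStore`, `write` and `read` are hypothetical names for illustration, not MarkLogic APIs. A reader picks a timestamp and sees whichever version was live at that moment, so it needs no lock.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    uri: str
    content: str
    created: int                # creation timestamp
    deleted: float = math.inf   # deletion timestamp; infinity means still live


class VersionStore:
    """An update adds a new version and expires the old one; content is never overwritten."""

    def __init__(self):
        self.versions = []
        self.clock = 0

    def write(self, uri, content):
        self.clock += 1
        for v in self.versions:
            if v.uri == uri and v.deleted == math.inf:
                v.deleted = self.clock   # expire the previous version
        self.versions.append(Version(uri, content, created=self.clock))
        return self.clock

    def read(self, uri, at):
        """Return the content visible at timestamp `at`, or None."""
        for v in self.versions:
            if v.uri == uri and v.created <= at < v.deleted:
                return v.content
        return None


store = VersionStore()
t1 = store.write("/articles/codd.xml", "<article>v1</article>")
t2 = store.write("/articles/codd.xml", "<article>v2</article>")
```

A query that started at `t1` keeps reading the first version even after the update at `t2`, which is why queries and updates do not conflict in this model.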
A Single Forest Host Stand1 Stand2 Standn Buffer Buffer Forestk …
1. Create A New Tree Host Stand1 Stand2 Standn Buffer Buffer Forestk …
2. Expire Trees Host Stand1 Stand2 Standn Buffer Buffer Forestk …
3. Save A Buffer To Disk Host Stand1 Stand2 Standn Buffer Buffer Forestk …
4. Optimization: Merge Stands Host Buffer Forestk
The Four Forest Operations 1. Create a new document 2. Mark a document as expired 3. Write buffer out to disk (for performance, double buffer) 4. Merge (optimization: reduces number of stands in forest)
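The four forest operations can be sketched as a toy log-structured store. This is an illustration under simplifying assumptions: stands are plain dictionaries, "disk" is a Python list, the double buffer is not modeled, and the names (`Forest`, `insert`, `expire`, `flush`, `merge`) are hypothetical.

```python
class Forest:
    """Toy forest: one in-memory stand (buffer) plus immutable 'on-disk' stands."""

    def __init__(self, buffer_limit=2):
        self.buffer = {}       # in-memory stand: (uri, timestamp) -> content
        self.stands = []       # flushed stands, oldest first, never edited in place
        self.expired = set()   # keys of versions marked as expired
        self.buffer_limit = buffer_limit
        self.clock = 0

    def _live_key(self, uri):
        # Search the newest data first: the buffer, then stands from newest to oldest.
        for stand in [self.buffer] + self.stands[::-1]:
            for key in stand:
                if key[0] == uri and key not in self.expired:
                    return key
        return None

    def insert(self, uri, content):
        """1. Create a new document version in the in-memory stand."""
        self.expire(uri)       # an update expires the previous version
        self.clock += 1
        self.buffer[(uri, self.clock)] = content
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def expire(self, uri):
        """2. Mark the live version as expired; nothing is rewritten."""
        key = self._live_key(uri)
        if key is not None:
            self.expired.add(key)

    def flush(self):
        """3. Write the buffer out as a new stand and start a fresh buffer."""
        if self.buffer:
            self.stands.append(dict(self.buffer))
            self.buffer = {}

    def merge(self):
        """4. Merge all stands into one, dropping expired versions."""
        merged = {}
        for stand in self.stands:
            for key, content in stand.items():
                if key not in self.expired:
                    merged[key] = content
        self.expired -= {key for stand in self.stands for key in stand}
        self.stands = [merged] if merged else []

    def get(self, uri):
        key = self._live_key(uri)
        if key is None:
            return None
        for stand in [self.buffer] + self.stands:
            if key in stand:
                return stand[key]
        return None


forest = Forest(buffer_limit=2)
forest.insert("/a.xml", "a1")
forest.insert("/b.xml", "b1")   # buffer is full, so it is flushed as stand 1
forest.insert("/a.xml", "a2")   # expires the first version of /a.xml
forest.insert("/c.xml", "c1")   # flushed as stand 2
stands_before_merge = len(forest.stands)
forest.merge()
```

Writes only ever append, and the merge is the point where expired versions are physically discarded, which matches the slide's description of merging as an optimization rather than a correctness step.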
How Did We Get Here? Founder: Christopher Lindblad MIT Architect of Ultraseek Server Intranet search engine product Met people that wanted to use a search engine like a database Rich query language Guaranteed correctness Transactions
So We Built XML as data model Ad hoc schema A search engine core Universal Index Transaction model based on multiversion concurrency High throughput while keeping . . . Performance and scalability of a search engine
Who Uses MarkLogic? Magazine Publishing Education Healthcare Software / Services Legal Tax Financial Enterprise Aggregation Scientific Technical Medical
Selected Federal Customers Intelligence Community: Office of the Director of National Intelligence; Intelligence Community Enterprise Services; … Department of Defense: Office of the Secretary of Defense; US Army; US Air Force; Defense Information Systems Agency; Defense Contract Management Agency. Civilian

Mark Logic StrangeLoop 2010

  • 1. Unifying Search Engine and NoSQL DBMS with a Universal Index Chris Biow MarkLogic Federal CTO
  • 2. Agenda MarkLogic: company MarkLogic Server: DBMS product Search: indexing text Query: indexing XML structure and semantics Scalars: indexing ranges, analyzing aggregations Geo: indexing spatial locations Alerting: indexing queries, performing reverse query Strange Loop: composing forward and reverse query MVCC: temporizing the DBMS No-compromise lock-free queries LSM (Log Structured Merge) Trees No-compromise ingest , query, search Examples: BusinessWeek.com, WarriorGateway.org, USAF KM
  • 4. >200 customers, >170 employees HQ: San Carlos, CA Lead investor: Sequoia Capital 2008: Top 5 fastest growing technology companies in Silicon Valley (Deloitte) 2009, 2010: Best DBMS (SIIA CODiE). Previously best Search, CMS. About MarkLogic MarkLogic Corporation makes a purpose-built database for unstructured information
  • 6. What is MarkLogic Server? A hybrid (integrated parts) Special purpose DBMS for XML, with enterprise expectations ACID transactions DBA, backup, replication Search engine kernel, with enterprise expectations Full text Faceted navigation, at massive scale Boolean, proximity, stemming, tokenization, decompounding, case, diacritics, … Application Server HTTP XCC Java/.NET WebDAV
  • 7. MarkLogic as Special DBMS Not relational (RDBMS) XML The only data model required Schema agnostic Text a first-class citizen among data types XQuery (SQL) Search engine algorithms for many DB queries Order(1) initial lookup in number of docs O(log(n)) in range indexing Very low DBA overhead (0.5 FTE / 100 hosts) 5-minute install 5-minute scale-out Database and search engine are the same
  • 8. MarkLogic as NoSQL DBMS SQL XQuery ! Extensions: cts:search() / xdmp:document-insert() NoSQL Categories [per AKF Partners] Key->Value store URI -> document (XML, JSON, text, binary) Extensible Record store Extensible Markup Language Document store XML documents, natch Differentiators ACID transactions in LAN cluster Ad hoc XQuery XML declares what is to be indexed, independently for each document DBMS and Search Engine are the same
  • 9. MarkLogic as Special Search Engine Understands document structure Transactional: high CRUD load Unicode Holds the documents Update / reindexing Delivery Geospatial: Box, Point/radius, polygon Alerting: Profiles, alerts, filters, tipping, selectors, “triggers,” … Analytics: Facets, Co-occurrence, word lexicons, … Everything composes (e.g. geo-alerting, geo-text-data, search-alerting) Processing near the data Relational joins and inferencing Database and search engine are the same [Figure: XML tree with a timeline root and two message elements, with @id attributes (3 and 5) and status children containing text (“Oh boy…” and “Testing XProc”); legend: element node, attribute node, text node]
  • 10. MarkLogic as Special App Server Native HTTP(S) server RESTful XML by default Transform to HTML Transform to PDF, MS Office, etc. PKI with no dependencies Optionally with external Auth HTTP(S) client XCC Java / .NET server Similar to JDBC / ADO.NET WebDAV Folder on the user’s desktop RESTful Architecture user/ Representations + get .json.xqy .xml.xqy + … user.xqy + Get + Put + Post + Delete Resources notes/ URL Rewriter note.xqy Routes.xqy
  • 11. MarkLogic at Scale Scale up: typically 1-2TB+ XML per server Scale out: low hundreds(++) of servers in a cluster Commodity hardware Typically ~$15K HW budget per server 2-CPU x 6-core/hyperthreaded 32+ GB RAM 3x disk: local mount with failover OS Linux RHEL 5 Solaris 10 Windows 2003/8 (XP/Vista/7 for dev)
  • 12. Collapsing the Stack The extended stack Data (CSV: data) DBMS (SQL: result sets) Search Engine (Search languages: result page) Web service (Java: XML, JSON) Application server (Ruby: HTML) MarkLogic Data (XML: xml) DBMS, Search, Service, App (XQuery, XQuery, XQuery, XQuery)
  • 13. Data Model A database for unstructured (and semi-structured) information XML Data Model fpML Document Trade Product Title Author Metadata Trade Cashflow Section ID ID Last TradeLeg First TradeLeg Amount TradeLeg Event Event Event Event Section Section Section Section
  • 14. Example Document Document Title Section Section (cont’d) Author Abstract Section Metadata Section Section Footer
  • 15. Serialized as XML <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article>
  • 16. Target Query Classes Target to optimize performance for these kinds of queries Full-Text Search Find all documents that contain the phrase “uniquely identify”. XML Structure Find all articles that have an abstract. XML Semantics Find all documents that mention the product “IMS”. Aggregate Queries How many articles that contain “data base” were written in each of the last 5 decades. All of the above . . . Count all articles that contain “data” in the title and mention the product “IMS” in a section, grouping by year. at the same time
  • 18. 1) Full-text Search Find all documents that contain the phrase “uniquely identify” <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article>
  • 19. 1) Full-text Search Find all documents that contain the phrase “uniquely identify” UNIVERSAL INDEX “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . Document References “each” 123, 130, 131, 135, 162, 177 . . . “uniquely identify” 126, 130, 167, 212, 219, 377 . . . “which uniquely” . . . 126, 130, 167, 212, 219, 377 . . . “identify each” . . .
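The term lists on this slide can be sketched as a small inverted index in which single words and two-word phrases are both terms. This is a minimal illustration, not the product's index format; the sample documents and the function names `build_index` and `intersect` are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word and each two-word phrase to a sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for word in words:
            index[word].add(doc_id)
        for first, second in zip(words, words[1:]):
            index[first + " " + second].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Intersect two sorted term lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


docs = {
    126: "has values which uniquely identify each element",
    129: "which rows are uniquely named",
    130: "keys uniquely identify rows",
    142: "identify each row uniquely",
}
index = build_index(docs)
phrase_hits = index.get("uniquely identify", [])                 # one dictionary lookup
both_words = intersect(index["uniquely"], index["identify"])     # words in any order
```

Looking up the phrase term directly returns only documents where the words are adjacent, while intersecting the two word lists also returns document 142, where both words occur but not as a phrase.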
  • 21. <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article> 2) XML Structure Find all articles that have an abstract
  • 22. 2) XML Structure Find all articles that have an abstract UNIVERSAL INDEX “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . Document References “each” 123, 130, 131, 135, 162, 177 . . . “uniquely identify” 126, 130, 167, 212, 219, 377 . . . <article> . . . 126, 130, 167, 212, 219, 377 . . . <article>/<abstract> . . .
  • 23. <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article> 3) XML Semantics Find all documents that mention the product “IMS”
  • 24. 3) XML Semantics Find all documents that mention the product “IMS” UNIVERSAL INDEX “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . Document References “each” 123, 130, 131, 135, 162, 177 . . . “uniquely identify” 126, 130, 167, 212, 219, 377 . . . <article> . . . 126, 130, 167, 212, 219, 377 . . . <article>/<abstract> . . . <product>IMS</product>
  • 26. <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article> 4) Aggregate Queries How many of the articles that contain “data base” were written in each of the last 5 decades?
  • 27. 4) Aggregates How many of the articles that contain “data base” were written in each of the last 5 decades? UNIVERSAL INDEX “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . YEAR “identify” 123, 126, 130, 142, 143, 167 . . . Document References “each” 123, 130, 131, 135, 162, 177 . . . “data base” 126, 130, 167, 212, 219, 377 . . . <article> . . . 126, 130, 167, … <article>/<abstract> . . . <product>IMS</product> Volume
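The aggregate query above combines a term list with a range index on the year. A rough sketch, assuming the range index is a list of (value, document id) pairs kept sorted by value; the data and the names `docs_in_range` and `count_by_decade` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical range index on <year>: (value, doc_id) pairs sorted by value.
year_index = sorted([
    (1970, 126), (1974, 130), (1985, 167), (1988, 123),
    (1992, 212), (2003, 219), (2009, 377),
])

# Hypothetical term list for the phrase "data base".
data_base_docs = {126, 130, 167, 212, 219, 377}


def docs_in_range(index, low, high):
    """Doc ids whose value lies in [low, high], located by binary search."""
    start = bisect_left(index, (low, float("-inf")))
    end = bisect_right(index, (high, float("inf")))
    return [doc_id for _, doc_id in index[start:end]]


def count_by_decade(index, matching_docs):
    """Count matching documents per decade using only index values."""
    counts = Counter()
    for year, doc_id in index:
        if doc_id in matching_docs:
            counts[year // 10 * 10] += 1
    return dict(counts)


seventies = docs_in_range(year_index, 1970, 1979)
by_decade = count_by_decade(year_index, data_base_docs)
```

The counts come from the index alone; no document is opened to answer the question.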
  • 28. <article> <title>A Relational Model of Data for Large Shared Data Banks</title> <author><first-name>Edgar</first-name><last-name>Codd</last-name></author> <abstract> Future users of data banks must be protected from having to know how the data is organized in the machine (the internal representation). . . . Changes in data representation will often be needed . . . </abstract> <body> <section> <section> … has values which uniquely identify each element ... </section> </section> <section>… version of <product>IMS</product> provides the user . . . </section> </body> <metadata><vol>13</vol><number>6</number><year>1970</year></metadata> </article> 5) All Of The Above Count all articles that contain “data” in the title and mention the product IMS in a section, grouping by year.
  • 29. The Universal Index Range Indexes UNIVERSAL INDEX Term Term List “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . “each” 123, 130, 131, 135, 162, 177 . . . Document References “data base” 126, 130, 167, 212, 219, 377 . . . <article> . . . <article>/<abstract> . . . 126, 130, 167, … <product>IMS</product>
  • 30. Additional Uses For Universal Index Directories Exclusive, hierarchical, analogous to file system URI: /some/directory/hierarchy/me.xml Collections Set-based, N:M relationship Document URI : Collection URI Security Invisible to your app Document: Role, action
  • 31. The Universal Index Range Indexes UNIVERSAL INDEX Term Term List “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . “each” 123, 130, 131, 135, 162, 177 . . . Document References “data base” 126, 130, 167, 212, 219, 377 . . . <article> . . . <article>/<abstract> . . . 126, 130, 167, … <product>IMS</product> Directory(“/articles”) Collection(Red) Role:Editor + Action:Read
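Because directories, collections and security rules are stored as terms like any other, they compose with text terms by set intersection. The sketch below assumes made-up term names and document ids, and a hypothetical `search` function that always applies the caller's read permissions, which is one way to read "invisible to your app".

```python
# Hypothetical term lists, including non-text terms like those on the slide.
index = {
    'word:"data base"': {126, 130, 167, 212},
    "directory:/articles": {126, 130, 212, 377},
    "collection:Red": {130, 212, 377},
    "role:Editor+action:Read": {126, 130, 377},
}


def search(index, terms, user_roles):
    """Intersect the requested terms, then the security terms for the caller's roles."""
    result = None
    for term in terms:
        docs = index.get(term, set())
        result = set(docs) if result is None else result & docs
    readable = set()
    for role in user_roles:
        readable |= index.get("role:" + role + "+action:Read", set())
    return sorted((result or set()) & readable)


query = ['word:"data base"', "directory:/articles", "collection:Red"]
editor_hits = search(index, query, ["Editor"])
guest_hits = search(index, query, ["Guest"])
```

The application passes only its own terms; the permission filter is added inside `search`, so a user without a matching role simply gets fewer results.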
  • 32. Universal Index: Schema Agnostic XML is self-describing <article> <title>MarkLogic Server: . . .</title> <author> <first-name>Dale</first-name> <last-name>Kim</last-name> </author> <abstract> . . . .<company>Mark Logic</company> </abstract> <body> <section> <section>. . .</section> </section> <section>. . . index . . . </section> </body> <copyright>Copyright© . . . </copyright> </article>
  • 33. Load As Is. XML is self-describing. [Figure: the <article> document shown beside its element outline: <article> containing <title>, <author> (<first-name>, <last-name>), <abstract> (<company>), <body> (nested <section> elements) and <copyright>.]
  • 34. Load As Is. XML is self-describing. [Figure: the same document drawn as a tree: <article> with children <title> ("MarkLogic Server: . . ."), <author> (<first-name> "Dale", <last-name> "Kim"), <abstract> (<company> "MarkLogic"), <body> (nested <section> elements, one containing ". . . index. . .") and <copyright>.]
  • 35. Load As Is. XML is self-describing: No Schema Needed! [Figure: same tree as slide 34.]
  • 37. Spatial Indexing. Points are ordered in latitude-major order; special scan operators apply geospatial query constraints. [Figure: a geospatial index pairing document IDs such as 113, 123, 126, 127, 130 and 167 with latitude, longitude points such as -10,10.5; 0,0; 0,-124 and 10.1,35.553.]
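A latitude-major ordering lets a bounding-box query binary-search the latitude bounds and then scan only that slice for longitude. The Python sketch below illustrates this under stated assumptions: the `points` table and its pairing of document IDs with coordinates are hypothetical (the slide's figure does not survive extraction cleanly), and `box_query` is an illustrative name, not a MarkLogic API.

```python
import bisect

# Hypothetical (latitude, longitude, doc_id) entries kept in latitude-major order.
points = sorted([
    (-10.0, 10.5, 123),
    (0.0, -167.0, 126),
    (0.0, -124.0, 130),
    (0.0, -29.0, 130),
    (0.0, -12.0, 113),
    (0.0, 0.0, 167),
    (10.1, 35.553, 113),
])


def box_query(south, north, west, east):
    """Return doc IDs with a point inside the latitude/longitude box."""
    # O(log n): locate the slice of entries whose latitude lies in [south, north].
    lo = bisect.bisect_left(points, (south, float("-inf"), -1))
    hi = bisect.bisect_right(points, (north, float("inf"), float("inf")))
    # Linear scan of that slice only, filtering on longitude.
    return sorted({doc for _, lon, doc in points[lo:hi] if west <= lon <= east})
```

For example, `box_query(-1, 1, -30, 1)` touches only the entries at latitude 0 and keeps those whose longitude falls between -30 and 1.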
  • 38. Spatial Query. Data examples: latitude / longitude; any other pair (e.g. volume / price). Query types: point (exact value); point-radius (circle); lat/lon bound (Mercator “rectangle”); polygon (10K+ vertices). Composition with: full text; XML structure; XML semantics; other range indexes (e.g. temporal).
  • 40. Query Registration. Canonicalize a cts:query(); hash for an ID; resolve the query; cache the term list in memory; reuse as materialized sub-query (AKA Topics, Concepts, Macros, etc.). (This is not alerting.)
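The registration steps on this slide (canonicalize, hash, resolve, cache, reuse) can be sketched in a few lines of Python. This is an illustrative model only: the tuple-based query form, the tiny `index`, and the function names are assumptions made for the example, and sorting operands is valid here only because the sketch supports just the commutative operators `and` and `or`.

```python
import hashlib
import json

index = {"data": [126, 130, 167], "web": [130, 167, 212]}  # hypothetical term lists
registered = {}  # query ID -> cached set of matching document IDs


def canonicalize(query):
    """Put operands in a stable order so equivalent queries serialize identically."""
    if isinstance(query, str):
        return query
    op, *args = query
    return [op] + sorted((canonicalize(a) for a in args), key=json.dumps)


def resolve(query):
    """Evaluate a query against the index."""
    if isinstance(query, str):
        return set(index.get(query, []))
    op, *args = query
    sets = [resolve(a) for a in args]
    return set.intersection(*sets) if op == "and" else set.union(*sets)


def register(query):
    """Hash the canonical form into an ID and cache the resolved result once."""
    canonical = json.dumps(canonicalize(query))
    query_id = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    if query_id not in registered:
        registered[query_id] = resolve(query)
    return query_id
```

Registering `("and", "data", "web")` and `("and", "web", "data")` yields the same ID, so the second call reuses the cached result instead of resolving again.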
  • 41. Registered Query Range Indexes UNIVERSAL INDEX Term Term List “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . “each” 123, 130, 131, 135, 162, 177 . . . Document References “data base” 126, 130, 167, 212, 219, 377 . . . <article> . . . <article>/<abstract> . . . 126, 130, 167, … <product>IMS</product> Directory(“/articles/”) Collection(Red) Role:Editor + Action:Read cts:query(<cts:word-query><cts:text>…)
  • 42. Query Indexing (“Alerting”). Also known as real-time search, selectors, tippers, standing queries, filters, “triggers*”, content-based routing, stream DBMS, etc. Search(query, Index[docs]) -> docs; Alert(doc, Index[queries]) -> queries. Queries are XML documents: the XML serialization of a cts:query(). The first step is O(1) in the number of queries; the rest is O(n) in results returned. Function: cts:reverse-query(). Example alert: “New patient matching your study profile”. * MarkLogic also has pre- and post-commit DBMS triggers, which are unrelated.
  • 43. The Reverse Index. [Figure: a reverse index whose entries are stored queries (document references 437, 562, 597, 623, . . .) decomposed into unified expression trees. Stored queries: year >= 1970 and (“data” and “size”); year > 2003 and (“data” and “web”); year < 2000 and (“data” and “web”); (2000 <= year <= 2010) and “web”. Leaf nodes: “data”, “size”, “web”, year >= 1970, year < 2000, year >= 2000, year > 2003, year <= 2010.]
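A small Python model can make the reverse index concrete: stored queries are indexed by the terms they mention, an incoming document nominates candidate queries through those terms, and only the nominated queries are fully evaluated. The four queries below come from the slide; which query carries which ID (437, 562, 597, 623) is an assumption, as are the tuple encoding and the function names. Nominating by word alone is complete here only because every stored query contains at least one word.

```python
import operator

COMPARE = {"ge": operator.ge, "gt": operator.gt, "lt": operator.lt, "le": operator.le}

# Stored queries from the slide; the ID-to-query assignment is assumed.
queries = {
    437: ("and", ("ge", "year", 1970), ("word", "data"), ("word", "size")),
    562: ("and", ("gt", "year", 2003), ("word", "data"), ("word", "web")),
    597: ("and", ("lt", "year", 2000), ("word", "data"), ("word", "web")),
    623: ("and", ("ge", "year", 2000), ("le", "year", 2010), ("word", "web")),
}


def words_of(query):
    """Yield every word leaf of a query tree."""
    if query[0] == "word":
        yield query[1]
    elif query[0] == "and":
        for child in query[1:]:
            yield from words_of(child)


# Reverse index: word -> IDs of stored queries that mention it.
reverse_index = {}
for qid, stored in queries.items():
    for word in words_of(stored):
        reverse_index.setdefault(word, set()).add(qid)


def matches(query, doc):
    """Evaluate one stored query against a document."""
    kind = query[0]
    if kind == "word":
        return query[1] in doc["words"]
    if kind == "and":
        return all(matches(child, doc) for child in query[1:])
    return COMPARE[kind](doc[query[1]], query[2])


def alert(doc):
    """Alert(doc, Index[queries]) -> IDs of the stored queries the document satisfies."""
    nominated = set()
    for word in doc["words"]:
        nominated |= reverse_index.get(word, set())
    return sorted(qid for qid in nominated if matches(queries[qid], doc))
```

The work done per document depends on how many stored queries share its terms, not on the total number of stored queries.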
  • 44. Alerting in Composition. Scalar: query on scalar data with range queries; alert on range data with scalar reverse-queries. Geospatial: query on point data with box, circle, polygon query constraint, composed with text, structure, XML-semantic query; alert on box, circle, polygon data with point reverse-query, composed with text, structure, XML-semantic data. Search: [forward-]query composes with Boolean operations (AND, OR, NOT, (()()(()()))); reverse- and forward-query compose the same way (AND, OR, NOT, (()()(()()))). Why would you ever want to do that?
  • 46. Search Composed with Alerting. In Soviet Russia, the document searches YOU! (If you express yourself as XML.) Documents are XML: elements hold text, typed data, and a serialized query against documents [cts:query()]; attributes hold typed data. Composition: Boolean (AND) incorporating [forward-]query and reverse-query.
  • 47. Matchmaking. Constraints upon each other; matching pairs or one-to-many. Examples: suitable date (man / woman mixed pool); employment (job / resume); medication (patient / drug); search security (document / user); battle (target / shooter); carpool ride (driver / rider).
  • 48. Carpool. Driver: non-smoking woman driving from San Ramon to San Carlos, leaving at 8AM; listens to rock, pop, hip-hop; wants $10 for gas; requires a female passenger within five miles of start and end. Passenger: woman who will pay up to $20; from 3001 Summit View Dr, San Ramon, CA 94582; to 400 Concourse Drive, Belmont, CA 94002; requires a non-smoking car; won’t listen to country music.
  • 49. Driver
    let $from := cts:point(37.751658, -121.898387) (: San Ramon :)
    let $to := cts:point(37.507363, -122.247119) (: San Carlos :)
    return xdmp:document-insert(
      "/driver.xml",
      <driver>
        <from>{$from}</from>
        <to>{$to}</to>
        <when>2010-01-20T08:00:00-08:00</when>
        <gender>female</gender>
        <smoke>no</smoke>
        <music>rock, pop, hip-hop</music>
        <cost>10</cost>
        <preferences>{
          (: the driver's requirements for a passenger, stored as a serialized query :)
          cts:and-query((
            cts:element-value-query(xs:QName("gender"), "female"),
            cts:element-geospatial-query(xs:QName("from"), cts:circle(5, $from)),
            cts:element-geospatial-query(xs:QName("to"), cts:circle(5, $to))
          ))
        }</preferences>
      </driver>)
  • 55. Passenger
    xdmp:document-insert(
      "/passenger.xml",
      <passenger>
        <from>37.739976,-121.915821</from>
        <to>37.53244,-122.270969</to>
        <gender>female</gender>
        <preferences>{
          (: the passenger's requirements for a driver, stored as a serialized query :)
          cts:and-query((
            cts:not-query(cts:element-word-query(xs:QName("music"), "country")),
            cts:element-range-query(xs:QName("cost"), "<=", 20),
            cts:element-value-query(xs:QName("smoke"), "no"),
            cts:element-value-query(xs:QName("gender"), "female")
          ))
        }</preferences>
      </passenger>)
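  • 63. Driver
    (: I'm the driver, find me passengers :)
    let $me := fn:doc("/driver.xml")/driver
    for $match in cts:search(/passenger,
      cts:and-query((
        (: forward query: passengers who satisfy my preferences :)
        cts:query($me/preferences/element()),
        (: reverse query: passengers whose stored preferences I satisfy :)
        cts:reverse-query($me)
      )))[1 to 3]
    return fn:base-uri($match)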
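  • 67. Passenger
    (: I'm a passenger, find me a driver :)
    let $me := fn:doc("/passenger.xml")/passenger
    for $match in cts:search(/driver,
      cts:and-query((
        (: forward query: drivers who satisfy my preferences :)
        cts:query($me/preferences/element()),
        (: reverse query: drivers whose stored preferences I satisfy :)
        cts:reverse-query($me)
      )))
    return fn:base-uri($match)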
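The matchmaking logic in the carpool slides is a mutual constraint: a pair matches only when each side's data satisfies the other side's stored preferences. The Python sketch below models that with plain dictionaries and predicates, using the coordinates and attribute values from the slides. The helper names (`miles_between`, `driver_prefs`, `passenger_prefs`, `find_matches`) are illustrative, and the haversine formula is a stand-in for whatever distance computation the server applies to `cts:circle`.

```python
from math import asin, cos, radians, sin, sqrt


def miles_between(a, b):
    """Great-circle distance in miles between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(h))


driver = {"from": (37.751658, -121.898387), "to": (37.507363, -122.247119),
          "gender": "female", "smoke": "no",
          "music": ["rock", "pop", "hip-hop"], "cost": 10}
passenger = {"from": (37.739976, -121.915821), "to": (37.53244, -122.270969),
             "gender": "female"}


def driver_prefs(me):
    """Driver wants a female passenger within five miles of both endpoints."""
    return lambda p: (p["gender"] == "female"
                      and miles_between(me["from"], p["from"]) <= 5
                      and miles_between(me["to"], p["to"]) <= 5)


def passenger_prefs(me):
    """Passenger wants a non-smoking female driver, no country music, cost <= 20."""
    return lambda d: ("country" not in d["music"] and d["cost"] <= 20
                      and d["smoke"] == "no" and d["gender"] == "female")


driver["prefs"] = driver_prefs(driver)
passenger["prefs"] = passenger_prefs(passenger)


def find_matches(me, candidates):
    """Forward check (they satisfy me) AND reverse check (I satisfy them)."""
    return [c for c in candidates if me["prefs"](c) and c["prefs"](me)]
```

Unlike this linear scan over candidates, the slides' `cts:search` resolves both halves of the AND through indexes, which is the point of composing forward and reverse query.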
  • 69. Search and Alerting Composed. XML data typing expectations in both directions. Arbitrary schema at each document and in each query: you don’t [have to] declare anything, and you choose the logic for empty data. Specify typed range indexes for scalar, geo, faceting. Search engine speed and scalability: O(1) term lookup (word, structure, values, query sub-expression); O(log(n)) range lookup and term list intersection; shared-nothing (sharded) query evaluation.
  • 70. Document Security. The document queries the user: rules for who can see me, the document. Open, ad-hoc security [lack of] model: each document declares any rules it wants (e.g. “user eye color not blue”). Descriptive model/schema if desired. Extensible without changing DBMS schema.
  • 71. Medication. Patient: diagnosis, background, idiopathic history, vital statistics, treatment baseline. Drug: therapeutics, side effects, interactions, contraindications.
  • 72. What is a strange loop? Not mere hierarchies of abstraction. Abstraction is simplification and distortion: useful if bounded and well-ordered, a problem if poorly bounded (Watch Ourselves). Strange loops are disorderings of the hierarchy: heterarchy. Hofstadter: “a paradoxical level-crossing feedback loop”. Good strange loops gracefully accommodate the disordering. Established examples in computer science: Gödel’s Incompleteness Theorems; self-compiling languages.
  • 73. Strange Loop: Fwd∘Rev Query. Queries are an abstraction over the data, e.g. /myroot//foo[@bar=7]/bash. Reverse-query: indexed data are [serialized] queries; documents are the queries; queries can still be abstractions over a stream of docs. Composing forward and reverse query: Strange Loop! [Image: Escher, Drawing Hands]
  • 74. Strange Loop: XQuery throughout the Stack. Query language is an abstraction over the data, but the query language is re-used in the application: the [No]SQL is the PL. Declarative query becomes functional programming, creeping: lazy evaluation; parallelization; discarding unneeded work; Schrödinger’s tuple. Not religious: side effects available when required (DBMS transactions, xdmp:set()). [Image: Escher, Waterfall]
  • 75. Strange Loop: XQuery on XQuery. In SVN: project organization, no search. Dependency tracking: import requires absolute paths; namespace prefix conflicts; surprise modules (functx). XQuery (with cts: extensions) to discover and automate imports.
  • 76. The Composable, Universal Index. Full text; XML structure; XML semantics; range indexes (range queries, aggregations, co-occurrence); spatial indexes; query indexes.
  • 77. Databases. A database of documents, stored in partitions. [Diagram: Database; Partition 1, Partition 2, Partition 3]
  • 78. Simple Architecture. [Diagram: Host; partition 1, partition 2, partition 3]
  • 79. Shared Nothing Architecture. [Diagram: Host 1, Host 2; partition 1, partition 2, partition 3]
  • 80. Bi-Directionally Scalable Architecture. [Diagram: Host 1 through Host 6 . . . Host k; partition 1 through partition 4 . . . partition m]
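In a shared-nothing layout each partition indexes only its own documents, so a query fans out to every partition and the partial results are combined. The Python sketch below is a toy illustration of that scatter-gather pattern. Assigning a document to a partition by hashing its URI is an assumption made for the example, not something the slides specify, and `Cluster` is a hypothetical name.

```python
from zlib import crc32


class Cluster:
    """Toy shared-nothing cluster: every partition owns a private word index."""

    def __init__(self, partition_count):
        # Each partition maps word -> set of URIs stored in that partition only.
        self.partitions = [dict() for _ in range(partition_count)]

    def insert(self, uri, text):
        # Assumed placement policy: hash of the URI picks exactly one partition.
        partition = self.partitions[crc32(uri.encode()) % len(self.partitions)]
        for word in text.lower().split():
            partition.setdefault(word, set()).add(uri)

    def search(self, word):
        results = []
        # Scatter: each partition answers from its own index (could run in parallel).
        for partition in self.partitions:
            results.extend(partition.get(word, ()))
        # Gather: combine the partial results into one ordered answer.
        return sorted(results)
```

Adding partitions spreads both the data and the per-query work, since no partition needs to consult another to answer its share.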
  • 82. Multiversion Concurrency Control. [Figure: two versions of /articles/codd.xml, each drawn as a tree (Document; Title; Author with First and Last; Metadata with Year; Section . . .). Fragments carry a creation timestamp (c) and a deletion timestamp (d); the values shown are 523, 628 and ∞.]
  • 83. Multiversion Concurrency Benefits. High throughput: queries don’t require locks; queries and updates do not conflict. ACID; cluster consistency: 2-phase commit. Zero-latency ingestion and indexing. Append only: ingest/update rates of ~400GB per partition per day. [Figure: /articles/codd.xml fragment tree with timestamps 628 and ∞]
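The lock-free reads claimed above follow from stamping every version with a creation and a deletion timestamp: a query runs as of one timestamp and sees exactly the versions alive at that moment. The Python sketch below is a simplified single-process model. The `VersionedStore` class is hypothetical, and reading the figure's 523, 628 and ∞ as "old version created at 523 and deleted at 628, new version created at 628 and never deleted" is an assumption.

```python
import math


class VersionedStore:
    """Toy MVCC store: updates append a new version instead of overwriting."""

    def __init__(self, clock=0):
        self.clock = clock
        self.fragments = []  # each entry: [uri, content, created, deleted]

    def write(self, uri, content):
        """Append a new version; the previous live version gets a deletion timestamp."""
        self.clock += 1
        for fragment in self.fragments:
            if fragment[0] == uri and fragment[3] == math.inf:
                fragment[3] = self.clock  # content is untouched, only marked deleted
        self.fragments.append([uri, content, self.clock, math.inf])
        return self.clock

    def read(self, uri, at):
        """Return the version visible at timestamp `at`, without taking any lock."""
        for frag_uri, content, created, deleted in self.fragments:
            if frag_uri == uri and created <= at < deleted:
                return content
        return None
```

A reader that started at timestamp 600 keeps seeing the first version even after the update at 628 commits, which is why queries and updates do not conflict.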
  • 85. A Single Forest. [Diagram: a host holding forest k, made up of buffers and stands 1, 2, . . . n]
  • 86. 1. Create A New Tree. [Same diagram: host, buffers, stands 1 . . . n, forest k]
  • 87. 2. Expire Trees. [Same diagram]
  • 88. 3. Save A Buffer To Disk. [Same diagram]
  • 89. 4. Optimization: Merge Stands. [Diagram: host, buffer, forest k]
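The buffer-and-stand life cycle on these slides follows the general log-structured merge pattern: writes land in an in-memory buffer, full buffers are saved as immutable sorted stands, and stands are merged in the background. The Python sketch below models only the "save a buffer" and "merge stands" steps with a hypothetical `Forest` class; it does not model tree expiry, and lookups scan each stand linearly for brevity.

```python
class Forest:
    """Toy LSM-style forest: one mutable buffer plus immutable sorted stands."""

    def __init__(self, buffer_limit=2):
        self.buffer = {}   # in-memory, accepts writes
        self.stands = []   # immutable sorted runs, oldest first
        self.buffer_limit = buffer_limit

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.save_buffer()

    def save_buffer(self):
        """Freeze the buffer into a new sorted stand and start an empty buffer."""
        if self.buffer:
            self.stands.append(tuple(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, key):
        """Newest data wins: check the buffer, then stands from newest to oldest."""
        if key in self.buffer:
            return self.buffer[key]
        for stand in reversed(self.stands):
            for stand_key, value in stand:
                if stand_key == key:
                    return value
        return None

    def merge(self):
        """Combine all stands into one, keeping only the newest value per key."""
        merged = {}
        for stand in self.stands:  # oldest first, so later stands overwrite
            merged.update(stand)
        self.stands = [tuple(sorted(merged.items()))]
```

Writes never modify an existing stand, so ingest stays sequential, while merging keeps the number of stands a read has to consult small.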
  • 93. How Did We Get Here? Founder: Christopher Lindblad (MIT), architect of Ultraseek Server, an intranet search engine product. Met people that wanted to use a search engine like a database: rich query language, guaranteed correctness, transactions.
  • 94. So We Built: XML as data model (ad hoc schema); a search engine core (Universal Index); a transaction model based on multiversion concurrency (high throughput while keeping . . .); performance and scalability of a search engine.
  • 96. Who Uses MarkLogic? Magazine Publishing Education Healthcare Software / Services Legal Tax Financial Enterprise Aggregation Scientific Technical Medical

Editor's notes

  1. The index is ordered in latitude-major order: an O(log n) lookup finds the latitude bounds, then a scan filters on the longitude bounds.
  2. Generate all the normal indexing terms for the reverse-query document, then do a linear merge to match query-document terms with the root nodes of the unified expression tree. Based on which terms do or don't match, nominate documents that may contain matching queries. For each nominated query-document, evaluate from the root of the query tree (on the right side of the slide) towards the leaf nodes (at the left). Once a subtree has been evaluated for one query-document, the result is remembered and short-circuits that evaluation for any other query-documents that share the subquery.
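The last point of note 2, remembering a subtree's result so that other stored queries sharing it skip the work, can be sketched as a per-document memo table. This Python fragment is a minimal illustration under assumptions: queries are nested tuples, only `word` and `and` nodes are modelled, and `make_evaluator` is a hypothetical helper rather than anything from the product.

```python
def make_evaluator(doc):
    """Build an evaluator for one document that memoizes every sub-expression."""
    cache = {}      # sub-expression -> result, shared by all queries for this doc
    evaluated = []  # sub-expressions actually computed, in order

    def evaluate(expr):
        if expr not in cache:
            evaluated.append(expr)
            if expr[0] == "word":
                cache[expr] = expr[1] in doc["words"]
            elif expr[0] == "and":
                cache[expr] = all(evaluate(child) for child in expr[1:])
            else:
                raise ValueError(f"unsupported node: {expr[0]}")
        return cache[expr]

    return evaluate, evaluated


# Two stored queries that share the sub-query ("data" and "web").
shared = ("and", ("word", "data"), ("word", "web"))
query_a = ("and", shared, ("word", "size"))
query_b = ("and", shared, ("word", "index"))
```

Evaluating `query_a` and then `query_b` against the same document computes `shared` once; the second query finds its result in the cache.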