Beyond NoSQL with MarkLogic and The Universal Index. Awesome document-oriented NoSQL database. MarkLogic Developer Community, NoSQL Frankfurt, 2010.
Nuno Job | nuno.job@marklogic.com | @dscape | nunojob.com
how?? [chart: Structure (ad hoc vs. predefined) plotted against Queries (ad hoc vs. predefined); label: IDMS]
Indexes! Indexes! So… filter, map, reduce!? Well… sort of… (photo: flickr.com/ayalan)
divide and conquer. Levels of abstraction (for ease of use): a database is split into partitions (partition1, partition2, partition3) by a consistent-hashing-like thingy, and partitions hold stands. A stand is a group of trees; it makes sense to have the indexes in the same place.
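The "consistent-hashing-like thingy" on this slide can be illustrated with a small hash ring. This is only a sketch of the general technique, not MarkLogic's actual assignment policy: the `HashRing` class, the `replicas` parameter, the MD5-based hash and the document URIs are all assumptions made for illustration; only the partition names come from the slide.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that maps document URIs to partitions."""

    def __init__(self, partitions, replicas=64):
        # Each partition gets several virtual points so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in partitions
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def partition_for(self, uri: str) -> str:
        # Walk clockwise to the first point after the URI's hash, wrapping around.
        idx = bisect_right(self._points, _hash(uri)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["partition1", "partition2", "partition3"])
for uri in ("/articles/122.xml", "/articles/123.xml", "/articles/126.xml"):
    print(uri, "->", ring.partition_for(uri))
```

The point of the ring is that adding a fourth partition only moves the documents that now belong to the new partition; everything else stays where it was.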
shared-nothing cluster: 1st, index resolution; 2nd, get documents. [diagram: AppServer layer of E hosts (E Host 1, E Host 2, E Host 3) all running the same codebase; Data layer of D hosts (D Host 4, D Host 5, D Host 6, … D Host k) with HA & DR, holding partition1, partition2, partition3, partition4, … partition m]
universal index
Term | Term List (Document References)
“accelerating” | 123, 127, 129, 152, 344, 791 . . .
“creation” | 122, 125, 126, 129, 130, 167 . . .
“content” | 123, 126, 130, 142, 143, 167 . . .
“application” | 123, 130, 131, 135, 162, 177 . . .
“agility” | 126, 130, 167, 212, 219, 377 . . .
<article> | . . .
<article> / <title> | . . .
product: MarkLogic | 126, 130, 167, …
Also on the slide: Range Indexes, Geospatial
semi structured. Example query: get tables from computer science articles that include a title with the word “content” but not the word “agility”. [diagram: an article as a parent/child tree of title, paragraph, un-ordered list, table and footer nodes; labels: information, metadata, structure, full text]
universal index, in kelly speak: zippy-ing
Term | Term List (Document References)
“accelerating” | 123, 127, 129, 152, 344, 791 . . .
“creation” | 122, 125, 126, 129, 130, 167 . . .
“content” | 123, 126, 130, 142, 143, 167 . . .
“application” | 123, 130, 131, 135, 162, 177 . . .
“agility” | 126, 130, 167, 212, 219, 377 . . .
<article> | 122, 125, 126, 129, 130, 143, 167
<article> / <title> | 122, 125, 126, 129, 130, 167 . . .
product: MarkLogic | 126, 130, 167, …
Also on the slide: Range Indexes, Geospatial
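The "zippy-ing" of term lists can be sketched as a merge over sorted lists of document references. The snippet below is an illustration only: the function names are made up, the lists are just the visible prefixes from the slide (the real lists continue past the ". . ."), and the query is simplified to "is an `<article>`, contains “content”, does not contain “agility”" rather than the title-scoped query from the earlier slide.

```python
def intersect(a, b):
    """Zip two sorted lists of document ids, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Keep ids from sorted list a that do not appear in b (order preserved)."""
    excluded = set(b)
    return [doc_id for doc_id in a if doc_id not in excluded]


# Visible prefixes of the term lists shown on the slide.
term_lists = {
    "<article>": [122, 125, 126, 129, 130, 143, 167],
    "content": [123, 126, 130, 142, 143, 167],
    "agility": [126, 130, 167, 212, 219, 377],
}

candidates = intersect(term_lists["<article>"], term_lists["content"])
result = subtract(candidates, term_lists["agility"])
print(result)  # [143]
```

Only document references are touched at this stage; the documents themselves are fetched afterwards, which matches the "1st index resolution, 2nd get documents" order given for the cluster.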
wait a minute…
Directories: exclusive, hierarchical, analogous to a file system, map to URI
Collections: set-based, N:N relationship
Security: invisible to your app
universal index
Term | Term List (Document References)
“accelerating” | 123, 127, 129, 152, 344, 791 . . .
“creation” | 122, 125, 126, 129, 130, 167 . . .
“content” | 123, 126, 130, 142, 143, 167 . . .
“application” | 123, 130, 131, 135, 162, 177 . . .
“data base” | 126, 130, 167, 212, 219, 377 . . .
<article> | . . .
<article> / <title> | . . .
product: MarkLogic | 126, 130, 167, …
Additional terms in the same index: Directory: /articles/; Collection: CS; Role:Editor + Action:Read
Also on the slide: Range Indexes, Geospatial
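One way to read this slide is that directories, collections and permissions are just more terms in the same index, so a security check becomes one extra intersection the application never writes. The sketch below assumes exactly that. The `INDEX` layout and the `resolve` function are hypothetical, and every term list except the one for “content” is invented for illustration.

```python
# Term -> set of document ids. Only the "content" list comes from the slide;
# the directory, collection and permission lists are hypothetical.
INDEX = {
    ("word", "content"): {123, 126, 130, 142, 143, 167},
    ("directory", "/articles/"): {122, 123, 126, 130, 143},
    ("collection", "CS"): {123, 126, 143, 167},
    ("permission", "Editor", "read"): {123, 126, 130},
}


def resolve(terms, role):
    """Intersect the requested terms, always adding the caller's read permission."""
    all_terms = list(terms) + [("permission", role, "read")]
    lists = [INDEX.get(term, set()) for term in all_terms]
    return sorted(set.intersection(*lists))


query = [("word", "content"), ("directory", "/articles/"), ("collection", "CS")]
print(resolve(query, "Editor"))  # [123, 126]
print(resolve(query, "Guest"))   # []
```

Because the permission term is appended inside `resolve`, the calling code looks the same for every role, which is the sense in which security stays invisible to the app.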
throughput: in-memory stand(s); durability: journal (photo: flickr.com/kt)
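The split between an in-memory stand for throughput and a journal for durability follows the familiar write-ahead pattern, sketched below. This is a generic illustration under assumptions, not MarkLogic's on-disk format: the `InMemoryStand` class, the JSON-lines journal and the file name are all hypothetical.

```python
import json
import os


class InMemoryStand:
    """Hypothetical in-memory stand backed by an append-only journal file."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.docs = {}
        self._replay()  # rebuild memory state from the journal after a restart
        self._journal = open(journal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                record = json.loads(line)
                self.docs[record["uri"]] = record["doc"]

    def insert(self, uri, doc):
        # Durability first: append to the journal and force it to disk...
        self._journal.write(json.dumps({"uri": uri, "doc": doc}) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        # ...then serve reads from memory for throughput.
        self.docs[uri] = doc

    def close(self):
        self._journal.close()
```

Reopening the same journal path after a crash replays every journaled insert, so the in-memory state can be lost without losing committed documents.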
mvcc: append-only database. System timestamps are used to know which version of a document is currently available, and they also give you the MarkLogic time machine. [timeline diagram: create, update (could also be create), delete and query events along a system timestamp axis]
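The MVCC idea on this slide can be sketched as a list of document versions, each stamped with the system timestamp at which it was created and, later, the one at which it was superseded or deleted. This is a toy model under assumptions: the `Store` and `Version` names, the integer clock and the in-process list are illustrative and say nothing about how MarkLogic stores fragments.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    uri: str
    content: str
    created: int                 # system timestamp when this version appeared
    deleted: float = math.inf    # system timestamp when it stopped being current


class Store:
    """Toy append-only store: updates add versions instead of overwriting."""

    def __init__(self):
        self.timestamp = 0
        self.versions = []

    def _tick(self):
        self.timestamp += 1
        return self.timestamp

    def _live(self, uri, ts):
        for version in self.versions:
            if version.uri == uri and version.created <= ts < version.deleted:
                return version
        return None

    def update(self, uri, content):
        """Add a new version; acts as a create when no live version exists."""
        ts = self._tick()
        old = self._live(uri, ts - 1)
        if old is not None:
            old.deleted = ts     # the old version is marked, not removed
        self.versions.append(Version(uri, content, ts))
        return ts

    def delete(self, uri):
        ts = self._tick()
        old = self._live(uri, ts - 1)
        if old is not None:
            old.deleted = ts
        return ts

    def read(self, uri, at=None):
        """Read as of a given system timestamp (default: now)."""
        version = self._live(uri, self.timestamp if at is None else at)
        return version.content if version else None


store = Store()
t1 = store.update("/articles/1.xml", "v1")
t2 = store.update("/articles/1.xml", "v2")
t3 = store.delete("/articles/1.xml")
print(store.read("/articles/1.xml", at=t1))  # v1
print(store.read("/articles/1.xml", at=t2))  # v2
print(store.read("/articles/1.xml"))         # None
```

Reading with an older `at` value is the "time machine": the query sees the database exactly as it stood at that system timestamp, because no version was ever overwritten.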
too good to be true? try us out… free version available!
developer.marklogic.com/products
markmail.org
pairs.demo.marklogic.com
heatmap.demo.marklogic.com
bit.ly/ml-demo
(photo: flickr.com/nattu)
questions? Love NoSQL databases? Want to change the world? We are hiring!! Feedback: spkr8.com/t/4590 | nuno.job@marklogic.com
not covered but conversations are welcome! Open-source, closed development?; REST; Mobile; XQuery and why it’s awesome!; App Server + Search + Database; Scalable ACID transactions; XML vs. JSON?; Merging / Compaction; Relevance; MVCC; Reverse Indexes; Alerting; High Order Functions; Geospatial queries; Co-occurrence; Meta programming; Document databases

MarkLogic and The Universal Index


Editor's Notes

  1. Remember: ask people if they know Map-Reduce, MVCC, sharding, shared-nothing clustering, NoSQL, consistent hashing, fsync.
  2. Worked in large companies like IBM in unstructured data management. Mostly client support. A lot of training. Now focused on clients, especially in financial markets. Loves unstructured information data challenges.
  3. http://www.theregister.co.uk/2010/09/09/google_caffeine_explained
  4. Examples: MarkMail, Apache CouchDB
  5. Double-buffered in-memory stand to ensure maximum throughput. Stands comprise indexes and their respective fragments. Fragments are final. No “real” update or delete. Less error-prone. Merging as a self-healing mechanism.
  6. Introduce MVCC one liner