SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Accelerating Delivery of Data Products –
the EBSCO way
Mikhail Vaynshteyn
Ron Bennatan
About Our Parent Company –
EBSCO Industries
Twenty three (23) diverse businesses including:
– Subscription provider for more than 300,000 titles from more than 78,000 publishers
worldwide, publisher of full text databases and other online products
– Steel joist and metal roof deck manufacturing
– Insurance brokerage, national promotional
products supplier; Vitronic, Crown
– Real estate development; Alys Beach,
Mt Laurel
– Manufacturer of fishing lures & hunting products: Rebel, Yum, Booyah, Summit,
Moultrie, Knight & Hale, and more
EBSCO has nearly 5,000 employees - 1,000 outside the US
Among Forbes Top 200 Privately Held Companies
• EBSCO Information Services: Largest business unit in EBSCO
industries – over 3,200 employees
• The most highly-used online research platform to libraries
worldwide
• Largest subscription service agency in the world
• Over 150,000 library customers worldwide
• Relationships with more than 95,000 publishers internationally
Product Lines
Approaching 400 research and educational databases via
EBSCOhost with over 2.6 billion records, and 100 million
searches per day
Discovery Service Product Line
Leading discovery service provider to libraries
and institutions worldwide
Publisher &
Partner Research
Databases
Periodicals
Books
DissertationsCatalog
E-Books &
Audiobooks
EBSCO Discovery Service
EBSCO Discovery Service (EDS)
• Single “search box” experience with a single result list
• Search entire collection of electronic resources
– Library Catalog
– Research Databases
– Subscribed resources from various providers
• Fast, unparalleled relevancy ranking and quality
• Installed at 8,000+ institutions (mainly academic) in over 100
countries
EBSCO Host Integrated Search (mid-2000s)
• Federated Search
– API calls to each provider of electronic resources that library has
subscribed to
• Remote Calls
• Sometimes screen scraping
– Relevance ranking problem
• Slow
• Full text and metadata
used for construction of
the search index
• Access to full text
controlled by the
Provider
• Fast; relevancy ranking
problem addressed
EDS Today
Challenges
• Discovery Service users require a common search
experience across all data sources independent of quality of
data!
• Data comes in various formats (XML, MARC, CSV, etc…)
• Large data sets (10k to 156+ million documents)
• Data sources must be analyzed before indexing
2015 Goals for EBSCO Discovery Service
• Analyze and map one new data source a day
• Ingest additional 500 high-value data sources
• Analyze and map important data fields
• Identify and resolve variances with data
(2009-2014: the throughput varied from 4 to 10 data sources a month)
Day in a Life of a Database Designer
Data
Designer
Format Spec
Developer QA
Format
Loader Product
Product
Released
What changed for the Designers?
Data
Designer QA
Generic
Loader Product
Product
ReleasedJSON
transform Reports
Mapping
Rules
Developer
JSON Studio
Why did we use MongoDB
• Simple to setup and scale
• Documents stored as a whole
• Flexible schema
• Simple way to discovery paths/structure of the documents
• Ingest of 156 million document collection was a breeze
Why JSON Studio
• Ease of data exploration
• Wizard based aggregation pipeline builder
• Document structure discovery with ad hoc query capabilities
• Data visualization capabilities
• Ability to collaborate query strategy between team members
• We learned about JSON Studio at MongoDB World 2014!
SQL or JSON-Native?
• K / V stores
– Ad hoc querying is complex
– Steep learning curve for designers
– Complex loading process
• SQL / Columnar stores
– Ad hoc querying became complex quickly
– Complex loading process
• JSON was a good match for our documents
Example – Field Appearance
• Answer questions such as:
– What document types or record types
have both ISSN and ISBN, if any?
– How many documents with a doc-type
of “electronic media” do not have an
associated URL?
– ..
• Compare complexity:
– “Real” SQL:
– JSON-enabled SQL:
• select
"metadata.dc:type","metadata.dc:identifier
" from ebsco where not exists (select 1
from
json_array_elements_text("metadata.dc:id
entifier") as jsondata where
jsondata.value like '%URI%') and
'Electronic Media' = ……
– Native JSON analysis in JSON Studio
Example – Coverage Reports
• E.g. Min/max pub dates by title, ISSN, ISBN and source
• Librarians build their own aggregations and reports, publish
reports, download to excel, etc.
Next Step – Deep Analysis Initiative
Deep Analysis Initiative
• Deep inspection and analysis of very large and very
heterogeneous data sets
• Attributes needed:
– Support for very large and diverse data sets
– Very large number of fields – some across all data sets, some not
– Sophisticated, in-place analytics
• Data is too large to bring into app for operations
– Need for high-performing analytic queries
Why we became a SonarW Design Partner
• We like/use MongoDB – SonarW is
MongoDB compatible.
• SonarW is a data warehouse for
MongoDB data
• Many new aggregation operators that
are useful in our analysis
• As a design partner we could
influence and ensure that our
workloads were supported well
• What is SonarW
– Data warehouse for JSON data
– MongoDB compatible
– Columnar database
– Everything runs in parallel
– Optimized for very large data
sets and complex queries /
analytic workloads
Example 1 – Frequency Analysis
• Example:
– Per pubtype (which can sit in an array or
not), compute how many have title group
information and what percentage of them
have an article title
– <10s for 20M documents (each doc
having hundreds of fields and sometimes
two levels of arrays
"artinfo": {
"@vendorFT": "Y",
…
"pubtype": "Journal Articles",
…
"tig": {
…
"atl": "Breaking the Language Barrier…”
}
"urlIP": "http://www.eric.ed.gov/…"
},
{
"_id": {
"pubtype": "Review"
},
"count": 103983,
"tig_atl": 103983,
"pct": 100.0
}
Example 2 – Working with Arrays
• Data is full of arrays – sometimes multiple levels deep
• Arrays sometime represent “foreign patterns” – e.g. NV pairs
• We can work with them because:
– $unwind does not stop in docs
– Fast $unwind and fast $unwind+$unwind
– Fast in-place conversion of array to document
• Aggregation speed is very important
– 20M docs
– 2 Levels of arrays – 50x and 5x
– Aggregations run on 5 Billion documents
Business Impact
Datasets per month
PLUS: 50% improvement in quality of mapping
0
5
10
15
20
25
30
35
Pre 2015 Mid 2015 End 2015
~ 2x
~ 4.5x
Q & A
Questions on Schema are also Important
• E.g. “What fields present in the whole dataset occur only for Opinion Papers doc-
types?”
• Combination of two approaches:
– Using JSON Studio’s Schema Analyzer
– Using pipelines build with JSON Studio
Example 2 – Working with Arrays
• Data is full of arrays – sometimes multiple levels deep
• Arrays sometime represent “foreign patterns” – e.g. NV pairs
• We can work with them because:
– $unwind does not stop in docs
– Fast $unwind and fast $unwind+$unwind
– Fast in-place conversion of array to document
"df" : [
{
"@tag" : "13",
"sf" : {
"@code" : "a",
"#text" : "ED024700"
}
},
{
"@tag" : "001",
"sf" : {
"@code" : "a",
"#text" : "New"
}
},
..
"_13" : {
"a" : {
"#text" : "ED024700"
}
},
"_001" : {
"a" : {
"#text" : ”New"
}
}, …
{
"@name" : "DateEntry”,
"@label" : "Entry Date",
"#text" : "1969”
}
Example 2 – Cont.
• What percentage of the documents have
more than 10 @Code values for tag 20650
and a value for #text in tag 28 but the #text
value for tag 204 does not match regex …
• <5 seconds on a collection of 10 Million docs
• $unwind becomes a query on 1.5 Billion
documents (df-50x; sf-3x)
"record": {
"@uid": "11288235",
"timestamp": "2015-03-16T17:49:47Z",
"df": [
{
"@tag": "13",
"sf": {
"@code": "a",
"#text": "11288235"
}
},
{
"@tag": "10245",
"sf": [
{
"@code": "a",
"#text": "Once Upon a California Christmas."
},
{
"@code": "b",
"#text": "English"
},
{
"@code": "c",
"#text": "EN"
}
]
},

Weitere ähnliche Inhalte

Was ist angesagt?

Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & MarquezJulien Le Dem
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB
 
MongoDB Evenings Minneapolis: Medtronic's MongoDB Journey
MongoDB Evenings Minneapolis: Medtronic's MongoDB JourneyMongoDB Evenings Minneapolis: Medtronic's MongoDB Journey
MongoDB Evenings Minneapolis: Medtronic's MongoDB JourneyMongoDB
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Data Con LA
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerMark Kromer
 
Time Series Analytics for Big Fast Data
Time Series Analytics for Big Fast DataTime Series Analytics for Big Fast Data
Time Series Analytics for Big Fast DataSouth West Data Meetup
 
Db presentation google_megastore
Db presentation google_megastoreDb presentation google_megastore
Db presentation google_megastoreAlanoud Alqoufi
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionMurtaza Doctor
 
Benefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsBenefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsMongoDB
 
DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015Vidyasagar Machupalli
 
An afternoon with mongo db new delhi
An afternoon with mongo db new delhiAn afternoon with mongo db new delhi
An afternoon with mongo db new delhiRajnish Verma
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 

Was ist angesagt? (20)

Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and Visualization
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of Things
 
MongoDB Evenings Minneapolis: Medtronic's MongoDB Journey
MongoDB Evenings Minneapolis: Medtronic's MongoDB JourneyMongoDB Evenings Minneapolis: Medtronic's MongoDB Journey
MongoDB Evenings Minneapolis: Medtronic's MongoDB Journey
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
 
MongoDB on Azure
MongoDB on AzureMongoDB on Azure
MongoDB on Azure
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
 
Time Series Analytics for Big Fast Data
Time Series Analytics for Big Fast DataTime Series Analytics for Big Fast Data
Time Series Analytics for Big Fast Data
 
Db presentation google_megastore
Db presentation google_megastoreDb presentation google_megastore
Db presentation google_megastore
 
NoSQL for SQL Users
NoSQL for SQL UsersNoSQL for SQL Users
NoSQL for SQL Users
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to Action
 
Benefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSsBenefits of Using MongoDB Over RDBMSs
Benefits of Using MongoDB Over RDBMSs
 
DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015DocumentDB - NoSQL on Cloud at Reboot2015
DocumentDB - NoSQL on Cloud at Reboot2015
 
MongoDB
MongoDBMongoDB
MongoDB
 
An afternoon with mongo db new delhi
An afternoon with mongo db new delhiAn afternoon with mongo db new delhi
An afternoon with mongo db new delhi
 
Practical Use of a NoSQL Database
Practical Use of a NoSQL DatabasePractical Use of a NoSQL Database
Practical Use of a NoSQL Database
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdays
 

Ähnlich wie Accelerating Delivery of Data Products - The EBSCO Way

Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Lucidworks
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2Neo4j
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...SEAD
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)Christine Stohn
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...Charleston Conference
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph GeneratorLDBC council
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Lucidworks
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevAltinity Ltd
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentPeter Haase
 
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...WARCnet
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreTim Schneider
 
Crossref LIVE US Online
Crossref LIVE US OnlineCrossref LIVE US Online
Crossref LIVE US OnlineCrossref
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformAndrea Bollini
 

Ähnlich wie Accelerating Delivery of Data Products - The EBSCO Way (20)

NISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to RealityNISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to Reality
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
NoSQL
NoSQLNoSQL
NoSQL
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...
Making the Big Move: Moving to Cloud-Based OCLC’s WorldShare Management Servi...
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and Sitecore
 
Crossref LIVE US Online
Crossref LIVE US OnlineCrossref LIVE US Online
Crossref LIVE US Online
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platform
 

Mehr von MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 

Kürzlich hochgeladen (20)

Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 

Accelerating Delivery of Data Products - The EBSCO Way

  • 1. Accelerating Delivery of Data Products – the EBSCO way Mikhail Vaynshteyn Ron Bennatan
  • 2. About Our Parent Company – EBSCO Industries Twenty three (23) diverse businesses including: – Subscription provider for more than 300,000 titles from more than 78,000 publishers worldwide, publisher of full text databases and other online products – Steel joist and metal roof deck manufacturing – Insurance brokerage, national promotional products supplier; Vitronic, Crown – Real estate development; Alys Beach, Mt Laurel – Manufacturer of fishing lures & hunting products: Rebel, Yum, Booyah, Summit, Moultrie, Knight & Hale, and more EBSCO has nearly 5,000 employees - 1,000 outside the US Among Forbes Top 200 Privately Held Companies
  • 3. • EBSCO Information Services: Largest business unit in EBSCO industries – over 3,200 employees • The most highly-used online research platform to libraries worldwide • Largest subscription service agency in the world • Over 150,000 library customers worldwide • Relationships with more than 95,000 publishers internationally
  • 4. Product Lines Approaching 400 research and educational databases via EBSCOhost with over 2.6 billion records, and 100 million searches per day
  • 5. Discovery Service Product Line Leading discovery service provider to libraries and institutions worldwide
  • 7. EBSCO Discovery Service (EDS) • Single “search box” experience with a single result list • Search entire collection of electronic resources – Library Catalog – Research Databases – Subscribed resources from various providers • Fast, unparalleled relevancy ranking and quality • Installed at 8,000+ institutions (mainly academic) in over 100 countries
  • 8. EBSCO Host Integrated Search (mid-2000s) • Federated Search – API calls to each provider of electronic resources that library has subscribed to • Remote Calls • Sometimes screen scraping – Relevance ranking problem • Slow
  • 9. • Full text and metadata used for construction of the search index • Access to full text controlled by the Provider • Fast; relevancy ranking problem addressed EDS Today
  • 10. Challenges • Discovery Service users require a common search experience across all data sources independent of quality of data! • Data comes in various formats (XML, MARC, CSV, etc…) • Large data sets (10k to 156+ million documents) • Data sources must be analyzed before indexing
  • 11. 2015 Goals for EBSCO Discovery Service • Analyze and map one new data source a day • Ingest additional 500 high-value data sources • Analyze and map important data fields • Identify and resolve variances with data (2009-2014: the throughput varied from 4 to 10 data sources a month)
  • 12. Day in a Life of a Database Designer Data Designer Format Spec Developer QA Format Loader Product Product Released
  • 13. What changed for the Designers? Data Designer QA Generic Loader Product Product ReleasedJSON transform Reports Mapping Rules Developer JSON Studio
  • 14. Why did we use MongoDB • Simple to setup and scale • Documents stored as a whole • Flexible schema • Simple way to discovery paths/structure of the documents • Ingest of 156 million document collection was a breeze
  • 15. Why JSON Studio • Ease of data exploration • Wizard based aggregation pipeline builder • Document structure discovery with ad hoc query capabilities • Data visualization capabilities • Ability to collaborate query strategy between team members • We learned about JSON Studio at MongoDB World 2014!
  • 16. SQL or JSON-Native? • K / V stores – Ad hoc querying is complex – Steep learning curve for designers – Complex loading process • SQL / Columnar stores – Ad hoc querying became complex quickly – Complex loading process • JSON was a good match for our documents
  • 17. Example – Field Appearance • Answer questions such as: – What document types or record types have both ISSN and ISBN, if any? – How many documents with a doc-type of “electronic media” do not have an associated URL? – .. • Compare complexity: – “Real” SQL: – JSON-enabled SQL: • select "metadata.dc:type","metadata.dc:identifier " from ebsco where not exists (select 1 from json_array_elements_text("metadata.dc:id entifier") as jsondata where jsondata.value like '%URI%') and 'Electronic Media' = …… – Native JSON analysis in JSON Studio
  • 18. Example – Coverage Reports • E.g. Min/max pub dates by title, ISSN, ISBN and source • Librarians build their own aggregations and reports, publish reports, download to excel, etc.
  • 19. Next Step – Deep Analysis Initiative
  • 20. Deep Analysis Initiative • Deep inspection and analysis of very large and very heterogeneous data sets • Attributes needed: – Support for very large and diverse data sets – Very large number of fields – some across all data sets, some not – Sophisticated, in-place analytics • Data is too large to bring into app for operations – Need for high-performing analytic queries
  • 21. Why we became a SonarW Design Partner • We like/use MongoDB – SonarW is MongoDB compatible. • SonarW is a data warehouse for MongoDB data • Many new aggregation operators that are useful in our analysis • As a design partner we could influence and ensure that our workloads were supported well • What is SonarW – Data warehouse for JSON data – MongoDB compatible – Columnar database – Everything runs in parallel – Optimized for very large data sets and complex queries / analytic workloads
  • 22. Example 1 – Frequency Analysis • Example: – Per pubtype (which can sit in an array or not), compute how many have title group information and what percentage of them have an article title – <10s for 20M documents (each doc having hundreds of fields and sometimes two levels of arrays "artinfo": { "@vendorFT": "Y", … "pubtype": "Journal Articles", … "tig": { … "atl": "Breaking the Language Barrier…” } "urlIP": "http://www.eric.ed.gov/…" }, { "_id": { "pubtype": "Review" }, "count": 103983, "tig_atl": 103983, "pct": 100.0 }
  • 23. Example 2 – Working with Arrays • Data is full of arrays – sometimes multiple levels deep • Arrays sometime represent “foreign patterns” – e.g. NV pairs • We can work with them because: – $unwind does not stop in docs – Fast $unwind and fast $unwind+$unwind – Fast in-place conversion of array to document • Aggregation speed is very important – 20M docs – 2 Levels of arrays – 50x and 5x – Aggregations run on 5 Billion documents
  • 24. Business Impact Datasets per month PLUS: 50% improvement in quality of mapping 0 5 10 15 20 25 30 35 Pre 2015 Mid 2015 End 2015 ~ 2x ~ 4.5x
  • 25. Q & A
  • 26. Questions on Schema are also Important • E.g. “What fields present in the whole dataset occur only for Opinion Papers doc- types?” • Combination of two approaches: – Using JSON Studio’s Schema Analyzer – Using pipelines build with JSON Studio
  • 27. Example 2 – Working with Arrays • Data is full of arrays – sometimes multiple levels deep • Arrays sometime represent “foreign patterns” – e.g. NV pairs • We can work with them because: – $unwind does not stop in docs – Fast $unwind and fast $unwind+$unwind – Fast in-place conversion of array to document "df" : [ { "@tag" : "13", "sf" : { "@code" : "a", "#text" : "ED024700" } }, { "@tag" : "001", "sf" : { "@code" : "a", "#text" : "New" } }, .. "_13" : { "a" : { "#text" : "ED024700" } }, "_001" : { "a" : { "#text" : ”New" } }, … { "@name" : "DateEntry”, "@label" : "Entry Date", "#text" : "1969” }
  • 28. Example 2 – Cont. • What percentage of the documents have more than 10 @Code values for tag 20650 and a value for #text in tag 28 but the #text value for tag 204 does not match regex … • <5 seconds on a collection of 10 Million docs • $unwind becomes a query on 1.5 Billion documents (df-50x; sf-3x) "record": { "@uid": "11288235", "timestamp": "2015-03-16T17:49:47Z", "df": [ { "@tag": "13", "sf": { "@code": "a", "#text": "11288235" } }, { "@tag": "10245", "sf": [ { "@code": "a", "#text": "Once Upon a California Christmas." }, { "@code": "b", "#text": "English" }, { "@code": "c", "#text": "EN" } ] },