Elsevier is the world's largest publisher of scientific, technical and medical (STM) content. An early adopter of XML as a standard representation for content, Elsevier has used MarkLogic to develop a range of information access and discovery solutions for its customers. This presentation covers Elsevier's experience with XML-centric content management systems in general and MarkLogic's technology in particular: our initial adoption and uptake of the technology, its current use within the Elsevier suite of online products and solutions, and opportunities for future use. We will describe design patterns for content repositories in a publishing context that have emerged from our use of the technology, and touch on several open issues, including XQuery and its adoption within the developer community, the challenge posed to XML by newer representations for documents and metadata such as JSON and RDF, and the delivery of search applications built on XML infrastructure.
1. Experience with MarkLogic at Elsevier
Bradley P. Allen and Darin McBeath, Elsevier Labs
Presentation at NoSQL Now 2011
San Jose, CA, USA
2011-08-25
2. Elsevier: who we are
• Elsevier, part of the Reed Elsevier group, is a world-leading publisher of scientific, technical and medical full-text literature. Its 7,000 employees in over 70 offices worldwide publish more than 2,500 journal titles and 11,000 online books.
• Global community
– 7,000 editors
– 70,000 editorial board members
– 200,000 referees
– 500,000+ authors
• Global market audience
– 15 million doctors, nurses and health professionals
– 10 million+ researchers in 4,500 institutes
– 5 million students
• Global: North America, Europe, Asia-Pacific
3. MarkLogic at Elsevier
• MarkLogic is used pervasively throughout our
business
– Science and Technology
– Health Sciences
– Operations
• It is also a strategic technology for our sister Reed Elsevier company, LexisNexis
• We were an early adopter of MarkLogic
– Began working with MarkLogic in 2001
4. Motivations for MarkLogic adoption
• The company was committed to XML as the standard for content representation
• Vision of building Web services on top of XML content repositories
• Enabling new information solutions through reuse and mashup of existing journal and book content
• Relational technologies were not a good fit
5. MarkLogic applications at Elsevier
• Scopus (Science & Technology, launched 2005): the largest abstract and citation database, containing both peer-reviewed research literature and quality web sources, with 50+ million abstracts; the original application that used MarkLogic. MarkLogic features used: Repository, Transformation, and some extensions (such as fast/accurate counting).
• Scopus Custom Data (Science & Technology, launched 2007): offline version of Scopus. MarkLogic features used: Repository, Transformation.
• EMBASE (Science & Technology, launched 2008): biomedical database with over 24 million indexed records. MarkLogic features used: Repository, Search, Transformation.
• Methods Navigator (Science & Technology, launched 2010): task-specific search for experimental methods and protocols across 40,000 articles. MarkLogic features used: Repository, Content Processing Framework.
• HazMat Navigator (Science & Technology, launched 2010): chemical safety database based on Bretherick's Handbook of Reactive Chemical Hazards and others. MarkLogic features used: Repository, Content Processing Framework.
• SciVal Funding (Science & Technology, launched 2010): database of current research funding opportunities and award information. MarkLogic features used: Repository, Content Processing Framework.
• Books (Health Sciences, launched 2006): 1,000 books supporting multiple Health Sciences applications (HESI, NursingConsult, MDConsult). MarkLogic features used: Repository, ability to present content quickly and easily by chapter, section, paragraph.
• Health Connect (Health Sciences, launched 2007): Health Sciences journal platform. MarkLogic features used: Repository, Search, Transformation.
• Linked Data Repository (launched 2011): 500,000 content enhancement metadata documents; a 100% XQuery application. MarkLogic features used: Repository, XPath and a handful of proprietary extensions.
• ConSyn (Operations, launched 2010): batch retrieval service for 10+ million journal articles. MarkLogic features used: Search, Repository, Task Server, Zip, Security, Transformation.
6. MarkLogic benefits and challenges at Elsevier
• MarkLogic brings us two big benefits
– Excellent fit with how we represent our content
– Tools (XQuery, XSLT) that support working with that
content representation
• Those benefits come with challenges, some old,
some new
– Developer productivity and adoption
– Standards and interoperability
– Software ecosystem
– Total solution fit
– Total cost of ownership (TCO) relative to other solutions
7. Developer productivity and adoption
• XQuery can be a powerful language for rapid
prototyping
– Can support writing complete web applications
• Experienced XQuery resources are difficult to
find
– Especially relative to emerging JSON/Web
framework resources
• Difficult to motivate developers committed to
more mainstream frameworks, patterns, and
languages
8. Standards and interoperability
• Vendors view XQuery in different ways: some as a query language, some as a transformation language, some as a programming language, some as all of the above
• These disparate views often lead to confusion in the community about what XQuery really is
• XQuery interoperability is currently difficult, and it is doubtful that it will ever be achieved beyond simple applications
– Groups such as EXPath will help tidy up some interfaces, but far more work needs to be done.
– Elsevier Labs has investigated this issue in the context of the SciVal Showcase application using 4 different XQuery engines (MarkLogic, eXist, 28msec, and XQIB)
– This experiment highlighted the differences between the implementations (and the looseness of the W3C Recommendation)
9. Software ecosystem
• The ecosystem around XQuery and MarkLogic is lacking
– Relatively few open-source or third-party modules and language bindings
• The IDEs and debugging tools, while vastly improved, are still not on par with those for other query languages
10. Total solution fit
• MarkLogic started out as an XML database
solution
• It has added functionality (e.g., free-text search) and matured over the years
– This is a big part of its intended use at LexisNexis
• We struggle to understand the tradeoffs between a single solution and a composition of best-of-breed solutions (e.g., MarkLogic standalone vs. MarkLogic integrated with Solr)
11. TCO relative to other solutions
• Traditional enterprise software licensing can
lead to significant costs
• NoSQL document database solutions with
business models based on open source plus
support services are an emerging alternative
• We are still determining the TCO tradeoff between the two in an enterprise context
12. MarkLogic in the context of NoSQL in general
• NoSQL before it was cool
• But there are emerging differences between document stores for traditional publishing and those for Internet publishing
– XML/XQuery/XSLT vs. JSON/UnQL/JavaScript
– Manual scale-out vs. automated scale-out
• Overhead of legacy standards can be a drag
– Where is XML in its adoption lifecycle?
– How does HTML5 fit in?
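To make the XML vs. JSON contrast on this slide concrete, the following minimal Python sketch (standard library only) converts a small article fragment to JSON in two ways. The fragment, its element names, and both functions are hypothetical illustrations, not Elsevier's content model or a MarkLogic API. The point it shows: a naive key/value mapping silently loses inline markup and repeated elements, while preserving document order requires an explicit, more verbose convention.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical article fragment (not an Elsevier schema). The <title> has
# mixed content (text interleaved with inline <i> markup), and there are
# two sibling <p> elements with the same tag.
ARTICLE_XML = (
    '<article id="a1">'
    "<title>Hazards of <i>tert</i>-butyl compounds</title>"
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "</article>"
)


def naive_to_json(elem):
    """Flatten an element into a key/value object, as a simple mapping might.

    itertext() discards the inline markup, and repeated child tags
    overwrite each other, so the last <p> silently wins."""
    out = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        out[child.tag] = "".join(child.itertext())
    return out


def ordered_to_json(elem):
    """Keep document order: each element becomes a node whose children are
    an ordered list of text runs and nested nodes (one possible convention)."""
    node = {"tag": elem.tag, "attrs": dict(elem.attrib), "children": []}
    if elem.text:
        node["children"].append(elem.text)
    for child in elem:
        node["children"].append(ordered_to_json(child))
        if child.tail:  # text that follows the child inside this element
            node["children"].append(child.tail)
    return node


if __name__ == "__main__":
    root = ET.fromstring(ARTICLE_XML)
    # prints {"@id": "a1", "title": "Hazards of tert-butyl compounds", "p": "Second paragraph."}
    print(json.dumps(naive_to_json(root)))
    print(json.dumps(ordered_to_json(root)))
```

The ordered form keeps every text run and tag for this fragment, but it is no longer the idiomatic JSON a JavaScript developer would expect; that gap is one way to read the tension between the two stacks described above.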
13. Future use of MarkLogic at Elsevier
• Persisting as the foundation of content repository efforts
– XML legacy drives continued use
• Turnkey SaaS for publishing and newer NoSQL solutions are competing for attention
– Solutions that layer XML processing and query technologies on top of non-XML NoSQL stores are beginning to appear (e.g. Ambrosoft’s XML DB project)
• Design choices driven by consumer Internet use cases may not yield as good a fit for information publishing as MarkLogic
– Emphasis on join-free queries and use-case-driven indexing
• We are watching to see how emerging best practices and design patterns from the consumer Internet that fit our needs will be supported going forward
– Auto-scaling
– Web application frameworks
– HTML5
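The "join-free queries and use-case-driven indexing" point on this slide can be illustrated with a small Python sketch. The article records, field names, and functions are hypothetical and do not reflect any particular NoSQL product's API. Author names are copied into each record (denormalized), so answering a query never requires a join. An index is then built specifically for the one query the application is assumed to run.

```python
from collections import defaultdict

# Hypothetical denormalized article records: author names are copied into
# each record instead of being referenced by ID, so reads need no join.
ARTICLES = [
    {"id": "a1", "title": "Reactive hazards", "authors": ["Lee", "Patel"], "year": 2010},
    {"id": "a2", "title": "Funding trends", "authors": ["Patel"], "year": 2011},
    {"id": "a3", "title": "Methods survey", "authors": ["Okafor"], "year": 2011},
]


def build_author_year_index(docs):
    """Build an index shaped around one known query:
    'which articles did this author publish in this year?'"""
    index = defaultdict(list)
    for doc in docs:
        for author in doc["authors"]:
            index[(author, doc["year"])].append(doc["id"])
    return index


def articles_by(index, author, year):
    # A single dictionary lookup answers the anticipated query. .get() does
    # not insert missing keys into the defaultdict.
    return index.get((author, year), [])


if __name__ == "__main__":
    idx = build_author_year_index(ARTICLES)
    print(articles_by(idx, "Patel", 2011))
```

The tradeoff is that the data model is shaped around anticipated queries. A new query (say, by title word) needs its own index, and correcting an author's name means updating every record that copies it. That rigidity is one plausible reading of why such designs may fit information publishing less well than MarkLogic.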
14. Summary
• We were an early adopter of MarkLogic
• Over ten years it has become a mature
product that we rely on extensively across our
business
• The response of MarkLogic to the emergence
of NoSQL document stores, non-XML
document serializations and application
design patterns from the consumer Internet is
of keen interest to us