NoSQL technologies from an STM
publishing perspective
Bradley P. Allen, Elsevier Labs
Presentation at NoSQL Now 2011
San Jose, CA, USA
2011-08-25
Peak physical media: is it here?




 • “Music Sales”, New York Times, 1 August 2009.
   http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html
 • “Initial Circs per student”, William Denton, 31 January 2011.
   http://www.miskatonic.org/2011/01/31/initial-circs-student
 • “Rise of e-book Readers to Result in Decline of Book Publishing Business”, Steven Mather, iSuppli, 28 April 2011.
   http://www.isuppli.com/Home-and-Consumer-Electronics/News/Pages/Rise-of-e-book-Readers-to-Result-in-Decline-of-Book-Publishing-Business.aspx
In any case, the challenge to STM publishers is clear


 • Print revenue is softening
 • Online channels are exploding
    – Changing the way customers create and consume
      our content
    – Leading to new requirements and market
      opportunities for online products




Additional challenges in STM publishing


 • Academic context and tradition inhibit
   business model innovation
 • Technology and business are traditionally
   separate concerns
 • Acquisitions create content and data silos
 • Global market drives lowest-common-denominator
   technology choices


A simple model of the evolution of STM publishing



 Print era: 1600s – 1980
      • Packaged as books and journals
      • Physically distributed
      • Access and discovery through libraries

 Digital Library era: 1980 – 2010s
      • Packaged as books and journals
      • Digitally distributed
      • Access and discovery through search engines

 Platform-as-a-service era: 2010s
      • Packaged as apps
      • Digitally distributed
      • Access and discovery through social networks




STM publishing use cases in transition

Use case: A new medical term relevant to an emerging healthcare issue (e.g. a new type of avian flu virus) needs to be incorporated into a search index immediately
 • Digital Library era: Organizational governance issues about how taxonomies are updated, coupled with manually intensive workflows and ad hoc approaches to content tagging, inhibit rapid response
 • Platform-as-a-service era: A single, automated and standardized taxonomy management and content enhancement workflow allows rapid and timely update of search applications

Use case: Application developers want to mash up epidemiological data with medical journal articles to create a topic-specific Web resource
 • Digital Library era: Data silos without easy means of programmatic access by developers, coupled with governance and business model questions, inhibit data reuse
 • Platform-as-a-service era: Content API and single-point-of-access repository allow data and content to be accessed, discovered and reused across multiple applications

Use case: Digital library developers want to stage content into a single repository for unified search index generation
 • Digital Library era: Duplication of core content leads to synchronization and quality control issues
 • Platform-as-a-service era: Consolidation of duplicate repositories into a single point of truth across all content, accessible and discoverable through a Content API, eliminates the need for duplication and synchronization

Use case: Third-party solutions providers want to integrate content (e.g. tagged medical journal articles, medical taxonomies) into point-of-care solutions
 • Digital Library era: No standards, no APIs for point-of-care content integration across all content and data
 • Platform-as-a-service era: Standards and APIs that scale across multiple partners, for all content types, for all delivery formats

Use case: Publishers want to deliver their content to tablets and e-readers in delivery formats that take advantage of the displays and interaction modalities on those devices
 • Digital Library era: No clear standard or approach for targeting emerging eReader and tablet devices; multiple and divergent approaches lead to siloed solutions and duplication of effort
 • Platform-as-a-service era: Web and industry standards for eReader and tablet devices supported as part of standard automated processing into delivery-channel-specific formats, regularly updated and exposed through a Content API

Use case: A journal publisher wants to integrate content enhancements across multiple subject matter areas to add value to products leveraging Article of the Future technology
 • Digital Library era: No single point of access to content enhancements, no standards for content enhancement suppliers and partners to deliver enhancements for integration
 • Platform-as-a-service era: Easy access to multiple opportunities for content enhancements embedded in standard next-generation article formats and provided using standard content enhancement formats
Facets of STM publishing processes

 • Process Type
     – Acquisition
     – Transformation
     – Access and discovery
     – Enhancement
     – Composition
     – Delivery
 • Entity
     – author, supplier, Web site, typesetter, automated process, subject matter expert,
       search engine, content repository, entity registry, product catalog, editor,
       reviewer, user, designer, developer, e-book, mobile app, mobile-enhanced Web site, API
 • Activity
     – submitting, crawling, syndicating, formatting, mapping, cleansing, indexing,
       querying, updating, storing, annotating, subject tagging, classification,
       entity recognition, entity extraction, fact extraction, clustering, aggregating,
       ordering, summarizing, filtering, analysis, rendering, design, publishing,
       accessing, retrieving, deleting
 • Content Type
     – article, book, media object, entity record, taxonomy, ontology,
       user-generated content


Emerging content requirements

 • Broad range of content types
     – Must treat as first-class objects video, audio, images, datasets, metadata and
       knowledge organization systems in addition to articles and books
 • Standards-based
     – Web-standard formats to support ease of integration and interoperability
 • Fine-grained
     – Must be decomposable into and addressable in fragments smaller than the unit of
       publication; e.g., down to the level of specific words, phrases, images, table
       cells in articles or book chapters, key frames and segments in videos
 • Discoverable
     – Must be easily located across all levels of granularity
 • Accessible
     – Must be easily accessed through content creation, retrieval, update and deletion
       (CRUD) services
 • Flexible
     – New content types and associated schemas must be easily added through configuration
 • Reusable
     – It must be efficient for product developers to aggregate and compose content
       fragments into new products
 • Modifiable
     – Support the enhancement and correction of content at any time following creation
 • Broad range of delivery formats
     – Content standards and services must support fulfillment, delivery and presentation
       across desktop, notebook, tablet and mobile computing devices
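
To make the accessible, flexible, fine-grained and modifiable requirements above concrete, here is a minimal sketch of a schemaless store with CRUD operations and path-based fragment addressing. It is an illustration only, not a description of any Elsevier system: the `ContentStore` class, its method names, the tuple-based path convention and the sample content are all assumptions made up for this example.

```python
import copy
import uuid


class ContentStore:
    """Toy in-memory, schemaless content store (hypothetical, illustrative only)."""

    def __init__(self):
        self._docs = {}

    def create(self, content_type, body):
        """Store any dict-shaped content; no schema is declared up front."""
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {"type": content_type, "body": copy.deepcopy(body)}
        return doc_id

    def retrieve(self, doc_id, path=()):
        """Return a whole document body, or the fragment addressed by `path`."""
        node = self._docs[doc_id]["body"]
        for key in path:
            node = node[key]  # works for dict keys and list indexes alike
        return copy.deepcopy(node)

    def update(self, doc_id, path, value):
        """Replace the fragment addressed by a non-empty `path`."""
        node = self._docs[doc_id]["body"]
        for key in path[:-1]:
            node = node[key]
        node[path[-1]] = copy.deepcopy(value)

    def delete(self, doc_id):
        """Remove the whole document."""
        del self._docs[doc_id]


store = ContentStore()
# Made-up sample content, not real data.
article_id = store.create("article", {
    "title": "Avian flu surveillance",
    "tables": [{"rows": [["region", "cases"], ["north", 12]]}],
})
# A new content type needs no schema change in this sketch.
video_id = store.create("video", {"key_frames": [{"t": 3.5, "label": "intro"}]})

cell_path = ("tables", 0, "rows", 1, 1)  # one table cell inside the article
store.update(article_id, cell_path, 14)  # correction after creation
print(store.retrieve(article_id, cell_path))
```

The path tuple is one possible way to address a fragment smaller than the unit of publication; a production system would more likely expose such addresses as URIs.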



Emerging content architecture

 [Figure: architecture diagram. Recoverable labels: Linked data; Document; Entity record; Media object; Relational metadata (repeated between nodes); Acquire; Transform, Enhance, Compose; Deliver.]


Content acquisition and transformation




Content enhancement and analytics




Content composition and delivery




Why NoSQL is important to STM publishing


 • NoSQL emphasizes design choices that focus on
   delivering robust, scalable Web applications
   –   Document-centric
   –   Schemaless
   –   Support for analytics
   –   Read/write at Web scale
   –   Move scale-out from development to operations
 • As we shift to the platform-as-a-service era,
   these features become an important part of the
   STM publishing technology stack
How NoSQL addresses STM publishing’s needs

 • Schemaless, document-centric stores
     – Ease repository extension to accommodate expanding range of new, finer-
       grained content types
     – Fit HTML5/JS/CSS content stack providing web-based alternatives to native apps
     – Expedite application stack refresh in support of authoring and editorial workflow
       portals and tools
 • Support for analytics eases innovation in scientometrics
 • Read/write at Web scale accommodates solutions incorporating content
   at more dynamic, fine-grained scale
     –   Entity records
     –   Annotations
     –   Other forms of community-contributed content
     –   Linked data integration of heterogeneous information resources across the Web
         for mashups/solutions
 • Moving scale-out from development to operations reduces time-to-market
   and cost of failure for emerging, niche publishing opportunities
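
As one possible reading of "support for analytics eases innovation in scientometrics", the sketch below runs a map/reduce-style pass over a mixed collection of schemaless documents and computes an h-index per author. The document fields (`type`, `authors`, `citations`), the function names and the sample records are assumptions for illustration; the slides do not name a specific metric or data model.

```python
from collections import defaultdict


def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def citations_by_author(documents):
    """Map each article to (author, citations) pairs, then group by author."""
    grouped = defaultdict(list)
    for doc in documents:
        if doc.get("type") != "article":
            continue  # other content types share the collection and are skipped
        for author in doc.get("authors", []):
            # A missing field is tolerated: schemaless documents need not be uniform.
            grouped[author].append(doc.get("citations", 0))
    return dict(grouped)


# Made-up sample records, not real data.
documents = [
    {"type": "article", "authors": ["A", "B"], "citations": 10},
    {"type": "article", "authors": ["A"], "citations": 3},
    {"type": "article", "authors": ["A"], "citations": 1},
    {"type": "annotation", "target": "article-1", "text": "see table 2"},
    {"type": "article", "authors": ["B"]},  # no citation field yet
]

scores = {a: h_index(c) for a, c in citations_by_author(documents).items()}
print(scores)
```

The same grouping step could be distributed across nodes by a store with built-in map/reduce support; here it runs in a single process to keep the example self-contained.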


Where STM publishing can drive NoSQL requirements


 • Integrated support for search
    – Free text retrieval
    – Faceted navigation
 • Query language functionality
    – Nearest-neighbor matching
    – Joins vs. join-free
 • Primitives/support for analytics design patterns
    – Clustering
    – Classification
    – Entity resolution
 • Primitives/support for semantic enhancement
    – Linked data
    – Language processing
 • Versioning for document stores
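
Two of the requested capabilities above, faceted navigation and versioning for document stores, can be sketched in a few lines. This is a hedged illustration of what such primitives might look like, not an existing product's API: `facet_counts`, `VersionedStore`, the 1-based version numbering and the sample documents are all hypothetical.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count documents per value of `field`; list-valued fields count each value."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


class VersionedStore:
    """Append-only store: each put adds a new version instead of overwriting."""

    def __init__(self):
        self._versions = {}

    def put(self, doc_id, doc):
        history = self._versions.setdefault(doc_id, [])
        history.append(dict(doc))  # shallow copy of the top-level fields
        return len(history)  # 1-based version number

    def get(self, doc_id, version=None):
        history = self._versions[doc_id]
        if version is None:
            return history[-1]  # latest version by default
        if not 1 <= version <= len(history):
            raise KeyError(f"{doc_id} has no version {version}")
        return history[version - 1]


versions = VersionedStore()
# Made-up sample documents, not real data.
versions.put("art-1", {"type": "article", "subjects": ["virology"]})
versions.put("art-1", {"type": "article", "subjects": ["virology", "epidemiology"]})
versions.put("ent-1", {"type": "entity record", "subjects": ["virology"]})

latest = [versions.get("art-1"), versions.get("ent-1")]
print(facet_counts(latest, "subjects"))
print(versions.get("art-1", version=1)["subjects"])
```

Keeping every version lets an enhancement or correction be applied at any time while earlier states stay retrievable; the facet counts are computed over the latest versions only.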

Elsevier applications of NoSQL technologies


 •   Entity registries
 •   Metadata repositories
 •   Big data analytics
 •   User-built apps




Linked Data Repository




SciVal




SciVerse




Conclusions


 • STM publishing is in transition
 • This is driving new requirements for content
 • Many of these requirements are well met by
   NoSQL solutions
 • Some requirements point to areas of future
   work for NoSQL technologists and vendors




Weitere ähnliche Inhalte

Mehr von DATAVERSITY

Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...DATAVERSITY
 
Empowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceEmpowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Data Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsData Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsDATAVERSITY
 
Including All Your Mission-Critical Data in Modern Apps and Analytics
Including All Your Mission-Critical Data in Modern Apps and AnalyticsIncluding All Your Mission-Critical Data in Modern Apps and Analytics
Including All Your Mission-Critical Data in Modern Apps and AnalyticsDATAVERSITY
 
Assessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelAssessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelDATAVERSITY
 
What’s in Your Data Warehouse?
What’s in Your Data Warehouse?What’s in Your Data Warehouse?
What’s in Your Data Warehouse?DATAVERSITY
 

Mehr von DATAVERSITY (20)

Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
 
Empowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business IntelligenceEmpowering the Data Driven Business with Modern Business Intelligence
Empowering the Data Driven Business with Modern Business Intelligence
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Data Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsData Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and Roadmaps
 
Including All Your Mission-Critical Data in Modern Apps and Analytics
Including All Your Mission-Critical Data in Modern Apps and AnalyticsIncluding All Your Mission-Critical Data in Modern Apps and Analytics
Including All Your Mission-Critical Data in Modern Apps and Analytics
 
Assessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelAssessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-Model
 
What’s in Your Data Warehouse?
What’s in Your Data Warehouse?What’s in Your Data Warehouse?
What’s in Your Data Warehouse?
 

Kürzlich hochgeladen

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Kürzlich hochgeladen (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
NoSQL technologies from an STM publishing perspective

  • 1. NoSQL technologies from an STM publishing perspective
      Bradley P. Allen, Elsevier Labs
      Presentation at NoSQL Now 2011, San Jose, CA, USA, 2011-08-25
  • 2. Peak physical media: is it here?
      • “Music Sales”, New York Times, 1 August 2009. http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html
      • “Initial Circs per student”, William Denton, 31 January 2011. http://www.miskatonic.org/2011/01/31/initial-circs-student
      • “Rise of e-book Readers to Result in Decline of Book Publishing Business”, Steven Mather, iSuppli, 28 April 2011. http://www.isuppli.com/Home-and-Consumer-Electronics/News/Pages/Rise-of-e-book-Readers-to-Result-in-Decline-of-Book-Publishing-Business.aspx
  • 3. In any case, the challenge to STM publishers is clear
      • Print revenue is softening
      • Online channels are exploding
          – Changing the way customers create and consume our content
          – Leading to new requirements and market opportunities for online products
  • 4. Additional challenges in STM publishing
      • Academic context and tradition inhibit business model innovation
      • Technology and business are traditionally separate concerns
      • Acquisitions create content and data silos
      • Global market drives lowest-common-denominator technology choices
  • 5. A simple model of the evolution of STM publishing
      • Print era: 1600s – 1980
          – Packaged as books and journals
          – Physically distributed
          – Access and discovery through libraries
      • Digital Library era: 1980 – 2010s
          – Packaged as books and journals
          – Digitally distributed
          – Access and discovery through search engines
      • Platform-as-a-service era: 2010s
          – Packaged as apps
          – Digitally distributed
          – Access and discovery through social networks
  • 6. STM publishing use cases in transition
      • Use case: A new medical term relevant to an emerging healthcare issue (e.g. a new type of avian flu virus) needs to be incorporated into a search index immediately
          – Digital Library era: Organizational governance issues about how taxonomies are to be updated, coupled with manually-intensive workflows and ad-hoc approaches to content tagging, inhibit rapid response
          – Platform-as-a-service era: A single, automated and standardized taxonomy management and content enhancement workflow allows rapid and timely update of search applications
      • Use case: Application developers want to mash up epidemiological data with medical journal articles to create a topic-specific Web resource
          – Digital Library era: Data silos without easy means of programmatic access by developers, coupled with governance and business model questions, inhibit data reuse
          – Platform-as-a-service era: Content API and single-point-of-access repository allow data and content to be accessed, discovered and reused across multiple applications
      • Use case: Digital library developers want to stage content into a single repository for unified search index generation
          – Digital Library era: Duplication of core content leads to synchronization and quality control issues
          – Platform-as-a-service era: Consolidation of duplicate repositories into a single point of truth across all content, accessible and discoverable through a Content API, eliminates the need for duplication and synchronization
      • Use case: Third party solutions providers want to integrate content (e.g. tagged medical journal articles, medical taxonomies) into point-of-care solutions
          – Digital Library era: No standards, no APIs for point-of-care content integration across all content and data formats
          – Platform-as-a-service era: Standards and APIs that scale across multiple partners, for all content types, for all delivery
      • Use case: Publishers want to deliver their content to tablets and e-readers in delivery formats that take advantage of the displays and interaction modalities on those devices
          – Digital Library era: No clear standard or approach for targeting emerging eReader and tablet devices; multiple and divergent approaches leading to siloed solutions and duplication of effort
          – Platform-as-a-service era: Web and industry standards for eReader and tablet devices supported as part of standard automated processing into delivery channel-specific formats, regularly updated and exposed through a Content API
      • Use case: Journal publisher wants to integrate content enhancements across multiple subject matter areas to add value to products leveraging Article of the Future technology
          – Digital Library era: No single point of access to content enhancements, no standards for content enhancement suppliers and partners to deliver enhancements for integration
          – Platform-as-a-service era: Easy access to multiple opportunities for content enhancements embedded in standard next-generation article formats and provided using standard content enhancement formats
  • 7. Facets of STM publishing processes
      • Process type: acquisition, transformation, enhancement, composition, delivery, access and discovery
      • Activity: submitting, crawling, syndicating, formatting, mapping, cleansing, indexing, querying, updating, storing, annotating, subject tagging, classification, entity recognition, entity extraction, fact extraction, clustering, aggregating, ordering, summarizing, filtering, analysis, rendering, design, publishing, accessing, retrieving, deleting
      • Entity: author, supplier, editor, reviewer, typesetter, user, automated process, designer, subject matter expert, developer, search engine, content repository, entity registry, API
      • Content type: product catalog, article, Web site, book, media object, entity record, taxonomy, e-book, ontology, mobile app, user-generated content, mobile-enhanced Web site
  • 8. Emerging content requirements
      • Broad range of content types
          – Must treat as first-class objects video, audio, images, datasets, metadata and knowledge organization systems in addition to articles and books
      • Standards-based
          – Web-standard formats to support ease of integration and interoperability
      • Fine-grained
          – Must be decomposable into and addressable in fragments smaller than the unit of publication; e.g., down to the level of specific words, phrases, images, table cells in articles or book chapters, key frames and segments in videos
      • Discoverable
          – Must be easily located across all levels of granularity
      • Accessible
          – Must be easily accessed through content creation, retrieval, update and deletion (CRUD) services
      • Flexible
          – New content types and associated schemas must be easily added through configuration
      • Reusable
          – It must be efficient for product developers to aggregate and compose content fragments into new products
      • Modifiable
          – Support the enhancement and correction of content at any time following creation
      • Broad range of delivery formats
          – Content standards and services must support fulfillment, delivery and presentation across desktop, notebook, tablet and mobile computing devices
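
The "Accessible", "Flexible" and "Fine-grained" requirements above can be illustrated with a small sketch. This is not Elsevier's implementation: `ContentRepository`, its method names, the path-based fragment addressing and the sample data are all hypothetical, chosen only to show CRUD services over schemaless documents whose fragments (here, a single table cell) are individually addressable.

```python
import copy
import uuid


class ContentRepository:
    """Toy in-memory, schemaless content store (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        """Store a copy of any dict-shaped document and return its new id."""
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = copy.deepcopy(doc)
        return doc_id

    def retrieve(self, doc_id, path=()):
        """Return the whole document, or the fragment addressed by `path`."""
        node = self._docs[doc_id]
        for key in path:
            node = node[key]
        return copy.deepcopy(node)

    def update(self, doc_id, path, value):
        """Replace the fragment at the non-empty `path` with `value`."""
        node = self._docs[doc_id]
        for key in path[:-1]:
            node = node[key]
        node[path[-1]] = copy.deepcopy(value)

    def delete(self, doc_id):
        """Remove the document entirely."""
        del self._docs[doc_id]


repo = ContentRepository()

# Made-up sample content; the values are placeholders, not real data.
article_id = repo.create({
    "type": "article",
    "title": "Example article",
    "tables": [{"rows": [["strain", "cases"], ["H5N1", 12]]}],
})

# A new content type is added without declaring a schema first.
video_id = repo.create({"type": "video", "segments": [{"start": 0, "end": 30}]})

# Address a fragment smaller than the unit of publication: one table cell.
cell_path = ("tables", 0, "rows", 1, 1)
cell = repo.retrieve(article_id, cell_path)

# Correct that cell after creation ("Modifiable").
repo.update(article_id, cell_path, 13)
```

A production service would add persistence, access control and stable fragment identifiers; the point here is only that the same four operations apply to every content type.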
  • 9. Emerging content architecture
      [Diagram; labels: Linked data; Relational metadata; Entity record; Document; Media object; Acquire; Transform, Enhance, Compose; Deliver]
  • 10. Content acquisition and transformation
  • 11. Content enhancement and analytics
  • 12. Content composition and delivery
  • 13. Why NoSQL is important to STM publishing
      • NoSQL emphasizes design choices that focus on delivering robust, scalable Web applications
          – Document-centric
          – Schemaless
          – Support for analytics
          – Read/write at Web scale
          – Move scale-out from development to operations
      • As we shift to the platform-as-a-service era, these features become an important part of the STM publishing technology stack
  • 14. How NoSQL addresses STM publishing’s needs
      • Schemaless, document-centric stores
          – Ease repository extension to accommodate expanding range of new, finer-grained content types
          – Fit HTML5/JS/CSS content stack providing web-based alternatives to native apps
          – Expedite application stack refresh in support of authoring and editorial workflow portals and tools
      • Support for analytics eases innovation in scientometrics
      • Read/write at Web scale accommodates solutions incorporating content at more dynamic, fine-grained scale
          – Entity records
          – Annotations
          – Other forms of community-contributed content
          – Linked data integration of heterogeneous information resources across the Web for mashups/solutions
      • Moving scale-out from development to operations reduces time-to-market, cost of failure for emerging, niche publishing opportunities
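
As a rough illustration of how schemaless storage and analytics support fit together, the sketch below runs a map/reduce-style count over documents of different shapes. The document fields (`type`, `subjects`, `target`, `label`) and the sample records are assumptions invented for this example, and plain Python stands in for whatever analytics facility a given NoSQL store provides.

```python
from collections import Counter

# Heterogeneous, made-up documents: articles, an annotation and an entity
# record live side by side, and not all of them carry a "subjects" field.
documents = [
    {"type": "article", "id": "a1", "subjects": ["virology", "epidemiology"]},
    {"type": "annotation", "id": "n1", "target": "a1", "subjects": ["virology"]},
    {"type": "entity_record", "id": "e1", "label": "avian influenza"},
    {"type": "article", "id": "a2", "subjects": ["epidemiology"]},
]


def map_subjects(doc):
    """Map step: emit (subject, 1) for each subject a document carries."""
    for subject in doc.get("subjects", []):
        yield subject, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = (pair for doc in documents for pair in map_subjects(doc))
subject_counts = reduce_counts(pairs)
```

Because the map step tolerates missing fields, new content types can be added to the store without touching the analytics code, which is the property the slide attributes to schemaless, document-centric stores.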
  • 15. Where STM publishing can drive NoSQL requirements
      • Integrated support for search
          – Free text retrieval
          – Faceted navigation
      • Query language functionality
          – Nearest-neighbor matching
          – Joins vs. join-free
      • Primitives/support for analytics design patterns
          – Clustering
          – Classification
          – Entity resolution
      • Primitives/support for semantic enhancement
          – Linked data
          – Language processing
      • Versioning for document stores
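
To make "integrated support for search" concrete, here is a minimal sketch of free text retrieval combined with faceted navigation over a handful of documents. The record fields, the sample records and the `search` and `facet_counts` helpers are hypothetical; a real system would use an inverted index rather than a linear scan.

```python
from collections import Counter

# Made-up records standing in for documents in a store.
records = [
    {"id": 1, "type": "article", "year": 2010, "text": "avian flu outbreak surveillance"},
    {"id": 2, "type": "book", "year": 2011, "text": "handbook of flu virology"},
    {"id": 3, "type": "article", "year": 2011, "text": "graph databases for publishing"},
]


def search(records, query, filters=None):
    """Free text retrieval: keep records containing every query term as a
    whole word, then apply exact-match facet filters."""
    terms = query.lower().split()
    filters = filters or {}
    return [
        r for r in records
        if all(t in r["text"].lower().split() for t in terms)
        and all(r.get(field) == value for field, value in filters.items())
    ]


def facet_counts(hits, fields):
    """Faceted navigation: count the hits per value of each facet field."""
    return {f: dict(Counter(r[f] for r in hits if f in r)) for f in fields}


hits = search(records, "flu")
facets = facet_counts(hits, ["type", "year"])

# Selecting a facet value narrows the result set.
narrowed = search(records, "flu", {"type": "article"})
```

The facet counts are computed from the current hit list, so each count tells the user how many results would remain after selecting that value.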
  • 16. Elsevier applications of NoSQL technologies
      • Entity registries
      • Metadata repositories
      • Big data analytics
      • User-built apps
  • 18. SciVal
  • 19. SciVerse
  • 20. Conclusions
      • STM publishing is in transition
      • This is driving new requirements for content
      • Many of these requirements are well met by NoSQL solutions
      • Some requirements point to areas of future work for NoSQL technologists and vendors