HathiTrust Research Center
Architecture Overview
Robert H. McDonald | @mcdonald
Executive Committee, HathiTrust Research Center (HTRC)
Deputy Director, Data to Insight Center
Associate Dean, University Libraries
Indiana University
Follow Along
http://slidesha.re/U4z1gW
HTRC Architecture Group
Indiana University
• Beth Plale, Lead
• Yiming Sun
• Stacy Kowalczyk
• Aaron Todd
• Jiaan Zeng
• Guangchen Ruan
• Zong Peng
• Swati Nagde
University of Illinois
• J. Stephen Downie
• Loretta Auvil
• Boris Capitanu
• Kirk Hess
• Harriett Green
Presentation Overview
•   Considerations for Current Architecture
•   Architecture - Use Case Methodology
•   Technical Overview
•   UnCamp Sessions for Further Review
Main Case: Data Near Computation
[Diagram. Components shown:]
• HT Volume Store (UM)
• HT Volume Store (IUPUI)
• HTRC Volume Store and Index (IUB)
• FutureGrid Computation Cloud
• XSEDE Compute Allocation
• UIUC Compute Allocation
• IU Compute Allocation
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users,
  whether acting alone or in cooperation with
  other users, over the duration of one or multiple
  sessions, can result in sufficient information being
  gathered from the collection of copyrighted works
  to reassemble pages from the collection.
• The definition disallows collusion between users
  and accumulation of material over time.
  It differentiates a human researcher from a proxy,
  which is not a user. Users are human beings.
Amicus Brief and NCR
• Jockers, Sag, Schultz: http://tinyurl.com/cy34hhr
Use Cases for Phase 1 Architecture
• Use Case #1 - A previously registered,
  user-submitted algorithm is retrieved and run,
  producing a result set
• Use Case #2 - HTRC applications/portal access
  (SEASR)
• Use Case #3 - Blacklight Lucene/Solr faceted
  access
• Use Case #4 - Direct programmatic access
  through the Secure Data API
HTRC Current Infrastructure
• Servers
  – 14 production-level quad-core servers
     • 16 – 32GB of memory
     • 250 – 500GB of local disk each
  – 6-node Cassandra cluster for volume store
  – Ingest service and secure Data API access point
• Storage (IU University Infrastructure)
  – 13TB of 15,000 RPM SAS disk storage
  – Increase up to 17TB by end of 2012
  – 500TB available in late year 2 to year 3
Key Components of Architecture
•   Portal Access
•   Blacklight Access
•   Agent
•   Registry
•   Secured Data API Access
HTRC Architecture
[Diagram. Components shown, top to bottom:]
• Portal Access, Blacklight, Agent (application submission, collection building)
• Direct programmatic access (by programs running on HTRC machines)
• Security (OAuth2)
• Registry (WSO2): Algorithms, Meandre Workflows, Result Sets, Collections
• Data API access interface, Audit, Solr Proxy
• Cassandra cluster volume store, Solr index
• Compute resources, Storage resources
HTRC Architecture: Portal Access
[The architecture diagram is repeated with a callout for Portal Access. Callout labels: HTRC Portal, Blacklight, App SEASR, App Blacklight.]
HTRC Architecture: Agent
[The architecture diagram is repeated with a callout for the HTRC Agent. Callout labels: Application submission, Collection building.]
HTRC Architecture: HTRC Registry
[The architecture diagram is repeated with a callout for the Registry (WSO2). Callout labels: Algorithms, Meandre Workflows, Result Sets, Collections.]
HTRC Architecture: Secure Data API
• RESTful Web Service
  – Language agnostic
  – Clients don’t have to deal with Cassandra
• Simple OAuth2 authentication
• HTTP over SSL
• Audits client access
• Protected behind firewall, accessible only to authorized IPs
[The architecture diagram is repeated alongside these points, with the Data API access interface highlighted.]
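The Secure Data API points above (REST, OAuth2, HTTP over SSL) can be illustrated with a minimal client-side sketch. Everything specific here is an assumption made for illustration: the base URL, the `volumes` resource, and the `volumeIDs` parameter are hypothetical and are not taken from the slides. The sketch only builds the request; it does not contact any server.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; the real Data API endpoint is not given in the slides.
BASE_URL = "https://htrc.example.org/data-api"


def build_volume_request(volume_ids, access_token):
    """Build (but do not send) an authenticated request for a set of volumes."""
    if not BASE_URL.startswith("https://"):
        raise ValueError("the Data API is reachable over SSL only")
    # "volumes" and "volumeIDs" are illustrative names, not confirmed by the source.
    query = urllib.parse.urlencode({"volumeIDs": "|".join(volume_ids)})
    request = urllib.request.Request(f"{BASE_URL}/volumes?{query}", method="GET")
    # OAuth2 bearer token carried in the Authorization header.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


req = build_volume_request(["vol.0001", "vol.0002"], "example-token")
print(req.get_method(), req.full_url)
```

Because the interface is plain HTTP with a token header, the same request could be issued from any language, which is the "language agnostic" point on the slide; the client never talks to Cassandra directly.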
NoSQL Methodology
• HT content is currently stored using the Pairtree
  file system convention (CDL)
• Moving these files into a NoSQL store like
  Cassandra enabled HTRC to aggregate them into
  larger sets of files for use in retrieval
• Use of Cassandra enabled HTRC to share content
  over a commodity-based Cassandra cluster of
  virtual machines
• HTRC originally investigated MongoDB,
  CouchDB, HBase, and Cassandra
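For readers unfamiliar with the Pairtree convention mentioned above, the following is a minimal sketch of how an identifier maps to a directory path: the identifier is cleaned (certain characters hex-encoded, then `/`, `:`, and `.` substituted) and split into two-character segments. This follows the published CDL Pairtree rules as understood here; it is an illustration, not HathiTrust's or HTRC's own code, and the function names are made up for the example.

```python
# Characters the Pairtree convention hex-encodes as ^hh, in addition to
# any byte outside the visible ASCII range 0x21-0x7e.
HEX_ENCODED = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_identifier(identifier):
    """Make an identifier safe for use as file system path segments."""
    encoded = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in HEX_ENCODED:
            encoded.append(f"^{byte:02x}")
        else:
            encoded.append(char)
    return "".join(encoded).translate(SUBSTITUTIONS)


def pairtree_path(identifier):
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_identifier(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(pairs) + "/"


print(pairtree_path("ark:/13030/xt12t3"))  # ar/k+/=1/30/30/=x/t1/2t/3/
```

Each volume ends up several directories deep, which is why reading many volumes means many small file system lookups; aggregating the same content under a key in a store such as Cassandra avoids walking that tree per request.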
HTRC Solr index
• The Solr Data API 0.1 test version
  – Preserves all query syntax of the original Solr
  – Prevents users from modifying the index
  – Hides the host machine and port number that HTRC
    Solr is actually running on
  – Creates an audit log of requests
  – Provides a filtered term vector for words starting
    with a user-specified letter
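The proxy behaviours listed above can be sketched in a few lines. This is a hypothetical illustration, not the HTRC Solr proxy: the allow-list, the log format, and all function names are assumptions, and the term vector is represented as a plain dictionary of term counts.

```python
import logging
from urllib.parse import urlsplit

audit_log = logging.getLogger("solr_proxy.audit")

# Assumed allow-list: only the read-only query handler passes through,
# so modifying handlers such as "update" are rejected.
READ_ONLY_HANDLERS = {"select"}


def is_allowed(request_path):
    """Return True only when the last path segment is a read-only handler."""
    handler = urlsplit(request_path).path.rstrip("/").rsplit("/", 1)[-1]
    return handler in READ_ONLY_HANDLERS


def handle(user, request_path):
    """Record every request in the audit log and report whether it may proceed."""
    allowed = is_allowed(request_path)
    audit_log.info("user=%s path=%s allowed=%s", user, request_path, allowed)
    return allowed


def filter_terms(term_counts, letter):
    """Keep only terms that start with the user-specified letter."""
    prefix = letter.lower()
    return {t: c for t, c in term_counts.items() if t.lower().startswith(prefix)}


print(handle("alice", "/solr/select?q=title:whale"))
print(handle("alice", "/solr/update?commit=true"))
print(filter_terms({"whale": 3, "Water": 2, "ship": 5}, "w"))
```

The query string is passed through untouched, which is how a proxy of this kind can preserve Solr's query syntax while still controlling which handlers are reachable and hiding where Solr actually runs.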
Non-Consumptive Research: Secure Data Capsule
[Diagram. Components shown: Scholars; Remote Desktop or VNC; Data Capsules VM Cluster; HTRC Volume Store and Index; FutureGrid Computation Cloud.]
• Provide secure VM
• Submit secure capsule map/reduce Data Capsule images to FutureGrid
• Receive and review results
Sessions for Further Review
• For more on API – Tues Topic I/II
  (Yiming Sun)
• For more on Portal/SEASR – Tues Topic II
  (Loretta Auvil)
• For more on Portal/Blacklight – Tues Topic III
  (Stacy Kowalczyk)
Contact Information
• Robert H. McDonald
  – Email – robert@indiana.edu
  – Chat – rhmcdonald on googletalk | skype
  – Twitter - @mcdonald
  – Blog – http://www.rmcdonald.net
  – Twitter Hashtag: #HTRC12

Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
 

HTRC Architecture Overview

  • 1. HathiTrust Research Center Architecture Overview Robert H. McDonald | @mcdonald Executive Committee-HathiTrust Research Center (HTRC) Deputy Director-Data to Insight Center Associate Dean-University Libraries Indiana University
  • 3. HTRC Architecture Group • Indiana University: Beth Plale (Lead), Yiming Sun, Stacy Kowalczyk, Aaron Todd, Jiaan Zeng, Guangchen Ruan, Zong Peng, Swati Nagde • University of Illinois: J. Stephen Downie, Loretta Auvil, Boris Capitanu, Kirk Hess, Harriett Green
  • 4. Presentation Overview • Considerations for Current Architecture • Architecture - Use Case Methodology • Technical Overview • UnCamp Sessions for Further Review
  • 5. Main Case – Data Near Computation [diagram] • HT Volume Store (UM) • HT Volume Store (IUPUI) • HTRC Volume Store and Index (IUB) • FutureGrid Computation Cloud • XSEDE Compute Allocation • UIUC Compute Allocation • IU Compute Allocation
  • 6. Non-Consumptive Research Paradigm • No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 7. Amicus Brief and NCR • Jockers, Sag, Schultz – • http://tinyurl.com/cy34hhr
  • 8. Use Cases for Phase 1 Architecture • Use Case #1 - Previously registered user submitted algorithm retrieved and run with results set • Use Case #2 - HTRC applications/portal access (SEASR) • Use Case #3 – Blacklight Lucene/Solr faceted access • Use Case #4 - Direct programmatic access through Secure Data API
  • 9. HTRC Current Infrastructure • Servers – 14 production-level quad-core servers • 16 – 32GB of memory • 250 – 500GB of local disk each – 6-node Cassandra cluster for volume store – Ingest service and secure Data API access point • Storage (IU University Infrastructure) – 13TB of 15,000 RPM SAS disk storage – Increase up to 17TB by end of 2012 – 500TB available in late year 2-year 3
  • 10. Key Components of Architecture • Portal Access • Blacklight Access • Agent • Registry • Secured Data API Access
  • 11. HTRC Architecture [diagram] • Portal Access • Blacklight • Direct programmatic access (by programs running on HTRC machines) • Agent: Application submission, Collection building • Security (OAuth2) • Data API access interface • Solr Proxy • Audit • Registry (WSO2) • Meandre Workflows • Algorithms • Result Sets • Collections • Cassandra cluster volume store • Solr index • Compute resources • Storage resources
  • 12. HTRC Architecture: Portal Access [slide 11 diagram, Portal Access highlighted] • HTRC Portal • Blacklight • SEASR App • Blacklight App
  • 13. HTRC Architecture: Agent [slide 11 diagram, HTRC Agent highlighted] • Application submission • Collection building
  • 14. HTRC Architecture: HTRC Registry [slide 11 diagram, Registry (WSO2) highlighted] • Meandre Workflows • Algorithms • Result Sets • Collections
  • 15. HTRC Architecture: Secure Data API [slide 11 diagram, Data API highlighted] • RESTful Web Service – Language agnostic – Clients don’t have to deal with Cassandra • Simple OAuth2 authentication • HTTP over SSL • Audits client access • Protected behind firewall, accessible only to authorized IPs
  • 16. NoSQL Methodology • Currently HT content is stored in a pair-tree file system convention (CDL) • Moving these files into a NoSQL store like Cassandra enabled HTRC to aggregate them into larger sets of files for use in retrieval • Use of Cassandra enabled HTRC to share content over a commodity based Cassandra cluster of virtual machines • Originally investigated use of MongoDB, CouchDB, Hbase and Cassandra
  • 17. HTRC Solr index • The Solr Data API 0.1 test version – Preserves all query syntax of original Solr – Prevents user from modification – Hides the host machine and port number HTRC Solr is actually running on – Creates audit log of requests – Provides filtered term vector for words starting with user-specified letter
  • 18. Non-Consumptive Research-Secure Data Capsule [diagram] • Components: Scholars; Data Capsules VM Cluster; HTRC Volume Store and Index; FutureGrid Computation Cloud • Provide secure VM • Remote Desktop or VNC • Submit secure capsule map/reduce Data Capsule images to FutureGrid • Receive and review results
  • 19. Sessions for Further Review • For more on API – Tues Topic I/II (Yiming Sun) • For more on Portal/SEASR – Tues Topic II (Loretta Auvil) • For more on Portal/Blacklight – Tues Topic III (Stacy Kowalczyk)
  • 20. Contact Information • Robert H. McDonald – Email – robert@indiana.edu – Chat – rhmcdonald on googletalk | skype – Twitter - @mcdonald – Blog – http://www.rmcdonald.net – Twitter Hashtag: #HTRC12
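
Slide 16 notes that HT content is stored in a pair-tree file system convention. As a rough illustration of what that convention implies, the Python sketch below maps an identifier to a pairtree-style directory path using the character-cleaning rules of the Pairtree specification. The `<namespace>/pairtree_root/...` layout, the function names, and the sample identifier `demo.ark:/12345/x0abc` are assumptions made for this example and are not taken from the slides. It does not show how HTRC loads these files into Cassandra.

```python
# Illustrative sketch only: function names, directory layout, and the sample
# identifier are assumptions, not details taken from the slides.

# Characters the Pairtree specification hex-encodes as ^hh, in addition to
# any byte outside the visible ASCII range 0x21-0x7e.
HEX_ENCODED = set('"*+,<=>?\\^|')

# Single-character substitutions applied after hex-encoding.
CHAR_MAP = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(identifier: str) -> str:
    """Return the identifier with Pairtree character cleaning applied."""
    pieces = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in HEX_ENCODED:
            pieces.append(f"^{byte:02x}")
        else:
            pieces.append(char)
    return "".join(pieces).translate(CHAR_MAP)


def pair_path(cleaned: str) -> str:
    """Split a cleaned identifier into two-character directory segments."""
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def volume_path(volume_id: str) -> str:
    """Build a directory path, assuming a 'namespace.local_id' identifier."""
    namespace, local_id = volume_id.split(".", 1)
    cleaned = clean_id(local_id)
    return "/".join([namespace, "pairtree_root", pair_path(cleaned), cleaned])


if __name__ == "__main__":
    print(volume_path("demo.ark:/12345/x0abc"))
    # prints demo/pairtree_root/ar/k+/=1/23/45/=x/0a/bc/ark+=12345=x0abc
```

Each volume ends up several directories deep in such a layout, which is consistent with the slide's point that aggregating the files in a store like Cassandra makes retrieval of larger sets easier.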

Editor's Notes

  1. Registry – the agent can deploy any service listed in this diagram and run it with the computational resources. The original plan is to use XSEDE; we are not using this on IIS machine but are using ODIN (128-node cluster; each node has 4 GB memory and 4 computation cores) and smoketree (D2I server; 24 physical cores, 48 logical cores, 128 GB memory). These are not long term; we are just using them for now.