By
Sem Gebresilassie
13 May 2015
dwell.sem@gmail.com
Harvesting Statistical Metadata from an Online
Repository for Data Analysis and Visualization
Outline
• Goal and Motivation
• Theseus.fi
• DSpace
• Getting Data out of DSpace
• DSpace OAI-PMH as a Data Provider for Theseus
• Request Types (Verbs)
• Flow Control
• Harvesting Data from Theseus’s Data Provider
• Project Result
• Final Thoughts
Goal
• Harvest metadata of thesis documents from Theseus: author name, title, keywords, submission year, and so on.
• Store the harvested data in a separate MySQL database.
• Build a Web portal on top of the stored data.
Goal and Motivation
Why conduct this project?
• Analyse and visualize overall statistics about the theses.
• Compare thesis documents.
• Compare universities and departments.
• Analyse trending keywords used by students each year.
Theseus.fi
• Digital libraries are now commonly used by academic institutions worldwide.
• Theseus provides online access to theses and publications from Finnish universities of applied sciences.
• End users can search, browse and upload thesis documents to Theseus.
...
• Theseus also has an API that third-party organizations can use to work with thesis data.
• Theseus is powered by DSpace, a pioneering open source digital asset management system.
• Theseus inherits its functionality and features from DSpace.
DSpace
• DSpace is an open source software platform that provides stable, long-term storage, most commonly for digital intellectual materials.
• Many academic institutions worldwide use DSpace to give their users easy access to their digital resources.
• DSpace can be freely downloaded, used and even modified to store digital materials.
Abbreviations
OAI: Open Archives Initiative
PMH: Protocol for Metadata Harvesting
Getting Data out of DSpace
• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an HTTP-based protocol that defines how metadata from repositories such as DSpace is shared, published and archived.
• OAI-PMH is used to programmatically access metadata from DSpace.
DSpace OAI-PMH as a Data Provider for Theseus
• DSpace repositories have an 'OAI Base URL' in addition to the URL for human users.
OAI Base URL: http://publications.theseus.fi/oai/request?
URL for human users: https://www.theseus.fi/
• The OAI Base URL is used in machine-to-machine communication between data providers and data harvesters.
• When a harvesting request is made using the OAI Base URL, Theseus’s data provider returns XML-formatted metadata of thesis documents.
…
• Theseus OAI-PMH exposes thesis documents in twelve unique metadata formats.
KansalliKirjasto format:
<kk:field schema="dc" element="contributor" qualifier="author" language="none" value=" Denut, Nicolae "/>
OAI Dublin Core format:
<dc:creator> Denut, Nicolae </dc:creator>
• Each metadata format can be queried to get data from Theseus’s data provider.
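As an illustration of how the two formats above differ for a harvester, here is a minimal Python sketch that reads the author from each. It uses Python's standard library rather than any tool named in these slides. The wrapper elements `kk:metadata` and `record`, and the `kk` namespace URI, are placeholders assumed for the example; only the `kk:field` and `dc:creator` elements come from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core namespace
KK_NS = "http://kk/1.0"  # assumed placeholder; check the real response for the actual URI

# Hypothetical wrappers around the two snippets shown on the slide.
KK_SAMPLE = (
    f'<kk:metadata xmlns:kk="{KK_NS}">'
    '<kk:field schema="dc" element="contributor" qualifier="author" '
    'language="none" value=" Denut, Nicolae "/>'
    '</kk:metadata>'
)
DC_SAMPLE = (
    f'<record xmlns:dc="{DC_NS}">'
    '<dc:creator> Denut, Nicolae </dc:creator>'
    '</record>'
)


def authors_from_kk(xml_text):
    """Authors are attribute values on kk:field elements."""
    root = ET.fromstring(xml_text)
    return [
        field.get("value", "").strip()
        for field in root.iter(f"{{{KK_NS}}}field")
        if field.get("element") == "contributor"
        and field.get("qualifier") == "author"
    ]


def authors_from_dc(xml_text):
    """Authors are the text content of dc:creator elements."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}creator")]
```

Both functions return the same author name for the samples; the difference is only where each format stores it (an attribute versus element text).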
Request Types (Verbs)
• OAI-PMH defines six request types (verbs) that can be appended to an OAI base URL to access different repository contents.
• Theseus implements all six request types to provide thesis metadata to harvesters.
1. Identify: fetches information about the Theseus data provider itself
2. ListMetadataFormats: returns the list of metadata formats supported by the Theseus data provider
3. ListIdentifiers: lists thesis record identifiers
…
4. ListSets: retrieves the set structure (list of universities and departments)
5. ListRecords: gets the complete metadata of a list of thesis documents from Theseus
6. GetRecord: retrieves the metadata of an individual thesis document
• By attaching any one of these request types to Theseus’s OAI base URL, a request URL can be formed.
OAI Base URL + Request type => Request URL
http://publications.theseus.fi/oai/request?verb=ListSets
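The URL composition above can be sketched in a few lines of Python. The helper name `build_request_url` is made up for this example; `metadataPrefix` is a standard OAI-PMH argument and `oai_dc` is the Dublin Core prefix.

```python
from urllib.parse import urlencode

# Base URL as given in the slides (without the trailing "?").
OAI_BASE_URL = "http://publications.theseus.fi/oai/request"

# The six OAI-PMH request types (verbs).
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListIdentifiers",
    "ListSets",
    "ListRecords",
    "GetRecord",
}


def build_request_url(verb, base_url=OAI_BASE_URL, **arguments):
    """Append a verb and optional arguments to the OAI base URL."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url.rstrip('?')}?{query}"
```

For example, `build_request_url("ListSets")` yields the request URL shown above, and `build_request_url("ListRecords", metadataPrefix="oai_dc")` adds the metadata format as a second query argument.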
Flow Control
• The three request types ListIdentifiers, ListSets and ListRecords can return large lists from Theseus.
• In such cases, it is practical to partition the list across a series of requests and responses.
• Resumption tokens are the OAI-PMH mechanism that lets data providers return a long list in chunks.
Resumption token work flow
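The resumption token workflow can be sketched as a loop in Python. This is an illustration under stated assumptions, not the project's actual harvester: `harvest_identifiers` and its `fetch` parameter are hypothetical names, and `fetch` is any callable that takes the query arguments and returns the response XML as text, so the loop can be tried without network access.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens.

    fetch: callable taking a dict of query arguments and returning
    the response XML as a string (hypothetical helper).
    """
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(arguments))
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # The token replaces all other arguments in the follow-up request.
        arguments = {
            "verb": "ListIdentifiers",
            "resumptionToken": token.text.strip(),
        }
```

In real use, `fetch` could build the request URL from the arguments and read it with `urllib.request.urlopen`; a short pause between requests is a common courtesy to the data provider.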
Harvesting Data from Theseus’s Data Provider
• Simple HTML DOM Parser is an open source parser library written in PHP for reading, modifying and returning structured content from external data sources.
• The library can build a Document Object Model (DOM) by loading structured data from a URL.
• To get nodes of the DOM object, the library provides a method called “find()”.
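The same load-then-find pattern can be sketched in Python. Note the swap: this uses Python's standard `xml.etree.ElementTree` in place of the PHP library the project used, and the sample ListSets response with its set names is invented for the example.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample shaped like a ListSets response.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>com_example_1</setSpec><setName>Example University</setName></set>
    <set><setSpec>col_example_2</setSpec><setName>Example Department</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return the setSpec and setName of every set in a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        {
            "setSpec": s.findtext("oai:setSpec", namespaces=OAI_NS),
            "setName": s.findtext("oai:setName", namespaces=OAI_NS),
        }
        for s in root.findall(".//oai:set", OAI_NS)
    ]
```

Here `findall` plays the role that “find()” plays in the PHP library: it selects the nodes of interest from the parsed document so that their text can be stored.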
Summary of gathered theses metadata
Universities: identifier (setSpec), university name, ListSets request URLs, total number of papers
Departments: identifier (setSpec), department name, ListSets request URLs, total number of papers, university identifiers
Thesis documents: thesis identifier, author names, titles, GetRecord request URLs, department identifiers, university identifiers, keywords, subjects (official keywords), number of pages, year, language
84,391
Whoa! That’s a big number, aren’t you proud?
Project Result
• How many thesis documents are in Theseus?
• How many papers does each school have in Theseus?
• How many papers does each school publish every year?
• What departments are there in each school?
• How many papers belong to each department?
• How many pages does each paper have?
• In what language is each paper written?
• How many times has each paper been downloaded by Theseus visitors?
• What are the keywords of each thesis document?
• The Web portal’s front page aims to give better insight into the contribution of each school to Theseus.
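Once the metadata is in a relational database, questions like these reduce to aggregate queries. The sketch below is an assumption-heavy illustration: it uses an in-memory SQLite database as a stand-in for the project's MySQL database, and the table names, column names and rows are all invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE universities (
            set_spec TEXT PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE theses (
            identifier      TEXT PRIMARY KEY,
            university_spec TEXT NOT NULL REFERENCES universities(set_spec),
            title           TEXT,
            year            INTEGER,
            language        TEXT,
            pages           INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO universities VALUES (?, ?)",
        [("uni_a", "University A"), ("uni_b", "University B")],
    )
    conn.executemany(
        "INSERT INTO theses VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("t1", "uni_a", "Thesis 1", 2013, "en", 40),
            ("t2", "uni_a", "Thesis 2", 2014, "fi", 55),
            ("t3", "uni_b", "Thesis 3", 2014, "en", 62),
        ],
    )
    return conn


def papers_per_university_and_year(conn):
    """Answer: how many papers does each school publish every year?"""
    return conn.execute(
        """
        SELECT u.name, t.year, COUNT(*)
        FROM theses AS t
        JOIN universities AS u ON u.set_spec = t.university_spec
        GROUP BY u.name, t.year
        ORDER BY u.name, t.year
        """
    ).fetchall()
```

The other questions follow the same shape, for example grouping by language or by department instead of by university and year.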
Web portal showing departments versus number of thesis documents in Metropolia UAS
Analysing keywords is also easy
I want to analyse keywords => Fill out a form => See results
Keyword fetching form
Thank you!
Any questions?
dwell.sem@gmail.com

Dspace OAI-PMH

  • 1. By Sem Gebresilassie 13 May 2015 dwell.sem@gmail.com
  • 2. Harvesting Statstical Metadata from an Online Repository for Data Analysis and Visualization
  • 3. Outline  Goal and Motivation  Theseus.fi  Dspace  Getting Data out from Dspace  Dspace OAI-PMH as a Data provider for Theseus  Request Types(Verbs)  Flow Control  Harvesting Data from Theseus’s Data provider  Project Result  Final thoughts
  • 4. Goal  Harvest metadata of thesis documents from Theseus author name, title, keywords, submission year....  Store the harvested data into a separate MYSQL database.  Build a Web portal out of this stored data Goal and Motivation Why conduct this project?  Thesis data analysis and visualization of overall statistical facts.  Compare thesis documents  Compare universities and departments  Analyse trending keywords used by students every year
  • 5. Theseus.fi: Digital libraries are now commonly used by academic institutions worldwide. Theseus provides online access to theses and publications from Finnish universities of applied sciences. End users can search, browse and upload thesis documents to Theseus.
  • 6. ... Theseus also has an API that third-party organizations can use to work with thesis data. Theseus is powered by a pioneering open-source digital asset management system called DSpace. The functionality and features of Theseus are inherited from DSpace.
  • 7. DSpace: DSpace is an open-source software platform that provides stable, long-term storage, commonly for digital intellectual materials. Many academic institutions worldwide use DSpace to offer their users easy access to their digital resources. DSpace can be freely downloaded and used, or even modified, to store digital materials.
  • 8. Abbreviations: OAI: Open Archives Initiative; PMH: Protocol for Metadata Harvesting
  • 9. Getting Data out from DSpace: OAI-PMH is an HTTP-based protocol that defines methods for sharing, publishing and archiving metadata from DSpace repositories. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used to programmatically access data from DSpace.
  • 10. DSpace OAI-PMH as a Data Provider for Theseus: DSpace repositories have an 'OAI Base URL' in addition to the URL for human users. OAI Base URL: http://publications.theseus.fi/oai/request? URL for human users: https://www.theseus.fi/ The OAI Base URL is used in machine-to-machine communication between data providers and data harvesters. When a harvesting request is made using the OAI Base URL, Theseus’s data provider returns XML-formatted metadata of thesis documents.
  • 11. … Theseus OAI-PMH exposes thesis documents in twelve unique metadata formats. KansalliKirjasto format: <kk:field schema="dc" element="contributor" qualifier="author" language="none" value=" Denut, Nicolae "/> OAI Dublin Core format: <dc:creator> Denut, Nicolae </dc:creator> Each metadata format can be queried to get any data from Theseus’s data provider.
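The difference between the two formats shown on this slide can be illustrated with a minimal Python sketch: in Dublin Core the author is element text, while in the KansalliKirjasto format it sits in a `value` attribute. The wrapping `<record>` element, the function names, and the `kk` namespace URI are assumptions made for illustration; the slides do not give the real `kk` namespace.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core element namespace
KK_NS = "urn:example:kk"  # placeholder: the real kk namespace URI is not in the slides

# Sample fragments wrapped in a hypothetical <record> element so they parse standalone.
DC_SAMPLE = (
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:creator> Denut, Nicolae </dc:creator>"
    "</record>"
)
KK_SAMPLE = (
    f'<record xmlns:kk="{KK_NS}">'
    '<kk:field schema="dc" element="contributor" qualifier="author" '
    'language="none" value=" Denut, Nicolae "/>'
    "</record>"
)


def authors_from_dc(xml_text):
    """Return author names stored as dc:creator element text."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}creator") if el.text]


def authors_from_kk(xml_text):
    """Return author names stored in the value attribute of kk:field elements."""
    root = ET.fromstring(xml_text)
    return [
        el.get("value", "").strip()
        for el in root.iter(f"{{{KK_NS}}}field")
        if el.get("element") == "contributor" and el.get("qualifier") == "author"
    ]
```

Both helpers strip the padding whitespace seen in the sample values, so either format yields the same author string.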
  • 12. Request Types (Verbs): OAI-PMH defines six request types (verbs) that can be appended to an OAI base URL to access different repository contents. Theseus implements all six request types to provide thesis metadata to harvesters. 1. Identify: fetches information about the Theseus data provider itself. 2. ListMetadataFormats: returns the list of metadata formats supported by the Theseus data provider. 3. ListIdentifiers: lists thesis record identifiers.
  • 13. … 4. ListSets: retrieves the set structure (the list of universities and departments). 5. ListRecords: gets the complete metadata of thesis documents from Theseus. 6. GetRecord: retrieves the metadata of an individual thesis document. By attaching any one of these request types to Theseus’s OAI base URL, a request URL can be formed.
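Forming a request URL from the base URL and a verb could look like the following Python sketch. The helper name `build_request_url` is hypothetical; `metadataPrefix` is a standard OAI-PMH argument, and the base URL is the one given on the earlier slide, written here without its trailing `?`.

```python
from urllib.parse import urlencode

# OAI base URL from the slides, without the trailing "?"
OAI_BASE_URL = "http://publications.theseus.fi/oai/request"

# The six OAI-PMH request types listed on the slides
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListIdentifiers",
    "ListSets",
    "ListRecords",
    "GetRecord",
}


def build_request_url(verb, **params):
    """Attach a verb and any extra arguments to the base URL as a query string."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}
    return f"{OAI_BASE_URL}?{urlencode(query)}"


print(build_request_url("Identify"))
print(build_request_url("ListRecords", metadataPrefix="oai_dc"))
```

Rejecting unknown verbs locally is a design choice of this sketch; a real data provider would answer such a request with an OAI-PMH error response instead.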
  • 16. Flow control: The three request types ListIdentifiers, ListSets and ListRecords return large lists from Theseus. In such cases, it is practical to partition them among a series of requests and responses. Resumption tokens are an OAI-PMH mechanism that lets data providers split long list responses into parts.
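A harvesting loop that follows resumption tokens might be sketched in Python as below. This is an illustration under assumptions, not the project's code: `fetch` stands for any callable that sends the query arguments to the data provider and returns the response XML, and `page` and `fake_fetch` are hypothetical stand-ins that serve two canned pages so the loop can run offline.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 response namespace


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, requesting further pages while a token is returned."""
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:  # a missing or empty token marks the last page
            break
        # A follow-up request carries only the verb and the token.
        args = {"verb": "ListIdentifiers", "resumptionToken": token}


def page(ids, token):
    """Build a canned ListIdentifiers response (hypothetical test data)."""
    headers = "".join(f"<header><identifier>{i}</identifier></header>" for i in ids)
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


PAGES = {
    None: page(["oai:example:1", "oai:example:2"], "page2"),
    "page2": page(["oai:example:3"], ""),
}


def fake_fetch(args):
    """Stand-in for an HTTP call: pick the page matching the resumption token."""
    return PAGES[args.get("resumptionToken")]


print(list(harvest_identifiers(fake_fetch)))
```

Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP client, so the same loop could be pointed at a real data provider by supplying a function that performs the request.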
  • 18. Harvesting Data from Theseus’s Data Provider: Simple HTML DOM Parser is an open-source parser library written in PHP to read, modify, and return structured content from external data sources. This parser library can create a Document Object Model by loading structured data from a URL. To get nodes of the DOM object, this library provides a method called “find()”.
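The same load-then-find pattern can be sketched in Python, using the standard library's `xml.etree.ElementTree` in place of the PHP Simple HTML DOM Parser named on the slide. The sample below is a hand-written ListSets response with made-up `setSpec` and `setName` values, and `list_sets` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; the set values are invented for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>com_example_1</setSpec><setName>Example University</setName></set>
    <set><setSpec>col_example_2</setSpec><setName>Example Department</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs for every set in a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=NS),
            s.findtext("oai:setName", namespaces=NS),
        )
        for s in root.findall(".//oai:set", NS)
    ]


print(list_sets(SAMPLE))
```

Here `findall` plays the role that `find()` plays in the PHP library: it selects the matching nodes of the parsed document so their contents can be read out.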
  • 19. Summary of gathered theses metadata. Universities: identifier (setSpec), university name, ListSets request URLs, total number of papers. Departments: identifier (setSpec), department name, ListSets request URLs, total number of papers, university identifiers. Thesis documents: thesis identifier, author names, titles, GetRecord request URLs, department identifiers, university identifiers, keywords, subjects (official keywords), number of pages, year, language.
  • 20. 84,391 Whoa! That’s a big number, aren’t you proud?
  • 21. Project Result • How many thesis documents are in Theseus? • How many papers does each school have in Theseus? • How many papers is each school publishing every year? • What departments are there in each school? • How many papers belong to each department? • How many pages does each paper have? • In what language is the paper written? • How many times has each paper been downloaded by Theseus visitors? • What are the keywords of each thesis document?
  • 22. The front page of the web portal aims to give better insight into the contribution of each school to Theseus.
  • 24. Departments versus number of Thesis documents in Metropolia UAS
  • 25. Analysing keywords is also easy: I want to analyse keywords; fill out a form; see results.