SlideShare a Scribd company logo
1 of 3
Download to read offline
Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015 
Best Practice Poster:  
Gateway to Oklahoma History Case Study:  
Structured Data and Metadata Evaluation for Improved Image 
Resource Findability on the Web 
 
  Emily Ann Kolvitz 
University of Oklahoma, 
USA 
kolvitz1@gmail.com 
 
 
 
Keywords: structured data; metadata; Semantic Web; online search; information retrieval; image                     
findability 
1. Introduction 
Image Resource Findability on the World Wide Web is still very much a land­grab. For the                               
Semantic Web to become a reality online businesses and individuals have to get their hands dirty                               
and also come face­to­face with the realization that search engine giants are increasingly                         
becoming the go­to tool for information resource retrieval. “Increasingly, students use Web                       
search engines such as Google to locate information resources rather than seek out library online                             
catalogs or databases of scholarly journal articles” (Lippincott 2013). This puts the search engine                           
giant in a unique position to dictate how the future of search will work on the Web ­ and                                     
therefore, your organization’s future presence (or lack thereof) on the Web. Search Engine                         
Optimization (SEO) techniques change frequently and remain much a mystery to many                       
companies. The one variable in the equation of Web findability that remains a staple is good                               
quality metadata under the hood of the Website. In this case study, a methodology is applied to                                 
the Gateway to Oklahoma History’s Website. This study can be generalized to organizations                         
looking to benchmark their own findability maturity on the Web from an image­centric                         
viewpoint.  
2. Purpose 
Image search and retrieval is a more difficult area than text search and retrieval because                             
accessibility to the image content is largely dependent on the context presented in and around the                               
image resource. The future of Semantic Web technologies relies very much on the idea that                             
organizations are fluent in structured data and have devoted resources to exposing valuable data                           
to the web. The W3C (World Wide Web Consortium) was founded over two decades ago and the                                 
widespread adoption of Schema.org and structured data on the Web did not gain traction until the                               
big four search engines (Google, Bing, Yahoo, and Yandex ) agreed that a standard was needed to                                 
pave the way forward. “On­page markup helps search engines understand the information on                         
web pages and provide richer search results. A shared markup vocabulary makes easier for                           
webmasters to decide on a markup schema and get the maximum benefit for their efforts.”                             
(Schema.org 2014) Many organizations still lag behind on their implementation of any type of                           
structured data. Structured data is only one piece of the findability algorithm. Metadata near                           
content, embedded within content, or listed in the alt text of an html document all tell machines                                 
something about the content inside of the record as well. There are no guardians of the Web,                                 
ensuring structured data is uniformly applied to all records with equal attention and care and there                               
is no standard, mandated requirement for records on the Web to provide context for image                             
 
Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015 
resource findability. Most search engines do not crawl embedded XMP data or the invisible Web,                             
leaving text near images, file names or text in the alt­text in html markup as the only context for                                     
image resources. The search algorithms for image retrieval are subject to change frequently                         
(Kritzinger 2013) and additionally, social media sites and organizations strip embedded data from                         
images (Embedded Metadata Manifesto 2014). Embedded metadata provides context and                   
provenance for image resources. Even with the dramatic adoption of structured data markup                         
utilizing schema.org vocabularies, there still remains metadata opportunities on the Web. Reicks                       
recommends embedded metadata as a strategy for online findability by showcasing examples of                         
applications that parse embedded data into structured data around images on the Web such as                             
PhotoShelter and LicenseStream (2010).  
2. Research Methods 
 The following research question informed this project: What are the types and quality of 
structured data, XMP, and metadata records available for image resources appearing on the 
website? Utilizing the Structured Data Linter Tool and Phil Harvey’s ExifTool, information was 
gathered to quantify these research questions.  Image records on the Gateway to Oklahoma 
History’s website were investigated for the types, quality and quantity of embedded metadata and 
structured data.  
3.  Results 
The Gateway to Oklahoma History’s Website has a wealth of structured data and metadata                           
pertaining to its image resources. Search queries utilizing structured data markup tags and/or                         
embedded metadata yielded relevant and accurate results during a normal web search, but did not                             
yield relevant and/or accurate image resources during an image search. Descriptive filenames                       
were not used for image resources, which is an important part of image retrieval through web                               
search engines. Adding Schema.org tags to the on­page markup, to accompany the structured                         
data already present is another area for improvement. An interesting finding from this research                           
was that embedded metadata was only found on the largest, original version of the image                             
resource, and never on smaller derivative images. Structured data included in the on­page                         
mark­up included Open Graph Protocol and Dublin Core. IPTC was the primarily type of                           
embedded metadata present for the image resources.   
4. Conclusion 
The results and methodology for this research can help GLAM institutions (Galleries,                         
Libraries, Archives & Museums) by bringing awareness to the state of structured data and image                             
resource findability for cultural heritage institutions on the Web. GLAMs must be active in the                             
SEO space, support machine­readable language in the markup of their sites, and utilize                         
Schema.org vocabularies and descriptive filenames for relevancy in search engine results. The                       
Digital Library Federation, which is a program of the Council on Library and Information                           
Resources, concludes that “​Getting found means repository objects must be included in the                         
indexes of major search engines because most students and faculty now begin their research with                             
Internet search engines. Digital repositories created by libraries will be largely invisible to users if                             
their contents are not indexed in these search engines”  (Digital Library Foundation 2014).   
 
References 
Digital Library Federation.  Last accessed October 20, 2014.   “SEO for Digital Libraries.”   
     ​http://www.diglib.org/community/groups/seo­for­digital­libraries/ 
 
Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015 
International Business, Times. 0006. "Bing,Google and Yahoo merge to make search easier with schema.org."                           
International Business Times, April. 
IPTC International Press Telecommunications Council, 2014. “Embedded Metadata Manifesto” Last accessed                     
November 20, 2014. ​http://www.embeddedmetadata.org/social­media­test­results.php​(Embedded Metadata         
Manifesto 2014).  
Kritzinger, W. T. "Search Engine Optimization and Pay­per­Click Marketing Strategies." Journal of Organizational                         
Computing and Electronic Commerce, no. 3 (2013): 273­86. 
Lippincott, Joan K. “Net Generation Students and Libraries,” EDUCAUSE (2005), accessed November 19, 2014,                           
http://www.educause.edu/research­and­publications/books/educating­net­generation/net­generation­students­and­lib
raries 
Reicks, David. 2010. “Why Embedded Metadata Won’t Help Your SEO,” Last Updated December 30, 2013. Last                               
Accessed November 23, 2014. ​http://www.controlledvocabulary.com/blog/embedded­metadata­wont­help­seo.html 
Schema.org. 2015.  “About Schema.org”  Last Updated Unknown.  ​https://schema.org/docs/faq.html 
 

More Related Content

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Featured

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Best Practice Poster: Gateway to Oklahoma History Case Study: Structured Data and Metadata Evaluation for Improved Image Resource Findability on the Web

  • 1. Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015  Best Practice Poster:   Gateway to Oklahoma History Case Study:   Structured Data and Metadata Evaluation for Improved Image  Resource Findability on the Web      Emily Ann Kolvitz  University of Oklahoma,  USA  kolvitz1@gmail.com        Keywords: structured data; metadata; Semantic Web; online search; information retrieval; image                      findability  1. Introduction  Image Resource Findability on the World Wide Web is still very much a land­grab. For the                                Semantic Web to become a reality online businesses and individuals have to get their hands dirty                                and also come face­to­face with the realization that search engine giants are increasingly                          becoming the go­to tool for information resource retrieval. “Increasingly, students use Web                        search engines such as Google to locate information resources rather than seek out library online                              catalogs or databases of scholarly journal articles” (Lippincott 2013). This puts the search engine                            giant in a unique position to dictate how the future of search will work on the Web ­ and                                      therefore, your organization’s future presence (or lack thereof) on the Web. Search Engine                          Optimization (SEO) techniques change frequently and remain much a mystery to many                        companies. The one variable in the equation of Web findability that remains a staple is good                                quality metadata under the hood of the Website. In this case study, a methodology is applied to                                  the Gateway to Oklahoma History’s Website. This study can be generalized to organizations                          looking to benchmark their own findability maturity on the Web from an image­centric                          viewpoint.   2. Purpose  Image search and retrieval is a more difficult area than text search and retrieval because                              accessibility to the image content is largely dependent on the context presented in and around the                                image resource. The future of Semantic Web technologies relies very much on the idea that                              organizations are fluent in structured data and have devoted resources to exposing valuable data                            to the web. The W3C (World Wide Web Consortium) was founded over two decades ago and the                                  widespread adoption of Schema.org and structured data on the Web did not gain traction until the                                big four search engines (Google, Bing, Yahoo, and Yandex ) agreed that a standard was needed to                                  pave the way forward. “On­page markup helps search engines understand the information on                          web pages and provide richer search results. A shared markup vocabulary makes easier for                            webmasters to decide on a markup schema and get the maximum benefit for their efforts.”                              (Schema.org 2014) Many organizations still lag behind on their implementation of any type of                            structured data. Structured data is only one piece of the findability algorithm. Metadata near                            content, embedded within content, or listed in the alt text of an html document all tell machines                                  something about the content inside of the record as well. There are no guardians of the Web,                                  ensuring structured data is uniformly applied to all records with equal attention and care and there                                is no standard, mandated requirement for records on the Web to provide context for image                               
  • 2. Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015  resource findability. Most search engines do not crawl embedded XMP data or the invisible Web,                              leaving text near images, file names or text in the alt­text in html markup as the only context for                                      image resources. The search algorithms for image retrieval are subject to change frequently                          (Kritzinger 2013) and additionally, social media sites and organizations strip embedded data from                          images (Embedded Metadata Manifesto 2014). Embedded metadata provides context and                    provenance for image resources. Even with the dramatic adoption of structured data markup                          utilizing schema.org vocabularies, there still remains metadata opportunities on the Web. Reicks                        recommends embedded metadata as a strategy for online findability by showcasing examples of                          applications that parse embedded data into structured data around images on the Web such as                              PhotoShelter and LicenseStream (2010).   2. Research Methods   The following research question informed this project: What are the types and quality of  structured data, XMP, and metadata records available for image resources appearing on the  website? Utilizing the Structured Data Linter Tool and Phil Harvey’s ExifTool, information was  gathered to quantify these research questions.  Image records on the Gateway to Oklahoma  History’s website were investigated for the types, quality and quantity of embedded metadata and  structured data.   3.  Results  The Gateway to Oklahoma History’s Website has a wealth of structured data and metadata                            pertaining to its image resources. Search queries utilizing structured data markup tags and/or                          embedded metadata yielded relevant and accurate results during a normal web search, but did not                              yield relevant and/or accurate image resources during an image search. Descriptive filenames                        were not used for image resources, which is an important part of image retrieval through web                                search engines. Adding Schema.org tags to the on­page markup, to accompany the structured                          data already present is another area for improvement. An interesting finding from this research                            was that embedded metadata was only found on the largest, original version of the image                              resource, and never on smaller derivative images. Structured data included in the on­page                          mark­up included Open Graph Protocol and Dublin Core. IPTC was the primarily type of                            embedded metadata present for the image resources.    4. Conclusion  The results and methodology for this research can help GLAM institutions (Galleries,                          Libraries, Archives & Museums) by bringing awareness to the state of structured data and image                              resource findability for cultural heritage institutions on the Web. GLAMs must be active in the                              SEO space, support machine­readable language in the markup of their sites, and utilize                          Schema.org vocabularies and descriptive filenames for relevancy in search engine results. The                        Digital Library Federation, which is a program of the Council on Library and Information                            Resources, concludes that “​Getting found means repository objects must be included in the                          indexes of major search engines because most students and faculty now begin their research with                              Internet search engines. Digital repositories created by libraries will be largely invisible to users if                              their contents are not indexed in these search engines”  (Digital Library Foundation 2014).      References  Digital Library Federation.  Last accessed October 20, 2014.   “SEO for Digital Libraries.”         ​http://www.diglib.org/community/groups/seo­for­digital­libraries/   
  • 3. Proc. Int’l Conf. on Dublin Core and Metadata Applications 2015  International Business, Times. 0006. "Bing,Google and Yahoo merge to make search easier with schema.org."                            International Business Times, April.  IPTC International Press Telecommunications Council, 2014. “Embedded Metadata Manifesto” Last accessed                      November 20, 2014. ​http://www.embeddedmetadata.org/social­media­test­results.php​(Embedded Metadata          Manifesto 2014).   Kritzinger, W. T. "Search Engine Optimization and Pay­per­Click Marketing Strategies." Journal of Organizational                          Computing and Electronic Commerce, no. 3 (2013): 273­86.  Lippincott, Joan K. “Net Generation Students and Libraries,” EDUCAUSE (2005), accessed November 19, 2014,                            http://www.educause.edu/research­and­publications/books/educating­net­generation/net­generation­students­and­lib raries  Reicks, David. 2010. “Why Embedded Metadata Won’t Help Your SEO,” Last Updated December 30, 2013. Last                                Accessed November 23, 2014. ​http://www.controlledvocabulary.com/blog/embedded­metadata­wont­help­seo.html  Schema.org. 2015.  “About Schema.org”  Last Updated Unknown.  ​https://schema.org/docs/faq.html