This document discusses metadata, including what it is, why it is important, common components of metadata records, examples of metadata standards, and tips for writing good metadata. Metadata captures key details about data, such as who created it, when, how, and why, to facilitate discovering, understanding, and reusing the data. Standards provide consistency for computer interpretation and searching. Good metadata includes specific, accurate, and complete details to fully document data.
2. What is Metadata
• Explanation of metadata
• Illustrate the value and utility of metadata to data users,
data providers, and organizations
• Examine information included in a metadata record
• Examples of metadata standards and how to choose
• Preparing to write metadata
• Tips for writing a quality metadata record
3. What is Metadata
After completing this lesson, the participant will be able to:
• Identify and list the types of information typically included
in metadata records for environmental datasets
• Identify 3 reasons metadata is of value to data users, data
developers, and organizations
• List 3 uses for metadata, beyond discovery of data
• Identify and describe factors that may determine which
metadata standards are most appropriate for a given
dataset
• List steps to prepare to write metadata
• Explain how to write good metadata
5. What is Metadata
Metadata is: Data ‘reporting’
• WHO created the data?
• WHAT is the content of the data?
• WHEN were the data created?
• WHERE is it geographically?
• HOW were the data developed?
• WHY were the data developed?
6. What is Metadata
• Metadata is all around…
Author(s) Boullosa, Carmen.
Title(s) They're cows, we're pigs /
by Carmen Boullosa
Place New York : Grove Press, 1997.
Physical Descr viii, 180 p ; 22 cm.
Subject(s) Pirates Caribbean Area Fiction.
Format Fiction
7. What is Metadata
· Data Discovery
• Metadata: captures information
• USGS Science Data Catalog: enabling discovery
• DataONE: enables exchange
8. What is Metadata
· Scientific Understanding and Reuse
[Figure: information content of data and metadata plotted against time (modified from Michener et al. 1997). Labels along the curve: time of data development, specific details, general details, accident, retirement or career change, death.]
9. What is Metadata
· Defending policy decisions based on data
• Regulatory decisions based on undocumented data are not
defensible
• Metadata accuracy and details are important as supporting
evidence for the science and policy
Controversies arise when metadata are incomplete and/or
absent
11. What is Metadata
Metadata allows data developers to:
• Avoid data duplication
• Share reliable information
• Publicize efforts – promote the work of a scientist and
his/her contributions to a field of study
• Metadata reuse saves time and resources in the long-run
12. What is Metadata
Metadata gives a user the ability to:
• Search, retrieve, and evaluate dataset
information from both inside and outside
an organization
• Find data: Determine what data exists for
a geographic location and/or topic
• Determine applicability: Decide if a
dataset meets a particular need
• Discover how to acquire the dataset
identified; process and use the dataset
• Understand the dataset, including
definitions of column names, or expected
numerical ranges found in the data
13. What is Metadata
• Metadata helps ensure an organization’s investment
in data:
o Documentation of data processing steps, quality
control, definitions, data uses, and restrictions
o Ability to use data after initial intended purpose
o Allows organization to track data use and
facilitates publication
• Transcends people and time:
o Offers data permanence
o Creates institutional memory
• Advertises an organization’s research:
o Creates possible new partnerships and
collaborations through data sharing
15. What is Metadata
The descriptive content of the metadata file can be used to
identify, assess, and access available data resources.
IDENTIFY
• keywords
• geographic location
• time period
• attributes
ASSESS
• use constraints
• access constraints
• data quality
• availability/pricing
ACCESS
• online access
• order process
• contacts
16. What is Metadata
Examples of metadata search catalogs:
o DataONE
• Data discovery, knowledge, community…for a sustainable future
• https://search.dataone.org
o Data.gov
• Federal e-gov geospatial data portal
• http://www.geo.data.gov
o Metacat
• Repository for data and metadata
• http://knb.ecoinformatics.org/index.jsp
o US Geological Survey
• USGS Science Data Catalog
• http://data.usgs.gov/datacatalog
o ArcGIS Online
• ESRI sponsored national geospatial data portal
• http://www.geographynetwork.com
18. What is Metadata
• Metadata records can be used to track data provenance
accurately
• Data Maintenance:
o Are the data current?
o Are the data in a reliable format?
o Where are the data stored?
• Data Update:
o Contact information
o Distribution policies, availability, pricing, URLs
o New derivations of the dataset
19. What is Metadata
• Metadata allows you to repeat a scientific process if:
o methodologies are defined
o variables are defined
o analytical parameters are defined
• Metadata allows you to defend your
scientific process:
o demonstrate process
o an increasingly data-savvy public
requires metadata for consumer information
[Figure: diagram labeled INPUT and RESULTS]
20. What is Metadata
Metadata is a declaration of:
• Purpose – the originator’s intended
application of the data
• Use Constraints - inappropriate applications
of the data
• Completeness - features or geographies
excluded from the data
• Distribution Liability - explicit liability of the
data producer and assumed liability of the
consumer
[Figure labels: What to do… / What not to do…]
21. What is Metadata
Even if the value of data documentation is recognized,
researchers are often concerned about the effort required to
create metadata that effectively describe their data.
22. What is Metadata
Concerns and solutions:
• Concern: workload required to capture accurate, robust metadata
o Solution: incorporate metadata creation into the data development process and distribute the effort
• Concern: time and resources to create, manage, and maintain metadata
o Solution: include in grant budget and schedule
• Concern: readability / usability of metadata
o Solution: use a standardized metadata format
• Concern: discipline-specific information and ontologies
o Solution: use a standard ‘profile’ that supports discipline-specific information
23. What is Metadata
• A Standard provides a structure to describe data with:
o Common terms to allow consistency between records
o Common definitions for easier interpretation
o Common language for ease of communication
o Common structure to quickly locate information
• In search and retrieval, standards provide:
o Documentation structure in a reliable and predictable
format for computer interpretation
o A uniform summary description of the dataset
24. What is Metadata
Components of metadata:
• A metadata standard is
made up of defined
elements, including the type
of information the user
should enter (e.g. text,
numbers, date).
• Examples of elements
include Title, Abstract,
Keyword, Online Link
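The idea of defined elements with an expected type of information can be sketched in a few lines of Python. This is only an illustration: the element names, the three kinds (text, number, date), and the YYYYMMDD date form are assumptions chosen for the example and are not taken from any particular standard.

```python
from datetime import datetime

# Hypothetical element definitions: element name -> expected kind of value.
# These names and kinds are illustrative and not taken from a real standard.
ELEMENTS = {
    "title": "text",
    "abstract": "text",
    "keyword": "text",
    "online_link": "text",
    "publication_date": "date",
    "record_count": "number",
}


def matches_kind(value, kind):
    """Return True if value is acceptable for the given element kind."""
    if kind == "text":
        return isinstance(value, str) and value.strip() != ""
    if kind == "number":
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "date":
        if not isinstance(value, str):
            return False
        try:
            datetime.strptime(value, "%Y%m%d")  # assumed date form: YYYYMMDD
            return True
        except ValueError:
            return False
    return False


def check_record(record):
    """Return a sorted list of problems found in a metadata record (a dict)."""
    problems = []
    for name, value in record.items():
        if name not in ELEMENTS:
            problems.append(f"{name}: not a defined element")
        elif not matches_kind(value, ELEMENTS[name]):
            problems.append(f"{name}: expected {ELEMENTS[name]}")
    return sorted(problems)
```

A metadata editor built on a real standard performs this kind of check against the standard's own schema; the dictionary here only stands in for that schema.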
27. What is Metadata
• Dublin Core Element Set
o Emphasis on web resources, publications
o http://dublincore.org/documents/dces/
• FGDC Content Standard for Digital Geospatial Metadata
(CSDGM)
o Emphasis on geospatial data
o The Biological Data Profile (BDP) is a profile of the CSDGM
with an emphasis on biological (and geospatial) data
o https://www.fgdc.gov/metadata/csdgm-standard
• ISO 19115/19139 Geographic information – metadata
o Emphasis on geospatial data and services
o https://www.fgdc.gov/metadata/iso-standards
28. What is Metadata
• Ecological Metadata Language (EML)
o Focus on ecological data
o http://knb.ecoinformatics.org/eml_metadata_guide.html
• Darwin Core
o Emphasis on museum specimens
o http://rs.tdwg.org/dwc/index.htm
• Geography Markup Language (GML)
o Emphasis on geographic features (roads, highways,
bridges)
o http://www.opengeospatial.org/standards/gml
29. What is Metadata
Ecological Metadata Language (EML) | FGDC Content Standard for Digital Geospatial Metadata (CSDGM)
Title | Title
Abstract | Abstract
Entity Description | Entity Type Definition
Intellectual Rights | Use Constraints
Terminology for the same concepts may vary across standards
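Because the same concept can carry a different name in each standard, moving a record between standards needs a term-by-term mapping (often called a crosswalk). The sketch below uses only the four pairs listed above. Treating a record as a flat dictionary is a simplifying assumption, since real EML and CSDGM records are nested XML documents, and the function name is hypothetical.

```python
# Crosswalk built only from the four pairs listed above (EML term -> CSDGM term).
EML_TO_CSDGM = {
    "Title": "Title",
    "Abstract": "Abstract",
    "Entity Description": "Entity Type Definition",
    "Intellectual Rights": "Use Constraints",
}


def translate_terms(record, crosswalk=EML_TO_CSDGM):
    """Rename the keys of a record using the crosswalk.

    Returns (translated, unmapped): terms without a known equivalent are
    returned separately rather than guessed.
    """
    translated, unmapped = {}, {}
    for term, value in record.items():
        if term in crosswalk:
            translated[crosswalk[term]] = value
        else:
            unmapped[term] = value
    return translated, unmapped
```

Returning the unmapped terms separately keeps information from being dropped silently when no equivalent is known.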
30. What is Metadata
• Many standards collect similar information
• Factors to consider:
o Your data type:
• Are you working mainly with GIS data? Raster/vector or point data?
Do you have biological or shoreline information in your dataset?
- Consider the FGDC Content Standard for Digital Geospatial
Metadata with one of its profiles: the Biological Data Profile or the
Shoreline Data Profile.
• Are you working with data retrieved from instruments such as
monitoring stations or satellites? Are you using geospatial data
services such as web-mapping applications or data modeling?
- If so, then consider using the ISO 19115-2 standard
• Are you mainly working with ecological data?
- Consider Ecological Metadata Language (EML)
31. What is Metadata
• More Factors to consider:
o Your organization’s policies: do they state which standard to use?
o What resources are available to create metadata?
Examples of Tools:
• FGDC CSDGM:
- https://www.fgdc.gov/metadata/geospatial-metadata-tools#availabletools
• EML:
- Morpho (http://knb.ecoinformatics.org/morphoportal.jsp)
• ISO: (http://www.fgdc.gov/metadata/iso-metadata-editor-review)
- XML Spy or Oxygen
- CatMD
o Other factors: Availability of human support; instructional materials; use of
controlled vocabularies; output formats
32. What is Metadata
Metadata are developed continuously throughout the
entire data lifecycle
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
33. What is Metadata
Consistency with commonly used fields
Examples for a FGDC CSDGM record:
Publisher:
✔ <publish>U.S. Geological Survey</publish>
✗ <publish>USGS</publish>
Date:
✔ <pubdate>YYYYMMDD</pubdate>
✔ <pubdate>YYYY</pubdate>
✗ <pubdate>MM/DD/YYYY</pubdate>
✗ <pubdate>May 27, 2003</pubdate>
Keywords:
✔ <placekt>Geographic Names Information System</placekt>
✔ <placekey>Roosevelt National Forest</placekey>
✗ <themekey>Roosevelt Forest</themekey>
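Consistency of this kind is easy to check by machine. As a minimal sketch, the function below accepts only the two publication date forms shown as good examples on this slide, YYYY and YYYYMMDD, and rejects free-form dates. The function name is hypothetical, and the check is deliberately limited to those two forms rather than being a full validator for the standard.

```python
import re
from datetime import datetime


def is_consistent_pubdate(value):
    """Accept only the two date forms shown as good examples: YYYY or YYYYMMDD."""
    if re.fullmatch(r"\d{4}", value):
        return True
    if re.fullmatch(r"\d{8}", value):
        try:
            # strptime rejects impossible dates such as month 13
            datetime.strptime(value, "%Y%m%d")
            return True
        except ValueError:
            return False
    return False
```

For example, `is_consistent_pubdate("20030527")` is true, while `is_consistent_pubdate("May 27, 2003")` is false.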
34. What is Metadata
Use Authority Files and Standard Vocabulary
• Global Change Master Directory
• Geographic Names Information System
• Getty Thesaurus of Geographic Names
• ISO 19115 Topic Category Thesaurus
35. What is Metadata
Acronyms
Spell out acronyms on first
use. Many acronyms have
multiple meanings (e.g., DOI).
Use widely known acronyms
only when they correspond to
specific metadata fields such as
file formats (e.g., TIFF, JPEG,
PDF).
36. What is Metadata
Provide all of the critical information for discovery, understanding,
and reuse:
• Identification Information
• Entities & Attributes
• Data Quality
• Access, Use & Liability Constraints
• Distribution
• Spatial References
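A reviewer can use the list above as a completeness checklist. The sketch below assumes, purely for illustration, that a record is held as a dictionary keyed by section name; the function name is hypothetical and real records are structured documents rather than flat dictionaries.

```python
# Section names taken from the list above.
REQUIRED_SECTIONS = [
    "Identification Information",
    "Entities & Attributes",
    "Data Quality",
    "Access, Use & Liability Constraints",
    "Distribution",
    "Spatial References",
]


def missing_sections(record):
    """Return the required sections that are absent or empty in a record (a dict)."""
    return [name for name in REQUIRED_SECTIONS if not record.get(name)]
```

An empty result means every section has at least some content; it says nothing about whether that content is accurate, which still needs a human review.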
41. What is Metadata
Provide all of the critical information for: Access, Use, & Liability
Constraints
Access Constraints: restrictions and legal prerequisites for accessing the data.
Use Constraints: restrictions and legal prerequisites for using the data after access is granted.
Example: Use_Constraints:
Users are free to use, copy, distribute, transmit, and adapt the work for commercial and
non-commercial purposes, without restriction, as long as clear attribution of the source is provided.
Distribution Liability: statement of the liability assumed by the distributor with respect to
content and accuracy of the data.
Example: Distribution_Liability:
Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality
standards relative to the purpose for which the data were collected. Although these data and associated
metadata have been reviewed for accuracy and completeness and approved for release by the U.S.
Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the
data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any
such warranty.
44. What is Metadata
1. Organize your information
• Did you write a project abstract to obtain funding for your proposal?
Reuse it in your metadata!
• Did you use a lab notebook or other notes during the data development
process that define measurements and other parameters?
• Do you have the contact information for colleagues you worked with?
• What about citations for other data sources you used in your project?
2. Write your metadata using a metadata tool
3. Review for accuracy and completeness
4. Have someone else read your record
5. Revise the record, based on comments from your reviewer
6. Review once more before you publish
45. What is Metadata
Titles, Titles, Titles…
• Titles are critical in helping readers find your data
o While individuals are searching for the most appropriate
datasets, they are most likely going to use the title as the
first criterion to determine if a dataset meets their needs.
o Treat the title as the opportunity to sell your dataset.
• A complete title includes: What, Where, When, Who, and
Scale
• An informative title includes: topic, timeliness of the data,
specific information about place and geography
46. What is Metadata
A Clear Choice: Which title is better?
• Rivers
OR
• Greater Yellowstone Rivers from 1:126,700 U.S. Forest
Service Visitor Maps (1961-1983)
Greater Yellowstone (where) Rivers (what) from 1:126,700
(scale) U.S. Forest Service (who) Visitor Maps (1961-1983)
(when)
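The pattern in the better title can be written down as a small template. This is only a sketch: the function name and its parameters are hypothetical, and `source` (here "Visitor Maps") is an assumed name for the part of the title that the slide does not label separately.

```python
def build_title(where, what, scale, who, source, when):
    """Assemble a dataset title in the order used by the example above."""
    parts = {"where": where, "what": what, "scale": scale,
             "who": who, "source": source, "when": when}
    # Refuse to build a title if any component is blank.
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError("missing title components: " + ", ".join(missing))
    return f"{where} {what} from {scale} {who} {source} ({when})"
```

Calling it with the components of the example ("Greater Yellowstone", "Rivers", "1:126,700", "U.S. Forest Service", "Visitor Maps", "1961-1983") reproduces the better title, while a title such as "Rivers" alone cannot be built because the other components are missing.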
47. What is Metadata
• Be specific and quantify when you can! The goal of a
metadata record is to give the user enough information to
know if they can use the data without contacting the
dataset owner.
Vague: We checked our work and it
looks complete.
Specific: We checked our work
using a random sample of 5 monitoring
sites reviewed by 2 different people. We
determined our work to be 95%
complete based on these visual
inspections.
48. What is Metadata
• Use descriptive and clear writing
• Fully document geographic locations
• Select keywords wisely
• Use thesauri for keywords whenever possible
• Be detailed: there’s no such thing as too much metadata!
49. What is Metadata
• Remember: a computer will read your metadata
• Do not use symbols that could be misinterpreted by
software: Examples: ! @ # % { } | / < > ~
• Don’t use tabs, indents, or line feeds/carriage returns
• When copying and pasting from other sources, use a text
editor (e.g., Notepad) to eliminate hidden characters
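The points above can be partly automated before text is pasted into a metadata editor. The sketch below is one possible approach using only the Python standard library; the function names are hypothetical. It reports the listed symbols rather than deleting them, because whether a symbol is safe depends on the field and the software reading it.

```python
import re

# Symbols listed above as risky for software that reads metadata.
RISKY_SYMBOLS = set("!@#%{}|/<>~")


def find_risky_symbols(text):
    """Return the risky symbols present in text, in order of first appearance."""
    found = []
    for ch in text:
        if ch in RISKY_SYMBOLS and ch not in found:
            found.append(ch)
    return found


def clean_whitespace(text):
    """Drop non-printable characters and collapse tabs and line breaks to single spaces."""
    # Turn every whitespace character into a plain space first, so removing
    # control characters cannot glue two words together.
    spaced = "".join(" " if ch.isspace() else ch for ch in text)
    printable = "".join(ch for ch in spaced if ch.isprintable())
    return re.sub(r" +", " ", printable).strip()
```

Running `clean_whitespace` on pasted text removes hidden characters such as zero-width spaces, and `find_risky_symbols` gives the author a list of symbols to review by hand.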
50. What is Metadata
• Metadata is documentation of data
• A metadata record captures critical information about the
content of a dataset
• Metadata allows data to be discovered, accessed, and
re-used
• A metadata standard provides structure and consistency to
data documentation
• Standards and tools vary – select according to defined
criteria such as data type, organizational guidance, and
available resources
• Metadata is of critical importance to data developers, data
users, and organizations
• Metadata completes a dataset.
51. What is Metadata
· Federal Policies:
o Executive Order 12906
o M-13-13 Open Data Policy
· Data Catalogs:
o DataONE
o USGS Science Data Catalog
o Data.gov
· More about CSDGM & ISO 19115:
o FGDC Geospatial Metadata Website
· Metadata Tools:
o Metadata Wizard ✓
o TKME ✓*
o CatMDEdit
o GRIIDC Metadata Editor ✓
o ArcGIS 10.2 ✓
✓ Provides validation
* Supports Biological Data Profile, Shoreline Profile, and
Remote Sensing Extension
· Standard Vocabularies:
o USGS Thesaurus
o Global Change Master Directory
o Geographic Names Information System
o Getty Thesaurus of Geographic Names
52. What is Metadata
The full slide deck may be downloaded from:
http://www.dataone.org/education-modules
Suggested citation:
DataONE Education Module: Metadata. DataONE. Retrieved
Nov 12, 2016, from
http://www.dataone.org/sites/all/documents/L07_Metadata.pptx
Copyright license information:
No rights reserved; you may enhance and reuse for
your own purposes. We do ask that you provide
appropriate citation and attribution to DataONE.
Editor's Notes
In this segment of the course we will cover: What is metadata? What are examples of metadata in our daily lives? And what information needs to be included in a metadata record?
Data collected in the field is recorded in a wide variety of ways, including field notebooks, streaming data from satellites, data created from models, etc.
After returning from the field, scientists will transfer field notes into spreadsheets and other types of databases in preparation for their data analysis. Displayed here is a partial copy of a data set taken from the website “Frog Watch”. Notice there is no indication of Celsius or Fahrenheit in the “temperature” column. This is a simple example of how it is difficult to understand a dataset without all of the information.
Once scientists have collected and analyzed data, they publish their conclusions in appropriate science journals.
A dataset is a collection of data. Often datasets are considered spatial or tabular. However, many tabular datasets are inherently spatial – they represent spatial information. There are a variety of elements that can be found in a dataset including values, measures, points, conditions, qualities, frequencies or attributes.
If you were to share your data, what type of information would be most useful to understand the data set? Alternatively, when receiving data from an external source, what information is needed to understand the data set? Metadata contains information about the dataset that allows it to be understood when shared amongst scientists.
When sharing data, some considerations include: why the data was created; what limitations, if any, the data have; what the data means; and who should be cited if someone publishes something that utilized the data. When receiving data from an alternative source, consider: What are the data gaps? What processes were used for creating the current data? Are there any fees associated with the data? In what scale were the data created? What do the values in the tables mean? What software do I need in order to read the data? What projection is the data in? Can I give this data to someone else? Metadata contain information about a data set, in a standardized format, such that it can be understood and re-used.
Metadata is data about data. It describes the content, quality, condition, and other characteristics of a dataset. Metadata records answer questions such as: Why was the data set created? What processes were used to create the data set? What projection is the data in? When was the data last updated? Who created the data? What scale was used? What fields are in the table? What do the values in those fields mean? Who do I contact about getting more information about the data? How do I obtain a copy of the data? Do the data cost anything? Are there any limitations to the data? Metadata is a valuable tool. Metadata records preserve the usefulness of data over time by detailing methods for data collection and data set creation. Metadata greatly minimizes duplication of effort in the collection of expensive digital data and fosters the sharing of digital data resources.
Metadata is all around us, from MP3 players, to nutrition labels, to library card catalogues. For example, a card catalogue tells us more than just the title of the book; it also tells the user: Who is the author? Who published the book? What subject area does the book fall in? And finally, where is it located in the library? Another example of metadata that we see in our daily lives is the nutrition and ingredient information on food labels. Nutrition labels answer questions such as: What ingredients were used? Who made the food? How many calories per serving? How many servings in the can? What percentage of daily vitamins are in each serving?
An established standard provides common terms, definitions, and structure that allow for consistent communication. The use of standards also supports search and retrieval in automated systems.
This is an example of a metadata record using the Federal Geographic Data Committee (FGDC) standard.
Metadata is useful to Data Users, Data Developers, and Organizations. In this era of data sharing, collaboration, and need for information organization, metadata can serve multiple purposes.
Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describes the data.
Metadata does require time and effort to create. The workload, however, is reduced when metadata creation is incorporated into the data development process and the effort is distributed among data contributors. Metadata creation and management should be treated as a standard data development procedure and resources for staff and time should be included in project and proposal work plans and budgets. The use of a standardized metadata format and the development of discipline specific ‘profiles’ of metadata can enable data users to quickly find needed information and address data developer concerns about metadata use and comprehension.
What value does metadata have to Data Developers? Metadata records will help avoid data duplication because researchers can determine if data already exists. Scientists are able to share reliable information about a dataset by creating metadata and passing it along with the dataset. Scientists wishing to reuse a dataset can be confident of its origins, data quality, and other valuable information about the data. Metadata also allow data creators to publicize the valuable data they have collected by making the metadata available on clearinghouses and other publicly available venues. Metadata can be used in citation practices, thus increasing the visibility of the data.
Metadata allows the user to search for and access data from a variety of sources. A search for metadata can be restricted to a geographic boundary, thus showing the user what data has been collected in a particular region. Metadata records help users determine whether the data will be applicable for use in a particular study. Finally, metadata records are of value to data users because they indicate how a dataset can be acquired and whether there are any restrictions on how the data can be used.
An organization that keeps current metadata can benefit in many ways. Metadata records help ensure the organization’s investment in the data by retaining information about how the data was collected, processed, and quality controlled. This creates a permanent record of the dataset, which is critical institutional memory. When researchers leave or retire, metadata allows the dataset to “live on” for the organization. The data may be reused in another research project in the future, and future researchers in the organization will need to know how the dataset was created. Finally, metadata advertises an organization’s research, creating new potential partnerships and collaboration through data sharing.
This graph illustrates the phenomenon of “information entropy” associated with research. At the time of the research project, a scientist’s memory is fresh. Details about the development of the dataset are easily recalled, and it is a good time to document information about the process. Over time, memory of the details begins to fade. A variety of circumstances can intervene, and eventually detailed knowledge about the dataset fades. Without a metadata record, this data might be unusable. A dataset is not considered complete without a metadata record to accompany it.
Sound data management is best achieved by making metadata creation a part of the workflow. Not only can it keep the individual scientist organized, but the data has a much better chance of being re-used by future scientists.
Metadata is very beneficial because it can be used to support data distribution, data management, and project management. To be best utilized, metadata should be considered a component of the data, created during the development of the data, and populated with rich content.
Metadata supports data distribution through discovery, publication, and data portals.
Metadata serves data discovery at multiple levels:
• Identify: initial identification by query of keywords, location, time, and attributes
• Assess: a quick assessment can be made by the scientist as to how useful the data is for a project by reading the access and use constraints; data quality measures of positional and attribute accuracy and sources used; and statements as to data availability, format, and pricing
• Access: a user can find out how to access the data by reading access instructions, any standard order process instructions, and contact information for the dataset
A collection of metadata records can be published to the web in a variety of formats, including: a website catalog; a web accessible folder (WAF) that is later harvested; a Z39.50 clearinghouse; metadata services such as ESRI ArcIMS Metadata Service; and a geospatial data portal.
Geospatial data portals are plentiful, and contain easily accessible metadata collections from a variety of institutions.
The USGS Core Science Metadata Clearinghouse is an example of a metadata repository, available to all researchers.
Metadata supports data management in a variety of ways because metadata records can be used to assist data maintenance and update, accountability, data liability, and discovery and reuse.
For data management, metadata records can be queried to determine: Do we have data older than 10 years? Do we have data that was collected before some political or geophysical event resulted in significant change? Do we have data that used some older or now invalid data as a source? Do we have data that used older or now invalid methods? Global edits to contacts, policies, URLs, and information about new derivations of the data set can be included in metadata records, thus assisting the data management process.
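Two of these data management questions can be sketched as simple queries over a collection of records. Everything in this example is hypothetical: the record fields (`title`, `pubdate`, `sources`), the sample records, and the function names are assumptions for illustration, and real records would be read from a catalog rather than typed in.

```python
from datetime import date

# Hypothetical, simplified records: real metadata records carry many more fields.
RECORDS = [
    {"title": "Stream temperature 1998", "pubdate": date(1998, 6, 1),
     "sources": ["Gauge network v1"]},
    {"title": "Stream temperature 2015", "pubdate": date(2015, 6, 1),
     "sources": ["Gauge network v2"]},
]


def older_than(records, years, today):
    """Return titles of records published more than `years` years before `today`."""
    # Feb 29 has no counterpart in most years, so fall back to Feb 28.
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        cutoff = today.replace(year=today.year - years, day=28)
    return [r["title"] for r in records if r["pubdate"] < cutoff]


def uses_source(records, source_name):
    """Return titles of records that list the given source."""
    return [r["title"] for r in records if source_name in r["sources"]]
```

Queries like these only work when dates and source names are entered consistently, which is one practical reason to follow a standard.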
It is important to note that metadata is not only useful for those trying to find and reuse good data, it is also very useful to the researcher in managing his/her own data.
If you create robust metadata, you can use information contained in the metadata to locate and re-use data. The metadata contains information about themes, geographic location, time ranges, analytical methods, sources, and data quality.
Metadata allows you to repeat a scientific process if methodologies, variables, and analytical parameters are well defined. It allows you to defend and demonstrate scientific process. A defensible process enables you to demonstrate the methodologies that led to decisions using the data. Increasingly, a data-savvy public demands metadata with datasets for consumer information purposes.
Metadata creation requires that you are accountable for your data and can document everything you know and do not know about your dataset.
Metadata is invaluable for data liability. For example, a record will indicate the purpose – why the data was collected, any use constraints that are associated with the data, how complete the dataset is, and who is liable once the data is distributed and reused.
Metadata records support project management activities in a variety of ways including project planning, monitoring, coordination, and deliverables.
Metadata can serve as a project design document that establishes the project’s intent, extents, suggested source data resources, and database design. By doing so, project expectations are clearly outlined, metadata is immediately integrated into the process and the record serves as a medium to record project progress.
If metadata is created during the course of the research, the record can be used to monitor: where the data set is in the data development process; any problems associated with data quality that should be addressed prior to further development; and any problems with proposed methods and/or sources that will require a change in approach to data analysis.
Metadata can be a means to improve communications among project participants. A project metadata template can be developed to establish descriptors, parameters (time, geography, species, etc.), vocabularies, contact info, entity/attributes, and distribution information. If actively used by the team, the metadata can be a source for identifying and tracking the use of new source data, analytical methods, and adding in information pertinent to the project used by other project participants.
Metadata should be specified as a component of any data deliverable. If a part of a project is contracted out, include metadata as a deliverable in the contract. Be sure to specify the standard required and the level of completion required in the record itself. It is not enough to say ‘compliant metadata’ as you will likely get minimal information in the record. Instead, provide clear guidance and, preferably, a sample record that provides core info (Who should be named as Originator, what liability and use constraints statements to use) and illustrates the level of detail expected.
There are many standards available to document data. Each has a different focus, yet each asks for similar information about the data set.
More factors to consider:
o Your organization’s policies: do they state which standard to use?
o What resources are available to create metadata?
Examples of Tools:
• FGDC CSDGM:
- Mermaid (NOAA) http://www.ncddc.noaa.gov/metadata-standards/mermaid/
- Metavist (Forest Service) http://ncrs.fs.fed.us/pubs/viewpub.asp?key=2737
- TKME (USGS) http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html
• EML:
- Morpho (http://knb.ecoinformatics.org/morphoportal.jsp)
• ISO: (http://www.fgdc.gov/metadata/iso-metadata-editor-review)
- XML Spy or Oxygen
- CatMD
o Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats
Summary:
• Metadata is documentation of data
• A metadata record captures critical information about the content of a dataset
• Metadata allows data to be discovered, accessed, and re-used
• A metadata standard provides structure and consistency to data documentation
• Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources
• Metadata is of critical importance to data developers, data users, and organizations
• Metadata can be effectively used for data distribution, data management, and project management
• Metadata completes a dataset.