1. Where did you hear that? Information and the Sources They Come From
James P. McCusker¹, Timothy Lebo¹, Li Ding¹, Cynthia Chang¹, Paulo Pinheiro da Silva², and Deborah L. McGuinness¹
¹Tetherless World Constellation; ²CyberShARE Center, University of Texas at El Paso
Linked Science 2011, Bonn, Germany
10. Questions? Thanks! Also, come to the SemantAqua demo on Tues. and the talk on Wed. afternoon, 5:30.
The Tetherless World Constellation is partially funded by DARPA, U.S. Department of Energy, Fujitsu, LGS, Lockheed Martin, Microsoft Research, NASA, National Ecological Observatory Network (NEON), the National Science Foundation, Qualcomm, and the Woods Hole Oceanographic Institution (WHOI). This research was partially funded by the National Science Foundation under CREST Grant No. HRD-0734825.
Challenges: It may not be easy to characterize a piece of information within larger information containers. Information sources are often sources of other pieces of information. Asserting a piece of information is a point-in-time event that occurs during the lifetime of the information source. Not every assertion event is a valid assertion: valid assertions occur during the lifespan of their source(s) and in places where the sources are located.
Before publishing a result, scientists need to check their facts. We stand on the shoulders of giants, but as we push forward in science, we need to make sure that we aren't standing on a giant house of cards. Knowing how, when, and where your data comes from is critical for good science, and it is even more critical for linked science, where it isn't immediately clear where a database record or knowledge assertion came from. Sources of information are critical for evaluating information quality. It is difficult, if not impossible, to assess the trustworthiness of information, or to encode it as knowledge, without a link between the information and its sources. For example, one may want to know whether the information came from a source such as the New York Times, and further, it may be useful to know the date, edition, page, and exact text fragment where the information was asserted. There are many challenges in assigning a source to a piece of information. First, it may not be easy to characterize a piece of information within larger information containers (databases, printed documents, web documents, documents that can only be retrieved with parameters supplied by a system, etc.). Second, the source of a piece of information is often a source of other pieces of information, so it should be referenced by an identifier and characterized elsewhere. Third, asserting a piece of information is a point-in-time event that occurs during the lifetime of the information source.
Thus, not every assertion event is a valid assertion: it must occur during the lifespan of its source(s) and in places where the sources are located. These are all critical conditions that provenance languages need to capture properly if one wants to make proper use of source information in combination with linked data.
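The validity conditions above can be checked mechanically once sources and assertion events carry a time and a place. The following is a minimal sketch, not the authors' provenance model: the `Source` and `AssertionEvent` classes, the `is_valid_assertion` function, and the sensor data are all hypothetical, and "where the sources are located" is simplified to an exact match on a place label.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Source:
    """An information source with a lifespan and a location (hypothetical model)."""
    name: str
    start: datetime          # when the source came into existence
    end: Optional[datetime]  # None means the source still exists
    location: str            # simplified: a single place label


@dataclass(frozen=True)
class AssertionEvent:
    """A point-in-time event in which a piece of information is asserted."""
    statement: str
    time: datetime
    location: str


def is_valid_assertion(event: AssertionEvent, sources: Sequence[Source]) -> bool:
    """Valid only if the event falls within the lifespan of every source
    and happens at the place where each source is located."""
    for src in sources:
        if event.time < src.start:
            return False  # asserted before the source existed
        if src.end is not None and event.time > src.end:
            return False  # asserted after the source ceased to exist
        if event.location != src.location:
            return False  # asserted somewhere the source was not
    return True


# Hypothetical example: a field sensor deployed at one site during 2010.
sensor = Source("sensor-A", datetime(2010, 1, 1), datetime(2010, 12, 31), "site-1")
reading = AssertionEvent("pH = 7.2", datetime(2010, 6, 1), "site-1")
late_reading = AssertionEvent("pH = 7.2", datetime(2011, 3, 1), "site-1")
elsewhere = AssertionEvent("pH = 7.2", datetime(2010, 6, 1), "site-2")

print(is_valid_assertion(reading, [sensor]))       # True
print(is_valid_assertion(late_reading, [sensor]))  # False
print(is_valid_assertion(elsewhere, [sensor]))     # False
```

A real provenance language would model places and time intervals more richly (containment, uncertainty), but the check keeps the same shape: every source must exist at that time and in that place.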
Example of other abstractions: Cell Lines vs. Cell Line Colonies. TODO: Add FRBR example of PML Primer.
Functional Requirements for Bibliographic Records (FRBR)
A more general way to express information and its sources. Copy events and subset events describe accession (a copy event) and quoting (a subset event), which makes the derivation of a quote from the content retrieved from the source explicit. Quotes can be made at the Expression level (the same offset in the text of an HTML page, a plain-text file, or a Word document) rather than at the Manifestation level. A URL is a Work; multiple URLs can be identified as the same Work where appropriate. Expressions can be considered the same if they have the same content. We have gathered content digest algorithms for RDF graphs, tabular data, XML trees, and images; other content digests, including domain-specific ones, are welcome. Manifestations are the same if their message digests (think MD5 or SHA-1) are the same. Items might be the same if they are actually the same physical copy.
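To illustrate the difference between Manifestation-level and Expression-level identity, the sketch below hashes the raw bytes of a file for its Manifestation and hashes the parsed cell values of a table for its Expression. This is a simple stand-in, not the content digest algorithms mentioned above; the function names and sample data are hypothetical, and SHA-256 is used in place of MD5 or SHA-1.

```python
import csv
import hashlib
import io


def manifestation_digest(data: bytes) -> str:
    """Manifestation identity: a message digest over the exact bytes."""
    return hashlib.sha256(data).hexdigest()


def tabular_expression_digest(text: str, delimiter: str) -> str:
    """Expression identity (illustrative only): a digest over the parsed cell
    values, so delimiter and line-ending choices do not affect the result."""
    h = hashlib.sha256()
    for row in csv.reader(io.StringIO(text, newline=""), delimiter=delimiter):
        # Length-prefix rows and cells so boundaries are unambiguous.
        h.update(len(row).to_bytes(8, "big"))
        for cell in row:
            encoded = cell.encode("utf-8")
            h.update(len(encoded).to_bytes(8, "big"))
            h.update(encoded)
    return h.hexdigest()


# The same table serialized two ways: CSV with CRLF line endings and TSV with LF.
csv_bytes = b"country,year,amount\r\nA,2009,100\r\nB,2009,250\r\n"
tsv_bytes = b"country\tyear\tamount\nA\t2009\t100\nB\t2009\t250\n"

print(manifestation_digest(csv_bytes) == manifestation_digest(tsv_bytes))  # False
print(tabular_expression_digest(csv_bytes.decode("utf-8"), ",")
      == tabular_expression_digest(tsv_bytes.decode("utf-8"), "\t"))      # True
```

The two files are different Manifestations (their bytes differ) but the same Expression (their cell contents match), which is the distinction the FRBR stack relies on.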
The same stack of Work, Expression, Manifestation, and Item. Multiple Works are redirected from one to another. The file on disk (filed://EEE/SHA356-FFF/us_economic_assistance.csv) is derived from the HTTP response (hash:Item/SHA256-DDD). Tools in csv2rdf4lod create a raw conversion of the spreadsheet to RDF (filed://GGG/SHA356-HHH/us_economic_assistance.csv.raw.ttl), which we then sign using a standalone graph hash tool (frbrstack.py); the signature confirms that even though the Manifestations differ (BBB vs. CCC), the Expressions remain the same. Since frbrstack.py, unlike pcurl.py, does not know the original Work, it generates a new one. However, when two Items have different Manifestations but the same Expression, AND one Item is known to be derived from the other, it is reasonable to assume that the Work remains the same. This may not hold for artwork, but it does hold for information resources.
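The Work-inheritance rule above can be sketched as a small propagation step over FRBR stacks. The `FrbrStack` record, the `inherit_work` function, and all identifiers below are hypothetical placeholders (not the output of pcurl.py or frbrstack.py); the rule is applied only when an Item is known to be derived from another Item and shares its Expression.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class FrbrStack:
    """One Item with its Manifestation, Expression, and Work identifiers."""
    item: str
    manifestation: str
    expression: str
    work: str
    derived_from: Optional[str] = None  # Item this one was derived from, if known


def inherit_work(stacks: Dict[str, FrbrStack]) -> Dict[str, FrbrStack]:
    """If an Item is derived from another Item and shares its Expression,
    assume it shares that Item's Work (reasonable for information
    resources, not necessarily for artwork). Returns a new dict."""
    result = dict(stacks)
    changed = True
    while changed:  # repeat so the Work propagates along derivation chains
        changed = False
        for key, stack in list(result.items()):
            parent = result.get(stack.derived_from) if stack.derived_from else None
            if (parent is not None
                    and parent.expression == stack.expression
                    and parent.work != stack.work):
                result[key] = replace(stack, work=parent.work)
                changed = True
    return result


# Hypothetical chain: HTTP response -> file on disk -> raw RDF conversion.
stacks = {
    "item-http": FrbrStack("item-http", "manif-1", "expr-table", "work-url"),
    "item-file": FrbrStack("item-file", "manif-1", "expr-table", "work-file",
                           derived_from="item-http"),
    "item-ttl": FrbrStack("item-ttl", "manif-2", "expr-table", "work-generated",
                          derived_from="item-file"),
    "item-other": FrbrStack("item-other", "manif-3", "expr-other", "work-other",
                            derived_from="item-http"),
}

resolved = inherit_work(stacks)
print(resolved["item-ttl"].work)    # work-url
print(resolved["item-other"].work)  # work-other (different Expression, unchanged)
```

Returning a new dictionary instead of editing the input keeps the originally generated Work identifiers available, so the inferred equivalence could later be recorded as a separate statement rather than overwriting what each tool reported.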