SlideShare ist ein Scribd-Unternehmen logo
1 von 68
Some notes on data
   Tom Moritz
  IAMSLIC 2012

  Anchorage, Alaska
    August, 2012
Libraries in the 21st Century must be
     responsible for all types of
   knowledge resources… And for
     understanding the complex
   development, interactions and
   disposition of such resources…
First a brief digression focusing on the
                 common…
Ethos of Science and Libraries
“Declaration of Scientific Principles”
                          in
          “The Commonwealth of Science”

“7. The pursuit of scientific inquiry demands
  complete intellectual freedom. And
  unrestricted international exchange of
  knowledge…“

      from “The Commonwealth of Science ” Nature No.3753 October 4,
      1941.
“The substantive findings of science are a product of social
        collaboration and are assigned to the community. They
       constitute a common heritage in which the equity of the
               individual producer is severely limited…”

 “The scientist’s claim to “his” intellectual “property” is limited to
       that of recognition and esteem which, if the institution
          functions with a modicum of efficiency, is roughly
  commensurate with the significance of the increments brought
                 to the common fund of knowledge.”



Robert K. Merton, “A Note on Science and Democarcy,” Journal of Law and Political
Sociology 1 (1942): 121.
“Factual data are fundamental to the progress of science
           and to our preeminent system of innovation. Freedom
          of inquiry, the open availability of scientific data, and full
               disclosure of results through publication are the
          cornerstones of basic research, which both domestic law
              and the norms of public science have long upheld.”




J.H. Reichman and P.F Uhlir. “A contractually reconstructed research commons for scientific data in a highly
protectionist intellectual property environment.” in The Public Domain. J.Boyle, ed.Durham, NC: schoolo of Law,
Duke University. (Law and Contemporary Problems, Vol.66 nos 1&2 ) 2003
“Public research is largely an open, communitarian, and cooperative system.
        It is founded on freedom of inquiry,sharing of data and full
     disclosure of results by scientistswhose motivations are rooted
   primarily in intellectual curiosity, the desire to influence the thinking of
            others about the natural world, peer recognition for their
                achievements, and promotion of the public interest.

“Although this normative and value structure of public science predated the
    revolution in digitally networked technologies, it makes it ideally suited
            to experiment with and exploit those new technological
    capabilities,which themselves facilitate open, distributed and
                     cooperative uses of information.”




P.F. Uhlir. “Re-intermediation in the Republic of Science: Moving from
IntellectualProperty to Intellectual Commons.” Information Services and Use
23(2/3) 63-66. 2003
The erosion of the ethic of data sharing:
                          “Could you patent the sun? “

    In a 1954 interview with Edward R Murrow, Jonas Salk
       responded to a question suggesting the patenting of
       the polio vaccine : “Could you patent the sun?”
    and then ca 50 years later

    In a 2002 study, 47% of surveyed geneticists had been
       rejected at least once in their efforts to gain access to
       key genetics data (this result indicated a significant
       increase over a previous survey).

                       EG Campbell et al. “Data Withholding in Academic Genetics: Evidence From a National Survey”
     JAMA, Jan 2002; 287: 473 – 480; Massachusetts General Hospital (2006). “Studies examine withholding of scientific data among
researchers, trainees: Relationships with industry, competitive environments associated with research secrecy.” News release (January 25).
                   Massachusetts General Hospital. http://www.massgeneral.org/news/releases/012506campbell.html,
                                                         as of November 17, 2008.
Cascading resources…?
The Science Commons:
   “Protocol for Implementing Open Access Data”


“…it is conceivable that in 20 years, a complex
  semantic query across tens of thousands of data
  records across the web might return a result
  which itself populates a new database. If
  intellectual property rights are involved, that
  query might well trigger requirements carrying
  a stiff penalty for failure, including such
  problems as a copyright infringement lawsuit.”
   http://sciencecommons.org/projects/publishing/open-access-data-protocol/
Complex knowledge resources support research




                      Research Information Network and British Library
  “Patterns of information use and exchange: case studies of researchers in the life sciences”
http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf
Linked Open Data

                                                             2009
                                                               2011




              Courtesy of Tim Lebo, RPI http://bit.ly/lebo-ipaw-
20 Jun 2012                                2012
                        @timrdf http://bit.ly/lebo-ipaw-2012          12
And a tension exists between the great potential of dynamically
    linked data and the fear of legal liability by infringement of
                  conventional IPR claims… But…


“Progress in modern technology, combined with a legal system
        that was crafted for the analog era, is now having
    unintended consequences. One of these is a kind of legal
    "friction" that hinders the reuse of knowledge and slows
                           innovation.”



         From: “Science Commons” by M. McGeever, University of Edinburgh
http://www.dcc.ac.uk/resources/briefing-papers/legal-watch-papers/science-commons#2
The Scientific Environment…
Research Commons
The Public Domain

“The institutional
ecology of the
digital                                                                                                         Knowledge
environment”                                                                                                    Commons
(Yokai Benkler)

Sectors (public < -
> private) and
Jurisdictional Scale




THE ROLE OF SCIENTIFIC AND TECHNICAL DATA AND INFORMATION IN THE PUBLIC DOMAIN PROCEEDINGS OF A SYMPOSIUM Julie M. Esanu
and Paul F. Uhlir, Editors Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of
International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs Division,
National Research Council of the National Academies, p. 5
The “Ecology” of Science

 GRIDS




 Data
                                   International
 Centers
                                   Collaborative
                                   Research Effort



Individual
                    National Disciplinary Initiatives
Libraries

              Cooperative Projects

Local /
             Individuals
Personal
Archiving

              “Small Science”                           “BIG Science”
The “small science,” independent investigator approach traditionally has
characterized a large area of experimental laboratory sciences, such as
chemistry or biomedical research, and field work and studies, such as
biodiversity, ecology, microbiology, soil science, and anthropology. The
data or samples are collected and analyzed independently, and the
resulting data sets from such studies generally are heterogeneous and
unstandardized, with few of the individual data holdings deposited in
public data repositories or openly shared.
The data exist in various twilight states of accessibility, depending
on the extent to which they are published, discussed in papers but not
revealed, or just known about because of reputation or ongoing work,
but kept under absolute or relative secrecy. The data are thus
disaggregated components of an incipient network that is only as
effective as the individual transactions that put it together.
Openness and sharing are not ignored, but they are not necessarily
dominant either. These values must compete with strategic
considerations of self-interest, secrecy, and the logic of mutually
beneficial exchange, particularly in areas of research in which
commercial applications are more readily identifiable.
The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul
F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of
International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs
Division, National Research Council of the National Academies, p. 8
The “Economy” of Scientific Knowledge?




                                                          ???

Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,”
            MIT Sloan Management Review, 44 (2) Fall, 2002: 77.
“Data” ?
               [technical – “bits & bytes” -- definition]

“…’data’ are defined as any information that can be stored in
  digital form and accessed electronically, including, but not
  limited to, numeric data, text, publications, sensor streams,
  video, audio, algorithms, software, models and simulations,
  images, etc.”-- Program Solicitation 07-601
  “Sustainable Digital Data Preservation and Access Network Partners (DataNet)”



Taken in this broadest possible sense, “data” are thus simply
   electronic coded forms of information. And virtually anything
   can be represented as “data” so long as it is electronically
  machine-readable.
“The digital universe in 2007 — at 2.25 x 1021bits (281 exabytes
       or 281 billion gigabytes) — was 10% bigger than we thought.
       The resizing comes as a result of faster growth in
       cameras, digital TV shipments, and better understanding of
       information replication.
    “By 2011, the digital universe will be 10 times the size it was in
       2006.
    “As forecast, the amount of information created, captured, or
       replicated exceeded available storage for the first time in
       2007. Not all information created and transmitted gets
       stored, but by 2011, almost half of the digital universe will not
       have a permanent home.
    “Fast-growing corners of the digital universe include those
       related to digital TV, surveillance cameras, Internet access in
       emerging countries, sensor-based applications, datacenters
       supporting “cloud computing,” and social networks.
The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011 -- Executive Summary.
IDC Information and Data, March, 2008 http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
“As you go down the Long Tail the signal-to-noise ratio gets worse. Thus
the only way you can maintain a consistently good enough signal to find
what you want is if your filters get increasingly powerful.”
          Chris Anderson “Is the Long Tail full of crap?” May 22, 2005


http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
“Note that you have high-quality goods in every part of the curve, from top to bottom.
Yes, there are more low-quality goods in the tail and the average level of quality declines
as you go down the curve. But with good filters averages don't matter. It's all about the
diamonds, not the rough, and diamonds can be found anywhere.”
                        Chris Anderson “Is the Long Tail full of crap?” May 22, 2005
    http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
“Data” [epistemic definition]
“Measurements, observations or descriptions of
 a referent -- such as an individual, an event, a
 specimen in a collection or an
 excavated/surveyed object -- created or
 collected through human interpretation
 (whether directly “by hand” or through the use
 of technologies)”
                 -- AnthroDPA Working Group on Metadata (May, 2009)
“A Letter from George Lynn, Esq; To Ja. Jurin, M. D. F.
R. S. Containing Some Remarks on the Weather, and
Accompanying Three Synoptical Tables of
Meteorological Observations for 14 Years, viz. from
1726 to 1739. Both Inclusive” (January 1, 1753)
The Philosophical Transactions of the Royal
Society (Phil. Trans.) V. 41
 http://archive.org/details/philtrans00658288
New capacity for historical (“longitudinal”) studies: ICOADS
                            Marine Data Rescue




Scott Wodruff et al. “ICOADS Marine Data Rescue: Status and Future CDMP Priorities”:
http://icoads.noaa.gov/reclaim/pdf/marine-data-rescue_v15.pdf
NCAR Research Data Archive (RDA)




C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge
          sharing ,” from the 4th International Digital Curation Conference December 2008 , page 7. www.dcc.ac.uk/events/dcc-
                      2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
2-d_soil_temps.csv
               surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to
                        calibrate a model of temperature propagation. Surface temperature was measured with an infrared
                        thermometer, subsurface temperatures with a thermocouple.
               ----------------------------
               5-minute_light_data_for_4_continuous_days_plus_reference.xls
               PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes
                        calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along
                        the transect are receiving.
               ----------------------------


  DATA         CO2_of_air_at_different_heights_July_9.xls
               concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of
                        relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the
                        evening.
               ----------------------------

  SETS         Fern_light_response.xls
               Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light
                        levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction
                        data (below) for physiological characterization of the ferns.
               ----------------------------
               La_Selva_species_photosyntheis_table.xls
               incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a
                        shade house in Costa Rica.
               ----------------------------

   some        manzanita_sapflow_12-5-07_to_7-7-08.xls
               instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple
                        branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground
                        measures of root grown and CO2 production.
 examples      ----------------------------
               moisture_release_curves.xls
               percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory
with “native            for calibration of water content with water potential. soil is from the James Reserve in California.
               ----------------------------
               Photosynthetic_induction.xls

metadata”      a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as �mol CO2
                        m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
               ----------------------------
               run_2_24-h_data_for_mesh.xls
               measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into
                        the forest for about 30 meters. Pyronometers facing up and down, pyrgeometer facing up and down, PAR, air
                        temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated.
                        used for examining edge effects in forests.
               ----------------------------
               Segment_of_wallflower_compare_colorspaces_blur.xls
               pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces.
                        segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in
                        images collected after this training data was collected (and used to determine the best color space for this task).
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat
dissipation probe) for multiple branches of Manzanita, collected with a datalogger.  used to
correlate physiological activity with below-ground measures of root grown and CO2 production.



sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian




2         12.365    1196796112          2018.8    0.5585    0.51029   0.55517   0.54354   0.6067     0.52858   0.55351   0.59008   0.59506   0.60337   0.56514   12/4/07 11:21       4.47351
3         12.348    1196796232          2017.9    0.55682   0.51028   0.5535    0.54352   0.60669    0.52857   0.55017   0.59007   0.59505   0.60336   0.56513   12/4/07 11:23   0   4.47490
4         12.357    1196796352          2018.6    0.55514   0.51027   0.55348   0.54351   0.60501    0.52855   0.55016   0.59005   0.59504   0.60501   0.56512   12/4/07 11:25   0   4.47628
5         12.354    1196796472          2017.6    0.55514   0.51026   0.55181   0.5435    0.60334    0.52855   0.54849   0.59004   0.59503   0.60334   0.56511   12/4/07 11:27   0   4.47767
6         12.334    1196796592          2018.3    0.55347   0.51026   0.55015   0.5435    0.60333    0.52854   0.54682   0.59004   0.59502   0.605     0.56511   12/4/07 11:29   0   4.47906
7         12.34     1196796712          2018.5    0.55014   0.50859   0.55014   0.54349   0.60332    0.53019   0.54349   0.59003   0.59501   0.60498   0.56676   12/4/07 11:31   0   4.48045
8         12.337    1196796832          2017.8    0.55013   0.50692   0.55013   0.54348   0.60332    0.53019   0.54182   0.59002   0.59501   0.60498   0.56675   12/4/07 11:33   0   4.48184
9         12.328    1196796952          2017.5    0.5468    0.50691   0.5468    0.54347   0.60331    0.53018   0.53849   0.59001   0.595     0.60497   0.56674   12/4/07 11:35   0   4.48323
10        12.323    1196797072          2017      0.54679   0.50524   0.54679   0.54347   0.59998    0.53017   0.53682   0.59      0.59499   0.60496   0.56674   12/4/07 11:37   0   4.48462
11        12.328    1196797192          2018.9    0.54679   0.50191   0.54512   0.5418    0.59665    0.53017   0.53349   0.59      0.59498   0.60496   0.56673   12/4/07 11:39   0   4.48601
12        12.319    1196797312          2017.7    0.54345   0.49857   0.54178   0.54178   0.59663    0.53015   0.53015   0.58998   0.5933    0.60327   0.56671   12/4/07 11:41   0   4.48740
13        12.311    1196797432          2017.3    0.54343   0.4969    0.54011   0.54177   0.59661    0.53014   0.52848   0.58997   0.59329   0.6016    0.5667    12/4/07 11:43   0   4.48878
14        12.316    1196797552          2018.6    0.5401    0.49357   0.53678   0.54176   0.59328    0.53013   0.5268    0.58995   0.59328   0.60325   0.56669   12/4/07 11:45   0   4.49017
15        12.31     1196797672          2016.8    0.53844   0.4919    0.53511   0.54176   0.59494    0.53013   0.52514   0.58995   0.59328   0.60325   0.56503   12/4/07 11:47   0   4.49156
16        12.31     1196797792          2017.1    0.53676   0.48856   0.53343   0.54174   0.59326    0.53011   0.5218    0.58993   0.59326   0.60323   0.56501   12/4/07 11:49   0   4.49295
17        12.31     1196797912          2017.1    0.53342   0.48523   0.5301    0.54173   0.59324    0.5301    0.51846   0.58826   0.59324   0.60321   0.56499   12/4/07 11:51   0   4.49434
18        12.301    1196798031          2017.5    0.53174   0.48521   0.52842   0.53839   0.59156    0.53008   0.51845   0.58824   0.59323   0.6032    0.56498   12/4/07 11:53   0   4.49573
19        12.301    1196798151          2016.3    0.53007   0.48188   0.52509   0.53838   0.59155    0.53007   0.51512   0.58823   0.59321   0.60152   0.5633    12/4/07 11:55   0   4.49712
20        12.303    1196798271          2016.6    0.5284    0.47855   0.52175   0.53837   0.59154    0.5284    0.5151    0.58821   0.59154   0.60151   0.56163   12/4/07 11:57   0   4.49851




                                                                                   Datum: “0.59998”
Full Life-cycle
Management of Data?
US NSF “DataNet” Program
            “the full data preservation and access lifecycle”

      •   “acquisition”
      •   “documentation”
      •   “protection”
      •   “access”
      •   “analysis and dissemination”
      •   “migration”
      •   “disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-
 601 US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information
                                           Science & Engineering
http://wiki.esipfed.org/images/c/c4/IWGDD.pp t
“Data Quality” ???
In the most general colloquial terms, “Data Quality” is the fundamental issue
    of concern to scientists, policy makers, managers/decision makers and the
    general public.

“Quality” can be considered in terms of three primary values:

• Validity: logical in terms of intended hypothesis to be tested (all potential
  types of data that could be chosen should be weighed for probative
  value,,,)

• Competence (Reliability) : consideration of the proper choice of expert
  staff, methods, apparatus/gear, calibration, deployment and operation

• Integrity: the maintenance of original integrity of data as well as tracking
  and documenting of all transformations and sequences of transformation
  of data
“…the “validation” of any scientific hypotheses rests
  upon the sum integrity of all original data and
      of all sequences of data transformation
    to which original data have been subject. “




                                                                                  – Tom Moritz
                                                                          “The Burden of Proof”
                                                              Microsoft GRDI2020 Position Paper
                                                                            October 23-24, 2010


http://www.grdi2020.eu/Pages/SelectedDocument.aspx?id_documento=87f1b6d5-5c30-42a7-94df-d9cd5f4b147c
Logical / Scientific
 Validity of Data
T.C. Chamberlin
“What science does is put forward hypotheses, and use them to make
 predictions, and test those predictions against empirical evidence. Then the
scientists make judgments about which hypotheses are more likely, given the
 data. These judgments are notoriously hard to formalize, as Thomas Kuhn
argued in great detail, and philosophers of science don’t have anything like
 a rigorous understanding of how such judgments are made. But that’s only
 a worry at the most severe levels of rigor; in rough outline, the procedure is
  pretty clear. Scientists like hypotheses that fit the data, of course, but they
      also like them to be consistent with other established ideas, to be
  unambiguous and well-defined, to be wide in scope, and most of all to be
 simple. The more things an hypothesis can explain on the basis of the fewer
                   pieces of input, the happier scientists are.”

                                                                                -- Sean Carroll
                                                    “Science and Religion are not Compatible”
                                                                           Discover Magazine
                                                                     June 23rd, 2009 8:01 AM
Disregarding
    Validity???
(or assuming it can
   be inferred?)          No Stated hypothesis as basis
 An example from          for defining valid data-types

      wildlife
   management



 Page image is from
 “Road Ecology” RTT
 Forman et al. Island
 Press, 2002
 SEE:
 http://www.indiebound.
 org/book/97815596393
 30
COMPREHENSIVE VALIDITY???
  An exemplar of the possible range of data types available as “evidence” – in this case, that a zoological
survey has generated comprehensive results… Note: an inclusive combination of evidence types is ideally
                        necessary to optimize the evidentiary force of a survey…

        Comprehensive set of data types




                                                   Source: Voss & Emmons, AMNH Bull. No. 230, 1996
                                                   (by permission)
“Generic Competence”
       of Data
(from the National Atomic Testing
       Museum, Las Vegas)
BATS & SQUIRRELS AT SLAC

    “Data Cables Downed, not by Terabytes, but
                    Squirrel Bites”
    by Diane Rezendes Khirallah March 29, 2012
“The alert came shortly after 11 a.m. on Saturday:
Blackbox 1, a modular data center behind Building
50 that handles 252 computers dedicated to SLAC’s
BaBar experiment, was down. Les Cottrell, SLAC’s
manager of networking and
telecommunications, went with network architect
Antonio Ceseracciu and technical coordinator Ron
Barrett to investigate and get the system back up
                                                          March 29, 2012
and running as fast as possible. The power was            “A pine cone was tucked away in the lower-
on, so the problem was somewhere in the network           right corner of the cable box junction behind
equipment or cables. To determine the precise             one of the data servers. There, two cables were
location, Ceseracciu ran a test that sends a pulse of     chewed up, taking down 252 computers for
                                                          several hours last weekend.”
light to the far end of the cable. The pulse travels      Photos by Les Cottrell
down to the place where the cable is broken and
returns. By measuring how long this takes – much
as a bat measures distance by using sound waves
for echolocation – they ascertained that the
damaged area was 15 meters down the 100-meter
cable…”
                   https://news.slac.stanford.edu/image/squirrel-was-here
“Faster than light neutrinos: Heads roll“
                                March 30, 2012


“If you follow science at all (and maybe even if you don’t), you
    probably heard last year that scientists had discovered neutrinos
    that travelled faster than light… If true, this would be a big deal,
    knocking out laws of physics and causing dear doctor Einstein to roll
    in his grave, etc. What most physicists said at the time was
    something like, ‘Well, if it is true, then it’s a wonderful surprise, but
    it’s probably not true.’ It wasn’t true. It turned out that a faulty
    optical cable connection had affected the GPS readings and
    thereby the speed of light calculations. Today (March 30, 2012) it
    was announced that two leaders of the OPERA consortium, which
    conducted the original experiment, resigned following a vote of no-
    confidence. Thus, unlike in some other kinds of disasters – say
    financial collapse – scientists are willing and able to mete out
    consequences. “


    http://scitechstory.com/2012/03/30/faster-than-light-neutrinos-heads-roll/
Competent
Calibration
    and
Deployment
“Keeping Raw Data in Context”
“…any initiative to share raw clinical research data must also pay close attention to sharing clear
   and complete information about the design of the original studies. Relying on journal articles
   for study design information is problematic, for three reasons. First, journal articles often
   provide insufficient detail when describing key study design features such as randomization
   (1) and intervention details (2). Second, some data sets may come from studies with no
   publications [only 21% of oncology trials registered in ClinicalTrials.gov before 2004 and
   completed by September 2007 were published (3)]. Finally, investigators cannot reliably
   search journal articles for methodological concepts like “double blinding” or “interrupted
   time series,” crucial concepts for proper interpretation of the data. A mishmash of non-
   standardized databases of raw results and unevenly reported study designs is not a strong
   foundation for clinical research data sharing. “

“ We believe that the effective sharing of clinical research data requires the establishment of an
   interoperable federated database system that includes both study design and results data. A
   key component of this system is a logical model of clinical study characteristics in which all
   the data elements are standardized to controlled vocabularies and common ontologies to
    facilitate cross-study comparison and synthesis. “




I Sim, et al. “Keeping Raw Data in Context”[letter] Science v 323 6 Feb 2009, p713.
Provenance
    and
 Workflow
Management




SEE ALSO: VizTrails,
Expert
   Competence




                                                                                                                         “Competence”
                                “Involvement”
D. J. Meltzer, “Folsom: New Archaeological Investigations of a Classic Paleoindian Bison Kill” Univ of California Press, 2006.
Integrity of Data
8”
objet trouvé – gutter, 10th& Colorado, Santa Monica, California
Losses of Integrity: Data Degraded by successive transformations

                                                         Data
                                                 transformations
                                                and the risks of loss
                                                    of integrity --
                                                   (we must fully
                                                     analyze the
                                                  etiology of data
                                                   degradation!)
“It is well known that cartographic coordinates stored in double precision are far
more precisely specified than is merited by their accuracy, even for highly-accurate
global datasets. Far more coordinate digit places are stored for the sake of
avoiding machine error than are needed to define the location of map objects
within the necessary tolerances for both absolute and relative accuracies.”



“A careful look at the coordinate digits stored as double precision variables in a
GIS yields a variety of interesting patterns that are a result of previous machine
error, rounding error, measurement error, and so forth. Any slight cartographic
alteration (rotation/skewing, clipping/sub-setting, reprojecting, etc.) can add
noise into the coordinate and can be used to characterize a vector dataset.”




 Rice, Matt, Michael F. Goodchild, Keith C. Clarke (2005) "Cartographic Data Precision and Information Content". In
 Proceedings of Auto-Carto 2005: A Research Symposium. Las Vegas, Nevada, March 18-23, 2005.
“Most commonly, computer scientists are concerned with
  digital objects that are defined as a set of sequences of
  bits. One can then ask computationally based questions
  about whether one has the correct set of sequences of
       bits, such as whether the digital object in one's
      possession is the same as that which some entity
 published under a specific identifier at a specific point in
    time… However, this is a simplistic notion. There are
               additional factors to consider.” *!+




    Clifford Lynch, “Authenticity and Integrity in the Digital Environment: An
                 Exploratory Analysis of the Central Role of Trust,”
               http://www.clir.org/pubs/reports/pub92/lynch.html
“Canonical” Data?
• In the case of paradigm-shifting scientific
  discoveries – all supporting evidence must
  (and will) be held to an exacting, rigorously
  precise standard
• This is also true of scientific assertions that
  have major economic impacts – for example,
  climate change…
“…the “validation” of any scientific hypotheses rests
  upon the sum integrity of all original data and
      of all sequences of data transformation
    to which original data have been subject. “




                                              – Tom Moritz
                                      “The Burden of Proof”
                                    GRDI2020 Position Paper
                                        October 23-24, 2010
“Unstructured (or Weakly
  Structured) Data” ???
DARWIN




http://darwin-
online.org.uk/converted/published/1975_NaturalSelection_F15
83/1975_NaturalSelection_F1583_fig03.jpg                          http://www.nyu.edu/projects/materialworld/images/1_
                                                                           Darwin%20Tree%20B%2036.jpg
FIELD NOTES
FROM THE AMERICAN MUSEM CONGO EXPEDITION 1909-1915

            http://diglib1.amnh.org/cgi-bin/database/index.cgi
Rheinardia ocellata, the Crested Argus. Photographed at night by an
automatic camera-trap in the Ngoc Linh foothills (Quang Nam Province).
             Courtesy AMNH Center for Biodiversity and Conservation
Kirtland’s Warbler / Abaco Island, The
                Bahamas
“NATIVE”
                       METADATA


  DEAD HARBOR SEAL
         and
5 CALIFORNIA CONDORS
Field sketch by Professor OT
Hayward, Baylor University
  “Guadalupe Trip” Friday
     November 13, 1981
Treating unstructured data?
• Careful analysis to detect elements of
  emergent structure
• Systematic use of inference and recursion to
  attain optimal efficiencies
• Assumption that description will be an
  additive process not a single event
Thanks!

     Tom Moritz
tom.moritz@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Citizen science
Citizen scienceCitizen science
Citizen sciencesamar1407
 
Limitations of scientific activities
Limitations of scientific activitiesLimitations of scientific activities
Limitations of scientific activitiesAndrea Boggio
 
Nano tecnology 2050
Nano tecnology 2050Nano tecnology 2050
Nano tecnology 2050faysal3624
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPMicah Altman
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyMicah Altman
 
Extreme Citizen Science - Public Participation in Scientific Research 2012
Extreme Citizen Science - Public Participation in Scientific Research 2012Extreme Citizen Science - Public Participation in Scientific Research 2012
Extreme Citizen Science - Public Participation in Scientific Research 2012Muki Haklay
 
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...Cindy Regalado
 
June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacyMicah Altman
 
Citizen science presentation
Citizen science presentationCitizen science presentation
Citizen science presentationHessah Alajmi
 
Learning from Web Activity
Learning from Web ActivityLearning from Web Activity
Learning from Web Activityjakehofman
 
Jankowski, Vks E Research Slidecast, 26 June2008
Jankowski, Vks E Research Slidecast, 26 June2008Jankowski, Vks E Research Slidecast, 26 June2008
Jankowski, Vks E Research Slidecast, 26 June2008Nick Jankowski
 
Citizen science - theory, practice & policy workshop
Citizen science - theory, practice & policy workshopCitizen science - theory, practice & policy workshop
Citizen science - theory, practice & policy workshopMuki Haklay
 
De castro mekelle university 2014
De castro mekelle university 2014De castro mekelle university 2014
De castro mekelle university 2014Paola De Castro
 

Was ist angesagt? (13)

Citizen science
Citizen scienceCitizen science
Citizen science
 
Limitations of scientific activities
Limitations of scientific activitiesLimitations of scientific activities
Limitations of scientific activities
 
Nano tecnology 2050
Nano tecnology 2050Nano tecnology 2050
Nano tecnology 2050
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTP
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data Privacy
 
Extreme Citizen Science - Public Participation in Scientific Research 2012
Extreme Citizen Science - Public Participation in Scientific Research 2012Extreme Citizen Science - Public Participation in Scientific Research 2012
Extreme Citizen Science - Public Participation in Scientific Research 2012
 
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...
What is Extreme Citizen Science? Volunteerism & Publicly Initiated Scientific...
 
June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacy
 
Citizen science presentation
Citizen science presentationCitizen science presentation
Citizen science presentation
 
Learning from Web Activity
Learning from Web ActivityLearning from Web Activity
Learning from Web Activity
 
Jankowski, Vks E Research Slidecast, 26 June2008
Jankowski, Vks E Research Slidecast, 26 June2008Jankowski, Vks E Research Slidecast, 26 June2008
Jankowski, Vks E Research Slidecast, 26 June2008
 
Citizen science - theory, practice & policy workshop
Citizen science - theory, practice & policy workshopCitizen science - theory, practice & policy workshop
Citizen science - theory, practice & policy workshop
 
De castro mekelle university 2014
De castro mekelle university 2014De castro mekelle university 2014
De castro mekelle university 2014
 

Andere mochten auch

International Society for Biological and Environmental Repositories (American...
International Society for Biological and Environmental Repositories (American...International Society for Biological and Environmental Repositories (American...
International Society for Biological and Environmental Repositories (American...Tom Moritz
 
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,..."Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...Tom Moritz
 
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...California Ocean Science Trust " Building a Sustainable Knowledge Base for ...
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...Tom Moritz
 
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...Digital @ the American Museum of Natural History: ZBIG Meeting University of ...
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...Tom Moritz
 
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting..."Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...Tom Moritz
 
"Science Literacy" - American Library Association, New Orleans June 2006
"Science Literacy"  - American Library Association, New Orleans June 2006"Science Literacy"  - American Library Association, New Orleans June 2006
"Science Literacy" - American Library Association, New Orleans June 2006Tom Moritz
 
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an..."The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...Tom Moritz
 
"What does 'Full Life-Cycle' Data Management Mean ?"
"What does 'Full Life-Cycle' Data Management Mean ?""What does 'Full Life-Cycle' Data Management Mean ?"
"What does 'Full Life-Cycle' Data Management Mean ?"Tom Moritz
 
A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...
 A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze... A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...
A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...Tom Moritz
 
Computer technology in library and information science notes in hindi
Computer technology in library and information science notes in hindiComputer technology in library and information science notes in hindi
Computer technology in library and information science notes in hindiMohammad Rehan
 

Andere mochten auch (11)

International Society for Biological and Environmental Repositories (American...
International Society for Biological and Environmental Repositories (American...International Society for Biological and Environmental Repositories (American...
International Society for Biological and Environmental Repositories (American...
 
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,..."Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...
"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,...
 
Why library and information science research matters in hard times
Why library and information science research matters in hard timesWhy library and information science research matters in hard times
Why library and information science research matters in hard times
 
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...California Ocean Science Trust " Building a Sustainable Knowledge Base for ...
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...
 
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...Digital @ the American Museum of Natural History: ZBIG Meeting University of ...
Digital @ the American Museum of Natural History: ZBIG Meeting University of ...
 
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting..."Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...
"Toward Sustainability: "Margin" and "Mission" in the Natural History Setting...
 
"Science Literacy" - American Library Association, New Orleans June 2006
"Science Literacy"  - American Library Association, New Orleans June 2006"Science Literacy"  - American Library Association, New Orleans June 2006
"Science Literacy" - American Library Association, New Orleans June 2006
 
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an..."The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...
"The Conservation Commons: Lessons and Analyses Adapted from other Sectors an...
 
"What does 'Full Life-Cycle' Data Management Mean ?"
"What does 'Full Life-Cycle' Data Management Mean ?""What does 'Full Life-Cycle' Data Management Mean ?"
"What does 'Full Life-Cycle' Data Management Mean ?"
 
A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...
 A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze... A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...
A Vision for the Biodiversity Commons Concept, May 2004, IUCN Geneva, Switze...
 
Computer technology in library and information science notes in hindi
Computer technology in library and information science notes in hindiComputer technology in library and information science notes in hindi
Computer technology in library and information science notes in hindi
 

Ähnlich wie IAMSLIC 2012, ANCHORAGE, AK

Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Martin Donnelly
 
Research Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two ParadigmsResearch Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two Paradigmstarastar
 
Open data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationOpen data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationciakov
 
e-infrastructures supporting open knowledge circulation - OpenAIRE France
e-infrastructures supporting open knowledge circulation - OpenAIRE Francee-infrastructures supporting open knowledge circulation - OpenAIRE France
e-infrastructures supporting open knowledge circulation - OpenAIRE FranceJean-François Lutz
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Darlene Cavalier
 
Science Gateways and Their Tremendous Potential for Science and Engineering
Science Gateways and Their Tremendous Potential for Science and EngineeringScience Gateways and Their Tremendous Potential for Science and Engineering
Science Gateways and Their Tremendous Potential for Science and EngineeringCybera Inc.
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey BoultonLEARN Project
 
An initial exploration of Citizen Science
An initial exploration of Citizen ScienceAn initial exploration of Citizen Science
An initial exploration of Citizen ScienceNiamh O Riordan
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBryan Heidorn
 
CERN communicating scientific breakthrough
CERN communicating scientific breakthroughCERN communicating scientific breakthrough
CERN communicating scientific breakthroughjaydip12
 
Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Rebekah Cummings
 
Moritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsMoritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsTom Moritz
 
A whirlwind tour of Citizen Science in Astronomy
A whirlwind tour of Citizen Science in AstronomyA whirlwind tour of Citizen Science in Astronomy
A whirlwind tour of Citizen Science in AstronomyMargaret Gold
 
World Conservation Congress 2008
World Conservation Congress 2008World Conservation Congress 2008
World Conservation Congress 2008Tom Moritz
 
Arin6912 – Digital Research And Publishing Presentation
Arin6912 – Digital Research And Publishing PresentationArin6912 – Digital Research And Publishing Presentation
Arin6912 – Digital Research And Publishing Presentationklfagan
 
Research Data Management: a gentle introduction for admin staff
Research Data Management: a gentle introduction for admin staffResearch Data Management: a gentle introduction for admin staff
Research Data Management: a gentle introduction for admin staffMartin Donnelly
 
Research Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social SciencesResearch Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social SciencesMartin Donnelly
 

Ähnlich wie IAMSLIC 2012, ANCHORAGE, AK (20)

2012 IASC Thematic Conference "On the knowledge Commons" call for papers
2012 IASC Thematic Conference "On the knowledge Commons" call for papers2012 IASC Thematic Conference "On the knowledge Commons" call for papers
2012 IASC Thematic Conference "On the knowledge Commons" call for papers
 
Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms: Research data management: a tale of two paradigms:
Research data management: a tale of two paradigms:
 
Research Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two ParadigmsResearch Data Management: A Tale of Two Paradigms
Research Data Management: A Tale of Two Paradigms
 
Open data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationOpen data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovation
 
e-infrastructures supporting open knowledge circulation - OpenAIRE France
e-infrastructures supporting open knowledge circulation - OpenAIRE Francee-infrastructures supporting open knowledge circulation - OpenAIRE France
e-infrastructures supporting open knowledge circulation - OpenAIRE France
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
Citizen Science overview for ASU HSD598 graduate course, "Citizen Science"
 
Science Gateways and Their Tremendous Potential for Science and Engineering
Science Gateways and Their Tremendous Potential for Science and EngineeringScience Gateways and Their Tremendous Potential for Science and Engineering
Science Gateways and Their Tremendous Potential for Science and Engineering
 
Scienceofscience
ScienceofscienceScienceofscience
Scienceofscience
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey Boulton
 
An initial exploration of Citizen Science
An initial exploration of Citizen ScienceAn initial exploration of Citizen Science
An initial exploration of Citizen Science
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
 
CERN communicating scientific breakthrough
CERN communicating scientific breakthroughCERN communicating scientific breakthrough
CERN communicating scientific breakthrough
 
Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...Ownership, intellectual property, and governance considerations for academic ...
Ownership, intellectual property, and governance considerations for academic ...
 
Moritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsMoritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity Commons
 
A whirlwind tour of Citizen Science in Astronomy
A whirlwind tour of Citizen Science in AstronomyA whirlwind tour of Citizen Science in Astronomy
A whirlwind tour of Citizen Science in Astronomy
 
World Conservation Congress 2008
World Conservation Congress 2008World Conservation Congress 2008
World Conservation Congress 2008
 
Arin6912 – Digital Research And Publishing Presentation
Arin6912 – Digital Research And Publishing PresentationArin6912 – Digital Research And Publishing Presentation
Arin6912 – Digital Research And Publishing Presentation
 
Research Data Management: a gentle introduction for admin staff
Research Data Management: a gentle introduction for admin staffResearch Data Management: a gentle introduction for admin staff
Research Data Management: a gentle introduction for admin staff
 
Research Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social SciencesResearch Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social Sciences
 

Mehr von Tom Moritz

ESA Science Commons
ESA Science CommonsESA Science Commons
ESA Science CommonsTom Moritz
 
Marine microbiology
Marine microbiologyMarine microbiology
Marine microbiologyTom Moritz
 
Pelagic Environments and Ecology (3) copy
Pelagic Environments and Ecology (3) copyPelagic Environments and Ecology (3) copy
Pelagic Environments and Ecology (3) copyTom Moritz
 
Pelagic environment and ecology (2)
Pelagic environment and ecology (2) Pelagic environment and ecology (2)
Pelagic environment and ecology (2) Tom Moritz
 
Pelagic Environments and Ecosystems (1)
Pelagic Environments and Ecosystems (1)Pelagic Environments and Ecosystems (1)
Pelagic Environments and Ecosystems (1)Tom Moritz
 
Chaparral and Coastal Scrub Ecology
Chaparral and Coastal Scrub EcologyChaparral and Coastal Scrub Ecology
Chaparral and Coastal Scrub EcologyTom Moritz
 
The Intertidal and Kelp Forests - Pacific Coast
The Intertidal and Kelp Forests  - Pacific CoastThe Intertidal and Kelp Forests  - Pacific Coast
The Intertidal and Kelp Forests - Pacific CoastTom Moritz
 
A Universe of Data
A Universe of DataA Universe of Data
A Universe of DataTom Moritz
 
Climate Change
Climate ChangeClimate Change
Climate ChangeTom Moritz
 
Climate change
Climate changeClimate change
Climate changeTom Moritz
 
The commons???
The commons???The commons???
The commons???Tom Moritz
 
Ecological Society of America Science Commons
Ecological Society of America Science CommonsEcological Society of America Science Commons
Ecological Society of America Science CommonsTom Moritz
 
Epidemiology cholera, ebola, hiv aids
Epidemiology cholera, ebola, hiv aidsEpidemiology cholera, ebola, hiv aids
Epidemiology cholera, ebola, hiv aidsTom Moritz
 
The Human Biome
The Human BiomeThe Human Biome
The Human BiomeTom Moritz
 
Epistemology, ontology, knowledge x
Epistemology, ontology, knowledge xEpistemology, ontology, knowledge x
Epistemology, ontology, knowledge xTom Moritz
 
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)Tom Moritz
 
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary TheoryCharles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary TheoryTom Moritz
 
Trauma and violence
Trauma and violenceTrauma and violence
Trauma and violenceTom Moritz
 
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...Tom Moritz
 

Mehr von Tom Moritz (20)

ESA Science Commons
ESA Science CommonsESA Science Commons
ESA Science Commons
 
Microbiology
MicrobiologyMicrobiology
Microbiology
 
Marine microbiology
Marine microbiologyMarine microbiology
Marine microbiology
 
Pelagic Environments and Ecology (3) copy
Pelagic Environments and Ecology (3) copyPelagic Environments and Ecology (3) copy
Pelagic Environments and Ecology (3) copy
 
Pelagic environment and ecology (2)
Pelagic environment and ecology (2) Pelagic environment and ecology (2)
Pelagic environment and ecology (2)
 
Pelagic Environments and Ecosystems (1)
Pelagic Environments and Ecosystems (1)Pelagic Environments and Ecosystems (1)
Pelagic Environments and Ecosystems (1)
 
Chaparral and Coastal Scrub Ecology
Chaparral and Coastal Scrub EcologyChaparral and Coastal Scrub Ecology
Chaparral and Coastal Scrub Ecology
 
The Intertidal and Kelp Forests - Pacific Coast
The Intertidal and Kelp Forests  - Pacific CoastThe Intertidal and Kelp Forests  - Pacific Coast
The Intertidal and Kelp Forests - Pacific Coast
 
A Universe of Data
A Universe of DataA Universe of Data
A Universe of Data
 
Climate Change
Climate ChangeClimate Change
Climate Change
 
Climate change
Climate changeClimate change
Climate change
 
The commons???
The commons???The commons???
The commons???
 
Ecological Society of America Science Commons
Ecological Society of America Science CommonsEcological Society of America Science Commons
Ecological Society of America Science Commons
 
Epidemiology cholera, ebola, hiv aids
Epidemiology cholera, ebola, hiv aidsEpidemiology cholera, ebola, hiv aids
Epidemiology cholera, ebola, hiv aids
 
The Human Biome
The Human BiomeThe Human Biome
The Human Biome
 
Epistemology, ontology, knowledge x
Epistemology, ontology, knowledge xEpistemology, ontology, knowledge x
Epistemology, ontology, knowledge x
 
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
 
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary TheoryCharles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
 
Trauma and violence
Trauma and violenceTrauma and violence
Trauma and violence
 
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...Children and Trauma in the International World (UWest Psych 490 November 7, 2...
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

IAMSLIC 2012, ANCHORAGE, AK

  • 1. Some notes on data Tom Moritz IAMSLIC 2012 Anchorage, Alaska August, 2012
  • 2. Libraries in the 21st Century must be responsible for all types of knowledge resources… And for understanding the complex development, interactions and disposition of such resources…
  • 3. First a brief digression focusing on the common… Ethos of Science and Libraries
  • 4. “Declaration of Scientific Principles” in “The Commonwealth of Science” “7. The pursuit of scientific inquiry demands complete intellectual freedom. And unrestricted international exchange of knowledge…“ from “The Commonwealth of Science ” Nature No.3753 October 4, 1941.
  • 5. “The substantive findings of science are a product of social collaboration and are assigned to the community. They constitute a common heritage in which the equity of the individual producer is severely limited…” “The scientist’s claim to “his” intellectual “property” is limited to that of recognition and esteem which, if the institution functions with a modicum of efficiency, is roughly commensurate with the significance of the increments brought to the common fund of knowledge.” Robert K. Merton, “A Note on Science and Democarcy,” Journal of Law and Political Sociology 1 (1942): 121.
  • 6. “Factual data are fundamental to the progress of science and to our preeminent system of innovation. Freedom of inquiry, the open availability of scientific data, and full disclosure of results through publication are the cornerstones of basic research, which both domestic law and the norms of public science have long upheld.” J.H. Reichman and P.F Uhlir. “A contractually reconstructed research commons for scientific data in a highly protectionist intellectual property environment.” in The Public Domain. J.Boyle, ed.Durham, NC: schoolo of Law, Duke University. (Law and Contemporary Problems, Vol.66 nos 1&2 ) 2003
  • 7. “Public research is largely an open, communitarian, and cooperative system. It is founded on freedom of inquiry,sharing of data and full disclosure of results by scientistswhose motivations are rooted primarily in intellectual curiosity, the desire to influence the thinking of others about the natural world, peer recognition for their achievements, and promotion of the public interest. “Although this normative and value structure of public science predated the revolution in digitally networked technologies, it makes it ideally suited to experiment with and exploit those new technological capabilities,which themselves facilitate open, distributed and cooperative uses of information.” P.F. Uhlir. “Re-intermediation in the Republic of Science: Moving from IntellectualProperty to Intellectual Commons.” Information Services and Use 23(2/3) 63-66. 2003
  • 8. The erosion of the ethic of data sharing: “Could you patent the sun? “ In a 1954 interview with Edward R Murrow, Jonas Salk responded to a question suggesting the patenting of the polio vaccine : “Could you patent the sun?” and then ca 50 years later In a 2002 study, 47% of surveyed geneticists had been rejected at least once in their efforts to gain access to key genetics data (this result indicated a significant increase over a previous survey). EG Campbell et al. “Data Withholding in Academic Genetics: Evidence From a National Survey” JAMA, Jan 2002; 287: 473 – 480; Massachusetts General Hospital (2006). “Studies examine withholding of scientific data among researchers, trainees: Relationships with industry, competitive environments associated with research secrecy.” News release (January 25). Massachusetts General Hospital. http://www.massgeneral.org/news/releases/012506campbell.html, as of November 17, 2008.
  • 10. The Science Commons: “Protocol for Implementing Open Access Data” “…it is conceivable that in 20 years, a complex semantic query across tens of thousands of data records across the web might return a result which itself populates a new database. If intellectual property rights are involved, that query might well trigger requirements carrying a stiff penalty for failure, including such problems as a copyright infringement lawsuit.” http://sciencecommons.org/projects/publishing/open-access-data-protocol/
  • 11. Complex knowledge resources support research Research Information Network and British Library “Patterns of information use and exchange: case studies of researchers in the life sciences” http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf
  • 12. Linked Open Data 2009 2011 Courtesy of Tim Lebo, RPI http://bit.ly/lebo-ipaw- 20 Jun 2012 2012 @timrdf http://bit.ly/lebo-ipaw-2012 12
  • 13. And a tension exists between the great potential of dynamically linked data and the fear of legal liability by infringement of conventional IPR claims… But… “Progress in modern technology, combined with a legal system that was crafted for the analog era, is now having unintended consequences. One of these is a kind of legal "friction" that hinders the reuse of knowledge and slows innovation.” From: “Science Commons” by M. McGeever, University of Edinburgh http://www.dcc.ac.uk/resources/briefing-papers/legal-watch-papers/science-commons#2
  • 15. Research Commons The Public Domain “The institutional ecology of the digital Knowledge environment” Commons (Yokai Benkler) Sectors (public < - > private) and Jurisdictional Scale THE ROLE OF SCIENTIFIC AND TECHNICAL DATA AND INFORMATION IN THE PUBLIC DOMAIN PROCEEDINGS OF A SYMPOSIUM Julie M. Esanu and Paul F. Uhlir, Editors Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 5
  • 16. The “Ecology” of Science GRIDS Data International Centers Collaborative Research Effort Individual National Disciplinary Initiatives Libraries Cooperative Projects Local / Individuals Personal Archiving “Small Science” “BIG Science”
  • 17. The “small science,” independent investigator approach traditionally has characterized a large area of experimental laboratory sciences, such as chemistry or biomedical research, and field work and studies, such as biodiversity, ecology, microbiology, soil science, and anthropology. The data or samples are collected and analyzed independently, and the resulting data sets from such studies generally are heterogeneous and unstandardized, with few of the individual data holdings deposited in public data repositories or openly shared. The data exist in various twilight states of accessibility, depending on the extent to which they are published, discussed in papers but not revealed, or just known about because of reputation or ongoing work, but kept under absolute or relative secrecy. The data are thus disaggregated components of an incipient network that is only as effective as the individual transactions that put it together. Openness and sharing are not ignored, but they are not necessarily dominant either. These values must compete with strategic considerations of self-interest, secrecy, and the logic of mutually beneficial exchange, particularly in areas of research in which commercial applications are more readily identifiable. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 8
  • 18. The “Economy” of Scientific Knowledge? ??? Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,” MIT Sloan Management Review, 44 (2) Fall, 2002: 77.
  • 19. “Data” ? [technical – “bits & bytes” -- definition] “…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.”-- Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)” Taken in this broadest possible sense, “data” are thus simply electronic coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.
  • 20. “The digital universe in 2007 — at 2.25 x 1021bits (281 exabytes or 281 billion gigabytes) — was 10% bigger than we thought. The resizing comes as a result of faster growth in cameras, digital TV shipments, and better understanding of information replication. “By 2011, the digital universe will be 10 times the size it was in 2006. “As forecast, the amount of information created, captured, or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home. “Fast-growing corners of the digital universe include those related to digital TV, surveillance cameras, Internet access in emerging countries, sensor-based applications, datacenters supporting “cloud computing,” and social networks. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011 -- Executive Summary. IDC Information and Data, March, 2008 http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
  • 21. “As you go down the Long Tail the signal-to-noise ratio gets worse. Thus the only way you can maintain a consistently good enough signal to find what you want is if your filters get increasingly powerful.” Chris Anderson “Is the Long Tail full of crap?” May 22, 2005 http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
  • 22. “Note that you have high-quality goods in every part of the curve, from top to bottom. Yes, there are more low-quality goods in the tail and the average level of quality declines as you go down the curve. But with good filters averages don't matter. It's all about the diamonds, not the rough, and diamonds can be found anywhere.” Chris Anderson “Is the Long Tail full of crap?” May 22, 2005 http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
  • 23. “Data” [epistemic definition] “Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)” -- AnthroDPA Working Group on Metadata (May, 2009)
  • 24. “A Letter from George Lynn, Esq; To Ja. Jurin, M. D. F. R. S. Containing Some Remarks on the Weather, and Accompanying Three Synoptical Tables of Meteorological Observations for 14 Years, viz. from 1726 to 1739. Both Inclusive” (January 1, 1753) The Philosophical Transactions of the Royal Society (Phil. Trans.) V. 41 http://archive.org/details/philtrans00658288
  • 25.
  • 26. New capacity for historical (“longitudinal”) studies: ICOADS Marine Data Rescue Scott Wodruff et al. “ICOADS Marine Data Rescue: Status and Future CDMP Priorities”: http://icoads.noaa.gov/reclaim/pdf/marine-data-rescue_v15.pdf
  • 27. NCAR Research Data Archive (RDA) C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing ,” from the 4th International Digital Curation Conference December 2008 , page 7. www.dcc.ac.uk/events/dcc- 2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
  • 28. 2-d_soil_temps.csv surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer, subsurface temperatures with a thermocouple. ---------------------------- 5-minute_light_data_for_4_continuous_days_plus_reference.xls PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along the transect are receiving. ---------------------------- DATA CO2_of_air_at_different_heights_July_9.xls concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the evening. ---------------------------- SETS Fern_light_response.xls Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction data (below) for physiological characterization of the ferns. ---------------------------- La_Selva_species_photosyntheis_table.xls incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a shade house in Costa Rica. ---------------------------- some manzanita_sapflow_12-5-07_to_7-7-08.xls instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root grown and CO2 production. examples ---------------------------- moisture_release_curves.xls percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory with “native for calibration of water content with water potential. soil is from the James Reserve in California. ---------------------------- Photosynthetic_induction.xls metadata” a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as �mol CO2 m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns. ---------------------------- run_2_24-h_data_for_mesh.xls measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the forest for about 30 meters. Pyronometers facing up and down, pyrgeometer facing up and down, PAR, air temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for examining edge effects in forests. ---------------------------- Segment_of_wallflower_compare_colorspaces_blur.xls pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces. segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in images collected after this training data was collected (and used to determine the best color space for this task).
  • 29. manzanita_sapflow_12-5-07_to_7-7-08.xls instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root grown and CO2 production. sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian 2 12.365 1196796112 2018.8 0.5585 0.51029 0.55517 0.54354 0.6067 0.52858 0.55351 0.59008 0.59506 0.60337 0.56514 12/4/07 11:21 4.47351 3 12.348 1196796232 2017.9 0.55682 0.51028 0.5535 0.54352 0.60669 0.52857 0.55017 0.59007 0.59505 0.60336 0.56513 12/4/07 11:23 0 4.47490 4 12.357 1196796352 2018.6 0.55514 0.51027 0.55348 0.54351 0.60501 0.52855 0.55016 0.59005 0.59504 0.60501 0.56512 12/4/07 11:25 0 4.47628 5 12.354 1196796472 2017.6 0.55514 0.51026 0.55181 0.5435 0.60334 0.52855 0.54849 0.59004 0.59503 0.60334 0.56511 12/4/07 11:27 0 4.47767 6 12.334 1196796592 2018.3 0.55347 0.51026 0.55015 0.5435 0.60333 0.52854 0.54682 0.59004 0.59502 0.605 0.56511 12/4/07 11:29 0 4.47906 7 12.34 1196796712 2018.5 0.55014 0.50859 0.55014 0.54349 0.60332 0.53019 0.54349 0.59003 0.59501 0.60498 0.56676 12/4/07 11:31 0 4.48045 8 12.337 1196796832 2017.8 0.55013 0.50692 0.55013 0.54348 0.60332 0.53019 0.54182 0.59002 0.59501 0.60498 0.56675 12/4/07 11:33 0 4.48184 9 12.328 1196796952 2017.5 0.5468 0.50691 0.5468 0.54347 0.60331 0.53018 0.53849 0.59001 0.595 0.60497 0.56674 12/4/07 11:35 0 4.48323 10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462 11 12.328 1196797192 2018.9 0.54679 0.50191 0.54512 0.5418 0.59665 0.53017 0.53349 0.59 0.59498 0.60496 0.56673 12/4/07 11:39 0 4.48601 12 12.319 1196797312 2017.7 0.54345 0.49857 0.54178 0.54178 0.59663 0.53015 0.53015 0.58998 0.5933 0.60327 0.56671 12/4/07 11:41 0 4.48740 13 12.311 1196797432 2017.3 0.54343 0.4969 0.54011 0.54177 0.59661 0.53014 0.52848 0.58997 0.59329 0.6016 0.5667 12/4/07 11:43 0 4.48878 14 12.316 1196797552 2018.6 0.5401 0.49357 0.53678 0.54176 0.59328 0.53013 0.5268 0.58995 0.59328 0.60325 0.56669 12/4/07 11:45 0 4.49017 15 12.31 1196797672 2016.8 0.53844 0.4919 0.53511 0.54176 0.59494 0.53013 0.52514 0.58995 0.59328 0.60325 0.56503 12/4/07 11:47 0 4.49156 16 12.31 1196797792 2017.1 0.53676 0.48856 0.53343 0.54174 0.59326 0.53011 0.5218 0.58993 0.59326 0.60323 0.56501 12/4/07 11:49 0 4.49295 17 12.31 1196797912 2017.1 0.53342 0.48523 0.5301 0.54173 0.59324 0.5301 0.51846 0.58826 0.59324 0.60321 0.56499 12/4/07 11:51 0 4.49434 18 12.301 1196798031 2017.5 0.53174 0.48521 0.52842 0.53839 0.59156 0.53008 0.51845 0.58824 0.59323 0.6032 0.56498 12/4/07 11:53 0 4.49573 19 12.301 1196798151 2016.3 0.53007 0.48188 0.52509 0.53838 0.59155 0.53007 0.51512 0.58823 0.59321 0.60152 0.5633 12/4/07 11:55 0 4.49712 20 12.303 1196798271 2016.6 0.5284 0.47855 0.52175 0.53837 0.59154 0.5284 0.5151 0.58821 0.59154 0.60151 0.56163 12/4/07 11:57 0 4.49851 Datum: “0.59998”
  • 31. US NSF “DataNet” Program “the full data preservation and access lifecycle” • “acquisition” • “documentation” • “protection” • “access” • “analysis and dissemination” • “migration” • “disposition” “Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07- 601 US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information Science & Engineering
  • 33. “Data Quality” ??? In the most general colloquial terms, “Data Quality” is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public. “Quality” can be considered in terms of three primary values: • Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value,,,) • Competence (Reliability) : consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation • Integrity: the maintenance of original integrity of data as well as tracking and documenting of all transformations and sequences of transformation of data
  • 34. “…the “validation” of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject. “ – Tom Moritz “The Burden of Proof” Microsoft GRDI2020 Position Paper October 23-24, 2010 http://www.grdi2020.eu/Pages/SelectedDocument.aspx?id_documento=87f1b6d5-5c30-42a7-94df-d9cd5f4b147c
  • 35. Logical / Scientific Validity of Data
  • 37. “What science does is put forward hypotheses, and use them to make predictions, and test those predictions against empirical evidence. Then the scientists make judgments about which hypotheses are more likely, given the data. These judgments are notoriously hard to formalize, as Thomas Kuhn argued in great detail, and philosophers of science don’t have anything like a rigorous understanding of how such judgments are made. But that’s only a worry at the most severe levels of rigor; in rough outline, the procedure is pretty clear. Scientists like hypotheses that fit the data, of course, but they also like them to be consistent with other established ideas, to be unambiguous and well-defined, to be wide in scope, and most of all to be simple. The more things an hypothesis can explain on the basis of the fewer pieces of input, the happier scientists are.” -- Sean Carroll “Science and Religion are not Compatible” Discover Magazine June 23rd, 2009 8:01 AM
  • 38. Disregarding Validity??? (or assuming it can be inferred?) No Stated hypothesis as basis An example from for defining valid data-types wildlife management Page image is from “Road Ecology” RTT Forman et al. Island Press, 2002 SEE: http://www.indiebound. org/book/97815596393 30
  • 39. COMPREHENSIVE VALIDITY??? An exemplar of the possible range of data types available as “evidence” – in this case, that a zoological survey has generated comprehensive results… Note: an inclusive combination of evidence types is ideally necessary to optimize the evidentiary force of a survey… Comprehensive set of data types Source: Voss & Emmons, AMNH Bull. No. 230, 1996 (by permission)
  • 40. “Generic Competence” of Data (from the National Atomic Testing Museum, Las Vegas)
  • 41. BATS & SQUIRRELS AT SLAC “Data Cables Downed, not by Terabytes, but Squirrel Bites” by Diane Rezendes Khirallah March 29, 2012 “The alert came shortly after 11 a.m. on Saturday: Blackbox 1, a modular data center behind Building 50 that handles 252 computers dedicated to SLAC’s BaBar experiment, was down. Les Cottrell, SLAC’s manager of networking and telecommunications, went with network architect Antonio Ceseracciu and technical coordinator Ron Barrett to investigate and get the system back up March 29, 2012 and running as fast as possible. The power was “A pine cone was tucked away in the lower- on, so the problem was somewhere in the network right corner of the cable box junction behind equipment or cables. To determine the precise one of the data servers. There, two cables were location, Ceseracciu ran a test that sends a pulse of chewed up, taking down 252 computers for several hours last weekend.” light to the far end of the cable. The pulse travels Photos by Les Cottrell down to the place where the cable is broken and returns. By measuring how long this takes – much as a bat measures distance by using sound waves for echolocation – they ascertained that the damaged area was 15 meters down the 100-meter cable…” https://news.slac.stanford.edu/image/squirrel-was-here
  • 42.
  • 43. “Faster than light neutrinos: Heads roll“ March 30, 2012 “If you follow science at all (and maybe even if you don’t), you probably heard last year that scientists had discovered neutrinos that travelled faster than light… If true, this would be a big deal, knocking out laws of physics and causing dear doctor Einstein to roll in his grave, etc. What most physicists said at the time was something like, ‘Well, if it is true, then it’s a wonderful surprise, but it’s probably not true.’ It wasn’t true. It turned out that a faulty optical cable connection had affected the GPS readings and thereby the speed of light calculations. Today (March 30, 2012) it was announced that two leaders of the OPERA consortium, which conducted the original experiment, resigned following a vote of no- confidence. Thus, unlike in some other kinds of disasters – say financial collapse – scientists are willing and able to mete out consequences. “ http://scitechstory.com/2012/03/30/faster-than-light-neutrinos-heads-roll/
  • 44. Competent Calibration and Deployment
  • 45.
  • 46.
  • 47.
  • 48. “Keeping Raw Data in Context” “…any initiative to share raw clinical research data must also pay close attention to sharing clear and complete information about the design of the original studies. Relying on journal articles for study design information is problematic, for three reasons. First, journal articles often provide insufficient detail when describing key study design features such as randomization (1) and intervention details (2). Second, some data sets may come from studies with no publications [only 21% of oncology trials registered in ClinicalTrials.gov before 2004 and completed by September 2007 were published (3)]. Finally, investigators cannot reliably search journal articles for methodological concepts like “double blinding” or “interrupted time series,” crucial concepts for proper interpretation of the data. A mishmash of non- standardized databases of raw results and unevenly reported study designs is not a strong foundation for clinical research data sharing. “ “ We believe that the effective sharing of clinical research data requires the establishment of an interoperable federated database system that includes both study design and results data. A key component of this system is a logical model of clinical study characteristics in which all the data elements are standardized to controlled vocabularies and common ontologies to facilitate cross-study comparison and synthesis. “ I Sim, et al. “Keeping Raw Data in Context”[letter] Science v 323 6 Feb 2009, p713.
  • 49. Provenance and Workflow Management SEE ALSO: VizTrails,
  • 50. Expert Competence “Competence” “Involvement” D. J. Meltzer, “Folsom: New Archaeological Investigations of a Classic Paleoindian Bison Kill” Univ of California Press, 2006.
  • 52. 8”
  • 53. objet trouvé – gutter, 10th& Colorado, Santa Monica, California
  • 54. Losses of Integrity: Data Degraded by successive transformations Data transformations and the risks of loss of integrity -- (we must fully analyze the etiology of data degradation!)
  • 55. “It is well known that cartographic coordinates stored in double precision are far more precisely specified than is merited by their accuracy, even for highly-accurate global datasets. Far more coordinate digit places are stored for the sake of avoiding machine error than are needed to define the location of map objects within the necessary tolerances for both absolute and relative accuracies.” “A careful look at the coordinate digits stored as double precision variables in a GIS yields a variety of interesting patterns that are a result of previous machine error, rounding error, measurement error, and so forth. Any slight cartographic alteration (rotation/skewing, clipping/sub-setting, reprojecting, etc.) can add noise into the coordinate and can be used to characterize a vector dataset.” Rice, Matt, Michael F. Goodchild, Keith C. Clarke (2005) "Cartographic Data Precision and Information Content". In Proceedings of Auto-Carto 2005: A Research Symposium. Las Vegas, Nevada, March 18-23, 2005.
  • 56. “Most commonly, computer scientists are concerned with digital objects that are defined as a set of sequences of bits. One can then ask computationally based questions about whether one has the correct set of sequences of bits, such as whether the digital object in one's possession is the same as that which some entity published under a specific identifier at a specific point in time… However, this is a simplistic notion. There are additional factors to consider.” *!+ Clifford Lynch, “Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust,” http://www.clir.org/pubs/reports/pub92/lynch.html
  • 57. “Canonical” Data? • In the case of paradigm-shifting scientific discoveries – all supporting evidence must (and will) be held to an exacting, rigorously precise standard • This is also true of scientific assertions that have major economic impacts – for example, climate change…
  • 58. “…the “validation” of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject. “ – Tom Moritz “The Burden of Proof” GRDI2020 Position Paper October 23-24, 2010
  • 59. “Unstructured (or Weakly Structured) Data” ???
  • 60. DARWIN http://darwin- online.org.uk/converted/published/1975_NaturalSelection_F15 83/1975_NaturalSelection_F1583_fig03.jpg http://www.nyu.edu/projects/materialworld/images/1_ Darwin%20Tree%20B%2036.jpg
  • 61. FIELD NOTES FROM THE AMERICAN MUSEM CONGO EXPEDITION 1909-1915 http://diglib1.amnh.org/cgi-bin/database/index.cgi
  • 62. Rheinardia ocellata, the Crested Argus. Photographed at night by an automatic camera-trap in the Ngoc Linh foothills (Quang Nam Province). Courtesy AMNH Center for Biodiversity and Conservation
  • 63. Kirtland’s Warbler / Abaco Island, The Bahamas
  • 64.
  • 65. “NATIVE” METADATA DEAD HARBOR SEAL and 5 CALIFORNIA CONDORS
  • 66. Field sketch by Professor OT Hayward, Baylor University “Guadalupe Trip” Friday November 13, 1981
  • 67. Treating unstructured data? • Careful analysis to detect elements of emergent structure • Systematic use of inference and recursion to attain optimal efficiencies • Assumption that description will be an additive process not a single event
  • 68. Thanks! Tom Moritz tom.moritz@gmail.com