University of California, Berkeley: iSchool Nov, 2009
1. Hinges and Loops? -- Data as Evidence
I-School UC, Berkeley
November 13, 2009
“Vertical section drawing of Cavendish's torsion balance instrument including the building in which it was housed.” http://en.wikipedia.org/wiki/Cavendish_experiment
2. “Othello:
‘Villain, be sure thou prove my love a whore;
Be sure of it; give me the ocular proof;
Or by the worth of man’s eternal soul,
Thou hadst been better have been born a dog
Than answer my waked wrath!’
Iago: ‘Is’t come to this?’
Othello: ‘Make me to see’t; or at the least so prove it,
That the probation bear no hinge nor loop
To hang a doubt on; or woe upon thy life!’”
The Tragedy of Othello, the Moor of Venice (Act 3, Scene 3)
3. “So the universe has always appeared to the natural mind as a kind of enigma, of which the key must be sought in the shape of some illuminating or power-bringing word or name. That word names the universe's principle, and to possess it is, after a fashion, to possess the universe itself. 'God,' 'Matter,' 'Reason,' 'the Absolute,' 'Energy,' are so many solving names. You can rest when you have them. You are at the end of your metaphysical quest.”
William James. "What Pragmatism Means". Lecture 2 in Pragmatism: A New Name for Some Old Ways of Thinking. New York: Longmans, Green and Co. (1922): 52-52.
http://www.archive.org/stream/pragmatismnewnam00jame
5. Clear definitions are good (!)
We should not reflexively rely on metaphysical
“solving” / “power-bringing” words…
ADD to James’s list?:
“Knowledge”
“Information”
“Data” ???
8. Usage
Data: The word data is the Latin plural of datum, neuter past participle
of dare, "to give", hence "something given".
“ Data leads a life of its own quite independent of datum, of which it
was originally the plural. It occurs in two constructions: as a plural
noun (like earnings), taking a plural verb and plural modifiers (as
these, many, a few) but not cardinal numbers, and serving as a
referent for plural pronouns; and as an abstract mass noun (like
information), taking a singular verb and singular modifiers (as this,
much, little), and being referred to by a singular pronoun. Both
constructions are standard. The plural construction is more common
in print, perhaps because the house style of some publishers
mandates it.”
The Merriam-Webster Online Dictionary
http://www.merriam-webster.com/dictionary/data
9. “Data” ? [technological]
“…’data’ are defined as any information that can be stored in
digital form and accessed electronically, including, but not
limited to, numeric data, text, publications, sensor streams,
video, audio, algorithms, software, models and simulations,
images, etc.” -- Program Solicitation 07-601
“Sustainable Digital Data Preservation and Access Network Partners (DataNet)”
Taken in this broadest possible sense, “data” are thus simply
electronic coded forms of information. And virtually anything
can be represented as “data” so long as it is electronically
machine-readable.
10. “The digital universe in 2007 — at 2.25 x 10^21 bits (281
exabytes or 281 billion gigabytes) — was 10% bigger than we
thought. The resizing comes as a result of faster growth in
cameras, digital TV shipments, and better understanding of
information replication.
“By 2011, the digital universe will be 10 times the size it was in
2006.
“As forecast, the amount of information created, captured, or
replicated exceeded available storage for the first time in
2007. Not all information created and transmitted gets
stored, but by 2011, almost half of the digital universe will not
have a permanent home.
“Fast-growing corners of the digital universe include those
related to digital TV, surveillance cameras, Internet access in
emerging countries, sensor-based applications, datacenters
supporting “cloud computing,” and social networks.
The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011 -- Executive
Summary. IDC Information and Data, March, 2008
http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
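The IDC figures can be cross-checked with simple unit arithmetic. This sketch assumes decimal (SI) prefixes, 1 GB = 10^9 bytes and 1 EB = 10^18 bytes, which appears to be the convention behind the "281 exabytes or 281 billion gigabytes" equivalence; with binary prefixes the result would come out noticeably smaller.

```python
# Convert IDC's 2007 estimate of the digital universe from bits to bytes.
# Assumption: decimal (SI) prefixes, 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BITS_PER_BYTE = 8
GB = 10**9
EB = 10**18

def bits_to_units(bits: float) -> dict:
    """Return the same quantity expressed in bytes, gigabytes and exabytes."""
    n_bytes = bits / BITS_PER_BYTE
    return {
        "bytes": n_bytes,
        "gigabytes": n_bytes / GB,
        "exabytes": n_bytes / EB,
    }

size_2007 = bits_to_units(2.25e21)
print(f"{size_2007['exabytes']:.0f} EB")                 # 281 EB
print(f"{size_2007['gigabytes'] / 1e9:.0f} billion GB")  # 281 billion GB
```

Dividing by 8 is the only step that matters: 2.25 x 10^21 bits is about 2.81 x 10^20 bytes, which is where both "281" figures come from.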
11. “The diversity of the digital universe can be seen in
the variability of file sizes, from 6 gigabyte
movies on DVD to 128-bit signals from RFID tags.
Because of the growth of VoIP, sensors, and RFID,
the number of electronic information
“containers” — files, images, packets, tag
contents — is growing 50% faster than the
number of gigabytes. The information created in
2011 will be contained in more than 20
quadrillion — 20 million billion — of such
containers, a tremendous management
challenge for both businesses and consumers.”
The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information
Growth through 2011 -- Executive Summary. IDC Information and Data, March, 2008
http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
12. “Data” [epistemic]
“Measurements, observations or descriptions of
a referent -- such as an individual, an event, a
specimen in a collection or an
excavated/surveyed object -- created or
collected through human interpretation
(whether directly “by hand” or through the
use of technologies)”
-- AnthroDPA Working Group on Metadata (May, 2009)
13. “The General Definition of Information (GDI)”
σ is an instance of information, understood as
semantic content, if and only if:
• (GDI.1) σ consists of one or more data;
• (GDI.2) the data in σ are well-formed;
• (GDI.3) the well-formed data in σ are meaningful.
Luciano Floridi <luciano.floridi@philosophy.ox.ac.uk> “Semantic Conceptions of Information”
(First published Wed Oct 5, 2005) Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/information-semantic/ [visited 11/12/09]
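The three GDI conditions can be read as a conjunction of tests. The toy sketch below is an illustration only, not Floridi's formalism: it takes calendar dates as the data, and both the "syntax" (a YYYY-MM-DD pattern) and the "semantics" (the string must name a real calendar day) are hypothetical choices made for the example.

```python
import re
from datetime import datetime
from typing import List

# Hypothetical syntax: a datum is well-formed if it matches YYYY-MM-DD.
WELL_FORMED = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_well_formed(datum: str) -> bool:
    return WELL_FORMED.fullmatch(datum) is not None

def is_meaningful(datum: str) -> bool:
    # Hypothetical semantics: the string must denote an actual calendar day.
    try:
        datetime.strptime(datum, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_information(sigma: List[str]) -> bool:
    """GDI.1: one or more data; GDI.2: well-formed; GDI.3: meaningful."""
    return (
        len(sigma) >= 1
        and all(is_well_formed(d) for d in sigma)
        and all(is_meaningful(d) for d in sigma)
    )

print(is_information(["2009-11-13"]))  # True
print(is_information(["2009-02-30"]))  # False: well-formed but not meaningful
print(is_information(["13/11/2009"]))  # False: not well-formed under this syntax
print(is_information([]))              # False: no data at all
```

The "2009-02-30" case is the useful one: it passes the syntactic test but fails the semantic one, which is exactly the gap between GDI.2 and GDI.3.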
14. “…with the corollary assumptions that they are objective -- that is, not conditioned by subjective perspectives -- and invariant -- that is, true under all circumstances.”
-- Draft GBIF DPFTG Report, 2009
SEE: R. Nozick, Invariances: The Structure of the Objective World, Harvard
University Press, Cambridge, 2001. AND L. Daston and P. Galison, Objectivity,
Zone Books, NY, 2007.
15. The Diaphoric Definition of Data (DDD):
“According to GDI, information cannot be dataless but, in the simplest case, it can consist of a single
datum. Now a datum is reducible to just a lack of uniformity (diaphora is the Greek word for
“difference”), so a general definition of a datum is:
The Diaphoric Definition of Data (DDD): A datum is a putative fact regarding some difference or
lack of uniformity within some context.
“Depending on philosophical inclinations, DDD can be applied at three levels:
1. data as diaphora de re, that is, as lacks of uniformity in the real world out there. There is no
specific name for such “data in the wild”. A possible suggestion is to refer to them as dedomena
(“data” in Greek; note that our word “data” comes from the Latin translation of a work by
Euclid entitled Dedomena). Dedomena are not to be confused with environmental data (see
section 1.7.1). They are pure data or proto-epistemic data, that is, data before they are
epistemically interpreted. As “fractures in the fabric of being” they can only be posited as an
external anchor of our information, for dedomena are never accessed or elaborated
independently of a level of abstraction (more on this in section 3.2.2). They can be
reconstructed as ontological requirements, like Kant's noumena or Locke's substance: they are
not epistemically experienced but their presence is empirically inferred from (and required by)
experience. Of course, no example can be provided, but dedomena are whatever lack of
uniformity in the world is the source of (what looks to information systems like us as) data,
e.g., a red light against a dark background. Note that the point here is not to argue for the
existence of such pure data in the wild, but to provide a distinction that (in section 1.6) will help
to clarify why some philosophers have been able to accept the thesis that there can be no
information without data representation while rejecting the thesis that information requires
physical implementation; …”
16. The Diaphoric Definition of Data (DDD): (cont.)
“2. data as diaphora de signo, that is, lacks of uniformity between (the perception of) at least two
physical states, such as a higher or lower charge in a battery, a variable electrical signal in a
telephone conversation, or the dot and the line in the Morse alphabet; and
3. data as diaphora de dicto, that is, lacks of uniformity between two symbols, for example the
letters A and B in the Latin alphabet.”
Luciano Floridi <luciano.floridi@philosophy.ox.ac.uk> “Semantic Conceptions of Information”
(First published Wed Oct 5, 2005) Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/entries/information-semantic/ [visited 11/12/09]
17. “Evidence”?
“Data having probative value and authority”?
i.e. well supported by scientific logic and considered
technically valid
19. Political Power and Knowledge
[Figure: vertical axis "Responsibility and Power" (low to high); horizontal axis "Knowledge (in Western-scientific terms)" (low to high); groups shown: ???; Politicians; Administrators or Managers; Analysts/Technicians; Scientists]
(Sutton, 1999)
From: Organizaciones que aprenden, países que aprenden: lecciones y AP en Costa Rica [Learning organizations, learning countries: lessons and AP in Costa Rica] by Andrea Ballestero, Director, ELAP
20. Wednesday, January 21st, 2009 at 12:00 am
MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND
AGENCIES
SUBJECT: Freedom of Information Act
A democracy requires accountability, and accountability requires transparency. As Justice Louis
Brandeis wrote, "sunlight is said to be the best of disinfectants." In our democracy, the Freedom
of Information Act (FOIA), which encourages accountability through transparency, is the most
prominent expression of a profound national commitment to ensuring an open Government. At the
heart of that commitment is the idea that accountability is in the interest of the Government and
the citizenry alike.
The Freedom of Information Act should be administered with a clear presumption: In the face of
doubt, openness prevails. The Government should not keep information confidential merely
because public officials might be embarrassed by disclosure, because errors and failures might
be revealed, or because of speculative or abstract fears. Nondisclosure should never be based on
an effort to protect the personal interests of Government officials at the expense of those they are
supposed to serve. In responding to requests under the FOIA, executive branch agencies
(agencies) should act promptly and in a spirit of cooperation, recognizing that such agencies are
servants of the public.
All agencies should adopt a presumption in favor of disclosure, in order to renew their
commitment to the principles embodied in FOIA, and to usher in a new era of open Government.
The presumption of disclosure should be applied to all decisions involving FOIA…[clip]
Barack Obama
http://www.whitehouse.gov/the_press_office/Freedom_of_Information_Act/
21. “Declaration of Scientific Principles”
in “The Commonwealth of Science”
“7. The pursuit of scientific inquiry demands
complete intellectual freedom. And
unrestricted international exchange of
knowledge…“
from “The Commonwealth of Science,” Nature No. 3753,
October 4, 1941.
22. August 4, 2009: the White House issued a
memorandum stating unequivocally “Sound
science should inform policy decisions”
“Science and Technology Priorities for the FY2011 Budget,” PR Orszag and
JP Holdren August 4, 2009, Memorandum for the Heads of Executive
Departments and Agencies, M-09-27.
http://www.whitehouse.gov/omb/assets/memoranda_fy2009/m09-27.pdf
23. The $3.6 billion Large Hadron Collider
(LHC) will sample and record the
results of up to 600 million proton
collisions per second, producing
roughly 15 petabytes (15 million
gigabytes) of data annually in search of
new fundamental particles. To allow
thousands of scientists from around the
globe to collaborate on the analysis of
these data over the next 15 years (the
estimated lifetime of the LHC), tens of
thousands of computers located around
the world are being harnessed in a
distributed computing network called
the Grid. Within the Grid, described as
the most powerful supercomputer
system in the world, the avalanche of
data will be analyzed, shared, re-
purposed and combined in innovative
new ways designed to reveal the
secrets of the fundamental properties
of matter.
LHC source:
http://public.web.cern.ch/public/en/LHC/L
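The LHC figures lend themselves to a quick back-of-envelope check. This sketch assumes decimal prefixes (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a constant annual output over the 15-year lifetime; the source does not claim a constant rate, so the lifetime total is only a rough scale estimate.

```python
# Back-of-envelope check of the LHC data volumes quoted above.
# Assumptions: decimal prefixes, and a constant output rate over the lifetime.
PB = 10**15
GB = 10**9

annual_output_pb = 15   # "roughly 15 petabytes ... of data annually"
lifetime_years = 15     # "the estimated lifetime of the LHC"

annual_output_gb = annual_output_pb * PB // GB
lifetime_output_pb = annual_output_pb * lifetime_years

print(f"{annual_output_gb:,} GB per year")                       # 15,000,000 GB per year
print(f"~{lifetime_output_pb} PB over {lifetime_years} years")   # ~225 PB over 15 years
```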
25. “The Legacy of GenBank: The DNA Sequence Database That Set a Precedent,” 1663: Los
Alamos Science and Technology Magazine August 2008
http://www.lanl.gov/news/1663/images/aug08/22lg.jpg
26. The (US) NCAR
Research Data Archive (RDA)
“The NCAR Research Data Archive (RDA) is a comparatively small
(currently 246 TB, less than 5% of the MSS [Mass Storage System] total
size), but very important, part of the MSS stored data. The RDA has
been curated by the staff in the Computational and Information
Systems Laboratory for over 40 years, [emphasis added] and as such
contains reference datasets used by large numbers of scientists.
The RDA contents are long-term atmospheric (surface and upper
air) and oceanographic observations, grid analyses of observational
datasets, operational weather prediction model output, reanalyses,
satellite derived datasets, and ancillary datasets, such as
topography/bathymetry, vegetation, and land use. The RDA is not
a static collection; it is now over 580 datasets with about 100
routinely updated and 10-20 new ones added each year. “
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 5.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
27. NCAR Research Data Archive (RDA)
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 7.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
29. Facebook?
Facebook, for example, uses more than 1
petabyte of storage space to manage its users’
40 billion photos. (A petabyte is about 1,000
times as large as a terabyte, and could store
about 500 billion pages of text.)
Training to Climb an Everest of Digital Data
By ASHLEE VANCE NYT Published: October 11,
2009
http://www.nytimes.com/2009/10/12/technology/12data.html?_r=1
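The NYT numbers imply average sizes that are easy to derive. This sketch takes "more than 1 petabyte" as exactly 1 PB, so the per-photo figure is a lower bound, and assumes decimal prefixes (1 PB = 10^15 bytes, 1 TB = 10^12 bytes, 1 KB = 10^3 bytes).

```python
# Average sizes implied by the NYT figures.
# Assumptions: decimal prefixes; "more than 1 petabyte" treated as exactly 1 PB,
# so the per-photo average is a lower bound.
PB = 10**15
TB = 10**12
KB = 10**3

photos = 40 * 10**9   # "40 billion photos"
pages = 500 * 10**9   # "about 500 billion pages of text" per petabyte

print(PB // TB)                        # 1000 terabytes per petabyte
print(PB / photos / KB, "KB per photo (at least)")
print(PB / pages / KB, "KB per page of text")
```

On these assumptions each photo averages at least 25 KB and each page of text about 2 KB.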
32. “Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND A.S. Read
June 21, 1798 (From the Philosophical Transactions of the Royal Society of London for the year
1798, Part II. , pp. 469-526)
From: http://www.archive.org/details/lawsofgravitatio00mackrich
33. Data sets: some examples with “native metadata”
2-d_soil_temps.csv
surface and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to
calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer,
subsurface temperatures with a thermocouple.
----------------------------
5-minute_light_data_for_4_continuous_days_plus_reference.xls
PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes
calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along
the transect are receiving.
----------------------------
CO2_of_air_at_different_heights_July_9.xls
concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of
relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the
evening.
----------------------------
Fern_light_response.xls
Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light
levels and their instantaneous photosynthesis and conductance are measured. used in conjunction with the induction
data (below) for physiological characterization of the ferns.
----------------------------
La_Selva_species_photosyntheis_table.xls
incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a
shade house in Costa Rica.
----------------------------
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple
branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground
measures of root growth and CO2 production.
----------------------------
moisture_release_curves.xls
percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory
for calibration of water content with water potential. soil is from the James Reserve in California.
----------------------------
Photosynthetic_induction.xls
… m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
----------------------------
run_2_24-h_data_for_mesh.xls
measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the
forest for about 30 meters. Pyranometers facing up and down, pyrgeometer facing up and down, PAR, air temperature,
relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for
examining edge effects in forests.
----------------------------
Segment_of_wallflower_compare_colorspaces_blur.xls
pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces.
segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in
images collected after this training data was collected (and used to determine the best color space for this task).
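The descriptions above carry their metadata as free text. A minimal sketch of how the same facts could be held in a structured record, so that gaps become visible, follows. The schema (`DatasetRecord` and its field names) is hypothetical and not a published metadata standard; the two entries paraphrase descriptions from the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class DatasetRecord:
    """Hypothetical minimal schema for the 'native metadata' described above."""
    filename: str
    variables: List[str]
    instrument: Optional[str] = None   # how the data were created
    location: Optional[str] = None     # where the measurements were made
    purpose: Optional[str] = None      # why the data were collected

    def missing(self) -> List[str]:
        """Names of fields that were left empty (None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

soil = DatasetRecord(
    filename="2-d_soil_temps.csv",
    variables=["surface soil temperature",
               "soil temperature at 2 cm", "soil temperature at 8 cm"],
    instrument="infrared thermometer (surface); thermocouple (subsurface)",
    purpose="calibrate a model of temperature propagation",
)
la_selva = DatasetRecord(
    filename="La_Selva_species_photosyntheis_table.xls",
    variables=["instantaneous photosynthesis rate"],
    location="shade house in Costa Rica",
)

print(soil.missing())      # ['location']
print(la_selva.missing())  # ['instrument', 'purpose']
```

Holding the descriptions this way does not add any information. It does make it mechanical to spot, for example, that the soil-temperature file never says where "one location" was.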
35. “Jim Gray on eScience: A Transformed Scientific Method,” T. Hey, S. Tansley, and K. Tolle (eds.), Microsoft Research. Based on the transcript of a talk given by Jim Gray to the NRC-CSTB in Mountain View, CA, on
January 11, 2007
http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_jim_gray_transcript.pdf
36. “Reanalyses” [or Meta-Analyses]
“Atmospheric reanalyses are a main feature within the RDA and were
intended to be, and have become, a very valuable data resource
for a wide variety of climate and weather studies. By combining
many types of atmospheric observations with advanced data
assimilation and forecast models a “best possible” 3D estimate of
the atmospheric state over extended time periods is achieved.
“Reanalyses are supported by many historical data sources that have
been curated over time. As an illustration the major sources of
atmospheric profile data include wind only soundings beginning in
1920 (Figure 2). These are augmented with soundings of
temperature, humidity, and wind beginning in 1948. “
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 6.
www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
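Reanalysis combines observations with model forecasts. The simplest textbook form of that combination is a scalar inverse-variance update, the one-variable case of optimal interpolation and the Kalman filter. The sketch below illustrates that general idea only. It is not the assimilation scheme of NCAR or any particular reanalysis, and the temperatures and variances are hypothetical.

```python
from typing import Tuple

def assimilate(forecast: float, forecast_var: float,
               obs: float, obs_var: float) -> Tuple[float, float]:
    """Combine a model forecast and an observation of the same quantity.

    Each estimate is weighted by the inverse of its error variance; the
    resulting 'analysis' has a smaller variance than either input.
    """
    gain = forecast_var / (forecast_var + obs_var)  # weight given to the observation
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1 - gain) * forecast_var
    return analysis, analysis_var

# Hypothetical values: a forecast of 16.0 °C (error variance 3.0) and a
# sounding that reads 12.0 °C (error variance 1.0).
t, var = assimilate(16.0, 3.0, 12.0, 1.0)
print(t, var)  # 13.0 0.75
```

Because the observation is the more certain of the two here, the analysis lands closer to it. Its variance, 0.75, is also lower than either input, which is the sense in which combining sources yields a "best possible" estimate.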
37. Fundamental Questions:
• Data Specification – scientific logic of data
definition
• Data Creation – specification of methodology
• Data Integrity – preservation -- “chain of custody”: “Chain of custody refers to the chronological documentation or paper trail, showing the seizure, custody, control, transfer, analysis, and disposition of evidence, physical or electronic.” [http://en.wikipedia.org/wiki/Chain_of_custody, clipped 11/12/09 10:30pm PST]
• Data transformations
– Logic
– Competence /Technical Performance / Execution
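One way to carry the chain-of-custody idea over to digital data is to link each custody event to a hash of the previous one, so that editing any earlier record breaks every later link. Hash chaining is a technique chosen here for illustration; the slide does not prescribe it. This is a minimal sketch using SHA-256 from Python's standard library. The event fields and action names are hypothetical, and a real log would also record timestamps.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(event: Dict[str, str], prev_hash: str) -> str:
    """Hash an event together with the hash of the preceding entry."""
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: List[Dict[str, str]], actor: str, action: str) -> None:
    """Add a custody event linked to the last entry in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    event = {"actor": actor, "action": action}
    log.append({**event, "prev": prev_hash, "hash": _digest(event, prev_hash)})

def verify(log: List[Dict[str, str]]) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        event = {"actor": entry["actor"], "action": entry["action"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(event, prev_hash):
            return False
        prev_hash = entry["hash"]
    return True

log: List[Dict[str, str]] = []
append(log, "field technician", "collection")
append(log, "data manager", "transfer")
append(log, "analyst", "analysis")
print(verify(log))               # True
log[1]["actor"] = "someone else"
print(verify(log))               # False
```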
38. “Keeping Raw Data in Context”
“…any initiative to share raw clinical research data must also pay close attention to sharing clear
and complete information about the design of the original studies. Relying on journal articles
for study design information is problematic, for three reasons. First, journal articles often
provide insufficient detail when describing key study design features such as randomization
(1) and intervention details (2). Second, some data sets may come from studies with no
publications [only 21% of oncology trials registered in ClinicalTrials.gov before 2004 and
completed by September 2007 were published (3)]. Finally, investigators cannot reliably
search journal articles for methodological concepts like “double blinding” or “interrupted
time series,” crucial concepts for proper interpretation of the data. A mishmash of non-
standardized databases of raw results and unevenly reported study designs is not a strong
foundation for clinical research data sharing. “
“ We believe that the effective sharing of clinical research data requires the establishment of an
interoperable federated database system that includes both study design and results data. A
key component of this system is a logical model of clinical study characteristics in which all
the data elements are standardized to controlled vocabularies and common ontologies to
facilitate cross-study comparison and synthesis. “
I Sim, et al. “Keeping Raw Data in Context”[letter] Science v 323 6 Feb 2009, p713.
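The letter calls for study-design data elements "standardized to controlled vocabularies." A minimal sketch of that idea: each design field accepts only terms from a fixed list, so studies can be compared and searched on the same keys, for instance for "double blinding." The field names and term lists below are hypothetical illustrations. They are not taken from ClinicalTrials.gov or from any published ontology.

```python
from dataclasses import dataclass
from enum import Enum

class Allocation(Enum):
    RANDOMIZED = "randomized"
    NON_RANDOMIZED = "non-randomized"

class Masking(Enum):
    NONE = "none"
    SINGLE_BLIND = "single blind"
    DOUBLE_BLIND = "double blind"

@dataclass(frozen=True)
class StudyDesign:
    study_id: str
    allocation: Allocation
    masking: Masking

def parse_design(study_id: str, allocation: str, masking: str) -> StudyDesign:
    """Map free-text terms onto the controlled vocabulary.

    Terms outside the vocabulary raise ValueError instead of being stored.
    """
    return StudyDesign(study_id,
                       Allocation(allocation.strip().lower()),
                       Masking(masking.strip().lower()))

studies = [
    parse_design("A", "Randomized", "Double Blind"),
    parse_design("B", "randomized", "none"),
]
double_blind = [s.study_id for s in studies if s.masking is Masking.DOUBLE_BLIND]
print(double_blind)  # ['A']
```

Rejecting unknown terms at entry time is the design choice that matters. A misspelled or idiosyncratic label fails loudly rather than silently dropping the study out of every later cross-study query.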
39. “Increasing levels of coordinate digit noise
associated with repeated projection transformations”
Rice, Matt, Michael F. Goodchild, Keith C. Clarke (2005) "Cartographic Data Precision and Information
Content". In Proceedings of Auto-Carto 2005: A Research Symposium. Las Vegas, Nevada, March 18-23,
2005.
40. “It is well known that cartographic coordinates stored in double precision are
far more precisely specified than is merited by their accuracy, even for
highly-accurate global datasets. Far more coordinate digit places are stored
for the sake of avoiding machine error than are needed to define the location
of map objects within the necessary tolerances for both absolute and relative
accuracies.”
“A careful look at the coordinate digits stored as double precision variables
in a GIS yields a variety of interesting patterns that are a result of previous
machine error, rounding error, measurement error, and so forth. Any
slight cartographic alteration (rotation/skewing, clipping/sub-setting,
reprojecting, etc.) can add noise into the coordinate and can be used to
characterize a vector dataset.”
Rice, Matt, Michael F. Goodchild, Keith C. Clarke (2005) "Cartographic Data Precision and Information
Content". In Proceedings of Auto-Carto 2005: A Research Symposium. Las Vegas, Nevada, March 18-23,
2005.
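The effect Rice, Goodchild and Clarke describe can be imitated in a few lines. Apply a transformation and its exact inverse repeatedly: in real arithmetic the coordinates return unchanged, but in double precision the trailing digits may pick up rounding noise. Here a plane rotation stands in for a map projection, which is an assumption for illustration only, since real reprojection formulas are more complex. The coordinates are hypothetical.

```python
import math

def rotate(x: float, y: float, theta: float) -> tuple:
    """Rotate the point (x, y) about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c

def round_trips(x: float, y: float, theta: float, n: int) -> tuple:
    """Rotate forward and back n times; exact arithmetic would return (x, y)."""
    for _ in range(n):
        x, y = rotate(x, y, theta)
        x, y = rotate(x, y, -theta)
    return x, y

x0, y0 = 552134.125, 4182345.5   # hypothetical projected coordinates, in metres
for n in (0, 1, 10, 1000):
    x, y = round_trips(x0, y0, 0.3, n)
    # repr() shows every stored digit, so any drift appears in the trailing places
    print(n, repr(x), abs(x - x0))
```

Any drift that appears sits many orders of magnitude below a millimetre. That is consistent with the authors' point that double precision stores far more digits than the data's accuracy merits, and that the trailing digits therefore record processing history rather than location.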
41. [Figure: scale from “Small Science” to “BIG Science”; labels: GRIDS; International Data Centers; Collaborative Research Effort; Individual; National Disciplinary Initiatives; Libraries; Cooperative Projects; Local / Individuals; Personal Archiving]
43. The “small science,” independent investigator approach traditionally has
characterized a large area of experimental laboratory sciences, such as
chemistry or biomedical research, and field work and studies, such as
biodiversity, ecology, microbiology, soil science, and anthropology. The data
or samples are collected and analyzed independently, and the resulting data
sets from such studies generally are heterogeneous and unstandardized, with
few of the individual data holdings deposited in public data repositories or
openly shared.
The data exist in various twilight states of accessibility, depending on
the extent to which they are published, discussed in papers but not revealed, or
just known about because of reputation or ongoing work, but kept under
absolute or relative secrecy. The data are thus disaggregated components of
an incipient network that is only as effective as the individual transactions
that put it together. Openness and sharing are not ignored, but they are not
necessarily dominant either. These values must compete with strategic
considerations of self-interest, secrecy, and the logic of mutually beneficial
exchange, particularly in areas of research in which commercial applications
are more readily identifiable.
The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie
M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the
Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific
Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 8
44. Maria Sibylla Merian Metamorphosis
insectorum Surinamensium
(Metamorphosis of the Insects of
Surinam) Amsterdam, 1705, figure 46
Hand-colored engraving (123)
http://www.loc.gov/exhibits/dres/dre123.jpg
49. Rheinardia ocellata, the Crested Argus. Photographed at night by an
automatic camera-trap in the Ngoc Linh foothills (Quang Nam Province).
Courtesy AMNH Center for Biodiversity and Conservation
51. By Serge Bloch in NYT: Natalie Angier, “Tracking forest creatures on the move.” NYT Feb 2, 2009 SEE:
http://www.nytimes.com/2009/02/03/science/03angier.html?_r=1&scp=1&sq=tracking%20mammals&st=cse
http://www.jamesreserve.edu/webcams.lasso?CameraID=Cam14
56. Green, T (2009), “We Need Publishing Standards for
Datasets and Data Tables”, OECD Publishing White Paper,
OECD Publishing. doi: 10.1787/603233448430
http://dx.doi.org/10.1787/603233448430
http://ocde.p4.siteinternet.com/publications/doifiles/publishing-standards-data-2009.pdf
59. US NSF “DataNet” Program
“the full data preservation and access lifecycle”
• “acquisition”
• “documentation”
• “protection”
• “access”
• “analysis and dissemination”
• “migration”
• “disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601, US National Science Foundation Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering
61. How Do We Incentivize Change?
• Individuals
• Professions / Disciplines
• Organizations
• Institutions (Universities, Research Institutes,
Museums, Gardens, Herbaria, Aquariums, Zoos)
• “Memory Institutions” (Libraries, Archives)
• Governments
• Funders / Sponsors
• Publishers!
62. Individual’s willingness to share:
the Core functions of Scholarly Communication
• “Registration, which allows claims of precedence for a
scholarly finding.
• “Certification, which establishes the validity of a registered
scholarly claim.
• “Awareness, which allows participants in the scholarly system
to remain aware of new claims and findings.
• “Archiving, which preserves the scholarly record over time.
• “Rewarding, which rewards participants for their
performance in the communication system based on metrics
derived from that system.
Roosendaal, H., Geurts, P. In Cooperative Research Information Systems in Physics (Oldenburg, Germany, 1997).
64. • Norms and standards for sharing vary by discipline
• In “big science” (astrophysics / astronomy /
meteorology / oceanography / genomics) sharing is
expected (if not required) and contributions to a
common fund of knowledge are assumed (See also:
GENBANK)
– Standards are relatively clear
– Mechanisms for sharing are well-developed
– Collective / collaborative authorship is commonplace
• In “small science” such norms are weaker
65. Small Science: Data Deposit and Access
• Data are typically held in many formats
• Discovery of data is very weakly supported by
standards-development
• Access to and use of data are highly variable
• [ However progress has been made respecting
museum specimen data in the past 20 years [SEE for
ex. : GBIF and many allied projects] ]
• Some progress has been made respecting
observational and other data
• Ecological and conservation field data remain highly
problematic
66. Some suggestions for action include:
• government agencies and private foundations must both set strict
requirements for effective sharing – with serious penalties (such as
disqualification for future research funding) for failures to share;
• peer review processes must include rigorous scrutiny of past histories of
sharing and must require state-of-the-art planning for sharing (not simply
a promise to “put data up on the Web”);
• negotiations for “overhead” (“indirect costs”) compensation from funders
must include examination of digital infrastructure adequate for sharing
and maintenance of data;
• accreditation bodies for educational institutions and museums must start
to require demonstrated evidence of capacity to support digital access
and maintenance of data;
• professional societies and professional disciplines must begin to require
evidence of effective sharing of data in evaluating credentials for hiring,
promotion and tenure;
71. From: Tom Moritz [mailto:tom.moritz@gmail.com]
Sent: Thursday, November 12, 2009 1:46 AM
To: Donat Agosti
Subject: Snapple Real Fact #134: " An ant can lift 50 times its own weight. ”
Is this true?
Tom
________________________________________________
From: Donat Agosti <agosti@amnh.org>
Date: Wed, Nov 11, 2009 at 8:03 PM
Subject: RE: Snapple Real Fact #134: " An ant can lift 50 times its own weight. "
To: Tom Moritz tom.moritz@gmail.com
People says so [emphasis added] – but we once looked for the evidence, but
could not find a scientific paper confirming this.
D
72. Iobi Ludolfi aliàs Leut-holf dicti
Historia Æthiopica, sive Brevis
& succincta descriptio regni
Habessinorum, quod vulgò
malè Presbyteri Iohannis
vocatur : 2009 Cambridge
University Library
"They [the hippopotami] present the following appearance; four-
footed, with cloven hooves like cattle; blunt-nosed; with a
horse's mane, visible tusks, a horse's tail and voice; big as the
biggest bull. Their hide is so thick that, when it is dried,
spearshafts are made of it.” Herodotus, The Histories (with an English translation by A. D.
Godley). Cambridge. Harvard University Press. 1920. LXXI
http://old.perseus.tufts.edu/cgi-bin/ptext?doc=Perseus%3Aabo%3Atlg%2C0016%2C001&query=2%3A71%3A1
[clipped 11/12/09]
73. a problem with “evidence”…
“…the great trouble with the world was that
which survived was held in hard evidence as
to past events. A false authority clung to what
persisted, as if those artifacts of the past
which had endured had done so by some act
of their own will.”
-- Cormac McCarthy The Crossing
74. “Πάντα ῥεῖ καὶ οὐδὲν μένει”
Heraclitus: “Everything flows, nothing stands still.”
All data is dynamic
75. From examination of elephants’
skulls the early Greeks deduced
that a species of humanoid
Cyclops existed…
(SEE -- for example -- The
Odyssey and Ulysses’ encounter
with Polyphemus on the island of
Sicily… )
http://www.amnh.org/exhibitions/mythiccreatures/land/greek.php
76. Another deduction from the evidence of narwhal tusks…
“In the Middle Ages, narwhal tusks were widely thought to be unicorn horns
with magical, curative properties. Indeed, cups made from narwhal tusks
(above) were thought to neutralize poisons and were highly valued. “
http://www.amnh.org/exhibitions/mythiccreatures/land/unicorns.php