Government open data strategies aimed at wider access and re-use by entrepreneurs, publishers and the wider US healthcare delivery industry. Presentation to the OMG Standards Community technical workshop on semantics, held in Reston VA on 20-March 2013. Presentation by Bernadette Hyland, CEO 3 Round Stones, Inc and co-chair W3C Government Linked Data Working Group.
The Power of Linked Data for Government & Healthcare Information Integration
1. The Power of Linked Data for
Government and Healthcare
Information Integration
By Bernadette Hyland
CEO 3 Round Stones, co-chair W3C Gov’t Linked Data WG
This presentation on http://slideshare.net/3roundstones
OMG Technical Meeting Special Event, Reston VA
20-Mar-2013
Wednesday, March 20, 13 1
2. Agenda
• Government data publication on the Web
• Update on EPA Linked Data Service
• Healthcare Delivery Industry’s Appetite
• Update on W3C Government Linked Data Working Group
3. 3 Round Stones produces the leading platform for
the publication of reusable data on the Web. Our
commercially supported Open Source platform is
used by the Fortune 2000 and US Government
agencies to collect, publish and reuse data, both on
the public Internet and behind institutional firewalls.
4. http://www.manning.com/dwood/
http://3roundstones.com/linking-government-data/
http://3roundstones.com/linking-enterprise-data/
5. US EPA Linked Data
• Cloud-based Linked Data provision of 3 core programs:
• 2.9M Facilities
• 100K substances
• 25 years of toxic pollution reports
• FISMA compliant
• 16 Callimachus templates
• Official launch April 2013
6. US GPO
• Cloud-based Linked Data provision of persistent URLs for US Government documents:
• 100k+ documents
• Used by 1,240 Federal Depository Libraries and public
• In 3rd year of operation
• Deemed an Essential service supporting US Congress
11. Growing chorus ...
“We’re moving from managing
documents to managing discrete pieces of
open data and content which can be
tagged, shared, secured, mashed up and
presented in the way that is most useful
for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to
Better Serve the American People
16. Open data + open standards +
open platforms
Highly scalable computing on the Cloud
Open Web Standards
5 Star Data (Linked Data), whenever possible
Leverage Open Source tools where practical
17. Use a non-proprietary format
• Open Web data exchange formats
• RDF instead of CSV
• Benefits
• Accessibility, Interoperability & Re-use
• Reduces the risks of
• “Super model” data warehouse approach
• Budget & schedule overruns
• Confidential info leakage
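The "RDF instead of CSV" point above can be illustrated with a minimal sketch that turns CSV rows into N-Triples lines. The base URI, vocabulary URI, column names and sample rows are all hypothetical and are not taken from any EPA or GPO dataset; a production conversion would normally use a dedicated RDF library and an agreed vocabulary.

```python
import csv
import io

# Hypothetical base URI and vocabulary, used only for illustration.
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"

# Made-up sample data standing in for a CSV export.
CSV_TEXT = """id,name,state
110000001,Example Plant,VA
110000002,Sample Works,MD
"""

def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def csv_to_ntriples(text: str) -> list[str]:
    """Emit one N-Triples line per non-key column of every CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # The row's key column becomes a URI, so others can reference the facility.
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            triples.append(f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .')
    return triples

ntriples = csv_to_ntriples(CSV_TEXT)
print("\n".join(ntriples))
```

Unlike the CSV row, each resulting statement carries its own subject and property identifiers, so it can be combined with statements from other sources without a shared table layout.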
19. Universal Identifiers
• It’s the foundation of the Web
• Others can reference things
• Two references with the same URI are the same thing
• Quick, easy and scalable
• People keep coming back for more!!
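As a rough illustration of "two references with the same URI are the same thing", the sketch below merges two independently published datasets simply by grouping statements on their subject URI. The URIs, property names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical facility URI shared by two independent publishers.
FACILITY = "http://example.org/facility/110000001"

# Each dataset is a list of (subject, predicate, object) statements.
facility_registry = [
    (FACILITY, "name", "Example Plant"),
    (FACILITY, "state", "VA"),
]
release_reports = [
    (FACILITY, "reportedSubstance", "http://example.org/substance/benzene"),
]

def merge(*datasets):
    """Group statements from any number of datasets by their subject URI."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            merged[subject].setdefault(predicate, []).append(obj)
    return merged

combined = merge(facility_registry, release_reports)
print(combined[FACILITY])
```

No join key had to be negotiated between the two publishers; agreeing on the identifier was enough to integrate the data.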
24. A Path to Success
• Start with the basics
• Well curated datasets with relevant data
• Integrate related datasets (e.g., EPA chemical substances, toxic releases & facilities)
• Reach out to developers early
• Emphasize the internal agency benefit
• Address data quality ...
• Multiple approaches including crowdsourcing
25. Social responsibility of
government publishers
• Must specify a license for use
• Publish frequency of data updates
• Ensure data is as accurate as possible
• Recognize responsibility to maintain data
• Document & follow a persistence strategy
• Respond to reports of problematic data
26. Callimachus
http://callimachusproject.org
http://3roundstones.com
27. [Diagram labels: Content Management System; Linked Data Management System; Callimachus; Structured / Unstructured; Data / Text]
43. Potential Audience
• Middle school student doing a science project
• Concerned citizen worried about local pollution
• Environmental Science PhD from EPA
• Doctor from NIH writing a research paper
44. Active PURLs for Clinical Study Aggregation
David Wood (david@3roundstones.com) and Tom Plasterer (Tom.Plasterer@astrazeneca.com)
The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
The solution: Gather, convert, aggregate and format for display
3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus
Project, an Open Source management system for Linked Data.
Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable
PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs.
Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's
network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is
dynamically transformed into Resource Description Framework (RDF) formats and all sources' results then merged into a single, temporary graph of RDF data.
Information is rendered to end users as coordinated HTML descriptions regarding each clinical trial using the Callimachus template engine. Machine-readable
versions of the data are also available.
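The resolution flow described above (multiple targets, per-target transformation, one merged temporary graph) can be sketched roughly as follows. This is not the Callimachus implementation: the fetch functions are stand-ins for HTTP endpoints, and the study URI, vocabulary and payloads are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifiers; a real Active PURL would resolve HTTP endpoints.
STUDY_URI = "http://example.org/purl/study/123"
VOCAB = "http://example.org/vocab#"

def fetch_registry(_study_uri):
    """Stand-in for a target that returns XML."""
    return "<study><title>Example Trial</title><phase>2</phase></study>"

def fetch_notes(_study_uri):
    """Stand-in for a target that returns plain text."""
    return "status: recruiting"

def xml_to_triples(subject, payload):
    """Map each child element of the XML root to one triple."""
    root = ET.fromstring(payload)
    return {(subject, VOCAB + child.tag, (child.text or "").strip()) for child in root}

def text_to_triples(subject, payload):
    """Map each 'key: value' line of a text payload to one triple."""
    triples = set()
    for line in payload.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            triples.add((subject, VOCAB + key.strip(), value.strip()))
    return triples

# Each target pairs a fetcher with the transformation for its result format.
TARGETS = [(fetch_registry, xml_to_triples), (fetch_notes, text_to_triples)]

def resolve_active_purl(study_uri):
    """Query every target independently and merge results into one temporary graph."""
    graph = set()
    for fetch, transform in TARGETS:
        graph |= transform(study_uri, fetch(study_uri))
    return graph

graph = resolve_active_purl(STUDY_URI)
for triple in sorted(graph):
    print(triple)
```

Modelling the graph as a set of triples keeps the merge step trivial; rendering it to HTML through a template would be a separate step applied to the merged result.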
How semantic technologies help
Linked Data techniques can both improve the availability of clinical trial information and provide a means to build effective information systems using it.
Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed
enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing
researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
User experience Challenges
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
[Figure labels: User resolves a single URI to an Active PURL; Multiple targets queried independently; HTTP-accessible endpoints capable of returning XML or textual content; Convert XML or textual results to RDF; Render RDF to HTML via template]
Challenges
Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, auth/auth errors or other network failures can slow or stop a pipeline from returning correctly.
Similarly, distributed queries can result in variant query-time performance due to complex network and endpoint performance variances.
Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
References
1. Callimachus Project,
Next steps
We intend to continue to address
48. http://slideshare.com/3roundstones
Twitter: @BernHyland
Email: bhyland@3roundstones.com
Thank you for participating!!
49. Credits
Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects”, published 15 December 2011
David Newman
Linking Government Data, Springer (2011)
David Wood, ed.
http://3roundstones.com/linking-government-data/
Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People,
US Executive Branch
http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html
W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license
50. This work is Copyright © 2011-2012 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share - to copy, distribute and transmit the work
to Remix - to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse
you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same or similar license to this
one.