A datastore is a central system that offers data in a consistent place and provides important context and metadata about the data. It allows data owners to publish their data through various methods like email submissions, file dropboxes, data proxies, or data replication. The document also discusses challenges around building and maintaining datastores, as well as opportunities for engaging communities around open data.
2. Who am I?
• Lex Slaghuis, CEO @Wikiwise
– Computer Science and consulting background
@ajslaghu
– Quite engaged with open data
• Wikiwise
– Wikis, open content, open data, open collaboration and networking
3. A datastore is a system that
• Offers data in a central ‘place’
– Historical
– Current (although technically that is also historical)
– (Near) Realtime
• Offers context
– Descriptions, including how up to date the data is
– Contact information!!!
• Saves you from building a separate access point for each production site that is opened up
4. A datastore is not:
• A register
– A register only links to datastores or to datasets on websites
– Registers are still great for helping developers find data
– Developers probably want as few registers as possible
• But a datastore should expose data by means of (metadata) search, an indexable catalog and unique links for each dataset
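As an illustration of the last point, here is a minimal sketch of a catalog that offers metadata search, an indexable list of datasets and one unique link per dataset. All names (`Dataset`, `Catalog`, the `/dataset/<slug>` URL pattern) are assumptions made for this example, not part of any particular datastore product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Dataset:
    slug: str            # stable identifier used in the unique link (hypothetical)
    title: str
    description: str
    contact: str         # who to ask about the data
    last_updated: date   # how up to date the data is
    keywords: list[str] = field(default_factory=list)


class Catalog:
    """In-memory catalog; a real datastore would persist this somewhere."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        self._datasets: dict[str, Dataset] = {}

    def publish(self, dataset: Dataset) -> str:
        # Refuse duplicate slugs so every link keeps pointing at one dataset.
        if dataset.slug in self._datasets:
            raise ValueError(f"slug already in use: {dataset.slug}")
        self._datasets[dataset.slug] = dataset
        return self.link(dataset.slug)

    def link(self, slug: str) -> str:
        # Assumed URL pattern: <base>/dataset/<slug>
        return f"{self.base_url}/dataset/{slug}"

    def index(self) -> list[str]:
        # A flat, sorted list of links that crawlers and registers can index.
        return [self.link(slug) for slug in sorted(self._datasets)]

    def search(self, term: str) -> list[Dataset]:
        # Case-insensitive substring match on title, description and keywords.
        needle = term.lower()
        return [
            d for d in self._datasets.values()
            if needle in d.title.lower()
            or needle in d.description.lower()
            or any(needle in k.lower() for k in d.keywords)
        ]
```

A register could then simply link to the URLs returned by `index()`, while developers use `search()` to find a dataset and its contact information.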
5. How to get data into a datastore
• A working process that allows data owners to publish their data by means of:
1. Sending an e-mail with a data file? Yes, please!
2. Having a file dropbox, so computers (servers) can send data files automatically
• Requires a tunnel from a production site into the datastore
3. A data web proxy. This means the datastore handles a request by forwarding it to a server that knows the answer and then relaying the response back
• Requires a secure tunnel from a datastore into a production site
• The only option for real-time data or big data such as geo-information
4. A data replication site. The datastore synchronizes (part of) a database and offers it independently
• Requires a secure tunnel, either direction can work
• Big data and real-time data are tough to replicate (duh!)
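To make option 2 (the file dropbox) concrete, the following is a minimal sketch of the receiving side: a job that picks up whatever production servers have dropped into a directory, moves it into the datastore and records basic metadata. The function name `ingest_dropbox`, the directory layout and the record fields are assumptions for illustration only; how files arrive in the dropbox (SFTP, a tunnel, etc.) is left out.

```python
import hashlib
import shutil
from pathlib import Path


def ingest_dropbox(dropbox: Path, store: Path) -> list[dict]:
    """Move every file found in `dropbox` into `store`; return one record per file."""
    store.mkdir(parents=True, exist_ok=True)
    records = []
    for path in sorted(dropbox.iterdir()):
        # Skip subdirectories and hidden files (often partial uploads).
        if not path.is_file() or path.name.startswith("."):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Prefix with part of the checksum so different content uploaded
        # under the same file name does not overwrite earlier versions.
        target = store / f"{digest[:12]}-{path.name}"
        shutil.move(str(path), str(target))
        records.append({
            "name": path.name,
            "sha256": digest,
            "stored_as": target.name,
            "bytes": target.stat().st_size,
        })
    return records
```

Run periodically (for example from a scheduler), this empties the dropbox and yields records that can feed the datastore's catalog. Reading each file fully into memory is a simplification that would not suit very large files.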
6. How to get a datastore?
• Buy / hire / build / outsource the datastore… I don’t care.
• Think about trust relations
– If external parties tap into your production systems, you had better trust them
– Your datastore should also be trusted, so make sure it is recognizable as yours (logos and a web address like data.yourgov.gov)
7. Anything else?
• Challenges and opportunities ahead:
– Big governments are building data warehouses, which means opening up just one system
– Small governments also need datastores, but they do cost money
– Semi-public institutions such as hospitals are not allowed in the formal government data registers
– Commercial and community data registers abound, see: http://thedatahub.org/ and http://opendatanederland.org/
– Engaging a community around data results in more use of the data and less repeated Q&A with governments
• But it is difficult: community engagement is a lot of work.