NYCFacets: Metadata, Extrametadata and Crowdknowing
1. Metadata, Extrametadata & Crowdknowing
Fostering 'Big Open Data' in government
through Open Collaboration
Ontolog - “Big Open Data” session 2
May 17, 2012
Joel Natividad, co-founder
@jqnatividad
2. CROWDKNOWING
Human-powered,
Machine-accelerated,
Collective Knowledge Systems
3. 0. Huge Open Data
1. Extract Metadata
2. Derive ExtraMetadata (Semantics + Statistics + Algorithm + Crowd)
3. Do Federated Queries on both the Metadata AND the Data
Crowdknowing
4. Crowdknowing
Human-powered, Machine-accelerated, Collective Knowledge Systems
Machine-accelerated: Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics, Pattern Recognition, Multivariate Analysis & Forecasting, Automated linking, Feeds, Notifications, etc.
Human-powered: Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes, Subscribes, Tagging, etc.
7. NYCFacets Spider v0.5
• Crawls the NYC Open Data Catalog every weekend
• RESTful API
• Extracts metadata & derives extrametadata
• Pumps the data into NYCFacets
8. Metadata
Top Level Metadata
• Name/ID
• Category
• Dataset Type
• Attribution
• Owner ID, etc.
Detail Metadata
• Column Names
• Datatype
• Width, etc.
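The split between top-level and detail metadata can be sketched roughly as follows. This is a minimal illustration, not the NYCFacets Spider itself: the record layout, the field names (`id`, `name`, `category`, `type`, `attribution`, `owner_id`, `columns`) and the function `extract_metadata` are all assumptions made for the example.

```python
# Hypothetical catalog record; every field name here is an assumption.
SAMPLE_RECORD = {
    "id": "abcd-1234",
    "name": "Example Dataset",
    "category": "Transportation",
    "type": "tabular",
    "attribution": "Example Agency",
    "owner_id": "owner-1",
    "columns": [
        {"name": "borough", "datatype": "text", "width": 32},
        {"name": "count", "datatype": "number", "width": 8},
    ],
}

TOP_LEVEL_FIELDS = ("id", "name", "category", "type", "attribution", "owner_id")
DETAIL_FIELDS = ("name", "datatype", "width")


def extract_metadata(record):
    """Split a catalog record into top-level and per-column (detail) metadata."""
    top_level = {field: record.get(field) for field in TOP_LEVEL_FIELDS}
    detail = [
        {field: column.get(field) for field in DETAIL_FIELDS}
        for column in record.get("columns", [])
    ]
    return {"top_level": top_level, "detail": detail}


if __name__ == "__main__":
    metadata = extract_metadata(SAMPLE_RECORD)
    print(metadata["top_level"])
    print(metadata["detail"])
```

Missing fields come back as `None` rather than raising, which suits a crawler that has to cope with incomplete catalog entries.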
10. ExtraMetadata?
• Derived using “Semantics, Statistics, Algorithm & the Crowd”
• “Supercharacterize” each dataset by sampling not just the schema, but the underlying data as well
• Score each dataset - Pediacities Rank
• Virtuous Feedback Loop around the Data: micro-conversations/contributions
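Sampling the underlying data could look something like the sketch below. The slides do not say how NYCFacets samples rows, so reservoir sampling (Algorithm R) is used here purely as one plausible technique; the function name `sample_rows` and its parameters are hypothetical.

```python
import random


def sample_rows(rows, k, seed=None):
    """Return up to k rows drawn uniformly at random from an iterable.

    Uses reservoir sampling (Algorithm R), so the rows are read once and
    the total row count does not need to be known in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = row
    return reservoir


if __name__ == "__main__":
    # Stand-in for rows streamed from a large dataset.
    print(sample_rows(range(100000), 5, seed=42))
```

A single pass with fixed memory matters when the dataset is too large to load whole, which is the situation the "Huge Open Data" framing implies.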
11. ExtraMetadata
Top Level ExtraMetadata
• Number of Rows
• Pediacities Rank
• Freshness Score
• Sparseness Score
• Social Score
• Views Score
• Download Score
• Rating Score
• Simple Visualization
Detail ExtraMetadata
• Top Values
• Descriptive statistics
• Nulls/Non-nulls
• Smallest Value
• Largest Value
• “Uniqueness”
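The detail extrametadata listed above can be approximated per column with a few lines of standard-library Python. This is a sketch under stated assumptions, not the NYCFacets implementation: the function `profile_column` is hypothetical, `None` stands in for a null, and "uniqueness" is assumed to mean distinct values divided by non-null values, which the slides do not define.

```python
from collections import Counter
import statistics


def profile_column(values, top_n=3):
    """Compute simple per-column statistics from a list of sampled values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    profile = {
        "nulls": len(values) - len(non_null),
        "non_nulls": len(non_null),
        "top_values": counts.most_common(top_n),
        "smallest": min(non_null) if non_null else None,
        "largest": max(non_null) if non_null else None,
        # Assumed definition: share of non-null values that are distinct.
        "uniqueness": len(counts) / len(non_null) if non_null else None,
    }
    # Mean and median are only added for purely numeric columns.
    numeric = non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if numeric:
        profile["mean"] = statistics.mean(non_null)
        profile["median"] = statistics.median(non_null)
    return profile


if __name__ == "__main__":
    print(profile_column([3, 1, None, 3, 2]))
    print(profile_column(["Bronx", "Queens", "Bronx", None]))
```

Running the profile on a sample rather than the full column keeps the cost bounded; the trade-off is that smallest, largest and top values then describe the sample, not the whole dataset.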
15. Crowdknowing
Human-powered, Machine-accelerated, Collective Knowledge Systems
Machine-accelerated: Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics, Pattern Recognition, Multivariate Analysis & Forecasting, Automated linking, Feeds, Notifications, etc.
Human-powered: Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes, Subscribes, Tagging, etc.
16. • More Datasources!
• Not just Metadata!
• Federated Queries!
• SPARQL endpoint
• Bugzilla Integration
• Collaborative Ontology Modeling
• Feeds
• Microcontributions
• Gamification
• In time for NYCBigApps 4.0
17. We need your help & feedback
A Smart Data Exchange for All Data NYC
Find out more at
http://nyc.pediacities.com/facets
@jqnatividad @samimirzabaig @pediacities @ontodia
18. CREDITS
• Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
• Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)