Metadata, Extrametadata & Crowdknowing
      Fostering 'Big Open Data' in government
            through Open Collaboration
             Ontolog - “Big Open Data” session 2
                        May 17, 2012




          Joel Natividad, co-founder
                     @jqnatividad
                                                   1
CROWDKNOWING




                     Human-powered,
                  Machine-accelerated,
        Collective Knowledge Systems
                                   2
0. Huge Open Data
1. Extract Metadata

2. Derive ExtraMetadata
  (Semantics + Statistics + Algorithm + Crowd)


3. Do Federated Queries on both the
   Metadata AND the Data



Crowdknowing
                                                 3
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




   Curation, Comments,               Ontology, Inferencing, Semantic
   Feedback, Bug Reports,            Mapping, Query Federation, Statistics,
   Likes, Shares, Profile, Votes,    Pattern Recognition, Multivariate
   Subscribes, Tagging,              Analysis & Forecasting, Automated
   etc.                              linking, Feeds, Notifications,
                                     etc.                                4
a Semantic Data Dictionary




                             5
Semantic Steroids
• Searchable
  • Faceted Search
  • Drilldown
• Interlinked
• Semantic Browsing
• Queryable
• Query Results Formats
   ~3.5M facts
~950 datasets/views



                   6
NYCFacets Spider
             v0.5
• Crawls NYC Open Data Catalog every
  weekend
• RESTful API
• Extracts metadata & derives extrametadata
• Pumps the data into NYCFacets
                                             7
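The crawl loop described on this slide might be sketched as follows. This is a minimal illustration, not the NYCFacets Spider itself: `crawl_catalog`, `fake_catalog`, the record keys, and the in-memory `store` are all hypothetical, and the catalog is stubbed so that no real API is assumed.

```python
from typing import Callable, Iterable

# Hypothetical record shape: one dict per dataset in the catalog.
Record = dict


def extract_metadata(record: Record) -> dict:
    """Copy the fields the catalog already publishes (assumed keys)."""
    return {"id": record["id"], "name": record["name"],
            "columns": [c["name"] for c in record.get("columns", [])]}


def derive_extrametadata(record: Record) -> dict:
    """Derive facts the catalog does not state, here only a row count."""
    return {"row_count": len(record.get("rows", []))}


def crawl_catalog(fetch_catalog: Callable[[], Iterable[Record]],
                  sink: Callable[[dict], None]) -> int:
    """Run one crawl: fetch, extract, derive, then hand each result to the sink."""
    count = 0
    for record in fetch_catalog():
        facts = {**extract_metadata(record), **derive_extrametadata(record)}
        sink(facts)
        count += 1
    return count


# Stand-in for the catalog's REST API (made-up data).
def fake_catalog():
    return [{"id": "abc-123", "name": "Street Trees",
             "columns": [{"name": "borough"}, {"name": "species"}],
             "rows": [["Bronx", "oak"], ["Queens", "elm"]]}]


store = []
crawled = crawl_catalog(fake_catalog, store.append)
```

Passing the fetcher and the sink in as callables keeps the loop testable without network access; a weekly schedule would sit outside this function.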
Metadata
Top Level Metadata         Detail Metadata

   •   Name/ID                •   Column Names

   •   Category               •   Datatype

   •   Dataset Type           •   Width, etc.

   •   Attribution

   •   Owner ID, etc.



                                                 8
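One way to model the two levels on this slide is a dataset record that carries a list of per-column records. The sketch below is an assumption about shape only: the class names, field names, and the keys read from the catalog record are hypothetical and not taken from NYCFacets.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Detail metadata for one column."""
    name: str
    datatype: str
    width: int


@dataclass
class DatasetMetadata:
    """Top-level metadata plus per-column detail."""
    dataset_id: str
    name: str
    category: str
    dataset_type: str
    attribution: str
    owner_id: str
    columns: list[ColumnMetadata] = field(default_factory=list)


def from_catalog_record(record: dict) -> DatasetMetadata:
    """Build the model from a catalog record; the key names are assumed."""
    return DatasetMetadata(
        dataset_id=record["id"],
        name=record["name"],
        category=record.get("category", ""),
        dataset_type=record.get("type", ""),
        attribution=record.get("attribution", ""),
        owner_id=record.get("owner_id", ""),
        columns=[ColumnMetadata(c["name"], c["datatype"], c.get("width", 0))
                 for c in record.get("columns", [])],
    )


# Made-up record for illustration.
example = from_catalog_record({
    "id": "abc-123", "name": "Street Trees", "category": "Environment",
    "type": "tabular", "attribution": "Parks", "owner_id": "u42",
    "columns": [{"name": "borough", "datatype": "text", "width": 20}],
})
```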
9
ExtraMetadata?
• Derived using “Semantics, Statistics,
  Algorithm & the Crowd”

• “Supercharacterize” each dataset by sampling
  not just the schema, but the underlying
  data as well

• Score each dataset - Pediacities Rank
• Virtuous Feedback Loop around the Data
  micro-conversations/contributions
                                                   10
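Sampling the underlying data, as opposed to reading only the schema, could look roughly like the sketch below. The statistics mirror the ones the deck names (top values, nulls/non-nulls, smallest and largest value, "uniqueness"), but `profile_column`, its parameters, and the definition of uniqueness used here are assumptions.

```python
import random
from collections import Counter


def profile_column(values, sample_size=1000, top_n=3, seed=0):
    """Profile one column from a random sample of its values.

    None is treated as null. "Uniqueness" is taken here to mean
    distinct non-null values divided by non-null values (an assumption).
    """
    rng = random.Random(seed)
    # Small columns are profiled in full; larger ones are sampled.
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    non_null = [v for v in sample if v is not None]
    counts = Counter(non_null)
    return {
        "sampled": len(sample),
        "nulls": len(sample) - len(non_null),
        "non_nulls": len(non_null),
        "smallest": min(non_null) if non_null else None,
        "largest": max(non_null) if non_null else None,
        "top_values": counts.most_common(top_n),
        "uniqueness": len(counts) / len(non_null) if non_null else 0.0,
    }


# Made-up column values for illustration.
profile = profile_column([3, 7, 7, None, 1, 7, 3, None])
```

A fixed seed keeps repeated crawls comparable; whether the real system samples or scans each dataset in full is not stated in the deck.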
ExtraMetadata
Top Level                    Detail
ExtraMetadata                ExtraMetadata

  •   Number of Rows           •   Top Values

  •   Pediacities Rank         •   Descriptive statistics
      •   Freshness Score          •   Nulls/Non-nulls
      •   Sparseness Score         •   Smallest Value
      •   Social Score             •   Largest Value
      •   Views Score              •   “Uniqueness”
      •   Download Score
      •   Rating Score
                               •   Simple Visualization


                                                            11
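The deck lists the sub-scores behind the Pediacities Rank but not how they are combined. Purely as an illustration, the sketch below assumes each sub-score is normalized to the range 0..1 and combines them with a weighted average; the function name, the equal weights, and the formula itself are assumptions, not the published ranking.

```python
# Sub-score names follow the slide; the equal weights are placeholders.
WEIGHTS = {
    "freshness": 1.0, "sparseness": 1.0, "social": 1.0,
    "views": 1.0, "download": 1.0, "rating": 1.0,
}


def composite_rank(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of sub-scores, each expected in the range 0..1.

    Missing sub-scores count as 0. This is an illustrative formula,
    not the published Pediacities Rank.
    """
    for name, value in scores.items():
        if name not in weights:
            raise KeyError(f"unknown sub-score: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0..1, got {value}")
    total_weight = sum(weights.values())
    weighted = sum(weights[n] * scores.get(n, 0.0) for n in weights)
    return weighted / total_weight


# Made-up sub-scores for illustration.
rank = composite_rank({"freshness": 0.9, "sparseness": 0.6, "social": 0.3,
                       "views": 0.6, "download": 0.3, "rating": 0.9})
```

Keeping the weights in one table makes it easy to tune how much crowd signals (social, rating) count relative to machine-derived ones (freshness, sparseness).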
12
13
“Crowd”

Microconversations/contributions
  •   Overall Rating

  •   Comments (comment rating)

  •   Bug Reports (data quality)

  •   Likes/Shares

  •   Downloads


                                   14
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




   Curation, Comments,               Ontology, Inferencing, Semantic
   Feedback, Bug Reports,            Mapping, Query Federation, Statistics,
   Likes, Shares, Profile, Votes,    Pattern Recognition, Multivariate
   Subscribes, Tagging,              Analysis & Forecasting, Automated
   etc.                              linking, Feeds, Notifications,
                                     etc.                               15
• More Datasources!
• Not just Metadata!
• Federated Queries!
• SPARQL endpoint
• Bugzilla Integration
• Collaborative Ontology Modeling
• Feeds
• Microcontributions
• Gamification
• In time for NYCBigApps 4.0
                                    16
We need your help & feedback




        A Smart Data Exchange for All Data NYC

                  Find out more at
          http://nyc.pediacities.com/facets

@jqnatividad @samimirzabaig @pediacities @ontodia
                                                    17
CREDITS

• Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info
  (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
• Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)


                                              18
