News in JSON




Stuart Myles * Associated Press * 11th March 2013
News In JSON Activity




   http://www.flickr.com/photos/jondresner/5789254800/
News In JSON Activity


• Determined the priority properties to be expressed
   – By examining G2, rNews, NITF and existing implementations
• Created 2-3 candidate JSON representations for
   – Places
   – Subjects
   – Text markup
• Wrote experimental code to try out the candidate structures
Remind Me: What is JSON?
JSON = JavaScript Object Notation http://json.org/

Name / value pairs: a field name in quotes, a colon, then a value (string values are also in quotes)
    "givenname" : "Stuart"

Objects: written inside curly braces, may contain multiple name / value pairs
    {"givenname" : "Stuart", "familyname" : "Myles"}

Arrays: written inside square brackets, may contain multiple values, including objects
    {"iptcdelegates": [
        {"givenname": "Dave", "familyname": "Compton"},
        {"givenname": "Stuart", "familyname": "Myles"},
        {"givenname": "Robert", "familyname": "Schmidt-Nia"}
    ]}
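
A minimal sketch, in Python, of reading the delegate list above with the standard json module. The variable names are illustrative and are not part of the original slides.

```python
import json

# The array example from the slide, as a JSON text.
text = """
{"iptcdelegates": [
    {"givenname": "Dave", "familyname": "Compton"},
    {"givenname": "Stuart", "familyname": "Myles"},
    {"givenname": "Robert", "familyname": "Schmidt-Nia"}
]}
"""

# json.loads maps JSON objects to dicts and JSON arrays to lists.
data = json.loads(text)
delegates = data["iptcdelegates"]

# Build "givenname familyname" strings from each object in the array.
full_names = [f'{d["givenname"]} {d["familyname"]}' for d in delegates]
print(full_names)
```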

                       © 2013 IPTC (www.iptc.org)   All rights reserved   4
Things We Considered But Decided Against
• Translating from an existing XML standard into JSON
   – Not all IPTC standards are XML
   – Not all publishers use the same IPTC standards
   – Not all publishers use any IPTC standards


• “Mechanically” translating from XML into JSON
   – There are many libraries that can do this
   – They make different choices for how to represent certain XML features
   – So each technique results in slightly different JSON
   – We felt that a more “natural” JSON would be more valuable
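
The point about mechanical translation can be sketched as follows. This is an illustration only: the XML fragment and both conversion conventions are made up for the example and are not taken from any particular library. It shows how two reasonable sets of rules turn the same XML into different JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment, made up for illustration.
xml_text = '<subject type="location"><name>Paris</name></subject>'
root = ET.fromstring(xml_text)


def convention_a(elem):
    """Attributes become '@name' keys, element text becomes '#text'."""
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        # Repeated child tags are not handled in this sketch.
        node[child.tag] = convention_a(child)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def convention_b(elem):
    """Attributes and children share plain keys; leaf text becomes a string."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node = dict(elem.attrib)
    for child in children:
        node[child.tag] = convention_b(child)
    return node


json_a = {root.tag: convention_a(root)}
json_b = {root.tag: convention_b(root)}
print(json.dumps(json_a))
print(json.dumps(json_b))
```

Code written against one of these shapes will not work against the other, which is the interoperability problem the slide describes.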




News in JSON Properties

• We reviewed existing sets of news properties including
   –   NewsML-G2
   –   NewsML 1
   –   rNews
   –   NITF
• We selected a set of priority properties to represent
• https://docs.google.com/spreadsheet/ccc?key=0AvnUbLxJqDwBdGxOQXdYeTRPM2k3WFhiNGRuMWR2M1E

• We could add more later...
• ...but we wanted to start somewhere

Let’s draft a News in JSON white paper!



Representing Places in JSON
Geographic metadata such as
• Display Name: Brooklyn (NYC)
• ID: http://id.example.org/5110302
• Centroid: 40.6501038, -73.9495823
• Bounding Box: 40.453216826620995, -73.68930777156369,
                 40.846990773379, 1.0, -74.20985682843632
• Hierarchy: Kings County > New York > NY > USA
• Type: Second order administrative division
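
As a rough sketch, the metadata above could be carried in a JSON object like the one built below. The property names ("name", "centroid", "hierarchy" and so on) are hypothetical and are not the names used in either candidate structure; the bounding box is left out because the slide does not label its values.

```python
import json

# Hypothetical property names; the values are taken from the slide.
place = {
    "name": "Brooklyn (NYC)",
    "id": "http://id.example.org/5110302",
    "type": "Second order administrative division",
    "centroid": {"latitude": 40.6501038, "longitude": -73.9495823},
    "hierarchy": ["Kings County", "New York", "NY", "USA"],
}

# Serialise to JSON text and read it back again.
text = json.dumps(place, indent=2)
restored = json.loads(text)
print(text)
```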


Several non-publishing JSON implementations exist, such as
• GeoJSON http://www.geojson.org/
• Geonames API http://www.geonames.org/export/JSON-webservices.html
Two Ways to Represent Places
Approach #1: The geonames way
http://api.geonames.org/getJSON?geonameId=5110302&username=kansandhaus&format=json


Approach #2: With a bit more structure
https://gist.github.com/jays0n/5032774


We wrote some code to test them out
The app selects a few fields and prints out the objects created
Note the different nesting that the two approaches cause, visible when comparing the two BO classes.

http://tech.groups.yahoo.com/group/iptc-news-in-json-dev/files/jayson-json-geo-tests.tar.gz

What We Learnt

• Simpler JSON results in simpler code
   – Avoid arrays if they will normally only contain a single object
• Ensure property labels start with lower case letters
   – Some parsers (e.g. Jackson) assume this convention


• The main conclusion: there wasn’t much to choose
  between the two styles in practice

• Proposal: adopt the slightly more structured approach
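
The first lesson can be illustrated with a small sketch. Both documents and their field names are hypothetical; the point is only that an array that normally holds a single object forces every reader to handle indexing and the empty case.

```python
import json

# Two hypothetical encodings of the same place; field names are made up.
wrapped = json.loads('{"place": [{"name": "Brooklyn (NYC)"}]}')
direct = json.loads('{"place": {"name": "Brooklyn (NYC)"}}')


def name_from_wrapped(doc):
    places = doc.get("place", [])
    # The reader must decide what an empty or multi-item array means.
    return places[0]["name"] if places else None


def name_from_direct(doc):
    # A plain nested object needs no indexing.
    return doc.get("place", {}).get("name")


print(name_from_wrapped(wrapped), name_from_direct(direct))
```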




Some Useful Tools

• http://jsonlint.com/
   – helpful for finding syntax errors


• http://jackson.codehaus.org/
   – nice JSON support in Java


• http://goessner.net/articles/JsonPath/
   – like XPath for JSON
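
For a quick local check in the same spirit as a lint tool, a JSON parser's error report is often enough. The sketch below uses Python's standard json module; the helper name find_syntax_error is made up for this example and is unrelated to the tools listed above.

```python
import json


def find_syntax_error(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


good = '{"givenname": "Stuart", "familyname": "Myles"}'
bad = '{"givenname": "Stuart", "familyname": "Myles",}'  # trailing comma

print(find_syntax_error(good))
print(find_syntax_error(bad))
```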




Subjects in JSON

• Subjects
   – People, companies, organizations, abstract concepts
   – Keywords, categories


• A single structure for all
   – Like NITF http://www.iptc.org/std/NITF/3.6/documentation/nitf-3-6.html#Link19
   – For example https://gist.github.com/kansandhaus/5049159
• Each subject type has its own structure
   – For example https://gist.github.com/anonymous/5049220




Subjects in JSON
• A single structure leaves no room for error in selecting
  the “bucket” to use to represent a given concept
• However, the code to access these anonymous buckets
  is much more complex
• To select documents which are marked as having a
  location of San Francisco in MongoDB
   – Single structure:
     mnc.queryDB("{\"keywords\" : {\"$elemMatch\" : {\"type\" : \"location\", \"name\" : \"San Francisco (CA)\"}}}");
   – Specific buckets:
     mnc.queryDB("{ 'locations.name' : 'San Francisco (CA)' }");


• Proposal: Adopt the “specific buckets” structure
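
The same trade-off shows up when filtering documents in application code. The sketch below is an assumption-laden illustration: the two documents, the "people" bucket and the helper functions are hypothetical, with only "keywords", "type", "name" and "locations" taken from the queries above.

```python
# Hypothetical documents in the two candidate shapes.
generic_doc = {
    "keywords": [
        {"type": "person", "name": "Jane Doe"},
        {"type": "location", "name": "San Francisco (CA)"},
    ]
}
specific_doc = {
    "people": [{"name": "Jane Doe"}],
    "locations": [{"name": "San Francisco (CA)"}],
}


def has_location_generic(doc, name):
    # Every entry has to be checked for both its type and its name.
    return any(
        k.get("type") == "location" and k.get("name") == name
        for k in doc.get("keywords", [])
    )


def has_location_specific(doc, name):
    # The bucket already says these are locations; only the name is checked.
    return any(loc.get("name") == name for loc in doc.get("locations", []))


print(has_location_generic(generic_doc, "San Francisco (CA)"))
print(has_location_specific(specific_doc, "San Francisco (CA)"))
```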



Text Markup in JSON

• How to represent richly marked up text in JSON?
• A sweet spot for document-oriented XML
• Could be HTML, XHTML, NITF ...

• We experimented with two existing text markup examples
• NITF: http://www.iptc.org/std/NITF/3.2/examples/nitf-fishing.xml
• HTML: http://dev.iptc.org/Implementation-Guide-HTML-5-Microdata-in-IPTC-namespace



Text Markup Options in JSON

• Plain text, stripped of markup
• Preserved but escaped markup
   – HTML: https://gist.github.com/anonymous/4996653
   – XML: https://gist.github.com/anonymous/4996676
   – See http://stackoverflow.com/questions/993970/what-do-i-need-to-escape-in-my-html-json-response
     for a discussion of how to escape markup in JSON
• Mechanically create JSON structures to mimic the
  original markup
   – We used JSONML as an example http://www.jsonml.org/
   – NITF : https://gist.github.com/anonymous/4996697
   – HTML: https://gist.github.com/anonymous/4996720
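
The three options can be sketched side by side. The markup fragment below is invented for the example (it is not the NITF or HTML sample linked above), the tag stripping is deliberately naive, and to_jsonml is a hand-written approximation of the JsonML shape ([tag, {attributes}, children...]) rather than an official implementation.

```python
import html
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical marked-up fragment, made up for illustration.
markup = '<p>Fishing season opens in <em class="place">Maine</em>.</p>'

# Option 1: plain text, stripped of markup (naive tag removal for the sketch).
plain = html.unescape(re.sub(r"<[^>]+>", "", markup))

# Option 2: markup preserved as a JSON string; json.dumps does the escaping.
escaped = json.dumps({"body": markup})


# Option 3: a JsonML-style structure: [tag, {attributes}, children...].
def to_jsonml(elem):
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        if child.tail:
            # Text that follows a child element belongs to the parent.
            node.append(child.tail)
    return node


structured = to_jsonml(ET.fromstring(markup))

print(plain)
print(escaped)
print(json.dumps(structured))
```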


What We Learnt

• Both plain text (no markup) and escaped markup have
  clear use cases
   – Plain text can be useful for search, for example
   – Escaped markup works well for direct display on a webpage


• Translated markup (as with JSONML) works OK if you have
  a library to implement the rules
   – But what is the added benefit beyond just working directly with
     XML or HTML?
   – Who will write and maintain the libraries for every language?


• Proposal: Let providers use both plain and escaped text

News in JSON Road Map

• Evaluate more structures
   – Such as links to binaries
• Write a white paper on our initial recommendations
   – Publish and seek out feedback within IPTC and beyond


• Create a News in JSON 1.0 recommendation
   – Present it for a vote at the Paris meeting
   – Consider an experimental phase


• You can help by joining the News in JSON group
   – iptc-news-in-json-dev@yahoogroups.com


Date and Place of Next Meeting
 Paris 24 - 26 June, 2013




      http://www.flickr.com/photos/anirudhkoul/3536413126/




   Thank you and goodbye!

IPTC News in JSON Spring 2013

  • 1. News in JSON Stuart Myles * Associated Press * 11th March 2013
  • 2. News In JSON Activity News in JSON Activity http://www.flickr.com/photos/jondresner/5789254800/ http://www.flickr.com/photos/jondresner/5789254800/
  • 3. News In JSON Activity News in JSON Activity Determine the priority properties to be expressed By examining G2, rNews, NITF and existing implementations Created 2-3 candidate JSON representations Places Subjects Text markup Wrote experimental code to try out the candidate structures http://www.flickr.com/photos/jondresner/5789254800/ http://www.flickr.com/photos/jondresner/5789254800/
  • 4. Remind Me: What is JSON? JSON = JavaScript Object Notation http://json.org/ Name / value pairs: a fieldname in quotes, a colon, a value in quotes "givenname" : "Stuart" Objects: written inside curly braces, may contain multiple NVPs {"givenname" : "Stuart", "familyname" : "Myles"} Arrays: Written inside square braces, may contain multiple objects {"iptcdelegates": [ { "givenname": "Dave", "familyname": "Compton"}, { "givenname": "Stuart", "familyname": "Myles"}, { "givenname": "Robert", "familyname": "Schmidt-Nia" } ]} © 2013 IPTC (www.iptc.org) All rights reserved 4
  • 5. Things We Considered But Decided Against • Translating from an existing XML standard into JSON – Not all IPTC standards are XML – Not all publishers use the same IPTC standards – Not all publishers use any IPTC standards • “Mechanically” translating from XML into JSON – There are many libraries that can do this – Different choices for how to represent certain XML features – So each technique results in a slightly different JSON – We felt that more a more “natural” JSON would be more valuable © 2010 IPTC (www.iptc.org) All rights reserved 5
  • 6. News in JSON Properties • We reviewed existing sets of news properties including – NewsML-G2 – NewsML 1 – rNews – NITF • We selected a set of priority properties to represent • https://docs.google.com/spreadsheet/ccc?key=0AvnUbL xJqDwBdGxOQXdYeTRPM2k3WFhiNGRuMWR2M1E • We could add more later... • ...but we wanted to start somewhere © 2010 IPTC (www.iptc.org) All rights reserved 6
  • 7. Let’s draft a News in JSON white paper! © 2010 IPTC (www.iptc.org) All rights reserved 7
  • 8. Representing Places in JSON Geographic metadata such as • Display Name: Brooklyn (NYC) • ID: http://id.example.org/5110302 • Centroid: 40.6501038, -73.9495823 • Bounding Box: 40.453216826620995, -73.68930777156369, 40.846990773379, 1.0, -74.20985682843632 • Hierarchy: Kings County > New York > NY > USA • Type: Second order administrative division Several non publishing JSON implementations, such as • GeoJSON http://www.geojson.org/ • Geonames API http://www.geonames.org/export/JSON- webservices.html © 2010 IPTC (www.iptc.org) All rights reserved 8
  • 9. Two Ways to Represent Places Approach #1: The geonames way http://api.geonames.org/getJSON?geonameId=5110302&username=kansandh aus&format=json Approach #2: With a bit more structure https://gist.github.com/jays0n/5032774 We wrote some code to test them out The app selects a few fields and prints out the objects created Note the different nesting that these caused when looking at the two BO classes. http://tech.groups.yahoo.com/group/iptc-news-in-json-dev/files/jayson-json-geo- tests.tar.gz © 2010 IPTC (www.iptc.org) All rights reserved 9
  • 10. What We Learnt
      • Simpler JSON results in simpler code
          – Avoid arrays if they will normally only contain a single object
      • Ensure property labels start with lower case letters
          – Some parsers (e.g. Jackson) assume this convention
      • The main conclusion: there wasn’t much to choose between the two styles in practice
      • Proposal: adopt the slightly more structured approach
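The first two lessons on slide 10 can be sketched in a few lines. This is not the group's test code: both place shapes and all property names below are hypothetical, chosen only to show how a one-element array complicates access and how lower case labels might be checked.

```python
# Two hypothetical shapes for the same place (property names are assumptions).
wrapped = {"name": "Brooklyn (NYC)", "centroids": [{"lat": 40.6501038, "lon": -73.9495823}]}
direct = {"name": "Brooklyn (NYC)", "centroid": {"lat": 40.6501038, "lon": -73.9495823}}


def latitude_from_wrapped(place):
    # The array forces an emptiness check and an index,
    # even though it normally holds a single object.
    centroids = place.get("centroids") or []
    return centroids[0]["lat"] if centroids else None


def latitude_from_direct(place):
    # A plain nested object can be read with two lookups.
    return place.get("centroid", {}).get("lat")


def labels_not_lower_case(value, path=""):
    """Return the paths of property labels that do not start with a lower case letter."""
    bad = []
    if isinstance(value, dict):
        for key, child in value.items():
            here = f"{path}.{key}" if path else key
            if not key[:1].islower():
                bad.append(here)
            bad.extend(labels_not_lower_case(child, here))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            bad.extend(labels_not_lower_case(child, f"{path}[{index}]"))
    return bad
```

For example, `labels_not_lower_case({"Name": "x", "geo": [{"Lat": 1}]})` reports `Name` and `geo[0].Lat`, the kind of labels that a convention-based mapper might not bind as expected.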
  • 11. Some Useful Tools
      • http://jsonlint.com/
          – helpful for finding syntax errors
      • http://jackson.codehaus.org/
          – nice JSON support in Java
      • http://goessner.net/articles/JsonPath/
          – like XPath for JSON
  • 12. Subjects in JSON
      • Subjects
          – People, companies, organizations, abstract concepts
          – Keywords, categories
      • A single structure for all
          – Like NITF http://www.iptc.org/std/NITF/3.6/documentation/nitf-3-6.html#Link19
          – For example https://gist.github.com/kansandhaus/5049159
      • Each subject type has its own structure
          – For example https://gist.github.com/anonymous/5049220
  • 13. Subjects in JSON
      • A single structure leaves no room for error in selecting the “bucket” to use to represent a given concept
      • However, the code to access these anonymous buckets is much more complex
      • To select documents which are marked as having a location=San Francisco in MongoDB
          – Single structure: mnc.queryDB("{'keywords' : {'$elemMatch' : {'type' : 'location', 'name' : 'San Francisco (CA)'}}}");
          – Specific buckets: mnc.queryDB("{ 'locations.name': 'San Francisco (SF)' }");
      • Proposal: Adopt the “specific buckets” structure
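The difference in access code on slide 13 can also be seen without a database. The sketch below is an in-memory analogue of the two MongoDB queries, not MongoDB itself; the `keywords` and `locations` property names follow the queries on the slide, while the sample document and the `organizations` bucket are invented for illustration.

```python
# The same (invented) document in the two candidate shapes.
single_structure = {
    "keywords": [
        {"type": "organization", "name": "IPTC"},
        {"type": "location", "name": "San Francisco (CA)"},
    ]
}
specific_buckets = {
    "organizations": [{"name": "IPTC"}],
    "locations": [{"name": "San Francisco (CA)"}],
}


def has_location_single(doc, name):
    # Every entry must be tested on two fields, like the $elemMatch query.
    return any(
        entry.get("type") == "location" and entry.get("name") == name
        for entry in doc.get("keywords", [])
    )


def has_location_specific(doc, name):
    # The bucket already says "location", so one field is enough.
    return any(entry.get("name") == name for entry in doc.get("locations", []))
```

Both functions answer the same question; the "specific buckets" version simply has one less condition to get wrong.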
  • 14. Text Markup in JSON
      • How to represent richly marked up text in JSON?
      • A sweet spot for document-oriented XML
      • Could be HTML, XHTML, NITF ...
      • We experimented with two existing text markup examples
      • NITF: http://www.iptc.org/std/NITF/3.2/examples/nitf-fishing.xml
      • HTML: http://dev.iptc.org/Implementation-Guide-HTML-5-Microdata-in-IPTC-namespace
  • 15. Text Markup Options in JSON
      • Plain text, stripped of markup
      • Preserved but escaped markup
          – HTML: https://gist.github.com/anonymous/4996653
          – XML: https://gist.github.com/anonymous/4996676
          – See http://stackoverflow.com/questions/993970/what-do-i-need-to-escape-in-my-html-json-response for a discussion of how to escape markup in JSON
      • Mechanically create JSON structures to mimic the original markup
          – We used JSONML as an example http://www.jsonml.org/
          – NITF: https://gist.github.com/anonymous/4996697
          – HTML: https://gist.github.com/anonymous/4996720
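The three options on slide 15 can be compared on one small fragment. The sketch below uses only the Python standard library; the sample markup and the `body` property name are invented for illustration, and `to_jsonml` is a simplified approximation of the JsonML array form (tag name, optional attributes object, then children) that assumes well-formed XML input.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented sample markup, well-formed so it can be parsed as XML too.
markup = '<p class="lead">Fish are <b>biting</b> today.</p>'


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(source):
    # Option 1: plain text, stripped of markup.
    parser = _TextOnly()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


def escaped_markup(source):
    # Option 2: keep the markup as a string; json.dumps escapes the quotes.
    return json.dumps({"body": source})


def to_jsonml(element):
    # Option 3: mimic the markup as nested arrays, JsonML style.
    node = [element.tag]
    if element.attrib:
        node.append(dict(element.attrib))
    if element.text:
        node.append(element.text)
    for child in element:
        node.append(to_jsonml(child))
        if child.tail:
            node.append(child.tail)
    return node
```

Calling `to_jsonml(ET.fromstring(markup))` yields nested lists that a consumer has to walk with JsonML-aware code, whereas the first two options hand back ordinary strings.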
  • 16. What We Learnt
      • Both plain text (no markup) and escaped markup have clear use cases
          – Plain text can be useful for search, for example
          – Escaped markup works well for direct display on a webpage
      • Translated markup (like JSONML) works OK if you have a library to implement the rules
          – But what is the added benefit beyond just working directly with XML or HTML?
          – Who will write and maintain the libraries for every language?
      • Proposal: Let providers use both plain and escaped text
  • 17. News in JSON Road Map
      • Evaluate more structures
          – Such as links to binaries
      • Write a white paper on our initial recommendations
          – Publish and seek out feedback within IPTC and beyond
      • Create a News in JSON 1.0 recommendation
          – Present it for a vote at the Paris meeting
          – Consider an experimental phase
      • You can help by joining the News in JSON group
          – iptc-news-in-json-dev@yahoogroups.com
  • 18. Date and Place of Next Meeting
      Paris, 24-26 June 2013
      http://www.flickr.com/photos/anirudhkoul/3536413126/
      Thank you and goodbye!