CSV-X: A Linked Data Enabled Schema Language, Model, and Processing Engine for Non-Uniform CSV
Wirawit Chaochaisit

Sakamura-Koshizuka Laboratory
Graduate School of Interdisciplinary Information Studies

University of Tokyo
The Era of Data
• Big data comes from:
  • User-generated content (blogs, YouTube, Pinterest, etc.)
  • Social networks
  • Open government data (data.gov, etc.)
  • Mobile devices, the Internet of Things, sensors
The true IoT picture at the moment: fragmented, closed, unconnected.
In short, it is a problem of “interoperability” (at many levels, from hardware and network protocols to the information layer).
[Figure: Community 1 (nodes A, B) and Community 2 (nodes C, D)]
4
RDF Data Model for the “Web of Data”
5
Status of Today’s Open Data Formats
• The most popular data formats in the world’s open data are still tabular: xls and csv/tsv [5]. Over 90% of the data on data.gov.uk is tabular [4].
• However, CSV is a very limited data format: no formal data structure, no datatypes, no schema.
• Why is it still so widely used?
  • It is easy to produce from existing tools (relational databases, Excel, etc.)
  • Creating XML or RDF “costs” more, technically and financially (even for US and UK government units) [2][3]
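To make the "no datatype, no schema" point concrete, here is a minimal Python sketch using the standard `csv` module. The column names and values are invented for illustration and do not come from any dataset in this deck. Every field comes back as a plain string, so any typing has to be layered on by the consumer.

```python
import csv
import io

# Hypothetical three-column sample, made up for this illustration.
TEXT = "id,age,joined\n1,32,2016-11-26\n"

def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))

header, record = read_rows(TEXT)

# Numbers and dates are not distinguished: each value is just a str.
print([type(value).__name__ for value in record])
```

A schema layer such as the one proposed in this deck is one way to attach the missing type information.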
6
• There is a W3C standard, and many tools try to upgrade CSV [2][3][6][7][8][9]
• However, most of these tools only support CSV as defined by the IETF’s RFC 4180 memo [11]
• Over 40% of CSV files were ignored in an ODI study of all CSV files on data.gov.uk [10]
ID Name Address Remark CRLF
1, John Doe, "Main St.", CRLF
2, Clark Kent, 5th Ave., "foo CRLF
bar" CRLF
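The RFC 4180 conventions in the example above (comma-separated fields, CRLF record endings, quoted fields that may themselves contain a line break) can be exercised with Python's standard `csv` module. This is only a sketch: the text mirrors the slide's sample, written without spaces after the commas, and the comma-separated header row is an assumption.

```python
import csv
import io

# RFC 4180 style text: CRLF ends each record, and a quoted field may itself
# contain a CRLF. The comma-separated header row is an assumption.
TEXT = (
    "ID,Name,Address,Remark\r\n"
    '1,John Doe,"Main St.",\r\n'
    '2,Clark Kent,5th Ave.,"foo\r\nbar"\r\n'
)

def parse_rfc4180(text):
    """Return all records; newline='' keeps embedded line breaks intact."""
    return list(csv.reader(io.StringIO(text, newline="")))

records = parse_rfc4180(TEXT)
for record in records:
    print(record)
```

The quoted "foo CRLF bar" value stays inside a single field, so the text yields three records rather than four.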
7
Challenge in Non-Uniform CSV

How can we describe these random patterns so
that we can perform automatic processing?
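One rough way to see why such files defeat a plain row-oriented reader is to count the fields per row. The sketch below treats "non-uniform" simply as "rows with differing field counts", which is an assumption made for illustration, not the deck's formal definition. The sample text is modeled on the employee list used later in this deck, shortened and with an adjusted total.

```python
import csv
import io

# Sample modeled on the deck's employee list (shortened; the total is adjusted).
TEXT = (
    "Employee List,2016-11-26\n"
    "ID,Name,Age,Salary\n"
    "1,Bob,32,800\n"
    "2,Lisa,24,1200\n"
    "Total Employee,2\n"
)

def field_counts(text):
    """Number of fields in each row, in file order."""
    return [len(row) for row in csv.reader(io.StringIO(text, newline=""))]

def is_uniform(text):
    """True when every row has the same number of fields."""
    return len(set(field_counts(text))) <= 1

print(field_counts(TEXT), is_uniform(TEXT))
```

The title row and the summary row have two fields while the table body has four, so a reader that expects one header followed by same-shaped records would misread the file.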
8
Method 1: Generalize

9
Method 2: Schema Model

The model MUST:
• Represent a single value or a group of values (cell/row/column)
• Capture hidden relations between values (property)
• Specify templates and data for transformation and annotation
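The three requirements above can be pictured as a small data structure. The following Python sketch is purely illustrative: the class names, fields, and the sample annotation are hypothetical and are not taken from the CSV-X implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CellRange:
    """A single cell or a rectangular group of cells (inclusive bounds)."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int

    def contains(self, row, col):
        return (self.row_start <= row <= self.row_end
                and self.col_start <= col <= self.col_end)

@dataclass
class Template:
    """A parameterized text template used for transformation."""
    params: list
    body: str

@dataclass
class SchemaModel:
    """Groups value ranges, properties attached to them, and templates."""
    entities: dict = field(default_factory=dict)    # name -> CellRange
    properties: dict = field(default_factory=dict)  # (entity, property) -> value
    templates: dict = field(default_factory=dict)   # name -> Template

    def entities_at(self, row, col):
        """Names of all entities whose range covers the given cell."""
        return sorted(name for name, rng in self.entities.items()
                      if rng.contains(row, col))

# Hypothetical usage: a header row, a salary column, and one annotation.
model = SchemaModel()
model.entities["header"] = CellRange(1, 1, 0, 3)
model.entities["salary_col"] = CellRange(2, 14, 3, 3)
model.properties[("salary_col", "unit")] = "USD"  # made-up annotation
print(model.entities_at(2, 3))
```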













10
CSV-X

Schema Language, Model, and Processing Engine for Non-Uniform CSV
• Describes patterns and relations using flexible schema constructs, an adaptive matching algorithm, and cross-referencing techniques
Features: parsing, annotation, alteration, validation, cross-referencing, automatic RDF serialization, and template-based transformation
11
CSV-X by Example #1 

Validate Data Type / Repeating Pattern /
Cross-Referencing / Annotation
Employee List, 2016-11-26
ID, Name, Age, Salary
1, Bob, 32, 800
2, Lisa, 24, 1200
...
13, John, 21, 700
Total Employee, 13
Average Salary, 900

{
  "@cell[0,0]" : { "@regex" : "Employee List" },
  "@cell[0,1]" : { "@datatype" : "xsd:date" },
  "@cell[1,0-3]" : { "@datatype" : "xsd:string" },
  ...
}
12
[Figure callouts on the schema: nested variable reference; dynamic variable declaration; user-defined property (annotation); regular expression / XML datatype validator; repeating pattern directive]
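As a rough illustration of what the three cell rules in Example #1 ask for, the Python sketch below checks them against the sample CSV. It is not the CSV-X engine. The semantics are assumptions: `@regex` is treated as a full match, `xsd:date` as an ISO calendar date, `xsd:string` as any text, and `@cell[r,c]` as row then column. The elided rows of the sample are left out.

```python
import csv
import io
import re
from datetime import date

# Rules copied from the schema excerpt; CSV text is the slide's sample
# without the elided rows.
SCHEMA = {
    "@cell[0,0]": {"@regex": "Employee List"},
    "@cell[0,1]": {"@datatype": "xsd:date"},
    "@cell[1,0-3]": {"@datatype": "xsd:string"},
}
CSV_TEXT = (
    "Employee List, 2016-11-26\n"
    "ID, Name, Age, Salary\n"
    "1, Bob, 32, 800\n"
    "2, Lisa, 24, 1200\n"
    "13, John, 21, 700\n"
)

CELL_KEY = re.compile(r"@cell\[(\d+)(?:-(\d+))?,(\d+)(?:-(\d+))?\]")

def cells_for(key):
    """Expand '@cell[r,c]' or '@cell[r,c0-c1]' into (row, col) pairs."""
    match = CELL_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"unsupported schema key: {key}")
    r0, r1, c0, c1 = match.groups()
    rows = range(int(r0), int(r1 or r0) + 1)
    cols = range(int(c0), int(c1 or c0) + 1)
    return [(r, c) for r in rows for c in cols]

def value_ok(value, rule):
    """Check one cell value against '@regex' and '@datatype' (assumed semantics)."""
    if "@regex" in rule and re.fullmatch(rule["@regex"], value) is None:
        return False
    if rule.get("@datatype") == "xsd:date":
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
    return True  # xsd:string accepts any text

def validate(text, schema):
    """Return the (row, col) coordinates that violate the schema."""
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    rows = list(reader)
    failures = []
    for key, rule in schema.items():
        for r, c in cells_for(key):
            present = r < len(rows) and c < len(rows[r])
            if not present or not value_ok(rows[r][c], rule):
                failures.append((r, c))
    return failures

print(validate(CSV_TEXT, SCHEMA))
```

Replacing the date with a value such as `26/11/2016` makes `validate` report cell `(0, 1)`.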
Ex#2 Template-based RDF Transformation
"@template[senObs]": {
  "@params" : [ "sensor", "property", "value", "time" ],
  "description" : "Template for sensor observation.",
  "@ttl" : "?o rdf:type ssn:Observation ;
            ssn:observedBy {sensor} ;
            ssn:observationSamplingTime {time} ;
            ...
            ?obsVal rdf:type ssn:ObservationValue ;
            DUL:hasDataValue {value} .
            {sensor} ssn:observes {property} ."
},
"@cell[4,1]": {
  "@mapTemplate" : "senObs('{sensor.name}','{msrProp}','{@this.@value}', '{msrMthYr}')"
},
13
[Figure callouts on the template: parameters list; template content; parameter variable]
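The idea behind `@mapTemplate` can be sketched in a few lines of Python: parse the call expression, bind the quoted arguments to the template's `@params`, and substitute each `{param}` placeholder in `@ttl`. This is a simplified illustration, not the CSV-X engine. The template body is shortened (the elided part is omitted), variable references such as `{sensor.name}` are assumed to have been resolved already, and the argument values are hypothetical.

```python
import re

# Shortened version of the slide's template; the elided '...' part is omitted.
TEMPLATES = {
    "senObs": {
        "@params": ["sensor", "property", "value", "time"],
        "@ttl": (
            "?o rdf:type ssn:Observation ;\n"
            "   ssn:observedBy {sensor} ;\n"
            "   ssn:observationSamplingTime {time} .\n"
            "?obsVal DUL:hasDataValue {value} .\n"
            "{sensor} ssn:observes {property} ."
        ),
    }
}

CALL = re.compile(r"\s*([A-Za-z][A-Za-z0-9_]*)\((.*)\)\s*", re.S)
QUOTED_ARG = re.compile(r"'([^']*)'")

def map_template(expr, templates):
    """Instantiate the template named in expr with its quoted arguments."""
    match = CALL.fullmatch(expr)
    if match is None:
        raise ValueError(f"not a template mapping expression: {expr!r}")
    name, arg_text = match.groups()
    template = templates[name]
    args = QUOTED_ARG.findall(arg_text)
    params = template["@params"]
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} arguments, got {len(args)}")
    binding = dict(zip(params, args))
    # Single pass: substituted values are never re-scanned for placeholders.
    return re.sub(r"\{(\w+)\}", lambda m: binding[m.group(1)], template["@ttl"])

# Hypothetical, already-resolved argument values.
EXPR = "senObs('ex:sensor1','ex:temperature','21.5','\"2016-11\"^^xsd:gYearMonth')"
print(map_template(EXPR, TEMPLATES))
```

The output is plain text with the placeholders filled in; checking that it is valid Turtle would be a separate step.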
CSV-X Grammar in EBNF (Extended Backus-Naur Form)
	
(* Schema entity expression: *)
schema-entity-expr ::= '@'(named-schema-entity|ranged-schema-entity|cell-coord-expr) ;
named-schema-entity ::= ('table'|'prop'|'data'|'template') '['identifier']' ;
identifier ::= letter {letter|digit|"_"} ;
ranged-schema-entity ::= ('row'|'col') '['range-index']' ;
range-index ::= digit {digit} ['-' digit {digit}] ;
cell-coord-expr ::= 'cell['coordinate-range-index']' ;
coordinate-range-index ::= range-index ',' range-index ;
SERE ::= schema-entity-expr ['.' SERE] ;
(* Template mapping expression: *)
template-map-expr ::= identifier '('[params]')' ;
params ::= (params [',' params]) | template-parameter ;
template-parameter ::= "'" (var-ref-expr|string) "'" ;
(* Variable reference expression: *)
var-ref-expr ::= '{'var-dot-notation'}' ;
var-dot-notation ::= var-name ['.' property] |'row'|'col'|'subrow' ;
var-name ::= identifier-derivation | '@this' ;
identifier-derivation ::= identifier | ([identifier] var-ref-expr [{ letter|digit|"_" }]) ;
property ::= identifier-derivation | '@value' ;
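The schema-entity part of the grammar is regular, so it can be checked with a regular expression. The Python sketch below is a small recognizer for `SERE` written directly from the productions above; it assumes `letter` means an ASCII letter and that no whitespace is allowed inside an expression, neither of which the grammar states explicitly.

```python
import re

# identifier ::= letter {letter|digit|"_"}   (ASCII letters assumed)
IDENT = r"[A-Za-z][A-Za-z0-9_]*"
# range-index ::= digit {digit} ['-' digit {digit}]
RANGE = r"\d+(?:-\d+)?"
# schema-entity-expr ::= '@'(named | ranged | cell-coord)
ENTITY = (
    rf"@(?:(?:table|prop|data|template)\[{IDENT}\]"
    rf"|(?:row|col)\[{RANGE}\]"
    rf"|cell\[{RANGE},{RANGE}\])"
)
# SERE ::= schema-entity-expr ['.' SERE]
SERE = re.compile(rf"{ENTITY}(?:\.{ENTITY})*")

def is_sere(text):
    """True when text is a well-formed schema entity reference expression."""
    return SERE.fullmatch(text) is not None

for candidate in ["@cell[1,0-3]", "@template[senObs]",
                  "@table[t1].@row[0-2]", "@cell[0]", "@row[x]"]:
    print(candidate, is_sere(candidate))
```

The variable reference productions are recursive (`var-ref-expr` can appear inside `identifier-derivation`), so they would need a real parser rather than a single regular expression.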
	
14
Processing Algorithms for Schema Matching and
Variable Resolution
15
Algorithms
Implementation
• Implemented in Java in 3K LOC (excluding libraries, comments, and blank lines)
Demo: http://www.dadfha.com:3232
GitHub: https://github.com/nabito/csv-x
16
Evaluation
• The evaluation focused on expressivity and functional validation:
1. Collect 7 real-world, complex non-uniform CSV files from the W3C CSVW use cases report [14] and from US, UK, Japanese, and Thai open data sites
2. Identify the non-uniform patterns and compose the schema
3. Test the operations: parsing, annotation, validation, cross-referencing, and RDF transformation
17
Results and Conclusion
• Our definitions of non-uniform CSV patterns cover all patterns that appear in the sample datasets
• CSV-X can process a variety of complex non-uniform CSV files from real-world datasets
18
• It is hoped that CSV-X can help in publishing high-quality data for the iSWoT and open data communities alike
References 1
[1] “Linked Data - Design Issues.” [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html. [Accessed: 05-Aug-2016].
[2] T. Lebo, J. S. Erickson, L. Ding, A. Graves, G. T. Williams, D. DiFranzo, X. Li, J. Michaelis, J. G. Zheng, J. Flores, Z. Shangguan, D. L. McGuinness, and J. Hendler, “Producing and Using Linked Open Government Data in the TWC LOGD Portal,” in Linking Government Data, D. Wood, Ed. Springer New York, 2011, pp. 51–72.
[3] “CSV Schema Language 1.1.” [Online]. Available: http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html. [Accessed: 09-Jul-2016].
[4] “2014: The Year of CSV | News,” Open Data Institute. [Online]. Available: https://theodi.org/blog/2014-the-year-of-csv. [Accessed: 15-Jul-2016].
[5] T. Davies, R. M. Sharif, and J. M. Alonso, “Open Data Barometer Global Report,” World Wide Web Found., 2015.
19
References 2
[6] W. Martens, F. Neven, and S. Vansummeren, “SCULPT: A Schema Language for Tabular Data on the Web,” in Proceedings of the 24th International Conference on World Wide Web, New York, NY, USA, 2015, pp. 702–720.
[7] “Model for Tabular Data and Metadata on the Web.” [Online]. Available: http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/. [Accessed: 29-Jul-2016].
[8] P. E. R. Salas, M. Martin, F. M. Da Mota, S. Auer, K. Breitman, and M. A. Casanova, “Publishing statistical data on the web,” in Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, 2012, pp. 285–292.
[9] A. Langegger and W. Wöß, “XLWrap–Querying and Integrating Arbitrary Spreadsheets with SPARQL.”
[10] “What is a CSV? A case study of CSVs on data.gov.uk.” [Online]. Available: https://theodi.github.io/blog/2014/02/18/the-status-of-csvs-on-datagovuk/. [Accessed: 09-Jul-2016].
20
References 3
[11] Y. Shafranovich, “Common format and MIME type for comma-separated values (CSV) files,” 2005.
[12] “JSON-LD 1.0.” [Online]. Available: https://www.w3.org/TR/json-ld/. [Accessed: 29-Jul-2016].
[13] M. Compton, P. Barnaghi, L. Bermudez, R. García-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, and A. Herzog, “The SSN ontology of the W3C semantic sensor network incubator group,” Web Semant. Sci. Serv. Agents World Wide Web, 2012.
[14] J. Tandy, D. Ceolin, and E. Stephan, “CSV on the Web: Use cases and requirements,” W3C Working Group Note, 25-Feb-2016. [Online]. Available: https://www.w3.org/TR/csvw-ucr/.
[15] R. Cyganiak, D. Reynolds, and J. Tennison, “The RDF data cube vocabulary,” W3C Recomm. January 2014, 2013.
[16] S. Auer, S. Dietzold, and T. Riechert, “OntoWiki–a tool for social, semantic collaboration,”
21
References 4
[17] C. C. Aggarwal, N. Ashish, and A. Sheth, “The internet of things: A survey from the data-centric perspective,” in Managing and mining sensor data, Springer, 2013, pp. 383–428.
[18] D. Guinard, V. Trifa, F. Mattern, and E. Wilde, “From the Internet of Things to the Web of Things: Resource-oriented Architecture and Best Practices,” in Architecting the Internet of Things, D. Uckelmann, M. Harrison, and F. Michahelles, Eds. Springer Berlin Heidelberg, 2011, pp. 97–129. [Online]. Available: http://link.springer.com/chapter/10.1007/978-3-642-19157-2_5
[19] D. Pfisterer et al., “SPITFIRE: toward a semantic web of things,” IEEE Communications Magazine, vol. 49, no. 11, pp. 40–48, 2011.
22
Publications
• Human Localization Sensor Ontology: Enabling OWL 2 DL-Based Search for User’s Location-Aware Sensors in the IoT. In 2016 IEEE Tenth International Conference on Semantic Computing (ICSC) (pp. 107–111). https://doi.org/10.1109/ICSC.2016.31
• CSV-X: A Linked Data Enabled Schema Language, Model, and Processing Engine for Non-Uniform CSV. To appear in the proceedings of the 2016 IEEE International Conference on Smart Data (SmartData), Dec 16-19, China
23
Thank you



Q&A
Contact:

nabito@gmail.com

http://www.github.com/nabito
24

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

CSV-X

  • 1. CSV-X: A Linked Data Enabled Schema Language, Model, and Processing Engine for Non-Uniform CSV
 Wirawit Chaochaisit
 Sakamura-Koshizuka Laboratory, Graduate School of Interdisciplinary Information Studies
 University of Tokyo
  • 2. The Era of Data
 • Big Data from:
   • User-generated content (blogs, YouTube, Pinterest, etc.)
   • Social networks
   • Open government data (data.gov, etc.)
   • Mobile, Internet of Things, sensors
  • 5. RDF Data Model for the “Web of Data”
  • 6. Status of Today’s Open Data Formats
 • The most popular formats in the world’s open data are still tabular: xls and csv/tsv [5]. Over 90% of the data on data.gov.uk is tabular [4].
 • However, CSV is a very limited data format: no formal data structure, no datatypes, no schema.
 • Why is it still so widely used?
   • It is easy to produce from existing tools (relational databases, Excel, etc.)
   • Creating XML or RDF “costs” more, technically and financially (even for US and UK government units) [2][3]
  • 7. • There is a W3C standard, and many tools try to upgrade CSV [2][3][6][7][8][9]
 • However, most of these tools only support CSV as defined by the IETF’s RFC 4180 memo [11]
 • Over 40% of CSV files were ignored in an ODI study of all CSV files on data.gov.uk [10]
 ID Name Address Remark CRLF
 1, John Doe, "Main St.", CRLF
 2, Clark Kent, 5th Ave., "foo CRLF bar" CRLF
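 To make the RFC 4180 baseline concrete, here is a minimal Python sketch that parses a comma-separated rendering of the sample on this slide with the standard `csv` module, then checks whether all records have the same width. The comma-separated header row and the helper names `parse_rfc4180` and `is_uniform` are assumptions made for illustration; they are not part of CSV-X.

```python
import csv
import io


def parse_rfc4180(text):
    """Parse CSV text with the standard library reader (RFC 4180 style quoting)."""
    # newline="" keeps CRLF untouched so the reader handles quoted line breaks itself.
    return list(csv.reader(io.StringIO(text, newline="")))


def is_uniform(rows):
    """Return True when every record has the same number of fields."""
    return len({len(row) for row in rows}) <= 1


# Assumed rendering of the slide's sample: CRLF record separators and one
# double-quoted field that contains a line break.
sample = (
    'ID,Name,Address,Remark\r\n'
    '1,John Doe,"Main St.",\r\n'
    '2,Clark Kent,5th Ave.,"foo\r\nbar"\r\n'
)

rows = parse_rfc4180(sample)
print(len(rows))          # three records, although the text spans four physical lines
print(repr(rows[2][3]))   # the quoted field keeps its embedded line break
print(is_uniform(rows))   # every record here has four fields
```

 A file whose first record is a two-field title line followed by a four-field header would fail the `is_uniform` check, which is the kind of input a strictly RFC 4180 oriented tool tends to reject.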
  • 8. Challenge in Non-Uniform CSV
 How can we describe these random patterns so that we can perform automatic processing?
  • 10. Method 2: Schema Model
 The model MUST:
 • Represent a single value or a group of values (cell/row/column)
 • Capture hidden relations between values (properties)
 • Specify templates and data for transformation and annotation
  • 11. CSV-X
 Schema Language, Model, and Processing Engine for Non-Uniform CSV
 • Describes patterns and relations using flexible schema constructs, an adaptive matching algorithm, and cross-referencing techniques
 Features: parsing, annotation, altering, validation, cross-referencing, automatic RDF serialization, and template-based transformation
  • 12. CSV-X by Example #1
 Validate Data Type / Repeating Pattern / Cross-Referencing / Annotation
 Employee List, 2016-11-26
 ID, Name, Age, Salary
 1, Bob, 32, 800
 2, Lisa, 24, 1200
 ...
 13, John, 21, 700
 Total Employee, 13
 Average Salary, 900
 {
   "@cell[0,0]" : { "@regex" : "Employee List" },
   "@cell[0,1]" : { "@datatype" : "xsd:date" },
   "@cell[1,0-3]" : { "@datatype" : "xsd:string" },
 Callouts: nested variable reference; dynamic variable declaration; user-defined property (annotation); regular expression / XML datatype validator; repeating pattern directive
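 The three cell rules in this example can be exercised with a small Python sketch. It is not the CSV-X engine: it assumes that `@cell[r,c]` is zero-based with the row index first (as the sample suggests), that `@regex` is a search rather than a full match, and that `xsd:date` can be approximated by an ISO date check. The names `expand`, `validate`, and `DATATYPES` are hypothetical.

```python
import re
from datetime import date

# "@cell[r,c]" where r and c may each be a single index or an inclusive range "a-b".
CELL_KEY = re.compile(r"@cell\[(\d+)(?:-(\d+))?,(\d+)(?:-(\d+))?\]")


def expand(key):
    """Turn a cell expression into the list of (row, col) pairs it covers."""
    m = CELL_KEY.fullmatch(key)
    if m is None:
        raise ValueError(f"not a cell expression: {key!r}")
    r0, r1, c0, c1 = m.groups()
    rows = range(int(r0), int(r1 or r0) + 1)
    cols = range(int(c0), int(c1 or c0) + 1)
    return [(r, c) for r in rows for c in cols]


def is_iso_date(text):
    """Simplified stand-in for xsd:date (no time zone handling)."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# Assumed mapping from datatype names to checks; only the two used in the slide.
DATATYPES = {"xsd:date": is_iso_date, "xsd:string": lambda text: True}


def validate(table, schema):
    """Return a list of human-readable violations (empty when the table conforms)."""
    errors = []
    for key, rule in schema.items():
        for r, c in expand(key):
            try:
                value = table[r][c].strip()
            except IndexError:
                errors.append(f"cell[{r},{c}] is missing")
                continue
            if "@regex" in rule and not re.search(rule["@regex"], value):
                errors.append(f"cell[{r},{c}] does not match {rule['@regex']!r}")
            check = DATATYPES.get(rule.get("@datatype"))
            if check is not None and not check(value):
                errors.append(f"cell[{r},{c}] is not a valid {rule['@datatype']}")
    return errors


table = [
    ["Employee List", " 2016-11-26"],
    ["ID", " Name", " Age", " Salary"],
    ["1", " Bob", " 32", " 800"],
]
schema = {
    "@cell[0,0]": {"@regex": "Employee List"},
    "@cell[0,1]": {"@datatype": "xsd:date"},
    "@cell[1,0-3]": {"@datatype": "xsd:string"},
}
print(validate(table, schema))
```

 Because each rule addresses cells by coordinate, the two-field title row and the four-field header row can be validated side by side without forcing the file into a uniform shape.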
  • 13. Ex#2 Template-based RDF Transformation
 "@template[senObs]": {
   "@params" : [ "sensor", "property", "value", "time" ],
   "description" : "Template for sensor observation.",
   "@ttl" : "?o rdf:type ssn:Observation ; ssn:observedBy {sensor} ; ssn:observationSamplingTime {time} ; ... ?obsVal rdf:type ssn:ObservationValue ; ?obsVal DUL:hasDataValue {value} . {sensor} ssn:observes {property} ."
 }
 "@cell[4,1]": {
   "@mapTemplate": "senObs('{sensor.name}','{msrProp}','{@this.@value}', '{msrMthYr}')"
 },
 Callouts: parameters list; template content; parameter variable
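 As a rough illustration of template mapping, the Python sketch below binds the quoted arguments of a `@mapTemplate` call to the template's `@params` and substitutes them into the `@ttl` text. It assumes the variable references in the arguments have already been resolved to plain strings, and it uses a shortened template rather than the one on the slide. The function names and the `ex:` values are hypothetical; this is not the engine's actual code.

```python
import re

MAP_EXPR = re.compile(r"(\w+)\((.*)\)")   # name '(' arguments ')'
QUOTED_ARG = re.compile(r"'([^']*)'")      # each argument is single-quoted
PLACEHOLDER = re.compile(r"\{(\w+)\}")     # {param} inside the template text


def parse_map_expr(expr):
    """Split "name('a','b')" into ("name", ["a", "b"])."""
    m = MAP_EXPR.fullmatch(expr.strip())
    if m is None:
        raise ValueError(f"not a template mapping expression: {expr!r}")
    return m.group(1), QUOTED_ARG.findall(m.group(2))


def apply_template(templates, expr):
    """Fill the named template's @ttl text with the call's arguments."""
    name, args = parse_map_expr(expr)
    template = templates[name]
    params = template["@params"]
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} arguments, got {len(args)}")
    binding = dict(zip(params, args))
    # Placeholders that are not declared parameters are left untouched.
    return PLACEHOLDER.sub(lambda m: binding.get(m.group(1), m.group(0)), template["@ttl"])


# Shortened, assumed template for illustration only.
templates = {
    "senObs": {
        "@params": ["sensor", "property", "value", "time"],
        "@ttl": "?o ssn:observedBy {sensor} ; ssn:observationSamplingTime {time} . "
                "{sensor} ssn:observes {property} .",
    }
}

print(apply_template(templates, "senObs('ex:sensor1','ex:temperature','21.5', '2016-11')"))
```

 Keeping the template separate from the cell rule means one RDF pattern can be reused by many cells, each supplying its own arguments.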
  • 14. CSV-X Grammar in EBNF (Extended Backus-Naur Form)
 (* Schema entity expression: *)
 schema-entity-expr ::= '@'(named-schema-entity|ranged-schema-entity|cell-coord-expr) ;
 named-schema-entity ::= ('table'|'prop'|'data'|'template') '['identifier']' ;
 identifier ::= letter {letter|digit|"_"} ;
 ranged-schema-entity ::= ('row'|'col') '['range-index']' ;
 range-index ::= digit {digit} ['-' digit {digit}] ;
 cell-coord-expr ::= 'cell['coordinate-range-index']' ;
 coordinate-range-index ::= range-index ',' range-index ;
 SERE ::= schema-entity-expr ['.' SERE] ;
 (* Template mapping expression: *)
 template-map-expr ::= identifier '('[params]')' ;
 params ::= (params [',' params]) | template-parameter ;
 template-parameter ::= '''(variable-ref-expr|string)''' ;
 (* Variable reference expression: *)
 var-ref-expr ::= '{'var-dot-notation'}' ;
 var-dot-notation ::= var-name ['.' property] |'row'|'col'|'subrow' ;
 var-name ::= identifier-derivation | '@this' ;
 identifier-derivation ::= identifier | ([identifier] var-ref-expr [{ letter|digit|"_" }]) ;
 property ::= identifier-derivation | '@value' ;
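 The schema entity productions are regular, so they can be checked with a composed regular expression. The Python sketch below is one way to recognize a SERE; it assumes `letter` means an ASCII letter and that no whitespace is allowed inside an expression, neither of which the grammar states. The name `is_sere` is hypothetical.

```python
import re

# Building blocks, one per production in the schema entity part of the grammar.
IDENT = r"[A-Za-z][A-Za-z0-9_]*"                      # identifier
RANGE = r"\d+(?:-\d+)?"                               # range-index
NAMED = rf"(?:table|prop|data|template)\[{IDENT}\]"   # named-schema-entity
RANGED = rf"(?:row|col)\[{RANGE}\]"                   # ranged-schema-entity
CELL = rf"cell\[{RANGE},{RANGE}\]"                    # cell-coord-expr
ENTITY = rf"@(?:{NAMED}|{RANGED}|{CELL})"             # schema-entity-expr

# SERE ::= schema-entity-expr ['.' SERE], i.e. one or more entities joined by dots.
SERE = re.compile(rf"{ENTITY}(?:\.{ENTITY})*")


def is_sere(text):
    """Return True when the whole string is a schema entity reference expression."""
    return SERE.fullmatch(text) is not None


for candidate in ["@cell[4,1]", "@cell[1,0-3]", "@table[emp].@row[2-5]", "@cell[1]", "@row[a]"]:
    print(candidate, is_sere(candidate))
```

 The variable reference productions are recursive (a `var-ref-expr` may appear inside an `identifier-derivation`), so they need a real parser or an iterative resolver rather than a single regular expression.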
  • 15. Processing Algorithms for Schema Matching and Variable Resolution
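 A minimal Python sketch of what variable resolution could look like, assuming variables live in a plain dictionary and that nested references such as `{col{i}.unit}` are resolved innermost first. This is an illustration built only from the `var-ref-expr` grammar, not the algorithm used by the CSV-X engine; `resolve`, `lookup`, and the sample context are hypothetical.

```python
import re

# Matches a reference that contains no further braces, i.e. an innermost one.
INNER_REF = re.compile(r"\{([^{}]+)\}")


def lookup(path, context):
    """Resolve "name" or "name.property" against the context dictionary."""
    name, _, prop = path.partition(".")
    value = context[name]
    if prop:
        value = value[prop]
    return str(value)


def resolve(text, context, max_depth=10):
    """Replace variable references until none remain, innermost first."""
    for _ in range(max_depth):
        resolved = INNER_REF.sub(lambda m: lookup(m.group(1), context), text)
        if resolved == text:
            return resolved
        text = resolved
    raise ValueError("references nested too deeply or defined in terms of themselves")


# Assumed context: '@this' stands for the cell currently being processed.
context = {
    "sensor": {"name": "ex:s1"},
    "msrProp": "ex:temp",
    "@this": {"@value": "21.5"},
    "i": "1",
    "col1": {"unit": "C"},
}

print(resolve("senObs('{sensor.name}','{msrProp}','{@this.@value}')", context))
print(resolve("{col{i}.unit}", context))
```

 Resolving the innermost reference first lets a variable name be assembled from another variable's value, which is what the `identifier-derivation` production permits.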
  • 16. Implementation
 • Implemented in Java in 3K lines of code (excluding libraries, comments, and blank lines)
 Demo: http://www.dadfha.com:3232
 GitHub: https://github.com/nabito/csv-x
  • 17. Evaluation
 • The evaluation focused on expressivity and functional validation:
 1. Collect seven real-world, complex, non-uniform CSV files from the W3C CSVW use cases report [14] and from US, UK, JPN, and TH open data sites
 2. Identify the non-uniform patterns and compose the schema
 3. Test the operations: parsing, annotation, validation, cross-referencing, and RDF transformation
  • 18. Results and Conclusion
 • Our definitions of non-uniform CSV patterns cover all patterns that appear in the sample datasets
 • CSV-X can process a variety of complex non-uniform CSV files from real-world datasets
 • It is hoped that CSV-X can help in publishing high-quality data for the iSWoT and open data communities alike
  • 19. References 1
 [1] “Linked Data - Design Issues.” [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html. [Accessed: 05-Aug-2016].
 [2] T. Lebo, J. S. Erickson, L. Ding, A. Graves, G. T. Williams, D. DiFranzo, X. Li, J. Michaelis, J. G. Zheng, J. Flores, Z. Shangguan, D. L. McGuinness, and J. Hendler, “Producing and Using Linked Open Government Data in the TWC LOGD Portal,” in Linking Government Data, D. Wood, Ed. Springer New York, 2011, pp. 51–72.
 [3] “CSV Schema Language 1.1.” [Online]. Available: http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html. [Accessed: 09-Jul-2016].
 [4] “2014: The Year of CSV | News,” Open Data Institute. [Online]. Available: https://theodi.org/blog/2014-the-year-of-csv. [Accessed: 15-Jul-2016].
 [5] T. Davies, R. M. Sharif, and J. M. Alonso, “Open Data Barometer Global Report,” World Wide Web Found., 2015.
  • 20. References 2
 [6] W. Martens, F. Neven, and S. Vansummeren, “SCULPT: A Schema Language for Tabular Data on the Web,” in Proceedings of the 24th International Conference on World Wide Web, New York, NY, USA, 2015, pp. 702–720.
 [7] “Model for Tabular Data and Metadata on the Web.” [Online]. Available: http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/. [Accessed: 29-Jul-2016].
 [8] P. E. R. Salas, M. Martin, F. M. Da Mota, S. Auer, K. Breitman, and M. A. Casanova, “Publishing statistical data on the web,” in Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, 2012, pp. 285–292.
 [9] A. Langegger and W. Wöss, “XLWrap–Querying and Integrating Arbitrary Spreadsheets with SPARQL.”
 [10] “What is a CSV? A case study of CSVs on data.gov.uk.” [Online]. Available: https://theodi.github.io/blog/2014/02/18/the-status-of-csvs-on-datagovuk/. [Accessed: 09-Jul-2016].
  • 21. References 3
 [11] Y. Shafranovich, “Common format and MIME type for comma-separated values (CSV) files,” 2005.
 [12] “JSON-LD 1.0.” [Online]. Available: https://www.w3.org/TR/json-ld/. [Accessed: 29-Jul-2016].
 [13] M. Compton, P. Barnaghi, L. Bermudez, R. García-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, and A. Herzog, “The SSN ontology of the W3C semantic sensor network incubator group,” Web Semant. Sci. Serv. Agents World Wide Web, 2012.
 [14] J. Tandy, D. Ceolin, and E. Stephan, “CSV on the Web: Use cases and requirements,” W3C Working Group Note, 25-Feb-2016. [Online]. Available: https://www.w3.org/TR/csvw-ucr/.
 [15] R. Cyganiak, D. Reynolds, and J. Tennison, “The RDF data cube vocabulary,” W3C Recomm. January 2014, 2013.
 [16] S. Auer, S. Dietzold, and T. Riechert, “OntoWiki–a tool for social, semantic collaboration,”
  • 22. References 4
 [17] C. C. Aggarwal, N. Ashish, and A. Sheth, “The internet of things: A survey from the data-centric perspective,” in Managing and mining sensor data, Springer, 2013, pp. 383–428.
 [18] Guinard, D., Trifa, V., Mattern, F., & Wilde, E. (2011). From the Internet of Things to the Web of Things: Resource-oriented Architecture and Best Practices. In D. Uckelmann, M. Harrison, & F. Michahelles (Eds.), Architecting the Internet of Things (pp. 97–129). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-19157-2_5
 [19] D. Pfisterer et al., “SPITFIRE: toward a semantic web of things.,” IEEE Communications Magazine, vol. 49, no. 11, pp. 40–48, 2011.
  • 23. Publications
 • Human Localization Sensor Ontology: Enabling OWL 2 DL-Based Search for User’s Location-Aware Sensors in the IoT. In 2016 IEEE Tenth International Conference on Semantic Computing (ICSC) (pp. 107–111). https://doi.org/10.1109/ICSC.2016.31
 • CSV-X: A Linked Data Enabled Schema Language, Model, and Processing Engine for Non-Uniform CSV. To appear in the proceedings of the 2016 IEEE International Conference on Smart Data (SmartData), Dec 16-19, China