Federated Ontology for Sports
CSCI586_Web_Group1_Topic1G: Database Interoperability Project Report
Abhishek Agrawal
agra47@usc.edu
George Sam
gsam@usc.edu
Hari Haran Venugopal
hvenugop@usc.edu
Noopur Joshi
noopurbj@usc.edu
Abstract—Our project aims to provide concise information on player backgrounds and tournament details (schedule, location, etc.). We have created a federated ontology that covers soccer, tennis and cricket. This ontology is then modeled against the data sets from each sport, and we run queries on the system to test our results.
Keywords—RDF, Scraping
I. INTRODUCTION
Sport is an important domain from a technology perspective. The different sports constantly produce data, so it is crucial to develop systems that continuously analyze and extract information from this data. Many people follow multiple sports and, in this day and age, want information about their favorite sports at their fingertips. With the growth of mobile platforms, this information can be accessed easily. Multiple systems have been developed that use computational concepts from Natural Language Processing, Machine Learning and Artificial Intelligence to extract information from sources, process it and display the results. The problem is that the data is continuous and constantly increasing. Most websites cover one specific sport and rarely provide users with information about other sports. The idea of this project is to create an ontology-based system that consolidates data from multiple sources. This data is then modeled into a federated ontology for sports, consisting of data models and ontologies from multiple sources. Our intention is to create a system with a federated ontology that models multiple sports; we have started by modelling cricket, tennis and football (soccer).
II. MOTIVATION
A. Ontology for Sports
Existing ontologies for sports cover most of the information required in their respective fields. In cricket, for example, the ontology covers player information such as matches played, average batting score and average bowling score, but it is specific to cricket only. Similar ontologies exist for tennis and other sports. Because these sports are structurally different, their ontologies are also very different. It is therefore necessary to create an ontology that integrates information about multiple sports.
In today's world of technology we have access to tablets, mobile devices, laptops, desktops and so on, and there is a need for constant, intelligent, up-to-date, integrated and detailed information from the Web. Although a lot of information is available on the web, it is unstructured and varying. Ontologies provide a way of maintaining this unstructured information in one format that can easily be shared.
Having a common ontology helps to aggregate data from various sources. This data can then be shared with other users or applications.
B. Need for a Federated Ontology
Most people follow one or more sports, but information about each sport has to be looked up on a different website. How long would it take to find "all sports players from England"? You would have to search each sport individually. Similarly, if your favorite athlete is Rafael Nadal and you want the latest updates about him, you still have to search for that information separately.
A federated sports ontology can represent different sports and present a common view. A federated ontology is also extensible: new information such as additional player details, player statistics, new sports or changes in the rules of games can easily be added to the previously gathered data.
One of the core benefits of a federated ontology, or of the Semantic Web in general, lies in data analysis. Data analysts spend a lot of time converting unstructured data into structured data; storing data based on a common ontology solves this problem and saves considerable time. Simple statistical questions like "How do the players and/or teams measure up against one another in various categories?" can easily be answered using a federated ontology.
Ontologies can also be used in applications such as news feeds, where a federated ontology helps combine editorial coverage of sports with all data feeds presented in one place.
C. Our System
Our system models information about Cricket, Tennis and Football (soccer) in a federated ontology. It covers players, sporting events, and the rankings of teams and players. Since these sports are structurally different, it is important to identify fields and attributes about players and rankings that are comparable across all the sports. Information was gathered about the following domains:
• Cricket
o Player Information
o Player Rankings for T20, ODI and Test match types
o Team Information
o Rankings for teams in T20, ODI and Test match types
• Tennis
o Player Information, including ranking
o Tournament Information for Wimbledon, the US Open, the Australian Open and the French Open
• Football
o Player information from the English Premier League and from the Spanish La Liga
o Information about the games played in the two leagues
To limit the size of the data collected, we restricted collection to the period from 2004 to 2014 (10 years). The information was collected in multiple files pertaining to each domain and type, and these files were then merged into single files.
Steps for the Project:
1. Data Collection and Data Scraping
2. Data Cleaning
3. Ontology Creation
4. Data Modelling
5. Data Publishing
6. Running queries to extract information
The diagram below highlights these steps.
Fig 1. System Development Cycle
III. DATA SCRAPING
The data for the different sports was collected by developing web scrapers. We developed the scrapers in Java and Python using open-source libraries. The scrapers crawl through the websites and collect information in the form of JSON or CSV files.
1. Data Scraping for Cricket
Cricket Dataset:
To collect the cricket data set, we scraped the websites below:
• http://www.icc-cricket.com/player-rankings/overview
• http://www.espncricinfo.com/ci/content/player/index.html
• http://cricsheet.org/
We collected the player and team information from espncricinfo.com and the ranking data from icc-cricket.com. To collect the data for the T20, T20I, ODI and Test matches, we used cricsheet.org.
Scraper used:
Chrome Web Scraper: We created site maps and scraped the data with them. It is a very handy and powerful tool.
Fig 2. ICC-Cricket Website Screenshot
YAML Java Library:
The match data set was in YAML format. We converted this data to JSON, extracting the required fields, using the SnakeYAML library. A sketch of this kind of conversion is shown below.
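For illustration, here is a minimal Python sketch of the same kind of conversion using PyYAML and the standard json module (the project itself used the Java SnakeYAML library, and the field names below are assumptions based on the cricsheet file layout):

import json
import yaml  # PyYAML

# Load one cricsheet-style YAML match file and keep only a few fields
# (field names are illustrative; the project extracted them with SnakeYAML in Java).
with open("match.yaml") as f:
    match = yaml.safe_load(f)

record = {
    "teams": match.get("info", {}).get("teams"),
    "date": str(match.get("info", {}).get("dates", [""])[0]),
    "match_type": match.get("info", {}).get("match_type"),
}

with open("match.json", "w") as f:
    json.dump(record, f, indent=2)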
2. Data Scraping for Tennis
Source: www.atpworldtour.com
Scraper Tool: Beautiful Soup
Beautiful Soup is a Python library with the following features:
• Simple methods and Pythonic idioms for navigating, searching and modifying a parse tree: a toolkit for dissecting a document and extracting content.
• The library converts incoming documents to Unicode and outgoing documents to UTF-8.
• Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, letting you try different parsing strategies or trade speed for flexibility.
Steps taken to retrieve the data set for the tennis domain (a minimal scraping sketch follows the list):
1. We primarily focused our data set on the top 100 ranked professional tennis players from atpworldtour.com.
2. For each tennis player we retrieved attributes such as Rank, Age, Birthplace, Residence, Height, Weight, Plays, TurnedPro, Coach, Website and Personal History using Beautiful Soup in a Python script.
3. We also scraped the grand slam details, such as year, winner and score, for each slam: the Australian Open, French Open, Wimbledon and US Open respectively.
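The sketch below shows the general pattern we followed with Beautiful Soup; the URL and the CSS selectors are assumptions for illustration only, not the exact markup of atpworldtour.com:

import requests
from bs4 import BeautifulSoup

# Fetch a player profile page (hypothetical URL) and pull a few attributes.
url = "http://www.atpworldtour.com/en/players/rafael-nadal/n409/overview"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

player = {}
for cell in soup.select("div.player-profile-hero-table td"):  # assumed selector
    label = cell.find("div", class_="table-label")            # assumed class names
    value = cell.find("div", class_="table-value")
    if label and value:
        player[label.get_text(strip=True)] = value.get_text(strip=True)

print(player)  # e.g. {"Age": "...", "Height": "...", "Plays": "...", ...}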
Fig 3. ATP Website Screenshot
3. Data Scraping for Football
Data Source:
http://www.soccerbase.com/
https://github.com/openfootball/en-england
http://www.footballsquads.co.uk/spain/
http://www.footballsquads.co.uk/england/
Scraper Used: JSoup
JSoup is a Java-based library for scraping websites. It provides functions to retrieve information as plain text or as HTML tags, which can then be parsed to extract tag attributes, tag content, etc.
The data is collected in the form of JSON files. These files are then combined into one file for players in the English Premier League, one for players in the Spanish La Liga, and one file for information about the teams.
Fig 4. SoccerBase Website Screenshot
IV. DATA CLEANING
Data was cleaned using Google OpenRefine and Karma. The files contained attributes, such as free-text player profiles, that were irrelevant to the data sets; these attributes create problems when mapping to the data models.
This step is important because converting files from one format to another generates many discrepancies. Entities such as white spaces and newline characters tend to break data objects and cause errors during modelling. In our project the tennis data was cleaned in Karma, while the cricket and football data were cleaned using Google OpenRefine.
Stages of Data Cleaning (a small transform sketch follows the list):
• Import Data: We imported data as JSON, XML and CSV files, since our data sources were diverse and we used different web scraping tools.
• Merge Data Sets: We merged all content belonging to a single domain into one file, e.g. the details of the four grand slams were combined into a single file.
• Rebuild Missing Data: We filled missing data with empty values.
• Standardize and Normalize Data: Some fields were merged using Karma PyTransform scripts, e.g. First Name and Last Name into Name; other fields were separated into multiple columns, e.g. a Height field containing both inches and centimetres was split into a separate field for each unit. Irrelevant fields were discarded.
• De-Duplicate: All duplicate values, which contributed to noisy data, were discarded.
• Verify, Enrich and Export Data: Once the data set was cleaned as required for the federated ontology mapping, we exported it in JSON format to be consumed in the next phase, Data Modeling.
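The following is a minimal sketch of the kind of Karma PyTransform used in the standardization step. Inside Karma only the expressions are entered and getValue() is provided by the tool; here it is stubbed with hypothetical column names and sample values so the sketch runs on its own:

# Stub of Karma's getValue() with made-up sample data for illustration.
def getValue(column, _row={"FirstName": "Rafael", "LastName": "Nadal",
                           "Height": "6'1\" (185 cm)"}):
    return _row[column]

# Merge First Name and Last Name into a single Name field.
name = getValue("FirstName").strip() + " " + getValue("LastName").strip()

# Split a combined height field into a separate centimetre value.
height_cm = getValue("Height").split("(")[-1].replace("cm)", "").strip()

print(name, height_cm)  # Rafael Nadal 185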
Fig 5. Data Cleaning Steps
V. ONTOLOGY CREATION
Protégé is a free and open source ontology editor which was
developed at Stanford. Protégé defines a graphical user
interface to define ontologies.
Our project required the creation of a federated Sports
ontology and we used Protege for the creation of the ontology.
The ontology of our project is shown below.
Fig 6. Ontology Graph
An ontology can be created by either a top-down approach or a bottom-up approach. The top-down approach involves defining the most general concepts in the domain first and then specifying specializations of those concepts. In our case the first step would be to define concepts for all sports in general; the next step would be to define the specialization for every sport used in our project, namely Football, Cricket and Tennis.
The bottom-up approach starts with the definition of the most specific concepts, beginning at the leaves of the hierarchy, and then defines the more general concepts of the ontology. In our ontology this would mean defining the CricketPlayer, FootballPlayer and TennisPlayer classes first and then working all the way up the hierarchy.
There is also a hybrid approach, which combines the top-down and bottom-up approaches. Here we define the salient concepts first and then either specialize or generalize them: we may start with the Sport and Player classes, then specialize into the individual player classes, and continue with the other classes such as the Match or Tournament class.
In the creation of our ontology we have used the hybrid approach.
The class hierarchy of our ontology is shown below:
Fig 7. Class Hierarchy
All subclasses are mapped under the top-level classes. For example, CricketPlayer is mapped under the Players class.
The object properties of our ontology are shown below:
Fig 8. Object Properties Screenshot
These properties help us draw inferences among the different concepts. For example, looking at our class hierarchy and the list of object properties, we can express the relation that a Player is an AthleteOf a Sport.
Fig 9. Property Mapping
While defining an object property we can specify both the domain and the range of the property in Protégé. For example, consider the relation above between the Player and Sport classes. The object property being defined is "AthleteOf": an instance of the Player class is the AthleteOf a Sport. Thus the domain is the Player class and the range is the Sport class.
The data properties of our ontology are shown below:
Fig 10. Data Properties
Data properties are used to specify the attributes associated with each class. In our ontology we have data properties such as MatchTeam1 and MatchTeam2, which specify the teams that played a match. When creating data properties in Protégé, we can specify the domain and the range of the property. For example, consider the Height property: it is specified for all players in general, so the domain of the property is the Player class, and since a player's height is represented as an integer or a float, the range is integer or float.
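As an illustration of the axioms described in this section, the following rdflib sketch (an assumption; the ontology itself was built interactively in Protégé, and the exact class and property names are taken approximately from our figures and queries) declares the player subclasses, the AthleteOf object property and the Height data property:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

CS586 = Namespace("http://www.semanticweb.org/CS586#")
g = Graph()
g.bind("cs586", CS586)

# Class hierarchy: sport-specific players are subclasses of Players.
for cls in ("CricketPlayer", "FootballPlayer", "TennisPlayer"):
    g.add((CS586[cls], RDF.type, OWL.Class))
    g.add((CS586[cls], RDFS.subClassOf, CS586.Players))

# Object property AthleteOf with domain Players and range Sport.
g.add((CS586.AthleteOf, RDF.type, OWL.ObjectProperty))
g.add((CS586.AthleteOf, RDFS.domain, CS586.Players))
g.add((CS586.AthleteOf, RDFS.range, CS586.Sport))

# Data property Height with domain Players and a float range.
g.add((CS586.Height, RDF.type, OWL.DatatypeProperty))
g.add((CS586.Height, RDFS.domain, CS586.Players))
g.add((CS586.Height, RDFS.range, XSD.float))

print(g.serialize(format="turtle"))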
VI. DATA MODELLING
A. Tools
There are many tools available to create a data model for this federated ontology; the most commonly used are Protégé and Karma. In our system we used Karma to model the data sets. The steps used to model the data are:
• Import our sports ontology along with the other OWL
files into the Karma workspace
• Import the data sets one at a time into Karma
• Create Semantic mappings for every attribute in the
data set
• Create class URIs for attributes which can be used as
Keys
• Publish the model
• Load it into the triple store
A data model was created for each data set, one at a time. The diagram below shows the data model for a tennis player. The player JSON file had many attributes that needed modelling; the figure shows the classes used and the semantic information for the properties. For this data set we intended to merge the player data with information about the tennis tournaments: Wimbledon, the French Open, the US Open and the Australian Open. For this merging it is important to create a key attribute that acts as the join key, in the form of a URI field. In our data set we decided to use the player name for the URI. Since we did not have a URI field, we created one using a Python transformation, adding a new field called PlayerURI (a sketch is shown below). This field acts as the class URI for the player class; when the data is loaded into the triple store, it acts as the key on which the data sets are merged.
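A minimal sketch of the PlayerURI transformation (the exact URI pattern is an assumption; the local prefix matches the one used in our SPARQL queries, and getValue() is stubbed here since Karma normally provides it):

# Stub of Karma's getValue() with a made-up row for illustration.
def getValue(column, _row={"Name": "Rafael Nadal"}):
    return _row[column]

# Mint a stable URI from the player name so records merge on the same key.
player_uri = "http://localhost:8080/source/" + getValue("Name").strip().replace(" ", "_")
print(player_uri)  # http://localhost:8080/source/Rafael_Nadal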
We set the PlayerURI field as the class URI, which Karma shows as a dashed green link; the other attributes are class properties. As we built the data model up to the higher-level classes, Karma automatically suggested properties such as dcterms:type between the Players class and the TennisPlayer class.
Fig 11. Tennis Player Model- Karma
Once the data model was created, we checked that all semantic mappings were correct and matched the ontology. Below are screenshots for the cricket and football players.
Fig 12. Cricket Player Model- Karma
VII. DATA PUBLISHING
The important step after modelling of the data is to publish the
data into a triple store. Karma creates bindings for every
attribute and class. The data model maps onto the ontology we
have created. We have class mappings for every football
player cricket player and tennis player. Similarly in our sport
games data sets, we create models for every tournament
information.
We have used OpenRDf to publish. OpenRDF comes
integrated into Karma. It enables the user to link the triple
store repository with the Karma instance, thus providing a way
to easily transport the models and the RDFs from Karma into
the triple store.
Steps:
1. Create a Repository in OpenRDF
2. Publish the RDF files from Karma
3. Load the data into OpenRDF in the form of contexts
4. Verify the triples
Below are a few screenshots of the triple store with the contexts and the RDF triples. Each context stores the RDF related to one particular sport.
The contexts are:
http://localhost.com/tennis
http://localhost.com/cricket
http://localhost.com/football
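Once published, the triples can also be queried over HTTP. The sketch below is an assumption based on the standard Sesame/OpenRDF SPARQL endpoint layout (the repository name "sports" and the endpoint path are hypothetical), not part of the Karma workflow itself:

import requests

# Hypothetical Sesame/OpenRDF repository endpoint for the published triples.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/sports"

query = """
PREFIX cs586: <http://www.semanticweb.org/CS586/>
SELECT ?name WHERE { ?p cs586:PlayerName ?name } LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
for binding in resp.json()["results"]["bindings"]:
    print(binding["name"]["value"])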
Fig 13. RDF Triples in OpenRDF
VIII. QUERIES
The following SPARQL queries were run on the data sets to
test the results of the modelling:
1. To extract names of players from all sports living in
England
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?Pname ?type ?loc
WHERE
{
{
?pl cs586:PlayerName ?Pname;
dcterms:type ?p;
cs586:MemberOf ?team.
?team cs586:TeamName ?loc.
?p rdf:type ?type .
?p rdf:type
<http://www.semanticweb.org/CS586#CricketPlayer>.
FILTER(regex(?loc,"England"))
}
UNION
{
?pl cs586:PlayerName ?Pname;
dcterms:type ?p;
cs586:Nationality ?loc;
cs586:MemberOf ?team.
?team cs586:TeamName ?tname.
?p rdf:type ?type .
?p rdf:type
<http://www.semanticweb.org/CS586#FootballPlayer>.
FILTER(regex(?loc,"ENG"))
}
UNION
{
?pl cs586:PlayerName ?Pname;
dcterms:type ?p.
?p rdf:type ?type.
?p rdf:type
<http://www.semanticweb.org/CS586#TennisPlayer>;
<http://www.semanticweb.org/CS586/BornIn> ?n.
?n cs586:LocationName ?loc.
FILTER(regex(?loc,"England"))
}
}
Result:
Fig 14. Query 1 Result
2. Query to extract the names of left-handed players from all sports
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?Pname
WHERE
{
{
?pl cs586:PlayerName ?Pname;
dcterms:type ?p.
?p cs586:CricketBatStyle ?batstyle.
?p rdf:type
<http://www.semanticweb.org/CS586#CricketPlayer>.
FILTER(regex(?batstyle,"Left-hand"))
}
UNION
{
?pl dcterms:type ?r;
cs586:PlayerName ?Pname.
?r cs586:TennisPlays ?q.
FILTER(regex(?q,"Left-hand"))
}
}
Result:
Fig 15. Query 2 Result
3. Query to find players above 30 years of age
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>
SELECT (COUNT(?age) AS ?AgeValue)
WHERE
{
?p cs586:HasRank ?rank;
cs586:PlayerName ?Pname;
cs586:BornOn ?date;
cs586:Age ?age.
FILTER(?age > 30)
}
Result:
Fig 16. Query 3 Result
IX. TOOLS USED
A. Data Scraping:
JSoup, JSON, Requests, BeautifulSoup, Sesame OpenRDF, Apache Jena, Scrapy, Web Scraper (Chrome extension).
B. Data Cleaning:
• Karma: Karma offers a programming-by-example interface that enables users to define data transformation scripts which convert data expressed in multiple formats into a common format.
• Google Refine: A powerful tool for working with messy data, cleaning it up and transforming it from one format into another.
C. Ontology Creation
• Protégé: A free and open-source ontology editor developed at Stanford, providing a graphical user interface to define ontologies.
D. Data Modelling and Publishing
• Karma
• OpenRDF Sesame: An open-source framework for creating triple stores and for loading and querying RDF triples.
X. INDIVIDUAL CONTRIBUTION
a. Abhishek Agrawal:
Scraped the websites http://www.espncricinfo.com/, http://www.icc-cricket.com/ and http://cricsheet.org/ using a combination of the Scrapy Python library and the Chrome Web Scraper. Did the modelling for the cricket files in Karma. Worked on ontology creation.
b. George Sam:
Performed data scraping in Python to extract information about players and league matches for the English Premier League and the Spanish La Liga. Did data cleaning in Google OpenRefine. Worked on data modelling in Karma and triple store creation in OpenRDF. Developed SPARQL queries.
c. Hari Haran Venugopal:
Built a Python script to scrape websites and collected the relevant data set in JSON format for the tennis domain, primarily focusing on player details (Name, Bio-Data, Ranking, Personal History, Coach, Age, Nationality) and tournament details for all grand slams (year, winner, scores). Did data cleaning in Karma. Developed SPARQL queries. Performed data modelling for the tennis data sets.
d. Noopur Joshi:
Responsible for creating the ontologies for each sport and the federated ontology for all sports combined in Protégé, using the hybrid approach. Initially an ontology for each individual sport, namely Football, Tennis and Cricket, was created; these ontologies were then mapped to the federated ontology for all sports. She handled data cleaning for the data sets using Google OpenRefine. The paper referred to for creating the ontologies: http://130.88.198.11/tutorials/protegeowltutorial/resources/ProtegeOWLTutorialP4_v1_3.pdf. Worked on data modelling for the cricket and football players.
XI. CONCLUSION AND FUTURE WORK
We have thus been able to create a system that implements a federated ontology covering data from Tennis, Cricket and Football. This information is modelled in Karma and then loaded as RDF into a triple store. We then ran queries to gather statistical data about players across all sports.
Further work can be done in this field. Many other sports can be integrated into the federated ontology. A web application could be created to display the information and provide an interface for searching the data sets, and the system could also back a mobile application as a front end, extending its coverage to multiple devices.
References
[1] http://www.isi.edu/integration/karma/
[2] http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf
[3] http://ict.siit.tu.ac.th/~sun/SW/Protege%20Tutorial.pdf
[4] http://www.crummy.com/software/BeautifulSoup/
[5] https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
[6] https://code.google.com/p/google-refine/
[7] http://www.datacleansing.net.au/Data_Cleansing_Services
[8] www.atpworldtour.com
[9] http://www.icc-cricket.com/player-rankings/overview
[10] http://www.espncricinfo.com/ci/content/player/index.html
[11] http://cricsheet.org/
[12] https://code.google.com/p/snakeyaml/

Weitere ähnliche Inhalte

Was ist angesagt?

How to build a data dictionary
How to build a data dictionaryHow to build a data dictionary
How to build a data dictionaryPiotr Kononow
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architectureRahul Chaturvedi
 
Introduction to data cleaning with spreadsheets
Introduction to data cleaning with spreadsheetsIntroduction to data cleaning with spreadsheets
Introduction to data cleaning with spreadsheetsAnders Pedersen
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6varshakumar21
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementDaniel JACOB
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practicesLeon Osinski
 
Mis chapter5
Mis chapter5Mis chapter5
Mis chapter5Poleak
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedYugal Kumar
 

Was ist angesagt? (13)

How to build a data dictionary
How to build a data dictionaryHow to build a data dictionary
How to build a data dictionary
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architecture
 
Electronic Databases
Electronic DatabasesElectronic Databases
Electronic Databases
 
Introduction to data cleaning with spreadsheets
Introduction to data cleaning with spreadsheetsIntroduction to data cleaning with spreadsheets
Introduction to data cleaning with spreadsheets
 
DSA
DSADSA
DSA
 
Course outline
Course outlineCourse outline
Course outline
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6
 
Database
DatabaseDatabase
Database
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practices
 
Mis chapter5
Mis chapter5Mis chapter5
Mis chapter5
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 

Ähnlich wie Federated Ontology for Sports- Paper

Rallying Around Standards
Rallying Around StandardsRallying Around Standards
Rallying Around Standardsahoffer
 
Sports and Big data
Sports and Big dataSports and Big data
Sports and Big dataDeZyre
 
Driving Digital Soccer Experiences with Structured Data Feeds
Driving Digital Soccer Experiences with Structured Data FeedsDriving Digital Soccer Experiences with Structured Data Feeds
Driving Digital Soccer Experiences with Structured Data FeedsDataSportsGroup
 
Technology and open knowledge in sports statistics
Technology and open knowledge in sports statisticsTechnology and open knowledge in sports statistics
Technology and open knowledge in sports statisticsdwiederman
 
Amader Project Presentation
Amader Project PresentationAmader Project Presentation
Amader Project Presentationguestd2b579
 
The Essential Role of Data Feeds in Modern Football
The Essential Role of Data Feeds in Modern FootballThe Essential Role of Data Feeds in Modern Football
The Essential Role of Data Feeds in Modern FootballDataSportsGroup
 
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...Shrikant Mandlik
 
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM “A CASE STUDY ...
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM   “A CASE STUDY ...ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM   “A CASE STUDY ...
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM “A CASE STUDY ...Mwakio Joseph M
 
Major League Soccer Player Analysis
Major League Soccer Player AnalysisMajor League Soccer Player Analysis
Major League Soccer Player AnalysisChris Armstrong
 
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docx
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docxChapter 1 Information Systems in Global Business TodayNBA TEAMS .docx
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docxtidwellveronique
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning PredictionIRJET Journal
 
Nfl Case Study
Nfl Case StudyNfl Case Study
Nfl Case Studydct28md
 
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using HadoopAnalysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using HadoopIRJET Journal
 
Sports and-semantic-tech-v.public
Sports and-semantic-tech-v.publicSports and-semantic-tech-v.public
Sports and-semantic-tech-v.publicPaul Kelly
 
The Evolution and Power of Football Data Feeds.pdf
The Evolution and Power of Football Data Feeds.pdfThe Evolution and Power of Football Data Feeds.pdf
The Evolution and Power of Football Data Feeds.pdfDataSportsGroup
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesIJCSIS Research Publications
 
The Digital Revolution of Sports Data
The Digital Revolution of Sports DataThe Digital Revolution of Sports Data
The Digital Revolution of Sports DataDataSportsGroup
 

Ähnlich wie Federated Ontology for Sports- Paper (20)

Rallying Around Standards
Rallying Around StandardsRallying Around Standards
Rallying Around Standards
 
Sports and Big data
Sports and Big dataSports and Big data
Sports and Big data
 
Driving Digital Soccer Experiences with Structured Data Feeds
Driving Digital Soccer Experiences with Structured Data FeedsDriving Digital Soccer Experiences with Structured Data Feeds
Driving Digital Soccer Experiences with Structured Data Feeds
 
Technology and open knowledge in sports statistics
Technology and open knowledge in sports statisticsTechnology and open knowledge in sports statistics
Technology and open knowledge in sports statistics
 
Amader Project Presentation
Amader Project PresentationAmader Project Presentation
Amader Project Presentation
 
The Essential Role of Data Feeds in Modern Football
The Essential Role of Data Feeds in Modern FootballThe Essential Role of Data Feeds in Modern Football
The Essential Role of Data Feeds in Modern Football
 
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ...
 
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM “A CASE STUDY ...
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM   “A CASE STUDY ...ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM   “A CASE STUDY ...
ONLINE MOMBASA COUNTY FOOTBALL MANAGEMENT INFORMATION SYSTEM “A CASE STUDY ...
 
Major League Soccer Player Analysis
Major League Soccer Player AnalysisMajor League Soccer Player Analysis
Major League Soccer Player Analysis
 
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docx
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docxChapter 1 Information Systems in Global Business TodayNBA TEAMS .docx
Chapter 1 Information Systems in Global Business TodayNBA TEAMS .docx
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning Prediction
 
Nfl Case Study
Nfl Case StudyNfl Case Study
Nfl Case Study
 
www-businesswire-com
www-businesswire-comwww-businesswire-com
www-businesswire-com
 
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using HadoopAnalysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
 
Sports and-semantic-tech-v.public
Sports and-semantic-tech-v.publicSports and-semantic-tech-v.public
Sports and-semantic-tech-v.public
 
The Evolution and Power of Football Data Feeds.pdf
The Evolution and Power of Football Data Feeds.pdfThe Evolution and Power of Football Data Feeds.pdf
The Evolution and Power of Football Data Feeds.pdf
 
Cricket 2
Cricket 2Cricket 2
Cricket 2
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining Techniques
 
STATS Middle East
STATS Middle EastSTATS Middle East
STATS Middle East
 
The Digital Revolution of Sports Data
The Digital Revolution of Sports DataThe Digital Revolution of Sports Data
The Digital Revolution of Sports Data
 

Kürzlich hochgeladen

Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...gerogepatton
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptNoman khan
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackinghadarpinhas1
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 

Kürzlich hochgeladen (20)

Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and tracking
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 

Federated Ontology for Sports- Paper

  • 1. Federated Ontology for Sports CSCI586_Web_Group1_Topic1G: Database Interoperability Project Report Abhishek Agrawal agra47@usc.edu George Sam gsam@usc.edu Hari Haran Venugopal hvenugop@usc.edu Noopur Joshi noopurbj@usc.edu Abstract—Our project aims at providing a brief information on player background, tournament details (schedule, location etc.). We have created a federated ontology to include information about Soccer, tennis and cricket. This systems is then modeled for the data sets from each sport. We then run queries on the system to test our results. Keywords—RDF, Scraping I. INTRODUCTION The world of sports is very important from a technology perspective. The different sports constantly produce data and this is where it is crucial to develop systems that continuously analyze and extract information from this data. Many people follow multiple sports and in this day and age, they desire to have information about their favorite sports at the tip of their fingers. With the development of the mobile development platform, this information can be easily accessed. Multiple systems have been developed which make use of many computational concepts in Natural Language Processing, Machine Learning and Artificial Intelligence to extract information about the sources, process them and display the results. The problem is that the data is continuous and constantly increasing. Websites are available for specific sports only and rarely provide user the information about other sports. The idea of this project is to create an ontology based system that can collaborate data from multiple sources. This data is then modeled into a federated ontology for sports, consisting of data models and ontologies from multiple sources. Our intention is to create a system with a federated ontology to model multiple sports. In our system, we have started by modelling cricket, tennis and football (soccer). II. MOTIVATION A. Ontology for Sports There are Existing ontologies for sports. These ontologies cover most of the information required in the sport fields. In cricket the ontology covers the player information, like matches played, average batting score, average bowling score. But it is specific to cricket only. Similar ontologies are present for tennis and and other sports. But these sports are structurally different and thus their ontologies are very different. It is thus necessary to create an ontology that will try and integrate the different information about multiple sports. In today’s world of technology, we have access to tablets, mobile devices, laptops, and desktops and so on. There is a need for constant, intelligent, up-to-date, integrated and detailed information from the Web. Though there is lot of information available on the web but is unstructured and varying. Ontologies provide a way of maintaining this unstructured information in one format that can be easily shared. Having a common ontology helps to aggregate data from various sources. This data can be shared with other users or applications. B. Need for a Federated Ontology Everybody follows one or more sports. But we have to search information related to our choice of sport on different website. How much time will it take for you to search “all sports player from England”?? You will have to search for each sport individually. Consider a situation, your favorite sport person is “Rafael Nadal” and you want to know latest updates about him. You still to have to search update-to- knowledge about him individually. 
Federated sports ontology can help us to represent different sports and presents a common view. Also federated ontology is extendible in the sense that more information like new player details, player statistics, new sports or changes in rules of games can easily be added to previously gathered data. One of the core importance of federated ontology or in general terms “Sematic Web” is in the Data Analysis. Data Analyst spends a lot of time to convert unstructured data into structured data. Storing data based on a common ontology solves this problem and saves a lot of time. Simple Statistical information like “How do the players and/or teams measure up against one another in various categories?” Can easily be answered using federated ontology. Ontologies can be used in application like News feeds. Federated Ontologies helps us to combine editorial coverage of sports with all data feeds presented at one place. C. Our System Our system for a federated ontology models information about Cricket, Tennis and Football (soccer). This information
  • 2. consists of information about players, the sporting events, and the rankings about teams and players. Since these sports are structurally different, it is important to find out fields and attributes about the players and rankings that can be similar across all the sports. Information was gathered about the following domains:  Cricket o Player Information o Player Rankings for T20, ODI and for Test Match types o Team Information o Rankings for teams in T20, ODI and Test Match types  Tennis o Player Information including ranking o Tournament Information for Wimbledon, US Open, Australian Open and French Open  Football o Player information from the English Premier League and from the Spanish La Liga o Information about the games played for the 2 leagues To limit the size of data collected, we collected data for the duration from 2004 to 2014 (10 years). The information was collected in multiple files pertaining to each domain and type. These files were merged into single files. Steps for the Project: 1. Data Collection and Data Scraping 2. Data Cleaning 3. Ontology Creation 4. Data Modelling 5. Data Publishing 6. Running queries to extract information The diagram below highlights these steps. Fig 1. System Development Cycle III. DATA SCRAPING The data for the diffrent sports was collected by developing web scrapers. We have developed the scrapers in Java ad Python using libraries. The scrapers scrape though the web sites and collect information in the form of JSON files or CSV files. 1. Data Scraping for Cricket Cricket Dataset: In order to collect dataset for cricket we scrapped below websites  http://www.icc-cricket.com/player-rankings/overview  http://www.espncricinfo.com/ci/content/player/index.html  http://cricsheet.org/ We collected the Player and Team information from the espncricinfo.com and Ranking data from Icc-cricket.com. In order collect the data regarding the T20, T20I, ODI and Test matches, we used cricsheet.org. Scrapper used: Chrome Web Scrapper: In this we created the site maps and scrapped data using it. It is very handy and very powerful tool. Fig 2. ICC-Cricket Website Screenshot YAML Java Library: Matches data set was in YAML format. We converted this data to JSON format with required fields extracted using a SnakeYAML library. 2. Data Scraping for Tennis Source: www.atpworldtour.com Scraper Tool: Beautiful Soap Beautiful Soup is a Python based library consisting of features:  Simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting content.  The library converts incoming documents to Unicode and outgoing documents to UTF-8.  Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, consisting of parsing strategies or trade speed for flexibility. Steps taken to retrieve dataset for tennis domain: 1. We primarily focused out data set to top 100 ranked professional tennis players from atpworldtour. 2. For each tennis player we retrieved all the attributes like: Rank. Age, Birthplace, Residence, Height, Weight, Plays,
  • 3. TurnedPro, Coach, Website and Personal History using Beautiful Soap in python script. 3. We also scrapped content for all the grand slam details like year, winner, score for each slams: Australian Open, French Open, Wimbledon and US Open respectively. Fig 3. ATP Website Screenshot 3. Data Scraping for Football Data Source: http://www.soccerbase.com/ https://github.com/openfootball/en-england http://www.footballsquads.co.uk/spain/ http://www.footballsquads.co.uk/england/ Scraper Used: JSoup JSoup is a Java based library to scrap websites. It provide functions to get information in the form of text, or in the form of HTML tags. This information can then be parsed to extract tag attributes, tag content, etc. The data is collected in the form of JSON files. Thes files are then combined to create one file for players in English Premier League, for the Spanish La Liga and one file for information about the teams respectively. Fig 4. SoccerBase Website Screenshot IV. DATA CLEANING Data was cleaned using Google OpenRefine and Karma. The files had attributes like player profile and other information which was irrelevant for the data sets. These attributes create problems while mapping to data models. This step is important because conversion of files from one format to another generates a lot of discrepancies. Entities like white spaces and new line characters tend to break data objects and give errors while modelling. In our project we performed cleaning of Tennis data in Karma, while cricket and football data were cleaned using Google OpenRefine. Stages of Data Cleaning: • Import Data: We imported in the form of json format, XML and CSV files since, we our data sources was very diverse and we used different web scraping tools. • Merge Data Sets: We merged all the content belonging to a single domain onto a one file Eg: All the 4 different grand slams details where combined in single file. • Rebuild Missing Data: We filled the missing data with empty value. • Standardize and Normalize Data: Some of the fields where merged using Karma Pytransform scripts eg: First Name and Last Name to Name and separated fields into multiple columns Eg: Height containing data in Inches and cms into separate field for each of the height units and also discarded all the irrelevant fields. • De-Duplicate: All the duplicates values where discarded which contributed to noisy data. • Verify, Enrich and Export Data: Once the dataset was cleaned as per the requirement for the federated ontology mapping, we exported the dataset in json format, which would be consumed for the next phase - Data Modeling. Fig 5. Data Cleaning Steps V. ONTOLOGY CREATION Protégé is a free and open source ontology editor which was developed at Stanford. Protégé defines a graphical user interface to define ontologies. Our project required the creation of a federated Sports ontology and we used Protege for the creation of the ontology. The ontology of our project is shown below.
  • 4. Fig 6. Ontology Graph Ontology can be created either by the top-down approach or the bottom-up approach. The top-down approach of creating an ontology involves the definition of the most general concepts in the domain and then specifying the specialization of the concepts. In our case the first step would be to define concepts for all sports in general. The next step would be to define the specialization of every sport like Football, Cricket and Tennis used in our project. The bottom-up approach start with the definition of the most specific concepts, starting from the leaves of the hierarchy and then mentioning the general concepts of the ontology. For example in our ontology it would mean defining the CricketPlayer, FootballPlayer and the TennisPlayer Class first and then going all the way up the hierarchy of the ontology. There also exists an hybrid approach which is a combination of both the top-down approach and the bottom-up approach in the development of the ontology. Here we define the salient concepts first and then either specialize or generalize We may start with the Sports and Player class first and then we may specialize about the individual player classes further continuing with the other classes like the Match or the Tournament class. In the creation of our ontology we have used the Hybrid approach. The class hierarchy of our ontology is shown below: Fig 7. Class Heirarchy All the subclasses are mapped under the Top-Level classes. For example the CricketPlayer is mapped under the Players class. The Object Properties of our Ontology is as shown below: Fig 8. Object Properties Screenshot These properties help us to draw inferences among the different concepts. For example looking at our class hierarchy and the list of Object Properties we can show the relation that a Player is an AthleteOf a Sport. Fig 9. Property Mapping While defining the object property we can define both the Domain and Range of the property in Protégé. For example consider the above relation between the Player and the Sport Class. The object property being defined here is "AthleteOf". The instance of the Player class would be the AthleteOf the Sport Class. Thus the domain would be Player Class and the Range for the property would be the Sport Class. The data properties of our ontology are shown below:
  • 5. Fig 10. Data Properties They are used to specify the special attributes associated with every class. In our ontology we have different data properties like the MatchTeam1, MatchTeam2 which specify the teams against which the Matches were played. When creating the data properties in Protégé, we can specify the domain and the range of the property. For example let us consider the Height property. It is a data property specified for all players in general, so the domain of the Property would be the class Player. Since the height of the player would either be represented as an integer or a float, the Range of it would be either integer or float. VI. DATA MODELLING A. Tools There are many tools available at our disposal to create a data model for this federated ontology. Most commonly used tools are Protégé and Karma. In our system we have used Karma to model the data for the data sets. The data modelling was done using Karma. The steps used to model the data are: • Import our sports ontology along with the other OWL files into the Karma workspace • Import the data sets one at a time into Karma • Create Semantic mappings for every attribute in the data set • Create class URIs for attributes which can be used as Keys • Publish the model • Load it into the triple store The data model was done for each data set one at a time. In the diagram below we have shown a data model for tennis player. The player JSON file had many attributes which needed modelling. It shows the classes used and the semantic information for the properties. In this data set we intended to merge the data with information about the tennis tournaments – Wimbledon, French Open, US Open and Australian Open. In this merging it is important for us to create a key attribute which acts like a key for merging data. This is in the form of a URI field. In our data set we decided to use the Player name for a URI. We did not have a field for URI so we created one using Python transformations to create a new data field called PlayerURI. This field will act as a class URI for the player class. When the data is loaded in the triple store this field will act as the key to merge the data sets. We created the PlayerURI field as the class URI and it shows the link in the form of dashed green link. Other attributes are class properties. As we build the data model to map higher classes, Karma automatically suggested the properties like dcterms:type between the Players class and TennisPlayer class. Fig 11. Tennis Player Model- Karma Once the data model is created we had to check for all semantic mappings if they were correct and if they matched to the ontology. Below are screenshots for cricket and football players. Fig 12. Cricket Player Model- Karma
VII. DATA PUBLISHING

The important step after modelling the data is to publish it into a triple store. Karma creates bindings for every attribute and class, and the data model maps onto the ontology we have created. We have class mappings for every football player, cricket player and tennis player; similarly, for the game data sets we create models for every tournament's information. We used OpenRDF to publish the data. OpenRDF comes integrated with Karma, which lets the user link a triple store repository to the Karma instance and easily transport the models and RDF from Karma into the triple store.

Steps:
1. Create a repository in OpenRDF
2. Publish the RDF files from Karma
3. Load the data into OpenRDF in the form of contexts
4. Verify the triples

Below are a few screenshots of the triple store with the contexts and the RDF triples. Each context stores the RDF for one sport. The contexts are:
http://localhost.com/tennis
http://localhost.com/cricket
http://localhost.com/football

Fig 13. RDF Triples in OpenRDF

VIII. QUERIES

The following SPARQL queries were run on the data sets to test the results of the modelling:

1. To extract names of players from all sports living in England

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Pname ?type ?loc
WHERE {
  {
    ?pl cs586:PlayerName ?Pname;
        dcterms:type ?p;
        cs586:MemberOf ?team.
    ?team cs586:TeamName ?loc.
    ?p rdf:type ?type .
    ?p rdf:type <http://www.semanticweb.org/CS586#CricketPlayer>.
    FILTER(regex(?loc, "England"))
  }
  UNION
  {
    ?pl cs586:PlayerName ?Pname;
        dcterms:type ?p;
        cs586:Nationality ?loc;
        cs586:MemberOf ?team.
    ?team cs586:TeamName ?tname.
    ?p rdf:type ?type .
    ?p rdf:type <http://www.semanticweb.org/CS586#FootballPlayer>.
    FILTER(regex(?loc, "ENG"))
  }
  UNION
  {
    ?pl cs586:PlayerName ?Pname;
        dcterms:type ?p.
    ?p rdf:type ?type.
    ?p rdf:type <http://www.semanticweb.org/CS586#TennisPlayer>;
       <http://www.semanticweb.org/CS586/BornIn> ?n.
    ?n cs586:LocationName ?loc.
    FILTER(regex(?loc, "England"))
  }
}

Result:

Fig 14. Query 1 Result
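The queries in this section were run through the OpenRDF workbench, but they can also be issued programmatically against the repository's SPARQL endpoint. The Python sketch below uses SPARQLWrapper to run a simplified version of the tennis portion of Query 1; the endpoint path follows the usual Sesame layout and the repository name "sports" is an assumption for illustration.

# Minimal sketch of running a query programmatically against the triple store.
# Endpoint path and repository name ("sports") are assumptions; adjust to the
# actual OpenRDF deployment.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/sports"

query = """
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Pname WHERE {
  ?pl cs586:PlayerName ?Pname ;
      dcterms:type ?p .
  ?p a <http://www.semanticweb.org/CS586#TennisPlayer> .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Print the player names returned in the SPARQL JSON results
for binding in results["results"]["bindings"]:
    print(binding["Pname"]["value"])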
2. Query to extract player names of left-handed players from all sports

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Pname
WHERE {
  {
    ?pl cs586:PlayerName ?Pname;
        dcterms:type ?p.
    ?p cs586:CricketBatStyle ?batstyle.
    ?p rdf:type <http://www.semanticweb.org/CS586#CricketPlayer>.
    FILTER(regex(?batstyle, "Left-hand"))
  }
  UNION
  {
    ?pl dcterms:type ?r;
        cs586:PlayerName ?Pname.
    ?r cs586:TennisPlays ?q.
    FILTER(regex(?q, "Left-hand"))
  }
}

Result:

Fig 15. Query 2 Result

3. Query to find players above 30 years of age

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cs586: <http://www.semanticweb.org/CS586/>
PREFIX schema: <http://www.schema.org/>
PREFIX local: <http://localhost:8080/source/>

SELECT (COUNT(?age) AS ?AgeValue)
WHERE {
  ?p cs586:HasRank ?rank;
     cs586:PlayerName ?Pname;
     cs586:BornOn ?date;
     cs586:Age ?age.
  FILTER(?age > 30)
}

Result:

Fig 16. Query 3 Result

IX. TOOLS USED

A. Data Scraping
JSoup, JSON, Request, BeautifulSoup, Sesame OpenRDF, Apache Jena, Scrapy, WebScraper.

B. Data Cleaning
• Karma: offers a programming-by-example interface that lets users define data transformation scripts which convert data expressed in multiple formats into a common format.
• Google Refine: a power tool for working with messy data, cleaning it up and transforming it from one format into another.

C. Ontology Creation
• Protégé: a free and open source ontology editor developed at Stanford. Protégé provides a graphical user interface for defining ontologies.

D. Data Modelling and Publishing
• Karma
• OpenRDF: an open source, Sesame-based framework for creating triple stores and for loading and querying RDF triples.

X. INDIVIDUAL CONTRIBUTION

a. Abhishek Agrawal: Scraped the websites http://www.espncricinfo.com/, http://www.icc-cricket.com/ and http://cricsheet.org/ using a combination of the Scrapy Python library and the Chrome Web Scraper extension. Did the modelling for the cricket files in Karma. Worked on ontology creation.
b. George Sam: Performed data scraping in Python to extract information about players and league matches for English Premier League and Spanish La Liga players. Did data cleaning in Google OpenRefine. Worked on data modelling in Karma and triple store creation in OpenRDF. Developed SPARQL queries.

c. Hari Haran Venugopal: Built a Python script to scrape websites and collected the relevant data sets in JSON format for the tennis domain, primarily focusing on player details (name, bio-data, ranking, personal history, coach, age, nationality) and tournament details covering all grand slams (year, winner, scores). Did data cleaning in Karma. Developed SPARQL queries. Performed data modelling for the tennis data sets.

d. Noopur Joshi: Responsible for creating the ontologies for each sport and the federated ontology for all sports combined in Protégé, using the hybrid approach. Initially an ontology for each individual sport (Football, Tennis and Cricket) was created; these ontologies were then mapped into the federated ontology for all sports. She handled data cleaning for the data sets using Google OpenRefine, and worked on data modelling for the cricket and football players. The paper referred to for creating the ontologies: http://130.88.198.11/tutorials/protegeowltutorial/resources/ProtegeOWLTutorialP4_v1_3.pdf

XI. CONCLUSION AND FUTURE WORK

We have been able to create a system that implements a federated ontology covering data from tennis, cricket and football. This information is modelled in Karma and then loaded as RDF into a triple store, and we have run queries over it to gather statistical data about players across all sports. Further work can be done in this area: other sports can be integrated into the federated ontology, a web application can be built to display the information and provide a search interface over the data sets, and the system can be integrated into a mobile application as a front end, giving it greater coverage across devices.

References
[1] http://www.isi.edu/integration/karma/
[2] http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf
[3] http://ict.siit.tu.ac.th/~sun/SW/Protege%20Tutorial.pdf
[4] http://www.crummy.com/software/BeautifulSoup/
[5] https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
[6] https://code.google.com/p/google-refine/
[7] http://www.datacleansing.net.au/Data_Cleansing_Services
[8] www.atpworldtour.com
[9] http://www.icc-cricket.com/player-rankings/overview
[10] http://www.espncricinfo.com/ci/content/player/index.html
[11] http://cricsheet.org/
[12] https://code.google.com/p/snakeyaml/