dbrec
Music recommendations using DBpedia
         Alexandre Passant - DERI, NUI Galway
                  In-Use Track @ ISWC2010
             11th November 2010, Shanghai, China
Good news: it no longer fits on one slide!

Many producers, but only a few consumers (besides search engines): BBC, Drupal, ...
Agenda

• Semantic Distance over Linked Data
• dbrec - architecture, dataset and UI
• Evaluation
• Lessons learnt
• Next steps and conclusion
Semantic Distance
Semantic Distance over
    Linked Data
• Relying only on links
• Relying only on instance data
• Using dereferenceable URIs
 • And using resources following the LD
    principles
Linked Data
[Figure: example Linked Data graph in which resources e:r1, e:r2, e:r3 and e:r4 are connected by links e:l1, e:l2 and e:l3]
G = (R, L, I)

• R = {r_1, r_2, ..., r_n} (resources, e.g. e:r1)
• L = {l_1, l_2, ..., l_n} (typed links, e.g. e:l1)
• I = {i_1, i_2, ..., i_n} (instances of these links between resources)
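The graph model G = (R, L, I) can be made concrete with a small sketch. The Python below is only an illustration: it assumes each element of I is a (link, source, target) triple, the edges are hypothetical (the figure's exact edges are not recoverable from this transcript), and `direct_distance` assumes the simplest link-based form, 1 / (1 + number of direct links in either direction), which these slides do not spell out.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    link: str    # an element of L (typed link)
    source: str  # an element of R
    target: str  # an element of R


# Hypothetical link instances over e:r1..e:r4 and e:l1..e:l3.
INSTANCES = [
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
    LinkInstance("e:l2", "e:r3", "e:r4"),
]

# R and L can be derived from the link instances.
RESOURCES = {i.source for i in INSTANCES} | {i.target for i in INSTANCES}
LINKS = {i.link for i in INSTANCES}


def count_direct(a, b, instances):
    """Number of link instances going from resource a to resource b."""
    return sum(1 for i in instances if i.source == a and i.target == b)


def direct_distance(a, b, instances):
    """Assumed form: more direct links in either direction means a smaller distance."""
    links = count_direct(a, b, instances) + count_direct(b, a, instances)
    return 1.0 / (1 + links)
```

With these hypothetical edges, two resources without any direct link get the maximum distance of 1.0, and every additional direct link brings the value closer to 0.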
LDSD
The LDSD ontology




                Our own ontology, but it
                could be mapped to MuSim
                in the future
dbrec
At a glance
• A system providing recommendations for all
  DBpedia bands and artists (about 40K) using LDSD
    • And explaining its recommendations
    • Both using Linked Data and Semantic
      Web standards (RDF, SPARQL)
• Integrating related Web data for an improved
  user experience
Architecture
(1) Dataset identification
(2) Dataset reduction
(3) LDSD computation
(4) User interface
[Figure: the four stages form a pipeline, with RDF data passed along it]
Dataset
•   Retrieving all artists and bands in DBpedia (about 40K)
    •   Including incoming / outgoing links
    •   Approximately 3M triples
•   Removing datatype properties
    •   2.2M triples left (75%)
•   Merging /ontology and /property
    •   1.7M triples left (55%)
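The two reduction steps above can be sketched as a simple filter. This is a minimal Python illustration, not the actual dbrec code: it assumes triples arrive as (subject, predicate, object, object_is_literal) tuples, treats "datatype properties" as triples whose object is a literal, and assumes that merging means folding a /property predicate into the /ontology predicate with the same local name. The function name and the sample data are hypothetical.

```python
ONTOLOGY = "http://dbpedia.org/ontology/"
PROPERTY = "http://dbpedia.org/property/"


def reduce_dataset(triples):
    """Drop literal-valued triples and merge /property into /ontology."""
    reduced = set()
    for subject, predicate, obj, object_is_literal in triples:
        if object_is_literal:
            # Step 1: datatype properties carry no link to another resource.
            continue
        if predicate.startswith(PROPERTY):
            # Step 2 (assumed direction): rewrite /property/X to /ontology/X,
            # so that duplicated statements collapse into one triple.
            predicate = ONTOLOGY + predicate[len(PROPERTY):]
        reduced.add((subject, predicate, obj))
    return reduced


sample = [
    ("dbr:Ramones", PROPERTY + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", ONTOLOGY + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", PROPERTY + "name", "Ramones", True),
]
reduced_sample = reduce_dataset(sample)
```

Storing the result in a set is what makes the duplicated /ontology and /property statements disappear after the rewrite.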
Distribution




               20K+ artists (50%) are
               not linked to any other
               artist
Curation
• 118 properties linking artists together
  • 18 misused, 35 wrongly defined (e.g.
    dbprop:klfsgProperty)
• 578 properties linking artist to resources
  • 183 used only once, 36 wrongly defined
• 767 properties linking resources to artists
  • 336 used only once, 115 wrongly defined
• Dataset reduced to 1M triples
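Part of this curation can be automated. The sketch below is an assumption about how such a pass might look, not the procedure actually used: it drops properties that occur only once and any property on a manually maintained blacklist (for misused or wrongly defined properties). The function name, the `min_uses` parameter and the sample data are hypothetical.

```python
from collections import Counter


def curate(triples, blacklist=frozenset(), min_uses=2):
    """Keep triples whose property is used often enough and is not blacklisted."""
    usage = Counter(predicate for _, predicate, _ in triples)
    return [
        (s, p, o)
        for s, p, o in triples
        if usage[p] >= min_uses and p not in blacklist
    ]


sample = [
    ("dbr:Ramones", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:The_Clash", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:Ramones", "dbprop:klfsgProperty", "dbr:Something"),
    ("dbr:U2", "dbprop:notableInstruments", "dbr:Guitar"),
    ("dbr:Ramones", "dbprop:notableInstruments", "dbr:Guitar"),
]
curated = curate(sample, blacklist={"dbprop:notableInstruments"})
```

The blacklist stands in for the manual part of the curation, since a misused property cannot be detected from usage counts alone.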
Computing distance
• 9,797 minutes
  • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
• 50M triples
  • Modelled using the LDSD ontology

Done for all artists in DBpedia

Artist            Time (sec.)
Ramones           25.20
Johnny Cash       61.16
U2                50.06
The Clash         43.34
Bad Religion      34.98
The Aggrolites     7.35
Janis Joplin      23.12
Artist                Distance
Elvis Presley         0.0978
June Carter Cash      0.1056
Willie Nelson         0.1322
Kris Kristofferson    0.1407
Bob Dylan             0.1466
Marty Robbins         0.1673
Rosanne Cash          0.1782
Charlie McCoy         0.1836
Gene Autry            0.1910
Carl Smith            0.1980
User interface
Sorry, SlideShare viewers: this slide is a video,
so you won't be able to see it!
Evaluation
Evaluation settings
• Off-line and on-line user evaluation
 • Using common RecSys metrics
• 10 subjects
 • 2 women, 8 men
 • 24 to 34 years old
 • 35 to 55 minutes per interview, face-to-face
Metrics
•   Off-line evaluation - comparison with last.fm
    •   5 artists / bands
    •   2 blind lists, 10 ranked recommendations per list
    •   Marks from 1 to 5
•   On-line recommendation - dbrec only
    •   5 artists / bands
    •   Browsing recommendations using dbrec
    •   Marks from 1 to 5, plus observations and interviews
dbrec vs last.fm

• Average mark of recommendations
 • 3.37 (±1.19) for dbrec (off-line)
 • 3.44 (±1.25) for dbrec including on-line
 • 3.69 (±1.01) for last.fm
Precision

                 dbrec        dbrec            last.fm
                 (off-line)   (off+on-line)
t=2              92.05        90.59            98.32
t=3              76.63        77.72            87.91
t=4              49.06        51.23            58.05
t=5              20.09        25               25.165

Results for the precision (t=X means an item counts as relevant if it was marked X or higher).

Cannot compute recall (it would imply that users know all bands in the system).
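The thresholded precision used in this table can be sketched in a few lines. This Python example assumes precision is the percentage of rated recommendations whose mark (from 1 to 5) is at least t; the function name and the marks below are hypothetical and are not the study's data.

```python
def precision_at_threshold(marks, t):
    """Percentage of recommendations with a mark of at least t."""
    if not marks:
        raise ValueError("at least one mark is required")
    relevant = sum(1 for mark in marks if mark >= t)
    return 100.0 * relevant / len(marks)


# Hypothetical marks given by one subject to ten recommendations.
example_marks = [5, 4, 3, 3, 2, 1, 4, 5, 2, 3]
example_precision = {t: precision_at_threshold(example_marks, t) for t in (2, 3, 4, 5)}
```

Raising t can only shrink the set of relevant items, which is why every column of the table decreases from t=2 to t=5.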
Novel recommendations
• Lots of unknown recommendations
 • 62% for dbrec (59.6% w/ on-line)
 • 40.4% for last.fm
 • But that's good news!
• Evaluated 274 of them on dbrec
 • 3.05(± 1.09)
Observations
• Explanations for unknown bands
 • Checked for 198 / 310
• But also for known ones
 • 24 / 190
• Helped to understand the recommendation
 • Even if they already knew the band
Interviews
              User interface   Explanations
Enjoyable     9                7
Useful        9                9
Enriching     8                10
Easy to use   10               9
Confusing     0                2
Complicated   0                2
Too geeky     1                6
Lessons learnt
Data quality
• Issues with DBpedia properties
  • Misused: dbprop:notableInstruments
  • Wrongly defined: dbprop:klfsgProperty
  • Duplicates: /ontology versus /property
• Requires data curation!
  • Automated and manual
Use, but replicate
• More and more public SPARQL endpoints
 • Often limited to X max results
 • 5,000 on DBpedia
• Difficult to use in production
 • Requires local replica
 • But implies synchronisation!

But that's fair enough. Hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.
Use, but replicate
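PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

# Labels of every resource typed as a musical artist or a band
SELECT ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbpedia:MusicalArtist }
    UNION
    { ?x a dbpedia:Band }
}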
Use, but replicate

• Names of all DBpedia artists
 • Get number of results w/ COUNT
 • Run n/5000 queries (LIMIT + OFFSET)
 • Recompose results
• Network errors, etc.

The query had more than 40K results, since most artists have their names in several languages. So it took many more than 8 queries.
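The paging steps above can be sketched as follows. This Python example only builds the query strings and merges result pages; it does not contact any endpoint. The total would come from a separate COUNT query that is not shown, the prefixes and class names in `BASE_QUERY` are assumptions, and the added ORDER BY is there because SPARQL does not guarantee a stable result order across LIMIT/OFFSET pages without one.

```python
import math

PAGE_SIZE = 5000  # result cap mentioned on the slide for DBpedia

# Base query without LIMIT/OFFSET; prefixes and class names are assumptions.
BASE_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?x ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbo:MusicalArtist } UNION { ?x a dbo:Band }
}"""


def paged_queries(base_query, total_results, page_size=PAGE_SIZE):
    """Build one query per page so that all results are covered."""
    pages = math.ceil(total_results / page_size)
    return [
        f"{base_query}\nORDER BY ?x ?label\n"
        f"LIMIT {page_size} OFFSET {page * page_size}"
        for page in range(pages)
    ]


def recompose(pages):
    """Concatenate the rows of every page into a single result list."""
    merged = []
    for rows in pages:
        merged.extend(rows)
    return merged
```

For exactly 40,000 results this yields 8 queries. A production script would also need to retry pages that fail with network errors, which this sketch leaves out.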
SPARQL, Be quick or be neat
   • “List all artists / bands sharing common
     property-values with the current one”
     • Fits in a single SPARQL query
     • But does not scale
   • “Optimisation” has to be done manually by
     splitting the query and recomposing results
     using an external script
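The idea of splitting one query and recomposing the results in a script can be illustrated with an in-memory stand-in for the RDF store. This Python sketch is an assumption about the general shape of the technique, not the dbrec queries themselves: `store` is a hypothetical list of (subject, property, object) triples, and each set comprehension inside a loop stands for one smaller SPARQL query. It shows that all three strategies return the same answer; it does not reproduce any timing.

```python
def full_query(store, artist):
    """One pass: every subject sharing any (property, value) pair with artist."""
    pairs = {(p, o) for s, p, o in store if s == artist}
    return {s for s, p, o in store if s != artist and (p, o) in pairs}


def split_by_property(store, artist):
    """One sub-query per property of the artist, results merged by the script."""
    results = set()
    for prop in {p for s, p, o in store if s == artist}:
        values = {o for s, p, o in store if s == artist and p == prop}
        results |= {s for s, p, o in store
                    if s != artist and p == prop and o in values}
    return results


def split_by_property_object(store, artist):
    """One sub-query per (property, value) pair of the artist."""
    results = set()
    for prop, obj in {(p, o) for s, p, o in store if s == artist}:
        results |= {s for s, p, o in store
                    if s != artist and p == prop and o == obj}
    return results


# Hypothetical data.
store = [
    ("Ramones", "genre", "Punk_rock"),
    ("Ramones", "hometown", "New_York"),
    ("The_Clash", "genre", "Punk_rock"),
    ("Blondie", "hometown", "New_York"),
    ("U2", "genre", "Rock"),
]
```

The finer the split, the more queries are issued, but each one is simpler for the store to answer; the recomposition step is just a set union.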
SPARQL, Be quick or be neat
Tests done in the local RDF store
1: full query (Direct SPARQL)
2: split by property (Property-slicing)
3: split by property-object (Complete-slicing)

Up to 75% faster

                   Direct SPARQL       Property-slicing      Complete-slicing
                  Queries    Time     Queries     Time      Queries     Time
Ramones              1      139.97      20       109.51       66        37.84
Johnny Cash          1      257.81      30       152.60      135        75.35
U2                   1      155.53      22       122.91       70        44.03
The Clash            1      146.43      20       110.84       79        42.61
Bad Religion         1      104.08      23        86.49       97        47.35
The Aggrolites       1      145.92      13       114.52       28        28.33
Janis Joplin         1      230.88      27       151.00       98        62.81
Next steps
•   Other data sources
    •   Freebase, MusicBrainz, etc.
•   Distance improvement
    •   Propagation, feature selection, etc.
•   User Interface
    •   User-friendly explanations
•   LOD-compliance
    •   Mapping with other ontologies, SPARQL endpoint
Conclusion
• Defined and applied a Semantic Distance
  measure to Linked Data
• Used it to build an end-user music
  recommender system, with about 40K artists
• Evaluated it using RecSys metrics
• Learnt several domain-independent lessons
  regarding LOD consumption
Questions ?
Contact:
alexandre.passant@deri.org - http://apassant.net - @terraces

Acknowledgements:
Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)

References:
AAAI Spring Symposium 2010 - LinkedAI Symposium
ESWC2010 - Demo Track
ISWC2010 - In-Use Track
Pictures credits
•   http://flickr.com/photos/yumlog2/20896759/ by yuki*

•   http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch

•   http://flickr.com/photos/loungerie/2196866243/ by loungerie

•   http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor

•   http://flickr.com/photos/homer4k/461407380/ by homer4k

•   http://flickr.com/photos/jpellgen/2390204986/ by jpellgen

•   http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee

•   http://flickr.com/photos/28509009@N03/2668650475/ by marcreis

•   http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone

 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Dbrec - Music recommendations using DBpedia

  • 1. dbrec Music recommendations using DBpedia Alexandre Passant - DERI, NUI Galway In-Use Track @ ISWC2010 11th November 2010, Shanghai, China
  • 2. Good news: it no longer fits on a slide! Many producers, only a few consumers (besides search engines): BBC, Drupal, ...
  • 4. Agenda • Semantic Distance over Linked Data • dbrec - architecture, dataset and UI • Evaluation • Lessons learnt • Next steps and conclusion
  • 6. Semantic Distance over Linked Data • Relying only on links • Relying only on instance data • Using dereferenceable URIs • And using resources following the LD principles
  • 8. Linked Data [Figure: example graph with resources e:r1 to e:r4 connected by links e:l1 to e:l3]
  • 9. G = (R, L, I) • R = {r_1, r_2, ..., r_n} • L = {l_1, l_2, ..., l_n} • I = {i_1, i_2, ..., i_n} [Figure: example graph with resources e:r1 to e:r4 and links e:l1 to e:l3]
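The triple G = (R, L, I) can be sketched as a small data structure. The sketch assumes R is the set of resources, L the set of typed links, and I the set of link instances between resources; the slide does not spell out these roles. The example instances are hypothetical (the figure's edges do not survive in the transcript), and `direct_distance` is only an illustrative link-counting distance, not necessarily the LDSD measure dbrec uses.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    link: str    # typed link (property), e.g. "e:l1"
    source: str  # resource the link starts from
    target: str  # resource the link points to


# Hypothetical link instances (the set I); not taken from the slide's figure.
instances = {
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
    LinkInstance("e:l3", "e:r3", "e:r4"),
}

# R and L are derived from the instances here for brevity.
resources = {i.source for i in instances} | {i.target for i in instances}
links = {i.link for i in instances}


def direct_links(a, b, insts):
    """Count link instances going from resource a to resource b."""
    return sum(1 for i in insts if i.source == a and i.target == b)


def direct_distance(a, b, insts):
    """Illustrative distance: 1 / (1 + links a->b + links b->a)."""
    return 1.0 / (1 + direct_links(a, b, insts) + direct_links(b, a, insts))
```

With this definition, two resources with no direct link are at distance 1.0, and each additional direct link in either direction brings them closer.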
  • 10-14. [Figures only: example graphs over resources e:r1 to e:r4 and links e:l1 to e:l3]
  • 15. LDSD
  • 16. The LDSD ontology • Our own ontology, but it could be mapped to MuSim in the future
  • 17. dbrec
  • 18. At a glance • A system providing recommendations for all DBpedia bands and artists (±40K) using LDSD • And explaining its recommendations • Both using Linked Data and Semantic Web standards (RDF, SPARQL) • Integrating related Web data for an improved user-experience
  • 19. Architecture • (1) Dataset identification • (2) Dataset reducing • (3) LDSD computation • (4) User interface [Figure: pipeline diagram, with RDF data passed between stages]
  • 20. Dataset • Retrieving all artists and bands in DBpedia (±40K) • Including incoming / outgoing links • Approximately 3M triples • Removing datatype properties • 2.2M (75%) • Merging /ontology and /property • 1.7M (55%)
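The two reduction steps (dropping datatype properties, then merging the /ontology and /property namespaces) can be sketched as below. This is a minimal sketch under stated assumptions: triples are plain string tuples, literals are recognised by a leading double quote, and a /property URI is merged by rewriting it to the /ontology URI with the same local name. The slides do not say how dbrec performs the merge, and `reduce_dataset` is a hypothetical helper.

```python
PROPERTY_NS = "http://dbpedia.org/property/"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def is_literal(obj):
    """Assumption: literal objects are serialised with a leading double quote."""
    return obj.startswith('"')


def merge_namespace(prop):
    """Rewrite a /property URI to /ontology so duplicate links collapse."""
    if prop.startswith(PROPERTY_NS):
        return ONTOLOGY_NS + prop[len(PROPERTY_NS):]
    return prop


def reduce_dataset(triples):
    """Drop triples with literal objects, then merge the two namespaces."""
    kept = set()
    for s, p, o in triples:
        if is_literal(o):
            continue  # datatype property: object is a literal, not a resource
        kept.add((s, merge_namespace(p), o))
    return kept


# Hypothetical input, for illustration only.
example = [
    ("dbr:Ramones", PROPERTY_NS + "genre", "dbr:Punk_rock"),
    ("dbr:Ramones", ONTOLOGY_NS + "genre", "dbr:Punk_rock"),
    ("dbr:Ramones", PROPERTY_NS + "name", '"Ramones"@en'),
    ("dbr:Ramones", ONTOLOGY_NS + "hometown", "dbr:New_York_City"),
]
reduced = reduce_dataset(example)
```

Using a set for the output is what makes the merged duplicates disappear: once both `genre` triples share a predicate URI, they are the same tuple.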
  • 21. Distribution 20K+ artists (50%) are not linked to any other artist
  • 22. Curation • 118 properties linking artists together • 18 misused, 35 wrongly defined (e.g. dbprop:klfsgProperty) • 578 properties linking artist to resources • 183 used only once, 36 wrongly defined • 767 properties linking resources to artists • 336 used only once, 115 wrongly defined • Dataset reduced to 1M triples
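Part of this curation can be automated by counting how often each property occurs. The sketch below finds properties used only once and filters out a chosen set of properties. It assumes that singly-used properties are candidates for removal, which the slide suggests but does not state. Spotting misused or wrongly defined properties still needs manual review, and both helper names are hypothetical.

```python
from collections import Counter


def properties_used_once(triples):
    """Return the set of properties that appear in exactly one triple."""
    counts = Counter(p for _, p, _ in triples)
    return {p for p, n in counts.items() if n == 1}


def drop_properties(triples, unwanted):
    """Keep only triples whose property is not in the unwanted set."""
    return [t for t in triples if t[1] not in unwanted]


# Hypothetical triples, for illustration only.
example = [
    ("a", "p1", "b"),
    ("a", "p2", "c"),
    ("d", "p1", "e"),
]
rare = properties_used_once(example)
curated = drop_properties(example, rare)
```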
  • 23. Computing distance • 9,797 minutes (done for all artists in DBpedia) • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10 • 50M triples • Modelled using the LDSD ontology
        Artist            Time (sec.)
        Ramones           25.20
        Johnny Cash       61.16
        U2                50.06
        The Clash         43.34
        Bad Religion      34.98
        The Aggrolites    7.35
        Janis Joplin      23.12
  • 24.
        Artist                Distance
        Elvis Presley         0.0978
        June Carter Cash      0.1056
        Willie Nelson         0.1322
        Kris Kristofferson    0.1407
        Bob Dylan             0.1466
        Marty Robbins         0.1673
        Rosanne Cash          0.1782
        Charlie McCoy         0.1836
        Gene Autry            0.1910
        Carl Smith            0.1980
  • 26. Sorry, slideshare people, that’s a movie so you won’t be able to see it !
  • 28. Evaluation settings • Off-line and on-line user evaluation • Using common RecSys metrics • 10 subjects • 2 women, 8 men • 24 to 34 years old • 35 to 55 minutes per interview, F2F
  • 29. Metrics • Off-line evaluation - comparison with last.fm • 5 artists / bands • 2 blind lists, 10 ranked recommendations per list • Marks from 1 to 5 • On-line recommendation - dbrec only • 5 artists / bands • Browsing recommendations using dbrec • Marks from 1 to 5, plus observations and interviews
  • 30. dbrec vs last.fm • Average mark of recommendations • 3.37 (±1.19) for dbrec • 3.44 (±1.25) for dbrec w/ on-line • 3.69 (±1.01) for last.fm
  • 31. Precision • Results for the precision (t=X means items are relevant if ranked X or more) • Cannot compute recall (implies users know all bands in the system)
               dbrec (off-line)    dbrec (off+on-line)    last.fm
        t=2    92.05               90.59                  98.32
        t=3    76.63               77.72                  87.91
        t=4    49.06               51.23                  58.05
        t=5    20.09               25                     25.165
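The threshold rule on this slide can be sketched as a small function. The sketch reads "relevant if ranked X or more" as "the user's mark is at least X", which is an interpretation rather than something the slide states. The marks in the example are invented and are not the study's data.

```python
def precision_at_threshold(marks, t):
    """Fraction of rated items whose mark (1 to 5) is at least t."""
    if not marks:
        raise ValueError("at least one mark is required")
    relevant = sum(1 for m in marks if m >= t)
    return relevant / len(marks)


# Hypothetical marks for ten recommendations.
marks = [5, 4, 3, 3, 2, 1, 4, 2, 5, 3]
p2 = precision_at_threshold(marks, 2)
p3 = precision_at_threshold(marks, 3)
p5 = precision_at_threshold(marks, 5)
```

Raising t can only shrink the set of relevant items, which is why the precision values on the slide decrease from t=2 to t=5.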
  • 32. Novel recommendations • Lots of unknown recommendations • 62% for dbrec (59.6% w/ on-line) • 40.4% for last.fm • But that's good news! • Evaluated 274 of them on dbrec • 3.05 (±1.09)
  • 33. Observations • Explanations for unknown bands • Checked for 198 / 310 • But also for known ones • 24 / 190 • Helped to understand the recommendation • Even if they already knew the band
  • 34. Interviews
                       User-interface    Explanations
        Enjoyable      9                 7
        Useful         9                 9
        Enriching      8                 10
        Easy to use    10                9
        Confusing      0                 2
        Complicated    0                 2
        Too geeky      1                 6
  • 36. Data quality • Issues with DBpedia properties • Misused: dbprop:notableInstruments • Wrongly defined: dbprop:klfsgProperty • Duplicates: /ontology versus /property • Requires data curation! • Automated and manual
  • 37. Use, but replicate • More and more public SPARQL endpoints • Often limited to X max results • 5,000 on DBpedia • Difficult to use in production • Requires local replica • But implies synchronisation! (Note: that's fair enough. Hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.)
  • 38. Use, but replicate
        SELECT ?label WHERE {
          ?x rdfs:label ?label .
          { ?x a dbpedia:MusicArtist } UNION { ?x a dbpedia:Band }
        }
  • 39. Use, but replicate • Names of all DBpedia artists • Get number of results w/ COUNT • Run n/5000 queries (LIMIT + OFFSET) • Recompose results • Network errors, etc. (Note: the query had more than 40K results, since most artists have their names in several languages. So many more than 8 queries.)
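The paging procedure listed above can be sketched as follows. The sketch only builds the query strings: running them against an endpoint, retrying on network errors, and recomposing the results are left out. `paged_queries` is a hypothetical helper, and the 5,000 page size is the DBpedia limit quoted in the slides.

```python
import math

# Query text as shown on the slide (prefix declarations omitted there too).
QUERY = """SELECT ?label WHERE {
  ?x rdfs:label ?label .
  { ?x a dbpedia:MusicArtist } UNION { ?x a dbpedia:Band }
}"""


def paged_queries(base_query, total, page_size=5000):
    """Build ceil(total / page_size) queries, one per LIMIT/OFFSET page."""
    pages = math.ceil(total / page_size)
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {i * page_size}"
        for i in range(pages)
    ]


# 40,000 results would need exactly 8 pages; more results need more pages.
queries = paged_queries(QUERY, 40000)
```

SPARQL does not guarantee a stable result order without an ORDER BY clause, so in practice the base query would need one for the pages to be disjoint and complete.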
  • 40. SPARQL, Be quick or be neat • “List all artists / bands sharing common property-values with the current one” • Fits in a single SPARQL query • But does not scale • “Optimisation” has to be done manually by splitting the query and recomposing results using an external script
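The manual optimisation described above (one query per property, results merged by a script) can be sketched like this. The query shapes are assumptions made for illustration; the slides do not show the queries dbrec actually runs. Executing the queries is left to the caller, so the sketch covers only the splitting and recomposition steps.

```python
def full_query(artist):
    """Single query: every resource sharing any property-value with the artist."""
    return (
        f"SELECT DISTINCT ?other WHERE {{ <{artist}> ?p ?o . ?other ?p ?o . "
        f"FILTER (?other != <{artist}>) }}"
    )


def property_queries(artist, properties):
    """One smaller query per property, each with the property fixed."""
    return [
        f"SELECT DISTINCT ?other WHERE {{ <{artist}> <{p}> ?o . ?other <{p}> ?o . "
        f"FILTER (?other != <{artist}>) }}"
        for p in properties
    ]


def recompose(result_sets):
    """Merge per-query results into one de-duplicated set."""
    merged = set()
    for results in result_sets:
        merged.update(results)
    return merged


# Hypothetical artist and property URIs.
artist = "http://dbpedia.org/resource/Ramones"
props = ["http://dbpedia.org/ontology/genre", "http://dbpedia.org/ontology/hometown"]
split = property_queries(artist, props)
merged = recompose([{"a", "b"}, {"b", "c"}])
```

Fixing the property in each query trades one expensive join over all properties for many cheap lookups, at the cost of extra round trips and a merge step outside the store.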
  • 41. SPARQL, Be quick or be neat • Tests done in the local RDF store • 1: full-query, 2: split by property, 3: split by property-object • Up to 75% faster
                          Direct SPARQL       Property-slicing    Complete-slicing
        Artist            Queries  Time       Queries  Time       Queries  Time
        Ramones           1        139.97     20       109.51     66       37.84
        Johnny Cash       1        257.81     30       152.60     135      75.35
        U2                1        155.53     22       122.91     70       44.03
        The Clash         1        146.43     20       110.84     79       42.61
        Bad Religion      1        104.08     23       86.49      97       47.35
        The Aggrolites    1        145.92     13       114.52     28       28.33
        Janis Joplin      1        230.88     27       151.00     98       62.81
  • 43. Next steps • Other data sources • FreeBase, MusicBrainz, etc. • Distance improvement • Propagation, feature selection, etc. • User Interface • User-friendly explanations • LOD-compliance • Mapping with other ontologies, SPARQL endpoint
  • 44. Conclusion • Defined and applied a Semantic Distance measure to Linked Data • Used it to build an end-user music recommender system, with ±40K artists • Evaluated it using RecSys metrics • Learnt several domain-independent lessons regarding LOD consumption
  • 46. Contact: alexandre.passant@deri.org - http://apassant.net - @terraces • Acknowledgements: Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2) • References: AAAI Spring Symposium 2010 - LinkedAI Symposium; ESWC2010 - Demo Track; ISWC2010 - In-Use Track
  • 47. Pictures credits • http://flickr.com/photos/yumlog2/20896759/ by yuki* • http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch • http://flickr.com/photos/loungerie/2196866243/ by loungerie • http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor • http://flickr.com/photos/homer4k/461407380/ by homer4k • http://flickr.com/photos/jpellgen/2390204986/ by jpellgen • http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee • http://flickr.com/photos/28509009@N03/2668650475/ by marcreis • http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone