
seevl: Data-driven music discovery

Presentation about seevl.net at the LA SemWeb meetup, October 2nd 2012

http://www.meetup.com/lasemweb/events/83232222/

  1. seevl: Data-driven music discovery
     Alexandre Passant, co-founder, CEO, MDG Web ltd
     http://seevl.net // @seevl // alex@seevl.net // @terraces
     LA SemWeb & WebSpeed Meet-up, 2 October 2012, Cross Campus, Santa Monica
  2. a bit of background...
  3. • Knowledge Engineering
     • Social Web & Enterprise 2.0
     • Sensor Networks & Real-Time
  4. architecture
  5. [Graph diagram. Nodes: :alex, dbpedia:Bad_Brains, dbpedia:Hardcore_Punk, dbpedia:Beastie_Boys, dbpedia:Black_Flag_(band), dbpedia:Adam_Yauch, dbpedia:B._B._King, dbpedia:Category:American_vegatarians. Edge labels: foaf:topic_interest, p:associatedActs, p:genre, p:currentMembers, skos:subject]
  6. [Graph diagram repeated from slide 5]
  7. Our approach: SLADE
     • Semantic LAyer for Data Exploration
     • A framework to build data-driven apps
       • ETL from existing sources / APIs
       • Search, discovery, recommendations
       • Data access / API
     • Generic, config-based, domain-agnostic
  8. The pipeline
     [Pipeline diagram. Labels: Web data sources (artists, genres, labels, locations...); Data-extraction and interlinking; Entity-centric semantic knowledge base; Storage; REST-ful interface; Search, discovery and recommendation engine, on top of our graph database; seevl products]
  9. Challenges
     • Some technical challenges faced when building SLADE and seevl.net
       • Data models: Choosing the right schemas
       • Data access: SPARQL or API or ... ?
       • Scalability: Caching and optimisation strategies
       • User Experience: User-centric design
  10. data models
  11. RDF since day one
      • RDF?
        • Agile model (ideal when iterating)
        • Intuitive aspect of graph modelling
        • Standard toolkits (SPARQL / HTTP)
      • OWL? RDFS?
        • Minor use of inference (type, hierarchies)
  12. Artist data
      • Music Ontology
        • Labels, Genres, Influences, Origins ...
        • Collaborations between artists
        • Activity period (add-on)
      • Additional models/mappings
        • e.g. Bio Vocabulary (birth/death), FOAF...
  13. Social activities
      • SIOC & SIOC-actions
        • Social graph / sub-graph
        • Action-centric activities (like, listen)
      • Inferring user’s taste profile
        • Top artists, genres, labels
        • Using latest actions
  14. Similarity / Recsys
      • Graph-based similarities
        • Data-driven recommendations
        • Ranking using weight-factors
        • Explanations / tracking
      • The Similarity Ontology
        • Domain-agnostic
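
The ranking with weight-factors and explanations on slide 14 could be sketched roughly as follows. This is only an illustration under stated assumptions: the weight values, the `Fact` type, the `similarity` function and the sample facts are all hypothetical, and the slides do not describe the actual scoring scheme.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str]  # (property, value), e.g. ("genre", "Hardcore_Punk")

# Hypothetical weight factors per property; the real values are not in the slides.
WEIGHTS: Dict[str, float] = {"genre": 1.0, "label": 0.5, "origin": 0.25}


def similarity(a: Set[Fact], b: Set[Fact]) -> Tuple[float, List[Fact]]:
    """Score two entities by their shared facts; the shared facts double as the explanation."""
    shared = sorted(a & b)
    score = sum(WEIGHTS.get(prop, 0.0) for prop, _ in shared)
    return score, shared


# Illustrative sample data only.
bad_brains = {("genre", "Hardcore_Punk"), ("genre", "Reggae"), ("origin", "Washington_DC")}
black_flag = {("genre", "Hardcore_Punk"), ("label", "SST_Records"), ("origin", "California")}

score, why = similarity(bad_brains, black_flag)
print(score, why)
```

Returning the shared facts alongside the score is one simple way to keep a recommendation explainable: the same data that produced the rank can be shown to the user.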
  15. Provenance
      • Keep trace of every statement in the ETL
        • Origin, type and time of extraction
        • With a low number of additional triples
      • Introducing “data-slices”
        • Multiple slices (=subgraphs) per resource
        • Quick updates (DELETE / INSERT)
  16. Provenance and graphs
      GRAPH svl:seevl_id/wikipedia/facts/extract {
        svl:seevl_id mo:genre svl:BntvuZAy .
        svl:seevl_id/wikipedia/extract dc:created "2012-10-25" ;
          rdfs:seeAlso wikipedia:Social_Distortion .
      }
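
One way to picture the "quick updates (DELETE / INSERT)" on a data-slice is a helper that empties a single named graph and refills it in one SPARQL 1.1 Update request. The sketch below is an assumption, not the SLADE implementation: `slice_update`, the `http://example.org/` IRIs and the graph name are made up for illustration.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # terms already serialised, e.g. "<http://...>" or '"literal"'


def slice_update(graph_iri: str, triples: Iterable[Triple]) -> str:
    """Build a SPARQL 1.1 Update that empties one slice (named graph) and refills it."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return (
        f"DELETE WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }} ;\n"
        f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"
    )


# Hypothetical IRIs standing in for the svl: identifiers shown on the slide.
update = slice_update(
    "http://example.org/seevl_id/wikipedia/facts",
    [(
        "<http://example.org/seevl_id>",
        "<http://purl.org/ontology/mo/genre>",
        "<http://example.org/BntvuZAy>",
    )],
)
print(update)
```

Because each slice lives in its own graph, re-extracting one source only touches that graph and leaves the other slices of the same resource alone.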
  17. data access
  18. SPARQL
      • Pros
        • W3C Standard, Powerful
        • HTTP-based w/ SPARQL Protocol
        • SPARQL Update in 1.1
      • Cons
        • Learning curve for non-RDF people
  19. URI patterns + JSON-LD
      • Pre-defined URIs mapped to SPARQL query patterns, returning JSON-LD data
        • Search queries or resource descriptions
        • Content-negotiation or ?_format=json
      • GET and POST
        • POST => SPARQL UPDATE
        • GET => SPARQL SELECT / ASK
  20. JSON-LD
      • JSON for Linking Data
      • The best of both worlds
        • JSON serialization, works with any parser
        • Additional semantics (URIs, typed links, etc.) with JSON-LD parsers
        • Use of context/mappings to avoid URIs
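
The "best of both worlds" point on slide 20 can be illustrated with a small document: a plain JSON parser reads the short keys directly, while the `@context` tells a JSON-LD consumer which IRIs those keys stand for. Everything below is a hedged sketch: the entity IRIs and label are illustrative, and `term_iri` is a hypothetical helper that only mimics a small part of JSON-LD expansion, not a real JSON-LD processor.

```python
import json

# Illustrative document; the example.org IRIs are placeholders.
doc = {
    "@context": {
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "genre": {"@id": "http://purl.org/ontology/mo/genre", "@type": "@id"},
    },
    "@id": "http://example.org/entity/seevl_id",
    "prefLabel": "Social Distortion",
    "genre": "http://example.org/entity/BntvuZAy",
}


def term_iri(document: dict, key: str) -> str:
    """Look up the IRI a short key stands for (a simplification of JSON-LD expansion)."""
    definition = document["@context"][key]
    return definition["@id"] if isinstance(definition, dict) else definition


# Any JSON parser can round-trip the document and read the short keys.
plain = json.loads(json.dumps(doc))
print(plain["prefLabel"])
print(term_iri(plain, "genre"))
```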
  21. Search
      • /entity/?property=value
      • JSON-LD mappings used in URI templates
      • Works with literals, dates, resources
      • Ranking algorithm / alpha-ranking
      • Patterns defined in a single config file
  22. Search (text)
      • /entity/?prefLabel=clash&type=artist&_sort=count_desc
      • Translated into
        SELECT ?x WHERE {
          ?x a mo:artist ;
             skos:prefLabel ?label .
          ?label bif:contains "clash" .
        }
  23. Search (relations)
      • /entity/?genre=BntvuZAy&type=artist
      • Translated into
        SELECT ?x WHERE {
          ?x a mo:artist ;
             mo:genre svl:BntvuZAy .
        }
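
The translation from a search URL to a SPARQL pattern on slide 23 might look something like the sketch below. It is a guess at the general shape, not the seevl code: the `TYPES` and `RELATIONS` dictionaries are hypothetical stand-ins for the single config file mentioned on slide 21, and only resource-valued parameters are handled.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical stand-ins for the config file that maps URL parameters to RDF terms.
TYPES = {"artist": "mo:artist"}
RELATIONS = {"genre": "mo:genre"}


def to_sparql(url: str) -> str:
    """Translate /entity/?property=value into a SELECT query (resource values only)."""
    params = dict(parse_qsl(urlsplit(url).query))
    patterns = []
    if "type" in params:
        patterns.append(f"?x a {TYPES[params['type']]} .")
    for key, value in params.items():
        if key in RELATIONS:
            if not value.isalnum():
                # Refuse anything that is not a bare identifier before it reaches the query.
                raise ValueError(f"invalid identifier: {value!r}")
            patterns.append(f"?x {RELATIONS[key]} svl:{value} .")
    return "SELECT ?x WHERE { " + " ".join(patterns) + " }"


query = to_sparql("/entity/?genre=BntvuZAy&type=artist")
print(query)
```

Keeping the mapping in data rather than code is what lets new search patterns be added without touching the translation function.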
  24. Resource description
      • Patterns mapped to the resource URI to retrieve a subset of the resource description
        • /entity/seevl_id/infos
        • /entity/seevl_id/facts
        • /entity/seevl_id/links
        • /entity/seevl_id/related(/related_id)
  25. scalability
  26. Is SPARQL fast enough?
      • SPARQL is very powerful, but can be slow
      • Some simple queries may lead to deep graph patterns or traversal queries, depending on the modelling
      • FILTERs (e.g. text- and date-based queries) are expensive
      • Not all triple-stores are equal
  27. Splitting queries
      • “List all resources sharing common property-values with the current one, whatever that property is”
        • Fits in a single SPARQL query
        • Doesn’t properly scale
        • Becomes faster when the query is split and the results are recomposed via internal scripts
  28. SPARQL: splitting queries

      Artist           Direct SPARQL      Property-slicing   Complete-slicing
                       Queries  Time      Queries  Time      Queries  Time
      Ramones          1        139.97    20       109.51    66       37.84
      Johnny Cash      1        257.81    30       152.60    135      75.35
      U2               1        155.53    22       122.91    70       44.03
      The Clash        1        146.43    20       110.84    79       42.61
      Bad Religion     1        104.08    23       86.49     97       47.35
      The Aggrolites   1        145.92    13       114.52    28       28.33
      Janis Joplin     1        230.88    27       151.00    98       62.81
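
The idea of splitting one large query into many small ones and recomposing the results in a script can be sketched as below. The slides do not define exactly how "property-slicing" and "complete-slicing" divide the work, so this is only an assumption: it issues one small lookup per property-value pair of the current resource. The toy `TRIPLES` list and the `match` function are stand-ins for a triple store.

```python
from collections import Counter
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy data standing in for the triple store.
TRIPLES: List[Triple] = [
    ("Ramones", "genre", "Punk_rock"),
    ("Ramones", "origin", "New_York"),
    ("The_Clash", "genre", "Punk_rock"),
    ("Television", "origin", "New_York"),
    ("Television", "genre", "Punk_rock"),
    ("U2", "genre", "Rock"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Stand-in for one small triple-pattern query against the store."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def related(resource: str) -> Tuple[Counter, int]:
    """Run one small query per property-value pair and recompose the results in Python."""
    shared: Counter = Counter()
    queries = 1  # the initial lookup of the resource's own facts
    for _, prop, value in match(s=resource):
        queries += 1
        for other, _, _ in match(p=prop, o=value):
            if other != resource:
                shared[other] += 1
    return shared, queries


shared, queries = related("Ramones")
print(shared.most_common(), queries)
```

More queries are issued than in the single-query version, but each one is a simple pattern that a store can answer from its indexes, which matches the trend in the table: query count goes up while total time goes down.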
  29. SPARQL + Redis
      • Started by using Memcache to store query results (e.g. “?x genre $y”)
      • Good, but costly for the first user
      • Then, materialising results in-memory using Redis as a key-value cache system
      • Low indexing time (a few minutes on a laptop)
      • Increased query performance, real-time
  30. SPARQL + Redis
      • Redis
        • HSET to define entities (minimal data)
        • ZADD to store ordered sets of key-values, with our own ranking scheme
        • ZRANGE to retrieve w/ correct order
      • Everything in memory, instant query results
  31. SPARQL + Redis
      # Store the minimal entity description as a hash
      self.redis.hset(entity, 'uri', uri)
      self.redis.hset(entity, 'prefLabel', prefLabel)
      self.redis.hset(entity, 'description', description)
      # redis-py 3.0+ takes a {member: score} mapping
      self.redis.zadd('genre:BntvuZAy', {entity: score})
      ...
      # Read back a ranked window, with scores
      self.redis.zrange(pattern, start, end, withscores=True)
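
To show how the HSET / ZADD / ZRANGE pattern from slides 30 and 31 fits together end to end, here is a sketch that runs without a Redis server. `InMemoryRedis` is a hypothetical stand-in written for this example; it mimics the argument shapes of the corresponding redis-py 3.0+ methods but is not Redis, and the keys and scores are made up.

```python
from typing import Dict, List, Optional, Tuple, Union


class InMemoryRedis:
    """Tiny stand-in mimicking four redis-py calls so the sketch runs without a server."""

    def __init__(self) -> None:
        self.hashes: Dict[str, Dict[str, str]] = {}
        self.zsets: Dict[str, Dict[str, float]] = {}

    def hset(self, name: str, key: str, value: str) -> None:
        self.hashes.setdefault(name, {})[key] = value

    def hget(self, name: str, key: str) -> Optional[str]:
        return self.hashes.get(name, {}).get(key)

    def zadd(self, name: str, mapping: Dict[str, float]) -> None:
        self.zsets.setdefault(name, {}).update(mapping)

    def zrange(self, name: str, start: int, end: int, desc: bool = False,
               withscores: bool = False) -> Union[List[str], List[Tuple[str, float]]]:
        # Order by score, then member, like a sorted set; end is inclusive as in Redis.
        items = sorted(self.zsets.get(name, {}).items(),
                       key=lambda kv: (kv[1], kv[0]), reverse=desc)
        stop = None if end == -1 else end + 1
        chosen = items[start:stop]
        return chosen if withscores else [member for member, _ in chosen]


r = InMemoryRedis()
# Minimal entity data as hashes (illustrative keys and labels).
r.hset("entity:ramones", "prefLabel", "Ramones")
r.hset("entity:the_clash", "prefLabel", "The Clash")
# One sorted set per property-value, scored with a made-up ranking.
r.zadd("genre:BntvuZAy", {"entity:ramones": 0.9, "entity:the_clash": 0.7})

top = r.zrange("genre:BntvuZAy", 0, 9, desc=True, withscores=True)
labels = [r.hget(member, "prefLabel") for member, _ in top]
print(top, labels)
```

With a real server the same calls would go through a `redis.Redis` client instead; the point of the layout is that answering "artists in this genre, best first" becomes a single sorted-set range read rather than a SPARQL query.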
  32. user-experience
  33. User-experience
      • Interfaces for graph-based/semantic data
        • Don’t need to be ugly!
        • As long as they’re built for users first
      • Focus on vertical-UX, rather than SemWeb-UX
        • Check best practices in the domain
        • Involve HCI / non-SemWeb people
  34. take-away message
  35. Lessons learnt
      • Don’t reinvent the wheel: check existing stacks and use what fits the job
      • Make it simple for your developers, using REST-ful interfaces and design patterns
      • Accept compromises, be pragmatic
      • Think of users / create personas who are not SemWeb geeks when designing the UX
  36. Questions?
      http://seevl.net // @seevl // alex@seevl.net // @terraces
