
Ilian Uzunov (Georgi Georgiev)

712 views

Published on

http://2015.semantics.cc/ian-piper

Published in: Data & Analytics


  1. Scaling Semantic Technology to Increase User Engagement - FT.com
     September 16th, 2015
     Ontotext, Scaling Semantic Technology, Sept. 2015
  2. Outline
     • Introducing Ontotext
     • Related Reads - an FT.com use case
     • What we managed to achieve
     • Hands on FT.com live
     • Positive signs across the news and media domain
     • Hands on NOW - News on the Web demo service
  3. Company Brief
     • Why? Enable better search, analytics and content delivery
     • What? Data and content management technology: graph database engine + text-mining solutions
     • How? Semantic analysis of text, linking text to data; NoSQL database with inference
     • Best for: dealing with heterogeneous dynamic data
     • Clients: BBC, FT, Bloomberg, DK, AstraZeneca, Wiley, etc.
     • Facts: 70 staff; HQ in Sofia; sales in London & New York
     • USP: the best semantic graph database engine; text-mining platform integrated with graph database
  4. Sample RDF Graph: Data and Schema
     [Figure: RDF graph. Nodes: myData:Maria, myData:Ivan, ptop:Agent, ptop:Person, ptop:Woman. Properties: ptop:childOf, ptop:parentOf, owl:relativeOf. Edge labels: rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf, owl:SymmetricProperty. Some edges are marked "inferred".]
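The kind of inference pictured on slide 4 can be illustrated with a small forward-chaining sketch. This is a toy in plain Python, not GraphDB's rule engine. The property and class names come from the slide; the direction of the ptop:childOf edge, the subclass chain Woman, Person, Agent (expressed here with rdfs:subClassOf), and the helper `infer` are assumptions made purely for illustration.

```python
# Toy forward-chaining over (subject, predicate, object) triples.
# Schema edges below are assumed; the slide only shows their labels.
SCHEMA = {
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    ("ptop:childOf", "rdfs:subPropertyOf", "owl:relativeOf"),
    ("ptop:parentOf", "rdfs:subPropertyOf", "owl:relativeOf"),
    ("owl:relativeOf", "rdf:type", "owl:SymmetricProperty"),
    ("ptop:Woman", "rdfs:subClassOf", "ptop:Person"),
    ("ptop:Person", "rdfs:subClassOf", "ptop:Agent"),
}

# Explicit facts (assumed direction: Ivan is a child of Maria).
DATA = {
    ("myData:Maria", "rdf:type", "ptop:Woman"),
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
}


def infer(triples):
    """Apply a handful of RDFS/OWL-style rules until nothing new appears."""
    facts = set(triples)
    while True:
        inverse = {(s, o) for s, p, o in facts if p == "owl:inverseOf"}
        sub_prop = {(s, o) for s, p, o in facts if p == "rdfs:subPropertyOf"}
        sub_class = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        symmetric = {s for s, p, o in facts
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        new = set()
        for s, p, o in facts:
            # inverseOf works in both directions
            new.update((o, q, s) for a, q in inverse if a == p)
            new.update((o, a, s) for a, q in inverse if q == p)
            # a sub-property statement implies the super-property statement
            new.update((s, q, o) for a, q in sub_prop if a == p)
            if p in symmetric:
                new.add((o, p, s))
            if p == "rdf:type":
                new.update((s, "rdf:type", c) for a, c in sub_class if a == o)
        if new <= facts:
            return facts
        facts |= new


closure = infer(SCHEMA | DATA)
inferred = closure - SCHEMA - DATA
```

With these inputs, `inferred` contains statements that were never asserted, such as Maria being the parent of Ivan and Maria being an Agent.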
  5. Interlinking Text and Data
  6. Semantic Annotation
     [Figure: annotated publication pmid:17714090 (Ian A Yang, Clinical and experimental pharmacology …) linked to concepts. Labels shown: umls:C0035204, umls:C0006261, umls:C000496, COPD, Bronchial Diseases, Respiration Disorders, Chronic Obstructive Airway Diseases, Asthma.]
  7. Technology Portfolio
  8. Ontotext and Financial Times
     Profile
     • Top 3 business media
     • Focused both on B2C publishing and B2B services
     Goals
     • Create a horizontal platform for both data and content based on semantics and serve all functionality through it
     Challenges
     • Critical part of the entire workflow
     • Multiple development projects in parallel with up to 2 months between inception and go-live
     • Horizontal platform with focus on organizations, people, GPEs and relations between them
     • Automatic extraction of all these concepts and relationships
     • Separate stream of work for a user-behavior-based recommendation of relevant content and data across the entire media
  9. FT Primary Objective
     Serve relevant articles to increase user engagement and improve usability
  10. Subject, Object, Action
      • Subject: User
      • Object: Article, Media Asset, Data, …
      • Action: Read, Preview, Comment, …
  11. Contextual Recommendation
      [Figure: contextual similarity]
  12. Behavioural Recommendation
      [Figure: behavioural similarity; user profile]
  13. Contextual and Behavioural in Combination
      [Figure: behavioural and contextual similarity; reads; user profile]
  14. Average News Article Metadata
      [Figure: Article with fields ID, URL, title, summary, image, created, updated, promoted (popular) Y/N, reads, views, votes, comments]
  15. FT Article Metadata
      Summary, title, body, editorial, img:alt, people, regions, organisations, IPTC tags
  16. Metadata Used
      Summary, title, body, editorial, img:alt, people, regions, organisations, IPTC tags, concepts, keyphrases
  17. User Actions
      Limited to "User reads Article"
  18. User Actions: Another Perspective
      [Figure: diagram relating users, articles and searches. Labels: perform, comments, votes, posts, preview, read, contains, leads to read, leads to preview, Article, Search Action, Result. Search Log labels: Date, FTS Q., Tag, Cat, Tag set, results, cat taxonomy.]
  19. "Content"-based Recommendation
      • Relies on the previous choices of an individual user (a user's profile)
      • Results on the basis of the similarity of items, defined in terms of their content
      • The recommended content is rather homogeneous
  20. Content-based Ranking Mechanisms
      Two-fold scoring approach:
      • Similarity to recently viewed articles (context)
      • Relevance to a long-term user profile
        - Weights reflecting the relative importance of the individual terms (static component)
        - Transition likelihoods among any pair of terms (dynamic component)
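The two-fold scoring on slide 20 could be sketched as follows. This is only one plausible reading of the slide, not the production scoring: cosine similarity stands in for "similarity to recently viewed articles", and the function names, the `mix` parameter and the additive combination are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_relevance(article_terms, static_weights, transitions, recent_terms):
    """Long-term profile relevance (hypothetical formulation).

    static_weights: term -> importance of that term for this user
    transitions:    (previous term, next term) -> likelihood of moving between them
    """
    static = sum(static_weights.get(t, 0.0) for t in article_terms)
    dynamic = sum(transitions.get((prev, t), 0.0)
                  for prev in recent_terms for t in article_terms)
    return static + dynamic


def content_score(article, context, static_weights, transitions, mix=0.5):
    """Blend context similarity with profile relevance; `mix` is an assumed knob."""
    similarity = cosine(article, context)
    relevance = profile_relevance(article, static_weights, transitions, context)
    return mix * similarity + (1.0 - mix) * relevance
```

Here `article` and `context` are term-weight dictionaries built from the annotated metadata (people, organisations, concepts and so on); how those weights are derived is not stated in the slides.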
  21. Collaborative Filtering
      • Relies on statistics that reflect the past choices of all users
      • Results based on user ratings, and the similarity of users or items
      • Content-agnostic
      • Aware of the quality of content
  22. Collaborative Ranking Mechanisms
      [Figure: User to Content Similarity Score; User to User Sim. Score; Content to Content Sim. Score]
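A content-to-content score of the kind named on slide 22 is often derived from co-visitation counts, and slide 27 does mention a co-visitation matrix. The sketch below is a minimal illustration under that assumption; the function names and the raw-count scoring are hypothetical, and a real system would normally normalise the counts.

```python
from collections import Counter
from itertools import combinations


def co_visitation(reads_by_user):
    """Count how often two articles were read by the same user."""
    counts = Counter()
    for articles in reads_by_user.values():
        for a, b in combinations(sorted(set(articles)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


def related(article, counts, seen=(), k=3):
    """Top-k articles most often co-read with `article`, skipping already seen ones."""
    scores = {b: n for (a, b), n in counts.items()
              if a == article and b not in seen}
    # Sort by descending count, then by id for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]


reads = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "c"]}
counts = co_visitation(reads)
```

Note that nothing here looks at article text, which matches the "content-agnostic" point on slide 21.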
  23. Hybrid Approach
      • Combines both approaches to improve the quality of prediction
      • Implemented via statistical models
      • Takes a wide array of features into consideration
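One simple statistical model that combines several feature scores is a logistic function over a weighted sum. The slides do not say which model was used, so the sketch below is purely illustrative: the feature names echo the ranking schemes listed on slide 27, and the weights and bias are made-up values.

```python
import math

# Hypothetical weights; in practice they would be fitted to click data.
WEIGHTS = {"content": 1.5, "collaborative": 2.0, "popularity": 0.5, "recency": 0.5}
BIAS = -2.0


def hybrid_score(features):
    """Logistic combination of per-scheme scores (each assumed to lie in [0, 1])."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Order candidate article ids by descending hybrid score."""
    return sorted(candidates, key=lambda a: hybrid_score(candidates[a]), reverse=True)
```

For example, `rank({"x": {"content": 0.9}, "y": {"content": 0.2, "collaborative": 0.9}})` places "y" first because the collaborative weight dominates in this made-up configuration.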
  24. Initial Architecture
  25. Final Architecture
      [Figure: architecture diagram. Components: AWS Elastic LB; three AWS instances, each labelled RR; SOLR 1, SOLR 2, SOLR 3; CS Node 1, CS Node 2, CS Node 3 (Replication Group I); FT API; Fetch & Annotation; OWLIM; Worker; Recommendation API; Varnish Cache. Read Article flow: 1. get related, 2. ask, 3. on cache miss, 4. query. Content flow: 1. pull content, 2. annotate, 3. index. Other labels: annotate, content store, user profiles, update popularity, click stream, update user.]
  26. Main Actions
      1. Pull content - annotate/enrich - index
      2. Accumulate/update user profile
      3. Recommend
  27. Implementation Overview
      [Figure: request flow. Profile Update Request (User ID, Item ID); Profile Update; Recommendation Request (User ID); Query Generation; Items Index (Solr); Profile Storage (Cassandra).]
      • User: context, static component, dynamic component
      • Article: co-visitation matrix, popularity
      • Boosted sub-queries for all involved ranking schemes: content-based, collaborative, popularity, recency
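The "boosted sub-queries" idea from slide 27 can be sketched as a function that turns a user profile into a Lucene-syntax query string, where `^` applies a boost to a term or to a parenthesised group. The field names `people` and `organisations` are taken from the metadata slides; the `id` field, the function names and the boost values are assumptions, and the popularity and recency schemes are left out because the slides give no detail on how they were expressed.

```python
def quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def term_clause(field, weights):
    """One OR-group per field, each term boosted by its profile weight."""
    parts = [f"{field}:{quote(term)}^{w:g}" for term, w in sorted(weights.items())]
    return "(" + " OR ".join(parts) + ")"


def build_query(profile, co_visited_ids, scheme_boosts):
    """Combine a content-based and a collaborative sub-query into one string.

    profile:        field -> {term -> weight}, e.g. taken from the stored user profile
    co_visited_ids: article ids suggested by the co-visitation matrix
    scheme_boosts:  relative weight of each ranking scheme (hypothetical values)
    """
    clauses = []
    for field, weights in sorted(profile.items()):
        clauses.append(f"{term_clause(field, weights)}^{scheme_boosts['content']:g}")
    if co_visited_ids:
        ids = " OR ".join(f"id:{quote(i)}" for i in co_visited_ids)
        clauses.append(f"({ids})^{scheme_boosts['collaborative']:g}")
    return " OR ".join(clauses)


query = build_query(
    {"people": {"Mario Draghi": 2.0}},
    ["a1"],
    {"content": 1.5, "collaborative": 3},
)
```

The resulting string would be sent as the main query to the items index, so that a single search request returns candidates ranked by all schemes at once.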
  28. Wrap up - Concept Extraction Highlights
      • 8m named entities and metadata about them
      • 20m labels of People and Organisations
      • CES cluster which can be scaled horizontally to handle peak loads
      • Live dictionary updates coming from GraphDB through the EUF (Entity Update Feed) plugin
      • Max throughput: 10 docs/sec on a single c3.2xlarge AWS node; multiply by N to get the throughput of an N-node cluster
      • Reliability has been 100%, but the solution hasn't been stressed as much as we've designed it for
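The throughput rule on slide 28 is plain arithmetic and can be turned into a small sizing helper. The sketch assumes the linear scaling stated on the slide; the function names are illustrative.

```python
import math

DOCS_PER_SEC_PER_NODE = 10  # figure quoted on the slide for one c3.2xlarge node


def cluster_throughput(nodes):
    """Docs/sec for an N-node cluster, assuming linear scaling as the slide states."""
    return nodes * DOCS_PER_SEC_PER_NODE


def nodes_needed(target_docs_per_sec):
    """Smallest cluster that meets a target rate under the same assumption."""
    return math.ceil(target_docs_per_sec / DOCS_PER_SEC_PER_NODE)
```

For instance, a target of 25 docs/sec needs three nodes, since two nodes would top out at 20.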
  29. Wrap up - Recommendation Highlights
      • 100% reliability in production for a full year (Ontotext also manages the deployment)
      • API handling 1.5m requests a day on average, up to 3m requests a day (1/3 recommendations, 1/3 logging user actions, 1/3 checking whether a user has enough history to ask for behavioural recommendations)
      • Roughly 200m recommendations served and 200m user actions tracked to date since go-live
      • 450,873 documents indexed
      • No caching, since everything is effectively a personalized search request
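The request figures on slide 29 can be converted into per-second rates and cross-checked against the yearly total. This is back-of-the-envelope arithmetic only; it assumes load spread evenly over the day and the average holding for a full 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60

AVG_REQUESTS_PER_DAY = 1_500_000   # from the slide
PEAK_REQUESTS_PER_DAY = 3_000_000  # from the slide

# Average request rate if the load were spread evenly over the day.
avg_rps = AVG_REQUESTS_PER_DAY / SECONDS_PER_DAY
peak_day_rps = PEAK_REQUESTS_PER_DAY / SECONDS_PER_DAY

# One third of requests are recommendations; project that over a year.
recommendations_per_day = AVG_REQUESTS_PER_DAY / 3
recommendations_per_year = recommendations_per_day * 365
```

The yearly projection comes to 182.5m recommendations, which is in line with the "roughly 200m" quoted on the slide.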
  30. Wrap up - GraphDB Highlights
      • GraphDB had to comply with a set of tests designed by FT and OT: Network lag, Disk Space, Disk Load, Less Memory, CPU Load, etc.
      • Comprehensive support for OWL and SPARQL
      • Efficient inference through the entire life-cycle of the data
      • High-availability cluster architecture, proven and mature for more than 5 years now
        - GraphDB's first HA implementation has been running at the BBC since 2010
        - Unmatched HA tests and transaction load benchmarks
      • FTS and NoSQL Connectors for seamless integration
  31. Positive Signs from the News Industry
      • Washington Post tests new 'Knowledge Map' feature: "Our ultimate goal is to mine big data to surface highly personalized and contextual data for both journalistic and native content."
      • New York Times R&D Lab announced an experimental project, "Editor": 1) recognize a term that can be categorized, 2) link that entity to existing databases or microservices, 3) make this enriched information accessible to journalists
      • BBC Structured Journalist Manifesto. Structured journalism: 1) on the reporter side, automation helps improve a journalist's reporting and make it less cumbersome, 2) on the audience side, semtech helps scale things that can improve the reader's experience
  32. Selection of Ontotext Customers
  33. Thanks!
      We will be delighted to have a word with you after the session or later today or tomorrow!
      • Dr. Georgi Georgiev - Head of Ontotext Text Analysis Unit - georgi.georgiev@ontotext.com
      • Ilian Uzunov - Sales Director CEMEAA - ilian.uzunov@ontotext.com
      • Nikolay Krustev - GraphDB Sales Engineer - nikolay.krustev@ontotext.com
