8. What content is relevant?
• Social web analysis: popularity, influence, trust, diversity
• Semantic analysis: entities, topics, events, opinions
10. Archiving workflow
Collect → Analyse → Archive → Present
• Two-stage archiving strategy: web → analysis storage → archive
• Archivist describes the target
• HTML and API crawlers fetch content
11. Archiving workflow
Collect → Analyse → Archive → Present
• Different modules analyse semantic information and social context to filter relevant content
• HBase and RDF triple storage
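
The filtering idea on this slide can be illustrated with a minimal sketch. It is not the ARCOMEM implementation: the `Document` fields, the weighting, and the threshold are all hypothetical, chosen only to show how a semantic signal (topic overlap with the archivist's target) and a social signal (popularity) might be combined into one relevance score.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Document:
    # All fields are illustrative assumptions, not ARCOMEM data structures.
    url: str
    topics: set[str]    # semantic signal, e.g. detected topics or entities
    popularity: float   # social signal in [0, 1], e.g. a normalised share count


def relevance(doc: Document, target_topics: set[str], semantic_weight: float = 0.7) -> float:
    """Blend topic overlap with the target and a social popularity score."""
    if not target_topics:
        return 0.0
    overlap = len(doc.topics & target_topics) / len(target_topics)
    return semantic_weight * overlap + (1 - semantic_weight) * doc.popularity


def filter_relevant(docs: list[Document], target_topics: set[str], threshold: float = 0.5) -> list[Document]:
    """Keep only documents whose blended score reaches the threshold."""
    return [d for d in docs if relevance(d, target_topics) >= threshold]
```

In a real pipeline the score would presumably also draw on influence, trust, and diversity; those are omitted here to keep the sketch short.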
12. Archiving workflow
Collect → Analyse → Archive → Present
• Only relevant content is preserved in (W)ARC format
• Semi-automatic content selection
• Compatible with Heritrix and Wayback
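
For readers unfamiliar with the WARC container, the sketch below serialises a single `resource` record by hand, following the WARC 1.0 layout (version line, named header fields, blank line, payload, two trailing CRLFs). The function name `warc_resource_record` is made up for this illustration; a production archive would more likely be written by the crawler itself or by a dedicated WARC library.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Build one WARC/1.0 'resource' record (illustrative helper, not ARCOMEM code)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Header block: version line, one "Name: value" line per field, then an empty line.
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLFs after the payload.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = warc_resource_record("http://example.org/", b"<html></html>", "text/html")
    print(record.decode("utf-8"))
```

Concatenating such records into one file (optionally gzip-compressing each record) gives a file that WARC-aware tools can read.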
13. Archiving workflow
Collect → Analyse → Archive → Present
• Full-text search and faceted browsing
• Semantic and social contextualization
• Visualizations to be developed on top (not in ARCOMEM scope)
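
As a rough illustration of how full-text search and faceted browsing fit together, the sketch below filters a small in-memory collection by query terms and then counts facet values over the hits. The record layout (`text`, `entities`, `topics`) is an assumption made for this example; an actual access layer would sit on a search index, not Python lists.

```python
from collections import Counter


def search(docs: list[dict], query: str) -> list[dict]:
    """Return documents whose text contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower() for t in terms)]


def facet_counts(docs: list[dict], field: str) -> Counter:
    """Count how often each value of a multi-valued field occurs in the result set."""
    return Counter(value for d in docs for value in d.get(field, []))


if __name__ == "__main__":
    # Hypothetical archived documents with semantic annotations.
    archive = [
        {"text": "Election debate in Athens", "entities": ["Athens"], "topics": ["election"]},
        {"text": "Election results announced", "entities": ["Athens", "Berlin"], "topics": ["election"]},
        {"text": "Festival opens in Berlin", "entities": ["Berlin"], "topics": ["culture"]},
    ]
    hits = search(archive, "election")
    print(len(hits), facet_counts(hits, "entities"))
```

The facet counts are what a browsing interface would show next to the result list, letting the user narrow the hits by entity or topic.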