The Need for Long Term Preservation of Weblogs: the BlogForever Project


Ilias Trochidis
Aristotle University of Thessaloniki
Greece




Workshop 7c, 18 October 2012               eChallenges e-2012   Copyright 2012 BlogForever
State of the Blogosphere

    • Blogs have become well established as an online
      communication and web publishing tool.
    • Hundreds of millions of blogs are published on every
      conceivable subject.




[Figure: Number of new blogs created on a weekly basis in WordPress.com]




The problem of Blog Preservation

     • Despite the fast growth of the blogosphere, there is still no
       effective solution for ubiquitous semantic weblog
       archiving, digital preservation, management and
       dissemination:
             – Current web preservation initiatives are geared towards aggregating
               and preserving HTML pages rather than information entities (posts,
               comments, authors, metadata, dates, pingbacks, etc.)
             – Current web archiving efforts disregard the preservation of social
               networks and the interrelations between archived content (the meme
               effect)
             – Current web archives are monolithic and cannot identify topics,
               subjects or events. There is no generic web archiving solution
               capable of implementing arbitrary subjects and topic hierarchies.
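To make the contrast between "HTML pages" and "information entities" concrete, here is a minimal sketch of what an entity-level record for one archived post might look like. The class and field names (`BlogPost`, `Comment`, `pingbacks`, `comment_authors`) are hypothetical illustrations, not BlogForever's actual data model, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Comment:
    """A reader comment kept as its own entity instead of as markup inside a page."""
    author: str
    posted: datetime
    text: str


@dataclass
class BlogPost:
    """Hypothetical entity model for one archived blog post."""
    url: str
    title: str
    author: str
    published: datetime
    body: str
    tags: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)
    # URLs of other posts that sent pingbacks to this one; keeping them
    # preserves the interrelations that page-level archives lose.
    pingbacks: List[str] = field(default_factory=list)

    def comment_authors(self) -> List[str]:
        """Distinct comment authors, in order of first appearance."""
        seen: List[str] = []
        for c in self.comments:
            if c.author not in seen:
                seen.append(c.author)
        return seen


post = BlogPost(
    url="https://example.org/2012/10/iraq-report",
    title="Field report",
    author="alice",
    published=datetime(2012, 10, 18, 9, 30),
    body="...",
    tags=["iraq", "war"],
    comments=[
        Comment("bob", datetime(2012, 10, 18, 10, 0), "Thanks"),
        Comment("carol", datetime(2012, 10, 18, 11, 0), "Source?"),
        Comment("bob", datetime(2012, 10, 18, 12, 0), "Follow-up"),
    ],
    pingbacks=["https://example.net/2012/10/reply"],
)
print(post.comment_authors())  # ['bob', 'carol']
```

Because comments, tags and pingbacks are fields rather than fragments of stored HTML, an archive built this way could query them directly (for example, listing every commenter) without re-parsing pages.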




The disappearing web




http://gigaom.com/2012/09/19/the-disappearing-web-information-decay-is-eating-away-our-history/


Blog archiving evaluation

     • Example: the paper “Blogs of War: Weblogs as News”
       documented 29 blogs on the Iraq war.
     • Of those 29 blogs, as of June 2012:
        – 13 (45%) no longer exist on the Internet,
        – only 9 (31%) still contain information on the Iraq
           war,
        – 12 of the 20 blogs (60%) that no longer exist or no longer cover
           the Iraq war were preserved by the Internet Archive (with problems
           such as missing photos and comments that were not archived).

     • Blogs on major events have already been lost.
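A quick arithmetic check of the figures on this slide. The percentages are consistent with a total of 29 blogs; the denominator of 20 in the last bullet matches 29 - 9, i.e. the blogs that no longer carry their Iraq-war content, which is one reading (an assumption, not stated in the source) of how that figure was obtained. The constant names are illustrative.

```python
# Figures quoted on the slide for the 29 Iraq-war blogs in "Blogs of War: Weblogs as News".
TOTAL = 29
GONE = 13                               # no longer exist (June 2012)
STILL_ON_TOPIC = 9                      # still contain Iraq-war content
NOT_ON_TOPIC = TOTAL - STILL_ON_TOPIC   # assumed origin of the "20" on the slide
ARCHIVED = 12                           # preserved by the Internet Archive


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as on the slide."""
    return round(100 * part / whole)


print(pct(GONE, TOTAL))                           # 45
print(pct(STILL_ON_TOPIC, TOTAL))                 # 31
print(NOT_ON_TOPIC, pct(ARCHIVED, NOT_ON_TOPIC))  # 20 60
```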



BlogForever objectives




The BlogForever architecture




Impact

      • Output: a simple weblog archiving solution that any user,
        user group or institution could use to preserve their
        collections of weblogs, ensuring:
             – authenticity, integrity, completeness, usability and long-term
               accessibility
      • Parties that will benefit: Bloggers, Universities, Libraries &
        Information Centres, Museums, Education, Research,
        Business
      • Examples:
             – CERN will create a repository with all physics blogs
             – National Documentation Centre of Greece will create a repository
               with academic blogs
             – a National Library of Medicine would like to preserve a collection of
               health and medicine blogs
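Of the properties listed above, integrity is the most mechanical to check. A common digital-preservation practice (assumed here for illustration, not stated as BlogForever's method) is fixity checking: record a cryptographic digest when an item is ingested and recompute it later, so that any mismatch signals corruption or tampering. The function names `fixity` and `verify` are hypothetical.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return the SHA-256 hex digest to record at ingest time."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """True if the stored bytes still match the digest taken at ingest."""
    return fixity(data) == recorded


snapshot = "<p>Archived post body</p>".encode("utf-8")
digest = fixity(snapshot)  # stored alongside the archived item

print(verify(snapshot, digest))               # True
print(verify(snapshot + b" edited", digest))  # False
```

A digest only shows that the bytes are unchanged since ingest; authenticity additionally depends on provenance records, such as who captured the item and when.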

Business Model

     • BlogForever as a service (a single installation that
       users and institutions can use as a service)
     • BlogForever as software (open-source distribution)

     • Universities, Research Institutes, Archives, Governments and
       Blog Communities will be able to easily preserve their
       collections of weblogs
     • BlogForever will ensure the preservation, aggregation,
       management and dissemination of these collections

       • Do you need to preserve some blogs? We can set up a
                    BlogForever archive for you.

Future Work

     • Analyse blog archives to gain a better
       understanding of the content and provide new services:
             – Use Linked Open Data to link archived blog content with other web
               content
             – Apply Semantic Extension of Tags to understand tags better and
               reuse them for multiple purposes
     • In all cases, use ontologies to interpret and reason with
       information
     • Apply data mining to extract information from the
       archives and transform it into an understandable structure
       for further use
     • Brand reputation management and market sector reputation
       analysis
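One way to read "use Linked Open Data to link archived blog content with other web content" is to publish each archived post as RDF triples that point at external resources. The sketch below is a minimal illustration under that assumption, not the project's design: it serializes a post to N-Triples using the SIOC and Dublin Core Terms vocabularies and links its subject to a DBpedia resource. The post URLs are placeholders and the helper names (`TextLiteral`, `term`, `to_ntriples`) are hypothetical.

```python
from typing import List, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"


class TextLiteral(str):
    """Marks a plain string literal, as opposed to an IRI."""


Term = Union[str, TextLiteral]


def term(value: Term) -> str:
    """Render one N-Triples term: literals are quoted and escaped, IRIs are bracketed."""
    if isinstance(value, TextLiteral):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples: List[Tuple[str, str, Term]]) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


post = "https://example.org/2012/10/iraq-report"
triples = [
    (post, RDF_TYPE, SIOC + "Post"),
    (post, DCTERMS + "title", TextLiteral('Field report "day 1"')),
    (post, DCTERMS + "subject", "http://dbpedia.org/resource/Iraq_War"),
    (post, SIOC + "links_to", "https://example.net/2012/10/reply"),
]
print(to_ntriples(triples))
```

The DBpedia link is what connects the archived post to the wider web of data; a production system would more likely use an RDF library than hand-written serialization.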

Thank you!




                                      Any Questions?

         Visit: http://blogforever.eu to learn more.
               http://twitter.com/blogforever
             http://facebook.com/BlogForever
     The research leading to these results has received funding from the European Commission Framework Programme 7
                                     (FP7), BlogForever project, grant agreement No.269963.



Editor's Notes

  1. Blogger is the largest of these sites, with more than 46 million unique U.S. visitors during October 2011, making it second only to Facebook in the social networking category. Tumblr.com counts 38,884,272 blogs in total, with 53,399,798 posts on 29 December, while in July 2009 the number of posts per day was 650,000. Facebook and microblogging sites such as Twitter have supported the growth of blogs by delivering traffic to content that originated in blogs.
  2. 1. Current web preservation initiatives are geared towards aggregating and preserving files rather than information entities. For instance, the Internet Archive aggregates web pages and stores them in WARC files (ISO 28500:2009), compressed files similar to ZIP archives, which are assigned a unique identification number and stored in a distributed file system. WARC also supports some metadata, such as provenance and HTTP protocol metadata. However, implicit page elements such as
        · page title, headers, content and author information,
        · metadata such as Dublin Core elements,
        · RSS feeds and other Semantic Web technologies such as Microformats (Khare R.) and Microdata (Ronallo J.)
     are completely ignored. This greatly affects the way stored information is managed, reducing the utility of the archive and hindering the creation of added-value services.
     2. Current web archiving efforts disregard the preservation of social networks and of the interrelations between archived content. However, weblog interdependencies, demonstrated by the identification of central actors and peripheral weblogs as well as by the meme effect that applies to them, need to be preserved in order to provide meaningful features in the weblog repository.
     3. The scope of current web archives is limited to monolithic regions, subjects or events. There is no generic web archiving solution capable of implementing arbitrary subjects and topic hierarchies. For instance, the National Library of Catalonia has initiated a web crawling and access project that aims to collect, process and provide permanent access to the entire cultural, scientific and general output of Catalonia in digital format (PADICAT). Alternatively, the Library of Congress has developed online collections for isolated historical events such as September 11, 2001 (Library of Congress).
     There is an ongoing debate about the benefits and disadvantages of different long-term preservation methodologies. Many papers have been written and many conferences dedicated to this issue have appeared. It is surprising, however, how little has been done at the practical level.
  3. Mention the advantages of the archives
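The notes describe the Internet Archive storing pages in WARC files (ISO 28500:2009). As a rough illustration of why such records capture pages rather than blog entities, here is a minimal sketch, assuming the WARC/1.0 record layout, that wraps one raw HTTP response in a `response` record. It writes only the mandatory fields plus the target URI and content type, and ignores compression; the function name `warc_response_record` and the URL are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Build a single WARC/1.0 'response' record around a raw HTTP response."""
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    # Each header line ends with CRLF, and an extra CRLF closes the header block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # The content block is followed by two CRLFs that terminate the record.
    return head + http_payload + b"\r\n\r\n"


payload = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
           b"<html><title>Field report</title></html>")
record = warc_response_record("https://example.org/2012/10/iraq-report", payload)
print(record.split(b"\r\n")[0])  # b'WARC/1.0'
```

All the record knows about the page is its URI, a timestamp and the raw bytes; the title, author and comments stay buried in the HTML payload, which is the limitation the first note points out.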