Index   Introduction   Folksonomies   Linguistic issues   Introducing Flickrpedia   Concluding Remarks




        Improving Flickr discovery through Wikipedias

                                  Federico Gobbo
                          {federico.gobbo}@uninsubria.it
                         Università degli Studi dell’Insubria
                                    Varese, Italy
                            (cc) Some rights reserved.








        1   Introduction
               Why folksonomies are interesting

        2   Folksonomies
               Why do folksonomies differ?

        3   Linguistic issues
               Augmented folksonomies through natural language

        4   Introducing Flickrpedia
               Multilingual diversity as the source of knowledge

        5   Concluding Remarks



Why folksonomies are interesting


A key question of information retrieval today




        How can we add meaningful metadata to web content, in order to
        increase the utility of information by improving the precision of
        information retrieval in search engines?






Folksonomies, a tentative answer. What are they?



        folksonomy = folks + taxonomy

        A folksonomy is made up of tags or labels, usually single-word
        metadata attached to online items (documents, photos, videos,
        etc.), in order to add contextual meaning to the items themselves.

        Folksonomies are a tentative effort toward the goal of improving
        the precision of information retrieval.





Why do folksonomies differ?


Folksonomies and traditional taxonomies



        Unlike traditional taxonomies, there is no explicit hierarchy
        between tags, nor are tags exclusive. For example, the photo of a
        cat may be tagged as ‘cat’ and ‘european’ and ‘animal’, but there
        is nothing that says that all cats are animals: tags can be seen as
        common facets of the item itself (Schmitz 2006). There is no
        central authority, and this is the main reason why folksonomies are
        becoming more and more popular among web resource users.






The two different scopes of folksonomies




        Each tag has two different scopes at the same time:
              personomy, the meaning defined by the individual user
              (Quintarelli 2005);
              consensus, the socially shared meaning.
        Consensus is becoming more and more important, as the widespread
        use of tag-suggestion interfaces in web applications shows.






Folksonomies and the Long Tail (see the video!)






The key concept of serendipity


        Consensus enables serendipity: users dig through the web by
        following tags, finding new, unexpected and useful content that is
        not easily accessible via traditional search engines.

        Tags are used as filters, i.e. a query on several tags returns the
        items tagged with any of the given tags (or with all of them,
        depending on the application; Golder and Huberman 2006).
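
        The two filter behaviours described above can be sketched as
        follows. This is only an illustration: the function name, the
        `mode` parameter and the toy photo collection are assumptions,
        not part of any real folksonomy application.

```python
def filter_by_tags(items, query_tags, mode="any"):
    """Return the names of the items that match the query tags.

    mode="any": the item carries at least one of the query tags.
    mode="all": the item carries every one of the query tags.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    wanted = set(query_tags)
    matches = []
    for name, tags in items.items():
        tags = set(tags)
        if (mode == "any" and wanted & tags) or (mode == "all" and wanted <= tags):
            matches.append(name)
    return matches


# Hypothetical collection: item name -> tags freely chosen by users.
photos = {
    "photo1": ["cat", "european", "animal"],
    "photo2": ["cat", "garden"],
    "photo3": ["airplane"],
}

any_matches = filter_by_tags(photos, ["cat", "animal"], mode="any")
all_matches = filter_by_tags(photos, ["cat", "animal"], mode="all")
```

        With the "any" filter both cat photos are returned; with the
        "all" filter only the photo tagged with both ‘cat’ and ‘animal’.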

        The purpose of this paper is to improve serendipity by allowing
        people to dig through folksonomies regardless of the natural
        language(s) they master.



Augmented folksonomies through natural language


Tags as linguistic objects


        Tags are words, i.e. alphabetical strings meaningful in some
        natural language. There is no controlled language. In particular,
        the following phenomena go unrecognized:
              synonymy (different word strings, similar meaning);
              homography (identical word string, totally different meaning);
              different encoding strategies are possible (e.g.
              ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’);
              misspellings are very frequent, so standard NLP techniques
              are ruled out.
        Guy and Tonkin (2006) even advocated tag literacy education.
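
        To see why different encoding strategies defeat exact string
        matching, the sketch below renders one arbitrary date (3 March
        2008, chosen only for illustration) in the three styles listed
        above. The helper names are hypothetical.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 3 -> '3rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def date_tags(d):
    """Encode the same date as three different tag strings."""
    month = MONTHS[d.month - 1]
    return [
        f"{d.day:02d}-{d.month:02d}-{d.year}",  # numeric style
        f"{d.year}{month}{d.day}",              # run-together style
        f"{ordinal(d.day)} {month} {d.year}",   # spelled-out style
    ]


tags = date_tags(date(2008, 3, 3))
# The three strings share no exact match, so a tag search for one of
# them does not find items tagged with the others.
distinct = len(set(tags))
```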



The linguistic divide in folksonomies



        Multilingualism is an issue not yet fully explored in folksonomies.
        In fact, tags are written in a human language and users are
        inclined to write in the languages they are comfortable with.

        It is certainly desirable for a user not comfortable with English
        or another major language (in terms of presence on the web) to
        search and find tags using a search engine interface in his or her
        own language, while the engine searches for the corresponding tags
        in English and in other major human languages.




Multilingual diversity as the source of knowledge


How to overcome the linguistic divide?




        A proposal: a special web application that extracts the
        language-tag pairs in every available language before passing the
        tags to the folksonomy search engine.

        The claim is an improvement in serendipity: when searching in 20
        natural languages at the same time, some interesting data will be
        found that a single-language search would leave undiscovered.
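
        The claim can be illustrated with a toy model. The tag index and
        the photo identifiers below are invented for the example; in a
        real system the lookups would go to the folksonomy search engine.

```python
def multilingual_search(index, translations):
    """Union of the items tagged with any translation of the query."""
    found = set()
    for tag in translations.values():
        found |= index.get(tag.lower(), set())
    return found


# Hypothetical index: tag -> identifiers of the photos carrying it.
index = {
    "flugzeug": {"p1", "p2"},
    "airplane": {"p2", "p3", "p4"},
    "avion": {"p5"},
}

# Language-tag pairs for one topic.
translations = {"de": "Flugzeug", "en": "Airplane", "fr": "Avion"}

single = index["flugzeug"]
multi = multilingual_search(index, translations)
extra = multi - single  # photos a German-only search would miss
```

        The multilingual result is always a superset of the
        single-language one; whether the extra items are relevant is a
        separate question.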






Flickr and its API



        Flickr is one of the most popular web applications for photos (a
        search for ‘flowers’ currently returns more than 2 million photos).
        Photos are freely tagged by users, so it can be considered a
        folksonomy.
        Open source API libraries are available for the major programming
        languages, and anyone can query the Flickr repository using an
        authentication key issued on request.
        http://www.flickr.com/services/api
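
        As a rough sketch of what such a query looks like, the code
        below builds (but does not send) a request URL for the
        `flickr.photos.search` method of the Flickr REST API. The
        endpoint and parameter names follow the current public API
        documentation rather than these slides, and the key is a
        placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# REST endpoint as documented by Flickr today (an assumption with
# respect to the slides, which only give the documentation URL).
FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, tags, tag_mode="any", per_page=60):
    """Build a flickr.photos.search URL for a list of tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": ",".join(tags),   # comma-delimited list of tags
        "tag_mode": tag_mode,     # "any" (OR) or "all" (AND)
        "per_page": per_page,     # page size, illustrative value
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", ["flowers"])
query = parse_qs(urlsplit(url).query)
```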




Flickrpedia = Flickr + Wikipedias




        Flickrpedia is built on a Ruby API library and on the Ruby on
        Rails development framework (Thomas 2005, Thomas and
        Heinemeier-Hansson 2005). Users query Flickr by entering a tag
        and specifying its natural language.

        The system crawls the Wikipedia in the corresponding language
        and looks for an appropriate page. With the help of regular
        expressions, Flickrpedia parses the web page and extracts, from
        the page’s box of links to other languages, the names of the same
        topic in those languages.
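
        The extraction step can be sketched with a regular expression.
        Flickrpedia itself is written in Ruby and its actual patterns
        are not shown in these slides; the Python code below, the
        pattern, and the simplified HTML sample are all assumptions made
        for illustration.

```python
import re
from urllib.parse import unquote

# Matches one entry of the language box: language code and page title.
INTERWIKI = re.compile(
    r'<li class="interwiki-(?P<lang>[a-z-]+)">\s*'
    r'<a href="[^"]*/wiki/(?P<title>[^"]+)"'
)


def extract_language_pairs(html):
    """Return {language code: page title} found in the language box."""
    pairs = {}
    for match in INTERWIKI.finditer(html):
        title = unquote(match.group("title")).replace("_", " ")
        pairs[match.group("lang")] = title
    return pairs


# Simplified, hypothetical fragment of the German page for 'Flugzeug'.
SAMPLE = '''
<div id="p-lang"><ul>
<li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
<li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
<li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
</ul></div>
'''

pairs = extract_language_pairs(SAMPLE)
```

        The resulting titles can then be passed to Flickr as tags.
        Regular expressions are brittle against markup changes, so a
        production system would more likely rely on an HTML parser or on
        a structured interface.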

How Flickrpedia works

        [Diagram: a German user enters the query ‘Flugzeug’ (German) in
        Flickrpedia; the system crawls, parsing with the help of regular
        expressions, and obtains ‘Airplane’ (English), ‘Avion’ (French),
        ‘Hegazkin’ (Basque), ...; the German user obtains the desired
        photos from Flickr.]

The web page box for “alternate languages” in Wikipedia
An example: the German word ‘Flugzeug’


The results for the German word ‘Flugzeug’




        As of 11 April 2007, Flickr finds fewer than 10,000 photos while
        Flickrpedia finds more than 20,000 for the same query, including
        many unexpected and relevant photos.
Don’t take my word for it: try it yourself!
Word searched: ‘Flugzeug’, i.e. airplane in German




     http://buffy.sciva.uninsubria.it/~rl608838/search




Flickrpedia so far




        Flickrpedia only has to store the wikipedias for the existing
        natural languages (currently 85). Large and extemporaneous
        shared information repositories, like Flickr, can be managed
        through other semi-structured information repositories such as
        the wikipedias.

        Flickrpedia, if refined beyond its current prototype phase, may
        help users with poor knowledge of major languages to retrieve
        information using only their lesser-used languages.





Further directions for Flickrpedia


        Flickrpedia is far from perfect: homographs are still not handled,
        even though wikipedias have disambiguation pages, and it is not
        clear which wikipedias to choose in order to optimize serendipity.

        For now the parsed wikipedias are the biggest ones in terms of
        wiki pages, but this gives no guarantee of increased serendipity.

        Finally, the Flickr API imposes severe limits: at most 20 tags can
        be included in a single query request, and at most 60 thumbnails
        are returned.
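
        One way to live with the 20-tag limit is to split the translated
        tags into batches and issue one query per batch. The slides do
        not say whether Flickrpedia does this; the helper below is only
        a sketch of the idea, with invented tag names.

```python
def batch_tags(tags, max_per_query=20):
    """Split a list of tags into consecutive batches of bounded size."""
    if max_per_query < 1:
        raise ValueError("max_per_query must be at least 1")
    return [
        tags[i:i + max_per_query]
        for i in range(0, len(tags), max_per_query)
    ]


# Hypothetical case: 45 translations of one query term.
tags = [f"tag{i}" for i in range(45)]
batches = batch_tags(tags)
sizes = [len(batch) for batch in batches]
```

        Here 45 tags yield three queries of 20, 20 and 5 tags. The
        results of the separate queries would still have to be merged
        and deduplicated on the client side.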






Beyond Flickrpedia


        This approach isn’t limited to Flickr as the underlying folksonomy.
        Our research direction is towards generalization, i.e. users can
        choose the appropriate folksonomy when performing multilingual
        queries.

        It remains to be shown how to apply this approach to folksonomies
        whose semantic referents are not photos: an airplane or a flower
        is still an airplane or a flower in almost every human language,
        more or less.

        The real underlying problem is how to measure serendipity, i.e.
        specific and precise metrics for serendipity are needed.






Thank you. Any questions?




                 Download these slides at the following permalink:

                           http://purl.org/net/fgobbo

                                    (cc) F. Gobbo 2007. Published in Italy.
                       Attribution-NonCommercial-ShareAlike 2.5


Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Kürzlich hochgeladen (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Improving Flickr discovery through Wikipedias

  • 1. Improving Flickr discovery through Wikipedias. Federico Gobbo, {federico.gobbo}@uninsubria.it, Università degli Studi dell’Insubria, Varese, Italy. (cc) Some rights reserved.
  • 2. Outline: 1. Introduction (Why folksonomies are interesting); 2. Folksonomies (Why folksonomies differ?); 3. Linguistic issues (Augmented folksonomies through natural language); 4. Introducing Flickrpedia (Multilingual diversity as the source of knowledge); 5. Concluding Remarks.
  • 3. Why folksonomies are interesting. A key question of information retrieval today: how can meaningful metadata be added to web content, in order to increase the utility of information by improving the precision of information retrieval for search engines?
  • 4. Folksonomies, a tentative answer. What are they? folksonomy = folks + taxonomy. A folksonomy is made of tags or labels, usually single-word metadata attached to online items (documents, photos, videos, etc.) in order to add contextual meaning to the items themselves. Folksonomies are a tentative effort toward the goal of improving the precision of information retrieval.
  • 5. Why folksonomies differ? Folksonomies and traditional taxonomies. Unlike traditional taxonomies, there is no explicit hierarchy between tags, nor are tags exclusive. For example, the photo of a cat may be tagged as ‘cat’, ‘european’ and ‘animal’, but nothing says that all cats are animals: tags can be seen as common facets of the item itself (Schmitz 2006). There is no central authority, and this is the main reason why folksonomies are becoming more and more popular among web resource users.
  • 6. The two different scopes of folksonomies. Each tag has two different scopes at the same time: personimy, the meaning defined by the individual user (Quintarelli 2005); consensus, the socially shared meaning. Consensus is becoming more and more important, as the wide use of tag suggestion interfaces in web applications suggests.
  • 7. Folksonomies and the Long Tail (see the video!)
  • 8. The key concept of serendipity. Consensus permits serendipity, i.e. users dig through the web via tags, finding new, unexpected and useful content that is not easily accessible via traditional search engines. Tags are used as filters, i.e. a query on several tags returns the items tagged with any of the given tags (or with all of them, depending on the application) (Golder and Huberman 2006). The purpose of this paper is to improve serendipity by allowing people to dig through folksonomies regardless of the natural language(s) they master.
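The "any" versus "all" filter semantics described in slide 8 can be sketched in a few lines of Python. This is only an illustration, not code from Flickrpedia: the function `filter_by_tags`, the `photos` dictionary and its item ids are hypothetical names introduced here.

```python
from typing import Dict, Iterable, List, Set


def filter_by_tags(items: Dict[str, Set[str]],
                   query: Iterable[str],
                   mode: str = "any") -> List[str]:
    """Return the sorted ids of the items that match the query tags.

    mode="any": the item carries at least one of the query tags (OR).
    mode="all": the item carries every query tag (AND).
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    wanted = {tag.lower() for tag in query}
    matches = []
    for item_id, tags in items.items():
        have = {tag.lower() for tag in tags}
        # Set intersection implements OR, subset test implements AND.
        hit = bool(wanted & have) if mode == "any" else wanted <= have
        if hit:
            matches.append(item_id)
    return sorted(matches)


# Hypothetical toy folksonomy: item id -> set of free-form tags.
photos = {
    "p1": {"cat", "european", "animal"},
    "p2": {"cat", "sofa"},
    "p3": {"airplane"},
}

any_hits = filter_by_tags(photos, ["cat", "animal"], mode="any")
all_hits = filter_by_tags(photos, ["cat", "animal"], mode="all")
```

With the toy data above, the "any" query matches both cat photos, while the "all" query keeps only the photo that carries both tags. A multilingual query of the kind proposed in these slides would naturally use the "any" mode, since a photo is rarely tagged in every language at once.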
  • 9. Augmented folksonomies through natural language. Tags as linguistic objects. Tags are words, i.e. alphabetical strings meaningful in some natural language. There is no controlled language. In particular, the following features go unrecognized: synonymy (different word strings, analogous meaning); homography (identical word string, totally different meaning); different encoding strategies (e.g. ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’); very frequent misspellings, which rule out standard NLP techniques. Guy and Tonkin (2006) even advocated tag literacy education.
  • 10. The linguistic divide in folksonomies. Multilingualism is an issue not yet fully explored in folksonomies. Tags are written in a human language, and users are inclined to write in the languages they are comfortable with. It is certainly desirable for a user who is not comfortable with English or another big language (in terms of presence on the web) to search for tags through a search engine interface in his or her own language, while the engine searches for the corresponding tags in English and in other major human languages.
  • 11. Multilingual diversity as the source of knowledge. How to overcome the linguistic divide? A proposal: a special web application that extracts the language-tag pairs in every available language before passing the tags to the folksonomy search engine. The claim is an improvement in serendipity: when searching in 20 natural languages at the same time, some interesting data will be found that would stay undiscovered in a single-language search.
  • 12. Flickr and its API. Flickr is one of the most popular web applications for photos (a search for ‘flowers’ currently returns more than 2 million photos). Photos are freely tagged by users, so Flickr can be considered a folksonomy. Open-source API libraries are available for major programming languages, and queries to the Flickr repository are made with an authentication key issued on request. http://www.flickr.com/services/api
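As a rough illustration of what a keyed tag query looks like, the sketch below only builds a request URL and makes no network call. It assumes the REST endpoint and the `flickr.photos.search` parameter names (`api_key`, `tags`, `tag_mode`, `format`, `nojsoncallback`) as given in Flickr's public API documentation; none of these appear in the slides, and `build_search_url` and the `YOUR_API_KEY` placeholder are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed REST endpoint; check it against the current Flickr API documentation.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, tags: List[str], tag_mode: str = "any") -> str:
    """Build (but do not send) a tag search request URL."""
    if not tags:
        raise ValueError("at least one tag is required")
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,            # authentication key issued by Flickr
        "tags": ",".join(tags),        # comma-delimited tag list
        "tag_mode": tag_mode,          # "any" = OR, "all" = AND
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", ["Flugzeug", "Airplane", "Avion"])
query = parse_qs(urlparse(url).query)
```

Sending the URL with any HTTP client and a real key would return a page of photo records; that step is left out here because it depends on credentials.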
  • 13. Flickrpedia = Flickr + Wikipedias. Flickrpedia is built on a Ruby API library and on the Ruby on Rails development framework (Thomas 2005, Thomas and Heinemeier-Hansson 2005). Users query Flickr by entering a tag and specifying its natural language. The system crawls the Wikipedia in the corresponding language and looks for an appropriate page. With the help of regular expressions, Flickrpedia parses that page and extracts, from the page's language box, the names of the same topic in other languages.
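The regular-expression step can be sketched as follows. This is a Python illustration, not the Ruby code of the prototype, and it rests on assumptions: `SAMPLE_HTML` is a simplified, hand-written snippet modeled on the interlanguage box of a Wikipedia page (real markup differs and changes over time), and `LINK_RE` and `extract_language_pairs` are hypothetical names.

```python
import re
from typing import Dict
from urllib.parse import unquote

# Simplified stand-in for the language box of the German page 'Flugzeug'.
SAMPLE_HTML = """
<div id="p-lang">
  <ul>
    <li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
    <li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
    <li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
  </ul>
</div>
"""

# Group 1: language code taken from the CSS class; group 2: page title from the URL.
LINK_RE = re.compile(
    r'<li class="[^"]*\binterwiki-([a-z-]+)[^"]*">\s*'
    r'<a href="[^"]*/wiki/([^"#]+)"'
)


def extract_language_pairs(html: str) -> Dict[str, str]:
    """Map each language code to the title of the same topic in that language."""
    pairs = {}
    for lang, title in LINK_RE.findall(html):
        # Titles in URLs are percent-encoded and use underscores for spaces.
        pairs[lang] = unquote(title).replace("_", " ")
    return pairs


pairs = extract_language_pairs(SAMPLE_HTML)
```

The extracted titles are then the tags to send to the folksonomy search engine. Parsing HTML with regular expressions is brittle; a present-day variant might instead ask the MediaWiki API for a page's `langlinks`, which is a different technique from the one described in the slides.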
  • 14. How Flickrpedia works: (1) a German user enters the query ‘Flugzeug’ (German) in Flickrpedia; (2) the system crawls the Wikipedia, parsing the page with the help of regular expressions; (3) the parsing yields Airplane (English), Avion (French), Hegazkin (Basque), ...; (4) the German user obtains the desired photos from Flickr!
  • 15. The web page box for “alternate languages” in Wikipedia. An example: the German word ‘Flugzeug’.
  • 16. The results for the German word ‘Flugzeug’. On 11 April 2007, Flickr finds fewer than 10,000 photos, while Flickrpedia finds more than 20,000 for the same query, including many unexpected and relevant photos.
  • 17. Don’t trust me: try it yourself! Word searched: ‘Flugzeug’, i.e. airplane in German. http://buffy.sciva.uninsubria.it/~rl608838/search
  • 18. Flickrpedia until now. Flickrpedia should only store the wikipedias according to the existing natural languages (currently 85). Large and extemporaneous shared information repositories, like Flickr, can be managed through other semi-structured information repositories such as the wikipedias. Flickrpedia, if refined beyond its current prototype phase, may help users with poor knowledge of major languages to retrieve information using only their lesser-used languages.
  • 19. Further directions for Flickrpedia. Flickrpedia is far from perfect: homographs are still unhandled, even though wikipedias have disambiguation pages, and it is not clear which wikipedias to choose in order to optimize serendipity. For now the parsed wikipedias are the biggest ones in terms of wiki pages, but this gives no guarantee of increased serendipity. Finally, the API provided by Flickr is a severe limit: up to 20 tags can be inserted in a single query request, and up to 60 thumbnails may be returned.
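One way to live with the 20-tag limit reported above is to split the translated tags into several requests. The sketch below is an assumption about how this could be done, not a description of the prototype: `batch_tags` and `MAX_TAGS_PER_QUERY` are hypothetical names, and the figure of 20 is taken from the slide rather than checked against the API.

```python
from typing import Iterator, List, Sequence

MAX_TAGS_PER_QUERY = 20  # per-request tag limit as reported in the slides


def batch_tags(tags: Sequence[str],
               size: int = MAX_TAGS_PER_QUERY) -> Iterator[List[str]]:
    """Yield duplicate-free batches of at most `size` tags, preserving order."""
    seen = set()
    unique = []
    for tag in tags:
        key = tag.casefold()  # 'Avion' (French) and 'avion' count as one tag
        if key not in seen:
            seen.add(key)
            unique.append(tag)
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# 45 hypothetical translations need three requests: 20 + 20 + 5 tags.
translations = [f"tag{i}" for i in range(45)]
batch_sizes = [len(batch) for batch in batch_tags(translations)]
deduplicated = list(batch_tags(["Avion", "avion", "Airplane"]))
```

Removing duplicates first matters because several wikipedias often share the same title for a topic, so fewer requests are needed than the number of languages suggests. Merging the result pages of several requests is a separate problem that this sketch does not address.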
  • 20. Beyond Flickrpedia. This approach isn’t limited to Flickr as the underlying folksonomy. Our research direction is towards generalization, i.e. users can choose the appropriate folksonomy on which to perform multilingual queries. It remains to be demonstrated how to apply this approach to folksonomies whose semantic references are different from photos: an airplane or a flower is still so in almost every human language, more or less. The real underlying problem is how to measure serendipity: specific and precise metrics for serendipity are needed.
  • 21. Thank you. Any questions? Download these slides at the following permalink: http://purl.org/net/fgobbo (cc) F. Gobbo 2007. Published in Italy. Attribution – NonCommercial – ShareAlike 2.5