Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones
Mauro Cherubini
This document summarizes a research study that compared text and speech input modalities for tagging photos on camera phones. The study tested three hypotheses: 1) speech is preferred over text for tagging, 2) the advantage of speech increases with longer tags, and 3) text is faster than speech for retrieving photos. A user study was conducted with three conditions: speech only, text only, and both modalities available. Results showed speech was not clearly better than text for tagging or retrieving photos. The implications are that systems should support multiple input modalities, enable reviewing audio tags, and allow combining modalities to address their separate strengths and weaknesses.
Students from Edremit, Turkey, held a big festival at their school last week. They shared photos from their presentation of the festival in a document. Friends from Puerto Rico congratulated the students of Edremit.
This document discusses multimodal scholarship and the digital humanities. It describes different approaches to digital humanities work, from encoding texts to more experimental uses of multimedia. It addresses issues of preserving traditional scholarship digitally, analyzing large datasets, and creating work that explores new forms of expression. The document also provides examples of the author's own multimodal projects and editorial experience in digital humanities publications.
The document discusses an enhanced interactive textbook that explores the 1939 New York World's Fair through four threaded narratives: Context, Chronotope, Specters, and Machines. It provides 3D models, maps, recordings and other digital artifacts to simulate experiences from the fair and situate it within the development of technology. It also references concepts from literature and other scholars to frame discussions around the fair and ideas of machines, automation, and human-computer interaction.
Understanding Near-Duplicate Videos: A User-Centric Approach
Mauro Cherubini
Popular content on video-sharing websites (e.g., YouTube) is usually duplicated. Most scholars define near-duplicate video clips (NDVC) based on non-semantic features (e.g., different image/audio quality), while a few also include semantic features (different videos of similar content). However, it is unclear what features contribute to the human perception of similar videos. Findings of two large-scale online surveys (N = 1003) confirm the relevance of both types of features. While some of our findings confirm the adopted definitions of NDVC, other findings are surprising. For example, videos that vary in visual content (by overlaying or inserting additional information) may not be perceived as near-duplicate versions of the original videos. Conversely, two different videos with distinct sounds, people, and scenarios were considered to be NDVC because they shared the same semantics (none of the pairs had additional information). Furthermore, the exact role played by semantics in relation to the features that make videos alike is still an open question. In most cases, participants preferred to see only one of the NDVC in the search results of a video search query, and they were more tolerant of changes in the audio than in the video tracks. Finally, we propose a user-centric NDVC definition and present implications for how duplicate content should be dealt with by video-sharing websites.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like depression and anxiety.