On June 10-11, 2010, the University of Maryland hosted a Workshop on Crowdsourcing and Translation. The workshop was originally put together in connection with a project on collaborative translation; its goal was to bring together people whose work is helping to define this new and exciting area, and to create an opportunity for discussions that will help define its future directions.
Bederson and Resnik, June 2010
1. Translation as a Collaborative Activity. Benjamin B. Bederson & Philip Resnik. Computer Science Department, Human-Computer Interaction Lab, Department of Linguistics, Computational Linguistics and Information Processing Lab, University of Maryland.
4. A real-world problem: ICDL. Now: 4,386 books; 54 languages; some translations in a few languages; 49,420 adults and 29,102 children registered; ~100K unique visitors/month. Goal: 10,000 books; 100 languages; every book in every language!
5. Machine Translation (MT) (餐厅 = restaurant, dining hall): large volume, cheap, fast; unreliable quality.
9. Translation with the Crowd: Translate with the Monolingual Crowd vs. 75,000 contributors Wikipedia: 800 translators
10. [Chart: translation methods compared on affordability and quality: machine translation, monolingual human participation, amateur bilingual human participation, professional bilingual human participation.]
11. Monolingual translation protocol: original source sentence F0 → MT → noisy target hypothesis E0 → monolingual post-editing → fluent target hypothesis E1 → MT → noisy back translation F1 → HTER editing → fluent, accurate F2 → MT → E2 → et cetera…
12. Monolingual translation protocol. Each participant is performing a monolingual task: infer the partner’s intended meaning as well as possible, and express that meaning grammatically in their own language. The source language participant has an extra constraint: the expressed meaning must match the source sentence. Conflict? Original meaning wins.
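The back-and-forth described on these two slides can be sketched as a loop. This is only an illustration under stated assumptions: `run_protocol`, its callback parameters, and the toy stand-ins for MT and for the two human participants are all hypothetical, and the stopping rule (stop when the source-side participant leaves the back translation unchanged) is one plausible reading of the slides rather than the authors' implementation.

```python
from typing import Callable, List, Tuple


def run_protocol(
    source: str,
    mt_forward: Callable[[str], str],
    mt_backward: Callable[[str], str],
    target_edit: Callable[[str], str],
    source_edit: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Alternate MT with monolingual editing on each side (hypothetical sketch)."""
    history: List[Tuple[str, str, str]] = []
    current = source
    fluent_target = ""
    for _ in range(max_rounds):
        noisy_target = mt_forward(current)          # F_i -> noisy target hypothesis
        fluent_target = target_edit(noisy_target)   # target-side monolingual post-edit
        back = mt_backward(fluent_target)           # noisy back translation
        history.append((current, fluent_target, back))
        # The source-side participant rewrites the back translation so that it
        # expresses the ORIGINAL meaning (on conflict, the original meaning wins).
        revised = source_edit(source, back)
        if revised == back:
            break  # nothing needed fixing, so the meaning survived the round trip
        current = revised
    return fluent_target, history


# Toy stand-ins for MT and the two participants (assumptions, not real systems).
FR = "Tout le monde doit entendre l'histoire de Cendrillon."
NOISY_EN = "Everybody must to hear story about Cinderella"
FLUENT_EN = "Everybody must hear the story about Cinderella"

result, history = run_protocol(
    FR,
    mt_forward=lambda s: NOISY_EN,
    mt_backward=lambda s: FR,
    target_edit=lambda s: FLUENT_EN if s == NOISY_EN else s,
    source_edit=lambda original, back: back if back == original else original,
)
```

In a real deployment the four callbacks would be an MT service and two user-interface steps; keeping them as parameters makes the control flow of the protocol visible without committing to any particular system.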
13. Three Types of Errors: detectable and correctable. Source: Tout le monde doit entendre l'histoire de Cendrillon. MT: Everybody must to hear story about Cinderella. Monolingual post-editing: Everybody must hear the story about Cinderella.
14. Three Types of Errors: detectable but not correctable. Source: Tout le monde doit entendre l'histoire de Cendrillon. MT: Everybody must heard the business by Cinderella. Monolingual post-editing: ?
15. Three Types of Errors: not detectable. Source: Tout le monde doit entendre l'histoire de Cendrillon. MT: Everybody loves the story about Cinderella.
16. Enrich Translations: increase redundancy and shared context … to help make detectable errors correctable … to help make undetectable errors detectable.
25. Preliminary validation of the protocol. Language pair: Russian to Chinese (hard case: no orthographic cues; easy to find local volunteers). Two Russian speakers and four Chinese speakers; four Russian-Chinese translation pairs (Russians twice); one hour per pair. Worked on 44 sentences (6 pages), finished 28 = ~8.5 minutes per sentence (~1 word per minute). N.B. average translators: ~2500 words per 8hr day (= ~5 words per minute).
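The throughput figures on this slide can be checked with a few lines of arithmetic. The sketch below takes the session counts from the slide and the word and character counts (213 Russian words, 410 Chinese characters) from the editor's notes at the end of this transcript; the variable names are illustrative only.

```python
# Figures from the slide and the editor's notes.
pairs = 4
minutes_per_pair = 60
finished_sentences = 28
russian_words = 213
chinese_characters = 410

total_minutes = pairs * minutes_per_pair                  # four one-hour sessions
minutes_per_sentence = total_minutes / finished_sentences
words_per_minute = russian_words / total_minutes
characters_per_hour = chinese_characters / (total_minutes / 60)

# Comparison point from the slide: ~2500 words per 8-hour day.
professional_wpm = 2500 / (8 * 60)
slowdown = professional_wpm / words_per_minute

print(f"{minutes_per_sentence:.2f} minutes per sentence")
print(f"{words_per_minute:.2f} words per minute (protocol)")
print(f"{characters_per_hour:.1f} Chinese characters per hour")
print(f"{professional_wpm:.2f} words per minute (professional)")
print(f"professional pace is about {slowdown:.1f}x faster")
```

The computed values (about 8.6 minutes per sentence, a little under 1 word per minute, and roughly 5.2 words per minute for a professional) agree with the rounded figures on the slide.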
28. Ishida, Lin and colleagues at Kyoto University Department of Social Informatics have independently developed a very similar back-and-forth protocol. In their protocol, there is no enrichment to increase redundancy: if the target participant cannot make sense of the whole sentence, he or she requests that the entire original sentence be rephrased.
29. Global Internet User Population Source: http://www.internetworldstats.com/stats7.htm
30. Announced a Popular Movement for the Liberation of Sudan to withdraw its candidate in the presidential elections scheduled in April this as confirmation of the leaders in the movement. اعلنت الحركة الشعبية لتحرير السودان سحب مرشحها في الانتخابات الرئاسية المقرر اجراؤها في نيسان/ابريل الجاري، حسب تاكيدات لقياديين في الحركة. Announced SPLM withdraw its candidate in the presidential elections in April by assurances to leaders in the movement. http://news.bbc.co.uk/2/hi/africa/8597996.stm
36. One more observation: the original source sentence is not the only way the intended meaning could have been expressed. Suppose this phrasing is difficult to translate correctly: a restaurant close by. Perhaps one of these alternatives can be more successful.
37. Original: Polls indicate Brown, a state senator, and Coakley, Massachusetts’ Attorney General, are locked in a virtual tie to fill the late Sen. Ted Kennedy’s Senate seat. MT: Les sondages indiquent Brown, un sénateur d’état, et Coakley, Massachusetts’ Procureur général, sont enfermés dans une cravate virtuel à remplir le regretté sénateur Ted Kennedy’s siège au Sénat. Rephrased: Polls indicate that Brown, a state senator, and Coakley, the Attorney General of Massachusetts, are locked in a virtual tie to fill the Senate seat of the Sen. Ted Kennedy, who died recently. MT: Les sondages indiquent que Brown, un sénateur d’état, et Coakley, le procureur général du Massachusetts, sont enfermés dans une cravate virtuel pourvoir le siège au Sénat de Sen. Ted Kennedy, qui est décédé récemment.
39. Automatically determining where the errors are. [Figure: the English sentence E, "the most recent probe to visit jupiter was the pluto-bound new horizons spacecraft", shown with its constituents (NP, NP, PP), is passed through MT and back, giving "The latest research visit Jupiter was the Pluto-bound new horizons spacecraft"; mismatches between the two versions are highlighted.]
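One simple way to look for such mismatches is a token-level diff between a sentence and its round-trip translation. The sketch below uses Python's standard `difflib`; it is an illustration only, not the parse-based comparison the slide appears to show, and `mismatched_spans` is a hypothetical helper name.

```python
from difflib import SequenceMatcher
from typing import List


def mismatched_spans(original: str, round_trip: str) -> List[str]:
    """Return stretches of `original` that did not survive the round trip."""
    src = original.lower().split()
    back = round_trip.lower().split()
    matcher = SequenceMatcher(None, src, back, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark source tokens with no counterpart
        # in the round-trip version.
        if tag in ("replace", "delete"):
            spans.append(" ".join(src[i1:i2]))
    return spans


original = ("the most recent probe to visit jupiter was "
            "the pluto-bound new horizons spacecraft")
round_trip = ("The latest research visit Jupiter was "
              "the Pluto-bound new horizons spacecraft")

suspect = mismatched_spans(original, round_trip)
print(suspect)
```

On the slide's example this flags the opening of the sentence as the suspect region, which is the span a human paraphraser would then be asked to rephrase. A plain token diff cannot tell a harmless synonym from a real error, which is presumably why the slide brings in syntactic structure.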
40. (Each bracketed group gives the original phrase first, then its paraphrase alternatives with the fractions shown on the slide.) the press trust of india quoted [the government minister for relief and rehabilitation kadam | kadam, the government’s relief and rehabilitation minister (2/3) | the government minister concerned with relief and rehabilitation kadam (1/3)] as revealing today that in the last week, the monsoon has started in [all of india’s states one | every one of india’s state, one (3/3) | each of India’s states one (2/3) | all states of india one (1/3)] after another, and that the financial losses and casualties have been serious in all areas. just in maharashtra, the state which includes [mumbai, india’s largest city, | india's largest city, mumbai (3/3) | the largest city in India, Mumbai, (3/3) | mumbai, the largest city of india, (3/3)] the number of people [known to have died | who died (3/3) | identified to have died (2/3) | known to have passed away (2/3)] has now reached 358. For 31% of the sentences in an English-to-Chinese experiment, at least one new version of the sentence leads to better translation. Often the gains are quite substantial.
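The paraphrase alternatives on this slide can be combined into whole-sentence variants, each of which could then be sent to MT. The following is a minimal sketch, assuming that a fraction such as (2/3) counts votes out of three judges (the slide does not say what the fractions measure); `expand_variants` and its `min_votes` parameter are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple, Union

# A segment is either fixed text or (original phrase, [(paraphrase, votes), ...]).
Segment = Union[str, Tuple[str, Sequence[Tuple[str, int]]]]


def expand_variants(segments: Sequence[Segment], min_votes: int = 2) -> List[str]:
    """Enumerate sentence variants, keeping paraphrases with enough votes."""
    options = []
    for seg in segments:
        if isinstance(seg, str):
            options.append([seg])
        else:
            original, paraphrases = seg
            kept = [text for text, votes in paraphrases if votes >= min_votes]
            options.append([original] + kept)  # the original always stays an option
    return [" ".join(choice) for choice in product(*options)]


segments = [
    "the number of people",
    ("known to have died", [
        ("who died", 3),
        ("identified to have died", 2),
        ("known to have passed away", 2),
    ]),
    "has now reached 358.",
]

strict = expand_variants(segments, min_votes=3)
lenient = expand_variants(segments, min_votes=2)
print(len(strict), len(lenient))
```

Each variant would be translated separately and the best output kept; how "best" is judged is outside this sketch. The number of variants grows multiplicatively with the number of paraphrased spans, so a vote threshold is one way to keep the set manageable.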
42. Where to from here? Larger and more formal validation of the protocol. Exploring the space of richer annotations. Reconsidering UI for ease of use, throughput, and parallel, multi-person contribution. Exploring the space of automatic and human error detection and paraphrase.
43. Collaborators and Sponsors: Chang Hu (CS Ph.D. student), Olivia Buzek (CS/Linguistics undergrad), Alex Quinn (CS Ph.D. student), Yakov Kronrod (Linguistics Ph.D. student).
Editor's notes
ICDL has 4418 books in 54 (*nothing in the middle*) languages. Our goal is to have 10,000 books translated into 100 languages, so that we have 1M book-language combinations. This means not only a lot of translation, but also translation between some uncommon language pairs, for example Croatian into Japanese. How do we do that?
15% of visitors visited 5 or more times this month. 45% of visitors visit for 3 minutes or longer. 21% of visitors look at 20 or more pages.
Some of you might say: machine translation.
How about giving translation to humans? Indeed, professional translators can provide high-quality translation, but they are slow and expensive.
Compared to bilingual people, there are many more monolingual people who could probably help.
To give a rough idea of what we could do with monolingual crowd translation, let’s look at the methods of translation in this space. … It is the scalability that I am interested in: bilingual translation doesn’t scale well.
The translation is from Russian to Chinese. The Russian version is a volunteer's translation from the "original" English book (which is itself a translation from the original story in Croatian). The 28 Russian sentences contain 213 words. The Chinese translations contain 410 characters. That is roughly 50 words per hour in Russian and 100 characters per hour in Chinese.
Shift from inaccurate to more accurate. Grade the sentences from not translated to fully translated. Look at the number of sentences with each grade.