Pre and post editing
environment for Apertium




                                   Lluís Villarejo
                           Learning Technologies
                                     March 2012



                 What is GSoC?
• It's a global program that offers student developers stipends
  to write code for various open source software projects.
• Running since 2005.

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development
  scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.



             Some participants
•   Apache Software Foundation   •   Sakai Foundation
•   Debian                       •   Mozilla
•   Facebook                     •   Inclusive Design Institute
•   Drupal                       •   The Linux Foundation
•   Creative Commons             •   The GNU Project
•   DocBook project              •   Wikimedia Foundation
•   GCC                          •   WordPress
•   GNOME
•   ...                          •   ...



                How does it work?
•   Organizations apply to take part as mentoring organizations.
•   Each organization publishes a list of potential projects and mentors.
•   Accepted organizations try to attract students' interest.
•   Students write project proposals.
•   Google funds a number of slots for each organization (5,000 + 500 USD per slot).
•   The project community decides which students are assigned to the slots.
•   Work runs from the end of May to the end of August.



               GSoC'11 statistics
• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students have participated in previous years

• 97 countries with student applicants

• 88% overall success rate



Accepted Students GSoC'11



Why participate with Apertium?
• Strategically:
   – Apertium is a strategic agent inside UOC.
   – Developing Apertium means further developing
     internationalization aids for UOC.
   – Attract and onboard new developers for Apertium.
   – Collaboration with Google's Open Source initiatives.

• Functionally:
   – Opportunity to address specific UOC needs with
     external funding.
   – Capitalize on specific user feedback on translation quality.



              The Apertium case
• 20 proposed tasks
• 17 tasks attracted interest from students [1-9]
   – The pre- and post-editing environment attracted 11 interested
     students.

• The Apertium community ranked the 17 tasks
   – The pre- and post-editing environment ranked 4th

• Google assigned 9 slots to Apertium (49,500 USD)
  – Our task was accepted, and Camille Mougey, from the
    Grenoble Institute of Technology, was selected.
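
As a quick check of the total above: assuming the "5.000 + 500 USD" quoted on the "How does it work?" slide is the funding per slot, the amount for Apertium's 9 slots can be worked out as follows. The variable names are illustrative only.

```python
# Funding per slot in USD, as quoted on the "How does it work?" slide.
# Assumption: the 5,000 + 500 figure applies to each slot.
PER_SLOT_USD = 5_000 + 500

# Number of slots Google assigned to Apertium.
APERTIUM_SLOTS = 9

total_usd = PER_SLOT_USD * APERTIUM_SLOTS
print(f"{APERTIUM_SLOTS} slots x {PER_SLOT_USD} USD = {total_usd} USD")
```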



      Pre and post-editing, why?
• A significant share of the errors you get when translating a
  document is due to deficiencies in the original.
• Integrating existing resources can help ease this burden:
   – Digital knowledge sources (digital dictionaries...)
   – Automatic tools (spell checker, grammar checker, translation
     memory generation, search & replace...)
• These processes should fit naturally into the translation
  workflow → hence the need for an integrated web interface
  to Apertium.
• To improve the system, we need access to the human
  post-editing process.



     Pre and post-editing, features
•   Pre- and post-editing web interface integrated with the Apertium translation toolbox.
•   Spell checking on source and target languages, through integration with Aspell.
•   Grammar checking on source and target languages, through integration with
    LanguageTool.
•   Integration with several external dictionaries.
•   Search & replace on source and target languages.
•   Support for formatted text.
•   Logging system. All events are logged as they happen, i.e. at the very moment
    the user inserts or deletes text. The logs can later be mined to detect
    commonly modified structures or vocabulary.
•   Translation memory generation, through integration with Maligna.
•   PDF translation through pdftohtml.
•   Image translation through Tesseract.
                                                                        Final report 2010
                                                                        Final report 2011
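
To make the logging feature more concrete, here is a minimal sketch of how insert and delete events could be recorded at the moment they happen and then exported for later mining. It is not the project's actual implementation: the `EditEvent` record layout, the `EditLog` class and the JSON Lines export are all assumptions made for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class EditEvent:
    """One logged edit (hypothetical record layout)."""
    kind: str        # "insert" or "delete"
    position: int    # character offset where the edit happened
    text: str        # the text that was inserted or deleted
    timestamp: float = field(default_factory=time.time)


class EditLog:
    """Applies edits to a text and logs each one as it happens."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.events: list[EditEvent] = []

    def insert(self, position: int, text: str) -> None:
        self.text = self.text[:position] + text + self.text[position:]
        self.events.append(EditEvent("insert", position, text))

    def delete(self, position: int, length: int) -> None:
        removed = self.text[position:position + length]
        self.text = self.text[:position] + self.text[position + length:]
        self.events.append(EditEvent("delete", position, removed))

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(
            json.dumps(asdict(event), ensure_ascii=False)
            for event in self.events
        )


# A post-editor fixes the word order of a machine-translated sentence.
session = EditLog("the house big is red")
session.delete(4, 9)             # removes "house big"
session.insert(4, "big house")
print(session.text)              # prints "the big house is red"
print(session.to_jsonl())
```

Storing the deleted text alongside its position means the log can be analysed without replaying the whole session.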



        Results & lessons learned
• Fully functional environment; goals accomplished.
• Feedback on human post-editing behaviour is available
  automatically.

•   Jointly defined task (flexible framework provided).
•   Interest in building strong empathy with the student.
•   Motivated and proactive student.
•   Student engagement.
•   Very frequent feedback.
•   Mentoring team with access to ABSOLUTELY ALL the
    information regarding the project.



                   Further work
• Proof of concept accomplished.
• Base platform in place, so further work can be added
  easily.
• Integration of other resources (more external dictionaries).
• Extension of the resources currently in use (additional
  grammar rules, improved dictionaries, wider range of
  formats).
• Mining the logged information to gain deeper knowledge of
  the human post-editing process.
• Using the results of this mining to improve the Apertium
  translation engine.
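
The slides do not say how the mining would be done. One possible approach, sketched here under that caveat, is to compare the raw machine translation with the post-edited text and count which phrases get replaced most often. The function names and the sample sentences are hypothetical; `difflib.SequenceMatcher` is from the Python standard library.

```python
from collections import Counter
from difflib import SequenceMatcher


def replaced_phrases(mt_output: str, post_edited: str) -> list[tuple[str, str]]:
    """Return (machine phrase, human phrase) pairs the post-editor replaced."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    matcher = SequenceMatcher(None, mt_words, pe_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2])))
    return pairs


def common_corrections(sessions, top=5):
    """Count the most frequent replacements over many (MT, post-edited) pairs."""
    counts = Counter()
    for mt_output, post_edited in sessions:
        counts.update(replaced_phrases(mt_output, post_edited))
    return counts.most_common(top)


# Hypothetical sample data: two sessions with the same correction.
sessions = [
    ("I have hungry today", "I am hungry today"),
    ("I have hungry", "I am hungry"),
]
print(common_corrections(sessions, top=1))
```

Corrections that recur across many sessions would be natural candidates for new dictionary entries or rules, which is the kind of improvement the last bullet points to.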



                    GSoC 2012




• Mining the logged information to gain deeper knowledge of
  the human post-editing process.
• Using the results of this mining to improve the Apertium
  translation engine.
• Post-editing of formatted text.




   Thanks
Questions & answers

Google Summer of Code 2011: UOC & Apertium