TAUS MACHINE TRANSLATION SHOWCASE
Vancouver, Canada

The Simplified Guide to Getting Started in SMT
Wednesday, 29 October 2014
Tom Hoar, Precision Translation Tools

The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
  
The Simplified Guide to Getting Started in SMT
Professional tools
Professional expertise
  
PTTools
•  Software vendor - founded Feb 2010
– Adobe : Photoshop
– PTTools : DoMT
•  DoMT brand
– DoMT Desktop: organize and manage training corpora, models and custom workflows.
– DoMT Server: automation solution
•  Customer education
  
Who We Are

AGENDA
Current State of SMT
Getting Started
Skill Requirements
Use Cases
Q&A
  
Current SMT
Current State
•  Who has not heard of SMT?
•  Requires powerful, expensive hardware
•  Huge translation memories
•  Complicated processes
•  Dearth of skilled personnel
  
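Then vs Now

|             | 2007                     | 2014               |
|-------------|--------------------------|--------------------|
| Hardware    | 50 CPUs in private cloud | One 24-CPU machine |
| Mega corpus | 2 weeks                  | 36 hours           |
| Cost        | US $100K++               | US $1,500          |

|                    | 1992                 | 2014                  |
|--------------------|----------------------|-----------------------|
| Computer           | SGI @ $100K          | Dell @ $5,000         |
| Software           | Eclipse Alias @ $25K | Adobe CS Cloud $1,500 |
| Graphic Production | $300 per hour        | $30++ per hour        |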
  
Business Models
•  Where is the work done?
•  Who does the work?
•  Outsourced
– Free
– For Fee
•  Insourced
– Enterprise Server
– Desktop Application
  
Reality 2014
•  Inexpensive capable hardware exists
•  Translation memories within reach
•  Processes migrating to software
•  Training available for existing personnel
  
  
“Simple Guide”
Is Academic Moses Enough?
“There are considerable amounts of additional functionality... that are not included in Moses that are essential in order to offer a strong and innovative commercial MT platform.”
– Philipp Koehn – Professor, University of Edinburgh
(http://kv-emptypages.blogspot.com/2013/09/understanding-mt-customization.html)

Getting Started
•  Manage Corpora
•  Manage SMT Models
•  Produce MT
•  Post-edit Results
  
Manage Corpora
•  Acquire
– Translation memory archives
– Public corpora
– Convert docs
– Recycle post-edited MT
•  Process
– Transform/filter
– Curate/categorize
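
The transform/filter step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how DoMT or Moses does it: the function name `filter_pairs` and the thresholds `max_ratio` and `max_words` are hypothetical and chosen for the example.

```python
def filter_pairs(pairs, max_ratio=3.0, max_words=80):
    """Drop empty, overlong, badly mismatched and duplicate segment pairs.

    `pairs` is an iterable of (source, target) strings. The thresholds are
    illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        n_src, n_tgt = len(source.split()), len(target.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is empty
        if n_src > max_words or n_tgt > max_words:
            continue  # segment exceeds the word limit
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # word counts differ too much
        if (source, target) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((source, target))
        kept.append((source, target))
    return kept


sample = [
    ("Save the file.", "Guarde el archivo."),
    ("Save the file.", "Guarde el archivo."),  # duplicate
    ("Cancel", ""),  # empty target
    ("OK", "Haga clic en Aceptar para continuar con la instalación"),  # ratio
]
print(filter_pairs(sample))
```

Only the first pair survives here; the other three are rejected as a duplicate, an empty target and a length mismatch.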
  
Manage SMT Models
•  Train Translation models
•  Train Language model
•  Tune SMT model
•  Evaluate SMT model
•  Deploy SMT engine
•  Versioning
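
Training, tuning and evaluation normally each use their own data, so a small tuning set and a small test set are held out of the corpus before training. A minimal sketch of such a split follows; the function name `split_corpus` and the set sizes are assumptions made for the example, not values from the presentation.

```python
import random


def split_corpus(pairs, tune_size=2, test_size=2, seed=0):
    """Shuffle segment pairs and split them into train, tune and test sets.

    The set sizes are tiny placeholders for the example; a real corpus would
    hold out far more segments. Each pair lands in exactly one set.
    """
    pairs = list(pairs)
    if len(pairs) <= tune_size + test_size:
        raise ValueError("corpus too small for the requested held-out sets")
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split repeatable
    tune = pairs[:tune_size]
    test = pairs[tune_size:tune_size + test_size]
    train = pairs[tune_size + test_size:]
    return train, tune, test


corpus = [(f"source {i}", f"target {i}") for i in range(10)]
train, tune, test = split_corpus(corpus)
print(len(train), len(tune), len(test))  # 6 2 2
```

Fixing the seed means that a retrained model version is evaluated on the same held-out segments as the previous one.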
  
Produce MT
•  Manual
– Import/export TMX
– Import/export XLIFF
– Doc-to-doc support
•  Automation
– TMS Integration
– CAT Integration
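
As an illustration of the TMX import step, the sketch below pulls source/target segment pairs out of a TMX document with Python's standard `xml.etree.ElementTree` module. The function name `read_tmx_pairs` and the sample document are hypothetical, and the code only handles the basic `tu`/`tuv`/`seg` structure.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Fall back to a plain "lang" attribute if xml:lang is absent.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(read_tmx_pairs(SAMPLE_TMX, "en", "es"))
```

Language codes are compared in lower case, so callers pass `"en"` and `"es"` rather than `"EN"` and `"ES"`.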
  
Post-edit Results
•  Subject of other presentations
•  Recycle as new corpus?
  
  
Human Resources
SMT Specialists
•  Computational linguists are scientists who specialize in language and computing to create and advance the science.
•  Specialists are localization engineers who review the data and select tools to prepare a training corpus that minimizes post-editing in commercial production.
  
Specialist’s Required Skills
•  Organization skills (e.g. manage TMs)
•  Observant of patterns
•  Willingness to learn
•  Regular expressions – helpful
•  Programming skills – unnecessary
•  Computational linguists – unnecessary
•  System administrator – unnecessary
  
Observant of Patterns
Technical patterns
Linguistic patterns

<ut>{cs6f1cf6lang1024 </ut> &lt;span class=&quot;small-text&quot;&gt; <ut>}</ut>Copyright © 1997-2009 &amp;nbsp; n  n
•  Archived TMX content
– RTF
– HTML & XML-escaped HTML
– XML
– Broken programmer’s markup
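
This is where regular expressions help. The sketch below peels the layers off a segment like the one shown above: RTF codes wrapped in TMX `<ut>` elements, XML-escaped HTML, and a double-escaped `&nbsp;`. The function name `clean_segment` and the order of the steps are assumptions for illustration. The sample string is the one from the slide without its trailing `n  n` residue.

```python
import html
import re

UT_ELEMENT = re.compile(r"<ut>.*?</ut>", re.DOTALL)  # TMX "unknown tag" wrappers
HTML_TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def clean_segment(segment):
    """Strip layered markup from one archived TMX segment."""
    text = UT_ELEMENT.sub(" ", segment)  # drop RTF codes wrapped in <ut>
    text = html.unescape(text)           # &lt;span ...&gt; becomes <span ...>
    text = HTML_TAG.sub(" ", text)       # drop the now-visible HTML tags
    text = html.unescape(text)           # second pass resolves &nbsp; left by &amp;nbsp;
    return WHITESPACE.sub(" ", text).strip()


raw = ('<ut>{cs6f1cf6lang1024 </ut> &lt;span class=&quot;small-text&quot;&gt; '
       '<ut>}</ut>Copyright © 1997-2009 &amp;nbsp;')
print(clean_segment(raw))  # Copyright © 1997-2009
```

Real archives usually need more rules than this; the point is that each rule corresponds to one pattern a specialist has noticed in the data.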
  
  
Use Cases
•  Large LSP
– Extensive MT experience
– CSA Top 10
•  2 Medium LSPs
– Post-editing experience
– In-house localization engineers
•  Freelance Translator
– United Nations contractor
– Technically savvy
  
Welocalize
•  Work: Software localization
•  Hardware: Virtual machines for pilot
•  SMT models: EN-ES, EN-DE, EN-ZH, EN-RU
•  Corpus: All corpora < 500,000 segment pairs
•  Training: 3-month pilot
•  Results: “Approached outsourcing vendors”
– Zero-edit measure: 25-45%
  
EQHO Communications
•  Work: Software localization
•  Hardware: $1,500 new 6-core computer
•  SMT model: EN <-> European language
•  Corpus: ~130,000 segment pairs
•  Training: 3-month pilot
•  Results: BLEU scores of 80 to 85
– Zero-edit measure: 23-43%
  
Mid-sized European LSP
•  Work: Financial and regulatory reports
•  SMT model: EN <-> European language
•  Corpus: ~800,000 segment pairs (25 years)
•  Training: 20 hours of tutorials over 2 months
•  Homework: Categorize TMs for 4+ months
•  Results: BLEU scores rose from low 50s to mid-80s
  
Freelance Translator
•  Work: United Nations environmental reports
•  Hardware: $1,500 new 6-core computer
•  SMT model: EN <-> European language
•  Corpus: ~250,000 segment pairs (25 years)
•  Training: 40 hours of tutorials over 2 months
•  Results: BLEU scores of 75 to 85
– Zero-edit measure: averaged 35%
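
The slides do not define the zero-edit measure reported in these use cases. Assuming it means the share of MT segments that post-editors accepted without any change, it could be computed as in the sketch below; the function name `zero_edit_rate` and the sample segments are hypothetical.

```python
def zero_edit_rate(mt_segments, post_edited_segments):
    """Percentage of MT segments that the post-editor left unchanged."""
    if len(mt_segments) != len(post_edited_segments):
        raise ValueError("segment lists must be the same length")
    if not mt_segments:
        return 0.0
    unchanged = sum(
        mt.strip() == edited.strip()
        for mt, edited in zip(mt_segments, post_edited_segments)
    )
    return 100.0 * unchanged / len(mt_segments)


mt = ["Guarde el archivo.", "Haga clic en Aceptar.",
      "Cerrar la ventana.", "Abrir archivo"]
edited = ["Guarde el archivo.", "Haga clic en Aceptar.",
          "Cierre la ventana.", "Abra el archivo"]
print(zero_edit_rate(mt, edited))  # 50.0
```

Two of the four sample segments were left untouched, hence 50.0.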
  
Conclusion
•  Regardless of business model
– Manage Corpora
– Generate Models
– Produce MT
– Publish Results
•  Re-purpose existing staff with training
•  Rightsourcing
  
