
Scientists meet Entrepreneurs - AI & Machine Learning, Mark Fishel, Institute of Computer Science, University of Tartu


  1. Natural organic non-GMO bio-degradable eco-friendly Language Processing
     or NLP yesterday, today and tomorrow
     Mark Fishel, TartuNLP
     MoMo Estonia / AI & ML, January 14, 2019
  2. Natural organic non-GMO bio-degradable eco-friendly Language Processing
     or NLP yesterday, today and tomorrow
     or NLP today, tomorrow and the day after
     Mark Fishel, TartuNLP
     MoMo Estonia / AI & ML, January 14, 2019
  3. [Diagram: AI, NLP]
  4. NLP
     ● end-user applications
       ○ translation (neurotolge.ee)
       ○ text↔speech (neurokone.ee)
       ○ text mining, information extraction (texta.ee)
       ○ chatbots
       ○ world domination, destruction of humanity
       ○ etc.
     ● components
     ● analysis, linguistics
     ● etc.
  5. Why?
  6. Why?
     ● NLP makes mistakes!
     ● in practice: semi-automation, post-editing, etc.
  7. 1. Step-by-step NLP
  8. NLP before: step-by-step
     ● solve separate steps / components
       ○ via ML, rules, etc.
       ○ one by one
     ● put them in a pipeline
       ○ for that we have to (think we) understand how it works
     ● …
     ● profit!
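The step-by-step idea can be sketched as a chain of separately built components, each consuming the previous component's output. This is a minimal sketch under stated assumptions: the three components (`split_sentences`, `tokenize`, `lowercase`) are hypothetical toy rules chosen for illustration, not TartuNLP tools.

```python
from __future__ import annotations

import re
from typing import Any, Callable, Sequence


def split_sentences(text: str) -> list[str]:
    # Rule-based component: break after ., ? or ! followed by whitespace.
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def tokenize(sentences: list[str]) -> list[list[str]]:
    # Rule-based component: words and single punctuation marks become tokens.
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


def lowercase(token_lists: list[list[str]]) -> list[list[str]]:
    # Normalization component.
    return [[t.lower() for t in tokens] for tokens in token_lists]


def run_pipeline(data: Any, steps: Sequence[Callable[[Any], Any]]) -> Any:
    # Each component's output becomes the next component's input, so a
    # mistake made early on is handed to every later step.
    for step in steps:
        data = step(data)
    return data


print(run_pipeline("Tere! Minu nimi on Juhan.", [split_sentences, tokenize, lowercase]))
# [['tere', '!'], ['minu', 'nimi', 'on', 'juhan', '.']]
```

The design choice this illustrates is that every component can be built and tested on its own, but the pipeline is only as good as its weakest link.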
  9. Statistical Translation
     ET: ?
     LV: Vai tev ir labāka ideja?
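  10. Statistical Translation
      ET: ?
      LV: Vai tev ir labāka ideja? (“Do you have a better idea?”)
      ET: Mul on parem idee.
      LV: Man ir labāka ideja. (“I have a better idea.”)
      ET: Kas sul on tõlketekste?
      LV: Vai tev ir pārtulkotu tekstu? (“Do you have translated texts?”)
      ET: Sul peaks olema palju tõlketekste.
      LV: Tev jābūt daudz pārtulkotu tekstu. (“You should have a lot of translated texts.”)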
  19. Statistical Translation (example pairs as on slide 10)
      ET: Kas
      LV: Vai tev ir labāka ideja?
  20. Statistical Translation (example pairs as on slide 10)
      ET: Kas sul
      LV: Vai tev ir labāka ideja?
  21. Statistical Translation (example pairs as on slide 10)
      ET: Kas sul on
      LV: Vai tev ir labāka ideja?
  22. Statistical Translation (example pairs as on slide 10)
      ET: Kas sul on parem idee
      LV: Vai tev ir labāka ideja?
  23. Statistical Translation (example pairs as on slide 10)
      ET: Kas sul on parem idee?
      LV: Vai tev ir labāka ideja?
  24. Statistical Translation (example pairs as on slide 10)
      Actual translation:
      ● segment input
      ● translate pieces
      ● reorder
      ● put in context
      ● …
      ET: Kas sul on parem idee?
      LV: Vai tev ir labāka ideja?
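The “segment input” and “translate pieces” steps can be illustrated with a greedy longest-match lookup in a phrase table. This is a minimal sketch, not a real statistical decoder: the Latvian→Estonian phrase table is hand-extracted from the example pairs on slide 10 (a statistical system would estimate such pairs, with probabilities, from a large parallel corpus), and the “reorder” and “put in context” steps are not modeled.

```python
from __future__ import annotations

# Hypothetical LV->ET phrase table, read off the example pairs by hand.
PHRASE_TABLE = {
    ("vai", "tev", "ir"): ("kas", "sul", "on"),
    ("man", "ir"): ("mul", "on"),
    ("labāka", "ideja"): ("parem", "idee"),
    ("pārtulkotu", "tekstu"): ("tõlketekste",),
    ("?",): ("?",),
    (".",): (".",),
}


def tokenize(sentence: str) -> list[str]:
    # Lowercase and split off sentence-final punctuation as its own token.
    s = sentence.lower().strip()
    punct = []
    if s and s[-1] in ".?!":
        punct = [s[-1]]
        s = s[:-1]
    return s.split() + punct


def translate(sentence: str, table=PHRASE_TABLE, max_len: int = 3) -> str:
    tokens = tokenize(sentence)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Segment: try the longest known phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            piece = tuple(tokens[i:i + n])
            if piece in table:
                out.extend(table[piece])  # translate the piece
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: copy it through unchanged
            i += 1
    text = " ".join(out)
    for p in ".?!":
        text = text.replace(" " + p, p)  # re-attach punctuation
    return text[:1].upper() + text[1:]


print(translate("Vai tev ir labāka ideja?"))  # Kas sul on parem idee?
```

For this sentence the pieces happen to come out in the right order, which is why the sketch can skip reordering; in general a decoder has to score alternative segmentations and orders.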
  25. Text-to-speech
      1. text to phonemes
         e.g. through → [θru], reason → ['rizən]
      2. pronunciation for phonemes (or pairs of phonemes)
         e.g. θr → [image not preserved]
      3. “glue” pieces together → speech
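The three text-to-speech steps can be sketched as lexicon lookup, splitting into phoneme pairs (diphones, like θr above), and concatenating stored sound units. A minimal sketch under stated assumptions: the two-word `LEXICON`, the silence symbol `_` and the `units` recordings (lists of sample values) are hypothetical, stress marks are dropped, and words missing from the lexicon raise `KeyError` (real systems fall back to letter-to-sound rules).

```python
from __future__ import annotations

# Hypothetical pronunciation lexicon for step 1 (stress marks dropped).
LEXICON = {
    "through": ["θ", "r", "u"],
    "reason": ["r", "i", "z", "ə", "n"],
}
SILENCE = "_"


def text_to_phonemes(text: str) -> list[str]:
    # Step 1: look every word up in the lexicon.
    phonemes: list[str] = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def to_diphones(phonemes: list[str]) -> list[tuple[str, str]]:
    # Step 2 works on pairs of neighboring phonemes, padded with silence.
    padded = [SILENCE] + phonemes + [SILENCE]
    return list(zip(padded, padded[1:]))


def synthesize(text: str, units: dict[tuple[str, str], list[float]]) -> list[float]:
    # Step 3: "glue" the stored units together by plain concatenation
    # (real systems also smooth the joins); a missing unit becomes one
    # silent sample.
    samples: list[float] = []
    for diphone in to_diphones(text_to_phonemes(text)):
        samples.extend(units.get(diphone, [0.0]))
    return samples


units = {("θ", "r"): [0.5, 0.5]}  # hypothetical recording of the pair θr
print(to_diphones(text_to_phonemes("through")))
# [('_', 'θ'), ('θ', 'r'), ('r', 'u'), ('u', '_')]
print(synthesize("through", units))  # [0.0, 0.5, 0.5, 0.0, 0.0]
```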
  26. 2. End-to-end NLP/ML
  27. NLP now: end-to-end
      ● gather input/output examples for the end-user task
        ○ (sentence text, speech)
        ○ (Estonian sentence, English sentence)
      ● teach end-to-end deep neural black magic to go from input to output
        ○ ignore how we think it should be done
      ● …
      ● profit!
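“Teaching” a network from input/output examples is usually framed as maximum-likelihood estimation: given N example pairs (x_n, y_n), such as (Estonian sentence, English sentence), the network parameters θ are tuned so that the network assigns high probability to each correct output. The slide does not name a training objective; this is the standard formulation, given as a sketch:

```latex
\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log p_{\theta}(y_n \mid x_n)
```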
  28. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      Thank you very …?
  29. Neural Translation (autoregressive, as on slide 28)
      Thank you very much. Would you like tea or …?
  30. Neural Translation (autoregressive, as on slide 28)
      Thank you very much. Would you like tea or coffee? Dear …?
  31. Neural Translation (autoregressive, as on slide 28)
      Thank you very much. Would you like tea or coffee? Dear (ladies and gentlemen / mom / …)
  32. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → …
  33. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → nad …
  34. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → nad kasutasid …
  35. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → nad kasutasid kaasaegset …
  36. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit …
  37. Neural Translation (autoregressive, as on slide 28)
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit KÕIK.
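The autoregressive idea on slides 28 to 37 corresponds to the chain-rule factorization below: the probability of a whole output y = (y_1, …, y_T) given the input x is a product of next-word probabilities, each conditioned on the input and on the words generated so far, y_{<t}. This is the standard formulation, not notation taken from the talk:

```latex
p_{\theta}(y \mid x) = \prod_{t=1}^{T} p_{\theta}(y_t \mid x, y_{<t}),
\qquad
\log p_{\theta}(y \mid x) = \sum_{t=1}^{T} \log p_{\theta}(y_t \mid x, y_{<t})
```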
  38. Neural Translation
      p(next word | x = “They used a state-of-the-art approach”, y = “nad …”)
        = neural_estimator(x, y)
        = { kasutasid: 0.67, rakendasid: 0.21, kasutavad: 0.04, … }
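Generation repeats one step: ask the estimator for a distribution over the next word, pick a word, append it to the partial output, and ask again. A minimal greedy-decoding sketch: `toy_estimator` is a hypothetical stand-in for `neural_estimator(x, y)` implemented as a lookup table. Only the three probabilities after “nad” come from the slide (its “…” remainder is left out); every other word choice, every other number and the end marker `</s>` are invented for illustration.

```python
from __future__ import annotations

END = "</s>"  # hypothetical end-of-sentence marker

# Hypothetical next-word distributions, keyed by the partial output so far.
TABLE: dict[tuple[str, ...], dict[str, float]] = {
    (): {"nad": 0.9, "nemad": 0.1},
    ("nad",): {"kasutasid": 0.67, "rakendasid": 0.21, "kasutavad": 0.04},
    ("nad", "kasutasid"): {"kaasaegset": 0.7, "uusimat": 0.3},
    ("nad", "kasutasid", "kaasaegset"): {"meetodit": 0.8, "lähenemist": 0.2},
    ("nad", "kasutasid", "kaasaegset", "meetodit"): {END: 0.95, ".": 0.05},
}


def toy_estimator(x: str, y: tuple[str, ...]) -> dict[str, float]:
    # A real model computes this from both x and y with a neural network;
    # this toy ignores x and just looks the prefix up.
    return TABLE.get(y, {END: 1.0})


def greedy_decode(x: str, max_len: int = 20) -> list[str]:
    y: tuple[str, ...] = ()
    for _ in range(max_len):
        dist = toy_estimator(x, y)
        word = max(dist, key=dist.get)  # pick the most probable next word
        if word == END:
            break
        y = y + (word,)  # the new word becomes part of the conditioning prefix
    return list(y)


print(greedy_decode("They used a state-of-the-art approach"))
# ['nad', 'kasutasid', 'kaasaegset', 'meetodit']
```

Greedy choice is the simplest option; practical systems often use beam search, which keeps several partial outputs alive instead of committing to one word at a time.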
  39. End-to-end NLP
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit KÕIK.
      Speech↔text: same/similar end-to-end approach
  40. End-to-end NLP (as on slide 39)
      NB: needs lots of explicit examples (data)
  41. 3. NLP/ML with no data
  42. NLP/ML with no explicit data
      ● explicit data is expensive and wasteful
      ● what to do for tasks without it?
  43. Unsupervised Translation
      https://aclweb.org/anthology/D18-1549.pdf
  44. Unsupervised Translation
      Learn from:
      A. (Estonian text) Tere! Minu nimi on Juhan. Kui ma eelmisel korral sellest pildist olen…
      B. (English text) We must address this question as soon as possible. Why have we not…
      Task: Translate between English and Estonian without a single translation example!
      https://aclweb.org/anthology/D18-1549.pdf
  45. Unsupervised Translation
      A. Tere! Minu nimi on Juhan. Kui ma eelmisel korral sellest pildist olen…
      B. We must address this question as soon as possible. Why have we not…
      Or: translate dog barks / kid speech
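The linked paper (D18-1549) combines several ingredients; one of them, iterative back-translation, shows how text in one language alone can yield training pairs. The current, still weak, Estonian→English model translates real Estonian sentences, and each (synthetic English, real Estonian) pair is then used to train the English→Estonian direction; repeating this in both directions improves both models. A minimal sketch of one such step: the word-by-word dictionary `ET_EN` is a hypothetical stand-in for the current model (the paper builds its starting point without parallel data; here it is simply written by hand), and punctuation is omitted from the input for simplicity.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical ET->EN word dictionary standing in for the current et->en model.
ET_EN = {"tere": "hello", "minu": "my", "nimi": "name", "on": "is"}


def word_by_word(sentence: str, lexicon: dict[str, str]) -> str:
    # Unknown words (e.g. names) are copied through unchanged.
    return " ".join(lexicon.get(w, w) for w in sentence.lower().split())


def back_translate(
    monolingual: list[str], translate: Callable[[str], str]
) -> list[tuple[str, str]]:
    # Build (synthetic source, real target) pairs: the real sentence is the
    # target side, so the en->et model is trained to produce genuine Estonian.
    return [(translate(s), s) for s in monolingual]


estonian_only = ["tere minu nimi on Juhan"]
pairs = back_translate(estonian_only, lambda s: word_by_word(s, ET_EN))
print(pairs)  # [('hello my name is juhan', 'tere minu nimi on Juhan')]
```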
  46. Zero-shot learning
      [Diagram: Estonian, English, Latvian, Swedish]
      https://arxiv.org/abs/1611.04558
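The linked paper (arXiv 1611.04558) trains one multilingual model on several language pairs and tells it the desired output language with an artificial token prepended to the input, which lets the same model attempt directions it never saw paired examples for. A sketch of that data preparation under stated assumptions: which directions count as “trained” is a hypothetical choice here (the slide's diagram did not survive), and the `<2xx>` token spelling imitates the paper's style.

```python
from __future__ import annotations

from itertools import permutations

LANGS = ["et", "en", "lv", "sv"]

# Hypothetical training directions: every language paired with English only.
TRAINED = {("et", "en"), ("en", "et"), ("lv", "en"), ("en", "lv"),
           ("sv", "en"), ("en", "sv")}


def tag_source(sentence: str, target_lang: str) -> str:
    # Prepend an artificial token naming the desired output language.
    return f"<2{target_lang}> {sentence}"


def zero_shot_directions(langs: list[str], trained: set[tuple[str, str]]) -> set[tuple[str, str]]:
    # Ordered (source, target) directions with no paired training examples.
    return set(permutations(langs, 2)) - trained


print(tag_source("Tere! Minu nimi on Juhan.", "lv"))  # <2lv> Tere! Minu nimi on Juhan.
print(sorted(zero_shot_directions(LANGS, TRAINED)))
# [('et', 'lv'), ('et', 'sv'), ('lv', 'et'), ('lv', 'sv'), ('sv', 'et'), ('sv', 'lv')]
```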
  48. Zero-shot NLP demo
      ● style transfer
        ○ “that’s weird” → “that is strange”
      ● correcting errors
        ○ “i biggest your fan” → “I am your biggest fan”
  49. Message to take home
      ● “data + task understanding” is stable
      ● “data + end-to-end neural networks” is cool and promising
      ● “no data, thing still works” is sexy!
  50. Thanks!
      neurotolge.ee
      neurokone.ee
      livesubs.ee
      nlp.cs.ut.ee
