AI applications in education, Pascal Zoleko, Flexudy

In the quest to improve the quality of education, Flexudy leverages the power of AI to help people learn more efficiently. During the talk, I will show how we trained an automatic extractive text summarizer based on concepts from Reinforcement Learning, Deep Learning and Natural Language Processing. I will also talk about how we use pre-trained NLP models to generate simple questions for self-assessment.

AI applications in education, Pascal Zoleko, Flexudy

  1. Erlangen Artificial Intelligence & Machine Learning Meetup presents
  2. AI Applications In Education
  3. Hi, I am Pascal Zoleko. My Projects: Flexudy, PR & AI For Education. Study, Work: PR & AI For Privacy, People Analytics, Artificial Intelligence & Pattern Recognition.
  4. Problems we want to solve:
       1. Too much to read.
       2. Too long to read.
       3. Abstracts are sometimes too bold.
       4. Abstracts are sometimes too vague.
       5. Abstracts are not available for all kinds of text documents (e.g. web pages).
     Some students (learners) …:
       6. Read and forget.
       7. Can't continuously evaluate their knowledge of a subject.
       8. Can't revise while on the train, bus, etc.
  5. Flexudy Education today.
       - Automatic Text Summarisation: NLP, ranking, Reinforcement Learning, rules. A simple overview, good enough to give an idea about the text.
       - Simple Question Generation: Deep Learning, fill in the blanks. Simple but useful to remember the keywords found in a text. We won't have enough time for this, but we can discuss it.
       - Demo video.
  6. Text Summarisation. Extractive: simpler, selects the relevant phrases. Abstractive: harder, can generate phrases not found in the text.
  7. Automatic Extractive Summarisation. Existing solutions (GitHub): TextRank, LexRank, LSA.
  8. Automatic Extractive Summarisation = Reinforcement Learning + Ranking Algorithms + Natural Language Processing. Actually: stochastic optimisation with Cross-Entropy (model-free, policy-based).
  9. The Summariser Pipeline. “Let AI do all the work and then reap the fruits of its labour.” Step by step. I will avoid technical terms as much as possible. I made no assumptions about the audience. So, no maths!
  10. Summary generation algorithm. Easy, but not trivial.
       1. Get the user text to be summarised.
       2. For each sentence in the text:
       3. Decide if the sentence should be added to the summary.
       4. If yes, append the sentence to the summary.
       5. Format the summary and return it.
      How do we train a summariser? Reinforcement Learning with Cross-Entropy. It can be improved by using other state-of-the-art (SOTA) algorithms, e.g. Deep Q-Networks.
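The five steps on slide 10 can be sketched in a few lines of Python. This is only an illustration: `split_sentences`, `summarise` and `keep_long_sentences` are hypothetical names, the regex splitter is a naive stand-in for a real tokeniser, and the word-count rule stands in for the trained agent described on the later slides.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarise(text: str, keep: Callable[[str, int, int], bool]) -> str:
    """Walk through the sentences once and keep those the decision function accepts."""
    sentences = split_sentences(text)          # step 1: get the text
    summary: List[str] = []
    for index, sentence in enumerate(sentences):          # step 2
        if keep(sentence, index, len(sentences)):         # step 3
            summary.append(sentence)                      # step 4
    return " ".join(summary)                              # step 5


def keep_long_sentences(sentence: str, index: int, total: int) -> bool:
    """Placeholder decision rule: keep sentences with at least five words."""
    return len(sentence.split()) >= 5


demo = ("AI helps. Flexudy trains an extractive summariser with "
        "reinforcement learning. It works!")
result = summarise(demo, keep_long_sentences)
```

Passing the decision as a function keeps the loop unchanged when the placeholder rule is swapped for a learned policy.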
  11. First, a quick recap: the central idea behind Reinforcement Learning. Diagram labels: 1. Agent, 2. Reward, 3. Environment, Observation, Actions. (Money & Environment icon made by Freepik from www.flaticon.com)
  12. First, a quick recap. How does it translate to our use case? Diagram labels: Original Text, Sentence Features, Trainable Non-linear Function, Prediction [0, 1], Reward (Score), The next sentence. Note: although the environment is fully observable, we decided to observe sentences one at a time. This can easily be improved by observing many at a time.
  13. We need data to train the agent. Sources: Gutenberg, arXiv.org, Wikipedia. A broad corpus for higher coverage. ~50% of our development time. Data is handpicked from different domains: Biology, History, Physics, Psychology, etc. Our current (English) implementation used ~300 documents. There is obviously a lot of overfitting. The next release will be trained on a lot more data.
  14. Then prepare the data. Generate random chunks of text of 25K to 50K characters each. Chunks are kept small to keep training episodes short, which is better for RL with Cross-Entropy. This is cheap data augmentation. If there are few documents, as in our case (~300), we obtain overfitting. Example: documents of 30K <= y <= ~400K characters are cut into chunks of 25K <= x <= 50K characters (e.g. 28K chars, 45.5K chars, …). We generated 12K chunks in our case.
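A minimal sketch of the chunking step on slide 14, using only the standard library. The function name `random_chunks` and its parameters are hypothetical. The sketch also assumes that chunks may overlap and may start at any character offset; the slide does not say whether the original cuts on sentence boundaries.

```python
import random
from typing import List


def random_chunks(document: str, n_chunks: int, min_len: int = 25_000,
                  max_len: int = 50_000, seed: int = 0) -> List[str]:
    """Draw n_chunks random substrings whose length lies in [min_len, max_len]."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chunks = []
    for _ in range(n_chunks):
        # Never ask for more characters than the document contains.
        length = min(rng.randint(min_len, max_len), len(document))
        start = rng.randint(0, len(document) - length)
        chunks.append(document[start:start + length])
    return chunks


document = "x" * 120_000            # stand-in for one corpus document
chunks = random_chunks(document, n_chunks=5)
```

Because chunks are sampled with replacement, a few hundred documents can yield thousands of training episodes, which matches the "cheap data augmentation" remark but does nothing against the overfitting the slide mentions.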
  15. Training step by step.
       - Start a new training episode.
       - Tokenise and extract sentences.
       - Extract sentence features.
       - For each chunk in the batch, for each sentence: the agent observes and makes a decision (add sentence to summary? YES / NO). A sentence represents a step in RL jargon.
       - If YES, get a reward. The reward is accumulated.
       - If there are no more sentences, save all episode steps and the final reward.
  16. Feature Extraction.
       - Part-of-speech ratios
       - Dependency ratios
       - Word embeddings: we use GloVe. Could BERT be better? We will try that soon.
       - Sentence position in document
       - Ratio of skipped sentences
       - Etc. Be creative.
      Possible improvement: SOTA sentence embeddings, more complex features (minimising the similarity of sentences). Named Entity Recognition.
  17. Decision Making. Extract sentence features, agent observes, random choice: add sentence to summary? YES / NO.
       1. Decisions are always random: Yes or No (1 or 0) with probabilities P(Decision = 1) and P(Decision = 0) respectively.
       2. Probabilities are based on softmax predictions.
       3. In early episodes, softmax predictions are arbitrary.
       4. We use a fully connected (FC) neural network: five FC layers, each with a high dropout probability to minimise overfitting.
      Possible improvement: sequence models, 1D Convolutional Neural Networks.
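The random decision on slide 17 can be illustrated as follows. This sketch assumes the network emits two raw scores (logits) per sentence, ordered as (NO, YES); the network itself is left out, and `softmax` and `decide` are hypothetical helper names.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: Sequence[float], rng: random.Random) -> int:
    """Sample 1 (YES) with probability P(Decision = 1), otherwise 0 (NO)."""
    p_no, p_yes = softmax(logits)
    return 1 if rng.random() < p_yes else 0


rng = random.Random(0)
probs = softmax([0.0, 0.0])     # an undecided agent: both actions equally likely
decisions = [decide([-2.0, 2.0], rng) for _ in range(1000)]  # mostly YES
```

Sampling instead of always taking the most probable action is what lets early episodes explore different summaries, even when the predictions are still arbitrary.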
  18. Reward. Rewards are positive and negative: positive if the constraints are met, otherwise negative. If YES, get a reward. The reward is accumulated.
      How are rewards computed? With the TextRank algorithm. We forked SummaNLP's implementation and modified it to our needs.
      What are the constraints? The number of selected sentences S should not exceed an integer M, with M <= total number of sentences. M is the theoretical maximum number of sentences in any generated summary. In our case, M = 20.
      For example: if a sentence with score x is selected for the summary (i.e. yes is predicted) but S >= M, then x = -x. In other words, we punish the agent for exceeding the upper bound.
      Possible improvement: try different algorithms, e.g. LexRank. Combine algorithms. Manually rank sentences.
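The reward rule on slide 18 could look roughly like this. The sketch assumes that per-sentence scores are already available (for example from a TextRank implementation), that they are non-negative, and that S counts the sentences selected before the current one; `step_reward` and `episode_reward` are hypothetical names.

```python
from typing import Sequence


def step_reward(score: float, selected_so_far: int, max_sentences: int = 20) -> float:
    """Return the score, negated once the summary already holds M sentences."""
    if selected_so_far >= max_sentences:
        return -score   # punish the agent for exceeding the upper bound M
    return score


def episode_reward(scores: Sequence[float], decisions: Sequence[int],
                   max_sentences: int = 20) -> float:
    """Accumulate rewards over one chunk; only YES decisions earn a reward."""
    total, selected = 0.0, 0
    for score, decision in zip(scores, decisions):
        if decision == 1:
            total += step_reward(score, selected, max_sentences)
            selected += 1
    return total


# With M = 1 the second selected sentence is penalised: 0.5 - 0.3.
capped = episode_reward([0.5, 0.2, 0.3], [1, 0, 1], max_sentences=1)
uncapped = episode_reward([0.5, 0.2, 0.3], [1, 0, 1])
```

Swapping in another scoring algorithm, as the slide suggests, would only change where `scores` comes from.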
  19. The steps are repeated for every sentence and every chunk in the batch. [Figure: each chunk is an episode (Episode 1, Episode 2, …, Episode j) and each sentence s_1, …, s_K is a step (Step 1, …, Step k). The score of an episode is the sum of its step scores, e.g. $\text{Score}_{E_1} = \sum_{i=1}^{K} \text{score}_i$.]
  20. The learning step.
       1. Select the episodes with the best scores, i.e. episodes with scores at least as high as some p-th percentile. We chose p = 90 based on our empirical analysis.
       2. Train the agent on the elite episodes. They are our new “ground truth”.
      Note: the score is not fed into the neural network (agent). The score is no longer needed at inference time.
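A minimal sketch of the elite-selection step on slide 20, in plain Python. It assumes each episode is stored as a pair (steps, score), where steps is a list of (features, decision) pairs; the linear-interpolation percentile and the names `percentile`, `elite_episodes` and `training_pairs` are illustrative choices, not taken from the talk.

```python
from typing import List, Sequence, Tuple

Step = Tuple[str, int]                 # (sentence features, decision), simplified
Episode = Tuple[List[Step], float]     # (steps, accumulated score)


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def elite_episodes(episodes: Sequence[Episode], p: float = 90) -> List[Episode]:
    """Keep the episodes whose score reaches the p-th percentile (reward bound)."""
    bound = percentile([score for _, score in episodes], p)
    return [episode for episode in episodes if episode[1] >= bound]


def training_pairs(elite: Sequence[Episode]) -> List[Step]:
    """Flatten elite episodes into (features, decision) pairs; scores are dropped."""
    return [pair for steps, _ in elite for pair in steps]


episodes = [([("f%d" % i, i % 2)], float(i)) for i in range(10)]
elite = elite_episodes(episodes)
pairs = training_pairs(elite)
```

The flattened pairs would then serve as supervised targets for the network, which is why the score never has to be fed into the model.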
  21. [Figure: training curves for loss, reward bound and reward mean, annotated with three phases.] The agent is careless. The agent is shy. The agent has learned from experience.
  22. But wait, aren't we just implicitly learning the TextRank scoring algorithm?
  23. Yes, but:
       1. The model does not depend on vocabulary.
       2. Transfer learning can be used to improve the agent, for a particular use case or in general, by simply changing the scoring function when training on new data.
       3. The pipeline is flexible: new algorithms and architectures are easy to integrate.
       4. In practice, summaries are usually generated faster.
  24. An honest example: summarise this page: https://en.wikipedia.org/wiki/Cross_entropy
  25. An honest example: TextRank results, 17 sentences.
       - In information theory, the cross entropy between two probability distributions $p$ and $q$ over the same underlying set …
       - The cross entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows:
       - The definition may be formulated using the Kullback–Leibler divergence $D_{\mathrm{KL}}(p \| q)$ of …
       - For discrete probability distributions $p$ and $q$ with the same support $\mathcal{X}$ …
       - $H(p,q) = -\sum_{x \in \mathcal{X}} p(x) \log q(x)$ …
       - $H(p,q) = -\int_{\mathcal{X}} P(x) \log Q(x)\, dr(x)$ …
       - Therefore, cross entropy can be interpreted as the expected message-length per datum when a wrong distribution $q$ is assumed while …
       - That is why the expectation is taken over the true probability distribution $p$ and not $q$.
       - There are many situations where cross-entropy needs to be measured but the distribution of $p$ is unknown.
       - This is a Monte Carlo estimate of the true cross entropy, where the test set is treated as samples from $p(x)$ [citation needed].
       - If the estimated probability of outcome $I$ is $q_I$, while the frequency (empirical probability) of outcome $I$ …
       - $\frac{1}{N} \log \prod_{I} q_I^{N p_I} = \sum_{I} p_I \log q_I = -H(p,q)$ …
       - When comparing a distribution $q$ against a fixed reference distribution $p$, cross entropy and KL divergence are …
       - This has led to some ambiguity in the literature, with some authors attempting to resolve the inconsistency by redefining cross-entropy to be $D_{\mathrm{KL}}$(p …
       - The true probability $p_I$ is the true label, and the given distribution $q_I$ is the predicted value of the …
       - Having set up our notation, $p \in \{y, 1-y\}$ and $q \in \{\hat{y}, 1-\hat{y}\}$ …
       - $J(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right]$ …
  26. An honest example: Flexudy results, 12 sentences.
       - In information theory, the cross entropy between two probability distributions $p$ and $q$ over the …
       - In information theory, the Kraft–McMillan theorem establishes that any directly decodable coding scheme for coding a message to …
       - Therefore, cross entropy can be interpreted as the expected message-length per datum when a wrong distribution $q$ …
       - An example is language modeling, where a model is created based on a training set $T$, and then its cross-entropy …
       - In this example, $p$ is the true distribution of words in any corpus, and $q$ is the distribution of …
       - In these cases, an estimate of cross-entropy is calculated using the following formula: $H(T, q) = -$ … $N$. This is a Monte Carlo estimate of the true cross entropy, where the test set is treated as samples from $p(x)$ …
       - Cross-entropy minimization Cross-entropy minimization is frequently used in optimization and rare-event probability estimation; see the cross-entropy method.
       - This has led to some ambiguity in the literature, with some authors attempting to resolve the inconsistency by redefining cross- …
       - Cross entropy can be used to define a loss function in machine learning and optimization.
       - The output of the model for a given observation, given a vector of input features $x$, can be interpreted as a …
       - The typical cost function that one uses in logistic regression is computed by taking the average of all cross-entropies in the sample.
  27. Is Flexudy's current implementation better than TextRank?
  28. We cannot tell. We do not yet have evidence to support such a claim. // TODO: evaluate Flexudy with BLEU and ROUGE scores.
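To give an idea of what the TODO on slide 28 involves, here is a simplified ROUGE-1 computation (clipped unigram overlap). It assumes a human-written reference summary is available, which the talk does not mention, and it uses whitespace tokenisation without stemming, so its numbers would not match an official ROUGE toolkit. The function name `rouge_1` is hypothetical.

```python
from collections import Counter
from typing import Dict


def rouge_1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram precision, recall and F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
```

Comparing the TextRank and Flexudy summaries against the same references with such a metric would be one way to start answering the question on slide 27.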
  29. An honest example II: summarise this page: https://en.wikipedia.org/wiki/Renaissance
  30. An honest example II: Flexudy results, 11 sentences.
       - The School of Athens (1509–1511), Raphael Topics Humanism Age of Discovery Architecture Dance Fine arts
       - Depicting the Hebrew prophet-prodigy-king David as a muscular Greek athlete, the Christian humanist ideal can be seen in the …
       - REN-ə-sahnss)[2][a] was a period in European history marking the transition from the Middle Ages to Modernity and covering …
       - In addition to the standard periodization, proponents of a long Renaissance put its beginning in the 14th century and its end in the 17th …
       - The traditional view focuses more on the early modern aspects of the Renaissance and argues that it was a break from the past, …
       - The intellectual basis of the Renaissance was its version of humanism, derived from the concept of Roman Humanitas and the rediscovery …
       - Early examples were the development of perspective in oil painting and the recycled knowledge of how to make concrete.
       - Although the invention of metal movable type sped the dissemination of ideas from the later 15th century, the changes of the Renaissance …
       - As a cultural movement, the Renaissance encompassed innovative flowering of Latin and vernacular literatures, beginning with the …
       - In politics, the Renaissance contributed to the development of the customs and conventions of diplomacy, and in science to an …
       - Various theories have been proposed to account for its origins and characteristics, focusing on a variety of factors including the …
       - Other major centres were northern Italian city-states such as Venice, Genoa, Milan, Bologna, and finally Rome during the …
      The first 2 sentences make absolutely no sense.
  31. Hence, there is still a lot of work to do.
  32. Future work
       1. Try new architectures and algorithms, e.g. 1D convolutions.
       2. Support formulas, e.g. mathematics: combine Reinforcement Learning and logic (symbolic AI).
       3. Manual annotation to improve sentence selection.
       4. Collect more data.
       5. Use SOTA sentence embeddings.
       6. Improve sentence boundary detection algorithms.
       7. Implement co-reference resolution to deal with pronouns.
  33. References
       1. Deep Reinforcement Learning Hands-On, by Maxim Lapan
       2. A survey automatic text summarization, by Oguzhan Tas & Farzad Kiyani
       3. Deep Transfer Reinforcement Learning for Text Summarization, by Yaser Naren & Chandan
       4. Variations of the Similarity Function of TextRank for Automated Summarization, by Federico Barrios, Luis Argerich & Rosa W.
       5. Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, by Matthew Honnibal & Ines Montani
  34. To learn more about the meetup, click the link: https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup
