
19BCE1367_Capstone_Review 2_Final.pdf


  1. School of Computer Science and Engineering
     Capstone Project Review 2
     Deep Neural Network-based Limerick Generation for an Image
     Name: Divyanshi Thapa
     Register No: 19BCE1367
     Programme and Specialization: B.Tech CSE
     Guide Name: Dr. Praveen Joe I R
  2. Outline
     01 Introduction
     02 Problem Statement
     03 Research Challenges
     04 Research Objectives
     05 Proposed System
     06 What to be done next
     07 Research Paper Status
     08 Guide Approval
     09 References
  3. 01 Introduction
  4. Introduction
     ● Creative writing with artificial intelligence (AI) is one of the most popular and rapidly growing research fields. It is intriguing but becomes more challenging as the goal moves toward human-like text written under formal constraints, as in poems.
     ● Among creative writing tasks, paraphrasing and story writing are easier than writing poetry because poems impose many restrictions, such as rhyming structure, number of lines, and type of language.
  5. Introduction
     ● Several poem frameworks have been developed to help AI generate human-like poems under these restrictions.
     ● Poems can be broadly classified into nine categories by rhyming structure and number of lines. Among these, generating a limerick with artificial intelligence and deep learning is one of the most challenging tasks, because a limerick is a five-line poem with a strict AABBA rhyming structure.
  6. NLP + Deep learning
     ● Image captioning automatically generates well-formed sentences from a given image and is widely used in NLP tasks such as visual question answering (VQA).
     ● Language models based on neural networks have improved the state of the art in predictive language modeling, while topic models are successful at capturing clear-cut semantic dimensions.
     ● NLP + DL = a system that can understand and analyze an image and generate a creative, human-like poem based on the theme of the image.
  7. 02 Problem Statement
  8. Problem Statement
     ● For a poem to be meaningful, both linguistic and literary aspects need to be taken into account.
     ● With advances in image captioning, NLP tasks such as question answering have moved on to a second phase: visual question answering.
     ● "To create a deep learning model that can generate limericks (a form of poem) in English for a given input image."
  9. Problem Statement
     Current approaches to generating rhyming English poetry with a neural network constrain the output to enforce rhyme. The generated poem should:
     ● Match the context or theme of the input image
     ● Be error-free
     ● Be coherent
     ● Follow the rhyming structure of the limerick (AABBA)
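
The AABBA requirement is mechanical enough to check in code. The following is a minimal sketch, not the project's method: it assumes a crude spelling-based rhyme test (two lines rhyme if their final words share the last vowel group and everything after it), whereas a real system would more likely compare phonemes from a pronunciation dictionary. The names `rhyme_key` and `follows_aabba` are illustrative.

```python
import re
from typing import Sequence


def rhyme_key(line: str) -> str:
    """Crude rhyme key: the final word's last vowel group plus trailing consonants.

    Assumption: spelling approximates sound. This fails for many English words
    (for example words ending in a silent "e").
    """
    words = re.findall(r"[a-z']+", line.lower())
    if not words:
        return ""
    last = words[-1]
    match = re.search(r"[aeiouy]+[^aeiouy]*$", last)
    return match.group(0) if match else last


def follows_aabba(lines: Sequence[str]) -> bool:
    """Return True if five lines match the AABBA pattern under rhyme_key."""
    if len(lines) != 5:
        return False
    a1, a2, b1, b2, a3 = (rhyme_key(line) for line in lines)
    # Lines 1, 2 and 5 share one rhyme; lines 3 and 4 share a different one.
    return a1 == a2 == a3 and b1 == b2 and a1 != b1
```

A check like this could serve as one more filter over generated candidates, alongside the grammar and coherence criteria listed above.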
  10. 03 Research Challenges
  11. Research Challenges
     1. Mapping the theme of the image to the topic of the poem.
     2. Accounting for both linguistic and literary aspects so that the poem is meaningful.
     3. Maintaining syntactic well-formedness and topical coherence throughout the poem.
     4. Satisfying the rhyming constraint (maintaining the rhyme scheme).
     5. Bringing enough literary creativity to make the poem interesting.
  12. 04 Research Objectives
  13. Research Objectives
     1. Mimic human creative writing with a simple framework for image-to-poem generation in English.
     2. Use transformer models for better image captioning and limerick generation.
     3. Build a framework that generates poems (limericks) efficiently, so that it can be deployed as a public application after post-processing.
     4. Focus on the coherence and rhyming structure of the limerick and on the efficiency of the framework.
  14. 05 Proposed System
  15. Proposed System: Introduction
     ● A further goal is a speed-efficient framework, so transformer models are chosen for both image analysis and limerick generation. Image features are extracted and a description is generated by a vision encoder-decoder model, which combines a vision transformer (the encoder, for image feature extraction) with GPT-2 (the decoder, for generating human-like captions).
     ● The caption is treated as the first line of the limerick and is fed to another GPT-2 model, which generates a pool of 20 limericks.
     ● The best limerick is selected as the final output after post-processing.
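
The three stages described here can be read as a small orchestration function. The sketch below is one possible arrangement, not the project's code. The four callables (`caption_image`, `generate_pool`, `count_errors`, `coherence_cost`) are hypothetical stand-ins for the ViT + GPT-2 captioner, the second GPT-2 model, the grammar checker, and the coherence score. Falling back to the unfiltered pool when no candidate is error-free is also an assumption.

```python
from typing import Callable, Sequence


def generate_limerick(
    image_path: str,
    caption_image: Callable[[str], str],
    generate_pool: Callable[[str, int], Sequence[str]],
    count_errors: Callable[[str], int],
    coherence_cost: Callable[[str], float],
    pool_size: int = 20,
) -> str:
    """Caption the image, generate candidates, filter them, pick the best."""
    # Stage 1: the caption becomes the seed (first line) of the limerick.
    caption = caption_image(image_path)

    # Stage 2: a pool of candidate limericks seeded with the caption.
    pool = list(generate_pool(caption, pool_size))
    if not pool:
        raise ValueError("generator returned no candidates")

    # Stage 3a: keep only candidates with no grammar or spelling errors.
    clean = [text for text in pool if count_errors(text) == 0]
    candidates = clean or pool  # assumption: fall back if nothing is clean

    # Stage 3b: lower cost means better subject continuity.
    return min(candidates, key=coherence_cost)
```

Passing the models in as callables keeps the control flow testable without loading any weights.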
  16. Proposed System Diagram
  17. List of Modules
     Module 1 (M1): Image Captioning
     Module 2 (M2): GPT-2 reverse language modeling
     Module 3 (M3): Post-processing
       Module 3.1 (M3.1): Grammar and spelling error detection
       Module 3.2 (M3.2): BERT-based word embeddings
     Module 4 (M4): Evaluation
  18. M1: Image Captioning
     ● The vision encoder-decoder model is used via the Hugging Face API, with ViT as the vision encoder and GPT-2 as the text decoder. It is trained on the popular Common Objects in Context (COCO) dataset, which contains more than 120 thousand images with their descriptions.
     ● The PyTorch version is used to generate the caption for the given input image.
  19. M2: GPT-2 reverse language modeling
     Problem: GPT-2 is a forward language model: fine-tuning uses the standard left-to-right order of the tokens in a limerick. This helps maintain subject continuity and coherence, but it cannot maintain the rhyming structure of the poem.
  20. M2: GPT-2 reverse language modeling
     ● Solution: The GPT-2 model can be fine-tuned on a corpus in which the tokens of each limerick appear in reverse (right-to-left) order. This technique helps the model learn the rhyming structure.
     ● The caption from the image captioning model is fed into this fine-tuned reverse GPT-2 model as a seed sentence, and a pool of 20 limericks is generated.
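
Building the reversed corpus is a small preprocessing step. The sketch below shows one plausible reading and is not taken from the project: it reverses whitespace-separated words within each line, so that the rhyming word becomes the first token of the line. The slides do not say whether reversal is applied per line or over the whole poem, and GPT-2's own subword tokens differ from whitespace words. Both are assumptions here, as are the function names.

```python
def reverse_line(line: str) -> str:
    """Reverse the order of whitespace-separated words in one line."""
    return " ".join(reversed(line.split()))


def reverse_limerick(text: str) -> str:
    """Reverse word order inside every line, keeping the line order intact."""
    return "\n".join(reverse_line(line) for line in text.splitlines())
```

Because the transformation is its own inverse (up to whitespace normalization), applying `reverse_limerick` to the model's output restores normal reading order.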
  21. M3: Post-processing
     M3.1: Grammar and spelling error detection
     The generated limerick should be syntactically correct, so an open-source spelling and grammar checker is used to assign a score to each limerick. Limericks with no errors are chosen for further processing.
  22. M3: Post-processing
     M3.2: BERT-based word embeddings
     ● The Bidirectional Encoder Representations from Transformers (BERT) model can be used to generate in-context embeddings.
     ● Subject continuity is quantified throughout the limerick as the average noun centroid distance in the embedding space [5].
     ● Interpretation:
       ○ High mean: the nouns are far from the average subject of the limerick.
       ○ High standard deviation: many subjects are present in the limerick.
     ● The limerick with the lowest mean and standard deviation is selected as the final output.
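
The centroid-distance score can be written out in a few lines once the noun embeddings exist. The sketch below is a minimal illustration under stated assumptions: the vectors are taken as already computed (for example by BERT) for each noun in a limerick, distance is Euclidean, and candidates are ranked by mean first with standard deviation as the tie-breaker. The slides do not specify the distance metric or how the two statistics are combined, and the function names are illustrative.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def centroid_distances(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Distance of each noun vector from the centroid of all noun vectors."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [math.dist(v, centroid) for v in vectors]


def continuity_stats(vectors: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and population standard deviation of the centroid distances."""
    distances = centroid_distances(vectors)
    return mean(distances), pstdev(distances)


def pick_most_coherent(candidates: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Pick the limerick whose nouns sit closest to their centroid.

    Assumption: rank by (mean, std) as a tuple, so std only breaks ties.
    """
    return min(candidates, key=lambda text: continuity_stats(candidates[text]))
```

A weighted sum of the two statistics would be an equally reasonable ranking rule. The tuple ordering is used here only because it needs no tuning parameter.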
  23. M4: Evaluation
     Automatic evaluation methods:
     - BLEU (Bilingual Evaluation Understudy) score
     - Cosine similarity
     - Semantic similarity (using Sentence-BERT)
     The MultiM-Poem dataset is a collection of 8,292 images scraped from Flickr, each mapped to a related human-written poem. The image serves as the user input image and the related poem as the ground truth.
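
Of the three metrics, cosine similarity is the simplest to illustrate. The sketch below compares a generated limerick with its ground-truth poem using word-count vectors. That representation is an assumption, since the slides do not say which vectors the cosine is computed over, and the function names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(generated: str, reference: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = bag_of_words(generated), bag_of_words(reference)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_a * norm_b)
```

Word-count cosine only rewards shared vocabulary, which is presumably why a sentence-embedding similarity is listed alongside it.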
  24. 06 What to be done next?
  25. What to be done next?
     1. Compilation of the results.
     2. Completion of the research paper.
  26. 07 Research Paper Status
  27. Research Paper Status
     1. Abstract
     2. Introduction
     3. Related work
     4. Approach
        a. Architecture
        b. Image captioning
        c. Language model
     5. Experiment
     6. Result
     7. Conclusion and future work
  28. 08 Guide Approval
  29. Guide Approval mail screenshot
  30. References (Reference papers)
     [1] Wang, H., Zhang, Y., & Yu, X. (2020). An overview of image caption generation methods. Computational Intelligence and Neuroscience, 2020.
     [2] Van de Cruys, T. (2020, July). Automatic poetry generation from prosaic text. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 2471-2480).
     [3] Beheitt, M. E. G., & Hmida, M. B. H. (2022). Automatic Arabic Poem Generation with GPT-2. In ICAART (2) (pp. 366-374).
     [4] Liu, D., Guo, Q., Li, W., & Lv, J. (2018, July). A multi-modal Chinese poetry generation model. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
     [5] Lo, K. L., Ariss, R., & Kurz, P. (2022). GPoeT-2: A GPT-2 Based Poem Generator. arXiv preprint arXiv:2205.08847.
  31. References (Reference papers)
     [6] Meyer, J. B. (2019). Generating Free Verse Poetry with Transformer Networks (Doctoral dissertation, Reed College).
     [7] Talafha, S., & Rekabdar, B. (2019, January). Arabic poem generation with hierarchical recurrent attentional network. In 2019 IEEE 13th International Conference on Semantic Computing (ICSC) (pp. 316-323). IEEE.
     [8] Gao, L., Fan, K., Song, J., Liu, X., Xu, X., & Shen, H. T. (2019, July). Deliberate attention networks for image captioning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 8320-8327).
     [9] Jhamtani, H., Mehta, S. V., Carbonell, J., & Berg-Kirkpatrick, T. (2019). Learning rhyming constraints using structured adversaries. arXiv preprint arXiv:1909.06743.
     [10] Lau, J. H., Cohn, T., Baldwin, T., Brooke, J., & Hammond, A. (2018). Deep-speare: A joint neural model of poetic language, meter and rhyme. arXiv preprint arXiv:1807.03491.
  32. References (Reference papers)
     [11] Talafha, S., & Rekabdar, B. (2021, January). Poetry generation model via deep learning incorporating extended phonetic and semantic embeddings. In 2021 IEEE 15th International Conference on Semantic Computing (ICSC) (pp. 48-55). IEEE.
     [12] Min, K., Dang, M., & Moon, H. (2021). Deep Learning-Based Short Story Generation for an Image Using the Encoder-Decoder Structure. IEEE Access, 9, 113550-113557.
     [13] Zhang, D., Ni, B., Zhi, Q., Plummer, T., Li, Q., Zheng, H., ... & Wang, D. (2019, August). Through the eyes of a poet: Classical poetry recommendation with visual input on social media. In 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 333-340). IEEE.
     [14] Ghazvininejad, M., Shi, X., Priyadarshi, J., & Knight, K. (2017, July). Hafez: an interactive poetry generation system. In Proceedings of ACL 2017, System Demonstrations (pp. 43-48).
     [15] Liu, Z., Fu, Z., Cao, J., de Melo, G., Tam, Y. C., Niu, C., & Zhou, J. (2019, July). Rhetorically controlled encoder-decoder for modern Chinese poetry generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1992-2001).
  33. References (Websites and articles)
     1. https://scottmduda.medium.com/generating-an-edgar-allen-poe-styled-poem-using-gpt-2-289801ded82c
     2. https://timesofindia.indiatimes.com/readersblog/newtech/artificial-intelligence-in-education-39512/
     3. https://news.climate.columbia.edu/2022/04/22/haiku-ai-generated-poetry/
     4. https://towardsdatascience.com/transformers-89034557de14
     5. https://github.com/minimaxir/gpt-2-simple
     6. https://languagetool.org/
  34. THANK YOU
