A paper review. This presentation introduces "Abductive Commonsense Reasoning", a paper published at ICLR 2020. In this paper, the authors use commonsense knowledge to generate plausible hypotheses. They introduce a new dataset, 'ART', and present models for the 'aNLI' and 'aNLG' tasks built on BERT and GPT.

ELECTRA, a pre-training method for language representation models.

XLNet, RoBERTa, Reformer

This slide deck introduces Transformer-XL, the paper on which XLNet is based, and explains its major contribution. It also reviews the original Transformer so that the differences between the Transformer and Transformer-XL are easy to compare. Happy NLP!

A brief survey of face recognition.

Generative Adversarial Networks

Deep learning study 3

Deep learning study 2

Deep learning study 1. These slides cover the basic mathematics behind deep learning, such as Bayes' theorem, Bayesian inference, and information theory.
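As a small illustration of these topics, the sketch below computes a posterior probability with Bayes' theorem and a few basic information-theory quantities (entropy, cross-entropy, KL divergence). The function names and the diagnostic-test numbers are hypothetical, chosen only for illustration, and are not taken from the slides.

```python
import math


def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem for a binary hypothesis H and evidence E:
    P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|not H) P(not H)]."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


def entropy(p, base=2.0):
    """Shannon entropy H(p) = -sum_k p_k log p_k (terms with p_k = 0 contribute 0)."""
    return -sum(pk * math.log(pk, base) for pk in p if pk > 0)


def cross_entropy(p, q, base=2.0):
    """Cross-entropy H(p, q) = -sum_k p_k log q_k."""
    return -sum(pk * math.log(qk, base) for pk, qk in zip(p, q) if pk > 0)


def kl_divergence(p, q, base=2.0):
    """KL divergence D(p || q) = H(p, q) - H(p), which is never negative."""
    return cross_entropy(p, q, base) - entropy(p, base)


if __name__ == "__main__":
    # Hypothetical test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
    first = posterior(0.01, 0.90, 0.05)
    print("P(disease | positive) =", first)  # about 0.154
    # Bayesian updating: the first posterior becomes the prior for a second test,
    # assuming the two results are conditionally independent given the disease state.
    print("after a second positive =", posterior(first, 0.90, 0.05))
    print("H(fair coin) =", entropy([0.5, 0.5]), "bits")
    print("D([0.9, 0.1] || [0.5, 0.5]) =", kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

The low prior is why a single positive result still gives a modest posterior; repeated updating shows how evidence accumulates in Bayesian inference.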

Explains backpropagation with a simple example. For two-class classification we normally use cross-entropy as the loss function and the logistic sigmoid as the activation function of the output layer, because we want to maximize the (log) likelihood, or equivalently minimize the negative (log) likelihood, under the assumption that the target follows a Bernoulli distribution (a binomial distribution with a single trial), which is the maximum-entropy distribution for a two-class output. In this example, however, we use the sum of squared errors as the loss function (also called the objective function or cost function), which is normally used in linear regression, to keep the problem simple.
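The backpropagation example described above can be sketched in plain Python. This is a minimal sketch under assumed settings, not the network from the slides: a hypothetical 2-2-1 network with sigmoid hidden and output units, the sum-of-squares loss with the conventional 1/2 factor, arbitrary illustrative weights, and a central-difference check (`numerical_grad_W1`, a helper introduced here) to confirm the chain-rule gradients.

```python
import math

# Hypothetical tiny network: 2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output.
# Loss: E = 1/2 * (y - t)^2, the sum of squares for a single output
# (the 1/2 is a convention that cancels when differentiating).


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2):
    """Forward pass; returns the hidden activations h and the output y."""
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(W1))]
    y = sigmoid(sum(w2[j] * h[j] for j in range(len(h))) + b2)
    return h, y


def loss(y: float, t: float) -> float:
    """Sum-of-squares error for one output."""
    return 0.5 * (y - t) ** 2


def backprop(x, t, W1, b1, w2, b2):
    """Gradients of the loss with respect to every parameter, via the chain rule."""
    h, y = forward(x, W1, b1, w2, b2)
    # Output delta dE/dz = (y - t) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    grad_w2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden deltas: send delta_out back through w2 and each hidden sigmoid.
    delta_h = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    grad_W1 = [[delta_h[j] * xi for xi in x] for j in range(len(h))]
    grad_b1 = delta_h
    return grad_W1, grad_b1, grad_w2, grad_b2


def numerical_grad_W1(x, t, W1, b1, w2, b2, j, i, eps=1e-6):
    """Central-difference estimate of dE/dW1[j][i], used to check backprop."""
    W_plus = [row[:] for row in W1]
    W_minus = [row[:] for row in W1]
    W_plus[j][i] += eps
    W_minus[j][i] -= eps
    e_plus = loss(forward(x, W_plus, b1, w2, b2)[1], t)
    e_minus = loss(forward(x, W_minus, b1, w2, b2)[1], t)
    return (e_plus - e_minus) / (2.0 * eps)


if __name__ == "__main__":
    # Arbitrary illustrative values (not taken from the slides).
    x, t = [0.05, 0.10], 0.01
    W1, b1 = [[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35]
    w2, b2 = [0.40, 0.45], 0.60

    gW1, gb1, gw2, gb2 = backprop(x, t, W1, b1, w2, b2)
    max_diff = max(abs(gW1[j][i] - numerical_grad_W1(x, t, W1, b1, w2, b2, j, i))
                   for j in range(2) for i in range(2))
    print("max |backprop - numerical| for W1:", max_diff)

    # One gradient-descent step (learning rate 0.5) should reduce the loss.
    lr = 0.5
    W1_new = [[W1[j][i] - lr * gW1[j][i] for i in range(2)] for j in range(2)]
    b1_new = [b1[j] - lr * gb1[j] for j in range(2)]
    w2_new = [w2[j] - lr * gw2[j] for j in range(2)]
    b2_new = b2 - lr * gb2
    print("loss before:", loss(forward(x, W1, b1, w2, b2)[1], t))
    print("loss after: ", loss(forward(x, W1_new, b1_new, w2_new, b2_new)[1], t))
```

If the squared error is replaced by the cross-entropy E = -[t log y + (1 - t) log(1 - y)], the sigmoid derivative cancels and the output delta becomes simply y - t, which is one reason cross-entropy is the usual pairing with a sigmoid output.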
