Out of curiosity, I have been studying how ChatGPT works. I was pleased to
learn that it is an innovation built on the foundation of many open-source
research works within the AI community. By open-source research I mean a
collaborative effort among researchers, developers, and enthusiasts who work
together to advance the field of AI by sharing their work, data, and code openly.
It builds on several published research works from across the AI community. To
name a few prominent ones: the Transformer paper "Attention Is All You
Need" by Vaswani et al. (2017), the GPT series of papers by Radford et al. at
OpenAI, and "Deep reinforcement learning from human preferences" by Christiano
et al. at OpenAI and DeepMind (arXiv v1 in 2017, v4 in 2023).
ChatGPT also stands on an ecosystem of open-source AI projects, including
deep learning frameworks like PyTorch, TensorFlow, and Keras. Such
frameworks allow developers to build and train neural networks, which are the
foundation of ChatGPT's ability to understand and generate natural language.
The same ecosystem includes the Hugging Face Transformers library, an
open-source library for building and using Transformer-based models for
natural language processing tasks.
Moreover, ChatGPT is trained on large datasets of text from sources such as
books, articles, and websites. OpenAI has not released its exact training
corpus, but comparable large text collections, such as the Common Crawl web
corpus, are openly available for use by other researchers and developers.
This approach has allowed for rapid progress in the field of AI and has made it
possible to build powerful language models like ChatGPT that can understand
and generate natural language with remarkable accuracy and fluency.
My aim is to share what I have learned in order to
• develop enough intuition for a fair understanding of how these
components fit together to achieve such a marvel.
References:
• Introducing ChatGPT (openai.com)
• InstructGPT paper (OpenAI): 2203.02155.pdf (arxiv.org)
Let’s dive into the details!
The significance of deep learning in
contemporary AI lies in its ability to perform
tasks that were previously difficult or
impossible for traditional machine learning
algorithms. Deep learning has been used to
improve image and speech recognition,
natural language processing, and
autonomous driving, among other
applications. It has also enabled the
development of advanced AI systems, such as
AlphaGo, which beat human champions at the
game of Go.
Importantly, a deep neural network is a
universal function approximator. This means
that a network with a sufficient number of
parameters can approximate any continuous
function on a bounded domain, including
highly nonlinear and complex ones,
to an arbitrary degree of accuracy.
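To make the approximation idea concrete, here is a minimal sketch, not taken from the slides. It assumes a one-hidden-layer ReLU network whose weights are set by hand (no training) so that it interpolates sin(x) at evenly spaced points. The helper names `build_relu_network` and `max_error` are hypothetical.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_network(f, a, b, hidden_units):
    """One-hidden-layer ReLU network that interpolates f at evenly spaced knots.

    Assumption for illustration: weights are constructed analytically, not learned.
    """
    step = (b - a) / hidden_units
    knots = [a + i * step for i in range(hidden_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(hidden_units)]
    # The output weight of unit i is the change in slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden_units)]
    bias = f(a)

    def network(x):
        # zip stops at the last hidden unit, so the final knot is not used.
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return network

def max_error(f, net, a, b, samples=1000):
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)

# Worst-case error on [0, 2*pi] for a small and a larger hidden layer.
errors = {
    n: max_error(math.sin, build_relu_network(math.sin, 0.0, 2 * math.pi, n), 0.0, 2 * math.pi)
    for n in (8, 64)
}
print(errors)
```

Adding hidden units shrinks the worst-case error, which is the intuition behind "a sufficient number of parameters".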
One of the key advantages of deep learning is
its ability to learn features automatically
from raw data, which can save time and
effort in feature engineering. Additionally,
deep learning models can continue to
improve their performance as they are
exposed to more data, making them
particularly useful in applications where data
is abundant. As a result, deep learning has
become a powerful tool for solving complex
problems and driving innovation in AI.
Image source: Deep learning - Wikipedia
Paper: [1706.03762] Attention Is All You Need (arxiv.org)
The Transformer paper is arguably one of the most
impactful research papers of the last few years. Its architecture
has disrupted almost all subdomains of cognitive AI: natural
language processing (NLP) tasks such as machine translation,
question answering, and language understanding; computer
vision tasks such as image classification and object detection;
speech processing tasks such as automatic speech recognition
(ASR) and diarization; and even reinforcement learning, as in
TransformRL.
The Transformer architecture is a type of neural network that
uses self-attention mechanisms to process sequential data,
such as natural language. Instead of using recurrent or
convolutional layers, the Transformer network consists of an
encoder and a decoder, both composed of multiple layers of
self-attention and feedforward neural networks.
Intuitively, the self-attention mechanism allows a neural
network to dynamically focus on different parts of the
input data by computing the importance of each element
(such as a word in a sentence) based on its relationship with
all the other elements. This enables the network to process
sequences of data effectively and adaptively, without relying
on a fixed processing order.
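The intuition above can be sketched as scaled dot-product self-attention in plain Python. This is a simplified illustration: it assumes the queries, keys, and values are the token vectors themselves, whereas the paper applies learned projections and multiple heads. The toy `tokens` vectors are made up.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """x: list of token vectors. Returns (outputs, attention_weights)."""
    d = len(x[0])
    weights, outputs = [], []
    for query in x:
        # Relationship of this token with every token: scaled dot product.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * value[j] for w, value in zip(row, x)) for j in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # made-up 2-d embeddings
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and the first token puts more weight on the similar second token than on the dissimilar third one.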
Papers: language_understanding_paper.pdf (openai.com),
Language Models are Unsupervised Multitask Learners
(openai.com)
[2005.14165] Language Models are Few-Shot Learners (arxiv.org)
Then comes the simple yet powerful and scalable idea of
self-supervised learning. In this setup, the ML algorithm
learns from unlabeled data by predicting certain aspects of
the data, such as the next word in a sentence. This approach
enables the development of models that can generalize well
to new domains and tasks, without the need for labeled data.
GPT, GPT-2 and GPT-3 apply this technique to up to hundreds of
billions of tokens (read sub-words loosely), drawn largely from
data crawled from the Internet, to create what is called a base
Language Model (LM). For training, only the decoder component
of the Transformer is employed, in an auto-regressive manner.
Intuitively, it means that the model is asked to predict the
next word given a context of preceding words
from a corpus of text data, and the process repeats over
the humongous training data such as books, articles, and
websites, without any explicit supervision or labels beyond
the text itself.
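As a rough sketch of how plain text supplies its own labels, the snippet below turns one sentence into (context, next token) training examples. It assumes whitespace tokenization for readability; the GPT models use sub-word tokens. The function name `next_token_examples` is hypothetical.

```python
def next_token_examples(tokens):
    """Every prefix of the sequence becomes a context; the token after it is the label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "the cat sat on the mat".split()  # simplification: words, not sub-words
examples = next_token_examples(tokens)
for context, target in examples:
    print(" ".join(context), "->", target)
```

No human labeling is involved: the target for each context is just the next token in the text.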
Importantly, the decoder implements masked (causal)
self-attention: when attention is computed for a position, only
that token and the tokens before it are used, and the future
tokens are masked out of the calculation.
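A minimal sketch of the masking step, assuming the raw attention scores are already available as a square matrix (rows are query positions, columns are key positions). Future positions are set to negative infinity before the softmax, so they receive zero weight. The function name is hypothetical.

```python
import math

def causal_attention_weights(scores):
    """Apply a causal mask, then a row-wise softmax."""
    weights = []
    for i, row in enumerate(scores):
        # Position i may only look at positions 0..i.
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Equal raw scores make the effect of the mask easy to see.
weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
for row in weights:
    print([round(w, 3) for w in row])
```

The result is lower-triangular: the first token attends only to itself, the second to the first two, and so on.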
Source: language_understanding_paper.pdf (openai.com)
As an astonishing result of this simple training approach, the
model performs what is popularly known as representation
learning, i.e., it learns high-quality text representations
that capture the semantic and syntactic structure of
natural language. This enables the model to perform well on
a wide range of downstream NLP tasks with minimal
additional training. The models across the GPT versions all
follow this basic approach, but each version scales it up: more
layers and hence more parameters, more training data, and
longer training time.
A critical insight from the GPT LMs is that they
are excellent meta-learners and multi-task learners. As the
authors of GPT-3 explain in their paper, the model demonstrates
zero-shot, one-shot and few-shot in-context learning
at inference time, without any gradient updates. This
is truly mind-blowing!
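To illustrate what "in-context" means, the sketch below only assembles prompt text; no model is called and no weights change. The prompt layout and the helper `build_prompt` are assumptions for illustration, loosely modelled on the translation example in the GPT-3 paper.

```python
def build_prompt(task, examples, query):
    """Zero-shot when examples is empty, one-shot with one example, few-shot with several."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)

examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
zero_shot = build_prompt("Translate English to French:", [], "peppermint")
few_shot = build_prompt("Translate English to French:", examples, "peppermint")
print(few_shot)
```

The only difference between the zero-shot and few-shot settings is the text placed in the context window.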
Source: [2005.14165] Language Models are Few-Shot Learners (arxiv.org)
Paper: [2203.02155] Training language models to follow instructions
with human feedback (arxiv.org)
Next, several humans (referred to as labelers) from different
domains are engaged to create labelled data for different tasks. The
labelers are hired following a screening test, which is described
in the paper on InstructGPT, the precursor of ChatGPT (see paper above).
During this process, a labeler is shown a prompt from the prompt
dataset and demonstrates the desired output. These
prompt + labeler response pairs are used as a supervised dataset. Of
course, this is at a much smaller scale than pre-training, maybe thousands of examples.
The pre-trained auto-regressive model (GPT) is then fine-tuned
on this supervised dataset. This is
referred to as Supervised Fine Tuning (SFT).
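A rough sketch of what the fine-tuning objective could look like: the same next-token loss as in pre-training, averaged over the labeler's response tokens. Restricting the loss to response tokens is a common practice that I am assuming here; the slides and this summary do not confirm it for ChatGPT. The probabilities below are made up.

```python
import math

def sft_loss(token_probs, is_response):
    """Average negative log-likelihood over response tokens only.

    token_probs: probability the model assigned to each correct token.
    is_response: True for tokens written by the labeler, False for prompt tokens.
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    return sum(losses) / len(losses)

token_probs = [0.9, 0.8, 0.5, 0.25]        # made-up model probabilities
is_response = [False, False, True, True]   # first two tokens belong to the prompt
loss = sft_loss(token_probs, is_response)
print(round(loss, 4))
```

Training lowers this loss by raising the probability of the demonstrated response given the prompt.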
InstructGPT paper (OpenAI): [2203.02155] Training language models to follow instructions with human feedback (arxiv.org)
Prompt - A piece of text or a question that a user inputs to initiate a
conversation with the model. The prompt provides context for the model
to generate a response that is relevant and useful to the user.
The quality of the response generated by ChatGPT is highly dependent
on the quality and specificity of the prompt provided by the user.
Therefore, providing a clear and concise prompt can help ensure that
the model generates a response that meets the user's needs.
[Diagram: GPT + dataset of [Prompt, Response] pairs → supervised fine-tuning → SFT]
Well, the model is kind of ready, but its responses may be
misaligned with human values. Examples of human
values include honesty, compassion, fairness, respect,
freedom, responsibility, and loyalty. Ensuring that AI systems
are aligned with human values and goals can help to
promote ethical and responsible use of AI and avoid
potential negative consequences, such as bias or
unintended harm. As one can appreciate, this is quite a
challenging thing for algorithms to learn. To approach
this open-ended and challenging set of issues,
reinforcement learning from human feedback (RLHF) is
used. RLHF is a more recent approach in which the
reward signal is learned from human feedback. The
idea is to provide a way for humans to give feedback to the AI
system about whether its outputs align with their values and
preferences. The AI system can then use this feedback to
adjust its behaviour and improve its alignment with human
values over time. The feedback is captured in a reward model
(𝑟𝜃), which tends to assign a higher reward (a scalar value)
to generated text that is better aligned with human values.
The Reward model (𝑟𝜃) is implemented by taking the SFT
model and modifying it by replacing the unembedding layer
with one that outputs a numerical value (as scalar reward).
This reward can be used to assess the quality of the response.
InstructGPT paper (OpenAI): [2203.02155] Training language models to follow instructions with human feedback (arxiv.org) – Labeling interface
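Loss function for the reward model (as given in the InstructGPT paper):
$\mathrm{loss}(\theta) = -\frac{1}{\binom{K}{2}} \, E_{(x, y_w, y_l) \sim D}\left[\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$
Here x is a prompt, y_w is the response the labeler preferred over y_l, K is the number of responses ranked
per prompt, σ is the sigmoid function and D is the dataset of labeler comparisons.
The intuition is to compare two candidate responses and push the one that labelers thought was better
towards a higher score. The loss uses the dataset of comparisons
that labelers have already ranked for each prompt to express which responses are preferred.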
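The pairwise comparison idea can be sketched in a few lines. This assumes the reward model has already produced one scalar score per response and that the scores are listed from most to least preferred by the labeler. The scores are made up and `pairwise_loss` is a hypothetical helper, not the paper's code.

```python
import math
from itertools import combinations

def pairwise_loss(rewards_ranked):
    """rewards_ranked: scores for K responses to one prompt, best-ranked first.

    Averages -log(sigmoid(r_winner - r_loser)) over all K-choose-2 pairs.
    """
    pairs = list(combinations(rewards_ranked, 2))  # each pair is (winner, loser)
    total = 0.0
    for r_w, r_l in pairs:
        sigmoid = 1.0 / (1.0 + math.exp(-(r_w - r_l)))
        total += -math.log(sigmoid)
    return total / len(pairs)

agrees = pairwise_loss([2.0, 1.0, 0.0])     # scores follow the labeler's ranking
disagrees = pairwise_loss([0.0, 1.0, 2.0])  # scores contradict the ranking
print(round(agrees, 4), round(disagrees, 4))
```

The loss is small when the preferred response scores higher and large when the order is reversed, which is the signal that trains the reward model.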
In order to understand Reinforcement Learning from
Human Feedback (RLHF), let’s first understand the bare
basics of a reinforcement learning system. In this type of
machine learning, the task is to learn from experience
through trial and error. Let’s take autonomous driving as
an example to help understand the different
components of Reinforcement Learning (RL):
Environment: The environment in which the
autonomous vehicle operates, including the road,
weather, other vehicles, pedestrians, and obstacles.
State space: The set of possible states that the vehicle
can be in at any given time. This includes information
about the vehicle's speed, position, acceleration, and
other relevant sensor data.
Action space: The set of possible actions that the
vehicle can take. This includes turning the steering
wheel, applying the brakes, accelerating, and other
actions that the vehicle can perform.
Reward function: The function that evaluates the
performance of the vehicle based on a predefined set of
criteria. This includes staying within the lane,
maintaining a safe distance from other vehicles, and
reaching the destination as quickly and safely as
possible.
Policy: Part of the agent, the decision-making
algorithm that maps the current state of the vehicle to
the optimal action to take. This can be a neural network,
decision tree, or other machine learning algorithm.
Training data: The data used to train the reinforcement
learning algorithm. This includes real-world driving data,
simulated driving data, and other data sources.
Image source: https://www.oreilly.com/library/view/ros-robotics-projects/9781783554713/ch10s02.html
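To see how the components listed above interact, here is a toy loop, assuming a one-dimensional "lane" where position 0.0 is the lane centre. The policy is hand-written rather than learned, so this only shows the state, action, reward cycle and not a training algorithm. All names and numbers are made up for illustration.

```python
import random

def step(position, action):
    """Environment: applies the steering action plus a small random drift."""
    drift = random.uniform(-0.1, 0.1)
    new_position = position + action + drift
    reward = -abs(new_position)  # reward function: closer to the centre is better
    return new_position, reward

def policy(position):
    """Policy: maps the state to an action, clipped to the action space [-0.5, 0.5]."""
    return max(-0.5, min(0.5, -position))

random.seed(0)
position, total_reward = 2.0, 0.0  # initial state: far from the lane centre
trajectory = []
for _ in range(20):
    action = policy(position)                  # agent picks an action from the state
    position, reward = step(position, action)  # environment returns new state and reward
    total_reward += reward
    trajectory.append(position)
print(round(position, 3), round(total_reward, 3))
```

A learning algorithm would adjust the policy to increase the accumulated reward; here the policy is fixed to keep the example short.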
With the bare basics of RL in place, let’s see how RL is
used while factoring in human preferences.
Now, the reward model we saw earlier is used in
the reinforcement learning (RL) setup. Here, the
SFT model is further fine-tuned using the reward
model. It follows a policy gradient variant called
Proximal Policy Optimization (PPO).
In the context of policy-gradient RL training
of a language model:
• The action space is all the possible tokens in
the vocabulary of the SFT model.
• The state space is the set of possible input token
sequences, whose size is on the order of
(vocabulary size) ^ (maximum sequence length of
input x). This is a very large state space.
• The policy function takes the state from the
environment and returns a probability
distribution over actions. Here the policy
(π_φ^RL) is implemented as a language model that is
initialized from the SFT model. It takes a prompt
(x) as input and returns a sequence of tokens y
with probability π_φ^RL(y|x).
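To get a feel for how large this state space is, here is a back-of-the-envelope calculation. The numbers are illustrative assumptions (a 50,257-token vocabulary as used by GPT-2 and GPT-3, and a 2,048-token context); ChatGPT's actual values are not given in the slides.

```python
import math

vocab_size = 50257  # assumed for illustration
max_len = 2048      # assumed for illustration

# Number of decimal digits in vocab_size ** max_len, without computing the huge number.
digits = math.floor(max_len * math.log10(vocab_size)) + 1
print(f"vocab_size ** max_len has about {digits} decimal digits")
```

A number with thousands of digits cannot be enumerated, which is why the policy has to be a model that generalizes rather than a lookup table.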
Intuitively, the objective function does the
following: the reward model output is penalized by
how far the learned RL policy has drifted from the
SFT model (measured with a KL-divergence term).
This mitigates over-optimization against the reward
model and ensures that the generated text stays
close to that of the SFT model while being adjusted
for human preferences.
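The adjusted reward can be sketched for a single sampled response as follows. It assumes we already have the reward model score and the log-probability of the response under both the RL policy and the SFT model. The function name and the numbers, including `beta`, are illustrative; the InstructGPT objective also has a pre-training term that is left out here.

```python
def penalized_reward(reward, logprob_rl, logprob_sft, beta=0.02):
    """reward - beta * log(pi_RL(y|x) / pi_SFT(y|x)) for one sampled response y."""
    return reward - beta * (logprob_rl - logprob_sft)

# Same likelihood under both models: no penalty is applied.
same = penalized_reward(reward=1.0, logprob_rl=-2.0, logprob_sft=-2.0)

# The RL policy finds the response far more likely than the SFT model does,
# so part of the reward is taken away.
drifted = penalized_reward(reward=1.0, logprob_rl=-1.0, logprob_sft=-3.0, beta=0.1)
print(same, drifted)
```

Averaged over responses sampled from the RL policy, the log-ratio term is an estimate of the KL divergence between the RL policy and the SFT model, and `beta` controls how strongly drift is discouraged.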
InstructGPT paper (OpenAI): 2203.02155.pdf (arxiv.org) – Labeling interface
Well, as you must have already experienced, ChatGPT behaves
like a Swiss Army knife. It can perform different types of tasks like
brainstorming (e.g. create a 5-point strategy to start a
company that is based on applied AI), classification (e.g. rate
sarcasm in the text on a scale of 1 = not at all to 10 = extremely
sarcastic), information extraction (e.g. list all place names
from the article below), generation (e.g. write a creative ad for
the following product description, aimed at adults under 30,
to run on Facebook), rewriting (e.g. rewrite the
following text to be more light-hearted), open/closed QA
(e.g. what shape is the Earth?), role play (e.g. imagine you are
a leading astronaut, explain <followed by a specific
question>), summarization (e.g. summarize the following
information for an 8th grade student), etc.
In conclusion, the power comes from how the auto-regressive
model surprisingly exhibits meta learning and multi-task
learning capabilities coupled with the grounding to human
values using the RM in an RLHF setup as we saw during our
exploration.
With the pace of advancements happening in the AI space, so
much has happened since ChatGPT. Exciting times ahead! 

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Recently uploaded (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful components!

  • 1. Out of curiosity, I have been studying how ChatGPT works. I was pleased to learn that it is an innovation built on the foundation of many openly published research works within the AI community. By open research I mean a collaborative effort among researchers, developers, and enthusiasts who work together to advance the field of AI by sharing their work, data, and code openly. To name a few predominant works: the Transformer paper "Attention Is All You Need" by Vaswani et al. in 2017, the GPT series of papers by Radford et al. at OpenAI, "Deep reinforcement learning from human preferences" by Christiano et al. at OpenAI and DeepMind (arXiv v1 in 2017, v4 in 2023), etc. Models like ChatGPT also build on the kind of tooling found in open-source AI projects, including deep learning frameworks like PyTorch, TensorFlow, and Keras. These frameworks allow developers to build and train neural networks, which are the foundation of ChatGPT's ability to understand and generate natural language. The same ecosystem includes the Hugging Face Transformers library, an open-source library for building and using Transformer-based models for natural language processing tasks. Moreover, such models are trained on large datasets of text from sources such as books, articles, and websites. Several datasets of this kind are preprocessed and made available to other researchers and developers, although the exact training data of ChatGPT has not been published. This open approach has allowed for rapid progress in the field of AI and has made it possible to build powerful language models like ChatGPT that can understand and generate natural language with remarkable accuracy and fluency. My attempt is to share my learning and understanding in order to develop enough intuition for how these components fit together to achieve such a marvel.
References: • Introducing ChatGPT (openai.com) • InstructGPT paper (OpenAI): 2203.02155.pdf (arxiv.org) Let’s dive into the details!
  • 2. The significance of deep learning in contemporary AI lies in its ability to perform tasks that were previously difficult or impossible for traditional machine learning algorithms. Deep learning has been used to improve image and speech recognition, natural language processing, and autonomous driving, among other applications. It has also enabled the development of advanced AI systems, such as AlphaGo, which beat human champions at the game of Go. Importantly, deep neural networks are universal function approximators. This means that a network with a sufficient number of parameters can approximate any continuous function on a bounded domain, including highly nonlinear and complex ones, to an arbitrary degree of accuracy. One of the key advantages of deep learning is its ability to learn features automatically from raw data, which can save time and effort in feature engineering. Additionally, deep learning models can continue to improve their performance as they are exposed to more data, making them particularly useful in applications where data is abundant. As a result, deep learning has become a powerful tool for solving complex problems and driving innovation in AI. Image source: Deep learning - Wikipedia
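To make the approximation claim on slide 2 a little more concrete, here is a minimal sketch in Python. It hand-constructs (rather than trains) a one-hidden-layer ReLU network that interpolates sin(x), and shows that the worst-case error shrinks as hidden units are added. The target function, the interval, and the helper names `build_relu_net` and `max_error` are illustrative assumptions and do not come from the slides.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n_units):
    """Hand-construct a one-hidden-layer ReLU net that interpolates f at n_units + 1 knots."""
    step = (hi - lo) / n_units
    knots = [lo + i * step for i in range(n_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(n_units)]
    # The output weight of hidden unit i is the change in slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(lo)

    def net(x):
        # zip stops after n_units knots, so the last knot gets no unit.
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return net

def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute error on an evenly spaced grid."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)

small = build_relu_net(math.sin, 0.0, 2 * math.pi, 8)
large = build_relu_net(math.sin, 0.0, 2 * math.pi, 64)
err_small = max_error(math.sin, small, 0.0, 2 * math.pi)
err_large = max_error(math.sin, large, 0.0, 2 * math.pi)
print(err_small, err_large)
```

In practice the weights are learned by gradient descent instead of being set by hand; the sketch only illustrates why more parameters allow a closer fit.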
  • 3. Paper: [1706.03762] Attention Is All You Need (arxiv.org) The Transformer paper is arguably one of the most impactful research papers of the last few years. Its architecture has disrupted almost all subdomains of cognitive AI: natural language processing (NLP) tasks such as machine translation, question answering, and language understanding; computer vision tasks such as image classification and object detection; speech processing tasks such as Automatic Speech Recognition (ASR) and diarization; and even reinforcement learning, e.g. TransformRL. The Transformer architecture is a type of neural network that uses self-attention mechanisms to process sequential data, such as natural language. Instead of using recurrent or convolutional layers, the Transformer network consists of an encoder and a decoder, both composed of multiple layers of self-attention and feedforward neural networks. Intuitively, the self-attention mechanism allows a neural network to dynamically focus on different parts of the input data by computing the importance of each element (such as a word in a sentence) based on its relationship with all the other elements. This enables the network to process sequences of data effectively and adaptively, without relying on a fixed processing order.
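The self-attention idea on slide 3 can be sketched in a few lines of plain Python. This is a minimal single-head version of the scaled dot-product attention from the Transformer paper, without batching, multiple heads, or positional encodings. The toy input and the identity projection matrices are assumptions chosen only to keep the example readable.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def self_attention(x, w_q, w_k, w_v):
    """Return (outputs, attention weights) for a sequence of token vectors x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # similarity of every token with every other token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Three toy "tokens" with two features each; identity projections are illustrative.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` says how much one token attends to every token in the sequence, and each output vector is the corresponding weighted mix of the value vectors.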
  • 4. Papers: language_understanding_paper.pdf (openai.com), Language Models are Unsupervised Multitask Learners (openai.com), [2005.14165] Language Models are Few-Shot Learners (arxiv.org) Then comes the simple yet powerful and scalable idea of self-supervised learning. In this setup, the ML algorithm learns from unlabeled data by predicting certain aspects of the data, such as the next word in a sentence. This approach enables the development of models that can generalize well to new domains and tasks, without the need for labeled data. GPT, GPT-2 and GPT-3 apply this technique to text corpora of up to hundreds of billions of tokens (loosely, sub-words), much of it crawled from the Internet, to create what is called a base Language Model (LM). For training, only the decoder component of the Transformer is employed, in an auto-regressive manner. Intuitively, it means that the model is asked to predict the next word given a context of preceding words, and the process repeats over the humongous training data (books, articles, websites) without any explicit supervision or labels. Importantly, the decoder implements masked attention, which intuitively means that only the past tokens are used for causal self-attention and the future tokens are masked during the attention calculation. Source: language_understanding_paper.pdf (openai.com)
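The two ideas on slide 4, next-token prediction and causal masking, can be illustrated with a short Python sketch. It assumes toy attention scores that are all equal, and the helper names `causal_attention_weights` and `next_token_pairs` are hypothetical; a real decoder computes the scores from learned query and key projections.

```python
import math

NEG_INF = float("-inf")

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Mask future positions (j > i) before the softmax, so token i only sees tokens 0..i."""
    masked = [[s if j <= i else NEG_INF for j, s in enumerate(row)]
              for i, row in enumerate(scores)]
    return [softmax(row) for row in masked]

def next_token_pairs(tokens):
    """Self-supervised training pairs: (context so far, next token to predict)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

scores = [[0.0, 0.0, 0.0] for _ in range(3)]  # toy, uniform attention scores
weights = causal_attention_weights(scores)
pairs = next_token_pairs(["the", "cat", "sat"])
print(weights)
print(pairs)
```

The labels come from the text itself: every prefix of a sentence is an input and the token that follows it is the target, which is why no human annotation is needed at this stage.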
  • 5. Papers: language_understanding_paper.pdf (openai.com), Language Models are Unsupervised Multitask Learners (openai.com), [2005.14165] Language Models are Few-Shot Learners (arxiv.org) As an astonishing result of this simple training approach, the model performs what is popularly known as representation learning, i.e., it learns high-quality text representations that capture the semantic and syntactic structure of natural language. This enables the model to perform well on a wide range of downstream NLP tasks with minimal additional training. The models across the GPT versions all follow this basic approach, but each version scales up the number of layers (and hence parameters), the data size, and the training time. A critical insight from the GPT LMs is that they are excellent meta-learners and multi-task learners. As the authors of GPT-3 explain in their paper, the model demonstrates zero-shot, one-shot and few-shot in-context learning at inference time without any gradient updates. This is truly mind-blowing! Source: [2005.14165] Language Models are Few-Shot Learners (arxiv.org)
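The difference between zero-shot and few-shot in-context learning on slide 5 comes down to what is placed in the prompt. The sketch below only assembles prompt strings and does not call any model. The `build_prompt` helper and the `=>` layout are illustrative assumptions, loosely modelled on the translation example in the GPT-3 paper.

```python
def build_prompt(task, examples, query):
    """Assemble a prompt: task description, optional solved examples, then the open query."""
    lines = [task]
    lines += [f"{source} => {target}" for source, target in examples]
    lines.append(f"{query} =>")
    return "\n".join(lines)

task = "Translate English to French:"
zero_shot = build_prompt(task, [], "cheese")
few_shot = build_prompt(
    task,
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(zero_shot)
print(few_shot)
```

In both cases the model weights stay fixed; the solved examples act only as context that conditions the next-token predictions.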
  • 6. Paper: [2203.02155] Training language models to follow instructions with human feedback (arxiv.org) Next, several humans (referred to as labelers) from different domains are engaged to create labelled data for different tasks. The labelers are hired following a screening test, which is described in the paper on InstructGPT, the precursor of ChatGPT (see paper above). During this process, a labeler is shown a prompt from the prompt dataset and demonstrates the desired output. These prompt + labeler response pairs are used as a supervised dataset, of course at a much smaller scale than pre-training, perhaps thousands to tens of thousands of examples. The pre-trained auto-regressive model (GPT) is used as the base and is fine-tuned on this supervised dataset. This is referred to as Supervised Fine-Tuning (SFT). Source: InstructGPT paper (OpenAI): [2203.02155] Training language models to follow instructions with human feedback (arxiv.org) Prompt: a piece of text or a question that a user inputs to initiate a conversation with the model. The prompt provides context for the model to generate a response that is relevant and useful to the user. The quality of the response generated by ChatGPT is highly dependent on the quality and specificity of the prompt provided by the user. Therefore, providing a clear and concise prompt can help ensure that the model generates a response that meets the user's needs. Diagram: GPT + [Prompt, Response] pairs dataset → Supervised fine-tuning → SFT
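The supervised fine-tuning step on slide 6 uses the same next-token objective as pre-training, applied to the labeler demonstrations. Here is a minimal sketch of that loss in Python. It assumes we already have the probability the model assigned to each demonstrated token, and it scores only the response tokens; masking the prompt tokens is a common practice assumed here, not something stated on the slide, and the numbers are made up.

```python
import math

def sft_loss(target_probs, is_response):
    """Average negative log-likelihood of the demonstrated tokens, over response positions only."""
    terms = [-math.log(p) for p, keep in zip(target_probs, is_response) if keep]
    return sum(terms) / len(terms)

# Toy example: 2 prompt tokens followed by 3 response tokens.
# Each value is the (made-up) probability the model gave to the labeler's token.
target_probs = [0.1, 0.2, 0.5, 0.5, 0.5]
is_response = [False, False, True, True, True]
loss = sft_loss(target_probs, is_response)
print(loss)
```

Fine-tuning lowers this loss by raising the probability of the tokens the labeler actually wrote, which nudges the base model toward the demonstrated style of answer.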
  • 7. Paper: [2203.02155] Training language models to follow instructions with human feedback (arxiv.org) Well, the model is kind of ready, but its responses may be misaligned with human values. Examples of human values include honesty, compassion, fairness, respect, freedom, responsibility, and loyalty. Ensuring that AI systems are aligned with human values and goals can help to promote ethical and responsible use of AI and avoid potential negative consequences, such as bias or unintended harm. As one can appreciate, this is quite a challenging thing for algorithms to learn. To approach this open-ended and challenging set of issues, reinforcement learning from human feedback (RLHF) is used. RLHF is a more recent approach in which the reward signal is learned from human feedback. The idea is to provide a way for humans to give feedback to the AI system about whether its actions align with their values and preferences. The AI system can then use this feedback to adjust its behaviour and improve its alignment with human values over time. The feedback is captured in a reward model (r_θ), which tends to assign a higher reward (a scalar value) to generated text that is better aligned with human values. The reward model (r_θ) is implemented by taking the SFT model and replacing its unembedding layer with one that outputs a single numerical value (the scalar reward). This reward can be used to assess the quality of a response. Source: InstructGPT paper (OpenAI): [2203.02155] Training language models to follow instructions with human feedback (arxiv.org), labeling interface. Loss function for the reward model: the intuition is to compare two possible responses to the same prompt and push the one that labelers preferred towards a higher score. The loss is computed over the dataset of comparisons that labelers have already ranked for each prompt.
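For reference, the reward model loss in the InstructGPT paper has the pairwise form loss(θ) = −(1 / C(K,2)) · E[ log σ( r_θ(x, y_w) − r_θ(x, y_l) ) ], where y_w is the response the labeler preferred over y_l and K is the number of responses ranked per prompt. The Python sketch below evaluates that loss for a single prompt. It assumes the scalar rewards have already been computed, and the function names and reward values are illustrative.

```python
import math
from itertools import combinations

def pairwise_loss(r_preferred, r_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-difference))."""
    return math.log1p(math.exp(-(r_preferred - r_rejected)))

def reward_model_loss(ranked_rewards):
    """ranked_rewards: rewards for K responses to one prompt, ordered best to worst by the labeler."""
    # combinations keeps the input order, so the first element of each pair is the preferred one.
    pairs = list(combinations(ranked_rewards, 2))
    return sum(pairwise_loss(w, l) for w, l in pairs) / len(pairs)

agrees = reward_model_loss([2.0, 1.0, 0.0])     # rewards follow the labeler ranking
disagrees = reward_model_loss([0.0, 1.0, 2.0])  # rewards contradict the labeler ranking
print(agrees, disagrees)
```

The loss is small when the reward model scores responses in the same order as the labeler and large when it reverses that order, which is exactly the signal used to train r_θ.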
  • 8. In order to understand Reinforcement Learning from Human Feedback (RLHF), let’s first understand the bare basics of a reinforcement learning system. In this type of machine learning, the task is to learn from experience through trial and error. Let’s take autonomous driving as an example to help understand the different components of Reinforcement Learning (RL): Environment: The environment in which the autonomous vehicle operates, including the road, weather, other vehicles, pedestrians, and obstacles. State space: The set of possible states that the vehicle can be in at any given time. This includes information about the vehicle's speed, position, acceleration, and other relevant sensor data. Action space: The set of possible actions that the vehicle can take. This includes turning the steering wheel, applying the brakes, accelerating, and other actions that the vehicle can perform. Reward function: The function that evaluates the performance of the vehicle based on a predefined set of criteria. This includes staying within the lane, maintaining a safe distance from other vehicles, and reaching the destination as quickly and safely as possible. Policy: Part of the agent, the decision-making algorithm that maps the current state of the vehicle to the action to take. This can be a neural network, decision tree, or other machine learning algorithm. Training data: The data used to train the reinforcement learning algorithm. This includes real-world driving data, simulated driving data, and other data sources. Image source: https://www.oreilly.com/library/view/ros-robotics-projects/9781783554713/ch10s02.html
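The components listed on slide 8 can be wired together in a tiny Python sketch. It uses a deliberately simplified lane-keeping toy: the state is the lateral offset from the lane centre, the actions are steer left, straight, or right, and the reward penalizes distance from the centre. The environment, the hand-written policy, and all names are assumptions for illustration; no learning takes place here.

```python
def step(offset, action):
    """Environment: apply a steering action (-1, 0, +1) and return (new state, reward)."""
    new_offset = max(-2, min(2, offset + action))  # state space: offsets -2..2
    reward = -abs(new_offset)                      # reward function: stay near the centre
    return new_offset, reward

def policy(offset):
    """Policy: map the current state to an action (here, always steer towards the centre)."""
    if offset > 0:
        return -1
    if offset < 0:
        return 1
    return 0

def run_episode(start_offset, n_steps):
    """Agent-environment loop: observe state, act, receive reward, repeat."""
    offset, total = start_offset, 0
    for _ in range(n_steps):
        action = policy(offset)
        offset, reward = step(offset, action)
        total += reward
    return offset, total

final_offset, total_reward = run_episode(2, 5)
print(final_offset, total_reward)
```

An RL algorithm would start from a poor policy and adjust it over many such episodes so that the accumulated reward increases.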
  • 9. With the bare basics of RL in place, let’s see how RL is used to factor in human preferences. The reward model we saw earlier is now used in the reinforcement learning (RL) setup: the SFT model is further fine-tuned using the reward model. Training follows a policy gradient variant called Proximal Policy Optimization (PPO). In the context of policy gradient RL training of a language model, • Action space is all the possible tokens from the vocabulary of the SFT model. • State space is the set of possible input token sequences, whose size is on the order of (vocabulary size) ^ (maximum sequence length of input x). This is a very large state space. • Policy function takes the state from the environment and returns a probability distribution over actions. Here the policy (π_φ^RL) is implemented as a language model that is initialized from the SFT model. It takes a prompt (x) as input and returns a sequence of tokens with their probability distributions (π_φ^RL(y|x)). Intuitively, the objective function does the following: the reward model output is penalized by how far the learned RL policy has drifted from the SFT model (measured with a KL-divergence term). This mitigates over-optimization against the reward model and ensures that the generated text stays close to that of the SFT model while being adjusted for human preferences. Source: InstructGPT paper (OpenAI): 2203.02155.pdf (arxiv.org) – Labeling interface
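The KL-adjusted objective on slide 9 can be written, for a single prompt x and sampled response y, as r_θ(x, y) − β · log( π_φ^RL(y|x) / π^SFT(y|x) ). The Python sketch below computes that quantity from per-token log-probabilities. The coefficient β = 0.1 and all numbers are illustrative assumptions, and the additional pre-training term of the InstructGPT objective is left out.

```python
def kl_penalized_reward(reward, logp_rl, logp_sft, beta):
    """Reward model score minus beta times the log-ratio between the RL policy and the SFT model."""
    # The log-probability of a sequence is the sum of its per-token log-probabilities.
    log_ratio = sum(rl - sft for rl, sft in zip(logp_rl, logp_sft))
    return reward - beta * log_ratio

reward = 1.0                 # made-up scalar from the reward model for (prompt, response)
logp_sft = [-0.5, -0.7]      # made-up per-token log-probs under the SFT model
logp_rl = [-0.1, -0.2]       # the RL policy has become more confident in the same tokens

drifted = kl_penalized_reward(reward, logp_rl, logp_sft, beta=0.1)
unchanged = kl_penalized_reward(reward, logp_sft, logp_sft, beta=0.1)
print(drifted, unchanged)
```

When the RL policy matches the SFT model the penalty vanishes; the further the policy drifts in order to chase reward, the more of that reward is taken back.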
  • 10. Well, as you must have already experienced, ChatGPT behaves like a Swiss Army knife. It can perform different types of tasks like brainstorming (e.g. create a 5-point strategy to start a company that is based on applied AI), classification (e.g. rate the sarcasm in the text on a scale of 1 = not at all to 10 = extremely sarcastic), information extraction (e.g. read all place names from the article below), generation (e.g. write a creative ad for the following product description, aimed at adults under 30, to run on Facebook), rewriting (e.g. rewrite the following text to be more light-hearted), open/closed QA (e.g. what shape is the earth?), role play (e.g. imagine you are a leading astronaut, explain <followed by a specific question>), summarization (e.g. summarize the following information for an 8th grade student), etc. In conclusion, the power comes from how the auto-regressive model surprisingly exhibits meta-learning and multi-task learning capabilities, coupled with the grounding in human values provided by the reward model (RM) in an RLHF setup, as we saw during our exploration. With the pace of advancements happening in the AI space, so much has happened since ChatGPT. Exciting times ahead!