What Is OpenAI’s ChatGPT and How Is It Built? The Methodology Behind It

Dilip Kashyap · Published in Level Up Coding · Dec 31, 2022


Image Source: Amarujala

We’ve all heard about ChatGPT, the trending artificial intelligence program: a powerful AI that can give you a well-crafted answer to almost any question.

In this article, we’ll take a look at how it works and the methodology behind it.

About ChatGPT

ChatGPT is a chatbot developed by OpenAI on top of GPT-3.5, a large language model. It has a remarkable ability to hold conversational dialogue and provide responses that can seem surprisingly human.

Large language models perform the task of predicting the next word in a sequence of words.
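
To make that concrete, here is a tiny sketch of next-word prediction using the open-source GPT-2 model (a much smaller predecessor of GPT-3.5) via the Hugging Face transformers library. ChatGPT’s own model is not publicly downloadable, so GPT-2 merely stands in here to illustrate the same underlying task.

```python
# Next-word (next-token) prediction with GPT-2 via Hugging Face transformers.
# GPT-2 is an illustrative stand-in; ChatGPT's weights are not public.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model scores every token in its vocabulary; the highest-scoring
# token is its prediction for the next word.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))  # e.g. " Paris"
```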

Reinforcement learning from human feedback (RLHF) is an additional layer of training that uses human feedback to teach ChatGPT to follow instructions and to generate responses that humans find satisfying.

Image Source: Wikipedia

ChatGPT Creator

ChatGPT was created by San Francisco-based artificial intelligence company OpenAI. OpenAI Inc. is the non-profit parent company of the for-profit OpenAI LP.

OpenAI is also well known for DALL·E, a deep-learning model that generates images from text instructions called prompts.

The CEO is Sam Altman, who was previously the president of Y Combinator.

Microsoft is a partner and has invested $1 billion in the company. The two jointly developed the Azure AI Platform.

ChatGPT Training Model

GPT-3.5 was trained on vast amounts of code and text from the internet, including sources such as Reddit discussions, which helped ChatGPT learn dialogue and a human-like response style.

ChatGPT was also trained with human feedback (the Reinforcement Learning from Human Feedback technique mentioned above) so that the AI learned what people expect when they ask a question. Training an LLM this way is revolutionary because it goes beyond simply training it to predict the next word.

Image Source: ChatGPT

A March 2022 research paper entitled Training Language Models to Follow Instructions with Human Feedback explains why this is a breakthrough approach:

This work is motivated by our goal to increase the positive impact of large language models by training them to do what a given group of people wants them to do.

By default, language models optimize for the goal of predicting the next word, which is just a proxy for what we want these models to do.

Our results suggest that our techniques hold promise for making language models more helpful, truthful, and harmless.

Making language models bigger does not, by itself, make them better at following user intent.

For example, large language models can generate output that is untruthful, toxic, or simply not helpful to the user.

In other words, these models are not aligned with their users.

The engineers who built ChatGPT hired contractors (called labelers) to evaluate the outputs of two systems: GPT-3 and the new InstructGPT (ChatGPT’s “sibling model”).

Based on the evaluation, the researchers came to the following conclusions:

Labelers strongly prefer InstructGPT outputs over GPT-3 outputs.

InstructGPT models show improvements in truthfulness over GPT-3.

InstructGPT shows small improvements in toxicity over GPT-3, but not in bias.

The research paper concludes that the results for InstructGPT were positive, but it also notes that there is room for improvement.

Overall, our results suggest that fine-tuning large language models using human preferences significantly improves their behavior across a wide range of tasks, although much work remains to be done to improve their safety and reliability.

What sets ChatGPT apart from a simple chatbot is that it has been specially trained to understand the human intent behind a question and to provide helpful, truthful, and harmless answers.

Because of this training, ChatGPT may challenge certain questions and reject parts of a question that do not make sense.

Another research paper related to ChatGPT shows how OpenAI’s researchers trained an AI to predict what people prefer.

The researchers noticed that the metrics used to evaluate AI output for natural language processing resulted in machines that scored well on the metrics but fell short of what humans expected.

The researchers explained the problem as follows:

Many machine learning applications optimize for simple metrics that are only rough proxies for what the designer intended. This can lead to problems, such as YouTube recommendations promoting clickbait.
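
As a toy illustration of that proxy problem (my own example, not one from the paper), here is a crude ROUGE-1-style word-overlap score in plain Python. Notice how a repetitive, useless “summary” can outscore a genuinely good paraphrase:

```python
# A ROUGE-1-like unigram F1 score: a simple overlap metric that is only
# a rough proxy for summary quality. This example is illustrative.
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall against the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the company reported record profits this quarter"
good_summary = "profits hit a record this quarter"  # faithful, concise paraphrase
degenerate = "the company reported the company reported record profits this quarter"

print(round(unigram_f1(good_summary, reference), 2))  # 0.62
print(round(unigram_f1(degenerate, reference), 2))    # 0.82 -- higher, yet worse
```

The stuttering copy scores about 0.82 while the good paraphrase scores about 0.62, which is exactly the kind of metric gaming the researchers wanted to escape by optimizing for human preferences instead.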

The solution they proposed was to create an AI that could provide answers optimized for what people prefer.

To do this, they trained the AI using datasets of human comparisons between different answers, so the model became better at predicting which answers people would find satisfactory.
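
Here is a minimal PyTorch sketch of that idea: training a reward model on pairs of answers where humans marked one as preferred. Everything concrete below (the tiny network, the random stand-in features) is a hypothetical placeholder; in the real system the reward model is itself a large transformer that reads the prompt and response text.

```python
# Training a toy reward model from pairwise human comparisons.
# The network and features are illustrative stand-ins, not OpenAI's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each pair: features of the answer humans preferred ("chosen") and of
# the answer they rejected. Here they are random placeholders.
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for step in range(100):
    r_chosen = reward_model(chosen)      # scalar score per preferred answer
    r_rejected = reward_model(rejected)  # scalar score per rejected answer
    # Pairwise (Bradley-Terry style) loss: push chosen scores above rejected ones.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```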

The paper shares that the training was done by summarizing Reddit posts and that the model was also tested on news summaries.

That research paper, Learning to Summarize from Human Feedback, was first published in 2020 and revised in February 2022.

The researchers write:

In this work, we show that it is possible to significantly improve the quality of summaries by training a model to optimize for human preferences.

We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.
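
Once such a reward model exists, reinforcement learning pushes the generator toward outputs the reward model scores highly. The deliberately tiny REINFORCE loop below is only a mechanic demo (the paper itself uses the more sophisticated PPO algorithm, and the reward scores here are made up): the “policy” simply learns to pick whichever of three canned summaries the pretend reward model rates best.

```python
# A toy REINFORCE loop: the policy learns to favor the candidate with the
# highest (made-up) reward-model score. Illustrative only; the paper uses PPO.
import torch

candidate_rewards = torch.tensor([0.1, 0.9, 0.4])  # pretend reward-model scores
logits = torch.zeros(3, requires_grad=True)        # policy starts indifferent
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                  # policy "generates" a summary
    reward = candidate_rewards[action]      # reward model scores it
    loss = -dist.log_prob(action) * reward  # reinforce high-reward choices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # most probability mass on the 0.9 candidate
```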

I hope you found this article informative. In a follow-up article, we will look at the limitations of ChatGPT and whether it can replace Google for answering search queries.

For more such articles, please upvote, follow, and share this with your friends.

If you are interested in learning Google Apps Script and automating your Google Workspace, you should try this e-Book: “Google Apps Script: A Beginner’s Guide”.

Happy Learning 😁✌️

For any query related to this article, or for any other technical suggestions, you may send an email to dilipkashyap.sd@gmail.com.

Level Up Coding

Thanks for being a part of our community! Before you go:

🚀👉 Join the Level Up talent collective and find an amazing job

