GPT-3: The AI that does magic

Ravi Sankar
Aug 19, 2021

GPT-3 (Generative Pre-trained Transformer 3) is a language model that uses deep learning to produce human-like text. The quality of the text it generates is so high that it can be difficult to distinguish from text written by a human. It was tough to get an API key from the OpenAI team, but we did it. Here is what we learned about the capabilities of GPT-3 as we explored it.

I was awestruck by the way it conversed with me. Enjoy!

The OpenAI Playground offers many ready-made prompt presets for GPT-3. These presets specialize in different use cases such as classification, chatbots, summarization, question answering, code generation, etc.

Use Cases

Let us explore three use cases of GPT-3.

Grammar Correction

The grammar correction preset not only fixes grammatical errors but also works as a spellchecker.

Input Sentence

We spent last week at beach and had so much of fun on the sand and in water exploring the coast we weren't prepared for rain and came to vacation. The best parts of the trip was the opportunity to sightsee and relax.

Output (Standard American English)

We spent last week at the beach and had so much fun on the sand and in the water exploring the coast. We weren't prepared for the rain and came to the vacation. The best parts of the trip were the opportunity to sightsee and relax.

Generating Ad Copy from a Product Description

This preset turns a product description into ad copy.

Input prompt describing the brand

Write a creative ad for the following product to run on Facebook
Decathalon is a line of sports products for every available sport. The products are cheap and the best.
This is the ad I wrote for Facebook aimed at teenagers

Output ad copy generated from the product description

Decathalon has everything you need to be the best athlete you can be. The products are cheap and the best. Decathalon is the best sports brand. Decathalon is the best sports brand because it has everything you need to be the best athlete you can be

Answering Factual Questions

This prompt guides the model toward factual answers by showing it how to respond to questions that fall outside its knowledge. Answering '?' for words and phrases it doesn't know gives a natural response that seems to work better than more abstract replies.

Question: How many moons does Mars have?
GPT-3: Two, Phobos and Deimos.

Question: What is a language model?
GPT-3: A language model is a statistical model that describes the probability of a word given a context.

Question: Who is Fred Rickerson?
GPT-3: ?
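The pattern above (a few worked questions, with '?' as the answer for something unknown) can be assembled programmatically before sending it to the model. Below is a minimal sketch; the function name build_factual_qa_prompt, the instruction sentence, and the "Q:"/"A:" labels are hypothetical choices for illustration, not the exact text of OpenAI's preset.

```python
def build_factual_qa_prompt(examples, question):
    """Build a few-shot prompt that shows the model how to answer,
    including '?' for questions it cannot answer."""
    # Hypothetical instruction line; the wording is an illustrative assumption
    parts = ["Answer each question factually. If you do not know the answer, reply with '?'.", ""]
    for q, a in examples:
        parts.append(f"Q: {q}")
        parts.append(f"A: {a}")
        parts.append("")
    # End with the new question and an open "A:" for the model to complete
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


examples = [
    ("How many moons does Mars have?", "Two, Phobos and Deimos."),
    ("Who is Fred Rickerson?", "?"),
]
prompt = build_factual_qa_prompt(examples, "What is a language model?")
print(prompt)
```

Including at least one '?' example is the point of the design: the model imitates the examples, so it needs to see that "I don't know" is an acceptable answer.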

The OpenAI Playground

The Playground is a web interface where you can configure and run prompts against the pre-existing models or your own fine-tuned models.

Available presets

Q&A: This prompt creates a question-and-answer structure for answering questions based on existing knowledge.
Chat: Open-ended conversation with an AI assistant.
Grammar correction: This zero-shot prompt corrects sentences into standard English.
Summarize for a 2nd grader: This prompt translates difficult text into simpler concepts.
Text to command: This prompt translates text into programmatic commands.
English to French: This prompt translates English text into French.
Parse unstructured data: Create tables from long-form text by specifying a structure and supplying some examples.
Classification: Classify items into categories via example.

You can find all of the presets on OpenAI's examples page.

API Pricing

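Pricing is based on the number of tokens processed, counting both the tokens in your prompt and the tokens in the generated completion. A token is a piece of a word; for English text, one token is roughly four characters, or about three-quarters of a word.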
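To get a feel for what a request costs before sending it, you can estimate tokens from character count. This is a rough sketch only: the 4-characters-per-token ratio is a rule of thumb for English (exact counts come from the tokenizer), the function names are hypothetical, and price_per_1k_tokens is a parameter because prices differ per model and change over time. The 0.06 below is a placeholder, not a quoted price.

```python
def estimate_tokens(text):
    """Rough token estimate for English text: about 4 characters per token."""
    return round(len(text) / 4)


def estimate_cost(prompt, completion_tokens, price_per_1k_tokens):
    """Estimate the cost of one request; prompt and completion tokens are both billed."""
    total_tokens = estimate_tokens(prompt) + completion_tokens
    return total_tokens / 1000 * price_per_1k_tokens


prompt = "a" * 400  # stand-in for a 400-character prompt (about 100 tokens)
# 100 prompt tokens + 64 completion tokens at a placeholder price of 0.06 per 1K tokens
print(estimate_cost(prompt, completion_tokens=64, price_per_1k_tokens=0.06))
```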

If you would like to build with GPT-3, get in touch with us.

For the "NERD" in you!

Configurations

Models available

Davinci: good at complex intent, cause and effect, summarization for audience
Curie: good at language translation, complex classification, text sentiment, summarization
Babbage: good at moderate classification, semantic search classification
Ada: good at parsing text, simple classification, address correction, keywords

Temperature

One of the most important settings for controlling the output of the GPT-3 engine is the temperature, which controls the randomness of the generated text. A value of 0 makes the engine essentially deterministic: it always picks the most likely next token, so it generates the same output for a given input text. Values closer to 1 make the engine take more risks and produce more creative text.
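The effect of temperature can be illustrated with the general technique of temperature sampling. This is a sketch of the idea, not OpenAI's code: the model assigns a raw score (a logit) to each candidate token, the scores are divided by the temperature and turned into probabilities with a softmax, and the next token is drawn from those probabilities. The function names and the toy logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened (low T) or flattened (high T)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 falls back to greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # most of the mass on index 0
print(softmax_with_temperature(logits, 1.0))  # flatter distribution
print(sample_next_token(logits, 0))  # → 0 (always the top-scoring token)
```

Lower temperatures push probability toward the top-scoring token, which is why low settings give repeatable, conservative text and higher settings give more varied text.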

Response length

You will probably notice that GPT-3 often stops in the middle of a sentence. This usually happens because it has hit the response length limit. To control how much text is generated, use the “Response Length” setting.

The default response length is 64, which means GPT-3 will add at most 64 tokens to the text. A token is usually a word or a piece of a word, and punctuation marks are typically tokens of their own; for English text, 64 tokens is roughly 48 words.
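If you would rather keep a short response length than raise it, one simple workaround is to trim the completion back to its last complete sentence. This is a hypothetical helper, not a Playground or API feature; it treats '.', '!' and '?' as sentence ends, so abbreviations such as "e.g." can cause it to cut earlier than intended.

```python
def trim_to_last_sentence(text):
    """Cut a completion back to its last '.', '!' or '?' so a response that
    hit the length limit does not end mid-sentence."""
    cut = max(text.rfind(p) for p in ".!?")
    # If no sentence end is found, return the text unchanged
    return text if cut == -1 else text[: cut + 1]


completion = "Decathalon has everything you need. The products are cheap and"
print(trim_to_last_sentence(completion))  # → Decathalon has everything you need.
```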

Frequency and Presence penalties

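The “Frequency Penalty” and “Presence Penalty” sliders control how much repetition GPT-3 is allowed in its responses. The frequency penalty lowers the chance of a token in proportion to how many times it has already appeared, which discourages repeating the same words verbatim. The presence penalty applies a one-off penalty to any token that has appeared at least once, which nudges the model toward new topics.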
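OpenAI's API documentation describes both penalties as an adjustment to each token's logit before sampling: subtract the token's count times the frequency penalty, and subtract the presence penalty once if the count is above zero. Below is a minimal sketch of that adjustment as the documentation describes it; the real implementation works on token IDs on OpenAI's servers, and the function name and toy data here are hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Lower the logits of tokens that already appeared in the generated text.

    logits: dict mapping token -> raw score for the next position
    generated_tokens: tokens produced so far
    """
    counts = Counter(generated_tokens)  # missing tokens count as 0
    adjusted = {}
    for token, score in logits.items():
        c = counts[token]
        # The frequency penalty grows with every repetition;
        # the presence penalty is a flat cost once the token has appeared at all
        adjusted[token] = score - c * frequency_penalty - (1.0 if c > 0 else 0.0) * presence_penalty
    return adjusted


logits = {"best": 3.0, "brand": 2.5, "sports": 2.0}
history = ["best", "sports", "best", "brand", "best"]
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.5))
# → {'best': 1.0, 'brand': 1.5, 'sports': 1.0}
```

In this toy example "best" has appeared three times, so it drops from the top score to below "brand", which is the kind of correction that would help with a repetitive ad like the Decathalon one above.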

Fine-Tuning

To start a fine-tuning job, follow these steps:

Installation
pip install --upgrade openai

Exporting the API key

export OPENAI_API_KEY="<OPENAI_API_KEY>"

Preparing training data

Your training data must be a JSONL document, in which each line is a prompt-completion pair:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
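If your examples live in Python, the standard json module is enough to write this format. A minimal sketch, assuming your data is a list of (prompt, completion) pairs; the function name, the "Question:/Answer:" wording, and the train.jsonl file name are hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")


pairs = [
    ("Question: How many moons does Mars have?\nAnswer:", "Two, Phobos and Deimos."),
    ("Question: Who is Fred Rickerson?\nAnswer:", "?"),
]
path = os.path.join(tempfile.gettempdir(), "train.jsonl")  # hypothetical file name
write_jsonl(pairs, path)
```

json.dumps escapes the newline inside each prompt, so every example stays on exactly one line, which is what the JSONL format requires.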

You can use the CLI tool to prepare your data:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

This tool accepts different formats, with the only requirement that they contain a prompt and a completion column/key. You can pass a CSV, TSV, XLSX, JSON or JSONL file, and it will save the output into a JSONL file ready for fine-tuning, after guiding you through the process of suggested changes.
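Since the only hard requirement is a prompt and a completion key per record, a quick local check can catch broken lines before you run the CLI. This is a hypothetical sketch, not part of the openai package, and it does not replace prepare_data, which also suggests formatting changes.

```python
import json
import os
import tempfile


def validate_jsonl(path):
    """Return a list of problems: lines that are not JSON objects or that
    lack a 'prompt' or 'completion' key."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {n}: invalid JSON ({err.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {n}: not a JSON object")
                continue
            for key in ("prompt", "completion"):
                if key not in record:
                    problems.append(f"line {n}: missing '{key}'")
    return problems


path = os.path.join(tempfile.gettempdir(), "check.jsonl")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write('{"prompt": "Q: Hi", "completion": "Hello"}\n')
    f.write('{"prompt": "Q: Bye"}\n')
print(validate_jsonl(path))  # → ["line 2: missing 'completion'"]
```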

Creating the fine-tuned model

Once your data is prepared you can start your fine-tuning job using the OpenAI CLI:

openai api fine_tunes.create -t <TRAIN_FILE_OR_PATH> -m <MODEL>

Running the above command does several things:

Uploads the file using the files API (or uses an already-uploaded file)

Creates a fine-tuning job

Streams events until the job is done (this often takes minutes, but can take hours if there are many jobs in the queue or your dataset is large)

Generally, it takes minutes to a couple of hours for a training job to be completed.

You can follow the progress at any time:

openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>

To get the status of your job:

openai api fine_tunes.get -i <YOUR_FINE_TUNE_JOB_ID>

You can cancel a job at any time:

openai api fine_tunes.cancel -i <YOUR_FINE_TUNE_JOB_ID>

To list all of your fine-tuning jobs:

openai api fine_tunes.list

Once training is done and the model is available, you can query it from the CLI:

openai api completions.create -m <FINE_TUNED_MODEL> -p <YOUR_PROMPT>

You can also use this Python script to get your answer:

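import os
import openai

# Uses the pre-1.0 openai-python interface; openai.Completion was removed in v1.0
openai.api_key = os.getenv("OPENAI_API_KEY")

# Placeholders: replace with your fine-tuned model name and your prompt
FINE_TUNED_MODEL = "<FINE_TUNED_MODEL>"
YOUR_PROMPT = "<YOUR_PROMPT>"

response = openai.Completion.create(
    model=FINE_TUNED_MODEL,  # fine-tuned models are passed as `model`, not `engine`
    prompt=YOUR_PROMPT,
    temperature=0.7,         # example value: randomness of the output
    max_tokens=64,           # example value: maximum length of the response
    top_p=1,                 # nucleus sampling cutoff (1 = consider all tokens)
    frequency_penalty=0,     # penalize tokens by how often they already appeared
    presence_penalty=0,      # penalize tokens that appeared at all
)

# The generated text is in the first choice
print(response["choices"][0]["text"])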

GPT-3 internal details

GPT-3 is a language model trained in an unsupervised manner to predict the next token (roughly, the next word).

GPT-3 is an autoregressive language model with 175 billion parameters.
It consists of 96 transformer decoder layers, each with 96 attention heads.
The word embedding vector has a length of 12288, and the context window is 2048 tokens.
Like the Sparse Transformer, it alternates dense and locally banded sparse attention patterns across its layers.
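These numbers roughly account for the 175 billion parameters. The sketch below uses the common approximation of 12 × d_model² weights per layer (4 × d_model² for the query, key, value and output projections, plus 8 × d_model² for a feed-forward layer 4× wider than the model), ignoring biases and layer norms. The vocabulary size of 50257 is an assumption: it is the GPT-2 byte-pair vocabulary, which GPT-3 reuses.

```python
n_layers = 96
d_model = 12288
n_heads = 96
n_ctx = 2048
vocab_size = 50257  # assumption: GPT-2's BPE vocabulary size

d_head = d_model // n_heads  # size of each attention head → 128

# Per layer: attention projections (4 * d^2) + feed-forward (8 * d^2),
# ignoring biases and layer-norm parameters
per_layer = 12 * d_model ** 2
blocks = n_layers * per_layer
embeddings = vocab_size * d_model + n_ctx * d_model  # token + position embeddings

total = blocks + embeddings
print(f"head size: {d_head}")
print(f"approx. parameters: {total / 1e9:.1f} billion")  # → 174.6 billion
```

Almost all of the parameters sit in the 96 decoder layers; the embeddings contribute well under 1 billion.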

GPT-3 training stages

Pre-training the language model: the language model was trained to maximize the objective $L(T) = \sum_i \log P(t_i \mid t_{i-k}, \ldots, t_{i-1}; \theta)$, where T is the set of tokens in the unsupervised data {t_1, …, t_n}, k is the size of the context window, and θ are the parameters of the neural network, trained using stochastic gradient descent.
Task adaptation through in-context learning (no gradient updates needed): 1. zero-shot setting, 2. one-shot setting, 3. few-shot setting
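As a worked instance of the pre-training objective, the sketch below evaluates the sum of log P(t_i | t_{i-k}, …, t_{i-1}) for a token sequence, given any function that returns the model's probability. It only computes the objective for fixed parameters; training would adjust θ to make this number larger. The function names and the uniform toy "model" are hypothetical.

```python
import math


def lm_log_likelihood(tokens, prob, k):
    """Sum of log P(t_i | t_{i-k}, ..., t_{i-1}) over the sequence.

    prob(token, context) returns the model's probability for `token`
    given the list `context` of up to k previous tokens.
    """
    total = 0.0
    for i, token in enumerate(tokens):
        context = tokens[max(0, i - k):i]  # at most k previous tokens
        total += math.log(prob(token, context))
    return total


vocab = ["the", "cat", "sat", "down"]


def uniform_model(token, context):
    """Toy model: every word in the vocabulary is equally likely."""
    return 1 / len(vocab)


tokens = ["the", "cat", "sat", "down"]
print(lm_log_likelihood(tokens, uniform_model, k=2))  # 4 * log(1/4), about -5.545
```

A model that has learned something about language assigns higher probabilities to the observed tokens than the uniform baseline, so its log-likelihood is closer to 0.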

GPT-3 uses the transformer decoder block introduced in GPT-1, with the small improvements made in GPT-2: layer normalization was moved to the input of each sub-block, and an additional layer normalization was added after the final block.

This block is stacked repeatedly in the model; GPT-3 uses 96 of them.
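To make the block concrete, here is a toy pre-LN decoder block in plain Python. It is a simplified sketch under stated assumptions, not GPT-3's implementation: a single attention head instead of 96, no learned layer-norm gain or bias, no bias terms, dense attention only (no sparse patterns), tiny dimensions, and random weights. What it does show faithfully is the structure: layer norm at the input of each sub-block, causal self-attention, a 4× wider GELU feed-forward layer, residual connections, and a final layer norm after the stack.

```python
import math
import random


def matmul(x, w):
    """Multiply an (n x d_in) matrix by a (d_in x d_out) matrix (lists of lists)."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))] for row in x]


def add(x, y):
    """Element-wise sum, used for the residual connections."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def layer_norm(x, eps=1e-5):
    """Normalize each position to zero mean and unit variance (no learned gain/bias)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def gelu(v):
    """tanh approximation of the GELU activation."""
    return 0.5 * v * (1 + math.tanh(math.sqrt(2 / math.pi) * (v + 0.044715 * v ** 3)))


def causal_self_attention(x, p):
    """Single-head attention in which position i only sees positions 0..i."""
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    scale = math.sqrt(len(q[0]))
    out = []
    for i in range(len(x)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale for j in range(i + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] / total * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))])
    return matmul(out, p["wo"])


def decoder_block(x, p):
    """Pre-LN block: layer norm is applied to the input of each sub-block."""
    x = add(x, causal_self_attention(layer_norm(x), p))
    hidden = [[gelu(v) for v in row] for row in matmul(layer_norm(x), p["w1"])]
    return add(x, matmul(hidden, p["w2"]))


def random_layer(d_model, seed):
    """Hypothetical random weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

    d_ff = 4 * d_model  # feed-forward layer 4x wider than the model
    return {"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
            "wv": mat(d_model, d_model), "wo": mat(d_model, d_model),
            "w1": mat(d_model, d_ff), "w2": mat(d_ff, d_model)}


def run_stack(x, layers):
    """Apply one block per layer (GPT-3 stacks 96), then a final layer norm."""
    for p in layers:
        x = decoder_block(x, p)
    return layer_norm(x)


# Usage: 4 token vectors of size 8 passed through a 2-layer stack
layers = [random_layer(8, seed=s) for s in range(2)]
rng = random.Random(42)
x = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
out = run_stack(x, layers)
print(len(out), len(out[0]))  # → 4 8
```

Because each position only attends to itself and earlier positions, changing the last token leaves the outputs for all earlier positions untouched, which is what lets the model be trained to predict every next token in parallel.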

Dataset for training GPT-3