"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 08, 2023

ChatGPT Notes - Tech Perspectives

Paper - Link

Ref Link 

  • Label outputs of GPT
  • Apply human feedback (RLHF)
  • SFT - supervised fine-tuning of the model on labeler demonstrations
  • Rank outputs to train a reward model (a minimal sketch of the ranking loss follows this list)
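A minimal numpy sketch of how ranked outputs become a training signal, assuming the pairwise log-sigmoid ranking loss described in the InstructGPT paper; the reward scores are made-up numbers for illustration.

import numpy as np

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Ranking loss for a reward model trained on human-ranked outputs:
    push the score of the preferred response above the rejected one."""
    # loss = -log(sigmoid(r_chosen - r_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Hypothetical reward-model scores for two responses to the same prompt.
print(pairwise_ranking_loss(r_chosen=1.8, r_rejected=0.3))  # small loss, ranking respected
print(pairwise_ranking_loss(r_chosen=0.2, r_rejected=1.5))  # large loss, ranking violated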

Label Approach

  • Plain: We simply ask the labelers to come up with an arbitrary task, while ensuring the tasks had sufficient diversity.
  • Few-shot: We ask the labelers to come up with an instruction, and multiple query/response pairs for that instruction.
  • User-based: We had a number of use cases stated in waitlist applications to the OpenAI API. We asked labelers to come up with prompts corresponding to these use cases.

  • A sequence of token indices feeds into the Transformer.
  • A probability distribution over the next index in the sequence comes out.
  • Batching is done cleverly (both across examples and over sequence length) for efficiency (a shape-level sketch follows this list).
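A shape-level sketch of that contract, with random logits standing in for a real Transformer (an assumption purely for illustration); only the tensor shapes and the softmax over the vocabulary reflect the actual model.

import numpy as np

vocab_size, batch, seq_len = 50257, 2, 8       # GPT-2 style vocabulary, toy batch

# Input: integer token indices, shape (batch, seq_len)
token_ids = np.random.randint(0, vocab_size, size=(batch, seq_len))

# Stand-in for the Transformer: real models produce logits of shape
# (batch, seq_len, vocab_size); here they are random for illustration.
logits = np.random.randn(batch, seq_len, vocab_size)

# Output: a probability distribution over the next index at every position.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

next_token = probs[:, -1, :].argmax(axis=-1)   # greedy pick at the last position
print(token_ids.shape, probs.shape, next_token)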
Key params
  • GPT-1-like: 12 layers, 12 heads, d_model = 768 (~125M parameters)
  • GPT-3: 96 layers, 96 heads, d_model = 12,288 (~175B parameters)
  • Feed-forward layer is four times the size of the bottleneck layer: d_ff = 4 * d_model (a rough parameter-count check is sketched below)
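A rough sanity check of those parameter counts, assuming the standard decoder block breakdown (about 4*d_model^2 weights for attention and 8*d_model^2 for the feed-forward layer with d_ff = 4*d_model) plus the token embedding matrix; biases, layer norms and positional embeddings are ignored, so the totals are only approximate.

def approx_params(n_layers: int, d_model: int, vocab_size: int = 50257) -> float:
    attention = 4 * d_model * d_model           # Q, K, V and output projections
    feed_forward = 2 * d_model * (4 * d_model)  # d_model -> 4*d_model -> d_model
    blocks = n_layers * (attention + feed_forward)
    embeddings = vocab_size * d_model           # token embedding matrix
    return blocks + embeddings

print(approx_params(12, 768) / 1e6)      # ~124M, close to the quoted 125M
print(approx_params(96, 12288) / 1e9)    # ~174.5B, close to the quoted 175B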
  • GPT refers to a family of models which includes ChatGPT. text-davinci-003 is trained almost the same way as ChatGPT, except it is not tuned for dialogue. We can use text-davinci-003 through the OpenAI API today; there is a usage-based cost we need to factor in when building applications (a minimal API sketch follows this list).
  • This could be a combination of factors: the attention mechanism helps the model focus on the appropriate parts of the context, and the inclusion of code in the training corpus may play a role in its ability to remember tokens mentioned at the beginning of a dialog when answering a question that comes much later. For instance, code completion requires the model to close a brace that marks the end of a code block, or to remember a global variable that was declared earlier.
  • There is no middleware: the entire context is present in the input token sequence, which can be as long as 2048–4096 tokens (the sketch below packs a dialog into that window).
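A minimal sketch of calling text-davinci-003 while packing the whole dialog into the prompt, assuming the openai Python package of this period (openai.Completion.create); the 4-characters-per-token estimate, the 4096-token budget and the sample history are rough illustrative assumptions, and every call is billed per token.

import openai

openai.api_key = "YOUR_API_KEY"   # usage is billed per token, so budget accordingly

MAX_CONTEXT_TOKENS = 4096         # approximate text-davinci-003 context window
CHARS_PER_TOKEN = 4               # rough rule of thumb, not an exact tokenizer

def build_prompt(history: list[str], question: str) -> str:
    """There is no middleware: any 'memory' must be packed into the prompt
    itself, so keep appending turns and drop the oldest when over budget."""
    turns = history + [f"User: {question}", "Assistant:"]
    while len(turns) > 2 and len("\n".join(turns)) // CHARS_PER_TOKEN > MAX_CONTEXT_TOKENS - 256:
        turns.pop(0)              # drop the oldest turn to stay inside the window
    return "\n".join(turns)

history = ["User: My project uses a global variable called retry_count.",
           "Assistant: Noted, retry_count is defined at module scope."]

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=build_prompt(history, "Which variable tracks retries?"),
    max_tokens=256,
    temperature=0.2,
)
print(response["choices"][0]["text"])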
Ref - Link



  • The encoder takes the input and encodes it into a fixed-length vector. 
  • The decoder takes that vector and decodes it into the output sequence. 
  • Transformers use multi-headed attention, which is a parallel computation of a specific attention function called scaled dot-product attention (a minimal sketch follows).
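A minimal numpy sketch of scaled dot-product attention, with the unnormalized attention weights and the normalized attention scores marked in comments; the toy shapes are assumptions for illustration.

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    weights = Q @ K.T / np.sqrt(d_k)   # unnormalized attention weights
    scores = softmax(weights)          # attention scores (each row sums to 1)
    return scores @ V                  # weighted sum of the values

seq_len, d_k, d_v = 5, 16, 16          # toy sizes for illustration
Q = np.random.randn(seq_len, d_k)
K = np.random.randn(seq_len, d_k)
V = np.random.randn(seq_len, d_v)
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 16)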

  • GPT Competitors:
  • PEER by Meta AI
  • LaMDA by Google AI
  • PaLM by Google AI

  • Computing the Unnormalized Attention Weights
  • Computing the Attention Scores
  • Multi-Head Attention (a minimal sketch follows)
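A minimal sketch of multi-head attention as the parallel application of that attention function, one head per slice of the model dimension; the sizes and random projection matrices are toy assumptions.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    weights = Q @ K.T / np.sqrt(Q.shape[-1])   # unnormalized attention weights
    return softmax(weights) @ V                # attention scores times values

def multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads):
    """Run scaled dot-product attention in parallel over n_heads slices
    of the model dimension, then concatenate and project back."""
    d_model = x.shape[-1]
    d_head = d_model // n_heads
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    heads = [attention(Q[:, i*d_head:(i+1)*d_head],
                       K[:, i*d_head:(i+1)*d_head],
                       V[:, i*d_head:(i+1)*d_head]) for i in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ W_o

seq_len, d_model, n_heads = 5, 64, 4           # toy sizes for illustration
x = np.random.randn(seq_len, d_model)
W_q, W_k, W_v, W_o = (np.random.randn(d_model, d_model) for _ in range(4))
print(multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads).shape)   # (5, 64)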
  • Generative: A GPT generates text.
  • Pre-trained: A GPT is trained on lots of text from books, the internet, etc ...
  • Transformer: A GPT is a decoder-only transformer neural network
At a high level, the GPT architecture has three sections:
  • Text + positional embeddings
  • A transformer decoder stack
  • A projection to vocab step (a structural sketch follows)
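A structural sketch of those three sections, with the decoder blocks left as pass-through placeholders so only the embedding lookup and the projection to vocab are real; the sizes match the GPT-1-like row above, everything else is an assumption.

import numpy as np

vocab_size, max_len, d_model, n_layers = 50257, 1024, 768, 12   # GPT-1-like sizes

wte = np.random.randn(vocab_size, d_model) * 0.02   # token embedding table
wpe = np.random.randn(max_len, d_model) * 0.02      # positional embedding table

def decoder_block(x):
    # Placeholder for masked multi-head self-attention + feed-forward;
    # a real block would transform x, here it is passed through unchanged.
    return x

def gpt(token_ids):
    # 1) Text + positional embeddings
    x = wte[token_ids] + wpe[np.arange(len(token_ids))]
    # 2) Transformer decoder stack
    for _ in range(n_layers):
        x = decoder_block(x)
    # 3) Projection to vocab (weights tied with the token embeddings)
    return x @ wte.T                                 # logits, shape (seq_len, vocab_size)

print(gpt(np.array([464, 3290, 318])).shape)         # (3, 50257)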
  • Take a bunch of text blocks and feed them to the OpenAI embeddings API
  • Find the most similar vectors among my notes (a minimal sketch follows)
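A minimal sketch of that workflow, assuming the openai Python package of this period (openai.Embedding.create with the text-embedding-ada-002 model); the notes and the query are made-up examples.

import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"

notes = ["GPT-3 has 96 layers and d_model of 12288.",
         "Transformers use multi-headed attention.",
         "The feed-forward layer is four times the model width."]

def embed(texts):
    resp = openai.Embedding.create(input=texts, model="text-embedding-ada-002")
    return np.array([item["embedding"] for item in resp["data"]])

note_vectors = embed(notes)
query_vector = embed(["How wide is the feed-forward layer?"])[0]

# Cosine similarity between the query and every note, highest first.
scores = note_vectors @ query_vector / (
    np.linalg.norm(note_vectors, axis=1) * np.linalg.norm(query_vector))
for idx in np.argsort(-scores):
    print(round(float(scores[idx]), 3), notes[idx])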
Good Read - GAN 
  • Improve productivity and process efficiency.
  • Optimize value chains.
  • Redefine the entire ecosystem.
Keep Exploring!!!
