GPT
- GPT-2 is built from transformer decoder blocks
- GPT generates one token at a time, just like the decoder of the original transformer, and is trained with causal language modeling, so it is strictly a decoder-only model (see the sketch after this list)
- Because GPT-2 is decoder-only, it does not need the encoder part of the original transformer architecture; with no encoder-decoder (cross) attention blocks, its decoder block is essentially an encoder block with a causal mask
- GPT outputs the next word, and it is auto-regressive: each generated token is conditioned on the context of all the previous tokens
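Below is a minimal sketch of this auto-regressive, next-token behaviour. It assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint, which are not part of the original post, just one convenient way to try the idea.

```python
# Causal (auto-regressive) generation sketch: each new token is predicted
# from the tokens to its left only.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The transformer decoder predicts", return_tensors="pt")

# Tokens are generated one at a time; every step conditions only on the
# previously generated context (causal language modeling).
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```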
BERT
- BERT, on the other hand, uses transformer encoder blocks
- BERT can incorporate context from both sides of a word, which gives better representations for many downstream tasks
- BERT produces one output vector per input token (the same number as the input), which can be fed to a linear layer; it is trained with masked language modeling, so it is strictly an encoder-only model (see the sketch after this list)
- BERT, by contrast, is not auto-regressive: it uses the entire surrounding context all at once
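Here is a minimal masked language modeling sketch showing that bidirectional, all-at-once behaviour. It assumes the Hugging Face `transformers` library and the public "bert-base-uncased" checkpoint; the example sentence is purely illustrative.

```python
# Masked LM sketch: BERT sees the whole sentence at once and predicts the
# [MASK] token using context from both the left and the right.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one logits vector per input token

# Find the position of [MASK] and take the highest-scoring vocabulary item.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))
```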
Decoder (in the original encoder-decoder transformer) - its cross-attention layers pay attention to specific segments of the encoder output
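A small PyTorch sketch of that cross-attention step follows; the dimensions and tensor names are illustrative assumptions, not taken from the post.

```python
# Cross-attention sketch: queries come from the decoder, while keys and
# values come from the encoder output, so each decoder position can attend
# to specific segments of the source sequence.
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

encoder_out = torch.randn(1, 20, d_model)    # encoder output ("memory")
decoder_states = torch.randn(1, 5, d_model)  # current decoder hidden states

attended, weights = cross_attn(query=decoder_states,
                               key=encoder_out,
                               value=encoder_out)
print(attended.shape)  # torch.Size([1, 5, 512])
print(weights.shape)   # torch.Size([1, 5, 20]) - attention over source tokens
```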
Keep Exploring!!!