"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 29, 2023

ChatGPT

  • Decoder-only model.
  • A decoder-only architecture has no explicit encoder that summarizes the input into a context vector.
  • Instead, the input sequence is fed directly into the decoder, which generates the output by attending to earlier positions through self-attention (see the sketch below).

Ref - Link1, Link2
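
A minimal NumPy sketch of the masked (causal) self-attention the bullets above describe. This is my own illustration, not code from the linked references; a real decoder would also apply learned Q/K/V projections and multiple heads.

```python
import numpy as np

def causal_self_attention(x):
    """x: (seq_len, d_model) token embeddings of the input sequence."""
    seq_len, d_model = x.shape
    # Toy simplification: use the embeddings directly as Q, K, V
    # (a real model applies learned projection matrices first).
    q, k, v = x, x, x
    scores = q @ k.T / np.sqrt(d_model)              # (seq_len, seq_len)
    # Causal mask: token i may only attend to positions j <= i,
    # so the decoder never peeks at future tokens.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # context-mixed tokens

x = np.random.randn(4, 8)              # 4 toy tokens, d_model = 8
print(causal_self_attention(x).shape)  # (4, 8)
```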

Transformer Key blocks

  • In the attention step, words “look around” for other words that have relevant context and share information with one another.
  • In the feed-forward step, each word “thinks about” information gathered in previous attention steps and tries to predict the next word (a toy version of both steps is sketched below).


Ref - Link
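
A toy sketch of those two steps chained into one block, reusing causal_self_attention from the snippet above. Again my own illustration: real blocks also include layer normalization, learned projections, and multiple heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def feed_forward(x, w1, w2):
    """Position-wise MLP: each token 'thinks' independently."""
    return np.maximum(0, x @ w1) @ w2    # ReLU between two projections

def transformer_block(x, w1, w2):
    x = x + causal_self_attention(x)     # attention: tokens look around
    x = x + feed_forward(x, w1, w2)      # feed-forward: per-token processing
    return x

d_model, d_ff = 8, 32
w1 = rng.normal(size=(d_model, d_ff))
w2 = rng.normal(size=(d_ff, d_model))
out = transformer_block(rng.normal(size=(4, d_model)), w1, w2)
print(out.shape)                         # (4, 8)
```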

What we know about transformers
"What differentiates the Transformer from its predecessors is it’s ability to learn the contextual relationship of values within a sequence through a mechanism called self-attention.

Transformers can generally be grouped into one of three categories:
- encoder-only, a la BERT,
- decoder-only, a la GPT, and
- encoder-decoder, a la T5."

Ref - Link
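
For concreteness, the three families map onto the auto classes of the Hugging Face transformers library (an assumption of this sketch; the checkpoint names are the standard public ones):

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

encoder_only = AutoModel.from_pretrained("bert-base-uncased")        # BERT-style
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")          # GPT-style
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # T5-style
```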

Keep Exploring!!!
