- Decoder-only model.
- A decoder-only architecture has no explicit encoder that summarizes the input into a context vector.
- In a decoder-only model, the input sequence is fed directly into the decoder, which generates the output sequence by attending to the input through causal self-attention (see the sketch below).
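To make this concrete, here is a minimal sketch of single-head causal self-attention in plain NumPy (the function name, weights, and dimensions are made up for illustration): each position can attend only to itself and earlier positions, which is how a decoder-only model conditions on the input it has seen so far.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence x of shape (T, d).
    Each position attends only to itself and earlier positions, which is how
    a decoder-only model conditions on everything it has seen so far."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv                # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # relevance of every position to every other
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)        # causal mask: no peeking at future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the allowed positions
    return weights @ v                              # mix values from the attended positions

# Toy usage with random weights, purely illustrative
rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (5, 8)
```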
Transformer Key blocks
- In the attention step, words “look around” for other words that have relevant context and share information with one another.
- In the feed-forward step, each word “thinks about” the information gathered in previous attention steps and tries to predict the next word (a sketch of how the two steps combine follows this list).
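Here is a minimal sketch of how these two steps fit together in one transformer block (plain NumPy, illustrative names and shapes; layer normalization and multiple heads are omitted for brevity):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    """Attention step: every position looks at the others and gathers relevant context."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def feed_forward(x, W1, b1, W2, b2):
    """Feed-forward step: each position independently processes what it gathered."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_block(x, attn_weights, ff_weights):
    """One block = attention (share information) then feed-forward (think about it).
    Residual connections keep the original signal flowing through the block."""
    x = x + attention(x, *attn_weights)
    x = x + feed_forward(x, *ff_weights)
    return x
```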
What we know about transformers
"What differentiates the Transformer from its predecessors is it’s ability to learn the contextual relationship of values within a sequence through a mechanism called self-attention.
Transformers can generally be categorized into one of three groups (see the example after this list):
- encoder-only, a la BERT,
- decoder-only, a la GPT, and
- encoder-decoder, a la T5.
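A quick way to see the three families side by side is through the Hugging Face transformers library (a sketch, assuming the library is installed; the checkpoint names are the standard public ones):

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: produces contextual embeddings of the input (a la BERT)
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: generates text left to right with causal self-attention (a la GPT)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder: an encoder reads the input, a decoder generates the output (a la T5)
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```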
Ref - Link
Keep Exploring!!!