2.1.4: Transformers | GenAI Learning

https://genai.gitpull.in/2-intro-genai/2.1-key-concept/4-transformers/index.html

Transformer Architecture

The Transformer is the core architecture behind most modern NLP and GenAI models, including GPT, BERT, and LLaMA. Here’s how it works.

Key Components of the Transformer

Self-Attention Mechanism: Lets the model weigh every other word in a sentence when processing each word. For example, when processing the word “bank” in the sentence “I went to the bank to withdraw money,” the model can attend to “withdraw” and “money” to determine that “bank” refers to a financial institution rather than the side of a river.

Multi-Head Attention: Runs several attention heads in parallel, so the model can capture different relationships between words simultaneously.

Positional Encoding: Unlike sequential models, transformers have no inherent notion of word order, so a positional encoding is added to the input to indicate each word’s position in the sentence.

Encoder-Decoder Architecture:
  Encoder: Processes the input data (e.g., a sentence).
  Decoder: Generates the output data (e.g., a translation of the sentence).

Both the Transformer Encoder and Decoder consist of:
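
Separately from the component list, the self-attention and positional encoding ideas described earlier can be illustrated with a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any particular model implements them. The function names (positional_encoding, self_attention), the toy 3-token embeddings, and the identity projection matrices are all hypothetical. The sinusoidal encoding is one common choice, and the page does not say which scheme it has in mind.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (an assumed, commonly used scheme)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # one row of scores per query token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: 3 tokens with 4-dimensional embeddings (made-up values).
embeddings = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
d_model = 4
pe = positional_encoding(len(embeddings), d_model)

# Add position information to each token embedding.
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]

# Identity projections keep the example small; real models learn these weights.
identity = [[1.0 if i == j else 0.0 for j in range(d_model)] for i in range(d_model)]

output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])  # attention paid by one token to all tokens
```

Each printed row shows how strongly one token attends to every token in the sequence, and each row sums to 1. Multi-head attention would run several such attention computations in parallel, each with its own projection matrices, and concatenate the results.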