1. Understanding LLMs & Text Generation

Source: GenAI Learning, https://genai.gitpull.in/3-llm-and-text-gen/index.html

How LLMs Generate Text

LLMs don’t “think” like humans. They predict the most probable next token (a word or a piece of a word) based on the tokens that came before it.

Step 1: Convert Text to Tokens

Example (word-based tokenization):
Sentence: "The cat sat on the mat."
Tokens: ["The", "cat", "sat", "on", "the", "mat", "."]

Example (sub-word tokenization, used in LLaMA models):
Sentence: "Artificial intelligence"
Tokens: ["Art", "ificial", "intelli", "gence"]
The split shown is illustrative; the exact pieces depend on the tokenizer's vocabulary.

Why sub-word tokenization?
Handles new words by breaking them into smaller known parts.
Keeps the vocabulary far smaller than one entry per word, which improves efficiency.

Step 2: Assign Probability to Next Token

Example: Predicting the next token for the phrase: "The capital of France is"
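
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model is implemented: `TOY_VOCAB` and `TOY_LOGITS` are hypothetical values invented for the example, and the sub-word splitter uses a simple greedy longest-match rule as a stand-in for the learned merge rules that real tokenizers (such as the one used by LLaMA) apply. The softmax function is the standard way to turn a model's raw scores into next-token probabilities.

```python
import math
import re


def word_tokenize(text):
    """Split text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word, vocab):
    """Greedy longest-match split of one word into pieces found in vocab."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest candidate first, shrinking until a piece matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Hypothetical toy vocabulary, chosen only to reproduce the split in the text.
TOY_VOCAB = {"Art", "ificial", "intelli", "gence"}

# Made-up scores for candidate next tokens after "The capital of France is".
# A real model produces one score for every token in its vocabulary.
TOY_LOGITS = {"Paris": 6.0, "London": 2.5, "a": 1.0}

if __name__ == "__main__":
    print(word_tokenize("The cat sat on the mat."))
    print([piece
           for word in "Artificial intelligence".split()
           for piece in subword_tokenize(word, TOY_VOCAB)])
    probs = softmax(TOY_LOGITS)
    print(max(probs, key=probs.get))  # token with the highest probability
```

Picking the highest-probability token, as the last line does, is only one decoding choice; models can also sample from the distribution, which is why the same prompt can produce different continuations.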