2.2: Controlling GenAI Model Output

Temperature

Purpose: Controls the randomness of the model’s predictions. It is a hyperparameter that scales the logits (the raw, unnormalized scores the model assigns to each token) before they are converted to probabilities and sampled.

How it works: The model computes a logit for each candidate token. Each logit is divided by the temperature, and the softmax of the scaled logits gives the probabilities used for sampling.

- Low temperature (<1.0): Makes the model more deterministic by amplifying the gap between high-probability and low-probability tokens, so the most probable token is chosen more often.
- High temperature (>1.0): Makes the model more random by flattening the distribution. The output becomes more diverse and creative, and sometimes less coherent.

Example

- Temperature = 0.7: The model favors the more likely, predictable tokens.
- Temperature = 1.5: The model takes more risks, producing more unexpected and diverse outputs.

```python
# Assumes `model` and `inputs` are an already loaded model and a tokenized prompt.
# In Hugging Face transformers, sampling parameters such as temperature only
# take effect when do_sample=True; the default is greedy decoding.

# Example of lower temperature (more deterministic)
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, temperature=0.7)

# Example of higher temperature (more creative/random)
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, temperature=1.5)
```

Top-k Sampling

Purpose: Limits the number of tokens to sample from, which cuts off the long tail of unlikely tokens and sometimes makes the output more coherent.

How it works: Instead of considering the entire vocabulary, top-k sampling restricts the candidates for the next token to the k most likely tokens, ranked by probability.

- k = 1: The model behaves deterministically, always picking the most probable token (equivalent to greedy decoding).
- k = 50: The model samples from the 50 tokens with the highest probabilities.

Example

- Top-k = 10: The model considers only the 10 most probable tokens when selecting the next token.
- Top-k = 100: The model considers the 100 most probable tokens, giving it more variety.
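The temperature scaling and top-k filtering described above can be sketched without any model. This is a minimal illustration in plain Python, not the code of any particular library: the function names `softmax` and `top_k_filter` and the four-token `logits` list are made up for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept_mass = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / kept_mass
    return filtered


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

sharp = softmax(logits, temperature=0.7)  # low temperature: peaked distribution
plain = softmax(logits, temperature=1.0)  # unscaled softmax
flat = softmax(logits, temperature=1.5)   # high temperature: flatter distribution
top2 = top_k_filter(plain, k=2)           # only the two best tokens survive

for name, dist in [("T=0.7", sharp), ("T=1.0", plain), ("T=1.5", flat), ("k=2", top2)]:
    print(name, [round(p, 3) for p in dist])
```

Printing the three temperature settings shows the probability of the top token rising as the temperature falls, while `top2` places all of the probability on the two most likely tokens.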
```python
# Example with top-k sampling (restricted to the top 50 tokens)
# do_sample=True is needed for top_k to take effect in Hugging Face transformers.
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, top_k=50)
```

Effect of Top-k: By limiting the candidates to the k most likely tokens, the model’s output tends to be more controlled and less random than pure sampling from the whole vocabulary.

Top-p (Nucleus Sampling)

Purpose: Similar to top-k, but instead of keeping a fixed number of tokens, top-p keeps tokens based on their cumulative probability.

How it works: Tokens are ranked by probability, and the model samples from the smallest set whose cumulative probability reaches a threshold p (where p is between 0 and 1). This dynamic method is often referred to as nucleus sampling.

- p = 0.9: The model considers the smallest set of tokens whose cumulative probability is at least 90%. The number of tokens varies with how steep the probability distribution is: only a few when the distribution is peaked, many when it is flat.
- p = 1.0: No tokens are filtered out. This is equivalent to top-k sampling with k equal to the vocabulary size.

Example

- Top-p = 0.9: The model considers the smallest set of tokens whose combined probability is at least 90%. This excludes very unlikely tokens while still allowing diversity.
- Top-p = 0.95: The model samples from a slightly larger set of tokens.

```python
# Example with top-p (nucleus) sampling
# do_sample=True is needed for top_p to take effect in Hugging Face transformers.
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, top_p=0.9)
```

Effect of Top-p: Nucleus sampling often produces text that is both coherent and diverse, because the candidate set adapts to the distribution: it shrinks when the model is confident and grows when many tokens are plausible. A fixed k cannot adapt in this way.

Temperature, Top-k, and Top-p Combined

You can combine these parameters to tune the model’s output.
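As a rough sketch of how the three controls can work together, the function below applies them in one common order (temperature, then top-k, then top-p) before drawing a token. It is plain Python written for illustration: `sample_next_token`, its defaults, and the sample `logits` are hypothetical, and real libraries may differ in details such as ordering and tie-breaking.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # 1. Temperature: rescale the logits, then apply softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 2. Top-k: keep only the k most probable tokens (0 means no limit).
    if top_k > 0:
        order = order[:top_k]

    # 3. Top-p: keep the smallest prefix whose cumulative probability,
    #    renormalized over the tokens that survived top-k, reaches top_p.
    kept_mass = sum(probs[i] for i in order)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # 4. Sample among the survivors in proportion to their probabilities.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)  # seeded so repeated runs give the same draws

draws = [
    sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng)
    for _ in range(10)
]
print(draws)
```

With the `model.generate` style used in the earlier snippets, and assuming that is the Hugging Face transformers API, the corresponding call would pass `do_sample=True` together with `temperature`, `top_k`, and `top_p`.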