2: Introduction to GenAI | GenAI Learning

2.1: Key Concepts in GenAI

Mon, 01 Jan 0001 00:00:00 +0000

Key Concepts in Generative AI

| Concept | Definition |
|---|---|
| Large Language Models (LLMs) | AI models trained on vast amounts of text data. They use the Transformer architecture, which relies on attention mechanisms to process input data. Examples: GPT (Generative Pre-trained Transformer), BERT, T5. |
| Tokenization | Breaking text down into smaller units (tokens) for processing. Example: the sentence “Hello, world!” might be tokenized into ["Hello", ",", "world", "!"]. |
| Embeddings | Representing tokens as numerical vectors in a high-dimensional space. Embeddings capture semantic meaning (e.g., “king” - “man” + “woman” ≈ “queen”). |
| Self-Attention/Attention Mechanism | Mechanism that lets the model weigh how relevant every other token in the input is to the current token, so it can focus on the most relevant words. |
| Transformers | The deep learning architecture used in LLMs and the backbone of most modern generative models. Key components: Encoder, Decoder, and Attention Mechanism (not every model uses both halves: GPT is decoder-only and BERT is encoder-only). |
| Pre-training | Training a model on a large dataset (e.g., all of Wikipedia) to learn general language patterns. |
| Fine-tuning | Adapting the pre-trained model to a specific task (e.g., sentiment analysis, chatbot). |
| Prompt Engineering | Designing effective inputs to guide model responses. |

Tokenization

Tokenization is the process of converting text into smaller units, typically words or subwords, that can be processed by machine learning models. In natural language processing (NLP), tokens are the basic building blocks for understanding and generating language. Tokenization helps the model “understand” the text by converting it into a format that can be fed into the neural network: each token is mapped to an integer ID, which the model then uses to look up the token's embedding.
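As a rough illustration of tokenization, the sketch below splits text into tokens and then maps each token to an integer ID. It is a toy word-level tokenizer built on a regular expression, chosen here only for simplicity; `tokenize` and `build_vocab` are hypothetical helper names. GPT-style models instead use learned subword tokenizers (such as byte-pair encoding), so their token boundaries and IDs will differ.

```python
import re


def tokenize(text: str) -> list[str]:
    # Toy word-level tokenizer: each run of word characters is one token,
    # and each punctuation character becomes its own token. Whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Assign each distinct token an integer ID in order of first appearance.
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


tokens = tokenize("Hello, world!")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]  # the integer IDs a model would consume
print(tokens)  # ['Hello', ',', 'world', '!']
print(ids)     # [0, 1, 2, 3]
```

A real model ships with a fixed vocabulary built during training, so unseen words are broken into known subword pieces rather than added to the vocabulary as done here.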
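The “king” - “man” + “woman” ≈ “queen” example for embeddings can be reproduced with vector arithmetic plus cosine similarity. The 3-dimensional vectors below are hand-picked, hypothetical values chosen so that the analogy works; learned embeddings have hundreds or thousands of dimensions, and the analogy only holds approximately. Excluding the three input words from the search is a common convention, assumed here.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "man":    [0.1, 0.8, 0.1],
    "woman":  [0.1, 0.1, 0.8],
    "queen":  [0.9, 0.1, 0.8],
    "prince": [0.7, 0.7, 0.1],
    "apple":  [0.0, 0.2, 0.2],
}


def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c):
    # Solve "a - b + c ≈ ?" by returning the word whose vector is most
    # similar to the result, ignoring the three input words themselves.
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("king", "man", "woman"))  # queen
```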
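The attention mechanism is usually implemented as scaled dot-product attention: each token's query vector is compared with every token's key vector, the scores are divided by √d_k and passed through a softmax, and the resulting weights form a weighted average of the value vectors. The pure-Python sketch below shows a single attention head on three hypothetical 2-dimensional token vectors. As a simplification, it uses the inputs directly as queries, keys, and values; a real Transformer first applies learned linear projections to produce Q, K, and V.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention weights for this token sum to 1
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical token vectors; used directly as Q, K and V (no learned projections).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = scaled_dot_product_attention(X, X, X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to each token in the sequence, which is what “focusing on relevant words” means in practice.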

2.2: Controlling GenAI Model Output


Temperature

Purpose: Controls the randomness of the predictions. Temperature is a hyperparameter used to scale the logits (the raw, unnormalized scores the model assigns to each candidate token) before they are turned into probabilities and sampled.

How it works: The model computes a logit for every token in its vocabulary; each logit is divided by the temperature, and a softmax converts the scaled values into probabilities.

- Low temperature (<1.0): Makes the model more deterministic by amplifying the difference between high-probability and low-probability tokens, so the most probable token is chosen more often.
- High temperature (>1.0): Makes the model more random by flattening the probabilities. This results in more diverse, creative, and sometimes less coherent text.

Example

- Temperature = 0.7: The model will mostly choose the more predictable, likely tokens.
- Temperature = 1.5: The model will take more risks, leading to more unexpected, diverse outputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # "gpt2" is just an example checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Once upon a time", return_tensors="pt")  # example prompt
# Lower temperature (more deterministic); do_sample=True is required, otherwise generate() decodes greedily and ignores temperature
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, temperature=0.7)
# Higher temperature (more creative/random)
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, temperature=1.5)
```

Top-k Sampling

Purpose: Limits the pool of tokens the model can sample from, which keeps the output more focused and often more coherent.

How it works: Instead of considering all possible tokens (the entire vocabulary), top-k sampling restricts the candidates for the next token to the k tokens with the highest probability scores.

- k = 1: The model behaves deterministically, always picking the most probable token (equivalent to greedy decoding).
- k = 50: The model samples from the 50 tokens with the highest probabilities.

Example

- Top-k = 10: The model only considers the 10 highest-probability tokens when selecting the next word.
- Top-k = 100: The model considers the top 100 tokens, giving it more variety.
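To make the temperature and top-k rules concrete without downloading a model, here is a minimal pure-Python sketch that works on a hypothetical list of logits. `softmax_with_temperature` and `top_k_filter` are illustrative helper names, not library functions; libraries such as Hugging Face `transformers` perform comparable steps internally when sampling is enabled.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    # Divide each logit by the temperature, then apply softmax.
    # temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    # Keep only the k most probable token IDs, then renormalize
    # so the surviving probabilities sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}


logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for a 4-token vocabulary
for t in (0.7, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Sample the next token ID from the top-2 tokens at temperature 0.7.
filtered = top_k_filter(softmax_with_temperature(logits, 0.7), k=2)
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
print(next_token)
```

Printing the three distributions side by side shows the effect directly: the probability of token 0 is highest at temperature 0.7 and lowest at 1.5.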
```python
# Example with top-k sampling (restricted to the top 50 tokens)
# Assumes `model` and `inputs` are a loaded causal LM and a tokenized prompt; do_sample=True enables sampling
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, top_k=50)
```

Effect of Top-k: By limiting the options to the top k tokens, the model’s output tends to be more controlled and less random than pure sampling from all tokens.

Top-p (Nucleus Sampling)

Purpose: Similar to top-k, but instead of limiting the candidates to a fixed number of tokens, top-p limits them based on their cumulative probability.

How it works: At each step, the model samples from the smallest set of most-probable tokens whose cumulative probability reaches a threshold p (where p is between 0 and 1). Because this set grows or shrinks with the shape of the distribution, the method is often referred to as nucleus sampling.

- p = 0.9: The model considers the smallest set of tokens whose cumulative probability is at least 90%. The number of tokens varies with how peaked the distribution is: a confident prediction may need only a few tokens, while a flat one may need many.
- p = 1.0: Equivalent to top-k sampling with k equal to the vocabulary size, allowing the model to sample from all tokens.

Example

- Top-p = 0.9: The model considers the smallest set of tokens whose combined probability is at least 90%. This prevents very unlikely tokens from being considered while still allowing diversity.
- Top-p = 0.95: The model samples from a slightly larger set of tokens.

```python
# Example with top-p (nucleus) sampling; do_sample=True enables sampling
# top_k=0 disables the default top-k filter (k=50) so that only top-p applies
outputs = model.generate(inputs['input_ids'], max_length=50, do_sample=True, top_p=0.9, top_k=0)
```

Effect of Top-p: Nucleus sampling tends to generate text that is both coherent and diverse, often more so than top-k sampling, because the candidate set adapts to the model’s confidence at each step instead of always keeping the same number of tokens.

Temperature, Top-k, and Top-p Combined

You can combine these parameters to fine-tune the model’s output. For example:
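The following sketch, again in pure Python on hypothetical logits, adds nucleus (top-p) filtering and chains temperature, top-k, and top-p into a single sampling step. All helper names are hypothetical, and the order used here (temperature, then top-k, then top-p) is a common convention assumed for illustration, not a guarantee about any particular library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax (subtracting the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(dist):
    total = sum(dist.values())
    return {i: p / total for i, p in dist.items()}


def top_k_filter(dist, k):
    # Keep the k most probable token IDs.
    keep = sorted(dist, key=dist.get, reverse=True)[:k]
    return renormalize({i: dist[i] for i in keep})


def top_p_filter(dist, p):
    # Keep the smallest set of most probable token IDs whose
    # cumulative probability reaches p (the "nucleus").
    kept, cumulative = {}, 0.0
    for i in sorted(dist, key=dist.get, reverse=True):
        kept[i] = dist[i]
        cumulative += dist[i]
        if cumulative >= p:
            break
    return renormalize(kept)


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    # Apply temperature, then top-k, then top-p, then sample one token ID.
    dist = dict(enumerate(softmax(logits, temperature)))
    if top_k is not None:
        dist = top_k_filter(dist, top_k)
    if top_p < 1.0:
        dist = top_p_filter(dist, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids])[0]


logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for a 4-token vocabulary
print(sorted(top_p_filter(dict(enumerate(softmax(logits))), 0.9)))  # [0, 1, 2]
print(sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random.Random(0)))
```

With these logits the token probabilities are roughly 0.57, 0.21, 0.13, and 0.09, so a threshold of p = 0.9 keeps the first three tokens and drops the least likely one.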

2.3: Seeing It in Action


Simple Hands-On: Text Generation with GPT

Let’s write some code to generate text using a pre-trained GPT model. We’ll use the transformers library by Hugging Face, which provides easy access to many pre-trained models.

Step 1: Install the Required Libraries

You’ll need Python installed on your machine along with the following packages:

- transformers (from Hugging Face)
- torch (PyTorch backend)

```shell
pip install transformers torch
```

Step 2: Write the Code