
🚀 ✨ PC: Introduction to Large Language Models (LLMs) ✨ 🚀

🎯 Goal

🤖 Understand what a Large Language Model (LLM) is and how it can be used for text generation, question answering, and more, using Python. No prior experience needed—just bring your curiosity! 🚀


📌 What You Will Learn 🧠 💡

✅ What is a Large Language Model (LLM)?
✅ How do LLMs work?
✅ How to use pre-trained LLMs for text generation
✅ How to use LLMs for answering questions
✅ Hands-on coding with Google Colab
✅ Basics of Hugging Face library for LLMs


🤖 1. What is a Large Language Model (LLM)?

🧠 Understanding LLMs in Simple Terms

A Large Language Model (LLM) is a type of AI model that can understand and generate human-like text. It learns by analyzing huge amounts of text data, recognizing patterns, and predicting words based on context.

📌 Real-World Examples:

  • ✅ Chatbots like Siri, Google Assistant, ChatGPT 🗣

  • ✅ AI-powered writing assistants (Grammarly, Jasper AI) ✍

  • ✅ Search engines predicting your queries 🔍

  • ✅ AI-generated stories and essays 📖


🔥 2. How do LLMs work?

🛠 How Do LLMs Generate Text?

  • 📚 Training on Large Datasets: LLMs are trained on billions of words from books, articles, and websites.
  • 🔍 Predicting the Next Word: given a sentence, the model predicts the most likely next word.
  • ✍ Generating Coherent Responses: by repeating this prediction step, it builds complete sentences that make sense.

📌 Example: How AI predicts text

  • Input: "The sky is"

  • Predicted Output: "blue because of the way light scatters."
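💡 Want to peek under the hood? Below is a minimal sketch of next-word prediction with GPT-2 (it assumes the transformers and torch libraries, which we install later in this workshop):

# Sketch: ask GPT-2 for its top 5 guesses for the next word
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Turn the prompt into token IDs and score every possible next token
input_ids = tokenizer.encode("The sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocabulary_size)

# Convert the scores at the last position into probabilities and show the top 5
probs = logits[0, -1].softmax(dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {p:.3f}")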

💡 *Quick Thought: Where else do you see AI predicting text? Jot down your ideas!* 📝


🔧 3. How to use pre-trained LLMs for text generation

💻 Step 1: Open Google Colab

1️⃣ Open Google Colab
2️⃣ Click + New Notebook

📚 Step 2: Import Required Libraries

➕ 🐍 Add a New Code Cell

1️⃣ Click + Code in the top left to add a new code cell.
2️⃣ Copy and paste the following code into the new code cell.

🔗 ChatGPT prompt for generating the code


# Install the transformers library first if needed (it comes pre-installed on Google Colab):
# !pip install transformers torch

# Import the pipeline function from the transformers library
from transformers import pipeline

# Load a pre-trained text generation pipeline
# We pin the model to GPT-2, which is also the pipeline's default for this task
text_generator = pipeline("text-generation", model="gpt2")

# Define a prompt for text generation
prompt = "Once upon a time, in a distant kingdom,"

# Generate text based on the prompt
# The max_length parameter controls the output length
output = text_generator(prompt, max_length=50)

# Print the generated text
print(output)

🔗 ChatGPT explanation for the code

3️⃣ Click Run (▶) and check the output!

✅ Libraries imported and first text generated! You're now ready to explore the Hugging Face pipeline further. 🚀 🎉
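💡 Optional: want more variety? This short sketch (the parameter values are just suggestions) reuses text_generator and prompt from the cell above to produce two different continuations:

# Sketch: generate two different continuations with the pipeline from the previous cell
outputs = text_generator(
    prompt,
    max_length=80,            # allow a slightly longer continuation
    num_return_sequences=2,   # ask for two different completions
    do_sample=True,           # sample instead of always taking the most likely word
)

# Each result is a dictionary with a 'generated_text' key
for i, result in enumerate(outputs, start=1):
    print(f"--- Continuation {i} ---")
    print(result["generated_text"])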

🧠 Step 3: Load a Pre-Trained LLM

💡 We will use GPT-2, a popular model that generates human-like text.

➕ 🐍 Add a New Code Cell

1️⃣ Click + Code in the top left to add a new code cell.
2️⃣ Copy and paste the following code into the new code cell.

🔗 ChatGPT prompt for generating this code


# First, we need to install the transformers and torch libraries if they are not already installed.
# Uncomment and run the following line if you haven't installed them:
# !pip install transformers torch

# Import necessary libraries
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Step 1: Load the pre-trained GPT-2 model and tokenizer
# GPT2Tokenizer is used for encoding input text and GPT2LMHeadModel is used for text generation.

# Load pre-trained GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load pre-trained GPT-2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Step 2: Encode the input text
# Tokenize the input text using the tokenizer. 
# The model expects input in tokenized form, so the text is converted into token IDs.
input_text = "Once upon a time, there was a brave knight"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

# Step 3: Generate text using the model
# The model generates a continuation of the input text based on the provided input.
# 'max_length' specifies the maximum length of the generated text.
# 'do_sample=True' is required for 'top_p' and 'temperature' to have any effect.
output = model.generate(
    input_ids,
    max_length=100,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # avoids a padding warning with GPT-2
)

# Step 4: Decode the generated output
# After the model generates the token IDs, we decode them back to text.
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Step 5: Display the generated text
print("Generated text:\n", generated_text)


🔗 ChatGPT explanation for the code

3️⃣ Click Run (▶) and check the output!

✅ Pre-trained LLM loaded successfully! You're now ready to generate text using GPT-2. 🧠 🚀 🎉
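💡 Optional: before generating, it helps to see what the tokenizer actually does. This small sketch (reusing the tokenizer loaded in the cell above) prints the tokens and IDs for a short sentence:

# Sketch: inspect how the tokenizer splits text into tokens and numeric IDs
text = "Once upon a time"
tokens = tokenizer.tokenize(text)              # subword pieces, e.g. 'Once', 'Ġupon', ...
ids = tokenizer.convert_tokens_to_ids(tokens)  # the numbers the model actually sees

for token, token_id in zip(tokens, ids):
    print(f"{token!r} -> {token_id}")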

📝 Step 4: Generate Text Using the LLM

➕ 🐍 Add a New Code Cell

1️⃣ Click + Code in the top left to add a new code cell.
2️⃣ Copy and paste the following code into the new code cell.

🔗 ChatGPT prompt for generating this code


# Install the necessary libraries (uncomment if not installed)
# !pip install transformers torch

# Import required libraries from Hugging Face's transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Define the input text prompt
input_prompt = "Once upon a time in a land far away"

# Set the maximum length for the generated text
max_length = 100  # You can adjust this to control the length of the output text

# Load the pre-trained GPT-2 model and tokenizer
# GPT-2 is a popular language model trained by OpenAI
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode the input text prompt into tokens that the model understands
# This converts the input string into a format that can be fed into the model
input_ids = tokenizer.encode(input_prompt, return_tensors='pt')

# Generate text using the model based on the input prompt
# 'max_length' counts the input prompt plus the generated continuation
# 'do_sample=True' is required for 'temperature' to have any effect
generated_output = model.generate(
    input_ids,
    max_length=max_length,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # avoids a padding warning with GPT-2
)

# Decode the generated tokens back into human-readable text
# This step converts the token IDs back into a string
generated_text = tokenizer.decode(generated_output[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:\n", generated_text)

🔗 ChatGPT explanation for the code

3️⃣ Click Run (▶) and check the output!

✅ Text generated successfully! You should now see a creative continuation of your prompt. 📝 ✨ 🎉

🎯 Challenge: Try changing the prompt to something funny or mysterious! 🔥

💡 Extra Tip: Edit the input_prompt = "..." line and try your own creative ideas. For example, start with “In a haunted bakery...” or “The cat who ruled the internet...”
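💡 If you want to compare several ideas at once, here is a small sketch (the prompts are just examples) that reuses the model and tokenizer loaded in the previous cell:

# Sketch: try several prompts in one go, reusing the model and tokenizer from above
fun_prompts = [
    "In a haunted bakery,",
    "The cat who ruled the internet",
]

for p in fun_prompts:
    ids = tokenizer.encode(p, return_tensors="pt")
    out = model.generate(
        ids,
        max_length=60,
        do_sample=True,                       # sampling keeps each run different
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id,  # avoids a padding warning with GPT-2
    )
    print("---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))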


🤖 4. How to use LLMs for answering questions

💡 LLMs can also answer questions based on a given context!

➕ 🐍 Add a New Code Cell

1️⃣ Click + Code in the top left to add a new code cell.
2️⃣ Copy and paste the following code into the new code cell.

🔗 ChatGPT prompt for generating this code


# Step 1: Install necessary libraries (if not installed yet)
# You can run these commands in your terminal or Jupyter notebook cell
# !pip install transformers torch

# Step 2: Import necessary libraries
from transformers import pipeline

# Step 3: Load the question-answering pipeline using the 'distilbert-base-uncased-distilled-squad' model
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# Step 4: Define the context (the passage that contains information to answer the question)
context = """
Hugging Face is a company that focuses on Natural Language Processing (NLP). They have developed a popular library called transformers, 
which provides pre-trained models for various NLP tasks, including text generation, sentiment analysis, and question answering. 
Hugging Face aims to democratize AI by making these models accessible to the wider community.
"""

# Step 5: Define the question you want to ask
question = "What does Hugging Face do?"

# Step 6: Use the QA pipeline to answer the question based on the context
answer = qa_pipeline(question=question, context=context)

# Step 7: Print the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")

🔗 ChatGPT explanation for the code

3️⃣ Click Run (▶) and check the output!

✅ Question answered successfully! You should now see the model's response based on the given context. 🤖 🎉
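💡 Optional: the same pipeline can answer several questions about one passage, and it also reports a confidence score. This sketch (the questions are just examples) reuses qa_pipeline and context from the cell above:

# Sketch: ask several questions about the same context and show the model's confidence
questions = [
    "What library did Hugging Face develop?",
    "What is Hugging Face's goal?",
]

for q in questions:
    result = qa_pipeline(question=q, context=context)
    print(f"Q: {q}")
    print(f"A: {result['answer']} (confidence: {result['score']:.2f})")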


🎯 5. Wrap-Up & Next Steps

🎉 Congratulations! Today you learned how to:

  • ✅ Use LLMs for text generation ✍
  • ✅ Use LLMs for question answering 💬
  • ✅ Work with Hugging Face models 🤖

🚀 Next Workshop: 📚 LLM + RAG (AI-Powered Search)

🔗 Additional AI Resources 📚

🎉 Keep learning AI, and see you at the next workshop! 🚀


📝 Workshop Feedback Survey

Thanks for completing this workshop! 🎆

We'd love to hear what you think so we can make future workshops even better. 💡

📌 Survey link