PyTorch Common Syntax Summary
Installing PyTorch
Installing CPU Version
If you don’t have a GPU or only need to run on CPU, you can use the following command to install PyTorch:
pip install torch torchvision torchaudio
Installing GPU Version
If you have an NVIDIA GPU, it's recommended to use CUDA to accelerate computation. Choose the installation command according to your CUDA version; you can visit the official PyTorch website (pytorch.org) to get the latest installation commands.
For example, for CUDA 11.8 version, use the following command:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CUDA 12.1 version, use:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
If you’re not sure about your CUDA version, you can check with the following command:
nvcc --version
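After installing, you can verify the setup from Python. The following is just a quick sketch using standard PyTorch attributes:

import torch

print(torch.__version__)             # Installed PyTorch version
print(torch.cuda.is_available())     # True if a CUDA-capable GPU is usable
print(torch.version.cuda)            # CUDA version PyTorch was built against (None for CPU-only builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # Name of the first GPU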
Importing PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, Dataset
- torch: PyTorch's core library, responsible for tensor operations, computational graphs, GPU acceleration, and other functions.
- torch.nn: Defines neural network structures, including fully connected layers, convolutional layers, LSTM layers, etc.
- torch.optim: Provides various optimizers, such as Stochastic Gradient Descent (SGD), Adam, RMSprop, etc.
- torch.nn.functional: Contains many functions without weights, such as activation functions (ReLU, Sigmoid), pooling (MaxPool2d), etc.
- torchvision.transforms: Used for image data augmentation and normalization, suitable for computer vision tasks.
- torchvision.datasets: Ships with multiple common image datasets, such as MNIST and CIFAR-10, which can be downloaded directly.
- torch.utils.data.Dataset: Base class for custom datasets, allowing us to define how data is read.
- torch.utils.data.DataLoader: Loads data in batches, with support for multi-process loading and data shuffling.
Checking if GPU is Available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
This code checks if a GPU is available, and falls back to CPU if not.
Creating Tensors
tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(tensor)
A tensor is PyTorch's core data structure. It is similar to a NumPy array, but it can run on a GPU to accelerate computation and supports automatic differentiation.
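Besides torch.tensor, a few other common creation functions are worth knowing. The following is a small illustrative sketch:

zeros = torch.zeros(2, 3)        # 2x3 tensor of zeros
ones = torch.ones(2, 3)          # 2x3 tensor of ones
rand = torch.randn(2, 3)         # 2x3 tensor of standard-normal samples
evens = torch.arange(0, 10, 2)   # tensor([0, 2, 4, 6, 8])
print(zeros.shape, rand.dtype)   # torch.Size([2, 3]) torch.float32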
Moving Tensors to GPU
tensor = tensor.to(device)
Use the .to(device) method to move a tensor to the specified device (CPU or GPU).
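Note that .to(device) returns a tensor on the target device rather than modifying the original in place, and tensors combined in one operation must live on the same device. A small illustrative sketch:

a = torch.tensor([1.0, 2.0, 3.0])
b = a.to(device)            # b lives on `device`; a itself is unchanged
print(a.device, b.device)

c = torch.tensor([4.0, 5.0, 6.0]).to(device)
print(b + c)                # Both operands are on the same device, so this works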
Custom Dataset
class CustomDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 2)    # Generate 100 random data points, each with 2 dimensions
        self.labels = torch.randn(100, 1)  # Generate 100 random labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
A custom Dataset allows us to define how data is stored and retrieved. This is a simple custom Dataset that generates 100 random data points, each with 2 features and 1 label.
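As a quick usage sketch, a Dataset can be measured with len() and indexed directly; indexing calls __getitem__ and returns one (sample, label) pair:

dataset = CustomDataset()
print(len(dataset))                  # 100
sample, label = dataset[0]           # One data point and its label
print(sample.shape, label.shape)     # torch.Size([2]) torch.Size([1])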
Creating a DataLoader
dataset = CustomDataset()
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)
DataLoader is used to load data in batches: batch_size=10 means 10 data points per batch, and shuffle=True randomly shuffles the data at the start of each epoch.
Iterating Through a DataLoader
for batch in dataloader:
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    outputs = model(inputs)
    print(outputs)
    break  # Only display one batch of data
Load data in batches via the DataLoader and feed each batch into the model (defined in the sections below) for prediction.
Creating a Simple Neural Network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 4)  # First linear layer: input 2 features, output 4 features
        self.fc2 = nn.Linear(4, 1)  # Second linear layer: input 4 features, output 1 feature

    def forward(self, x):
        x = F.relu(self.fc1(x))  # Apply the ReLU activation function
        x = self.fc2(x)          # Output layer
        return x
This is a simple neural network with two fully connected layers (Linear). The first layer takes 2-dimensional input and outputs 4-dimensional features, which pass through a ReLU activation function; the second layer then outputs a single value.
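As an illustrative sketch (using a throwaway instance of the network), a batch of 10 two-feature inputs produces 10 one-dimensional outputs:

net = SimpleNN()
dummy = torch.randn(10, 2)   # Batch of 10 samples, 2 features each
out = net(dummy)
print(out.shape)             # torch.Size([10, 1])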
Creating a Model
model = SimpleNN().to(device)
print(model)
Move the model to GPU (if available) and print the model architecture.
Loss Functions and Optimizers
criterion = nn.MSELoss() # Mean Squared Error loss function
optimizer = optim.Adam(model.parameters(), lr=0.01) # Use Adam optimizer
Loss functions measure the error between model predictions and actual values, while optimizers adjust model parameters to minimize the loss.
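As a small throwaway sketch of how the two objects interact (the random inputs and targets here are made up purely for illustration), one optimization step looks like this:

pred = model(torch.randn(5, 2).to(device))   # Dummy predictions from 5 random inputs
target = torch.randn(5, 1).to(device)        # Dummy targets
loss = criterion(pred, target)               # MSE between predictions and targets

# MSELoss is simply the mean of the squared differences
manual = ((pred - target) ** 2).mean()
print(torch.allclose(loss, manual))          # True

optimizer.zero_grad()   # Clear old gradients
loss.backward()         # Compute new gradients
optimizer.step()        # Update the model's parameters
print(loss.item())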
Loss Functions
Loss functions measure the error between the model’s predictions and actual values. Choosing an appropriate loss function is crucial for effective model training.
Commonly Used Loss Functions for Regression Problems
| Loss Function | Description |
| --- | --- |
| nn.MSELoss() | Mean Squared Error (MSE): the average of the squared errors between predicted and true values. Suitable for regression problems. |
| nn.L1Loss() | Mean Absolute Error (MAE): the average of the absolute errors between predicted and true values. Less sensitive to outliers than MSE. |
| nn.SmoothL1Loss() | Smooth L1 loss: combines the advantages of L1 and L2 losses. For small errors it behaves like MSE; for large errors it behaves like MAE. Commonly used for robust regression. |
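To see the difference concretely, here is a small illustrative comparison on the same made-up predictions and targets:

pred = torch.tensor([2.0, 4.0, 10.0])
true = torch.tensor([1.0, 3.0, 3.0])     # The last prediction has a large outlier error

print(nn.MSELoss()(pred, true))          # tensor(17.) -> (1 + 1 + 49) / 3, dominated by the outlier
print(nn.L1Loss()(pred, true))           # tensor(3.)  -> (1 + 1 + 7) / 3, less sensitive to the outlier
print(nn.SmoothL1Loss()(pred, true))     # Quadratic for small errors, linear for large ones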
Commonly Used Loss Functions for Classification Problems
| Loss Function | Description |
| --- | --- |
| nn.CrossEntropyLoss() | Cross-entropy loss, suitable for multi-class classification problems. It applies Softmax internally, so the input should be raw logits (without softmax processing). |
| nn.NLLLoss() | Negative log-likelihood loss, typically used together with nn.LogSoftmax(). Suitable for multi-class classification problems. |
| nn.BCELoss() | Binary cross-entropy loss, suitable for binary classification problems. Requires output values that have been passed through a sigmoid. |
| nn.BCEWithLogitsLoss() | A variant of BCELoss that applies sigmoid internally, so it can take raw logits as input directly, which improves numerical stability. |
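A small illustrative sketch of the expected inputs (the numbers are made up): CrossEntropyLoss takes raw logits plus integer class indices, and BCEWithLogitsLoss on raw logits matches BCELoss on sigmoid outputs:

# Multi-class: raw logits and integer class indices
logits = torch.randn(4, 3)               # 4 samples, 3 classes
classes = torch.tensor([0, 2, 1, 2])     # True class index for each sample
print(nn.CrossEntropyLoss()(logits, classes))

# Binary: BCEWithLogitsLoss(raw) equals BCELoss(sigmoid(raw))
raw = torch.randn(4, 1)
target = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
print(nn.BCEWithLogitsLoss()(raw, target))
print(nn.BCELoss()(torch.sigmoid(raw), target))  # Same value, but less numerically stable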
Optimizers
Optimizers adjust model parameters to minimize the value of the loss function.
Commonly Used Optimizers
| Optimizer | Description |
| --- | --- |
| optim.SGD(model.parameters(), lr=0.01, momentum=0.9) | Stochastic Gradient Descent (SGD), with an optional momentum parameter to accelerate convergence. |
| optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999)) | Adam optimizer; combines the advantages of momentum SGD and RMSprop, suitable for most deep learning problems. |
| optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99) | RMSprop optimizer, suitable for recurrent neural networks (RNNs) and non-stationary objectives. |
| optim.AdamW(model.parameters(), lr=0.001) | AdamW optimizer; compared to Adam it applies decoupled weight decay, suitable for models such as Transformers. |
| optim.Adagrad(model.parameters(), lr=0.01) | Adagrad (adaptive gradient) optimizer, suitable for learning with sparse data. |
How to Choose Loss Functions and Optimizers?
- Regression problems: usually use MSELoss or SmoothL1Loss; the optimizer can be Adam or SGD.
- Binary classification problems: usually use BCELoss (or BCEWithLogitsLoss if the input has not been passed through sigmoid); the optimizer can be Adam or SGD. A concrete pairing is sketched after this list.
- Multi-class classification problems: usually use CrossEntropyLoss; the optimizer can be Adam or SGD.
- Large deep learning models (such as Transformers): the optimizer can be AdamW, to reduce the negative effects of L2 regularization.
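For example, a binary classification setup could pair BCEWithLogitsLoss with AdamW. This is only an illustrative sketch; the variable names cls_criterion and cls_optimizer and the hyperparameter values are made up for this example and are not used by the training loop below:

cls_criterion = nn.BCEWithLogitsLoss()   # Takes raw logits, applies sigmoid internally
cls_optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)  # Decoupled weight decay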
Training the Model
num_epochs = 20  # Train for 20 epochs

for epoch in range(num_epochs):  # Loop 20 times; each full pass over the data is one epoch
    running_loss = 0.0  # Accumulates the loss of each batch to compute the epoch's average loss
    for batch in dataloader:  # Iterate through the DataLoader, getting one batch of data each time
        inputs, targets = batch  # Unpack input data (inputs) and labels (targets)
        inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU or CPU

        # Forward pass
        outputs = model(inputs)  # Model takes the inputs and produces predictions
        loss = criterion(outputs, targets)  # Calculate the error between predictions and true values

        # Backward pass & optimization
        optimizer.zero_grad()  # Clear gradients to avoid accumulation
        loss.backward()  # Backpropagate to compute gradients
        optimizer.step()  # Update model parameters

        running_loss += loss.item()  # Accumulate loss to compute the average

    # Output the average loss of the current epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(dataloader):.4f}")
This code trains the model for 20 epochs. Each epoch traverses the entire dataset, calculates the loss, and uses backpropagation to update the model parameters.