PyTorch Common Syntax Summary
Installing PyTorch
Installing CPU Version
If you don’t have a GPU or only need to run on CPU, you can use the following command to install PyTorch:
pip install torch torchvision torchaudio
Installing GPU Version
If you have an NVIDIA GPU, it's recommended to use CUDA to accelerate computation. Choose the installation command according to your CUDA version; you can visit the official PyTorch website (pytorch.org) to get the latest installation commands.
For example, for CUDA 11.8 version, use the following command:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CUDA 12.1 version, use:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
If you’re not sure about your CUDA version, you can check with the following command:
nvcc --version
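After installing, you can verify the setup from Python. The following is just a quick sketch using standard PyTorch attributes:

import torch

print(torch.__version__)             # Installed PyTorch version
print(torch.cuda.is_available())     # True if a CUDA-capable GPU is usable
print(torch.version.cuda)            # CUDA version PyTorch was built against (None for CPU-only builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # Name of the first GPU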
Importing PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, Dataset
- torch: PyTorch's core library, responsible for tensor operations, computational graphs, GPU acceleration, and other functions.
- torch.nn: Defines neural network structures, including fully connected layers, convolutional layers, LSTM layers, etc.
- torch.optim: Provides various optimizers, such as Stochastic Gradient Descent (SGD), Adam, RMSprop, etc.
- torch.nn.functional: Contains many functions without weights, such as activation functions (ReLU, Sigmoid), pooling (MaxPool2d), etc.
- torchvision.transforms: Used for image data augmentation and normalization, suitable for computer vision tasks.
- torchvision.datasets: Ships with multiple common image datasets, such as MNIST and CIFAR-10, which can be downloaded directly.
- torch.utils.data.Dataset: Base class for custom datasets, allowing us to define how data is read.
- torch.utils.data.DataLoader: Loads data in batches, with support for multi-process loading and data shuffling.
Checking if GPU is Available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
This code checks if a GPU is available, and falls back to CPU if not.
Creating Tensors
tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(tensor)
A tensor is PyTorch's core data structure. It is similar to a NumPy array, but it can run on a GPU to accelerate computation and supports automatic differentiation.
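Besides torch.tensor, a few other common creation functions are worth knowing. The following is a small illustrative sketch:

zeros = torch.zeros(2, 3)        # 2x3 tensor of zeros
ones = torch.ones(2, 3)          # 2x3 tensor of ones
rand = torch.randn(2, 3)         # 2x3 tensor of standard-normal samples
evens = torch.arange(0, 10, 2)   # tensor([0, 2, 4, 6, 8])
print(zeros.shape, rand.dtype)   # torch.Size([2, 3]) torch.float32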
Moving Tensors to GPU
tensor = tensor.to(device)
Use the .to(device) method to move a tensor to the specified device (CPU or GPU).
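Note that .to(device) returns a tensor on the target device rather than modifying the original in place, and tensors combined in one operation must live on the same device. A small illustrative sketch:

a = torch.tensor([1.0, 2.0, 3.0])
b = a.to(device)            # b lives on `device`; a itself is unchanged
print(a.device, b.device)

c = torch.tensor([4.0, 5.0, 6.0]).to(device)
print(b + c)                # Both operands are on the same device, so this works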
Custom Dataset
class CustomDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 2)    # Generate 100 random data points, each with 2 dimensions
        self.labels = torch.randn(100, 1)  # Generate 100 random labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
A custom Dataset allows us to define how data is stored and retrieved. This is a simple custom Dataset that generates 100 random data points, each with 2 features and 1 label.
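As a quick usage sketch, a Dataset can be measured with len() and indexed directly; indexing calls __getitem__ and returns one (sample, label) pair:

dataset = CustomDataset()
print(len(dataset))                  # 100
sample, label = dataset[0]           # One data point and its label
print(sample.shape, label.shape)     # torch.Size([2]) torch.Size([1])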
Creating a DataLoader
dataset = CustomDataset()
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)
DataLoader is used to load data in batches: batch_size=10 means 10 data points per batch, and shuffle=True randomly shuffles the data at the start of each epoch.
Iterating Through a DataLoader
for batch in dataloader:
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    outputs = model(inputs)
    print(outputs)
    break  # Only display one batch of data
Load data in batches via the DataLoader and feed each batch into the model (defined in the sections below) for prediction.
Creating a Simple Neural Network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 4)  # First linear layer: input 2 features, output 4 features
        self.fc2 = nn.Linear(4, 1)  # Second linear layer: input 4 features, output 1 feature

    def forward(self, x):
        x = F.relu(self.fc1(x))  # Apply the ReLU activation function
        x = self.fc2(x)          # Output layer
        return x
This is a simple neural network with two fully connected layers (Linear). The first layer takes 2-dimensional input and outputs 4-dimensional features, which pass through a ReLU activation function; the second layer then outputs a single value.
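As an illustrative sketch (using a throwaway instance of the network), a batch of 10 two-feature inputs produces 10 one-dimensional outputs:

net = SimpleNN()
dummy = torch.randn(10, 2)   # Batch of 10 samples, 2 features each
out = net(dummy)
print(out.shape)             # torch.Size([10, 1])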
Creating a Model
model = SimpleNN().to(device)
print(model)
Move the model to GPU (if available) and print the model architecture.
Loss Functions and Optimizers
criterion = nn.MSELoss() # Mean Squared Error loss function
optimizer = optim.Adam(model.parameters(), lr=0.01) # Use Adam optimizer
Loss functions measure the error between model predictions and actual values, while optimizers adjust model parameters to minimize the loss.
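As a small throwaway sketch of how the two objects interact (the random inputs and targets here are made up purely for illustration), one optimization step looks like this:

pred = model(torch.randn(5, 2).to(device))   # Dummy predictions from 5 random inputs
target = torch.randn(5, 1).to(device)        # Dummy targets
loss = criterion(pred, target)               # MSE between predictions and targets

# MSELoss is simply the mean of the squared differences
manual = ((pred - target) ** 2).mean()
print(torch.allclose(loss, manual))          # True

optimizer.zero_grad()   # Clear old gradients
loss.backward()         # Compute new gradients
optimizer.step()        # Update the model's parameters
print(loss.item())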
Loss Functions
Loss functions measure the error between the model’s predictions and actual values. Choosing an appropriate loss function is crucial for effective model training.
Commonly Used Loss Functions for Regression Problems
| Loss Function | Description |
| --- | --- |
| nn.MSELoss() | Mean Squared Error (MSE): the average of the squared errors between predicted and true values. Suitable for regression problems. |
| nn.L1Loss() | Mean Absolute Error (MAE): the average of the absolute errors between predicted and true values. Less sensitive to outliers than MSE. |
| nn.SmoothL1Loss() | Smooth L1 loss: combines the advantages of L1 and L2 losses. For small errors it behaves like MSE; for large errors it behaves like MAE. Commonly used for robust regression. |
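To see the difference concretely, here is a small illustrative comparison on the same made-up predictions and targets:

pred = torch.tensor([2.0, 4.0, 10.0])
true = torch.tensor([1.0, 3.0, 3.0])     # The last prediction has a large outlier error

print(nn.MSELoss()(pred, true))          # tensor(17.) -> (1 + 1 + 49) / 3, dominated by the outlier
print(nn.L1Loss()(pred, true))           # tensor(3.)  -> (1 + 1 + 7) / 3, less sensitive to the outlier
print(nn.SmoothL1Loss()(pred, true))     # Quadratic for small errors, linear for large ones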
Commonly Used Loss Functions for Classification Problems
| Loss Function | Description |
| --- | --- |
| nn.CrossEntropyLoss() | Cross-entropy loss, suitable for multi-class classification problems. It applies Softmax internally, so the input should be raw logits (without softmax processing). |
| nn.NLLLoss() | Negative log-likelihood loss, typically used together with nn.LogSoftmax(). Suitable for multi-class classification problems. |
| nn.BCELoss() | Binary cross-entropy loss, suitable for binary classification problems. Requires output values that have been passed through a sigmoid. |
| nn.BCEWithLogitsLoss() | A variant of BCELoss that applies sigmoid internally, so it can take raw logits as input directly, which improves numerical stability. |
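A small illustrative sketch of the expected inputs (the numbers are made up): CrossEntropyLoss takes raw logits plus integer class indices, and BCEWithLogitsLoss on raw logits matches BCELoss on sigmoid outputs:

# Multi-class: raw logits and integer class indices
logits = torch.randn(4, 3)               # 4 samples, 3 classes
classes = torch.tensor([0, 2, 1, 2])     # True class index for each sample
print(nn.CrossEntropyLoss()(logits, classes))

# Binary: BCEWithLogitsLoss(raw) equals BCELoss(sigmoid(raw))
raw = torch.randn(4, 1)
target = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
print(nn.BCEWithLogitsLoss()(raw, target))
print(nn.BCELoss()(torch.sigmoid(raw), target))  # Same value, but less numerically stable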
Optimizers
Optimizers adjust model parameters to minimize the value of the loss function.
Commonly Used Optimizers
| Optimizer | Description |
| --- | --- |
| optim.SGD(model.parameters(), lr=0.01, momentum=0.9) | Stochastic Gradient Descent (SGD), with an optional momentum parameter to accelerate convergence. |
| optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999)) | Adam optimizer; combines the advantages of momentum SGD and RMSprop, suitable for most deep learning problems. |
| optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99) | RMSprop optimizer, suitable for recurrent neural networks (RNNs) and non-stationary objectives. |
| optim.AdamW(model.parameters(), lr=0.001) | AdamW optimizer; compared to Adam it applies decoupled weight decay, suitable for models such as Transformers. |
| optim.Adagrad(model.parameters(), lr=0.01) | Adagrad (adaptive gradient) optimizer, suitable for learning with sparse data. |
How to Choose Loss Functions and Optimizers?
- Regression problems: usually use MSELoss or SmoothL1Loss; the optimizer can be Adam or SGD.
- Binary classification problems: usually use BCELoss (or BCEWithLogitsLoss if the input has not been passed through sigmoid); the optimizer can be Adam or SGD. A concrete pairing is sketched after this list.
- Multi-class classification problems: usually use CrossEntropyLoss; the optimizer can be Adam or SGD.
- Large deep learning models (such as Transformers): the optimizer can be AdamW, to reduce the negative effects of L2 regularization.
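For example, a binary classification setup could pair BCEWithLogitsLoss with AdamW. This is only an illustrative sketch; the variable names cls_criterion and cls_optimizer and the hyperparameter values are made up for this example and are not used by the training loop below:

cls_criterion = nn.BCEWithLogitsLoss()   # Takes raw logits, applies sigmoid internally
cls_optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)  # Decoupled weight decay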
Training the Model
num_epochs = 20  # Train for 20 epochs

for epoch in range(num_epochs):  # Loop 20 times; each full pass over the data is one epoch
    running_loss = 0.0  # Accumulates the loss of each batch to compute the epoch's average loss
    for batch in dataloader:  # Iterate through the DataLoader, getting one batch of data each time
        inputs, targets = batch  # Unpack input data (inputs) and labels (targets)
        inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU or CPU

        # Forward pass
        outputs = model(inputs)  # Model takes the inputs and produces predictions
        loss = criterion(outputs, targets)  # Calculate the error between predictions and true values

        # Backward pass & optimization
        optimizer.zero_grad()  # Clear gradients to avoid accumulation
        loss.backward()  # Backpropagate to compute gradients
        optimizer.step()  # Update model parameters

        running_loss += loss.item()  # Accumulate loss to compute the average

    # Output the average loss of the current epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(dataloader):.4f}")
This code trains the model for 20 epochs. Each epoch traverses the entire dataset, calculates the loss, and uses backpropagation to update the model parameters.