
How to visualize features in a fine tuned LLM using PyTorch

To visualize the features of a fine-tuned language model in PyTorch, you can adapt a technique called “gradient-weighted class activation mapping” (Grad-CAM), which highlights the parts of the input text that contribute most to the model’s prediction.

Here’s an example of how you can implement Grad-CAM in PyTorch:

1. First, you’ll need to choose a layer of the model whose activations you want to visualize. It’s often a good idea to choose a layer near the end of the model, since these layers tend to capture higher-level features of the input.

2. Next, you’ll need to compute the gradient of the model’s output with respect to the activations of the chosen layer. You can do this with PyTorch’s backward function; note that PyTorch does not retain gradients for intermediate activations by default, so the full example below captures them with hooks (a toy sketch of steps 2-4 follows this list).

3. Once you have the gradients, average them across the sequence positions to obtain a weight for each channel (hidden dimension) of the activations.

4. Finally, weight the activations by these channel weights and sum over the channels to get one score per token, then render the scores as a heatmap over the input text. This will show you which parts of the input text are most important for the model’s prediction.
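
As a minimal, self-contained sketch of steps 2-4 (the shapes and the stand-in class score below are made up purely for illustration), the tensor mechanics look like this:

import torch

# Toy "activations": batch=1, seq_len=6, hidden=8 (made-up shapes)
acts = torch.randn(1, 6, 8, requires_grad=True)

# Stand-in for the class score the model would produce
score = acts.pow(2).sum()

# Step 2: gradients of the score with respect to the activations
score.backward()

# Step 3: average over sequence positions -> one weight per channel
weights = acts.grad.mean(dim=1, keepdim=True)   # shape (1, 1, 8)

# Step 4: weight the activations and sum over channels -> one score per token
token_scores = (weights * acts).sum(dim=-1)     # shape (1, 6)
print(token_scores.shape)                       # torch.Size([1, 6])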

Code

import torch
import torch.nn as nn

# Assume that we have a fine-tuned language model called "model"
# (here, a Hugging Face BERT-style sequence classifier) and a batch of
# tokenized input IDs called "inputs"

# Choose a layer of the model to visualize
layer = model.bert.encoder.layer[11]

# PyTorch does not retain gradients for intermediate activations, so use
# hooks to capture the layer's output and its gradient during backward
activations, gradients = {}, {}

def forward_hook(module, inputs_, output):
    activations["value"] = output[0]  # (batch, seq_len, hidden)
    output[0].register_hook(lambda grad: gradients.update(value=grad))

hook = layer.register_forward_hook(forward_hook)

# Set the model to evaluation mode (do NOT wrap the forward pass in
# torch.no_grad(); gradients are needed for the backward pass)
model.eval()

# Forward pass
outputs = model(inputs)
logits = outputs.logits if hasattr(outputs, "logits") else outputs[0]

# Choose a class to visualize
class_idx = 0

# Compute the gradient of the chosen class score with respect to the
# activations of the chosen layer
model.zero_grad()
logits[0, class_idx].backward()

# Retrieve the captured activations and gradients: (batch, seq_len, hidden)
acts = activations["value"]
grads = gradients["value"]

# Average the gradients over the sequence positions to get one weight per
# channel, then weight the activations and sum over the channels
weights = grads.mean(dim=1, keepdim=True)                      # (batch, 1, hidden)
heatmap = torch.relu((weights * acts).sum(dim=-1)).squeeze(0)  # (seq_len,)

# Normalize to [0, 1] for display
heatmap = heatmap / (heatmap.max() + 1e-8)

hook.remove()

# The heatmap has one value per input token, and the intensity values
# show which parts of the input text are most important for the model's prediction
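
To actually render the heatmap, one option is to map each score back to its token and draw a single-row heatmap. The sketch below is an illustration rather than part of the model code: it assumes the matching Hugging Face tokenizer is available under the (hypothetical) name "tokenizer" and uses matplotlib for display.

import matplotlib.pyplot as plt

# "tokenizer" is assumed to be the tokenizer that produced "inputs"
tokens = tokenizer.convert_ids_to_tokens(inputs[0].tolist())
scores = heatmap.detach().cpu().numpy()

# One colored cell per token, labeled with the token text
fig, ax = plt.subplots(figsize=(max(len(tokens) * 0.5, 4), 1.5))
ax.imshow(scores[None, :], cmap="Reds", aspect="auto")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=90)
ax.set_yticks([])
plt.tight_layout()
plt.show()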
