Bonus. Exploring the Softmax Function
In lecture and tutorial, we explored how the sigmoid function $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is used in binary and multi-class classification problems. For multi-class classification, however, a common alternative is the softmax function.
The softmax function is defined as follows, where $K$ is the number of classes:
$$\text{softmax}(\boldsymbol{z}) = \begin{bmatrix} \dfrac{e^{z_1}}{\sum_{j=1}^K e^{z_j}} \\ \dfrac{e^{z_2}}{\sum_{j=1}^K e^{z_j}} \\ \vdots \\ \dfrac{e^{z_K}}{\sum_{j=1}^K e^{z_j}} \end{bmatrix}$$

In PyTorch, these two functions are implemented as `torch.sigmoid` and `torch.softmax` respectively.
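As a quick, purely illustrative check of these two built-ins (the tensor below is arbitrary), note that the softmax entries are positive and sum to 1 across the class dimension, while the sigmoid squashes each entry into $(0, 1)$ independently:

import torch

# Arbitrary scores for one example with K = 3 classes.
z = torch.tensor([[1.0, 2.0, 0.5]])

# Softmax across the class dimension: entries are positive and sum to 1.
print(torch.softmax(z, dim=1))  # approx. tensor([[0.2312, 0.6285, 0.1402]])

# Sigmoid applied element-wise: each entry lies in (0, 1), but the row need not sum to 1.
print(torch.sigmoid(z))         # approx. tensor([[0.7311, 0.8808, 0.6225]])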
Your task:
- Investigate the relationship between the softmax function and the sigmoid function when $K = 2$. To showcase your understanding, complete the function `mysoftmax` that computes the softmax of tensor `z` relying only on the `torch.sigmoid` function (and a minimal amount of other operations).
- Work out the derivative of the softmax function. Again, to showcase your understanding, complete the function `mysoftmax_grad` that computes the derivative of the softmax of tensor `z`. You may use the `torch.softmax` function, but you should use your own formula to compute the derivative instead of using `torch.autograd.functional.jacobian()`. (Hint: recall that $\sigma'(x) = \sigma(x) (1 - \sigma(x))$; a short derivation of this identity is given right after this list. You should expect to get something very similar!)
- In the case of multi-class classification (when $K > 2$), under what scenarios would you consider using the softmax function instead of the sigmoid function, or vice versa?
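For reference, the identity in the hint follows from differentiating the sigmoid directly (standard calculus, nothing assignment-specific):

$$\sigma'(x) = \frac{d}{dx}\left(\frac{1}{1 + e^{-x}}\right) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$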
Submission: Send me a screenshot of your `mysoftmax` + `mysoftmax_grad` implementations and the writeup before/during the tutorial (for bonus EXP)!
You may check out the PyTorch documentation for the functions that manipulate a `torch.tensor`. Work out the equations on paper before coding, and do some googling if you are stuck.
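For example, here are a few tensor operations that may come in handy; this is purely a sampler of the documented API, not a prescription of the steps you need:

import torch

z = torch.randn(5, 2)

# Slice out a single column; the result has shape (5,).
col = z[:, 0]

# Re-insert a dimension so the shape becomes (5, 1).
col = col.unsqueeze(1)

# Concatenate tensors along an existing dimension to rebuild a (5, 2) tensor.
rebuilt = torch.cat([col, 1 - col], dim=1)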
P.S. If no one solves all three tasks, I will still give out bonus EXP to those who solved at least 2.
import torch
def mysoftmax(z):
    """
    Computes the softmax of a two-dimensional tensor z across dimension 1.
    As in the problem sets, your solution must not involve any iteration.
    """
    # TODO: Replace this placeholder with your own implementation built on torch.sigmoid.
    return z
# Sample test case.
torch.manual_seed(2109)
z = torch.randn(5, 2)
z_correct = z.clone().detach()
z_correct = torch.softmax(z_correct, dim=1)
print("Softmax of z:\n", z_correct)
z_test = z.clone().detach()
z_test = mysoftmax(z_test)
print("Softmax of z (implemented using sigmoid):\n", z_test)
assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct)), "Output does not match"
# Large test case. Please make sure you pass this.
torch.manual_seed(3264)
z_eval = torch.randn(100, 2)
z_eval_correct = z_eval.clone().detach()
z_eval_correct = torch.softmax(z_eval_correct, dim=1)
z_eval_test = z_eval.clone().detach()
z_eval_test = mysoftmax(z_eval_test)
assert torch.all(torch.isclose(z_eval_test, z_eval_correct)), "Output does not match"
print("Large test case passed. Congratulations!")
def mysoftmax_grad(z):
    """
    Computes the softmax derivative of a two-dimensional tensor z across dimension 1.
    It should return a three-dimensional tensor whose (i, j, k)-th entry contains the
    derivative of softmax(z_i)_j with respect to (z_i)_k.
    As in the problem sets, your solution must not involve any iteration.
    """
    # TODO: Replace this placeholder with your own formula (torch.softmax is allowed).
    return z
# Sample test case.
torch.manual_seed(2109)
z = torch.randn(2, 3)
softmax_fn = lambda z: torch.softmax(z, dim=1)
z_correct = z.clone().detach().requires_grad_(True)
# jacobian() treats the whole (N, K) batch as a single input, so it returns an
# (N, K, N, K) tensor. Rows of different batch elements do not interact, so only the
# block-diagonal entries are nonzero; extract them and reorder to shape (N, K, K).
z_correct = torch.autograd.functional.jacobian(softmax_fn, z_correct)
z_correct = z_correct.diagonal(dim1=0, dim2=2).permute((2, 0, 1))
print("Softmax derivative of z:\n", z_correct)
z_test = z.clone().detach()
z_test = mysoftmax_grad(z_test)
print("Softmax derivative of z (from the formula):\n", z_test)
assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct)), "Output does not match"
# Large test case. Please make sure you pass this.
torch.manual_seed(3264)
z = torch.randn(50, 10)
softmax_fn = lambda z: torch.softmax(z, dim=1)
z_correct = z.clone().detach().requires_grad_(True)
z_correct = torch.autograd.functional.jacobian(softmax_fn, z_correct)
z_correct = z_correct.diagonal(dim1=0, dim2=2).permute((2, 0, 1))
z_test = z.clone().detach()
z_test = mysoftmax_grad(z_test)
assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct)), "Output does not match"
print("Large test case passed. Congratulations!")