Bonus. Exploring the Softmax Function

In lecture and tutorial, we explored how the sigmoid function $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is used in binary and multi-class classification problems. For multi-class classification, however, a common alternative is the softmax function.

The softmax function is defined as follows, where $K$ is the number of classes:

$$\text{softmax}(\boldsymbol{z}) = \begin{bmatrix} \dfrac{e^{z_1}}{\sum_{j=1}^K e^{z_j}} \\ \dfrac{e^{z_2}}{\sum_{j=1}^K e^{z_j}} \\ \vdots \\ \dfrac{e^{z_K}}{\sum_{j=1}^K e^{z_j}} \\ \end{bmatrix}$$

In PyTorch, these two functions are implemented as torch.sigmoid and torch.softmax, respectively.
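As a quick illustration of the definition (not part of the graded tasks), the formula above can be evaluated directly with torch.exp and compared against torch.softmax. A minimal sketch, assuming a single sample with $K = 3$:

import torch

z = torch.tensor([[1.0, 2.0, 3.0]])                            # one sample, K = 3
manual = torch.exp(z) / torch.exp(z).sum(dim=1, keepdim=True)  # the defining formula
builtin = torch.softmax(z, dim=1)                              # PyTorch's implementation
print(manual)                            # tensor([[0.0900, 0.2447, 0.6652]])
print(torch.allclose(manual, builtin))   # True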

Your task:

  1. Investigate the relationship between the softmax function and the sigmoid function when $K = 2$. To showcase your understanding, complete the function mysoftmax, which computes the softmax of tensor z relying only on the torch.sigmoid function (and a minimal number of other operations).
  2. Work out the derivative of the softmax function. Again, to showcase your understanding, complete the function mysoftmax_grad, which computes the derivative of the softmax of tensor z. You may use the torch.softmax function, but you should compute the derivative from your own formula rather than with torch.autograd.functional.jacobian(). (Hint: recall that $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. You should expect to get something very similar!)
  3. In the case of multi-class classification (when $K > 2$), in what scenarios would you consider using the softmax function instead of the sigmoid function, or vice versa?

Submission: Send me a screenshot of your mysoftmax + mysoftmax_grad implementations and your writeup before or during the tutorial (for bonus EXP)!

You may check out the PyTorch documentation for functions that manipulate a torch.Tensor. Work out the equations on paper before coding, and do some googling if you are stuck.
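For instance, here is a minimal sketch (purely illustrative, not a solution; the variable names are placeholders, and which operations you actually need depends on your derivation) of a few torch.Tensor operations that tend to come in handy for this kind of vectorised code:

import torch

z = torch.randn(5, 2)
diff = z[:, 0] - z[:, 1]                     # elementwise arithmetic on columns, shape (5,)
stacked = torch.stack((diff, -diff), dim=1)  # reassemble columns into a (5, 2) tensor
outer = z.unsqueeze(2) * z.unsqueeze(1)      # batched outer products via broadcasting, shape (5, 2, 2)
eye = torch.eye(2).expand(5, 2, 2)           # a batch of 2x2 identity matrices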

P.S. If no one solves all three tasks, I will still give out bonus EXP to those who solve at least two.

In [3]:
import torch
In [4]:
def mysoftmax(z):
    """
    Computes the softmax of a two-dimensional tensor z across dimension 1.
    As in the problem sets, your solution must not involve any iteration.
    """
    # TODO: Replace this placeholder with your implementation built on torch.sigmoid.
    return z
In [5]:
# Sample test case.

torch.manual_seed(2109)
z = torch.randn(5, 2)

z_correct = z.clone().detach()
z_correct = torch.softmax(z_correct, dim=1)
print("Softmax of z:\n", z_correct)

z_test = z.clone().detach()
z_test = mysoftmax(z_test)
print("Softmax of z (implemented using sigmoid):\n", z_test)

assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1)), \
    "Output does not match"
Softmax of z:
 tensor([[0.6214, 0.3786],
        [0.2190, 0.7810],
        [0.1938, 0.8062],
        [0.7362, 0.2638],
        [0.1739, 0.8261]])
Softmax of z (implemented using sigmoid):
 tensor([[ 1.6579,  1.1623],
        [-0.7738,  0.4975],
        [-0.0625,  1.3629],
        [ 1.1550,  0.1289],
        [-0.3080,  1.2504]])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 15
     12 print("Softmax of z (implemented using sigmoid):\n", z_test)
     14 assert z_test.shape == z_correct.shape, "Output shape does not match"
---> 15 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1)), \
     16     "Output does not match"

AssertionError: Output does not match
In [6]:
# Large test case. Please make sure you pass this.

torch.manual_seed(3264)
z_eval = torch.randn(100, 2)

z_eval_correct = z_eval.clone().detach()
z_eval_correct = torch.softmax(z_eval_correct, dim=1)

z_eval_test = z_eval.clone().detach()
z_eval_test = mysoftmax(z_eval_test)

assert torch.all(torch.isclose(z_eval_test, z_eval_correct), dim=(0,1)), \
    "Output does not match"

print("Large test case passed. Congratulations!")
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 12
      9 z_eval_test = z_eval.clone().detach()
     10 z_eval_test = mysoftmax(z_eval_test)
---> 12 assert torch.all(torch.isclose(z_eval_test, z_eval_correct), dim=(0,1)), \
     13     "Output does not match"
     15 print("Large test case passed. Congratulations!")

AssertionError: Output does not match
In [7]:
def mysoftmax_grad(z):
    """
    Computes the derivative (Jacobian) of the softmax of a two-dimensional tensor z across dimension 1.
    For an input of shape (N, K), it should return a three-dimensional tensor of shape (N, K, K).
    The (i, j, k)-th entry of the output tensor should contain the derivative of softmax(z_i)_j with respect to (z_i)_k.
    As in the problem sets, your solution must not involve any iteration.
    """
    # TODO: Replace this placeholder with your own derivative formula.
    return z
In [8]:
# Sample test case.

torch.manual_seed(2109)
z = torch.randn(2, 3)

softmax_fn = lambda z: torch.softmax(z, dim=1)

z_correct = z.clone().detach().requires_grad_(True)
z_correct = torch.autograd.functional.jacobian(softmax_fn, z_correct)
z_correct = z_correct.diagonal(dim1=0, dim2=2).permute((2, 0, 1))
print("Softmax derivative of z:\n", z_correct)

z_test = z.clone().detach()
z_test = mysoftmax_grad(z_test)
print("Softmax derivative of z (from the formula):\n", z_test)

assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
    "Output does not match"
Softmax derivative of z:
 tensor([[[ 0.2420, -0.2115, -0.0305],
         [-0.2115,  0.2301, -0.0186],
         [-0.0305, -0.0186,  0.0491]],

        [[ 0.1892, -0.0367, -0.1525],
         [-0.0367,  0.1238, -0.0871],
         [-0.1525, -0.0871,  0.2396]]])
Softmax derivative of z (from the formula):
 tensor([[ 1.6579,  1.1623, -0.7738],
        [ 0.4975, -0.0625,  1.3629]])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[8], line 17
     14 z_test = mysoftmax_grad(z_test)
     15 print("Softmax derivative of z (from the formula):\n", z_test)
---> 17 assert z_test.shape == z_correct.shape, "Output shape does not match"
     18 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
     19     "Output does not match"

AssertionError: Output shape does not match
In [9]:
# Large test case. Please make sure you pass this.

torch.manual_seed(3264)
z = torch.randn(50, 10)

softmax_fn = lambda z: torch.softmax(z, dim=1)

z_correct = z.clone().detach().requires_grad_(True)
z_correct = torch.autograd.functional.jacobian(softmax_fn, z_correct)
z_correct = z_correct.diagonal(dim1=0, dim2=2).permute((2, 0, 1))

z_test = z.clone().detach()
z_test = mysoftmax_grad(z_test)

assert z_test.shape == z_correct.shape, "Output shape does not match"
assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
    "Output does not match"

print("Large test case passed. Congratulations!")
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 15
     12 z_test = z.clone().detach()
     13 z_test = mysoftmax_grad(z_test)
---> 15 assert z_test.shape == z_correct.shape, "Output shape does not match"
     16 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
     17     "Output does not match"
     19 print("Large test case passed. Congratulations!")

AssertionError: Output shape does not match