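Let's use the example of the SVM loss function for a single datapoint:

$$L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$$

Here $w_j$ is the $j$-th row of $W$, $x_i$ is the datapoint, $y_i$ is the index of its correct class, and $\Delta$ is the margin.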
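We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{y_i}$ we obtain:

$$\nabla_{w_{y_i}} L_i = - \left( \sum_{j \neq y_i} \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0) \right) x_i$$

where $\mathbb{1}$ is the indicator function, equal to one if the condition inside is true and zero otherwise. In other words, we count the classes that violate the margin and scale $x_i$ by the negative of that count.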
Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows, where $j \neq y_i$, the gradient is:

$$\nabla_{w_j} L_i = \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0) \, x_i$$
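
The two gradient cases described here (the correct-class row and the other rows) can be sketched in NumPy. This is a minimal illustration, not a reference implementation: it assumes the standard multiclass SVM loss with margin `delta`, a weight matrix `W` of shape (classes, dimensions) whose rows are the per-class weight vectors, and a hypothetical function name `svm_loss_and_grad`.

```python
import numpy as np

def svm_loss_and_grad(W, x, y, delta=1.0):
    """Multiclass SVM loss and its gradient for a single datapoint.

    Assumed shapes: W is (C, D), x is (D,), y is the integer index
    of the correct class. delta is the margin (1.0 is an assumption).
    """
    scores = W.dot(x)                      # class scores, shape (C,)
    margins = scores - scores[y] + delta   # w_j^T x - w_y^T x + delta
    margins[y] = 0.0                       # the sum skips j == y
    active = margins > 0                   # indicator: margin violated

    loss = margins[active].sum()

    dW = np.zeros(W.shape, dtype=float)
    dW[active] = x                         # rows j != y: indicator * x
    dW[y] = -active.sum() * x              # correct row: -(count) * x
    return loss, dW

# Usage on a tiny hand-made example (3 classes, 2 dimensions).
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, 2.0])
loss, dW = svm_loss_and_grad(W, x, y=0)
print(loss)
print(dW)
```

Only the rows whose margin is violated receive a nonzero gradient, and the correct-class row accumulates one `-x` for each such violation, so the whole gradient is zero once every margin is satisfied.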