Gradient checking is a procedure for verifying that a neural network's backpropagation implementation is correct. It works by approximating each gradient numerically and comparing the approximation to the gradient computed by backpropagation.
The gradient checking procedure is usually performed as follows. First, compute the gradient of the cost J with respect to the parameters using backpropagation. Next, approximate the partial derivative for each parameter θ numerically with a two-sided difference, (J(θ + ε) - J(θ - ε)) / (2ε), perturbing one parameter at a time with a small ε such as 10^-7. Finally, compare the two gradients, typically by dividing the norm of their difference by the sum of their norms; a relative difference on the order of 10^-7 suggests the implementation is correct, while a value around 10^-3 or larger usually points to a bug.
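To make the procedure concrete, here is a minimal sketch in NumPy. The names (cost_fn, grad_backprop, epsilon) are illustrative rather than taken from any particular library, and the cost function is assumed to map a flat parameter vector to a scalar.

```python
import numpy as np

def gradient_check(cost_fn, params, grad_backprop, epsilon=1e-7):
    """Compare an analytic gradient against a two-sided numerical estimate.

    cost_fn       -- function mapping a flat parameter vector to a scalar cost
    params        -- flat NumPy array of parameters at which to check
    grad_backprop -- analytic gradient from backpropagation, same shape as params
    """
    grad_approx = np.zeros_like(params)
    for i in range(params.size):
        # Perturb one parameter at a time, in both directions.
        plus, minus = params.copy(), params.copy()
        plus[i] += epsilon
        minus[i] -= epsilon
        # Two-sided (central) difference approximation of dJ/dtheta_i.
        grad_approx[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * epsilon)

    # Relative difference between the numerical and analytic gradients.
    numerator = np.linalg.norm(grad_approx - grad_backprop)
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad_backprop)
    return numerator / denominator
```

As a quick sanity check, calling gradient_check on the quadratic cost J(θ) = Σ θ², whose true gradient is 2θ, returns a value far below 10^-7, while flipping the sign of the analytic gradient drives the result up to 1.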
While gradient checking is a powerful tool for ensuring the correctness of a neural network implementation, it is computationally expensive: every parameter requires two extra evaluations of the cost function. It should therefore be reserved for debugging, ideally on a small model or a subset of parameters, rather than applied to a production model running on the full dataset.
Gradient checking, while useful, has a few limitations to keep in mind. The foremost is that it is unreliable for functions that are not smooth, such as ReLU activations or max pooling in Convolutional Neural Networks (CNNs). These functions have kinks, that is, points where they are not differentiable, and when the finite-difference perturbation straddles such a kink, the numerical estimate can disagree with the analytic gradient even though the implementation is correct.
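To see why kinks cause trouble, consider ReLU evaluated just above zero: the analytic derivative there is 1, but a two-sided difference whose step straddles the kink lands partly on the flat side and reports roughly 0.5. A small sketch (the point 1e-9 and the step 1e-4 are chosen purely for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = 1e-9         # just above the kink at 0; the analytic ReLU derivative here is 1
epsilon = 1e-4   # finite-difference step, much larger than x

# The central difference straddles the kink: the x - epsilon evaluation lands on the flat side.
numerical = (relu(x + epsilon) - relu(x - epsilon)) / (2 * epsilon)
analytic = 1.0

print(numerical)                   # approximately 0.5
print(abs(numerical - analytic))   # large discrepancy caused by the kink, not by a bug
```

In practice the perturbation rarely lands this close to a kink, but it explains why occasional large differences can show up even in a correct implementation.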
Secondly, extra care is needed when regularization techniques are used. Regularization adds terms to the loss function, and if those terms appear in the cost used for the numerical difference but not in the analytic gradient (or vice versa), the check will report a mismatch even though backpropagation itself is correct. Dropout is especially awkward because it makes the cost stochastic, so it is best disabled while gradient checking.
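A sketch of this point, using L2 weight decay as the example (the names, the toy data loss, and the 0.5 * lam convention are all illustrative): the regularization term has to appear consistently in both the cost that is differenced numerically and the analytic gradient.

```python
import numpy as np

lam = 0.1          # illustrative regularization strength
epsilon = 1e-7

def data_loss(w):
    return np.sum(w ** 4)                       # toy data loss; its gradient is 4 * w**3

def cost(w):
    # The cost used for the numerical difference must include the L2 term ...
    return data_loss(w) + 0.5 * lam * np.sum(w ** 2)

def analytic_grad(w):
    # ... and the analytic gradient must include its derivative, lam * w.
    return 4 * w ** 3 + lam * w

w = np.array([0.5, -1.0, 2.0])

# Numerically check the gradient of the first parameter only, for brevity.
plus, minus = w.copy(), w.copy()
plus[0] += epsilon
minus[0] -= epsilon
numerical = (cost(plus) - cost(minus)) / (2 * epsilon)

print(abs(numerical - analytic_grad(w)[0]))     # tiny; dropping lam * w above makes it about 0.05
```

Leaving lam * w out of analytic_grad while keeping the L2 term in cost reproduces exactly the kind of spurious mismatch described above.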
Lastly, as noted above, gradient checking is far too slow to run at every training iteration. Use it only while developing and debugging the model, and once you are confident that backpropagation is implemented correctly, turn it off.
Despite these limitations, gradient checking is an invaluable tool for debugging neural networks. It provides a direct way to verify whether backpropagation is implemented correctly, which is crucial because errors in gradient computation often produce no obvious failure: the network may still train, just slowly or to a poor solution, making such bugs difficult to detect.