All Questions

10 questions
4 votes · 1 answer · 59 views

How is the Jacobian a generalisation of the gradient?

I came across these slides, Natural Language Processing with Deep Learning (CS224N/Ling284), which, in the context of natural language processing, talk about the Jacobian as a generalization of the ...
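For a scalar-valued function the Jacobian collapses to the (transposed) gradient. A minimal finite-difference sketch, assuming a hypothetical toy map $f : \mathbb{R}^2 \to \mathbb{R}^2$:

```python
import numpy as np

# For f: R^n -> R^m the Jacobian J is m x n; when m = 1 it reduces to
# the (transposed) gradient. Finite-difference check on an assumed toy map:
def f(x):
    return np.array([x[0] ** 2 * x[1], 5 * x[0] + np.sin(x[1])])

def numerical_jacobian(f, x, eps=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps   # column j: df/dx_j
    return J

x = np.array([1.0, 2.0])
print(numerical_jacobian(f, x))
# Analytic Jacobian for comparison:
# [[2*x0*x1, x0**2], [5, cos(x1)]] -> [[4, 1], [5, cos(2)]]
```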
1 vote · 1 answer · 114 views

Understanding the derivation of the first-order model-agnostic meta-learning

According to the authors of this paper, to improve performance they decided to drop the backward pass and use a first-order approximation. I found a blog post which discussed how to derive the math ...
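A minimal sketch of the first-order shortcut on an assumed toy quadratic task (not the paper's setup): the adapted parameters are treated as constants, so the meta-gradient is just the task gradient evaluated after the inner step, with no backpropagation through the adaptation.

```python
# First-order MAML sketch on assumed scalar quadratic tasks.
def loss(theta, target):
    return 0.5 * (theta - target) ** 2

def grad(theta, target):
    return theta - target

theta = 0.0               # meta-parameters
alpha, beta = 0.1, 0.01   # inner / outer step sizes
tasks = [-1.0, 1.0, 3.0]  # hypothetical batch of tasks
for step in range(100):
    meta_grad = 0.0
    for target in tasks:
        theta_adapted = theta - alpha * grad(theta, target)  # inner step
        meta_grad += grad(theta_adapted, target)  # first-order: no backprop
                                                  # through the inner step
    theta -= beta * meta_grad / len(tasks)
print(theta)  # drifts toward a point that adapts well to all tasks
```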
1 vote · 0 answers · 33 views

Is the Gradient Descent algorithm a part of the Calculus of Variations?

As https://en.wikipedia.org/wiki/Calculus_of_variations puts it, the calculus of variations is a field of mathematical analysis that uses variations, which are small changes in functions and ...
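For contrast with the quoted definition: gradient descent searches over a finite-dimensional vector of parameters, while the calculus of variations searches over functions. A minimal descent loop on an assumed toy objective:

```python
# Gradient descent on an assumed toy objective f(x) = (x - 3)^2.
# It optimises over a finite-dimensional variable; the calculus of
# variations instead optimises over whole functions.
x = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (x - 3)   # df/dx
    x -= lr * grad
print(x)  # ~3.0, the minimiser
```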
0 votes · 1 answer · 32 views

Backpropagation: Chain Rule to the Third Last Layer

I'm trying to solve $\partial\mathrm{Loss}/\partial W_1$. The network is as in the picture below, with identity activation at all neurons. Solving $\partial\mathrm{Loss}/\partial W_7$ is simple, as there's only one path to the output: $\Delta = \mathrm{Out} - Y$, $\mathrm{Loss} = \mathrm{abs}(...$
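The asker's figure isn't shown, but with identity activations a linear chain makes the chain rule concrete. A sketch under the assumption of a three-layer linear network and squared-error loss, with a finite-difference check:

```python
import numpy as np

# Assumed network: out = W3 @ W2 @ W1 @ x, Loss = 0.5*||out - y||^2.
# The chain rule pushes the output delta back through W3 and W2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(3, 5))
W3 = rng.normal(size=(2, 3))
y_target = rng.normal(size=(2, 1))

out = W3 @ W2 @ W1 @ x
delta = out - y_target               # dLoss/dout
grad_W1 = (W3 @ W2).T @ delta @ x.T  # chain rule back to the first layer

# Finite-difference check on one entry:
eps = 1e-6
loss = lambda W: 0.5 * np.sum((W3 @ W2 @ W @ x - y_target) ** 2)
W1p = W1.copy(); W1p[0, 0] += eps
print(grad_W1[0, 0], (loss(W1p) - loss(W1)) / eps)  # should match
```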
4 votes · 3 answers · 194 views

Which function $(\hat{y} - y)^2$ or $(y - \hat{y})^2$ should I use to compute the gradient?

The MSE can be defined as $(\hat{y} - y)^2$, which should be equal to $(y - \hat{y})^2$, but I think their derivatives are different, so I am confused about which derivative to use for computing my ...
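Worked out, the two forms give the same gradient with respect to $\hat{y}$: $\frac{d}{d\hat{y}}(\hat{y}-y)^2 = 2(\hat{y}-y)$, while $\frac{d}{d\hat{y}}(y-\hat{y})^2 = 2(y-\hat{y})\cdot(-1) = 2(\hat{y}-y)$, because the inner derivative supplies a minus sign. A quick numerical confirmation:

```python
# Forward-difference derivatives of both forms w.r.t. y_hat agree.
y, y_hat, eps = 1.0, 2.5, 1e-6
g1 = ((y_hat + eps - y) ** 2 - (y_hat - y) ** 2) / eps
g2 = ((y - (y_hat + eps)) ** 2 - (y - y_hat) ** 2) / eps
print(g1, g2)  # both ~3.0 == 2 * (y_hat - y)
```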
6 votes · 2 answers · 483 views

How are local minima possible in gradient descent?

Gradient descent works on the mean squared error, which has the shape of a parabola, $y = x^2$. We often say that weight adjustment in a neural network by the gradient descent algorithm can hit ...
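The parabola picture only holds in the prediction, not in the weights: composing the squared error with a non-linear model makes the loss non-convex in the parameters. A sketch with an assumed toy model $\hat{y} = \sin(w)$:

```python
import numpy as np

# MSE is a parabola in the *prediction*, but not in the *weights* once a
# non-linear model sits in between. Assumed toy model y_hat = sin(w), with
# a small regulariser to break ties: several minima in w, one global.
w = np.linspace(-10, 10, 2001)
loss = (np.sin(w) - 0.5) ** 2 + 0.01 * w ** 2
is_min = (loss[1:-1] < loss[:-2]) & (loss[1:-1] < loss[2:])
print(int(is_min.sum()), w[1:-1][is_min])  # multiple local minima
```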
5 votes · 2 answers · 109 views

Are on-line backpropagation iterations perpendicular to the constraint?

Raúl Rojas' Neural Networks: A Systematic Introduction, section 8.1.2, relates off-line backpropagation and on-line backpropagation to the Gauss-Jacobi and Gauss-Seidel methods for finding the ...
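A hedged numerical illustration of the geometry (not Rojas' derivation): for a single linear unit and one training pair $(x, t)$, the constraint is the hyperplane $\{w : w \cdot x = t\}$, and the on-line gradient step is proportional to $x$, the hyperplane's normal, hence perpendicular to the constraint.

```python
import numpy as np

# The on-line step -eta * (w.x - t) * x is parallel to x, so it has zero
# component along any direction v lying inside the hyperplane (v.x = 0).
rng = np.random.default_rng(1)
x = rng.normal(size=3)
t = 0.7
w = rng.normal(size=3)
step = -0.1 * (w @ x - t) * x
v = np.cross(x, rng.normal(size=3))   # some in-hyperplane direction, v.x = 0
print(np.isclose(v @ x, 0), np.isclose(step @ v, 0))  # True True
```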
2 votes · 1 answer · 96 views

Weight Normalization paper

I am trying to dissect the paper on weight normalization: https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf ...
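The paper's reparameterisation writes each weight vector as $w = g \frac{v}{\lVert v \rVert}$, decoupling its direction ($v$) from its length ($g$); a minimal sketch:

```python
import numpy as np

# Weight normalization: the norm of w equals g exactly, regardless of v.
def weight_norm(v, g):
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])   # direction parameter, ||v|| = 5
g = 2.0                    # scale parameter
w = weight_norm(v, g)
print(w, np.linalg.norm(w))  # [1.2 1.6], norm == g == 2.0
```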
3 votes · 0 answers · 608 views

How to calculate the gradient of a filter in a convolutional network

I have an architecture similar to the one in the image: CNN. I don't understand how to calculate the gradient of filter F. I found these equations (source): gradient and delta, where the first equation calculates the gradient ...
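The asker's figure isn't shown, but a 1-D sketch captures the usual result: for a valid cross-correlation forward pass, the filter gradient is the cross-correlation of the input with the upstream deltas.

```python
import numpy as np

# Forward: y[i] = sum_k F[k] * X[i + k]  (valid cross-correlation).
# Backward: dL/dF[k] = sum_i delta[i] * X[i + k], i.e. correlate(X, delta).
X = np.array([1.0, 2.0, -1.0, 0.5, 3.0])      # assumed toy input
F = np.array([0.2, -0.4, 0.1])                # assumed toy filter
y = np.correlate(X, F, mode="valid")          # forward pass, length 3
delta = y - np.array([1.0, 0.0, -1.0])        # upstream dL/dy for 0.5*MSE
grad_F = np.correlate(X, delta, mode="valid") # dL/dF, same length as F
print(grad_F)
```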
9 votes · 2 answers · 2k views

Is the mean-squared error always convex in the context of neural networks?

Multiple resources I referred to mention that the MSE is great because it's convex, but I don't see how, especially in the context of neural networks. Let's say we have the following: $X$: training ...
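One way to see the distinction: MSE is convex in the network's output $\hat{y}$, but generally not in the weights. A sketch that probes the midpoint-convexity inequality $f\big(\frac{a+b}{2}\big) \le \frac{f(a)+f(b)}{2}$ along random segments in the weight space of an assumed tiny tanh network:

```python
import numpy as np

# Any violation of midpoint convexity proves the loss surface is
# non-convex in the weights, even though MSE is convex in y_hat.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

def mse(w):
    w1, w2 = w[:3], w[3]                     # assumed model: w2*tanh(X @ w1)
    return np.mean((w2 * np.tanh(X @ w1) - y) ** 2)

violations = 0
for _ in range(100):
    a, b = rng.normal(size=4), rng.normal(size=4)
    if mse((a + b) / 2) > (mse(a) + mse(b)) / 2 + 1e-12:
        violations += 1
print(violations)  # typically > 0: convexity fails in weight space
```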