# All Questions

Tagged with: math, gradient-descent

10 questions

**4** votes · **1** answer · 59 views

### How is the Jacobian a generalisation of the gradient?

I came across these slides, Natural Language Processing with Deep Learning CS224N/Ling284, in the context of natural language processing, which talk about the Jacobian as a generalization of the ...
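The relationship the question asks about can be stated compactly: for a function $f:\mathbb{R}^n \to \mathbb{R}^m$, the Jacobian collects the partial derivatives of every output with respect to every input,

$$J_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad J \in \mathbb{R}^{m \times n},$$

so each row of $J$ is the gradient (written as a row vector) of one output component $f_i$. When $m = 1$, the Jacobian has a single row and reduces to the transpose of the gradient $\nabla f$.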

**1** vote · **1** answer · 114 views

### Understanding the derivation of the first-order model-agnostic meta-learning

According to the authors of this paper, to improve performance, they decided to drop the backward pass and use a first-order approximation. I found a blog that discussed how to derive the math ...
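A minimal sketch of the first-order idea on a toy problem (the quadratic task losses, step sizes, and task optima below are my own illustrative assumptions, not from the paper): the inner step adapts the meta-parameters per task, and the first-order approximation uses the gradient at the adapted parameters directly, dropping the Jacobian of the inner update.

```python
import numpy as np

# Hypothetical per-task losses L_i(theta) = 0.5 * ||theta - c_i||^2,
# so grad L_i(theta) = theta - c_i. These are illustrative only.
def grad(theta, c):
    return theta - c

alpha, beta = 0.1, 0.01          # inner and outer step sizes (assumed)
theta = np.zeros(2)              # meta-parameters
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

for _ in range(100):
    meta_grad = np.zeros_like(theta)
    for c in tasks:
        theta_i = theta - alpha * grad(theta, c)   # inner adaptation step
        # First-order approximation: evaluate the gradient at the adapted
        # parameters theta_i and ignore the d(theta_i)/d(theta) term.
        meta_grad += grad(theta_i, c)
    theta = theta - beta * meta_grad
```

On this toy problem the meta-parameters drift toward the mean of the two task optima, which is the behavior the first-order update is meant to preserve from full MAML.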

**1** vote · **0** answers · 33 views

### Is Gradient Descent algorithm a part of Calculus of Variations?

As https://en.wikipedia.org/wiki/Calculus_of_variations puts it: "The calculus of variations is a field of mathematical analysis that uses variations, which are small changes in functions and ...

**0** votes · **1** answer · 32 views

### Backpropagation: Chain Rule to the Third Last Layer

I'm trying to solve dLoss/dW1. The network is as in the picture below, with identity activation at all neurons. Solving dLoss/dW7 is simple, as there's only one path to the output: $\Delta = Out - Y$, $Loss = abs(...
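For a network with identity activations, the chain rule back to the first layer is just a product of the later weight matrices applied to the loss gradient. A minimal sketch (the layer shapes and random seed are my assumptions, not the question's network) with a finite-difference check:

```python
import numpy as np

# Toy 3-layer linear network out = W3 W2 W1 x, identity activations,
# squared-error loss. Shapes are illustrative assumptions.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(2, 3))
W3 = rng.normal(size=(1, 2))
x = rng.normal(size=(2, 1))
y = rng.normal(size=(1, 1))

def loss(W1_):
    out = W3 @ (W2 @ (W1_ @ x))
    return float(((out - y) ** 2).sum())

# Chain rule back to the first layer:
#   dL/dW1 = W2^T W3^T (2 (out - y)) x^T
out = W3 @ (W2 @ (W1 @ x))
dL_dW1 = W2.T @ W3.T @ (2 * (out - y)) @ x.T

# Finite-difference estimate of one entry, to verify the chain rule
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
numeric = (loss(W1p) - loss(W1)) / eps
```

The same pattern extends layer by layer: each earlier weight matrix picks up one more transposed factor from the layers downstream of it.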

**4** votes · **3** answers · 194 views

### Which function $(\hat{y} - y)^2$ or $(y - \hat{y})^2$ should I use to compute the gradient?

The MSE can be defined as $(\hat{y} - y)^2$, which should be equal to $(y - \hat{y})^2$, but I think their derivatives are different, so I am confused about which derivative I should use for computing my ...
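The two forms are the same function, and their gradients with respect to $\hat{y}$ agree as well: the inner chain-rule factor of $-1$ in the second form cancels the flipped sign. A quick numeric check:

```python
def d_loss_a(y_hat, y):
    # d/dy_hat of (y_hat - y)^2
    return 2 * (y_hat - y)

def d_loss_b(y_hat, y):
    # d/dy_hat of (y - y_hat)^2: chain rule brings in a factor of -1
    return -2 * (y - y_hat)

print(d_loss_a(3.0, 1.0), d_loss_b(3.0, 1.0))  # both 4.0
```

So either convention gives the same gradient for the update; only the intermediate bookkeeping differs.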

**6** votes · **2** answers · 483 views

### How is local minima possible in gradient descent?

Gradient descent works on the mean squared error, whose graph is a parabola like $y=x^2$. We often say that weight adjustment in a neural network by the gradient descent algorithm can hit ...
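The resolution is that the MSE is a parabola in the network's *output*, but not in its *weights*. A toy sketch (my own example, not from the question) of a two-layer linear "network" whose loss surface in weight space is not convex:

```python
# out = w2 * w1 * x with input x = 1 and target y = 1, squared error.
def loss(w1, w2):
    return (w2 * w1 - 1.0) ** 2

# Every point on the curve w1 * w2 = 1 is a global minimum:
print(loss(1.0, 1.0), loss(2.0, 0.5), loss(-1.0, -1.0))  # 0.0 0.0 0.0

# Non-convexity in the weights: the midpoint of the two minimizers
# (1, 1) and (-1, -1) sits strictly above both of them.
print(loss(0.0, 0.0))  # 1.0
```

With nonlinear activations and more layers, the composed loss surface gains genuinely distinct basins, which is where gradient descent can settle into a local minimum.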

**5** votes · **2** answers · 109 views

### Are on-line backpropagation iterations perpendicular to the constraint?

Raul Rojas' *Neural Networks: A Systematic Introduction*, section 8.1.2, relates off-line backpropagation and on-line backpropagation to the Gauss-Jacobi and Gauss-Seidel methods for finding the ...

**2** votes · **1** answer · 96 views

### Weight Normalization paper

I am trying to dissect the paper on weight normalization: https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf ...

**3** votes · **0** answers · 608 views

### How to calculate gradient of filter in convolution network

I have an architecture similar to the one in the image: CNN. I don't understand how to calculate the gradient of filter F. I found these equations (source): gradient and delta, where the first equation calculates the gradient ...

**9** votes · **2** answers · 2k views

### Is the mean-squared error always convex in the context of neural networks?

Multiple resources I referred to mention that the MSE is great because it's convex. But I don't see how, especially in the context of neural networks. Let's say we have the following: $X$: training ...
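A distinction worth keeping in mind here (my summary, not from the question): the MSE is convex as a function of the predictions, but not in general as a function of the weights,

$$L(\hat{y}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2, \qquad L(W) = \frac{1}{N}\sum_{i=1}^{N}\big(f(x_i; W) - y_i\big)^2,$$

where the first expression is convex in $\hat{y}$, but convexity of the second is only guaranteed when $\hat{y} = f(x; W)$ is affine in $W$: convexity survives composition with affine maps, not with a general nonlinear network.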