perf/solver: don't zero fill weight gradients #90
Conversation
I highly doubt that this works well if the zeroing is removed. The idea of clearing the gradients at that point is that the training of each minibatch starts out with a clean slate. If we don't do that, the gradients accumulate over multiple minibatches, leading to unexpected behaviour. As an alternative, the clearing of the weights could be moved to the end of the function, but that would make some diagnostics, like looking at the gradients after each minibatch, hard or impossible. Personally I prefer the current way between the two alternatives.
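For context, here is a minimal standalone sketch of the accumulation concern described above. It is not Leaf's solver code; the function name `backward_accumulate` and the numbers are made up for illustration, and it only assumes a backward pass that adds into the gradient buffer rather than overwriting it.

```rust
// Hypothetical sketch, not Leaf's API: a backward pass that *accumulates*
// into the weight gradient buffer needs the buffer cleared somewhere,
// otherwise gradients leak from one minibatch into the next.
fn backward_accumulate(grad: &mut [f32], contribution: &[f32]) {
    for (g, c) in grad.iter_mut().zip(contribution) {
        *g += *c;
    }
}

fn main() {
    let mut grad = vec![0.0f32; 3];
    let minibatches = [vec![0.1, 0.2, 0.3], vec![0.1, 0.2, 0.3]];

    for contribution in &minibatches {
        // The variant discussed above: zero at the start of each minibatch so
        // training starts from a clean slate. Dropping this loop makes the
        // gradients accumulate across minibatches instead.
        for g in grad.iter_mut() {
            *g = 0.0;
        }
        backward_accumulate(&mut grad, contribution);
    }

    // Prints [0.1, 0.2, 0.3]; without the zeroing it would be [0.2, 0.4, 0.6].
    println!("{:?}", grad);
}
```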
Hey @alexandermorozov, thanks for the PR and the charts 👍 Max and I agree that it makes sense to merge the PR as it provides immediate value. If we should decide later to handle the gradients differently, it would be quite easy to put the one line back in. I am happy to merge!
@homu r+ |
📌 Commit 01be854 has been approved by
💥 Test timed out
Weight gradients aren't used before they are overwritten at the backpropagation step, so initialization is redundant. If autumnai#89 is applied, this patch improves performance by 30% on `leaf-examples mnist`.
01be854 to 6c4482c
@MichaelHirn, thanks! It's great that this PR is useful :) I've rebased this branch; it should merge now.
@homu r+ |
📌 Commit 6c4482c has been approved by
⚡ Test exempted - status
perf/solver: don't zero fill weight gradients

**Please verify this PR! I'm not completely sure that it's correct.**

Weight gradients aren't used before they are overwritten at the backpropagation step, so initialization is redundant. If #89 is applied, this patch improves performance by 30% on `leaf-examples mnist`.
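To make the redundancy argument concrete, here is a small standalone sketch, assuming (as the PR description states) that the backward pass writes the weight gradient with a plain assignment rather than accumulating into it. The function name `backward_overwrite` is hypothetical; this is not Leaf's code.

```rust
// Hypothetical sketch, not Leaf's code: when the backward pass *overwrites*
// the weight gradient, whatever was in the buffer beforehand is never read,
// so an upfront zero-fill is pure overhead.
fn backward_overwrite(grad: &mut [f32], contribution: &[f32]) {
    for (g, c) in grad.iter_mut().zip(contribution) {
        *g = *c; // plain assignment: the previous contents are irrelevant
    }
}

fn main() {
    // Deliberately filled with NaN instead of zeros to show that the initial
    // contents do not affect the result.
    let mut grad = vec![f32::NAN; 3];
    backward_overwrite(&mut grad, &[0.1, 0.2, 0.3]);
    println!("{:?}", grad); // [0.1, 0.2, 0.3] regardless of the initial fill
}
```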