
perf/solver: don't zero fill weight gradients #90

Merged
merged 1 commit into autumnai:master on Apr 1, 2016

Conversation

alexandermorozov
Contributor

Please verify this PR! I'm not completely sure that it's correct.

Weight gradients aren't used before they are overwritten during the backpropagation step, so zero-initialization is redundant.

If #89 is applied, this patch improves performance by 30% on `leaf-examples mnist`.
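
For context, a minimal sketch of the kind of per-minibatch zero-fill the patch removes (hypothetical buffers and function name, not Leaf's actual solver code). Resetting every weight-gradient buffer costs one full pass over all parameters per minibatch:

```rust
// Hypothetical sketch, not Leaf's solver: zero every weight-gradient buffer
// before processing a minibatch. If the backward pass overwrites these
// buffers anyway, this pass over all parameters is wasted work.
fn reset_weight_gradients(weight_gradients: &mut [Vec<f32>]) {
    for grad in weight_gradients.iter_mut() {
        for g in grad.iter_mut() {
            *g = 0.0;
        }
    }
}

fn main() {
    let mut grads = vec![vec![1.0f32; 4]; 2];
    reset_weight_gradients(&mut grads);
    assert!(grads.iter().all(|g| g.iter().all(|&v| v == 0.0)));
}
```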

@hobofan
Member

hobofan commented Mar 23, 2016

I highly doubt that this works well once the zero-filling is removed.

The idea of clearing the gradients at that point is that the training of each minibatch starts out with a clean slate. If we don't do that, the gradients accumulate over multiple minibatches, leading to unexpected behaviour.

As an alternative, the clearing of the weight gradients could be moved to the end of the function, but that would make some diagnostics, like looking at the gradients after each minibatch, hard or impossible. Of the two alternatives, I personally prefer the current one.
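
A minimal sketch of the scenario this concern assumes (hypothetical code, not Leaf's layer implementations): if the backward pass accumulated into the gradient buffer with `+=`, skipping the zero-fill would leak gradients from the previous minibatch into the current one:

```rust
// Hypothetical accumulating backward pass: prior buffer contents matter here.
fn backward_accumulate(grad_w: &mut [f32], input: &[f32], out_grad: f32) {
    for (g, &x) in grad_w.iter_mut().zip(input) {
        *g += x * out_grad; // accumulation: needs a zeroed buffer per minibatch
    }
}

fn main() {
    let mut grad_w = vec![0.0f32; 3];
    backward_accumulate(&mut grad_w, &[1.0, 2.0, 3.0], 0.5); // minibatch 1
    // minibatch 2, without zeroing in between:
    backward_accumulate(&mut grad_w, &[1.0, 1.0, 1.0], 1.0);
    println!("{:?}", grad_w); // [1.5, 2.0, 2.5] instead of the expected [1.0, 1.0, 1.0]
}
```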

@alexandermorozov
Contributor Author

Hmm, I was under the impression that the weight gradients for the i-th layer are recomputed in the backpropagation pass from the following data:

  • the output of the i-th layer (computed during the forward pass),
  • the output gradients of the i-th layer (computed in the previous backpropagation step for the (i+1)-th layer, or provided by the solver for the last layer),
  • the weights of the i-th layer, if the layer has weights (computed in the previous epoch).

So the weight gradients for the i-th layer are strictly an output of the computation, and their contents before the computation don't matter.
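
A minimal sketch of this argument (hypothetical code, not Leaf's layer implementations): when the backward pass assigns the weight gradient from the layer input and the incoming output gradient, whatever was in the buffer beforehand is simply overwritten:

```rust
// Hypothetical overwriting backward pass: the weight gradient is a pure output,
// so the previous contents of the buffer are never read.
fn backward_overwrite(grad_w: &mut [f32], input: &[f32], out_grad: f32) {
    for (g, &x) in grad_w.iter_mut().zip(input) {
        *g = x * out_grad; // assignment, not accumulation
    }
}

fn main() {
    // Fill the buffer with garbage to show it has no effect on the result.
    let mut grad_w = vec![f32::NAN; 3];
    backward_overwrite(&mut grad_w, &[1.0, 2.0, 3.0], 0.5);
    println!("{:?}", grad_w); // [0.5, 1.0, 1.5], independent of the old contents
}
```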

I've also tried to verify it in practice; here are graphs for the mlp and conv examples with and without this patch. It looks like there is no visible change.
[chart: mlp]
[chart: conv]

Edit: changed wording a bit.

@MichaelHirn
Member

Hey @alexandermorozov, thanks for the PR and the charts 👍

Max and I agree that it makes sense to merge the PR, as it provides immediate value. If we later decide to handle the gradients differently, it would be quite easy to put the one line back in.

I am happy to merge!

@MichaelHirn
Member

@homu r+

@homu
Collaborator

homu commented Mar 31, 2016

📌 Commit 01be854 has been approved by MichaelHirn

homu added a commit that referenced this pull request Mar 31, 2016
perf/solver: don't zero fill weight gradients

@homu
Collaborator

homu commented Mar 31, 2016

⌛ Testing commit 01be854 with merge 434a0db...

@homu
Collaborator

homu commented Mar 31, 2016

💥 Test timed out

@alexandermorozov
Contributor Author

@MichaelHirn, thanks! It's great that this PR is useful :)

I've rebased this branch; it should merge now.

@hobofan
Member

hobofan commented Apr 1, 2016

@homu r+

@homu
Collaborator

homu commented Apr 1, 2016

📌 Commit 6c4482c has been approved by hobofan

@homu homu merged commit 6c4482c into autumnai:master Apr 1, 2016
@homu
Collaborator

homu commented Apr 1, 2016

⚡ Test exempted - status

homu added a commit that referenced this pull request Apr 1, 2016
perf/solver: don't zero fill weight gradients
