perf/solver: don't zero fill weight gradients #90
Conversation
I highly doubt that this works well if the zeroing is removed. The idea of clearing the gradients at that point is that the training of each minibatch starts out with a clean slate. If we don't do that, the gradients accumulate over multiple minibatches, leading to unexpected behaviour. As an alternative, the clearing of the weights could be moved to the end of the function, but that would make some diagnostics, like looking at the gradients after each minibatch, hard or impossible. Personally I prefer the current way between the two alternatives.
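For context, here is a minimal standalone sketch of the accumulation concern described above. It is not Leaf's solver code; the function name `backward_accumulate` and the numbers are made up for illustration, and it only assumes a backward pass that adds into the gradient buffer rather than overwriting it.

```rust
// Hypothetical sketch, not Leaf's API: a backward pass that *accumulates*
// into the weight gradient buffer needs the buffer cleared somewhere,
// otherwise gradients leak from one minibatch into the next.
fn backward_accumulate(grad: &mut [f32], contribution: &[f32]) {
    for (g, c) in grad.iter_mut().zip(contribution) {
        *g += *c;
    }
}

fn main() {
    let mut grad = vec![0.0f32; 3];
    let minibatches = [vec![0.1, 0.2, 0.3], vec![0.1, 0.2, 0.3]];

    for contribution in &minibatches {
        // The variant discussed above: zero at the start of each minibatch so
        // training starts from a clean slate. Dropping this loop makes the
        // gradients accumulate across minibatches instead.
        for g in grad.iter_mut() {
            *g = 0.0;
        }
        backward_accumulate(&mut grad, contribution);
    }

    // Prints [0.1, 0.2, 0.3]; without the zeroing it would be [0.2, 0.4, 0.6].
    println!("{:?}", grad);
}
```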
Hey @alexandermorozov, thanks for the PR and the charts 👍 Max and I agree that it makes sense to merge the PR as it provides immediate value. If we should decide later to handle the gradients differently, it would be quite easy to put the one line back in. I am happy to merge!
@homu r+ |
📌 Commit 01be854 has been approved by
💥 Test timed out
Weight gradients aren't used before they are overwritten at the backpropagation step, so initialization is redundant. If autumnai#89 is applied, this patch improves performance by 30% on `leaf-examples mnist`.
01be854 to 6c4482c
@MichaelHirn, thanks! It's great that this PR is useful :) I've rebased this branch; it should merge now.
@homu r+ |
📌 Commit 6c4482c has been approved by
⚡ Test exempted - status
perf/solver: don't zero fill weight gradients

**Please verify this PR! I'm not completely sure that it's correct.**

Weight gradients aren't used before they are overwritten at the backpropagation step, so initialization is redundant. If #89 is applied, this patch improves performance by 30% on `leaf-examples mnist`.
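To make the redundancy argument concrete, here is a small standalone sketch, assuming (as the PR description states) that the backward pass writes the weight gradient with a plain assignment rather than accumulating into it. The function name `backward_overwrite` is hypothetical; this is not Leaf's code.

```rust
// Hypothetical sketch, not Leaf's code: when the backward pass *overwrites*
// the weight gradient, whatever was in the buffer beforehand is never read,
// so an upfront zero-fill is pure overhead.
fn backward_overwrite(grad: &mut [f32], contribution: &[f32]) {
    for (g, c) in grad.iter_mut().zip(contribution) {
        *g = *c; // plain assignment: the previous contents are irrelevant
    }
}

fn main() {
    // Deliberately filled with NaN instead of zeros to show that the initial
    // contents do not affect the result.
    let mut grad = vec![f32::NAN; 3];
    backward_overwrite(&mut grad, &[0.1, 0.2, 0.3]);
    println!("{:?}", grad); // [0.1, 0.2, 0.3] regardless of the initial fill
}
```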