There are a lot of places where we increment and decrement the retain count when we don't have to. For example, a very common use case with PropertyLists is to directly cast them to their correct subtype and then throw the PropertyList away (because there is nothing you can do with the PropertyList instance directly).
The reason we do this, I guess, is mostly that all `Drop` implementations force a retain count decrement, so we have to keep incrementing the count to keep it correct. But if we add a flag to the types that can be used to disable the decrement, then we can also get rid of the increment in cases where we effectively just want to cast a pointer between two types.
This PR implements this by adding a boolean flag to all CF structs. The `Drop` impl uses the flag to determine whether it should `CFRelease` the instance or not. The flag is then modified in the appropriate places where we cast to other types and consume ourselves.
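As a rough illustration of the idea (not the actual code in this PR), the sketch below uses a mock object with a counter instead of real CoreFoundation calls. All names in it (`MockObject`, `MockString`, `MockPropertyList`, `release_on_drop`, `to_property_list`, `into_property_list`) are hypothetical and exist only for this example.

```rust
use std::cell::Cell;

/// Stand-in for a CoreFoundation object (hypothetical, for illustration only).
/// It records the current retain count and how many retain/release calls were made.
#[derive(Default)]
pub struct MockObject {
    pub retain_count: Cell<usize>,
    pub refcount_ops: Cell<usize>,
}

impl MockObject {
    // Plays the role of CFRetain.
    fn retain(&self) {
        self.retain_count.set(self.retain_count.get() + 1);
        self.refcount_ops.set(self.refcount_ops.get() + 1);
    }

    // Plays the role of CFRelease.
    fn release(&self) {
        self.retain_count.set(self.retain_count.get() - 1);
        self.refcount_ops.set(self.refcount_ops.get() + 1);
    }
}

/// Hypothetical wrapper playing the role of a property list type.
pub struct MockPropertyList<'a> {
    obj: &'a MockObject,
    release_on_drop: bool,
}

impl<'a> Drop for MockPropertyList<'a> {
    fn drop(&mut self) {
        // Only release if this wrapper still owns its reference.
        if self.release_on_drop {
            self.obj.release();
        }
    }
}

/// Hypothetical wrapper playing the role of a string type.
pub struct MockString<'a> {
    obj: &'a MockObject,
    release_on_drop: bool,
}

impl<'a> MockString<'a> {
    pub fn new(obj: &'a MockObject) -> Self {
        obj.retain();
        MockString { obj, release_on_drop: true }
    }

    /// Borrowing conversion: the new wrapper needs its own retain,
    /// because `self` will still release when it is dropped.
    pub fn to_property_list(&self) -> MockPropertyList<'a> {
        self.obj.retain();
        MockPropertyList { obj: self.obj, release_on_drop: true }
    }

    /// Consuming conversion: ownership of the existing retain is handed over,
    /// so `self` must not release and no extra retain is needed.
    pub fn into_property_list(mut self) -> MockPropertyList<'a> {
        self.release_on_drop = false;
        MockPropertyList { obj: self.obj, release_on_drop: true }
    }
}

impl<'a> Drop for MockString<'a> {
    fn drop(&mut self) {
        if self.release_on_drop {
            self.obj.release();
        }
    }
}
```

With these mocks, creating a string and converting it with `to_property_list` touches the count four times (two retains, two releases), while creating it and converting with `into_property_list` touches it only twice.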
I added some benchmarks locally. I did not want to commit them, since benchmarking only works on nightly and I did not feel like adding a feature flag for it. Anyhow, the following benchmarks:
```rust
#[bench]
fn bench_before(b: &mut Bencher) {
    let string = CFString::from_static_string("Bar");
    b.iter(|| unsafe {
        string.clone()
            .to_CFPropertyList()
            .downcast::<_, CFString>()
            .unwrap()
            .disable_release();
    })
}

#[bench]
fn bench_after(b: &mut Bencher) {
    let string = CFString::from_static_string("Bar");
    b.iter(|| unsafe {
        string.clone()
            .into_CFPropertyList()
            .downcast_into::<_, CFString>()
            .unwrap()
            .disable_release();
    })
}
```
give the following results:
test propertylist::test::bench_after ... bench: 16 ns/iter (+/- 3)
test propertylist::test::bench_before ... bench: 73 ns/iter (+/- 14)
The final call to `disable_release()` in each benchmark prevents the value coming out of `unwrap()` from calling `CFRelease` on drop, so that cost is not measured. I want to measure only `{to,into}_CFPropertyList`, `downcast` and `downcast_into`. The `clone` is needed because the `into` version requires it, and it would not be fair to have it in only one of the two benchmarks.
I would say that speeding it up by a factor of ~4.5 (73 ns vs. 16 ns per iteration) can be well worth it. (That number is of course not a precise measurement of anything, microbenchmarks being what they are. The benchmark mostly shows that a lot can be gained from avoiding some retain counting.)