Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task system overhaul #436

Merged
merged 35 commits into from May 25, 2017
Merged

Task system overhaul #436

merged 35 commits into from May 25, 2017

Conversation

carllerche
Copy link
Member

@carllerche carllerche commented Mar 31, 2017

Updated description

The primary change made in this PR is to restructure memory management and
notifications throughout the "task system" in the futures crate. It is
intended that this will have very little impact, if any, on consumers of
futures. Implementations of runtimes of futures (e.g. crates like tokio-core)
are the target for this series of changes, enabling a suite of possible
optimizations that were not previously feasible. One major use case that is
now enabled is usage of the task and executor modules in the no_std ecosystem.
This means that bare-metal applications of futures should be able to use the same
task system that the std-based futures ecosystem uses.

One of the largest changes being made to support this is an update to the memory management of
objects behind Task handles. Previously it was required that Arc<Unpark>
instances were passed into the various Spawn::poll_* functions, but new
functions Spawn::poll_*_notify were added which operate with a NotifyHandle
instead. A NotifyHandle is conceptually very similar to an Arc<Unpark>
instance, but it works through an unsafe trait UnsafeNotify to manage memory
instead of requiring Arc is used. You can still use Arc safely, however, if
you'd like.

In addition to supporting more forms of memory management, the poll_*_notify
functions also take a new id parameter. This parameter is intended to be an
opaque bag-of-bits to the futures crate itself but runtimes can use this to
identify the future being notified. This is intended to enable situations where
the same instance of a NotifyHandle can be used for all futures executed by
using the id field to distinguish which future is ready when it gets
notified.

API Additions

  • A FuturesUnordered::push method was added and the FuturesUnordered type
    itself was completely rewritten to efficiently track a large number of
    futures.
  • A Task::will_notify_current method was added with a slightly different
    implementation than Task::is_current but with stronger guarantees and
    documentation wording about its purpose.

Compatibility Notes

As with all 0.1.x releases this PR is intended to be 100% backwards compatible.
All code that previously compiled should continue to do so with these changes.
As with other changes, though, there are also some updates to be aware of:

  • The task::park function has been renamed to task::current.
  • The Task::unpark function has been renamed to Task::notify, and in general
    terminology around "unpark" has shifted to terminology around "notify"
  • The Unpark trait has been deprecated in favor of the Notify trait
    mentioned above.
  • The UnparkEvent structure has been deprecated. It currently should perform
    the same as it used to, but it's planned that in a future 0.1.x release the
    performance will regress for crates that have not transitioned away. The
    primary primitive to replace this is the addition of a push function on the
    FuturesUnordered type. If this does not help implement your use case though,
    please let us know!
  • The Task::is_current method is now deprecated, and you likely want to use
    Task::will_notify_current instead, but let us know if this doesn't suffice!

Original description

This PR is a work in progress

I'm submitting this PR early to hopefully try to illustrate my thoughts and get some early feedback.

Checklist

Overview

The previous implementation of the task system required a number of allocations per task instance. Each task required a dedicated Arc<Unpark> handle which means that executors require at least two allocations per task.

Things get worse when using with_unpark_event as nested calls to with_unpark_event result in Vec allocation and cloning during each call to task::park.

This commit provides an overhaul to the task system to work around these problems. The Unpark trait is changed so that only one instance is required per executor. In order to identify which task is being unparked, Unpark::unpark takes an unpark_id: u64 argument.

with_unpark_event is removed in favor of UnparkContext which satisfies a similar end goal, but requires only a single allocation per lifetime of the UnparkContext.

The new Unpark trait

In general, tasks are driven to completion by executors and executors are able to handle large number of tasks. As such, the Unpark trait has been tweaked to require a single allocation per executor instead of one per task. The idea is that the executor creates one Arc<Unpark> handle and uses the unpark_id: u64 to identify which task is unparked.

In the case of tokio-core, each task is stored in a slab and the the unpark_id is the slab index. Now, given that an Arc is no longer used, it can be that a slab slot is released and repurposed for a different task while there are still outstanding Task handles referencing the now released task.

There are two potential ways to deal with this.

a) Not care. Futures need to be able to handle spurious wake ups already. Spurious wake ups can be reduced by splitting the u64 into 28 bits for the slab offset and use the rest of the u64 as a slot usage counter.

b) Use Unpark::ref_inc and Unpark::ref_dec to allow the executor implementation to handle its own reference counting.

Option b) would allow an executor implementation to store a pointer as the unpark_id and the ref_inc and ref_dec allow for atomic reference counting. This could be used in cases where using a slab is not an option.

UnparkContext

This is quite similar in spirit to with_unpark_event except it requires significantly less allocations when used in a nested situation.

It does have some different behavior. Given the following:

// Currently in task A
let my_context = UnparkContext::new(my_unpark);

my_context.with(0, || {
    let task = task::park();

    thread::spawn(move || {
        thread::sleep(a_few_secs);
        task.unpark();
    });
});

my_executor.spawn(move || {
    // Currently in task B

    my_context.with(0, || {
        // some work here
    });
});

Task B will be the root task that is notified.

@carllerche
Copy link
Member Author

I should elaborate a bit.

This change should also enable to implement a tokio-core server that can run without allocations. Given that UnparkContext is now pretty lightweight, it seems reasonable for a server task to store a slab itself for the futures handling each accepted connection. In this case, the only allocations would be for the task handling the TcpListener and a pre-allocated slab for futures handling client sockets. The task then uses UnparkContext to either accept from the listener or poll a socket connection task.

@carllerche
Copy link
Member Author

If this ends up being a direction that we want to pursue, I think that it would be safe to keep this in a task2 module. I'm fairly confident that the existing task API (task module) could use task2 under the hood.

@carllerche
Copy link
Member Author

This PR is related to #432. If this strategy is taken, I don't think #432 is relevant anymore.

@alexcrichton
Copy link
Member

This looks pretty slick, I really like it how it's making with_unpark_event essentially free (which is not the case today).

Your code snippet of how behavior changes was intially troubling to me, but in practice I don't think it will matter. Each UnparkContext will typically be associated with a particular future, which means that if you call with you're typically in a Future::poll. In that case you always want to override any previously stored task.

So to make sure I get this straight in my head:

  • All top-level executors assign ids to futures.
  • Top-level executors create one Arc<Unpark> for the entire duration of that executor.
  • Code currently using with_unpark_event will now instead use UnparkContext
  • UnparkContext requires on allocation at creation, and no more afterwards
  • Code uses UnparkContext::with to receive a custom notification with a custom id whenever something is unparked, allowing it to arrange for it to learn later about precisely which ids were unparked.

Does that sound like a correct interpretation? I can't really see many downsides to this!

Some other random thoughts:

  • This is essentially creating a linked list of Arc, right? I'd want to think careful about cycles here to see if they're possible, and if necessary we may wish to use weak pointers in some locations.
  • Can you explain in more detail how this would work in tokio-core? Currently we create a SetReadiness for each future, but would we perhaps no longer do that and just have a linked list of "ready futures"?

@carllerche
Copy link
Member Author

Re: cycles, I don't think they are possible. UnparkContext::with takes &mut self and a closure, which prevents nesting UnparkContext::with calls w/ the same self. Only currently running tasks can be linked together... so it should be fine.

Re: how to hook this into tokio-core, you could most likely get rid of the SetReadiness in favor of a linked list of readiness. The slab in -core that tracks tasks could have have a next_readiness: Option<usize> for each entry..

@leoyvens leoyvens mentioned this pull request Apr 3, 2017
@alexcrichton
Copy link
Member

Ok yeah, that both makes sense to me.

Before merging I'd want to see tokio-core working (ideally something like sccache compiling llvm) just to be sure though.

@carllerche
Copy link
Member Author

Seems fine to me, there is still work to be done though!

@carllerche
Copy link
Member Author

Ok, I have a tweaked proposal.

  • Not introduce task_impl2 (as requested by @alexcrichton).
  • Deprecate the Unpark trait.
  • Introduce the Notify trait:
pub trait Notify: Send + Sync {
    /// Indicates that an associated future and/or task are ready to make
    /// progress.
    ///
    /// Typically this means that the receiver of the notification should
    /// arrange for the future to get poll'd in a prompt fashion.
    fn notify(&self, id: u64);

    /// Called when an instance of `NotifyHandle` has been cloned.
    fn ref_inc(&self);

    /// Called when an instance of `NotifyHandle` has been dropped.
    fn ref_dec(&self);

    /// A new `Task` handle referencing `id` has been created.
    fn id_ref_inc(&self, id: u64);

    /// A `Task` handle referencing `id` has been dropped.
    fn id_ref_dec(&self, id: u64);
}
  • Instead of Arc<Notify> (or Arc<Unpark>), introduce NotifyHandle as a wrapper struct:
struct NotifyHandle {
    ptr: *const NotifyHandle,
}

impl NotifyHandle {
    unsafe pub fn from_ptr(ptr: *const NotifyHandle) -> NotifyHandle {
        unimplemented!();
    }

    pub fn noop() -> NotifyHandle {
        unimplemented!();
    }

    pub fn notify_fn<F: Fn() + Sync + Send>(f: F) -> NotifyHandle {
        unimplemented!();
    }
}

On one hand, Notify is a much more complex trait to implement... and very unsafe. However, the Notify trait would be able to support executors that run without std. NotifyHandle would provide a couple of additional constructors (noop, notify_fn) to assist in constructing "simple" Notify implementations.

Another option would be to provide trait NotifySimple which is safe and has fewer methods to implement. I prefer sticking with one trait Notify as increasing the number of traits increases the complexity of both decisions that users have to make + the implementation.

The one unknown here is how to add Notify support to Spawn. The easiest option would be to deprecate all existing poll_* fns and introduce new ones with a _notify syntax.

IMO, poll_future() (and co.) should be repurposed in 0.2 to be:

fn poll_future(&mut self) -> Poll<F::Item, F::Error> {
    self.poll_future_notify(&NotifyHandle::noop())
}

carllerche and others added 4 commits April 26, 2017 16:17
The previous implementation of the task system required a number of
allocations per task instance. Each task required a dedicated
`Arc<Unpark>` handle which means that executors require at least two
allocations per task.

Things get worse when using `with_unpark_event` as nested calls to
`with_unpark_event` result in Vec allocation and cloning during each
call to `task::park`.

This commit provides an overhaul to the task system to work around these
problems. The `Unpark` trait is changed so that only one instance is
required per executor. In order to identify which task is being
unparked, `Unpark::unpark` takes an `unpark_id: u64` argument.

`with_unpark_event` is removed in favor of `UnparkContext` which
satisfies a similar end goal, but requires only a single allocation per
lifetime of the `UnparkContext`.
The old task system is depreated and updated to run implicitly on the
new task system.
* Move task2 back into task
* Rename new trait to `Notify`
* Add `NotifyHandle` which doesn't embed `Arc`
* Add new `Spawn::*_notify` APIs
* Rename `UnparkContext` to `NotifyContext`
* Deprecate `Unpark` and associated machinery
* Reimplement tasks to keep the same performance of `Unpark` as before.
@carllerche
Copy link
Member Author

So, it seems that NotifyContext has a relatively big flaw when it comes to nesting them... this will take some thinking.

`ReadyQueue` is a structure into which an arbitrary number of futures
can be pushed. `ReadyQueue` implements `Stream`, yielding the inner
futures' resolved values as they become ready.

This structure can handle large numbers of futures efficiently.
Switch ReadyQueue to use UnsafeNotify. This is needed because Notify is
implemented on `Inner<T>`. UnsafeNotify requires 'static but T may not
be static. Even though T may not be 'static, it is safe for Inner to
implement Notify because the implementation will never read `T`.
@alexcrichton
Copy link
Member

I think to merge we need an up-to-date summary of what this PR does, how it changes things, and why the changes are all justified. Basically just a correct PR description and/or correct commit message.

@alexcrichton
Copy link
Member

Other than that though I believe I'm comfortable landing everything here. I can try to get around to the docs next week if it's not done already.

@alexcrichton
Copy link
Member

Ok I started writing up documentation for this but ended up finding what I believe is some lingering unsafety and memory leaks.

@leoyvens
Copy link
Contributor

If drop_raw takes &self now (I think it was &mut self at some point) then should we change NotifyHandle::new to take a *const UnsafeNotify rather than *mut UnsafeNotify?

@leoyvens
Copy link
Contributor

Why is id an u64 rather than usize? Given that a lot of this is for no_std compatibility, isn't usize better for compatibility and performance on non-64bit architectures?

@alexcrichton
Copy link
Member

@leodasvacas the intention is to take *mut Self in drop_raw but we're not currently allowed to do that for object safety. The &self pointer has "less guarantees" in terms of uniqueness and such that the compiler knows about, so we're currently using &self instead of &mut self.

@carllerche can speak more though to the choice of u64 instead of usize

A nice way of working with tasks without allocations.
@carllerche
Copy link
Member Author

I opted for u64 instead of usize specifically because the extra bits available w/ 64 are very useful when implementing things like a treiber stack (extra bits are used to guard against ABA).

Picking u64 definitely gears towards optimizing for 64 bit platforms...

I'm not sure how common it is to have 32 bit platforms and care about the extra perf hit.

THAT said, it occurs to me that hardcoding to u64 would be a hazard if there ever are 128bit CPUs... though I don't know how realistic that is :) (especially since ptrs these days are limited to 48bit).

If there is strong preference to switching to usize and a good argument for it, I would be fine to do so.

@carllerche
Copy link
Member Author

@alexcrichton said he wanted to keep the various _notify functions on Spawn for consistency reasons. They have a low risk of naming conflict and could be easily deprecated later if needed.

Avoids twiddling reference counts ourselves and makes it clear who owns what,
also led to the discovery of a memory leak...

At the same time this also renames `ref_inc` to `clone_id` and `ref_dec` to
`drop_id` for closer mirroring to ownership in Rust.
There were a few independent bugs here which were fixed:

* Nodes left in the mpsc queue after the `FuturesUnordered` was dropped were
  previously leaked.
* The `drop_raw` calls were incorrect, not actually freeing anything.
* Memory unsafety was possible if polling a future panicked, for a number of
  reasons.

Memory management is now entirely done through `Arc` and `Weak` and code was
restructured slightly to work around panics in polling future.
@alexcrichton
Copy link
Member

Ok here's my updated summary of the PR: @carllerche mind checking?


Major Changes

The primary change made in this PR is to restructure memory management and
notifications throughout the "task system" in the futures crate. It is
intended that this will have very little impact, if any, on consumers of
futures. Implementations of runtimes of futures (e.g. crates like tokio-core)
are the target for this series of changes, enabling a suite of possible
optimizations that were not previously feasible.

One of the largest changes being made is an update to the memory management of
objects behind Task handles. Previously it was required that Arc<Unpark>
instances were passed into the various Spawn::poll_* functions, but new
functions Spawn::poll_*_notify were added which operate with a NotifyHandle
instead. A NotifyHandle is conceptually very similar to an Arc<Unpark>
instance, but it works through an unsafe trait UnsafeNotify to manage memory
instead of requiring Arc is used. You can still use Arc safely, however, if
you'd like.

In addition to supporting more forms of memory management, the poll_*_notify
functions also take a new id parameter. This parameter is intended to be an
opaque bag-of-bits to the futures crate itself but runtimes can use this to
identify the future being notified. This is intended to enable situations where
the same instance of a NotifyHandle can be used for all futures executed by
using the id field to distinguish which future is ready when it gets
notified.

API Additions

  • A FuturesUnordered::push method was added and the FuturesUnordered type
    itself was completely rewritten to efficiently track a large number of
    futures.
  • A Task::will_notify_current method was added with a slightly different
    implementation than Task::is_current but with stronger guarantees and
    documentation wording about its purpose.

Compatibility Notes

As with all 0.1.x releases this PR is intended to be 100% backwards compatible.
All code that previously compiled should continue to do so with these changes.
As with other changes, though, there are also some updates to be aware of:

  • The task::park function has been renamed to task::current.
  • The Task::unpark function has been renamed to Task::notify, and in general
    terminology around "unpark" has shifted to terminology around "notify"
  • The Unpark trait has been deprecated in favor of the Notify trait
    mentioned above.
  • The UnparkEvent structure has been deprecated. It currently should perform
    the same as it used to, but it's planned that in a future 0.1.x release the
    performance will regress for crates that have not transitioned away. The
    primary primitive to replace this is the addition of a push function on the
    FuturesUnordered type. If this does not help implement your use case though,
    please let us know!
  • The Task::is_current method is now deprecated, and you likely want to use
    Task::will_notify_current instead, but let us know if this doesn't suffice!

@carllerche
Copy link
Member Author

@alexcrichton seems good to me.

This commit enables usage of the standard task system in `no_std` contexts. This
essentially piggy-backs the `use_std` feature already have and, when
deactivated, will provide the `futures::{task, executor}` modules. Building on
all the previous refactorings for types like `NotifyHandle` this should enable
usage of the task system in these contexts.

This does, however, come with a caveat. A relatively large dependency of the
task system is currently its usage of thread-local storage, unavailable in task
contexts! To work around this an initialization hook for the futures library is
provided, and this initialization hook is required to be called before working
with the task system. This hook takes just two pointers, a `get` and a `set` for
a pointer read/write into some "local storage", depending on the context of the
system that we're running on.

Usage of this crate from a `std` context (e.g. with default features enabled)
should not be impacted by this change. Everything previously present is still
there and no explicit initialization is required.
@alexcrichton
Copy link
Member

Alright after some discussion with @aturon I reached the conclusion that I'd personally at least be much more comfortable landing this if we could achieve one of the original goals (no_std usage or a better unpark event). Along those lines I've just pushed up a commit which enables usage of the entire non-deprecated surface area of the task system in no_std contexts.

The crux here is that no_std contexts must explicitly initialize the task system with a getter/setter for thread-local-storage-like information. This is lazily provided when running in the context of libstd but when disabling the default features of this crate it's required for usage.

carllerche and others added 3 commits May 25, 2017 10:18
The initial implementation of no-std introduced a performance regression
when using std. This was caused by a few factors:

* Additional calls to TLS accessors.
* Addition of dynamic dispatch.
* Addition of atomic ops.
* Less inlining

Most of these additions have been removed, leaving a single relaxed
atomic load in the `task::current()` path as well as limiting the
`call_once` to when the task is entered vs. in all task functions.
Clean up a few more hot spots as well:

* Inline some trivial functions that are called across crates
* Avoid cross-crate TLS access as it's a vtable hit right now which is a bummer
Using u64 isn't a clear win and it will usually represent a pointer
anyway.
@carllerche
Copy link
Member Author

As far as I know, everything that is needed for this PR has happened...

@alexcrichton
Copy link
Member

Alright, let's do it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants