Task system overhaul #436

carllerche · 2017-03-31T21:32:45Z

Updated description

The primary change made in this PR is to restructure memory management and
notifications throughout the "task system" in the futures crate. It is
intended that this will have very little impact, if any, on consumers of
futures. Implementations of runtimes of futures (e.g. crates like tokio-core)
are the target for this series of changes, enabling a suite of possible
optimizations that were not previously feasible. One major use case that is
now enabled is usage of the task and executor modules in the no_std ecosystem.
This means that bare-metal applications of futures should be able to use the same
task system that the std-based futures ecosystem uses.

One of the largest changes being made to support this is an update to the memory management of
objects behind Task handles. Previously it was required that Arc<Unpark>
instances were passed into the various Spawn::poll_* functions, but new
functions Spawn::poll_*_notify were added which operate with a NotifyHandle
instead. A NotifyHandle is conceptually very similar to an Arc<Unpark>
instance, but it works through an unsafe trait UnsafeNotify to manage memory
instead of requiring Arc is used. You can still use Arc safely, however, if
you'd like.

In addition to supporting more forms of memory management, the poll_*_notify
functions also take a new id parameter. This parameter is intended to be an
opaque bag-of-bits to the futures crate itself but runtimes can use this to
identify the future being notified. This is intended to enable situations where
the same instance of a NotifyHandle can be used for all futures executed by
using the id field to distinguish which future is ready when it gets
notified.

API Additions

A FuturesUnordered::push method was added and the FuturesUnordered type
itself was completely rewritten to efficiently track a large number of
futures.
A Task::will_notify_current method was added with a slightly different
implementation than Task::is_current but with stronger guarantees and
documentation wording about its purpose.

Compatibility Notes

As with all 0.1.x releases this PR is intended to be 100% backwards compatible.
All code that previously compiled should continue to do so with these changes.
As with other changes, though, there are also some updates to be aware of:

The task::park function has been renamed to task::current.
The Task::unpark function has been renamed to Task::notify, and in general
terminology around "unpark" has shifted to terminology around "notify"
The Unpark trait has been deprecated in favor of the Notify trait
mentioned above.
The UnparkEvent structure has been deprecated. It currently should perform
the same as it used to, but it's planned that in a future 0.1.x release the
performance will regress for crates that have not transitioned away. The
primary primitive to replace this is the addition of a push function on the
FuturesUnordered type. If this does not help implement your use case though,
please let us know!
The Task::is_current method is now deprecated, and you likely want to use
Task::will_notify_current instead, but let us know if this doesn't suffice!

Original description

This PR is a work in progress

I'm submitting this PR early to hopefully try to illustrate my thoughts and get some early feedback.

Checklist

Overview

The previous implementation of the task system required a number of allocations per task instance. Each task required a dedicated Arc<Unpark> handle which means that executors require at least two allocations per task.

Things get worse when using with_unpark_event as nested calls to with_unpark_event result in Vec allocation and cloning during each call to task::park.

This commit provides an overhaul to the task system to work around these problems. The Unpark trait is changed so that only one instance is required per executor. In order to identify which task is being unparked, Unpark::unpark takes an unpark_id: u64 argument.

with_unpark_event is removed in favor of UnparkContext which satisfies a similar end goal, but requires only a single allocation per lifetime of the UnparkContext.

The new `Unpark` trait

In general, tasks are driven to completion by executors and executors are able to handle large number of tasks. As such, the Unpark trait has been tweaked to require a single allocation per executor instead of one per task. The idea is that the executor creates one Arc<Unpark> handle and uses the unpark_id: u64 to identify which task is unparked.

In the case of tokio-core, each task is stored in a slab and the the unpark_id is the slab index. Now, given that an Arc is no longer used, it can be that a slab slot is released and repurposed for a different task while there are still outstanding Task handles referencing the now released task.

There are two potential ways to deal with this.

a) Not care. Futures need to be able to handle spurious wake ups already. Spurious wake ups can be reduced by splitting the u64 into 28 bits for the slab offset and use the rest of the u64 as a slot usage counter.

b) Use Unpark::ref_inc and Unpark::ref_dec to allow the executor implementation to handle its own reference counting.

Option b) would allow an executor implementation to store a pointer as the unpark_id and the ref_inc and ref_dec allow for atomic reference counting. This could be used in cases where using a slab is not an option.

`UnparkContext`

This is quite similar in spirit to with_unpark_event except it requires significantly less allocations when used in a nested situation.

It does have some different behavior. Given the following:

// Currently in task A
let my_context = UnparkContext::new(my_unpark);

my_context.with(0, || {
    let task = task::park();

    thread::spawn(move || {
        thread::sleep(a_few_secs);
        task.unpark();
    });
});

my_executor.spawn(move || {
    // Currently in task B

    my_context.with(0, || {
        // some work here
    });
});

Task B will be the root task that is notified.

carllerche · 2017-03-31T21:36:53Z

I should elaborate a bit.

This change should also enable to implement a tokio-core server that can run without allocations. Given that UnparkContext is now pretty lightweight, it seems reasonable for a server task to store a slab itself for the futures handling each accepted connection. In this case, the only allocations would be for the task handling the TcpListener and a pre-allocated slab for futures handling client sockets. The task then uses UnparkContext to either accept from the listener or poll a socket connection task.

carllerche · 2017-03-31T21:41:48Z

If this ends up being a direction that we want to pursue, I think that it would be safe to keep this in a task2 module. I'm fairly confident that the existing task API (task module) could use task2 under the hood.

carllerche · 2017-03-31T21:56:33Z

This PR is related to #432. If this strategy is taken, I don't think #432 is relevant anymore.

alexcrichton · 2017-04-03T17:55:47Z

This looks pretty slick, I really like it how it's making with_unpark_event essentially free (which is not the case today).

Your code snippet of how behavior changes was intially troubling to me, but in practice I don't think it will matter. Each UnparkContext will typically be associated with a particular future, which means that if you call with you're typically in a Future::poll. In that case you always want to override any previously stored task.

So to make sure I get this straight in my head:

All top-level executors assign ids to futures.
Top-level executors create one Arc<Unpark> for the entire duration of that executor.
Code currently using with_unpark_event will now instead use UnparkContext
UnparkContext requires on allocation at creation, and no more afterwards
Code uses UnparkContext::with to receive a custom notification with a custom id whenever something is unparked, allowing it to arrange for it to learn later about precisely which ids were unparked.

Does that sound like a correct interpretation? I can't really see many downsides to this!

Some other random thoughts:

This is essentially creating a linked list of Arc, right? I'd want to think careful about cycles here to see if they're possible, and if necessary we may wish to use weak pointers in some locations.
Can you explain in more detail how this would work in tokio-core? Currently we create a SetReadiness for each future, but would we perhaps no longer do that and just have a linked list of "ready futures"?

carllerche · 2017-04-03T20:33:47Z

Re: cycles, I don't think they are possible. UnparkContext::with takes &mut self and a closure, which prevents nesting UnparkContext::with calls w/ the same self. Only currently running tasks can be linked together... so it should be fine.

Re: how to hook this into tokio-core, you could most likely get rid of the SetReadiness in favor of a linked list of readiness. The slab in -core that tracks tasks could have have a next_readiness: Option<usize> for each entry..

alexcrichton · 2017-04-04T19:08:15Z

Ok yeah, that both makes sense to me.

Before merging I'd want to see tokio-core working (ideally something like sccache compiling llvm) just to be sure though.

carllerche · 2017-04-04T19:47:23Z

Seems fine to me, there is still work to be done though!

carllerche · 2017-04-22T20:12:33Z

Ok, I have a tweaked proposal.

Not introduce task_impl2 (as requested by @alexcrichton).
Deprecate the Unpark trait.
Introduce the Notify trait:

pub trait Notify: Send + Sync {
    /// Indicates that an associated future and/or task are ready to make
    /// progress.
    ///
    /// Typically this means that the receiver of the notification should
    /// arrange for the future to get poll'd in a prompt fashion.
    fn notify(&self, id: u64);

    /// Called when an instance of `NotifyHandle` has been cloned.
    fn ref_inc(&self);

    /// Called when an instance of `NotifyHandle` has been dropped.
    fn ref_dec(&self);

    /// A new `Task` handle referencing `id` has been created.
    fn id_ref_inc(&self, id: u64);

    /// A `Task` handle referencing `id` has been dropped.
    fn id_ref_dec(&self, id: u64);
}

Instead of Arc<Notify> (or Arc<Unpark>), introduce NotifyHandle as a wrapper struct:

struct NotifyHandle {
    ptr: *const NotifyHandle,
}

impl NotifyHandle {
    unsafe pub fn from_ptr(ptr: *const NotifyHandle) -> NotifyHandle {
        unimplemented!();
    }

    pub fn noop() -> NotifyHandle {
        unimplemented!();
    }

    pub fn notify_fn<F: Fn() + Sync + Send>(f: F) -> NotifyHandle {
        unimplemented!();
    }
}

Rename task::park()->task::current()andTask::unpark()->Task::notify()` (Consider renaming task park / unpark #312 after discussing w/ a number of users, I think this is much clearer naming).

On one hand, Notify is a much more complex trait to implement... and very unsafe. However, the Notify trait would be able to support executors that run without std. NotifyHandle would provide a couple of additional constructors (noop, notify_fn) to assist in constructing "simple" Notify implementations.

Another option would be to provide trait NotifySimple which is safe and has fewer methods to implement. I prefer sticking with one trait Notify as increasing the number of traits increases the complexity of both decisions that users have to make + the implementation.

The one unknown here is how to add Notify support to Spawn. The easiest option would be to deprecate all existing poll_* fns and introduce new ones with a _notify syntax.

IMO, poll_future() (and co.) should be repurposed in 0.2 to be:

fn poll_future(&mut self) -> Poll<F::Item, F::Error> {
    self.poll_future_notify(&NotifyHandle::noop())
}

The previous implementation of the task system required a number of allocations per task instance. Each task required a dedicated `Arc<Unpark>` handle which means that executors require at least two allocations per task. Things get worse when using `with_unpark_event` as nested calls to `with_unpark_event` result in Vec allocation and cloning during each call to `task::park`. This commit provides an overhaul to the task system to work around these problems. The `Unpark` trait is changed so that only one instance is required per executor. In order to identify which task is being unparked, `Unpark::unpark` takes an `unpark_id: u64` argument. `with_unpark_event` is removed in favor of `UnparkContext` which satisfies a similar end goal, but requires only a single allocation per lifetime of the `UnparkContext`.

The old task system is depreated and updated to run implicitly on the new task system.

* Move task2 back into task * Rename new trait to `Notify` * Add `NotifyHandle` which doesn't embed `Arc` * Add new `Spawn::*_notify` APIs * Rename `UnparkContext` to `NotifyContext` * Deprecate `Unpark` and associated machinery * Reimplement tasks to keep the same performance of `Unpark` as before.

carllerche · 2017-04-26T23:19:53Z

So, it seems that NotifyContext has a relatively big flaw when it comes to nesting them... this will take some thinking.

…stem

`ReadyQueue` is a structure into which an arbitrary number of futures can be pushed. `ReadyQueue` implements `Stream`, yielding the inner futures' resolved values as they become ready. This structure can handle large numbers of futures efficiently.

Switch ReadyQueue to use UnsafeNotify. This is needed because Notify is implemented on `Inner<T>`. UnsafeNotify requires 'static but T may not be static. Even though T may not be 'static, it is safe for Inner to implement Notify because the implementation will never read `T`.

alexcrichton · 2017-05-20T05:14:08Z

I think to merge we need an up-to-date summary of what this PR does, how it changes things, and why the changes are all justified. Basically just a correct PR description and/or correct commit message.

alexcrichton · 2017-05-20T05:14:32Z

Other than that though I believe I'm comfortable landing everything here. I can try to get around to the docs next week if it's not done already.

alexcrichton · 2017-05-20T20:27:48Z

Ok I started writing up documentation for this but ended up finding what I believe is some lingering unsafety and memory leaks.

leoyvens · 2017-05-21T19:30:42Z

If drop_raw takes &self now (I think it was &mut self at some point) then should we change NotifyHandle::new to take a *const UnsafeNotify rather than *mut UnsafeNotify?

leoyvens · 2017-05-21T20:23:30Z

Why is id an u64 rather than usize? Given that a lot of this is for no_std compatibility, isn't usize better for compatibility and performance on non-64bit architectures?

alexcrichton · 2017-05-22T14:16:13Z

@leodasvacas the intention is to take *mut Self in drop_raw but we're not currently allowed to do that for object safety. The &self pointer has "less guarantees" in terms of uniqueness and such that the compiler knows about, so we're currently using &self instead of &mut self.

@carllerche can speak more though to the choice of u64 instead of usize

A nice way of working with tasks without allocations.

carllerche · 2017-05-22T18:39:12Z

I opted for u64 instead of usize specifically because the extra bits available w/ 64 are very useful when implementing things like a treiber stack (extra bits are used to guard against ABA).

Picking u64 definitely gears towards optimizing for 64 bit platforms...

I'm not sure how common it is to have 32 bit platforms and care about the extra perf hit.

THAT said, it occurs to me that hardcoding to u64 would be a hazard if there ever are 128bit CPUs... though I don't know how realistic that is :) (especially since ptrs these days are limited to 48bit).

If there is strong preference to switching to usize and a good argument for it, I would be fine to do so.

carllerche · 2017-05-22T19:05:12Z

@alexcrichton said he wanted to keep the various _notify functions on Spawn for consistency reasons. They have a low risk of naming conflict and could be easily deprecated later if needed.

Avoids twiddling reference counts ourselves and makes it clear who owns what, also led to the discovery of a memory leak... At the same time this also renames `ref_inc` to `clone_id` and `ref_dec` to `drop_id` for closer mirroring to ownership in Rust.

There were a few independent bugs here which were fixed: * Nodes left in the mpsc queue after the `FuturesUnordered` was dropped were previously leaked. * The `drop_raw` calls were incorrect, not actually freeing anything. * Memory unsafety was possible if polling a future panicked, for a number of reasons. Memory management is now entirely done through `Arc` and `Weak` and code was restructured slightly to work around panics in polling future.

alexcrichton · 2017-05-22T19:53:12Z

Ok here's my updated summary of the PR: @carllerche mind checking?

Major Changes

The primary change made in this PR is to restructure memory management and
notifications throughout the "task system" in the futures crate. It is
intended that this will have very little impact, if any, on consumers of
futures. Implementations of runtimes of futures (e.g. crates like tokio-core)
are the target for this series of changes, enabling a suite of possible
optimizations that were not previously feasible.

One of the largest changes being made is an update to the memory management of
objects behind Task handles. Previously it was required that Arc<Unpark>
instances were passed into the various Spawn::poll_* functions, but new
functions Spawn::poll_*_notify were added which operate with a NotifyHandle
instead. A NotifyHandle is conceptually very similar to an Arc<Unpark>
instance, but it works through an unsafe trait UnsafeNotify to manage memory
instead of requiring Arc is used. You can still use Arc safely, however, if
you'd like.

In addition to supporting more forms of memory management, the poll_*_notify
functions also take a new id parameter. This parameter is intended to be an
opaque bag-of-bits to the futures crate itself but runtimes can use this to
identify the future being notified. This is intended to enable situations where
the same instance of a NotifyHandle can be used for all futures executed by
using the id field to distinguish which future is ready when it gets
notified.

API Additions

A FuturesUnordered::push method was added and the FuturesUnordered type
itself was completely rewritten to efficiently track a large number of
futures.
A Task::will_notify_current method was added with a slightly different
implementation than Task::is_current but with stronger guarantees and
documentation wording about its purpose.

Compatibility Notes

As with all 0.1.x releases this PR is intended to be 100% backwards compatible.
All code that previously compiled should continue to do so with these changes.
As with other changes, though, there are also some updates to be aware of:

The task::park function has been renamed to task::current.
The Task::unpark function has been renamed to Task::notify, and in general
terminology around "unpark" has shifted to terminology around "notify"
The Unpark trait has been deprecated in favor of the Notify trait
mentioned above.
The UnparkEvent structure has been deprecated. It currently should perform
the same as it used to, but it's planned that in a future 0.1.x release the
performance will regress for crates that have not transitioned away. The
primary primitive to replace this is the addition of a push function on the
FuturesUnordered type. If this does not help implement your use case though,
please let us know!
The Task::is_current method is now deprecated, and you likely want to use
Task::will_notify_current instead, but let us know if this doesn't suffice!

carllerche · 2017-05-22T20:01:34Z

@alexcrichton seems good to me.

This commit enables usage of the standard task system in `no_std` contexts. This essentially piggy-backs the `use_std` feature already have and, when deactivated, will provide the `futures::{task, executor}` modules. Building on all the previous refactorings for types like `NotifyHandle` this should enable usage of the task system in these contexts. This does, however, come with a caveat. A relatively large dependency of the task system is currently its usage of thread-local storage, unavailable in task contexts! To work around this an initialization hook for the futures library is provided, and this initialization hook is required to be called before working with the task system. This hook takes just two pointers, a `get` and a `set` for a pointer read/write into some "local storage", depending on the context of the system that we're running on. Usage of this crate from a `std` context (e.g. with default features enabled) should not be impacted by this change. Everything previously present is still there and no explicit initialization is required.

alexcrichton · 2017-05-24T15:33:19Z

Alright after some discussion with @aturon I reached the conclusion that I'd personally at least be much more comfortable landing this if we could achieve one of the original goals (no_std usage or a better unpark event). Along those lines I've just pushed up a commit which enables usage of the entire non-deprecated surface area of the task system in no_std contexts.

The crux here is that no_std contexts must explicitly initialize the task system with a getter/setter for thread-local-storage-like information. This is lazily provided when running in the context of libstd but when disabling the default features of this crate it's required for usage.

The initial implementation of no-std introduced a performance regression when using std. This was caused by a few factors: * Additional calls to TLS accessors. * Addition of dynamic dispatch. * Addition of atomic ops. * Less inlining Most of these additions have been removed, leaving a single relaxed atomic load in the `task::current()` path as well as limiting the `call_once` to when the task is entered vs. in all task functions.

Clean up a few more hot spots as well: * Inline some trivial functions that are called across crates * Avoid cross-crate TLS access as it's a vtable hit right now which is a bummer

Using u64 isn't a clear win and it will usually represent a pointer anyway.

carllerche · 2017-05-25T19:23:45Z

As far as I know, everything that is needed for this PR has happened...

alexcrichton · 2017-05-25T19:25:00Z

Alright, let's do it!

carllerche mentioned this pull request Mar 31, 2017

Avoid quadratic complexity in poll() tokio-rs/tokio-proto#155

Open

leoyvens mentioned this pull request Apr 3, 2017

Zero allocation futures #437

Closed

This was referenced Apr 4, 2017

Consider renaming task park / unpark #312

Closed

Do not use unpark events as part Shared #442

Merged

RFC: Customizable wait() behavior #360

Closed

alexcrichton mentioned this pull request Apr 11, 2017

Provide an atomic Task cell #361

Closed

carllerche and others added 4 commits April 26, 2017 16:17

Expose new task system

002ba26

The old task system is depreated and updated to run implicitly on the new task system.

Fix a bug with Unpark ref counting

2cf43af

leoyvens mentioned this pull request May 2, 2017

Consider passing the Task that is driving the future to Future::poll and renaming the function #129

Closed

carllerche added 10 commits May 11, 2017 11:10

Merge remote-tracking branch 'origin/master' into restructure-task-sy…

d327a4a

…stem

ArcWrapped should delegate ref_inc and ref_dec

fa14342

Merge remote-tracking branch 'origin/master' into restructure-task-sy…

c69132e

…stem

Add tests for ReadyQueue

a62d795

ReadyQueue: Drop futures when queue is dropped

541f15d

Improve the test slightly

9dcaa4b

Update BufferUnordered to use ReadyQueue

4c96c2b

Switch FuturesUnordered to use ReadyQueue

3fc80d8

Into<NotifyHandle> for &'static T where T : Notify (#5)

b743abd

A nice way of working with tasks without allocations.

alexcrichton added 4 commits May 22, 2017 12:13

Handle rebase conflicts

b681e6f

Merge branch 'master' into task-rewrite

4fdb76b

leoyvens mentioned this pull request May 22, 2017

Async IO for embedded concurrency rust-embedded/wg#23

Closed

alexcrichton mentioned this pull request May 23, 2017

Missing event loop understanding documentation tokio-rs/tokio#9

Closed

carllerche and others added 3 commits May 25, 2017 10:18

Add a benchmark for task initialization

503dbf0

Clean up a few more hot spots as well: * Inline some trivial functions that are called across crates * Avoid cross-crate TLS access as it's a vtable hit right now which is a bummer

Use usize as the Notify task identifier.

62dbba2

Using u64 isn't a clear win and it will usually represent a pointer anyway.

alexcrichton merged commit 878a9a8 into rust-lang:master May 25, 2017

This was referenced May 25, 2017

[experiment] remove CURRENT_TASK thread-local variable #374

Closed

Futures could be zero-allocation (at least for tokio) #430

Closed

Consider switching Arc<Unpark> to an opaque type #432

Closed

carllerche deleted the restructure-task-system branch May 25, 2017 21:12

carllerche mentioned this pull request May 30, 2017

Add an executor::Wait trait #487

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Task system overhaul #436

Task system overhaul #436

carllerche commented Mar 31, 2017 •

edited by alexcrichton

carllerche commented Mar 31, 2017

carllerche commented Mar 31, 2017

carllerche commented Mar 31, 2017

alexcrichton commented Apr 3, 2017

carllerche commented Apr 3, 2017

alexcrichton commented Apr 4, 2017

carllerche commented Apr 4, 2017

carllerche commented Apr 22, 2017

carllerche commented Apr 26, 2017

alexcrichton commented May 20, 2017

alexcrichton commented May 20, 2017

alexcrichton commented May 20, 2017

leoyvens commented May 21, 2017

leoyvens commented May 21, 2017

alexcrichton commented May 22, 2017

carllerche commented May 22, 2017

carllerche commented May 22, 2017

alexcrichton commented May 22, 2017

carllerche commented May 22, 2017

alexcrichton commented May 24, 2017

carllerche commented May 25, 2017

alexcrichton commented May 25, 2017

Task system overhaul #436

Task system overhaul #436

Conversation

carllerche commented Mar 31, 2017 • edited by alexcrichton

Updated description

API Additions

Compatibility Notes

Original description

Checklist

Overview

The new Unpark trait

UnparkContext

carllerche commented Mar 31, 2017

carllerche commented Mar 31, 2017

carllerche commented Mar 31, 2017

alexcrichton commented Apr 3, 2017

carllerche commented Apr 3, 2017

alexcrichton commented Apr 4, 2017

carllerche commented Apr 4, 2017

carllerche commented Apr 22, 2017

carllerche commented Apr 26, 2017

alexcrichton commented May 20, 2017

alexcrichton commented May 20, 2017

alexcrichton commented May 20, 2017

leoyvens commented May 21, 2017

leoyvens commented May 21, 2017

alexcrichton commented May 22, 2017

carllerche commented May 22, 2017

carllerche commented May 22, 2017

alexcrichton commented May 22, 2017

Major Changes

API Additions

Compatibility Notes

carllerche commented May 22, 2017

alexcrichton commented May 24, 2017

carllerche commented May 25, 2017

alexcrichton commented May 25, 2017

carllerche commented Mar 31, 2017 •

edited by alexcrichton

The new `Unpark` trait

`UnparkContext`