Improved public API soundess #27

TomGillen · 2019-11-09T15:21:52Z

Changed World and Query APIs to require &mut World, such that it is statically impossible to write safe code with unsound borrowing. New unsafe _unchecked alternatives have been added with the previous behaviour.

This required changing the PreparedQuery APIs to also require they be given a PreparedWorld reference, in order to control whether a system can execute two of its queries at the same time.

PreparedWorld now provides more of the API that is available on World. PreparedWorld allows random access to components and archetypes that a system's queries declare access to. SystemBuilder::[read/write]_component can now be used to mark the system as accessing all archetypes.

Refactored world and query APIs to require &mut World in safe APIs and added _unchecked alternatives.

Ralith · 2019-11-09T16:52:22Z

It'd be nice to have per-component-type dynamic borrow checking eventually to make more patterns Just Work, but this is definitely a step in the right direction!

jaynus · 2019-11-09T17:36:31Z

It'd be nice to have per-component-type dynamic borrow checking eventually to make more patterns Just Work, but this is definitely a step in the right direction!

The whole point of this change, however, is the fact that dynamic borrow checking is excessively bad for performance as we'd need to be performing checks in all the iterators and accessors, which would be prohibitively expensive.

I still believe allowing the dispatcher to guarantee soundness via its preperations and dispatch is the ultimate solution here - placing these restrictions on the world proper for soundness is already, as seen here, limiting the API drastically.

@TomGillen - Am I correct in my understandings below here:

It appears the change of require mutable borrows of PreparedWorld within the queries inherently will make the "sound" API limit all query executions and data acess to exclusive accessors. This means that random-access, by default, of components will also be exclusive. (get_component not defaults to a mutable borrow, so you can never read 2 components of different archetypes ever at the same time except with the unchecked API). Is this a correct assumption? If so, I believe this drastically ruins the usefulness of basically the entire API - we now by default are basically worthless except for glorified for-loops; any actual performant and useful use cases will require the use of the unchecked APIs.

I dont agree with drastically limiting all of legions use cases for allowing the public API more soundness for direct access to world. As an example, the dispatcher already provides the static garuntees for access which are needed to safely use the unchecked API's. I dont see why we need to make this API so unergonomic now for the safety of World direct access, when PreparedWorld access already provided all the safety/soundness for us via dispatcher execution.

Correct me if I am off base or wrong about any of these assumptions so far. But this change basically:

We can never run multiple queries at the same time without unchecked
We can never access two different component types at the same time without get_component_unchecked or from within a Query

I see this as drastically limiting to say, the transform project by @AThilenius, as due to the nature of transform hierarchies, it basically requires a large amount of random access for parent-child relationships. Now, those relationships must be iterated and checked on a component-by-component basis; which means we cannot randomly access say, Translation and Rotation, at the same time (without unchecked).

From my perspective, this would now make the legion safe api a single-component access API, or a glorified for-loop iterator with queries, preventing any kind of multi-query execution is drastically limiting on standard use cases.

I realize now after writing this all its a brainfart, but I believe the discussion of soundness/safety needs to occur more from a design perspective. I believe its safe to make checked API's in world, while the garuntees of the executor allow for the default API to not need the exclusivity garuntees.

@Frizi for visibility, as this is a huge limiting change which may drastically affect the current amethyst RFC.

TomGillen · 2019-11-09T18:13:25Z

get_component actually only requires &World, because the only code that the compiler would allow to run concurrently with such a component borrow that should not be allowed would be the new unsafe APIs. get_component_mut does require &mut World and would need to be called via get_component_mut_unchecked if you wanted to do it while holding on to other references.

The system scheduler doesn't provide sufficient safety, because a system can either run more than one query at a time, or perform random accesses while inside a query loop.

Current (system) code:

for (mut pos, vel) in query.iter() {
    // this "safe" code could panic or perform undefined behaviour, depending on what `entity` refers to
    *world.get_component_mut::<Position>(entity) = *pos;
}

This PR:

for (mut pos, vel) in query.iter(&mut *world) {
    // this won't compile
    *world.get_component_mut::<Position>(entity) = *pos;
}

for (mut pos, vel) in unsafe { query.iter_unchecked(&*world) } {
    // this compiles, but the user has to aknowledge the potential unsafety between
    // the query and random access
    unsafe { *world.get_component_mut_unchecked::<Position>(entity) = *pos };
}

The issue is more pronounced if a system was doing something like nesting multiple query loops. That is only safe if the queries are designed such that they can not possibly both access the same chunks (i.e. one reads a component while the other requires the component does not exist). By default the compiler will not allow such nested loops, but it would if both loops used iter_unchecked.

I don't think the *_unchecked APIs are that unergonomic, and I think forcing the user to aknowledge when they are performing a potentially erronous access is a good thing. That is exactly what the unsafe keyword is for. You also do not need it the vast majority of the time. I don't think I needed to use them once when updating all of the unit tests, benchmarks, examples, and documentation code in the PR. The only place I used the *_unchecked calls was inside the dispatcher itself, and indeed that is exactly the kind of place where we should be using them as that code is only safe because of guarentees that the surrounding code makes.

TomGillen · 2019-11-09T18:21:00Z

In most cases, the most notable change to user code is that query code inside systems now looks the same as it does outside of systems.

Ralith · 2019-11-09T18:33:04Z

The whole point of this change, however, is the fact that dynamic borrow checking is excessively bad for performance as we'd need to be performing checks in all the iterators and accessors, which would be prohibitively expensive.

I don't believe that maintaining a single type -> borrow table in the World and checking it on random access and when a query is begun will be ruinously expensive for most applications, and for niche cases (or cases which enforce the guarantees themselves) there's always the unchecked API.

I still believe allowing the dispatcher to guarantee soundness via its preperations and dispatch is the ultimate solution here

I don't see the conflict here; the dispatcher can build on top of the unchecked APIs, on the basis of the guarantees it enforces itself.

TomGillen · 2019-11-09T18:42:57Z

It would be difficult to remove the runtime checking from the safe APIs and keep them in the unsafe APIs (the different code path would run very deep), which is why we instead perform runtime checks on both in debug builds and disable the checks in release. That is only safe to do with the changes in this PR.

Per-component checks would certainly not be ideal for performance, but also not totally ruinous provided that there was a way of marking an entire slice as borrowed with a similar performance cost to how the borrow checking works currently. I suspect it may be possible to implement this at a similar cost to the current checks. Even so, I would still be much more comfortable if such checks could be disabled in release builds.

There is one more side affect of this PR; we could now safely remove Ref from most of the public API and directly return references. Doing so would harm the debug build's ability to runtime check the _unchecked calls (because most code would no longer hold onto the runtime borrows for the full duration of their references being in scope), but would perhaps improve ergonomics a little. That was not safe to do while we were trying (unsuccessfully) to use the runtime checks as a safety guarentee, but now those checks are only there for debug diagnostics and are not part of the contract. I am not sure if the trade off it worth it or not (and I am leaning towards "not"), which is why I did not implement that in this PR.

Ralith · 2019-11-09T19:03:52Z

Per-component checks would certainly not be ideal for performance, but also not totally ruinous provided that there was a way of marking an entire slice as borrowed with a similar performance cost to how the borrow checking works currently.

To be clear, I'm talking about per type checks, i.e. behavior approximately analogous to having a bunch of RefCell<Vec<T>>. Nothing that'd help for multiple random access within a single type.

TomGillen · 2019-11-09T19:41:01Z

We already perform runtime borrow checks per-type, per-chunk.

Ralith · 2019-11-09T19:51:59Z

Oh, you're saying that it's unreasonably difficult to have safe and unsafe APIs that always and never hit those checks respectively? I'd be interested to see what the quantitative performance impact of just running them always would be; per-chunk strikes me as pretty coarse. Also, would lifting them out to the world scope, ignorant of individual chunks, reduce the difficulty?

Ralith · 2019-11-09T19:57:34Z

(but of course none of that needs to be fully explored right now; this PR addresses the immediate problem as-is)

TomGillen · 2019-11-09T20:01:13Z

If the checks were global and not per-chunk, that would make it very difficult to run queries concurrently, even between systems. Two queries which both access the same component type can easily still be disjoint about which entities they return, and the scheduler is aware of this and takes advantage of it to allow more systems to run concurrently.

@jaynus Currently in this PR, you can use the safe API even when two systems are going to be running concurrently. You only need to worry about safe vs unsafe code within a single system. If a system does not do nested queries or perform random accesses while simultaneously holding a mutable reference, then that system can be entirely written with the safe API. The sheduler ensures that you don't need to worry about what other systems are doing, but you still need to ensure that your own system is correct.

I did at one point (about a year ago) manage to get queries to conditionally require &World or &mut World depending on whether or not its View contained any mutable components. I ended up not using that code, and can't remember how I did it... I think it probably relied on specialization. It would be nice, though, as it would mean you could perform (immutable) random accesses inside (immutable) queries with 100% safe code.

Ralith · 2019-11-09T20:16:00Z

If the checks were global and not per-chunk, that would make it very difficult to run queries concurrently, even between systems. Two queries which both access the same component type can easily still be disjoint about which entities they return, and the scheduler is aware of this and takes advantage of it to allow more systems to run concurrently.

Which is exactly the sort of case a distinct unsafe API is for, since the scheduler can enforce the necessary guarantees itself while presenting a safe API to the user. Hence, if global checks make a safe/unsafe API distinction feasible, the result might be a usability sweet spot.

jaynus · 2019-11-09T20:40:17Z

I've just ported legion_transform and amethyst-legion to this new API.

The changes needed can be seen here:
jaynus/amethyst@21830ec
jaynus/legion_transform@c840a77

A few things become apparently that I missed previously:

The need for a mutable world breaks the use of closures. What I mean by this, is because we have to mutably move around and mutably borrow world with queries, this prevents the usage of function-chaining. So this will be an ergonomic cost. Seen here: jaynus/amethyst@21830ec#diff-182095cb1cfedbf598d569992241b122L62
Any type of manual threading and scoping inherently requires unsafe calls, as you say, within a single system. Seen here:jaynus/legion_transform@c840a77#diff-d93544654658cc70e288f16c70cbab69R113

This raises the question, that I don't really understand the reasoning: Queries require a mutable accessor because we cannot garuntee if it has Read or Write types, so inherently any type of access of the query is possibly mutable, so requires this exclusivity garunteed for the checked API's, correct?

Is there not a way we could still provide iter and iter_mut for the queries; effectively making all accessors Read on the iter case, and only allowing the writable members in the iter_mut case, thus allowing us to immutably borrow the world (safely, as its an immutable query) and that would allow all these unsafe checks to go away?

I think this would greatly help the ergonomics, and massively reduce the needed amount of unsafe calls while still holding the guarantee.

TomGillen · 2019-11-09T21:02:47Z

We could do that, but it will add half as many again functions to Query. Also, most useful queries will mutate at least one component, so I am not sure if we want the alternative function to be iter_mut or iter_readonly or some such.

I am 99% sure it is possible to have iter switch arguments depending on whether the View has any mutable accessors... on nightly with specialization. I am not sure if it is possible on stable; the difficult bit is conditionally implementing the view tuple macro, setting an associated type differently based on whether or not all tuple elements implement a readonly trait. Maybe it is possible with procedural macros? I am not sure, I have never written a proc macro, and I am not sure if they have access to the kind of type metadata you would need.

jaynus · 2019-11-09T21:07:03Z

so I am not sure if we want the alternative function to be iter_mut or iter_readonly or some such.

Alternatively, this is mainly for the case of read-only accessors which is the hot-path for legion, being rendering. Alternatively, how about we add the functions as explicit-immutable instead of mutable (reverse of the rusty-method, but more sensible in this case). This could either be as you say. I think as a simple stop-gap, would be to allow say fn iter_immutable(&World), which performs a single runtime-check at the construction of the iterator which checks View::writes == 0, and if thats the case, allow that call.

This would be a fairly simple addition, and I think meet the needs of the renderer paths in amethyst, and providing some more flexability without needing the unsafe code. I feel this is a good compromise until specialization lands.

It would also be nice to have a nightly feature branch we can expose the specialization in.

TomGillen · 2019-11-09T21:20:43Z

Well legion's "hot path" are query iterators as a whole, it should perform the same regardless of whether components within that are mutable or immutable.

which performs a single runtime-check at the construction of the iterator which checks View::writes == 0

We can do that statically with a ReadOnly marker trait on all of the immutable View types, and then only impl iter_immutable on Query<V, F> where Self::V: ReadOnly.

jaynus · 2019-11-09T21:25:29Z

We can do that statically with a ReadOnly marker trait on all of the immutable View types, and then only impl iter_immutable on Query<V, F> where Self::V: ReadOnly.

I like it. Can we do that?

TomGillen · 2019-11-09T21:51:40Z

Produces compile errors like this if you try and call it on a query that has mutable component access:

Improved public API soundess

c70eefc

Refactored world and query APIs to require &mut World in safe APIs and added _unchecked alternatives.

Ralith approved these changes Nov 9, 2019

View reviewed changes

Added *_immutable operators to queries

b2e8e49

Ralith mentioned this pull request Nov 11, 2019

[RFC Discussion] Legion ECS Evolution amethyst/rfcs#22

Open

caelunshun mentioned this pull request Nov 12, 2019

[WIP] Query random access APIs #32

Closed

2 tasks

TomGillen added 2 commits November 12, 2019 11:46

Merge branch 'master' into runtime-check-soundness

28f6965

Fixed query API tests

310e3aa

TomGillen merged commit 2eab55d into master Nov 12, 2019

TomGillen deleted the runtime-check-soundness branch November 12, 2019 12:11

TomGillen mentioned this pull request Nov 14, 2019

Random access APIs should be marked as unsafe #21

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved public API soundess #27

Improved public API soundess #27

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

Ralith commented Nov 9, 2019

TomGillen commented Nov 9, 2019 •

edited

Ralith commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 •

edited

jaynus commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 •

edited

Improved public API soundess #27

Improved public API soundess #27

Conversation

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 • edited

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019 • edited

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

TomGillen commented Nov 9, 2019

Ralith commented Nov 9, 2019

Ralith commented Nov 9, 2019

TomGillen commented Nov 9, 2019 • edited

Ralith commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 • edited

jaynus commented Nov 9, 2019 • edited

TomGillen commented Nov 9, 2019

jaynus commented Nov 9, 2019

TomGillen commented Nov 9, 2019 • edited

TomGillen commented Nov 9, 2019 •

edited

Ralith commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019 •

edited

jaynus commented Nov 9, 2019 •

edited

TomGillen commented Nov 9, 2019 •

edited