This RFC proposes a convention for structuring methods in SciRust
which can cater to the conflicting needs of efficiency, ease of use
and effective error handling.
For the impatient:
// Efficient access without bounds checks
unsafe fn get_unchecked(&self, r: usize, c: usize) -> T;
// Safe access with bounds checks; returns an error if the address is invalid
fn get_checked(&self, r: usize, c: usize) -> Result<T, Error>;
// User-friendly version; panics in case of error
fn get(&self, r: usize, c: usize) -> T;
// Efficient modification without bounds checks
unsafe fn set_unchecked(&mut self, r: usize, c: usize, value: T);
// Safe modification with bounds checks
fn set(&mut self, r: usize, c: usize, value: T);
Detailed discussion
The use of SciRust can be broadly divided into
two scenarios.
- A script style usage, where the objective is to quickly
do some numerical experiment, get the results and analyze them.
- A library development usage, where more professional libraries
would be built on top of fundamental building blocks provided
by SciRust (these may be other modules shipped in SciRust itself).
While the first usage scenario is important for getting new users hooked
on the library, the second usage scenario is also important for justifying
why Rust should be used for scientific software development compared
to other scientific computing platforms.
In the context of the two usage scenarios, the design of SciRust has three conflicting goals:
- Ease of use
- Efficiency
- Well managed error handling
While ease of use is important for script style usage,
efficiency and well managed error handling are important
for serious software development on top of core components
provided by SciRust.
We will consider the example of a get(r,c) method on a matrix object
to discuss these conflicting goals. Please note that get is just a
representative method for this discussion. The design ideas can be
applied in many different parts of SciRust once accepted.
If get is being called in a loop, the code around it can usually
ensure that the conditions for accessing data within the boundary
of the matrix are met correctly. Thus, bounds checking inside the
implementation of get is just extra overhead.
While skipping the bounds check is good for writing efficient software,
it can lead to a number of memory-related bugs and goes
against the fundamental philosophy of Rust (safety first).
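A minimal sketch of this trade-off, assuming a hypothetical row-major Matrix of f64 values (the struct, its fields, from_vec and sum are illustrative names, not SciRust's actual API). The loop ranges already guarantee valid indices, so the per-element bounds check can be skipped, but the responsibility for correctness moves to the caller:

```rust
/// Hypothetical row-major matrix, used only for illustration.
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Builds a matrix from row-major data; panics if the length does not match.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Matrix {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Reads an element without bounds checks.
    ///
    /// # Safety
    /// The caller must guarantee `r < self.rows` and `c < self.cols`.
    pub unsafe fn get_unchecked(&self, r: usize, c: usize) -> f64 {
        unsafe { *self.data.get_unchecked(r * self.cols + c) }
    }

    /// Sums all elements of the matrix.
    pub fn sum(&self) -> f64 {
        let mut total = 0.0;
        for r in 0..self.rows {
            for c in 0..self.cols {
                // SAFETY: the loop ranges keep r < rows and c < cols, and
                // from_vec guarantees data.len() == rows * cols.
                total += unsafe { self.get_unchecked(r, c) };
            }
        }
        total
    }
}
```

If a caller passes an index outside the matrix to get_unchecked, the behavior is undefined, which is exactly the class of bug the paragraph above warns about.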
There are actually two different options for error handling:
- Returning either Option<T> or Result<T, Error>.
- Using the panic mechanism.
Option<T> or Result<T, Error> gives users fine-grained control
over what to do when an error occurs. This is certainly the Rusty
way of doing things. At the same time, both of these return types
make the user code more complicated. One has to add extra calls to
.unwrap() even if one is sure that the function is not going to fail.
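The difference at the call site can be sketched as follows. The Matrix struct, the Error enum with its single variant, and the helper functions trace and first_element are all hypothetical names chosen for this illustration; the RFC itself only refers to the error type as Error.

```rust
/// Hypothetical error type; the RFC only calls it `Error`.
#[derive(Debug, PartialEq)]
pub enum Error {
    IndexOutOfBounds,
}

/// Hypothetical row-major matrix, used only for illustration.
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Builds a matrix from row-major data; panics if the length does not match.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Matrix {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Bounds-checked access that reports failure through `Result`.
    pub fn get_checked(&self, r: usize, c: usize) -> Result<f64, Error> {
        if r >= self.rows || c >= self.cols {
            return Err(Error::IndexOutOfBounds);
        }
        Ok(self.data[r * self.cols + c])
    }
}

/// Library-style caller: propagates any error to its own caller with `?`.
pub fn trace(m: &Matrix) -> Result<f64, Error> {
    let mut total = 0.0;
    for i in 0..m.rows.min(m.cols) {
        total += m.get_checked(i, i)?;
    }
    Ok(total)
}

/// Script-style caller: assumes a non-empty matrix, yet still has to unwrap.
pub fn first_element(m: &Matrix) -> f64 {
    m.get_checked(0, 0).unwrap()
}
```

The library-style function keeps full control over the error, while the script-style function carries an .unwrap() that adds noise without adding safety.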
Users of scientific computing tend to prefer an environment
where they can get more work done with less effort. This is
one reason for the success of specialized environments like
MATLAB. Open source environments like Python (NumPy, SciPy)
try to achieve something similar.
While SciRust doesn't intend to compete at the level of
simplicity provided by MATLAB/Python environments, it does
intend to make an extra effort wherever possible to address
the ease of use goal.
In this context, the return type of a getter should be just the
value type T. This can be achieved safely by using a panic if
the access boundary conditions are not met.
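As a rough illustration of what this buys at the call site, here is a sketch with a hypothetical row-major Matrix of f64 values and a hypothetical helper det2; neither is taken from SciRust. Because get returns the value directly, it can be used inside an arithmetic expression without any unwrapping:

```rust
/// Hypothetical row-major matrix, used only for illustration.
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Builds a matrix from row-major data; panics if the length does not match.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Matrix {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Returns the element at (r, c); panics if the index is out of bounds.
    pub fn get(&self, r: usize, c: usize) -> f64 {
        assert!(
            r < self.rows && c < self.cols,
            "index ({}, {}) out of bounds for a {}x{} matrix",
            r, c, self.rows, self.cols
        );
        self.data[r * self.cols + c]
    }
}

/// Determinant of the top-left 2x2 block, written script-style.
/// Panics if the matrix is smaller than 2x2.
pub fn det2(m: &Matrix) -> f64 {
    m.get(0, 0) * m.get(1, 1) - m.get(0, 1) * m.get(1, 0)
}
```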
The discussion above suggests up to three possible ways of
implementing methods like get.
- An unchecked (and unsafe) version for high efficiency code
where the calling code is responsible for ensuring that
the necessary requirements for correct execution of the
method are being met.
- A safe version which returns either Option<T> or Result<T, Error>,
which can be used for professional software development where the
calling code has full control over error handling.
- Another safe version which panics in case of error but provides
an API which is simpler to use for writing short scientific
computing scripts.
Proposed convention
We propose that any method which needs to support these variations
should follow the convention defined below:
- A method_unchecked version should provide the basic implementation
of the method. It should assume that the necessary conditions
for successful execution of the method are already
ensured by the calling code. The unchecked version of the method
MUST be marked unsafe. This ensures that the calling code
knows that it is responsible for ensuring the right conditions
for calling the unchecked method.
- A method_checked version should be implemented on top of
the method_unchecked method. The checked version should
check all the requirements for calling the method safely.
The return type should be either Option<T> or Result<T, Error>.
If the required conditions for calling the method are not met,
None or an Error should be returned. Once the required conditions
are met, method_unchecked should be called to get the result,
which is then wrapped inside Option or Result.
- A method version should be built on top of the method_checked version.
It should simply attempt to unwrap the value returned by method_checked
and return it as T. If method_checked returns an error or None,
this version should panic.
The first two versions are suitable for professional development,
where a safe API is needed most of the time but an unsafe API is
occasionally needed for an efficient implementation.
The third version is suitable for the script style usage scenario.
The convention is illustrated by the three versions of get
at the beginning of this document.
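One possible layering of the three versions is sketched below. It assumes a hypothetical generic, row-major Matrix<T> with T: Copy and a hypothetical Error enum; only the method names and signatures come from this document, and the rest is illustrative.

```rust
/// Hypothetical error type; the RFC only calls it `Error`.
#[derive(Debug, PartialEq)]
pub enum Error {
    IndexOutOfBounds,
}

/// Hypothetical row-major matrix, used only for illustration.
pub struct Matrix<T> {
    rows: usize,
    cols: usize,
    data: Vec<T>,
}

impl<T: Copy> Matrix<T> {
    /// Builds a matrix from row-major data; panics if the length does not match.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<T>) -> Matrix<T> {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Layer 1: basic implementation, no checks.
    ///
    /// # Safety
    /// The caller must guarantee `r < self.rows` and `c < self.cols`.
    pub unsafe fn get_unchecked(&self, r: usize, c: usize) -> T {
        unsafe { *self.data.get_unchecked(r * self.cols + c) }
    }

    /// Layer 2: validates the index, then delegates to the unchecked version.
    pub fn get_checked(&self, r: usize, c: usize) -> Result<T, Error> {
        if r >= self.rows || c >= self.cols {
            return Err(Error::IndexOutOfBounds);
        }
        // SAFETY: the index was validated just above.
        Ok(unsafe { self.get_unchecked(r, c) })
    }

    /// Layer 3: unwraps the checked result and panics on error.
    pub fn get(&self, r: usize, c: usize) -> T {
        self.get_checked(r, c).expect("matrix index out of bounds")
    }
}
```

With this layering the bounds logic lives in exactly one place (get_checked), so the panicking and the unchecked versions cannot drift apart from it.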
API bloat
This convention is expected to lead to some API bloat. However,
if it is followed consistently across the library,
it should be easy to follow (both from the perspective
of users of the library and from the perspective of developers
of the library).