Disaggregate zone-based origin/destination data to specific points


odjitter

This crate contains an implementation of the ‘jittering’ technique for pre-processing origin-destination (OD) data. In a data visualisation context, jittering refers to the addition of “random noise to the data” to prevent points in graphs from overlapping, as described by Wickham (2016) and in the documentation page for the function geom_jitter().

In the context of OD data, jittering refers to randomly moving the start and end points associated with OD pairs, as described in a paper on the subject that is currently under review (Lovelace et al. under review). The technique is implemented in the function od_jitter() in the od R package. The functionality contained in this repo is an extended and much faster implementation: according to our benchmarks on a large dataset, it was around 1000 times faster than the R implementation.

The crate is still a work in progress: the API may change. Issues and pull requests are particularly useful at this stage.

Installation

Install the package from the system command line as follows (you need to have installed and set up cargo first):

cargo install --git https://github.com/dabreegster/odjitter

To check that the installation worked, run the odjitter command without arguments. If it prints the following message, congratulations, it works 🎉

odjitter
## error: The following required arguments were not provided:
##     --od-csv-path <OD_CSV_PATH>
##     --zones-path <ZONES_PATH>
##     --output-path <OUTPUT_PATH>
##     --max-per-od <MAX_PER_OD>
## 
## USAGE:
##     odjitter [OPTIONS] --od-csv-path <OD_CSV_PATH> --zones-path <ZONES_PATH> --output-path <OUTPUT_PATH> --max-per-od <MAX_PER_OD>
## 
## For more information try --help

Usage

To run the algorithm you need the following inputs, examples of which are provided in the data/ folder of this repo:

  1. A .csv file containing OD data with two columns containing zone IDs (specified with --origin-key=geo_code1 --destination-key=geo_code2 by default) and other columns representing trip counts:
geo_code1  geo_code2  all  from_home  train  bus  car_driver  car_passenger  bicycle  foot  other
S02001616  S02001616   82          0      0    3           6              0        2    71      0
S02001616  S02001620  188          0      0   42          26              3       11   105      1
S02001616  S02001621   99          0      0   13           7              3       15    61      0
  2. A .geojson file representing zones that contains values matching the zone IDs in the OD data (the field containing zone IDs is specified with --zone-name-key=InterZone by default):
head -6 data/zones.geojson
## {
## "type": "FeatureCollection",
## "name": "zones_min",
## "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
## "features": [
## { "type": "Feature", "properties": { "InterZone": "S02001616", "Name": "Merchiston and Greenhill", "TotPop2011": 5018, "ResPop2011": 4730, "HHCnt2011": 2186, "StdAreaHa": 126.910911, "StdAreaKm2": 1.269109, "Shape_Leng": 9073.5402482000009, "Shape_Area": 1269109.10155 }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -3.2040366, 55.9333372 ], [ -3.2036354, 55.9321624 ], [ -3.2024036, 55.9321874 ], [ -3.2019838, 55.9315586 ], [ -3.2005071, 55.9317411 ], [ -3.199902, 55.931113 ], [ -3.2033504, 55.9308279 ], [ -3.2056319, 55.9309507 ], [ -3.2094979, 55.9308666 ], [ -3.2109753, 55.9299985 ], [ -3.2107073, 55.9285904 ], [ -3.2124928, 55.927854 ], [ -3.2125633, 55.9264661 ], [ -3.2094928, 55.9265616 ], [ -3.212929, 55.9260741 ], [ -3.2130774, 55.9264384 ], [ -3.2183973, 55.9252709 ], [ -3.2208941, 55.925282 ], [ -3.2242732, 55.9258683 ], [ -3.2279975, 55.9277452 ], [ -3.2269867, 55.928489 ], [ -3.2267625, 55.9299817 ], [ -3.2254561, 55.9307854 ], [ -3.224148, 55.9300725 ], [ -3.2197791, 55.9315472 ], [ -3.2222706, 55.9339127 ], [ -3.2224909, 55.934809 ], [ -3.2197844, 55.9354692 ], [ -3.2204535, 55.936195 ], [ -3.218362, 55.9368806 ], [ -3.2165749, 55.937069 ], [ -3.215582, 55.9380761 ], [ -3.2124132, 55.9355465 ], [ -3.212774, 55.9347972 ], [ -3.2119068, 55.9341947 ], [ -3.210138, 55.9349668 ], [ -3.208051, 55.9347716 ], [ -3.2083105, 55.9364224 ], [ -3.2053546, 55.9381495 ], [ -3.2046077, 55.9395298 ], [ -3.20356, 55.9380951 ], [ -3.2024323, 55.936318 ], [ -3.2029121, 55.935831 ], [ -3.204832, 55.9357555 ], [ -3.2040366, 55.9333372 ] ] ] ] } },
  3. A .geojson file representing a transport network from which origin and destination points are sampled (optional: if --subpoints-path is not specified, random points within each zone are used instead):
head -6 data/road_network.geojson
## {
## "type": "FeatureCollection",
## "name": "road_network_min",
## "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
## "features": [
## { "type": "Feature", "properties": { "osm_id": "3468", "name": "Albyn Place", "highway": "tertiary", "waterway": null, "aerialway": null, "barrier": null, "man_made": null, "access": null, "bicycle": null, "service": null, "z_order": 4, "other_tags": "\"lit\"=>\"yes\",\"lanes\"=>\"3\",\"maxspeed\"=>\"20 mph\",\"sidewalk\"=>\"both\",\"lanes:forward\"=>\"2\",\"lanes:backward\"=>\"1\"" }, "geometry": { "type": "LineString", "coordinates": [ [ -3.207438, 55.9533584 ], [ -3.2065953, 55.9535098 ] ] } },
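Before running the tool, it can help to confirm that the zone IDs in the OD table actually appear in the zones file. The following is a minimal sketch in Python using only the standard library. The function name `unmatched_zone_ids` and the tiny inline inputs are illustrative assumptions; only the default column and property names (geo_code1, geo_code2, InterZone) come from the text above.

```python
import csv
import io
import json


def unmatched_zone_ids(od_csv_text, zones_geojson_text,
                       origin_key="geo_code1",
                       destination_key="geo_code2",
                       zone_name_key="InterZone"):
    """Return the zone IDs used in the OD table that are missing from the zones file."""
    zones = json.loads(zones_geojson_text)
    known = {f["properties"][zone_name_key] for f in zones["features"]}
    used = set()
    for row in csv.DictReader(io.StringIO(od_csv_text)):
        used.add(row[origin_key])
        used.add(row[destination_key])
    return sorted(used - known)


# Tiny hypothetical inputs that mimic the structure of data/od.csv and data/zones.geojson
od_text = "geo_code1,geo_code2,all\nS02001616,S02001616,82\nS02001616,S02001620,188\n"
zones_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"InterZone": "S02001616"}, "geometry": None},
    ],
})
missing = unmatched_zone_ids(od_text, zones_text)
print(missing)
```

An empty list means every origin and destination ID has a matching zone. With real files, the two text arguments would be read from disk first.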

The jitter command requires you to set, via --max-per-od, the maximum number of trips that any single desire line in the jittered result may represent. A value of 1 will create a line for every trip in the dataset. A value above the maximum number of trips in the ‘all’ column of the OD data will result in a jittered dataset that has the same number of desire lines (the geographic representation of OD pairs) as the input (49 in this case, one per row of data/od.csv).
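To make the effect of the threshold concrete, here is a rough Python sketch of how one OD row might be split. It assumes the row is divided into the smallest number of equal parts that each stay at or below the maximum. The help text only states that the parts sum to the original value and that each obeys the maximum, so the equal split and the function name `split_od_row` are assumptions, not a description of the Rust code.

```python
import math


def split_od_row(total_trips, max_per_od):
    """Split one OD row into equal parts so that no part exceeds max_per_od."""
    if max_per_od <= 0:
        raise ValueError("max_per_od must be positive")
    # Number of output rows needed so each one stays at or below the maximum
    n_rows = max(1, math.ceil(total_trips / max_per_od))
    return [total_trips / n_rows] * n_rows


# Values taken from the 'all' column of the example OD table
for total in (82, 188):
    parts = split_od_row(total, 50)
    print(total, len(parts), parts[0])
```

Under this assumption, a row with fewer trips than the threshold is returned as a single unchanged part.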

With reference to the test data in this repo, you can run the jitter command line tool as follows:

odjitter --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
  --subpoints-path data/road_network.geojson \
  --max-per-od 50 --output-path output_max50.geojson
## Scraped 7 zones from data/zones.geojson
## Scraped 5073 subpoints from data/road_network.geojson
## Disaggregating OD data
## Wrote output_max50.geojson

Try running it with a different max-per-od value (10 in the command below):

odjitter --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
  --subpoints-path data/road_network.geojson \
  --max-per-od 10 --output-path output_max10.geojson
## Scraped 7 zones from data/zones.geojson
## Scraped 5073 subpoints from data/road_network.geojson
## Disaggregating OD data
## Wrote output_max10.geojson

Outputs

The figure below shows the output of the jitter commands above. The left image shows unjittered results, with origins and destinations going to zone centroids (as in many if not most visualisations of desire lines between zones); the central image shows the result of setting the max-per-od argument to 50; and the right-hand image shows the result of setting max-per-od to 10.

Note: odjitter uses a random number generator to sample points, so the output will change each time you run it, unless you set the rng-seed, as documented in the next section.
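The role of the seed can be illustrated with a small Python sketch. It draws random points inside a zone by rejection sampling from the zone's bounding box, using a seeded generator. This is a generic technique chosen for illustration, not necessarily how odjitter samples points; the function names and the triangular zone are hypothetical.

```python
import random


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring of (x, y) pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(ring, n, seed=None):
    """Draw n random points inside the ring by rejection sampling from its bounding box."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_ring(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical triangular zone in lon/lat-like coordinates (first point repeated to close it)
zone = [(-3.21, 55.93), (-3.20, 55.93), (-3.205, 55.94), (-3.21, 55.93)]
a = sample_points(zone, 5, seed=42)
b = sample_points(zone, 5, seed=42)
print(a == b)
```

Two calls with the same seed give identical points, whereas leaving the seed unset gives different points on each run.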

Details

For full details on odjitter’s arguments, run odjitter --help, which gives the following output:

odjitter --help
## odjitter 0.1.0
## Dustin Carlino <[email protected]>
## Disaggregate origin/destination data from zones to points
## 
## USAGE:
##     odjitter [OPTIONS] --od-csv-path <OD_CSV_PATH> --zones-path <ZONES_PATH> --output-path <OUTPUT_PATH> --max-per-od <MAX_PER_OD>
## 
## OPTIONS:
##         --all-key <ALL_KEY>
##             Which column in the OD row specifies the total number of trips to disaggregate?
##             [default: all]
## 
##         --destination-key <DESTINATION_KEY>
##             Which column in the OD row specifies the zone where trips ends? [default: geo_code2]
## 
##     -h, --help
##             Print help information
## 
##         --max-per-od <MAX_PER_OD>
##             What's the maximum number of trips per output OD row that's allowed? If an input OD row
##             contains less than this, it will appear in the output without transformation. Otherwise,
##             the input row is repeated until the sum matches the original value, but each output row
##             obeys this maximum
## 
##         --min-distance-meters <MIN_DISTANCE_METERS>
##             Guarantee that jittered points are at least this distance apart [default: 1.0]
## 
##         --od-csv-path <OD_CSV_PATH>
##             The path to a CSV file with aggregated origin/destination data
## 
##         --origin-key <ORIGIN_KEY>
##             Which column in the OD row specifies the zone where trips originate? [default:
##             geo_code1]
## 
##         --output-path <OUTPUT_PATH>
##             The path to a GeoJSON file where the disaggregated output will be written
## 
##         --rng-seed <RNG_SEED>
##             By default, the output will be different every time the tool is run, based on a
##             different random number generator seed. Specify this to get deterministic behavior,
##             given the same input
## 
##         --subpoints-path <SUBPOINTS_PATH>
##             The path to a GeoJSON file with subpoints to sample from. If this isn't specified,
##             random points within each zone will be used instead
## 
##     -V, --version
##             Print version information
## 
##         --zone-name-key <ZONE_NAME_KEY>
##             In the zones GeoJSON file, which property is the name of a zone [default: InterZone]
## 
##         --zones-path <ZONES_PATH>
##             The path to a GeoJSON file with named zones
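When calling the tool from a script, it may be convenient to assemble the arguments programmatically. The Python sketch below builds an argument list from the flags shown in the help output above (version 0.1.0; flag names may differ in other versions). The helper name `build_odjitter_command` is hypothetical, and the sketch only constructs the command rather than running it.

```python
def build_odjitter_command(od_csv_path, zones_path, output_path, max_per_od,
                           subpoints_path=None, rng_seed=None):
    """Assemble the argument list for the odjitter CLI using the flags listed by --help."""
    cmd = [
        "odjitter",
        "--od-csv-path", od_csv_path,
        "--zones-path", zones_path,
        "--output-path", output_path,
        "--max-per-od", str(max_per_od),
    ]
    # Optional flags are only added when a value is supplied
    if subpoints_path is not None:
        cmd += ["--subpoints-path", subpoints_path]
    if rng_seed is not None:
        cmd += ["--rng-seed", str(rng_seed)]
    return cmd


cmd = build_odjitter_command(
    "data/od.csv", "data/zones.geojson", "output_max50.geojson", 50,
    subpoints_path="data/road_network.geojson", rng_seed=42,
)
print(" ".join(cmd))
# To actually run it (requires odjitter on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.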

References

Lovelace, Robin, Rosa Félix, and Dustin Carlino. Under review. Jittering: A Computationally Efficient Method for Generating Realistic Route Networks from Origin-Destination Data. TBC.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York, NY: Springer.

Comments
  • Add new `subpoint_origins` and `subpoint_destinations` optional arguments

    In many OD datasets the locations of destinations (e.g. work places, shops, schools) are different from the locations of the origins (e.g. residential buildings). Some destinations attract more trips than others, so weighting values are probably also needed.

    Based on input data in #8, I imagine this could work something like this:

    odjitter --od-csv-path data/od_school.csv \
      --zones-path data/zones.geojson \
      --subpoints_destinations data/schools.geojson \
      --weight_key_destinations weight \
      --all-key car \
      --max-per-od 10 --output-path output_to_schools_max_10.geojson
    

    A naive approach that I think should at least provide an output (but which errors when I try it) is as follows:

    odjitter --od-csv-path data/od_school.csv \
      --zones-path data/zones.geojson \
      --subpoints-path data/schools.geojson \
      --max-per-od 50 --output-path output_max50.geojson
    

    Illustration of what the output could look like (with --max-per-od 1000 in this case):

    image

    opened by Robinlovelace 17
  • R odjitter in Windows

    Hi,

    In Win 10, there is apparently an error reading/writing the temporary od_jittered.geojson file.

    library(od) #just to get a small data input
    library(odjitter)
    #> 
    #> Attaching package: 'odjitter'
    #> The following object is masked from 'package:base':
    #> 
    #>     jitter
    
    od_all_jittered = odjitter::jitter(
      od = od_data_df,
      zones = od_data_zones_min,
      subpoints = sf::st_sample(od_data_zones_min, 200)
    )
    #> Warning in system(msg): 'odjitter' not found
    #> Error: Cannot open "C:\Users\UTILIZ~1\AppData\Local\Temp\RtmpeWTALa/od_jittered.geojson"; The file doesn't seem to exist.
    

    Created on 2022-03-25 by the reprex package (v2.0.1)

    The error is persistent between runs/data input.

    Error: Cannot open "C:\Users\UTILIZ~1\AppData\Local\Temp\RtmpC08uIU/od_jittered.geojson"; The file doesn't seem to exist.

    opened by temospena 13
  • Update README

    This should probably be seen in the context of a broader meta-issue on documentation, but I am opening this after the changes in #11. My plan is to:

    • [x] Switch from .Rmd source to .qmd for source to reduce dependencies
    • [x] Tidy-up (no leftover files)
    • [x] Explain and demonstrate with a reproducible example the use of new subpoints arguments using the schools dataset

    Can work on this later today but happy to hold horses if other features, e.g. addition of weight_key_origins and weight_key_destinations arguments, are in the pipeline. Note to self: most recent version of the pkg has the following arguments:

    odjitter --help
    odjitter 0.1.0
    Dustin Carlino <[email protected]>
    Disaggregate origin/destination data from zones to points
    
    USAGE:
        odjitter [OPTIONS] --od-csv-path <OD_CSV_PATH> --zones-path <ZONES_PATH> --output-path <OUTPUT_PATH> --disaggregation-threshold <DISAGGREGATION_THRESHOLD>
    
    OPTIONS:
            --destination-key <DESTINATION_KEY>
                Which column in the OD row specifies the zone where trips ends? [default: geo_code2]
    
            --disaggregation-key <DISAGGREGATION_KEY>
                Which column in the OD row specifies the total number of trips to disaggregate?
                [default: all]
    
            --disaggregation-threshold <DISAGGREGATION_THRESHOLD>
                What's the maximum number of trips per output OD row that's allowed? If an input OD row
                contains less than this, it will appear in the output without transformation. Otherwise,
                the input row is repeated until the sum matches the original value, but each output row
                obeys this maximum
    
        -h, --help
                Print help information
    
            --min-distance-meters <MIN_DISTANCE_METERS>
                Guarantee that jittered points are at least this distance apart [default: 1.0]
    
            --od-csv-path <OD_CSV_PATH>
                The path to a CSV file with aggregated origin/destination data
    
            --origin-key <ORIGIN_KEY>
                Which column in the OD row specifies the zone where trips originate? [default:
                geo_code1]
    
            --output-path <OUTPUT_PATH>
                The path to a GeoJSON file where the disaggregated output will be written
    
            --rng-seed <RNG_SEED>
                By default, the output will be different every time the tool is run, based on a
                different random number generator seed. Specify this to get deterministic behavior,
                given the same input
    
            --subpoints-destinations-path <SUBPOINTS_DESTINATIONS_PATH>
                The path to a GeoJSON file to use for sampling subpoints for destination zones. If this
                isn't specified, random points within each zone will be used instead
    
            --subpoints-origins-path <SUBPOINTS_ORIGINS_PATH>
                The path to a GeoJSON file to use for sampling subpoints for origin zones. If this isn't
                specified, random points within each zone will be used instead
    
        -V, --version
                Print version information
    
            --zone-name-key <ZONE_NAME_KEY>
                In the zones GeoJSON file, which property is the name of a zone [default: InterZone]
    
            --zones-path <ZONES_PATH>
                The path to a GeoJSON file with named zones
    
    opened by Robinlovelace 9
  • Command to fully disaggregate

    cargo run -- disaggregate --od-csv-path data/od.csv --zones-path data/zones.geojson --output-path output_individual.geojson

    Output looks like this:

    {"geometry":{"coordinates":[[-3.203857147334649,55.95213138764797],[-3.222935941651701,55.95172951209746]],"type":"LineString"},"properties":{"mode":"bus"},"type":"Feature"},
    {"geometry":{"coordinates":[[-3.221598670781587,55.951891527310494],[-3.2243560653816594,55.947639333117095]],"type":"LineString"},"properties":{"mode":"bus"},"type":"Feature"},
    {"geometry":{"coordinates":[[-3.2315398898250978,55.94935689855381],[-3.22343974294507,55.9478142626001]],"type":"LineString"},"properties":{"mode":"bus"},"type":"Feature"},
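One quick way to inspect output like this is to count features per mode. The Python sketch below assumes the output file is a GeoJSON FeatureCollection whose features carry a `mode` property, as in the lines above. The function name and the shortened two-feature sample (including the `foot` mode) are hypothetical.

```python
import json
from collections import Counter


def count_trips_by_mode(geojson_text):
    """Count features per value of the 'mode' property in a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    return Counter(f["properties"]["mode"] for f in collection["features"])


# Hypothetical two-feature file shaped like the output above (coordinates shortened)
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"mode": "bus"},
         "geometry": {"type": "LineString",
                      "coordinates": [[-3.2039, 55.9521], [-3.2229, 55.9517]]}},
        {"type": "Feature", "properties": {"mode": "foot"},
         "geometry": {"type": "LineString",
                      "coordinates": [[-3.2216, 55.9519], [-3.2244, 55.9476]]}},
    ],
})
counts = count_trips_by_mode(sample)
print(counts)
```

Comparing these counts with the column totals of the input CSV would be one way to check that no trips were lost.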
    
    opened by dabreegster 5
  • Support weighted subpoints #7

    Adds flags --weight-key-destinations and --weight-key-origins. I haven't tested this manually with the schools dataset or figured out how to sanely unit test it (fix the RNG seed and plug in a very high weight for one school and tiny for the others?). So up to you if we proceed, or if you have some idea for validation
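The validation idea mentioned above (fix the seed, give one school a very high weight and the others a tiny one) can be sketched in Python as follows. This illustrates only the statistical check, using `random.choices` from the standard library, and does not exercise odjitter itself. The school names, weights, and function name are hypothetical.

```python
import random
from collections import Counter


def sample_weighted(names, weights, n, seed):
    """Draw n names with probability proportional to weight, using a fixed seed."""
    rng = random.Random(seed)
    return rng.choices(names, weights=weights, k=n)


# Hypothetical schools: one very high weight, two tiny ones
schools = ["school_a", "school_b", "school_c"]
weights = [1000.0, 1.0, 1.0]
draws = sample_weighted(schools, weights, n=10_000, seed=42)
counts = Counter(draws)
print(counts.most_common())
```

With these weights nearly all draws should land on the heavily weighted school, and a fixed seed makes the exact counts repeatable, which is the property a unit test could rely on.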

    opened by dabreegster 5
  • `odjitter` crashing with big OD data and `--max-per-od 1`

    Hi, @dabreegster .

    I am trying to use odjitter with a subset of the São Paulo OD data and it is crashing when I set --max-per-od 1. It works fine when I try with --max-per-od 100 and --max-per-od 10. My PC freezes in the process, so it is probably a RAM usage related problem -- I have a core i5 6th gen with 8GB running Ubuntu 20.04.3 LTS.

    Here is a reproducible example (using R):

    piggyback::pb_download(file = "zones_sp_center.geojson", 
                           repo = "spstreets/OD2017"
                           )
    
    piggyback::pb_download(file = "od_sp_center.csv",
                           repo = "spstreets/OD2017"
                           )
    
    system("odjitter --od-csv-path ./od_sp_center.csv --zones-path ./zones_sp_center.geojson --max-per-od 1 --output-path result.geojson")
    
    # Scraped 114 zones from ./zones_sp_center.geojson
    # Disaggregating OD data
    # Killed
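A rough estimate of the output size for a given threshold may help when choosing --max-per-od for a large dataset. The Python sketch below assumes each input row becomes ceil(trips / max_per_od) output rows, which is an assumption about the splitting rule and not a diagnosis of the crash reported here. The function name and the three-row table are hypothetical.

```python
import csv
import io
import math


def estimate_output_rows(od_csv_text, max_per_od, all_key="all"):
    """Estimate how many desire lines a given max-per-od value would produce."""
    total = 0
    for row in csv.DictReader(io.StringIO(od_csv_text)):
        trips = float(row[all_key])
        # Assumes each input row becomes ceil(trips / max_per_od) output rows, at least one
        total += max(1, math.ceil(trips / max_per_od))
    return total


# Hypothetical three-row OD table
od_text = "geo_code1,geo_code2,all\nA,A,82\nA,B,188\nA,C,99\n"
for threshold in (100, 10, 1):
    print(threshold, estimate_output_rows(od_text, threshold))
```

With a threshold of 1 the estimate equals the total number of trips, which for a metropolitan OD dataset can be a very large number of output features.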
    
    opened by lucasccdias 3
  • Document jittering of OD data in which origins and destinations are different

    Sometimes origin zones are different from destinations. odjitter will still work in this case, but pre-processing is needed.

    • [ ] Description of the data
    • [ ] Example input dataset
    • [ ] Reproducible example
    • [ ] Tests?
    opened by Robinlovelace 2
  • Port R interface into this repo

    Currently we have a basic but seemingly effective and tested (by me) R interface based on system calls: https://github.com/atumworld/odrust

    At some point it would be good to switch from using system calls to using the rextendr interface framework, a separate issue #22

    This issue aims at cohesion, so that all odjitter code, in any language, is in one easy to find place.

    I'm happy to do the port and first thought is to put the R bindings/package into a new subfolder called simply r/, mirroring the approach in https://github.com/apache/arrow/tree/master/r and with a view to there being at some point Python bindings in a py/ subfolder, linking to #23.

    opened by Robinlovelace 1
  • Separate origin/destination subpoints

    This splits the --subpoints-path flag into separate --subpoints-origins-path and --subpoints-destinations-path flags. If either one isn't specified, the tool falls back to picking random points instead.

    No support for weighted subpoints yet; I'll do that separately. There was some other cleanup to do first, so this PR is already big

    opened by dabreegster 1
  • Usage from R on PowerShell

    Just tested on a new installation on Windows and it fails:

    > library(odjitter)
    
    Attaching package: ‘odjitter’
    
    The following object is masked from ‘package:base’:
    
        jitter
    
    > #> 
    > #> Attaching package: 'odjitter'
    > #> The following object is masked from 'package:base':
    > #> 
    > #>     jitter
    > od = readr::read_csv("https://github.com/dabreegster/odjitter/raw/main/data/od.csv")
    Rows: 49 Columns: 11                                                   
    ── Column specification ─────────────────────────────────────────────────
    Delimiter: ","
    chr (2): geo_code1, geo_code2
    dbl (9): all, from_home, train, bus, car_driver, car_passenger, bicyc...
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    > #> Rows: 49 Columns: 11
    > #> ── Column specification ───────────────────────────────────────────────────────────
    > #> Delimiter: ","
    > #> chr (2): geo_code1, geo_code2
    > #> dbl (9): all, from_home, train, bus, car_driver, car_passenger, bicycle, foo...
    > #> 
    > #> ℹ Use `spec()` to retrieve the full column specification for this data.
    > #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    > zones = sf::read_sf("https://github.com/dabreegster/odjitter/raw/main/data/zones.geojson")
    > names(zones)[1] = "geo_code"
    > road_network = sf::read_sf("https://github.com/dabreegster/odjitter/raw/main/data/road_network.geojson")
    > od_unjittered = od::od_to_sf(od, zones)
    0 origins with no match in zone ids
    0 destinations with no match in zone ids
     points not in od data removed.
    > #> 0 origins with no match in zone ids
    > #> 0 destinations with no match in zone ids
    > #>  points not in od data removed.
    > set.seed(42) # for reproducibility
    > od_jittered = jitter(od, zones, subpoints = road_network)
    Error in system("odjitter --help", intern = TRUE) : 'odjitter' not found
    

    Solution: something like this:

    system(r"(powershell C:\Users\geoevid\.cargo\bin\odjitter.exe)")
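The same idea, finding the binary by its full path when it is not on the PATH, can be sketched in Python. The sketch relies on the fact that cargo install places binaries in ~/.cargo/bin by default. The function name `find_odjitter` and its `extra_dirs` parameter are hypothetical, and this is not part of the R package.

```python
import shutil
from pathlib import Path


def find_odjitter(extra_dirs=None):
    """Return the path to the odjitter binary, or None if it cannot be found."""
    found = shutil.which("odjitter")
    if found:
        return found
    # cargo install places binaries in ~/.cargo/bin by default
    candidates = [Path.home() / ".cargo" / "bin"] + [Path(d) for d in (extra_dirs or [])]
    for directory in candidates:
        for name in ("odjitter", "odjitter.exe"):
            path = directory / name
            if path.is_file():
                return str(path)
    return None


location = find_odjitter()
print(location if location else "odjitter not found; is ~/.cargo/bin on the PATH?")
```

A wrapper that resolves the binary this way could report a clearer error than "'odjitter' not found" when the installation directory is missing from the PATH.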
    
    opened by eugenividal 0
  • Rename arguments?

    Currently there are two potentially misleading argument names:

    ## OPTIONS:
    ##         --all-key <ALL_KEY>
    ##             Which column in the OD row specifies the total number of trips to disaggregate?
    ##             [default: all]
    ## 
    ...
    ## 
    ##         --max-per-od <MAX_PER_OD>
    ##             What's the maximum number of trips per output OD row that's allowed? If an input OD row
    ##             contains less than this, it will appear in the output without transformation. Otherwise,
    ##             the input row is repeated until the sum matches the original value, but each output row
    ##             obeys this maximum
    

    Misleading because it's too specific. Really the first is the name of the column used to determine if an OD pair should be disaggregated and into how many 'sub-OD pairs'. The second can be described simply as the disaggregation threshold. I'm not precious about this and open minded to other options including keeping it as is. Plan to put in a PR as the basis for informed conversation on this.

    opened by Robinlovelace 0
  • Example code fails

    Hi. The code given in the vignette is not working for me (running R 4.2.0, RStudio 2022.07.1).

    > library(odjitter)
    Attaching package: ‘odjitter’
    The following object is masked from ‘package:base’:
        jitter
    > od <- readr::read_csv("https://github.com/dabreegster/odjitter/raw/main/data/od.csv")
    Rows: 49 Columns: 11                                                                                                                     
    ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    Delimiter: ","
    chr (2): geo_code1, geo_code2
    dbl (9): all, from_home, train, bus, car_driver, car_passenger, bicycle, foot, other
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    > zones = sf::read_sf("https://github.com/dabreegster/odjitter/raw/main/data/zones.geojson")
    > names(zones)[1] = "geo_code"
    > road_network = sf::read_sf("https://github.com/dabreegster/odjitter/raw/main/data/road_network.geojson")
    > od_unjittered = od::od_to_sf(od, zones)
    0 origins with no match in zone ids
    0 destinations with no match in zone ids
     points not in od data removed.
    > set.seed(42) # for reproducibility
    > od_jittered <- jitter(od, zones, subpoints = road_network)
    Error in system(paste0(odjitter_location, " --help"), intern = TRUE) : 
      'odjitter' not found
    

    Thanks for any help.

    opened by blackburnstat 1
  • Jittering fails when input zones.geojson file contains mixed geometry types

    This was the cause of errors that were driving me insane. No fix is needed on the Rust side: it's an edge case that can be seen as an issue with dodgy input data on my side. Still, I wanted to document it here while it's fresh in my head.

    Input zone object that failed looked broadly like this:

    Simple feature collection with 14 features and 2 fields
    Geometry type: GEOMETRY
    Dimension:     XY
    Bounding box:  xmin: -6.918001 ymin: 53.00297 xmax: -6.71583 ymax: 53.27975
    Geodetic CRS:  WGS 84
    # A tibble: 14 × 3
       geo_code  social                                                                                             geometry
     * <chr>      <dbl>                                                                                       <GEOMETRY [°]>
     1 o06065     1118. MULTIPOLYGON (((-6.71583 53.16379, -6.716134 53.16175, -6.728203 53.15357, -6.731625 53.15206, -6...
     2 o06029      337. MULTIPOLYGON (((-6.828214 53.01617, -6.834301 53.0142, -6.846945 53.0165, -6.851927 53.02025, -6....
     3 o06075      865. MULTIPOLYGON (((-6.82495 53.27751, -6.821514 53.27175, -6.82003 53.27001, -6.821732 53.2688, -6.8...
     4 d35402922     1  POLYGON ((-6.747552 53.1533, -6.747565 53.15278, -6.747606 53.15226, -6.747675 53.15174, -6.74777...
    

    Note that the MULTIPOLYGON objects suddenly switch to POLYGON objects. Attached is a .zip of a reproducible example that worked after simply changing the geometry type: test-data.zip
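One way to make the geometry types consistent before jittering is to wrap every Polygon as a MultiPolygon. The Python sketch below does this on plain GeoJSON with the standard library. The direction of the cast (Polygon to MultiPolygon) is an assumption, since the report only says the geometry type was changed. The function name and sample zones are hypothetical.

```python
import json


def promote_to_multipolygon(geojson_text):
    """Return GeoJSON text in which every Polygon geometry is wrapped as a MultiPolygon."""
    collection = json.loads(geojson_text)
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            geometry["type"] = "MultiPolygon"
            # A MultiPolygon is a list of Polygon coordinate arrays
            geometry["coordinates"] = [geometry["coordinates"]]
    return json.dumps(collection)


# Hypothetical zones file mixing the two geometry types
mixed = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"geo_code": "z1"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"geo_code": "z2"},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[2, 2], [3, 2], [3, 3], [2, 2]]]]}},
    ],
})
fixed = json.loads(promote_to_multipolygon(mixed))
print({f["geometry"]["type"] for f in fixed["features"]})
```

In R, a similar result could presumably be obtained with sf::st_cast(), although that is not shown in the report.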

    opened by Robinlovelace 2
  • Python interface

    We already have a simple R interface, one that relies on system calls rather than the more sophisticated 'rextendr' approach #22: https://github.com/atumworld/odrust

    It would be good to have a Python interface. Any Python developers out there very welcome to help out with this!

    help wanted 
    opened by Robinlovelace 3
  • Consider an rextendr interface

    @Robinlovelace, I want to understand the current friction of R calling odjitter by command-line. https://github.com/atumworld/odrust/blob/main/R/odr_jitter.R is how it works today, correct?

    Is the problem...

    1. Having to compile the Rust tool on a target system? (#6 solves if so)
    2. Packaging for CRAN and having a dependency on any extra binary tool?
    3. Slow to write input CSV or zone geojson?
    4. Slow to read the output geojson? (We can look at geopackage, flatgeobuf, etc if so)

    or something else?

    opened by dabreegster 2
  • Consider adding departure time

    If the CSV input has something like a departure_seconds column, the jittered output could have this too. For each output row, the departure time would be jittered somehow -- maybe a uniform or normal distribution centered around the input time? We would need extra config and flags (with default values) to specify all of this.

    @lucasccdias, is my understanding correct? How specifically would you want to jitter departure_seconds?

    I'm hesitant to add this feature, because I'm not convinced that it will be easy for the user to learn and specify a bunch of extra command-line flags to say how they want to transform time. Instead, why couldn't they add departure time on their end? In other words:

    1. Write the desire line CSV file
    2. Call odjitter on it
    3. Read the output GeoJSON file and add a departure time property, using whatever logic they want
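Step 3 could look something like the following Python sketch, which adds a departure time to each output feature by drawing from a uniform distribution centred on a given time. The property name `departure_seconds` follows the wording above. The function name, its parameters, and the two-feature sample are hypothetical, and a normal distribution could be swapped in just as easily.

```python
import json
import random


def add_departure_times(geojson_text, center_seconds, spread_seconds, seed=None):
    """Add a jittered 'departure_seconds' property to every feature."""
    rng = random.Random(seed)
    collection = json.loads(geojson_text)
    for feature in collection["features"]:
        offset = rng.uniform(-spread_seconds, spread_seconds)
        # Clamp to the range of one day
        departure = min(max(center_seconds + offset, 0), 24 * 3600 - 1)
        feature["properties"]["departure_seconds"] = round(departure)
    return collection


# Hypothetical jittered output with two desire lines
jittered = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"all": 10},
         "geometry": {"type": "LineString", "coordinates": [[-3.20, 55.95], [-3.22, 55.95]]}},
        {"type": "Feature", "properties": {"all": 7},
         "geometry": {"type": "LineString", "coordinates": [[-3.22, 55.95], [-3.22, 55.94]]}},
    ],
})
# Centre departures on 08:00 with up to 15 minutes of jitter either side
timed = add_departure_times(jittered, center_seconds=8 * 3600, spread_seconds=900, seed=1)
print([f["properties"]["departure_seconds"] for f in timed["features"]])
```

Keeping this logic outside the tool means the user chooses the distribution and parameters without any extra command-line flags.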

    One question is how departure time is determined. Does any of the input desire line data have something like this? How is it specified -- maybe just the hour range that a bunch of trips go from zone1 to zone2? If so, maybe what we should instead do is make it easy to match up the jittered GeoJSON output with the original input, and have some kind of lookup key, or just copy over the departure time property, and do the jittering on that elsewhere.

    And backing up a little more, I think the motivation for this feature request was to generate A/B Street scenarios, either with abstr or not. If so, it could be helpful to understand how we want to do that. Part of odjitter input is weighted subpoints, and there's more in-progress code within A/B Street to generate these weights for the exact purpose of creating scenarios. If our ultimate aim is to create scenarios from raw desire line data, we have a spectrum of options how to do it -- some of them using odjitter, some of them directly calling this other pipeline.

    opened by dabreegster 2
  • Sanity checking of weighted results

    There are some statistical sanity checks in #7 but I've just done some more sanity checks and the results are good. Summary of them below. Three schools in one zone:

    image

    With weights, how many trips have destinations at each?

    Strong near-linear positive relationship between n_trips and weight:

    image

    opened by Robinlovelace 4
Owner
Dustin Carlino
Speculative cartographer