OpenStreetMap flatdata format and compiler

Overview

osmflat

berlin-features

Flat OpenStreetMap (OSM) data format providing efficient random data access through memory-mapped files.

The data format is described and implemented in flatdata. The schema describes the fundamental OSM data structures: nodes, ways, relations and tags as simple non-nested data structures. The relations between these are expressed through indexes.
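
To illustrate what "non-nested data structures related through indexes" can look like, here is a minimal sketch in plain Rust. The types `Node`, `Way` and `Archive` and the field `first_ref` are hypothetical simplifications made up for this illustration. They are not the generated osmflat API, and the real schema may be laid out differently.

```rust
/// Hypothetical, simplified node: fixed size, no nested data.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Node {
    lat: i32,
    lon: i32,
}

/// Hypothetical way: it stores only the position of its first entry in
/// `Archive::node_refs`; the next way's `first_ref` marks where it ends.
#[derive(Debug, Clone, Copy)]
struct Way {
    first_ref: usize,
}

/// Everything lives in flat vectors; relations are plain indexes.
struct Archive {
    nodes: Vec<Node>,
    ways: Vec<Way>,        // the last element is a sentinel closing the final range
    node_refs: Vec<usize>, // indexes into `nodes`
}

impl Archive {
    /// Resolves the nodes of way `way_idx` by following indexes.
    fn way_nodes(&self, way_idx: usize) -> Vec<Node> {
        let start = self.ways[way_idx].first_ref;
        let end = self.ways[way_idx + 1].first_ref;
        self.node_refs[start..end]
            .iter()
            .map(|&node_idx| self.nodes[node_idx])
            .collect()
    }
}

/// Builds a tiny archive with three nodes and two ways that share node 1.
fn sample_archive() -> Archive {
    Archive {
        nodes: vec![
            Node { lat: 10, lon: 20 },
            Node { lat: 11, lon: 21 },
            Node { lat: 12, lon: 22 },
        ],
        ways: vec![
            Way { first_ref: 0 },
            Way { first_ref: 2 },
            Way { first_ref: 4 }, // sentinel
        ],
        node_refs: vec![0, 1, 1, 2],
    }
}
```

Because every structure has a fixed size and refers to others only by position, such vectors can be read in place from a memory-mapped file without a parsing step.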

Compiler

Besides the library for working with osmflat archives, the osmflatc crate contains a compiler that converts OSM data from the pbf format to osmflat.

To compile OSM data from pbf to osmflat use:

cargo run --release -- input.osm.pbf output.osm.flatdata

The output is a flatdata archive, which is a directory consisting of several files. The schema is also part of the archive and is checked every time the archive is opened. This guarantees that the compiler used to produce the archive matches the schema used for reading it. The archive data is not compressed.

Using data

You can use any language supported by flatdata for reading an osmflat archive. For reading the data in Rust, we provide the osmflat crate.

First, add this to your Cargo.toml:

[dependencies]
osmflat = "0.1.0"

Now, you can open an osmflat archive as any other flatdata archive and read its data:

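use osmflat::{FileResourceStorage, Osm};

fn main() {
    // An osmflat archive is a directory; point the storage at it.
    let storage = FileResourceStorage::new("path/to/archive.osm.flatdata");
    // The schema stored in the archive is checked when the archive is opened.
    let archive = Osm::open(storage).unwrap();

    // Print every node in the archive.
    for node in archive.nodes().iter() {
        println!("{:?}", node);
    }
}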

Examples

Check the osmflat/examples directory. Feel free to add another example if you have an idea for what to do with the amazing OSM data in a few lines of code.

The above map was rendered by osmflat/examples/roads2png.rs in about 170 lines of code from the osmflat archive based on the latest Berlin OSM data.

License

The files src/proto/fileformat.proto and src/proto/osmformat.proto are copies from the OSM-binary project and are under the LGPLv3 license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this document by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • UnexpectedDataSize Error loading europe-latest

    I compiled europe-latest.osm.pbf as follows: osmflatc europe-latest.osm.pbf europe-latest.osm.flatdata

    When loading europe-latest.osm.flatdata I get the following error: Err value: UnexpectedDataSize

    Smaller data sets, e.g. bremen-latest, work fine for me.

    opened by RoffelKartoffel 7
  • Europe compilation is slow when there is not enough RAM

    When compiling Europe (tried on machines with 8GB and 16GB RAM), the node_id to node_index mapping (osm node id is mapped to position/index in nodes flatdata vector) grows over the memory limit and is swapped. Nodes compilation finishes, however ways compilation is stuck due to random access in mapping data.

    One possible solution would be to change the underlying mapping data structure such that it is accessed in a similar pattern to ways compilation.
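
The mapping discussed in this issue can be pictured with a small sketch. `SortedIdTable` is a hypothetical name, the code assumes that node ids arrive in strictly ascending order, and it is not the data structure osmflatc uses. It only illustrates the id to index lookup and does not by itself remove the memory pressure described above.

```rust
/// Hypothetical id table: node ids are kept in ascending order, so the
/// position of an id in the vector is its node index.
struct SortedIdTable {
    ids: Vec<u64>,
}

impl SortedIdTable {
    fn new() -> Self {
        SortedIdTable { ids: Vec::new() }
    }

    /// Appends an id and returns its index. Panics if ids are not strictly
    /// ascending, since that ordering is an assumption of this sketch.
    fn push(&mut self, id: u64) -> usize {
        assert!(
            self.ids.last().map_or(true, |&last| last < id),
            "ids must be strictly ascending"
        );
        self.ids.push(id);
        self.ids.len() - 1
    }

    /// Looks up the index of `id` with a binary search.
    fn get(&self, id: u64) -> Option<usize> {
        self.ids.binary_search(&id).ok()
    }
}
```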

    opened by boxdot 6
  • lat and lon only need to be 32 bits

    lat and lon in the schema are defined with 40 bits each, represented as signed 64 bit integers.

    https://github.com/boxdot/osmflat-rs/blob/5b3b6292cdad156b7b503c37981a4a1c348525c2/flatdata/osm.flatdata#L83-L85

    Though OSM PBFs provide the ability of a defined precision to a coordinate, the reality is that OSM data is always 32 bits for lat and lon. The database for the website / Editing API is the source of truth, and here you can see they are defined as 32 bit integers:

    https://github.com/openstreetmap/openstreetmap-website/blob/f407def8ba4bc55ea70807c33ff011472bdf8720/db/structure.sql#L444-L445

    Many fields have the bit width of 40. Why is this?
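
The 32 bit claim can be checked with a little arithmetic. The sketch below assumes coordinates are stored as integers in units of 1e-7 degrees; that scale factor is an assumption not stated in the issue text, and the function names are made up for illustration.

```rust
/// Scale factor assumed here: 1e-7 degrees per integer unit.
const UNITS_PER_DEGREE: f64 = 1e7;

/// Converts degrees to fixed-point units, or `None` if the result does not
/// fit into a signed 32 bit integer.
fn to_fixed(degrees: f64) -> Option<i32> {
    let scaled = (degrees * UNITS_PER_DEGREE).round();
    if scaled >= f64::from(i32::MIN) && scaled <= f64::from(i32::MAX) {
        Some(scaled as i32)
    } else {
        None
    }
}

/// Converts fixed-point units back to degrees.
fn to_degrees(fixed: i32) -> f64 {
    f64::from(fixed) / UNITS_PER_DEGREE
}
```

With this scale the extreme longitude of 180 degrees becomes 1,800,000,000, which is below the `i32` maximum of 2,147,483,647, so the whole valid coordinate range fits.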

    opened by hallahan 5
  • Make osmflat more compact (especially when compressed)

    • Store granularity explicitly instead of pre-multiplied numbers
    • Move Ids to separate optional sub-archive
    • Reduce bits needed for coordinates from 40 to 32
    • Remove unused header information
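
The effect of storing the granularity separately can be sketched as follows. The OSM PBF format decodes a coordinate as 1e-9 * (offset + granularity * raw) degrees; the sketch assumes zero offsets and a granularity of 100, and the function names are hypothetical.

```rust
/// Number of bits needed to store a positive `value` as a signed integer
/// (magnitude bits plus one sign bit).
fn signed_bits(value: i64) -> u32 {
    assert!(value > 0, "this sketch only handles positive values");
    (64 - value.leading_zeros()) + 1
}

/// Decodes a raw coordinate using an explicitly stored granularity.
/// Units follow the PBF convention of nanodegrees; offsets are assumed zero.
fn decode_degrees(raw: i32, granularity: i32) -> f64 {
    1e-9 * (i64::from(raw) * i64::from(granularity)) as f64
}
```

Under these assumptions a pre-multiplied value for 180 degrees is 180,000,000,000 nanodegrees, which needs 39 bits including the sign, while the raw value 1,800,000,000 needs 32. That is consistent with the reduction from 40 to 32 bits listed above.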

    Comparison:

    Comparing:

    • Compressed PBF (internal zlib compression)
    • Unpacked osmflat with and without optional Ids (only new version has them optional)
    • Compression with pzstd level 3
    • Compression with shuffly ( https://github.com/VeaaC/shuffly ) + pzstd level 3

    Before:

    | Dataset | PBF (zlib) | osmflat w Ids | zstd + osmflat w Ids | shuffly + zstd + osmflat w Ids |
    | ------------- | ------------- | ------------- | ------------- | ------------- |
    | Berlin | 70M | 227M | 94M | 58M |
    | Europe | 26G | 89G | 41G | 23G |
    | Planet | 66G | 223G | 102G | 57G |

    After:

    | Dataset | PBF (zlib) | osmflat w Ids | osmflat w/o Ids | zstd + osmflat w Ids | zstd + osmflat w/o Ids | shuffly + zstd + osmflat w Ids | shuffly + zstd + osmflat w/o Ids |
    | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
    | Berlin | 70M | 214M | 176M | 83M | 69M | 53M | 49M |
    | Europe | 26G | 84G | 68G | 36G | 30G | 20G | 19G |
    | Planet | 66G | 208G | 164G | 97G | 76G | 48G | 47G |

    Observations:

    • More compact in all scenarios.
    • Using shuffly is still worth it (Ids have a compression ratio of > factor 20), but a bit less so due to rearranged data, and granularity.
    • Shuffly compressed version is smaller than compressed PBF (with and without Ids)
    • Most people might want to use a version without ids since it saves disk space
    • Ids are almost for free if compressed with shuffly (due to data being sorted by id)
    opened by VeaaC 4
  • DNM: Example of sorting a flatdata vector.

    Disclaimer: I'm a Rust noob.

    I wanted to describe a little better what I'm looking to do. Here we are sorting a memmapped flatdata vector struct.

    I compute the index of a lat lon of a Node on the Hilbert curve. Sorting nodes by this will result in spatial locality on disk.

    If every OSM entity is sorted by their index on the Hilbert curve, you can easily build a spatial index on top of this.

    Also, it would be good to sort the strings array based on how often a given string is used. That way, you're going to have fewer cache misses.
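
As a rough illustration of the Hilbert curve idea, the sketch below uses the standard iterative conversion from grid coordinates to a position on the curve. The helper names `hilbert_key` and `sort_by_hilbert`, the grid resolution and the way lat/lon are quantized are assumptions made for this example, not code from the pull request.

```rust
/// Maps grid cell (x, y) to its distance along a Hilbert curve covering a
/// 2^order x 2^order grid. `order` must be between 1 and 31.
fn hilbert_index(order: u32, mut x: u32, mut y: u32) -> u64 {
    let n: u32 = 1 << order;
    let mut d: u64 = 0;
    let mut s = n / 2;
    while s > 0 {
        let rx = u32::from(x & s != 0);
        let ry = u32::from(y & s != 0);
        d += u64::from(s) * u64::from(s) * u64::from((3 * rx) ^ ry);
        // Rotate or flip the quadrant so the next level lines up.
        if ry == 0 {
            if rx == 1 {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::mem::swap(&mut x, &mut y);
        }
        s /= 2;
    }
    d
}

/// Hypothetical helper: quantizes lat/lon in degrees to a grid cell and
/// returns the Hilbert index of that cell.
fn hilbert_key(lat: f64, lon: f64, order: u32) -> u64 {
    let cells = f64::from(1u32 << order);
    let max_cell = (1u32 << order) - 1;
    let x = (((lon + 180.0) / 360.0 * cells) as u32).min(max_cell);
    let y = (((lat + 90.0) / 180.0 * cells) as u32).min(max_cell);
    hilbert_index(order, x, y)
}

/// Sorts (lat, lon) pairs by their position on the curve.
fn sort_by_hilbert(points: &mut [(f64, f64)], order: u32) {
    points.sort_by_key(|&(lat, lon)| hilbert_key(lat, lon, order));
}
```

Sorting an in-memory slice like this is the easy part; sorting a memory-mapped flatdata vector in place, as the pull request attempts, is a separate question.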

    opened by hallahan 1
  • Fix dead code warnings.

    The dead code analysis is more strict now. In particular, fields are considered unused, even if they are used in derived code, like Debug derive.

    Cf. https://github.com/rust-lang/rust/issues/88900

    opened by boxdot 1
  • INVALID_IDX is not handled properly

    • Docs do not state which fields can contain it
    • No example handles it
    • We most likely should add some helpers to work with it (e.g. get all nodes of a way, get all nodes of a way only if all exist, etc)
    • Bonus: add support to flatdata for proper optional types (so that this kind of thing could not be forgotten)
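
The helpers suggested in this list could look roughly like the sketch below. The sentinel value chosen for `INVALID_IDX` and the function names are assumptions made for illustration; the real constant is defined in the schema and may differ.

```rust
/// Sentinel marking a missing reference. The concrete value is an assumption
/// made for this sketch; check the schema for the real constant.
const INVALID_IDX: u64 = u64::MAX;

/// Converts a raw index into `Some(index)`, or `None` for the sentinel.
fn resolve(raw: u64) -> Option<u64> {
    if raw == INVALID_IDX {
        None
    } else {
        Some(raw)
    }
}

/// All node indexes of a way that exist, skipping missing ones.
fn existing_nodes(refs: &[u64]) -> Vec<u64> {
    refs.iter().filter_map(|&raw| resolve(raw)).collect()
}

/// All node indexes of a way, but only if none of them is missing.
fn complete_nodes(refs: &[u64]) -> Option<Vec<u64>> {
    refs.iter().map(|&raw| resolve(raw)).collect()
}
```

Returning `Option` from `resolve` forces callers to decide what to do with a missing reference, which is the kind of safety the last bullet asks for.
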
    opened by VeaaC 1
  • Port libosmium examples

    Very simple examples

    • [x] Read
    • [x] Count
    • [x] Debug
    • [ ] Tiles

    Still reasonably simple examples

    • [ ] amenity_list
    • [ ] read_with_progress
    • [ ] filter_discussions
    • [ ] convert
    • [x] pub_names
    • [x] road_length
    opened by boxdot 1
  • Fix 32 bit references

    The schema (and code) use a lot of 32 bit references. When compiling the world we get close to (or exceed) those limits (e.g. 5 billion nodes). It would be best to switch to something more future-proof, e.g. 40 bit references.
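
The limits mentioned here are easy to check. The helper names in this small sketch are hypothetical.

```rust
/// Largest index representable with `bits` bits (valid for 1..=63).
fn max_index(bits: u32) -> u64 {
    assert!((1..=63).contains(&bits));
    (1u64 << bits) - 1
}

/// Returns true if `index` can be stored in a reference of `bits` bits.
fn fits(index: u64, bits: u32) -> bool {
    index <= max_index(bits)
}
```

A 32 bit reference tops out at 4,294,967,295, so an index of 5 billion does not fit, whereas a 40 bit reference reaches 1,099,511,627,775.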

    opened by VeaaC 1
  • Implement serialization of infos

    Right now, we are not serializing infos in osmflat. This should be implemented. We should also add a flag to osmflatc to omit infos. For that, we need to make the infos resource optional.

    opened by boxdot 1
  • Port examples from osmpbfreader

    https://crates.io/crates/osmpbfreader has several small examples of how to use OSM data. We should port these examples. After that, we could use them to compare the performance of both implementations.

    opened by boxdot 1
  • Consume OSM entity attributes.

    The attributes you find with an OSM entity are invaluable for analysis. For example, keeping track of the OSM user is useful if you are trying to track down a malicious user. Or, you might want to look at the timestamp to filter OSM entities edited in a specific time period.

    <way id="479834982" visible="true" version="3" changeset="68316730" timestamp="2019-03-20T01:52:13Z" user="stevea" uid="123633">
    

    It might be nice to add all of these to the ID archive and call it attributes instead?

    opened by hallahan 2
  • Handle negative OSM IDs

    Many OSM tools, such as JOSM, assign negative IDs to OSM entities. This denotes that the given entity has not yet been merged into the main OSM dataset.

    The OSM PBF supports negative IDs, but IdTableBuilder only supports u64.

    Here is example data that crashes osmflatc. 4nodes.zip
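
One possible way to feed signed ids into a table keyed by `u64` is an order-preserving mapping that flips the sign bit. This is only a sketch of that general technique with hypothetical function names; it is not how osmflatc handles the problem, and whether it suits `IdTableBuilder` is an open question.

```rust
/// Maps a signed id to an unsigned key by flipping the sign bit.
/// The mapping preserves ordering: a < b implies id_to_key(a) < id_to_key(b).
fn id_to_key(id: i64) -> u64 {
    (id as u64) ^ (1u64 << 63)
}

/// Inverse of `id_to_key`.
fn key_to_id(key: u64) -> i64 {
    (key ^ (1u64 << 63)) as i64
}
```

Note that the keys for ordinary positive ids become very large, which matters if the table relies on ids being small or dense.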

    opened by hallahan 3
  • Redundant transmute in osmflat_generated

    Often there are accessors that transmute a type into itself, such as:

    https://github.com/boxdot/osmflat-rs/blob/011d70a90f784f8e60b1f4903e1367be454445d4/osmflat/src/osmflat_generated.rs#L494-L497

    Is there any reason why this is being done, or is it something odd with the code generator? In this instance, the id is already an i64. Why transmute?

    opened by hallahan 3
  • Sorting vectors

    Right now, it looks like we can only build a Vec by appending. It would be nice to be able to sort our vectors after they have been built. For example, I may want to compute a hilbert index for the nodes and sort them by that. That way, I can access the data with good locality.

    Is there a straightforward way to sort?

    opened by hallahan 9
  • metrics about file size

    What size does such a file have if it holds the data from a planet.pbf file? Are there any other metrics you can give to estimate performance or size?

    Thank you

    opened by snuup 5