libspatialindex
(and Rtree)

Fast and fun spatial indexing for bounding boxes

@howardbutler






libspatialindex

is a C++ library (C API) for spatially indexing kD bounding box data

Practically only 3 or 4, however

Features

  • R*-tree with quadratic and linear splitting strategies
  • Range, nearest, and parametric (bbox) queries
  • GoF-inspired codebase
  • Bulk loader and clustered storage
  • In-memory and on-disk serialization
  • MIT license as of 1.8.0

Example


    You don't want to see a C++ code example
    					

Notes

  • Items are not unique (by id or bbox)
  • Bulk loading pre-sorts your data -- 10x faster
  • Custom storage API
  • Linux, OS X, Windows (32 and 64 bit)
  • MSVC, g++, clang

Rtree

is a Python library for spatially indexing kD bounding box data (based on libspatialindex)

Features

  • libspatialindex's features. In Python.
  • clustered storage -- store pickles alongside your bounds
  • LGPL license

Bulk Loading vs. Single Insert

Inserting 30k random points (usec)

Use generator-based bulk loader!


def data_gen():
    for i, (minx, miny, maxx, maxy) in data:
        yield (i, (minx, miny, maxx, maxy), None)

idx = index.Rtree('bulk', data_gen())

# after most of the data is loaded, go add a few stragglers 
idx.add(12345, (minx, miny, maxx, maxy))
idx.add(42, (minx-1, miny-1, maxx+1, maxy+1))

                    					

Not


idx = index.Index('single')
for i in range(data):
    idx.add(i, data[i])

                    					

Intersection


>>> idx.insert(4321, (34.37, 26.73, 49.37, 41.73), obj=42)
>>> hits = idx.intersection((0, 0, 60, 60), objects=True)
>>> for i in hits:
...     if i.id == 4321:
...         i.object
...         i.bbox
42
[34.3776829412, 26.737585373400002, 49.3776829412, 41.737585373400002]
                    					

nearest and intersection only return candidates that fall within bounds, not those that are perfectly contained. If you want true intersection, you need to do that with your favorite geometry algebra library after querying the index for candidates.

Nearest


>>> idx.insert(4321, (34.37, 26.73, 49.37, 41.73), obj=42)
>>> hits = idx.nearest((0, 0, 10, 10), 3, objects=True)
                    					
  • Ask for the 3-nearest items
  • If the last two items are the exactly same distance away, both are returned, meaning an ask for 3 items might actually return you 4.

Delete


>>> idx.delete(4321, (34.3776829412, 26.7375853734, 49.3776829412, 41.7375853734) )
                    					
  • id + bbox defines index item
  • ids need not be unique
  • bboxes need not be unique
  • bbox coordinates are equal per std::numeric_limits<double>::epsilon()

Data uniqueness is on you to manage

from fiona import collection
from shapely.geometry import mapping, shape
from rtree import index

p = index.Property()
p.filename='theindex'
p.overwrite=True
p.storage=index.RT_Disk
p.dimension = 2

idx = index.Index(  p.filename, 
                    build(),  
                    properties=p, 
                    interleaved=True, 
                    overwrite=True)
                    


def build():
    with collection("data/ne_10m_admin_1_states_provinces_shp.shp", "r") as shapes:
        for s in shapes:
            geom = shape(s['geometry'])
            id = int(s['id'])
            yield (id, geom.bounds, s)
            


query = idx.nearest((-93.00,42.0,-93.00,42.00), 
                    3, 
                    objects=True)
states = list(query)

print len(airports)
print states[0].object['properties']['name']


libspatialindex - http://libspatialindex.github.com/
Rtree - http://toblerity.github.com/rtree/
Shapely - http://toblerity.github.com/shapely/
Fiona - http://toblerity.github.com/fiona/
irc.freenode.net #geopython
toblerity: noun a collection of more related Python geographic software projects.