When I first looked at MURI's geospatial requirements — find available drivers within 5km of a passenger, manage service regions as polygons, calculate routes — my initial instinct was to store latitude and longitude as two float columns in PostgreSQL and handle proximity math in Python. This is the path of least resistance, and for small scale, it works fine.
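That application-side approach looks something like this — a minimal sketch of the naive path, assuming drivers are plain dicts (the function names and dict shape are illustrative, not MURI's actual code):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in km (Haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_drivers(drivers, lat, lng, radius_km=5.0, limit=20):
    """Filter and rank drivers entirely in application code: O(n) per request."""
    in_range = [
        (haversine_km(lat, lng, d["lat"], d["lng"]), d)
        for d in drivers
        if d["status"] == "available"
    ]
    in_range = [(dist, d) for dist, d in in_range if dist < radius_km]
    in_range.sort(key=lambda pair: pair[0])
    return [d for _, d in in_range[:limit]]
```

Every request walks the whole driver list, which is exactly the scaling problem the rest of this post is about.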
After benchmarking both approaches under realistic load, I chose PostGIS. Here's the honest comparison.
The flat float approach
With plain lat/lng columns, a proximity query looks like this:
```sql
-- Find drivers within 5km using great-circle distance
-- (spherical law of cosines; the Haversine variant is similar)
SELECT driver_id, lat, lng, distance_km
FROM (
    SELECT driver_id, lat, lng,
           (6371 * acos(
               cos(radians(passenger_lat)) * cos(radians(lat)) *
               cos(radians(lng) - radians(passenger_lng)) +
               sin(radians(passenger_lat)) * sin(radians(lat))
           )) AS distance_km
    FROM drivers
    WHERE status = 'available'
) AS candidates
WHERE distance_km < 5
ORDER BY distance_km
LIMIT 20;
```

This works. The math is correct (note the subquery: PostgreSQL won't let you filter on a SELECT alias directly, so the distance has to be computed in an inner query). But there's a critical problem: this query cannot use a standard B-tree index on the lat/lng columns. The distance computation wraps both columns in function calls, so PostgreSQL scans the entire drivers table on every request. At 10,000 active drivers, that's a full table scan for every passenger request.
The PostGIS approach
With PostGIS, the same query becomes:
```sql
-- PostGIS proximity query with spatial index
SELECT driver_id,
       ST_Distance(
           location::geography,
           ST_SetSRID(ST_MakePoint(passenger_lng, passenger_lat), 4326)::geography
       ) AS distance_m
FROM drivers
WHERE status = 'available'
  AND ST_DWithin(
      location::geography,
      ST_SetSRID(ST_MakePoint(passenger_lng, passenger_lat), 4326)::geography,
      5000  -- meters
  )
ORDER BY distance_m
LIMIT 20;
```

(`ST_SetSRID` pins the point to WGS 84, so the geography cast is unambiguous.) With a GiST spatial index on the location column — `CREATE INDEX ON drivers USING GIST (location);` — ST_DWithin uses the index to eliminate the vast majority of rows before doing any exact distance calculation. The difference in query time is dramatic.
Benchmark on 50,000 driver records: plain Haversine — 340ms average. PostGIS with GiST index — 4ms average. At scale, this isn't a micro-optimization; it's the difference between a usable system and one that falls over under load.
Region polygons: the case PostGIS wins decisively
MURI also needs to answer: 'Is this pickup location inside a serviced region?' With flat coordinates, point-in-polygon means implementing and maintaining a ray-casting algorithm yourself in Python. With PostGIS, it's a single function call:
```sql
-- Is this location within any active service region?
SELECT r.region_id, r.name
FROM service_regions r
WHERE ST_Contains(r.boundary, ST_SetSRID(ST_MakePoint(lng, lat), 4326))
  AND r.is_active = true;
```

The boundary column stores a POLYGON geometry. A GiST index on boundary prunes candidate regions by bounding box first, so the exact containment test only runs against the handful of plausible polygons — even when the boundaries have thousands of vertices.
When to skip PostGIS
PostGIS is not always the right answer. It's a PostgreSQL extension you have to install, version-match, and maintain, and it has a real learning curve. I'd skip it if:
- You only need to store a single location per record and never query by proximity (a user's home address, for example)
- Your dataset is small enough that a full table scan is fast — under ~1,000 rows, the index overhead isn't worth it
- You're on a managed database service that doesn't support PostGIS
- Your geo queries only need approximate answers and you can do them in application code post-fetch
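For that last case, you rarely need full Haversine in application code. A cheap equirectangular approximation is usually enough for post-fetch filtering over short distances — a sketch, with an illustrative function name and a rough meters-per-degree constant:

```python
from math import cos, radians, sqrt

# Rough meters per degree of latitude (and of longitude at the equator)
M_PER_DEG = 111_320


def approx_distance_m(lat1, lng1, lat2, lng2):
    """Equirectangular approximation: accurate enough for filtering over a
    few km, but it degrades near the poles and over long distances."""
    dlat = (lat2 - lat1) * M_PER_DEG
    dlng = (lng2 - lng1) * M_PER_DEG * cos(radians((lat1 + lat2) / 2))
    return sqrt(dlat * dlat + dlng * dlng)
```

Fetch a coarse candidate set, then filter with this in a loop; for a small table that's perfectly serviceable.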
The verdict
For MURI's use case — real-time driver proximity, service region polygons, and route-aware queries — PostGIS was the right choice. The 85x query speedup under load wasn't theoretical; we measured it. If you're building any system where users ask spatial questions against dynamic data, PostGIS is worth the setup time.
These techniques were applied in building MURI — a multi-role transportation platform with real-time geospatial routing, WebSocket live tracking, and role-scoped booking flows for riders, drivers, and admins. Full case study at /case-studies/muri