Creating Geo-Enabled Applications with MongoDB, GeoJSON and MySQL

This blog post will discuss creating geo-enabled applications with MongoDB, GeoJSON and MySQL.

Introduction

Recently I published a blog post about the new GIS features in MySQL 5.7. Today I’ve looked into how to use MongoDB (I’ve tested with 3.0 and 3.2, with 3.2 being much faster) for the same purpose. I will also talk about GIS in MySQL and MongoDB at Percona Live next week (together with my colleague Michael Benshoof).

MongoDB and GIS

MongoDB has a very useful feature called “geoNear.” There are other MongoDB spatial functions available to calculate the distance on a sphere (like the Earth), i.e. $nearSphere, $centerSphere, $near – but all of them have restrictions. The most important one is that they do not support sharding. The geoNear command in MongoDB, on the other hand, supports sharding. I will use geoNear in this post.

For this test, I exported Open Street Map data from MySQL to MongoDB (see the “Creating GEO-enabled applications with MySQL 5.6” post for more details on how to load this data to MySQL).

  1. Export the data to JSON. In MySQL 5.7, we can use JSON_OBJECT to generate the JSON file:
  2. Use mongoimport to import the JSON into MongoDB (I’m using 24 threads, -j 24, for parallel import):
  3. Create a 2dsphere index:
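The three steps above can be sketched as follows. The table and collection name "points", the database name "osm", the column names, and the file path are my assumptions for illustration, not the exact names from the original data set:

```python
# Sketch of the MySQL -> MongoDB pipeline. Names and paths are assumptions.
import shlex

# Step 1: MySQL 5.7 query emitting one GeoJSON document per row.
export_sql = """
SELECT JSON_OBJECT('OSM_ID', OSM_ID, 'name', name,
                   'geometry', ST_AsGeoJSON(SHAPE))
FROM points
INTO OUTFILE '/tmp/points.json';
"""

# Step 2: parallel import into MongoDB (24 threads, -j 24).
mongoimport_cmd = shlex.split(
    "mongoimport --db osm --collection points -j 24 --file /tmp/points.json"
)

# Step 3: mongo shell command creating a 2dsphere index on the GeoJSON field.
create_index_js = 'db.points.createIndex({ geometry: "2dsphere" })'
```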

Another option would be the osm2mongo Ruby script, which converts the .osm file and loads it directly into MongoDB.

Now I can use the geoNear command to find all the restaurants near my location:
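A sketch of what that geoNear command document could look like, as it would be passed to db.runCommand(). The collection name "points", the sample coordinates, and the "other_tags" filter field are my assumptions, not the exact values from the post:

```python
# Hypothetical geoNear command document. Collection name, coordinates,
# and the restaurant filter field are assumptions for illustration.
geo_near = {
    "geoNear": "points",                 # collection to search
    "near": {                            # GeoJSON point: [longitude, latitude]
        "type": "Point",
        "coordinates": [-78.9064543, 36.0013755],
    },
    "spherical": True,                   # required for 2dsphere indexes
    "query": {"other_tags": {"$regex": "restaurant"}},  # assumed filter field
    "limit": 5,
    "maxDistance": 10000,                # meters
}
```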

MongoDB 3.0 vs 3.2 with geoNear

MongoDB 3.2 features Geospatial Optimization:

MongoDB 3.2 introduces version 3 of 2dsphere indexes, which index GeoJSON geometries at a finer gradation. The new version improves performance of 2dsphere index queries over smaller regions. In addition, for both 2d indexes and 2dsphere indexes, the performance of geoNear queries has been improved for dense datasets.

I’ve tested the performance of the above geoNear query with MongoDB 3.0 and MongoDB 3.2 (both the old and new versions of the 2dsphere index). All statistics below are for a "limit": 5 and "maxDistance": 10000.

MongoDB 3.0, index version 2:

MongoDB 3.2, index version 2:

What is interesting here is that even with the "2dsphereIndexVersion" : 2, MongoDB 3.2 performs much faster and scans a much smaller number of documents.

MongoDB 3.2, index version 3:

That is significantly faster, 1ms for five results!

MySQL and GeoJSON revisited

To compare to the performance of the above query, I’ve created a similar query in MySQL. First of all, we will need to use the good old bounding rectangle (envelope) trick to only include the points within a 10-mile radius (or so). If we don’t, MySQL will not be able to use a spatial (RTREE) index. I’ve created the following function to generate the envelope:
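The original create_envelope is a MySQL stored function; the underlying bounding-box math can be sketched in Python. The roughly-69-miles-per-degree-of-latitude approximation (with longitude degrees shrinking by cos(latitude)) is my assumption about how such a function works, not the post’s exact implementation:

```python
import math

def create_envelope(lat, lon, dist_miles):
    """Return a WKT POLYGON approximating a bounding box dist_miles
    around (lat, lon). Assumes ~69 miles per degree of latitude;
    a degree of longitude shrinks by cos(latitude)."""
    dlat = dist_miles / 69.0
    dlon = dist_miles / (69.0 * math.cos(math.radians(lat)))
    return (
        "POLYGON(({w} {s}, {e} {s}, {e} {n}, {w} {n}, {w} {s}))".format(
            w=lon - dlon, e=lon + dlon, s=lat - dlat, n=lat + dlat
        )
    )
```

The returned WKT can then be turned into a geometry with ST_GeomFromText on the MySQL side.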

Then we can use the following query (an update of the GeoJSON query from my previous post):
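Roughly, the query combines st_distance_sphere for the actual distance with the envelope for index pruning. A sketch of such a query (held here as a string; the table and column names are assumptions):

```python
# Hypothetical MySQL query: envelope for index pruning, distance for ranking.
# Table "points" and columns OSM_ID/name/SHAPE are assumptions.
query = """
SET @lat = 36.0013755, @lon = -78.9064543;
SELECT OSM_ID, name,
       ST_Distance_Sphere(SHAPE,
           ST_GeomFromText(CONCAT('POINT(', @lon, ' ', @lat, ')'))) AS dist
FROM points
WHERE ST_Within(SHAPE, create_envelope(@lat, @lon, 10))
ORDER BY dist
LIMIT 5;
"""
```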

MySQL is slower here: 40ms compared to 1ms – 12ms in MongoDB. The test box is an AWS EC2 t2.medium.

To recap the difference between MongoDB geoNear and MySQL st_distance_sphere:

  • MongoDB geoNear uses the 2dsphere index, so it is fast; however, it can’t simply calculate the distance between two arbitrary points
  • MySQL st_distance_sphere is a helper function that only calculates the distance between two points; it will not use an index – we have to use the create_envelope function to restrict the search so that MySQL can use an index

Time-wise, this is not an apples to apples comparison as the query is quite different and uses a different technique.

Visualizing the results

Results for GeoJSON for Google Maps API:
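The result rows can be wrapped into a GeoJSON FeatureCollection, which the Google Maps API (or a tool like geojsonlint.com) can consume. A sketch with a made-up result row:

```python
import json

# Hypothetical result rows: (name, GeoJSON geometry string from ST_AsGeoJSON).
rows = [
    ("Some Restaurant", '{"type": "Point", "coordinates": [-78.90, 36.00]}'),
]

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": json.loads(geom),
            "properties": {"name": name},
        }
        for name, geom in rows
    ],
}
```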

Now we can add those on a map:

Back to MongoDB: pluses and minuses

MongoDB uses Google’s S2 library to perform GIS calculations. The geoNear command is fast and easy to use for finding points of interest near you (which is the most common operation). However, full GIS support does not natively exist.

Another issue I came across when creating a 2dsphere index: MongoDB is very strict when validating lines and polygons. For example:

MongoDB complains about this: type: “LineString”, coordinates: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ]

This is a “bad” LineString, as the starting and ending points are the same. I had to remove the bad data from my imported MongoDB dataset, which was tricky by itself. MongoDB (as opposed to MySQL) does not have a native way to compare values inside a JSON document, so I had to use the $where construct – which is slow and acquires a global lock:
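A sketch of what such a $where filter could look like (the field name "geometry" is an assumption, and this is not the exact query from the post), together with an equivalent pure-Python check that could be run client-side instead:

```python
# Hypothetical $where filter matching two-point LineStrings whose start
# and end coincide. Field name "geometry" is an assumption.
bad_linestring_filter = {
    "$where": (
        "this.geometry.type == 'LineString' && "
        "this.geometry.coordinates.length == 2 && "
        "this.geometry.coordinates[0][0] == this.geometry.coordinates[1][0] && "
        "this.geometry.coordinates[0][1] == this.geometry.coordinates[1][1]"
    )
}

def is_degenerate_linestring(geometry):
    """Client-side version of the same check: a LineString with fewer
    than two points, or exactly two identical points."""
    if geometry.get("type") != "LineString":
        return False
    coords = geometry.get("coordinates", [])
    return len(coords) < 2 or (len(coords) == 2 and coords[0] == coords[-1])

bad = {"type": "LineString",
       "coordinates": [[-85.808852, 41.245582], [-85.808852, 41.245582]]}
```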

After that, I was able to add the 2dsphere index.

Conclusion

MongoDB looks good, and it is fast and easy to use for geo-proximity search queries – until you go beyond that one function and need full GIS support (which does not natively exist). Other GIS functions, such as st_contains or st_within, may be trickier to implement.

Update: as pointed out, MongoDB actually supports $geoWithin and $geoIntersects GIS functions.

Update 2: I was asked about MySQL and GeoJSON: why not use the ST_MakeEnvelope function? One issue with ST_MakeEnvelope is that it only works with SRID 0 (it requires point geometry arguments with an SRID of 0), while the OSM data is stored with SRID 1. I also need to “add” 10 miles to my point, and the only way to do that is to calculate a new point 10 miles away from “my” point/location, which requires a custom function to manipulate the lat/lon pair.

The explain plan for the MySQL GeoJSON query shows that MySQL uses SHAPE (Spatial) index:

And if I remove “st_within(shape, create_envelope(@lat, @lon, 10))” from the query, the explain plan shows a full table scan:


3 Comments
Daniël van Eeden

Some other helpful tools: http://geojsonlint.com/ and MySQL Workbench with the Geo viewer. http://databaseblog.myname.nl/2016/03/re-json-document-fast-lookup-with-mysql.html

What about using GeoHash?

holmberd

Great article. Now it would be interesting to see a geospatial comparison of MongoDB and PostGIS.

Me me

At such small runtimes, your measurement accuracy matters, as well as the SQL parsing cost. Use prepared statements and query at minimum 10000 points. At a runtime of 30 seconds, benchmark results may become reliable enough to measure.