This post was originally published in 2020 and has been updated in 2023.
Mastering Geographical Information Systems, better known simply as GIS, can be considered in some ways as a rite of passage. The complexities and challenges involved in learning, which are ostensibly non-IT concepts, are steep. However, as they say, “There’s more than one way to skin a cat.” I’d like to share with you one way to tackle this challenge.
Let me introduce you to PostGIS.
What is PostGIS?
PostGIS is a PostgreSQL extension that adds GIS capabilities to this RDBMS. Its popularity stems not only from being “free” but because it’s considered to be among the leading GIS implementations in the world today. Virtually every major front-end application provides the hooks for a PostGIS, PostgreSQL-enabled back-end.
The PostGIS project, which is BSD licensed, began back in 2001. It turns our vanilla-flavored postgres into a spatial database and includes spatial datatypes (geometry, geography), spatial indexes (r-tree, quad-tree, kd-tree), and spatial functions.
Try Now: Get Percona Support for PostgreSQL
Essential PostGIS Tools and Applications
Working with GIS normally requires several layers of technology of Geo-Spatial Software, for example:
Boundless Server (formerly the OpenGeo Suite)
- PostGIS – Spatially enabled object-relational database.
- GeoServer – Software server for loading and sharing geospatial data
- GeoWebCache – Tile cache server that accelerates the serving of maps
- Composer – Web-based map configuration and styling utility
- WPS Builder – Web-based graphical constructor for generating server-side processes
- QuickView – Web application for composing and styling web maps
Note: Boundless Server GitHub repository
Advantages of Using PostGIS for Better Performance
PostGIS offers a solution to provide geospatial data capabilities in PostgreSQL for indexing, storing, and querying geographical data.
Spatial Indexing
Spatial indexing accelerates the querying and retrieval of geospatial data in PostgreSQL, improving response times for applications and analytics. It uses various spatial indexing techniques, such as R-tree and GiST (Generalized Search Tree), to organize data based on their spatial relationships, allowing the database to quickly narrow down the search space.
Geospatial Functions and Queries
PostGIS offers a wide array of geospatial functions for PostgreSQL, allowing users to perform complex spatial operations such as distance and area computations, geometric alterations, and buffering. Optimized spatial queries, like finding nearby points or intersecting polygons, use these functions to retrieve geospatial data efficiently.
Spatial Data Types
PostGIS supports spatial data types like Point, LineString, and Polygon in 2D and 3D. These allow efficient spatial data storage and retrieval due to their optimized indexing mechanisms. They enable spatial indexing, enabling faster query processing by organizing data based on geometric properties.
Querying The Spatial Database
It’s amazing the kinds of answers one can get by a single, well-composed, and yet standard query by asking such simple questions as:
- How far is it from here to there?
- What’s the closest point between two meandering streets?
- Is a street found in a certain zip code?
- How many homes are susceptible to flooding?
Here are a few example queries using a PostGIS-powered database. Refer here for more community-related shapefiles for Kamloops British Columbia:
Learn More: Find out how to improve PostgreSQL query performance insights with pg_stat_monitor
1 2 3 4 5 6 7 8 | -- -- EXAMPLE 1: What is the total length of all roads in British Columbia? -- SELECT sum(ST_Length(the_geom))/1000 AS km_roads FROM bc_roads; km_roads ------------------ 70842.1243039643 |
1 2 3 4 5 6 7 8 9 10 | -- -- EXAMPLE 2: How large is the city of Prince George, British Columbia? -- SELECT ST_Area(the_geom)/10000 AS hectares FROM bc_municipality WHERE name = 'PRINCE GEORGE'; hectares ------------------ 32657.9103824927 |
1 2 3 4 5 6 7 8 9 10 11 12 | -- -- EXAMPLE 3: What is the largest municipality in the province of British Columbia? -- SELECT name, ST_Area(the_geom)/10000 AS hectares FROM bc_municipality ORDER BY hectares DESC LIMIT 1; name | hectares ---------------+----------------- TUMBLER RIDGE | 155020.025561 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | -- -- EXAMPLE 4: What is the length of roads fully contained within each municipality? -- SELECT m.name, sum(ST_Length(r.the_geom))/1000 as roads_km FROM bc_roads AS r, bc_municipality AS m WHERE ST_Contains(m.the_geom,r.the_geom) GROUP BY m.name ORDER BY roads_km; name | roads_km ----------------------------+------------------ SURREY | 1539.47553551242 VANCOUVER | 1450.33093486576 LANGLEY DISTRICT | 833.793392535662 BURNABY | 773.769091404338 PRINCE GEORGE | 694.375543691 |
Learn More: How to tune PostgreSQL database parameters to optimize performance
PostGIS Use Cases
PostGIS extends PostgreSQL’s capabilities to handle geospatial data for solutions in real-world scenarios. From mapping to visualizations to logistics, PostGIS helps users to make informed decisions based on spatial insights.
Location-Based Services
PostGIS plays a role in location-based applications by enabling spatial data storage and analysis. For instance, a grocery delivery service could utilize PostGIS to optimize delivery for certain areas. A query that finds available drivers within a certain distance, facilitated by PostGIS’s spatial indexing, would significantly enhance the platform’s responsiveness.
Geospatial Analytics
PostGIS utilizes geospatial data analysis by offering advanced tools for operations such as identifying the nearest points or calculating areas within polygons. The functions are optimized through spatial indexing for quicker and better queries of complex geospatial analyses, improving overall performance.
Mapping and Visualization
PostGIS improves map rendering and visualization through efficient geospatial data handling. It allows for quickly retrieving map data to ensure smooth and responsive displays.
Related: Watch the PostgreSQL high-performance tuning and optimization webinar sessions
How to Install the PostGIS Extension for PostgreSQL
Basic Installation
The following installation instructions assume one is using PostgreSQL version 12 on Linux/CENTOS-7, although any major version of postgres and OS can be used.
1 2 3 4 5 | # # REDHAT/CENTOS DERIVATIVES, USING COMMUNITY POSTGRES # yum update -y yum install -y postgis30_12 postgis30_12-client |
1 2 3 4 | # # CREATE THE DATACLUSTER # /usr/pgsql-12/bin/postgresql-12-setup initdb |
1 2 3 4 | # # SERVICE STARTUP # systemctl start postgresql-12 |
1 2 3 4 5 6 | # # INSTALLING THE POSTGIS EXTENSION # Execute as UNIX account postgres # createdb mydatabase; psql -c 'create extension postgis' mydatabase |
Note: Although it is on the roadmap, the Percona Distribution for PostgreSQL does not currently include PostGIS. But you should be able to install the community PostGIS packages to it with the pertinent YUM/RPM package switches.
Complete Installation for PostGIS Extension
The following provides a more complete installation of all PostGIS capabilities. Refer to the PostGIS documentation for more information.
1 2 3 4 | -- -- if you want to install raster support -- CREATE EXTENSION postgis_raster; |
1 2 3 4 | -- -- if you want to install topology support -- CREATE EXTENSION postgis_topology; |
1 2 3 4 | -- -- if you want to install sfcgal support -- CREATE EXTENSION postgis_sfcgal; |
1 2 3 4 5 | -- -- if you want to install tiger geocoder -- CREATE EXTENSION fuzzystrmatch CREATE EXTENSION postgis_tiger_geocoder; |
1 2 3 4 | -- -- if you want to install pcre, add the address standardizer extension -- CREATE EXTENSION address_standardizer; |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | -- -- list of all installed extensions -- mydatabase=# select * from pg_available_extensions where name ~ '^postgis'; name | default_version | installed_version | comment --------------------------+-----------------+-------------------+--------------------------------------------------------------------- postgis-3 | 3.0.1 | | PostGIS geometry, geography, and raster spatial types and functions postgis_raster-3 | 3.0.1 | | PostGIS raster types and functions postgis_sfcgal-3 | 3.0.1 | | PostGIS SFCGAL functions postgis_tiger_geocoder-3 | 3.0.1 | | PostGIS tiger geocoder and reverse geocoder postgis_topology-3 | 3.0.1 | | PostGIS topology spatial types and functions postgis | 3.0.1 | 3.0.1 | PostGIS geometry, geography, and raster spatial types and functions postgis_raster | 3.0.1 | 3.0.1 | PostGIS raster types and functions postgis_sfcgal | 3.0.1 | 3.0.1 | PostGIS SFCGAL functions postgis_tiger_geocoder | 3.0.1 | 3.0.1 | PostGIS tiger geocoder and reverse geocoder postgis_topology | 3.0.1 | | PostGIS topology spatial types and functions |
Working With Shapefiles
There are literally hundreds of terabytes available online. Ironically, the most precious data one can get is free to download because it’s been generated by governments from all over the world for the public good.
Now it’s time to get some data. The most common format is “shapefile.”
PostGIS includes these two command-line utilities:
Working With ZipCodes From The Us Census Data: Tiger Files
TIGER/Line files are a digital database of geographic features, such as roads, railroads, rivers, lakes, legal boundaries, census statistical boundaries, etc., covering the entire United States and are freely available here.
Once you’ve navigated to this website, you can download a multitude of fascinating pieces of data.
Example:
Step 1: Execute The Following As A Bash Script
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | psql -U postgres <<_eof_ set ON_ERROR_STOP on drop database if exists gis_demo; create database gis_demo; c gis_demo create extension if not exists postgis; create extension if not exists postgis_topology; create schema gis; create or replace function gis.findzipcode( inout longitude double precision, inout latitude double precision, out zipcode varchar ) as $$ begin select zcta5ce10 into zipcode from zipcodes, (values (st_makepoint(longitude,latitude)))t(p) where st_contains(geom, p); end; $$ language plpgsql security definer set search_path=gis,public,topology; _eof_ |
Step 2: Download The 2019 ZipCode Shapefile
The shapefile is stored in a zip file. Unzip the entire archive before attempting an upload.
URL: https://www.census.gov/cgi-bin/geo/shapefiles/index.php
1 2 3 4 | # # Execute the following, the actual arguments will vary # shp2pgsql -c -D -I tl_2019_us_zcta510.shp gis.zipcodes | psql -U postgres gis_demo |
Once the upload into the postgres database is complete, this is what you’ll get:
1 2 3 4 5 6 7 8 9 10 11 | gis_demo=# d List of relations Schema | Name | Type | Owner ----------+-------------------+----------+---------- public | geography_columns | view | postgres public | geometry_columns | view | postgres public | spatial_ref_sys | table | postgres topology | layer | table | postgres topology | topology | table | postgres topology | topology_id_seq | sequence | postgres (6 rows) |
1 2 3 4 5 6 7 8 | gis_demo=# dn List of schemas Name | Owner ----------+---------- gis | postgres public | postgres topology | postgres (3 rows) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | gis_demo=# d zipcodes Table "gis.zipcodes" Column | Type | Collation | Nullable | Default ------------+------------------------+-----------+----------+--------------------------------------- gid | integer | | not null | nextval('zipcodes_gid_seq'::regclass) zcta5ce10 | character varying(5) | | | geoid10 | character varying(5) | | | classfp10 | character varying(2) | | | mtfcc10 | character varying(5) | | | funcstat10 | character varying(1) | | | aland10 | double precision | | | awater10 | double precision | | | intptlat10 | character varying(11) | | | intptlon10 | character varying(12) | | | geom | geometry(MultiPolygon) | | | Indexes: "zipcodes_pkey" PRIMARY KEY, btree (gid) "zipcodes_geom_idx" gist (geom) |
Step 3: An Example Query
1 2 3 4 5 6 7 | -- -- getting a zipcode based upon the LAT/LON -- gis_demo=# select * from gis.findzipcode(-87.798615,30.53711); longitude | latitude | zipcode ------------+----------+--------- -87.798615 | 30.53711 | 36576 |
On The Road To Mastering GIS
There are a couple of things to keep in mind when querying PostGIS-powered databases:
- Follow the PostGIS reference documentation closely: it’s easy to read, explains many of the ideas behind the function call, and provides many examples.
- The GIS standard requires that functions be documented using mixed case.
- GIS function calls, implemented in postgres, are written using lowercase.
- For the adventurous, here’s a comprehensive and freely available dataset of shapefiles for the city of Kamloops, British Columbia.
Because so much of our modern big data insights depend upon raw GIS data, directly querying a GIS database empowers one to create even more powerful and precise insights. PostGIS is an excellent way for the DEV, DBA, and SRA to learn all things GIS.
Have Fun!
Get PostGIS Support From Percona
PostGIS is included in Percona Distribution for PostgreSQL, which provides the best and most critical enterprise components from the open source community in a single distribution. And with Percona support for PostGIS, you can reliably run spatial data on PostgreSQL in production without fear of vendor lock-in.
- 24x7x365 consultative or hands-on support from PostgreSQL experts
- Industry-best, 15-minute SLAs
- Simplified support and pricing
- No restrictive commitments or lock-in
- Assistance on any environment – on-premises, cloud, or hybrid