Working with Geospatial Data: An Introduction

October 31, 2017 by Bryan McBride

At our core, Spatial Networks is a geography company. From building geospatial technology products to collecting, organizing, and analyzing geodata, we eat, drink, and breathe geography. It’s sobering to learn that, for many of our customers, Fulcrum often provides their first exposure to the wonderful world of Geographic Information Systems (GIS).

Working with Geospatial Data

We don’t build traditional GIS tools, but so much of field data collection revolves around location. Location provides tremendous context for analysis and helps to literally visualize trends and identify outliers. Whether you are monitoring light pollution on the beaches of Florida, supporting homeless outreach on the streets of Rhode Island, or helping to get your cousin elected in Cleveland, location leverage is always advantageous.

While there are tons of GIS learning resources available, it can be an intimidating subject, filled with strange acronyms, big software packages with complex toolbars full of unrecognizable icons, and an absurd amount of geospatial data formats. GIS blends a variety of specialty domains, from spatial reference systems, map projections and cartography to relational databases, scripting and statistics.

My hope here is to present an approachable introduction to help folks new to GIS start working with location-based data in a meaningful way, using free tools.

Understanding Location

In the GIS world, location is represented by geometry, the spatial component of geographic features. For practical purposes, vector geometries typically consist of points, lines, or polygons, defined by one or more interconnected vertices. A vertex is simply a position in space defined by an X, Y (optional Z) axis. Fulcrum currently supports point geometries, defined by a single pair of X, Y coordinates - longitude and latitude in decimal degree format.

If you have a spreadsheet of addresses you’d like to import into Fulcrum, you can use GIS software or a web-based service to geocode the addresses, which will calculate the coordinates, generally based on street networks or building footprints.

Working With Geospatial Data Formats

Geospatial data can come in a variety of formats, some of which are open, human readable, and familiar (CSV, GeoJSON) but many are not (shapefile, geodatabase, coverage). Fulcrum supports importing data in CSV and shapefile format. If you have a simple app without any repeatable fields, you can simply upload a single CSV file with latitude and longitude columns and your data will be mapped. Shapefiles actually consist of several files all with the same name and different extensions. These files all need to be zipped up before they can be imported into Fulcrum. If you’ve got a complicated Fulcrum app with nested levels of repeatables, you will need to make sure your import files have linked fields which can be related during the import.

If your data is not in CSV or shapefile format or you need to convert coordinates from another coordinate reference system, I highly recommend the open source GDAL translator library. The ogr2ogr utility which is a part of GDAL can be used to convert formats and transform/reproject coordinates. Ogr2ogr supports a wide variety of vector data formats and will write to both CSV and shapefile.

SQLite + GeoPackage

Whether you are preparing existing data for import, running quality control checks throughout the collection process, or analyzing the final results, you’re going to need tools for working with your data. While spreadsheet software such as Excel often works well for simple tabular datasets, you’ll likely want something more powerful once you start working with relational, geospatial datasets. But fear not - despite plenty of examples to the contrary, powerful tools don’t always have to be complicated and expensive!

A while back I made the case for Managing Data Exports with SQLite. Fast forward a few years and SQLite is even more prevalent in the geospatial industry. It’s popular because it’s simple, open, cross-platform, and incredibly powerful. SQLite-based GIS formats have been around for a while, in the form of MBTiles and SpatiaLite, but the recent formalization of the GeoPackage standard and its adoption across major GIS platforms has solidified SQLite as the go-to technology for data exchange across desktop and mobile platforms.

So now that we know SQLite is simple, powerful, and broadly utilized, let’s jump right in and see how we can start leveraging this technology to begin asking some location-based questions from our data. The simplicity of working with SQLite databases is probably one of its biggest attractions. As a serverless database, there are no installation, setup, or administrative procedures required. The database engine is embedded in the programs that use it and the process that wants to access the database reads and writes directly from the database files on disk. Databases can be emailed, copied across the file system, and trashed when you are finished with them - just like any other standalone file, but very much unlike most database systems.

Tools for working with SQLite

My go-to tool for working with SQLite databases is the excellent DB Browser for SQLite. This freely available, open source, cross-platform utility provides a clean Graphical User Interface for creating, viewing, editing, and querying SQLite database files. This is strictly a database utility and while it can support spatial querying, there is no mapping functionality with this tool. If you need to view your data on a map, I recommend using QGIS, however, you can do a lot of spatial analysis without ever needing to view a map.

In order to support all the spatial functionality of our GeoPackage or SpatiaLite databases, we need to “spatially enable” DB Browser by loading the SpatiaLite extension. We’ve made this step easy by providing a full set of SpatiaLite binaries for macOS, Linux, and Windows, which are available for download at this GitHub repo.

Fortunately, DB Browser for SQLite makes it easy to load persistent extensions. Once you’ve downloaded the binary file for your operating system, from within the DB Browser menu toolbar, go to ‘DB Browser for SQLite -> Preferences’ and click on the ‘Extensions’ tab. Click the puzzle piece button with the green plus and browse out to the file you just downloaded. Windows users, see here for additional notes on setting this up.

DB Browser Extensions

You can confirm that the SpatiaLite extension is properly installed by executing the following SQL command to return the SpatiaLite version:

SpatiaLite Version

If it returns a version, you should be good to go. Congratulations, you’ve now got a spatially enabled SQLite utility ready for querying!

Creating a GeoPackage with Reference Data

For this exercise we will be investigating data collected from our internal “GeoBeers” Fulcrum app. GeoBeers is a simple app for logging and sharing where we’ve enjoyed our favorite beverages. This app also integrates with our Slack communication platform to post photos directly to a channel for immediate sharing, so it’s a fun way to use Fulcrum! While these records are geotagged with latitude and longitude coordinates, we are not storing any additional address or location information, but I’m interested in seeing how these records are distributed across states. What state has the most records? Which beer type is most represented in each state? You can see where we are going with this, but we first need some reference data to determine the state boundaries.

While GeoPackage is a relatively new format, most downloadable data is available in shapefile format. I’m going to download the latest state boundary shapefile from the US Census Bureau available here.

  • Unzip the downloaded file and you will see several files named tl_2017_us_state, all with different extensions (more on that here).

  • Create a GeoPackage database using ogr2ogr to import the states shapefile as a new table: ogr2ogr -f GPKG -nln states -t_srs EPSG:4326 working.gpkg tl_2017_us_state.shp
    • The -f GPKG option defines the output format as GeoPackage.
    • The -nln states option sets the table name to states (default is the shapefile name).
    • The -t_srs EPSG:4326 option sets to spatial reference system to World Geodetic System 1984 (Global system of latitude & longitude coordinates used by GPS). This will “reproject” the coordinates from EPSG:4269 to EPSG:4326.
    • The result is a GeoPackage database file name working.gpkg with a table named states holding our state boundary polygons.
  • Download Fulcrum data in CSV format and unzip the downloaded file. If you open the file in a spreadsheet program you should see _latitude & _longitude columns holding the point coordinates.

  • Import this data into the GeoPackage database we just created with the following ogr2ogr command: ogr2ogr -f GPKG -update -nln geobeers -a_srs EPSG:4326 -oo X_POSSIBLE_NAMES=_longitude -oo Y_POSSIBLE_NAMES=_latitude working.gpkg geobeers.csv
    • The -update option prevents a new GeoPackage file from being created.
    • The -a_srs EPSG:4326 option assigns the spatial reference system to WGS 84. For CSV we have to assign the SRS since unlike shapefiles, it is not defined anywhere in the file.
    • The -oo X_POSSIBLE_NAMES & -oo Y_POSSIBLE_NAMES options directly specify the potential names of the columns that can contain X/longitude and Y/latitude.

Simple Spatial Analysis

We now have a GeoPackage database with our Fulcrum GeoBeers point data and state boundary reference data as tables, ready for some spatial analysis!

Let’s begin with some standard SQL to get a better understanding of our data. If you are new to SQL or SQLite, you may want to review this handy document: SQL As Understood By SQLite.

How many GeoBeers records do we have?

GeoBeers Count

What are the most popular beer types?

Popular Beer Types

What is the average rating, grouped by beer type?

Average Beer Rating

Who’s a morning drinker? Caveat: we use this app for testing and demo purposes :)

Day Drinkers

Now that we have a better understanding of our data, let’s try to incorporate our states table to start writing some spatial queries. Note that GeoPackage database files created by org2ogr (and Fulcrum) are raw GeoPackages, with the geometry stored in a slightly different binary format than SpatiaLite, but starting with version 4.2.0, SpatiaLite supports working with the GeoPackage format.

This may sound incredibly confusing, but the bottom line is that in order to work with GeoPackage databases from the SpatiaLite plugin, we need to first run the following command: SELECT EnableGpkgAmphibiousMode(); This command basically tells SpatiaLite to work natively with GeoPackage geometry, removing the need to explicitly call the appropriate format conversion functions such as GeomFromGPB() or CastAutomagic().

SpatiaLite generally follows the OGC Simple Feature specification for SQL, but you can review all the supported SQL functions here.

First let’s enable GeoPackage amphibious mode to make writing spatial queries easier:

Geopackage Amphibious Mode

Now let’s take a look at the Well-Known Text representation of our geometries.

Geobeers Geometries

Notice they are all POINT geometries

States Geometries

Notice they are a mix of POLYGON and MULTIPOLYGON geometries

Next find the top 10 largest states by calculated area:

Largest States

Okay, now we are ready to perform a spatial join, an operation used to combine two or more datasets with respect to a spatial relationship. Let’s join our GeoBeers points with our states’ boundaries to see which state each record is contained by:

Geobeers States

So which state has the most records?

States With Most Records

What are the top beer types grouped by state?

Top Beer Types by State

Which records are located outside of the United States? For this one we need to collect the states geometries into a single geometry and test which points are not contained within.

Records Outside USA

Conclusions

We’ve just scratched the surface of the power of GIS and spatial SQL, but hopefully this post has helped demystify some of the concepts and introduced some useful tools for working with your data. There are plenty of resources available to help expand your knowledge in this space, and I highly recommend the following online resources:

In conclusion, GeoPackage, based on SQLite technology, is a great format for storing and working with geospatial data. Free and open source tools such as GDAL/OGR and DB Browser for SQLite with the SpatiaLite extension provide mechanisms for importing data and running spatial queries for analysis. Once you break your dependency on spreadsheets, you can truly experience the benefits of location leverage!

Bryan

About the author

Bryan manages Professional Services at Spatial Networks, where he is an all-around utility player, helping customers leverage our technology to the fullest potential.

comments powered by Disqus

Sign Up Today & Start Collecting Data In Just Minutes!

Start Your FREE Trial