A Geographic Daydream

Will Carter
Level Up Coding
Published in
10 min readFeb 8, 2021

--

Let’s imagine a new reality.

POOF! At the wave of Alex’s hand, we magically discover that we now inhabit the body of a completely new person…

…one that has just graduated from the College of Veterinary Medicine at Purdue University, in the state of Indiana of all things. Amazing!

That’s us on the right, holding the cat. Our name is Lisa. After a quick scan of our new memory, we “remember” that veterinary school was difficult, and now we are eager to make our mark in the world. We want to devote our life’s work to helping animals and their human’s be more healthy and happy. This is our life mission.

Also, as luck would have it, we remember that we just won the Hooser Lottery for a whopping 1 billion dollars 💰 💰 💰! So, money is no object for us to follow our life mission.

Dreaming big, we set the goal to open a new state-of-the-art Animal Hospital, but have no idea of the best location to maximize our efforts. The only thing we know for sure is that we want to focus on Indiana, our favorite state. Also, we want a location that is populated enough to support the hospital, but do not want a super densely populated metropolitan area. Further, we don’t want to put our hospital out in the middle of nowhere with a low population without the possibility to support many pets and their human’s.

Geographic Data

In order to answer the question of finding the best location, we can think about leveraging geographic data to help us differentiate potential animal hospital location areas of interest, showing the way to the best choice of location.

Geographic data is tied to a specific location. This allows us to ask location based questions upon it, such as counting how many locations are present within a particular area.

Geographic data is available in three types: Polygons, Points, and Lines.

Polygons represent enclosed shapes on a map such as state or county borderlines.

Indiana state and counties polygons

Points are individual “points of interest” on a map. They represent unique latitude and longitude locations on the earth’s surface. Existing Animal Hospital locations will be represented in our map by points.

Existing animal hospitals in Indiana

Lines represent continuous geographic features such as roads or rivers. We will not be looking at line geographic data for our efforts here.

PostGIS is a spatial database extender for PostgreSQL object-relational databases. It adds support for geographic objects allowing location related queries to be run in SQL. We will use the power of SQL answer the question of where to build our new Animal Hospital and use QGIS, a free and open source geographic information system, to view our results visually on a map. All the maps in this post are composed in QGIS.

QGIS + PostGreSQL + PostGIS

How will we hunt for the best location for a new hospital?

To keep it simple, we’ll consider the population (person count) in a particular area and compare that count to the number of existing hospitals in that area.

Areas that have a high population with lower number of hospitals will be those that we consider under-served, and this is where we will look first.

Luckily, the U.S. Census Bureau provides a wealth of data, both geographic and otherwise. We will be able to gather our geographic polygons (counties) as well as the needed population data from there.

Gather Geo Data

We’ll need the state boundary of Indiana to begin with. State boundary geographic data can be found here.

Pick the lower detail, smaller dataset of state boundaries

The raw state data zip file is named: cb_2018_us_state_20m.zip

For a double check, I loaded the geographic data into QGIS, just to make sure it’s what we want. Simply drag the downloaded zip file onto a new project in QGIS to see it visually on a map.

State geographic data

County boundaries (polygons)

Next, we need county boundaries because want to find the number of existing hospitals inside each county and compare that count to the population for a particular county.

The raw county data zip file is named: cb_2018_us_county_20m.zip. It looks good on the map as well.

County geographic data

Animal Hospitals (points)

Importantly, we need the locations of all existing Animal hospitals in Indiana. POI Factory (Point of Interest Factory) often proves useful to find point based datasets of many topics. They have a great dataset of national animal hospital locations.

After some munging with the downloaded data, I was able to edit it down to the following relevant columns:

I imported this data into a geo_animal_hospitals_usa table, converting the latitude/longitude columns into geographic data. Then, I can query that data, and display the result on a map.

-- animal hospitals in usa
SELECT
id,
geom,
longitude,
latitude,
name,
city,
state
FROM public.geo_animal_hospitals_usa;
Querying all the animal hospitals in the USA

Then, I can add a where clause to just return the existing hospitals in Indiana.

-- animal hospitals in indiana
SELECT
id,
geom,
longitude,
latitude,
name,
city,
state
FROM public.geo_animal_hospitals_usa
where state = 'IN';

We can see the query results visually in QGIS, showing the existing animal hospital locations in Indiana (red dots).

Notice the geom column with its datatype geometry(Point) in the geo_animal_hospitals_usa table. This column is what POSTGIS brings to the table. This is where the geographically formatted data is stored.

Gather Population Data

Next, we need to get population data for our Indiana counties. The Census site has this data for us as well.

County population data

County population data can be found here, in comma separated value (CSV) format. Look for the datasets area, and download the file: co-est2019-alldata.csv. This file contains county population data for the entire U.S.

After deleting most of the columns in the dataset that are irrelevant (there are a lot!), here are the columns we’ll work with. The pop_2019 is the estimated person count for that county.

Here is the SQL to create the population_county table, and fill it with the data from above using the COPY command.

CREATE TABLE population_county (
id SERIAL,
state VARCHAR(255),
state_name VARCHAR(255),
name VARCHAR(255),
pop_2019 integer,
PRIMARY KEY (id)
);
COPY population_county(state, state_name, name, pop_2019)
FROM '/tmp/pop_county.csv'
DELIMITER ','
CSV HEADER;

Making the map

Now that we have all of the necessary data, we can make SQL queries to build up a map with logic to find the desired location for the Animal Hospital. First, start with a clean map, I have only added the Open Street Map base layer so that we can get our bearings easily as we add more layers.

Next, we’ll make the query to retrieve state geographic data inside of QGIS. To do this, navigate to: Database > DBManager, paste the SQL in the query box and then load the result to the map.

Loading geo data with SQL in QGIS
--get all states
SELECT
geom,
statefp,
geoid,
stusps,
name,
aland,
awater
FROM public.geo_state_raw;

Here we retrieve all the states…

However, we are interested only in Indiana, which is easy enough to limit by adding a where clause for the statefp = ‘18’.

--add where to limit to Indiana (statefp = 18
SELECT
geom,
statefp,
geoid,
stusps,
name,
aland,
awater
FROM public.geo_state_raw
where statefp = '18';

Now we just have Indiana…

Next, we will add the existing animal hospitals in Indiana with this query…

-- animal hospitals in Indiana
SELECT
id,
geom,
longitude,
latitude,
name,
city,
state
FROM public.geo_animal_hospitals_usa
where state = 'IN';

The result shows all the animal hospital in Indiana.

Next, we can write the query to highlight counties where the person per hospital count is high. The query to produce this is as follows.

select 
county.geom,
county.name,
max(pop.pop_2019) as pop_2019,
count(hospitals.geom) as animal_hospital_count,
CASE
WHEN count(hospitals.geom) = 0
THEN max(pop.pop_2019)
ELSE max(pop.pop_2019) / count(hospitals.geom)
END
AS persons_per_hospital
from geo_county_raw county
left join population_county pop on pop.name = CONCAT(county.name, ' County')
left join
geo_animal_hospitals_usa as hospitals
on ST_WITHIN(ST_Transform(hospitals.geom, 4326),
ST_Transform(county.geom, 4326))

where pop.state_name = 'Indiana'
and county.statefp = '18'
group by county.geom, county.name
order by persons_per_hospital desc;

The magic is in the left join to the geo_animal_hospitals_usa point table. This is a geospatial join.

The left join to the geo_animal_hospitals_usa using ST_WITHIN is so that we can count the number of animal hospital locations within each county in the animal_hospital_count column.

Now that we have the number of people in a county and the number of existing hospitals in a county, we then add a second computed column, persons_per_hospital, which is the number of people divided by the number of existing hospitals in that county. The higher this number, the more people that the hospital serves in that county.

Counties with high persons_per_hospital counts are identified by shaded darker green in the map below using graduated symbology based on the persons_per_hospital column.

Symbolizing counties according to persons_per_hospital

I also added the existing hospital point layer on top.

Indiana animal hospitals

These four Indiana counties (Stark, Tippecanoe, Fayette, and Vigo) look interesting and are where we will focus more investigation.

If we look at each of these counties from north to south, the each have similar numbers of persons per county served.

Stark County

This area looks possible, with only one other animal hospital in the county. The persons per animal hospital there is almost 23,000 people.

Tippecanoe County

This county is a bit quite a bit more people, over 8 times as many as Stark county. There are 8 existing hospitals here and the persons per animal hospital count in this county is almost 24,500 people. The existing hospitals look to be clustered in West Lafayette, IN.

Fayette County

Fayette county is similar to Stark county. It has a smallish population with a single existing hospital. The largest town in that county is Connersville, IN.

Vigo County

Lastly, we have Vigo County, home of Terre Haute, Indiana. The population is a not as big as Tippecanoe county, but larger than Fayette and Stark. The people per existing hospital is almost 27k people. This is why I think we should consider Vigo county first, with Tippecanoe County a close second as a location for our new animal hospital.

The Power of SQL

The cool thing about the SQL query to produce the data for the map is that it is easily modified to look at a different area of interest. Here we run the query for Illinois instead of Indiana simply by adjusting the where clause.

select 
county.geom,
county.name,
max(pop.pop_2019) as pop_2019,
count(hospitals.geom) as animal_hospital_count,
CASE
WHEN count(hospitals.geom) = 0
THEN max(pop.pop_2019)
ELSE max(pop.pop_2019) / count(hospitals.geom)
END
AS persons_per_hospital
from geo_county_raw county
left join population_county pop on pop.name = CONCAT(county.name, ' County')
left join
geo_animal_hospitals_usa as hospitals
on ST_WITHIN(ST_Transform(hospitals.geom, 4326),
ST_Transform(county.geom, 4326))
where pop.state_name = 'Illinois'
and county.statefp = '17'

group by county.geom, county.name
order by persons_per_hospital desc;

Darker areas are “under-served”.

Illinois: persons per animal hospital by county

How about Colorado?

Colorado: persons per animal hospital by county

--

--