# Our Blog

Ongoing observations by End Point Dev people

## Building Rasters in PostGIS In a past blog post I described a method I’d used to digest raw statistics from the Mexican government statistics body, INEGI, quantifying the relative educational level of residents in Mexico City. In the post, I divided these data into squares geographically, and created a KML visualization consisting of polygons, where each polygon’s color and height reflected the educational level of residents in the corresponding area. Where the original data proved slow to render and difficult to interpret, the result after processing was visually appealing, intuitively meaningful, and considerably more performant. Plus I got to revisit a favorite TV show, examine SQL’s Common Table Expressions, and demonstrate their use in building complex queries.

But the post left a few loose ends untied. For instance, the blog post built the visualization using just one large query. Although its CTE-based design rendered it fairly readable, the query remained far too complex, and far too slow, at least for general use. Doing everything in one query makes for a sometimes enjoyable mental exercise, but it also means the query has to start from zero, every time it runs, so iterative development bogs down quickly as the query must recalculate at every new iteration all the expensive stuff it already got worked out in previous runs. Further, the query in that post calculated its grid coordinates all on its own; there’s an easier method I promised to introduce, which I’d like to demonstrate here.

### Rasters

GIS databases contain information of (at least) two different types: vector data, and raster data. Vector data describes points, connected with lines and curves, and various objects derived from them, such as polygons and linestrings. Rasters, on the other hand, represent a two dimensional array of pixels, where each pixel corresponds to a certain geographic area. Each pixel in the raster will have two sets of associated coordinates: first, latitude and longitude values describing the geographic square this pixel represents, and second, the pixel coordinates within the raster, which in PostGIS begin with `(1,1)` at the origin. Rasters organize their data in “bands”, collections of boolean or numeric values, with at most one value per pixel. One band might represent the elevation above sea level for each pixel, or the surface temperature. Aerial imagery is often delivered as rasters: a full-color image might have one band for the red channel, another for green, and a third for blue. Bands may also contain no value for certain pixels, and we’ll make use of this ability later to make some parts of our final image transparent.

### The Data

The INEGI data I introduced in my previous blog post contains points representing every school in Mexico, and continuing on the theme of education, I thought it would be an interesting exercise to make an image of the country, where the color varies depending on the concentration of schools in the area. This resembles my previous work in many ways: I’m dividing the area under consideration into small parts, generating a value for each of those parts, and representing the result visually. Here, though, rather than drawing and extruding polygons, I’ll simply color a raster pixel. This simpler rendering, where the display system must simply project a static image, rather than an array of polygons, means I can support a much higher resolution grid than in my previous blog post, without overtaxing display hardware.

### Creating the Image

Having learned my lesson from the work with Mexico City data, I elected to create this image with multiple queries. In the end, I wrapped these queries in a Perl script, building sufficient intelligence therein to recreate any bits of the database I might have simply dropped, as I developed the script, and to skip reprocessing any steps for which it finds current results lying around.

PostGIS supports a data type called `raster`, and anticipating that I might want multiple rasters in my database each using a variety of settings and script modifications, I created a table to store these rasters and some extra data about them.

``````CREATE TABLE rasters (id SERIAL PRIMARY KEY, pixels INTEGER, r RASTER, comment TEXT);
``````

The `pixels` field holds the number of pixels along one edge of the raster, which the script presumes is a square. This is redundant, as I can extract that number from the raster itself, but having it stored here made things easier for my purposes.

To calculate a raster, I need to find a value for each of its pixels, or to confirm that the pixel should have no value, and for that I wanted another table. This table contains both geographic and Euclidean coordinates for each pixel in each raster, the `id` of the associated entry in the `rasters` table, a field for the number of schools I’ve counted in this square, fields for the RGB color assigned to the pixel, and a boolean flag telling me whether or not I’ve finished calculating everything for this pixel or not.

``````CREATE TABLE polys (
x INTEGER,
y INTEGER,
geom GEOMETRY,
count INTEGER,
r_val INTEGER,
b_val INTEGER,
g_val INTEGER,
processed BOOLEAN DEFAULT 'f',
raster_id INTEGER NOT NULL
);
``````

The script’s first step is to create an empty raster, which it does with PostGIS’s ST_MakeEmptyRaster() function, and to add four bands to it, one each for the pixels’ red, green, blue, and alpha values. The data type for each band is given as `8BUI`, PostGIS parlance for an 8-bit unsigned integer. The various color bands default to NULL values, and the alpha band to zeroes. The code looks something like this:

``````INSERT INTO rasters (pixels, r, comment)
SELECT
ST_MakeEmptyRaster(
numpixels,  -- The number of pixels along one edge of the raster
numpixels,
st_xmin,    -- latitude and longitude coordinates of the corner of the raster
st_ymax,
(st_xmax - st_xmin)/numpixels::float,   -- width and height of the raster
-(st_ymax - st_ymin)/numpixels::float,
0,
0,
4326        -- We'll use the WGS 84 coordinate system
),
1, '8BUI'::TEXT, NULL::DOUBLE PRECISION, NULL::DOUBLE PRECISION),
2, '8BUI'::TEXT, NULL::DOUBLE PRECISION, NULL::DOUBLE PRECISION),
3, '8BUI'::TEXT, NULL::DOUBLE PRECISION, NULL::DOUBLE PRECISION),
4, '8BUI'::TEXT, 0, 0
) AS r
FROM limits, params;
``````

The query refers to the `limits` and `params` tables (actually, they’re table expressions from elsewhere in the query from which this snippet is extracted). These tables provide values for `numpixels`, the number of pixels along one edge of the raster, provided by the user on the script’s command line, and for `st_xmin`, `st_xmax`, `st_ymin`, and `st_ymax`, which are latitude and longitude values describing a bounding box around the Mexican territory.

Before we can do too much experimentation and query development, we need a way to inspect our results, by rendering the raster from the database to an actual image file. This is one place where having a Perl wrapper script comes in handy, because doing this in pure SQL is actually something of a pain..

``````print "Creating image\n";
my \$res = \$dbh->selectall_arrayref(qq{
SELECT ST_AsPng(r, ARRAY[1,2,3,4]::INT[])
FROM rasters WHERE id = \$id
});

my \$png = \$res->;

open (my \$fh, '>', \$filename) || die "Couldn't open handle to \$filename: \$!";
binmode \$fh;

print \$fh \$png;

close \$fh;
``````

This uses ST_AsPng to create an image file under whatever path is stored in `\$filename`, and it’s helpful for reviewing incremental results as the script develops, and of course for rendering the final product. Here we’ve told it to use band 1 for the red values, band 2 for green, 3 for blue, and 4 for alpha. The alpha channel is important: eventually we’ll want to feed the result to Google Earth, and the transparency will mean we can see the ocean surrounding Mexico, without any unimportant raster data getting int he way.

Without any values in the bands, there’s nothing to look at yet—​so let’s make some data.

### Finding Polygons

We need to find the polygon corresponding to each pixel in the raster, and PostGIS gives us two ways to do it. The first is ST_PixelAsPolygons(), which returns a set of values, including the geographic rectangle the pixel represents, called `geom`, and its `x` and `y` coordinates. The other function, ST_PixelAsCentroids(), also accepts a single raster as its sole argument and returns a set of similar results, except that instead of a geographic rectangle, it returns only the point at the geographic center of the pixel. I used both as I experimented to make this visualization. The simplicity of these functions makes them far superior to the manual calculation I did in the previous blog post I’ve referred to.

This query fills the `polys` table with polygons for a particular raster:

``````INSERT INTO polys (x, y, geom, raster_id)
SELECT (ST_PixelAsPolygons(r)).*, \$id FROM rasters
-- Having code for both functions here made it easy to switch back and forth
-- SELECT (ST_PixelAsCentroids(r)).*, \$id FROM rasters
WHERE id = \$id
``````

One step that proved critical for decent performance was to filter the set of polygons so that I calculate values only for those pixels that actually intersect some part of Mexico. PostGIS rasters are regular rectangles, so a substantial portion of this raster lies over open ocean. Since we’re counting schools of people, and not of fish, we can skip all those ocean pixels, and though filtering them out takes considerable processing, we earn back that cost and more in savings later when calculating schools within each pixel. This modified version of the query above does that filtering.

``````WITH polygons AS (
SELECT (ST_PixelAsPolygons(r)).* FROM rasters
-- Having code for both functions here made it easy to switch back and forth
-- SELECT (ST_PixelAsCentroids(r)).* FROM rasters
WHERE id = \$id
),
filtered_polygons AS (
SELECT x, y, p.geom
FROM polygons p
JOIN mexico m
ON ST_Intersects(m.geom, p.geom)
GROUP BY 1, 2, 3
)
INSERT INTO polys (x, y, geom, raster_id)
SELECT x, y, geom, \$id FROM filtered_polygons
``````

This query refers to a table called `mexico`, which contains geographic representations of each of Mexico’s 31 states and the region of Mexico City, which I didn’t previously know wasn’t a state. It’s this table that gave us `st_xmin`, `st_xmax`, etc., when we made the original raster, and now we use it to filter out all pixels from our raster that don’t intersect some portion of Mexico.

At this point it would be nice to see a visual representation of our raster, to make sure the filtering worked the way we intended. We still haven’t filled the raster with any data, but we could fill it with dummy values and render the image, just to get a look at our progress. This raster modification we achieve with the ST_SetValues() function, which accepts a raster as input and returns the raster, with updated values in one of its bands. The function comes in several forms; the one we’ll use accepts an array of geomval objects, each of which contains a PostGIS geometry, and a single double precision value. Internally, the function finds the pixel or pixels corresponding to the geometry objects provided, and sets those pixels’ values in the given band accordingly. We make sure to provide `POINT` values here—​ the center point for each pixel’s corresponding polygon—​gc as `ST_SetValues` is considerably faster that way than when given `POLYGON` values themselves.

``````-- Update the original raster
UPDATE rasters
SET r = ST_SetValues(ST_SetValues(ST_SetValues(ST_SetValues(r, 1, r_geomvalset), 2, g_geomvalset), 3, b_geomvalset), 4, a_geomvalset)
FROM (
SELECT
ARRAY_AGG((geom, 255)::GEOMVAL) AS r_geomvalset,
ARRAY_AGG((geom, 255)::GEOMVAL) AS g_geomvalset,
ARRAY_AGG((geom, 255)::GEOMVAL) AS b_geomvalset,
ARRAY_AGG((ST_Centroid(geom), 255)::GEOMVAL) AS a_geomvalset
FROM polys
WHERE raster_id = \$id
) foo
WHERE id = \$id
``````

For a raster with many pixels, this takes a bit of time to accomplish. For my initial testing, I used rasters with only 50 pixels per side, which my wimpy laptop can churn out fairly rapidly. Here’s the first visual confirmation that we’re headed down the right path. Because of the low resolution, the image is small, but it’s obvious that our result follows the basic shape of Mexico, which means our polygon generation and filtering has worked. I mentioned transparency above. Note that in the default format of this blog, it would be impossible to tell that the transparency in this image was correct, so I’ve post-processed all the rendered raster images in this article to replace the transparency with a gray checkerboard pattern.

### Counting schools

Now all that remains is to count the schools in each pixel, and assign corresponding color values.

``````WITH polygons AS (
SELECT x, y, geom FROM polys
WHERE
processed = 'f'
AND raster_id = \$id
LIMIT 100
),
schoolcounts AS (
SELECT x AS sx, y AS sy, COUNT(*) AS schoolcount
FROM polygons p
JOIN sip ON (ST_Within(sip.geom, p.geom) AND sip.geografico = 'Escuela')
GROUP BY x, y, p.geom
)
UPDATE polys SET count = COALESCE(s.schoolcount, 0), processed = 't'
FROM schoolcounts s
RIGHT JOIN polygons p
ON (sx = p.x AND sy = p.y)
WHERE
polys.x = p.x AND polys.y = p.y AND raster_id = \$id
``````

This query finally refers to the actual INEGI data set I started with, a table called `sip`, in which each record includes a `POINT` object and a label. We’re interested in labels marked “Escuela”, Spanish for “School”. I tried a few different techniques here, as shown by the two different `JOIN` variants in the `schoolcounts` portion of the query. The version shown here simply counts schools within each square pixel. Later I’ll try counting schools within a certain radius of the center of the pixel. The two give different results, of course; which one is more useful depends on the end goal for these data.

Note the use of the `processed` field, and the 100 item limit in the `polygons` section of the query. Together with some logic in the surrounding Perl, this allows me to process pixels in small batches, interrupt processing when necessary, and later, resume more or less where I left off.

### Color assignment

After counting schools, I need to assign colors to the pixels. Initially I imagined this would be fairly easy, but in fact it took some detective work to get it right. My plan was to get a gradient of colors, divide the school counts into histogram-like buckets according to school density, and assign a color to each bucket. Building your own decent color gradient is surprisingly difficult, and I got mine from an online service. As with the last blog post, I figured a relatively small number of colors would be easiest. These I encoded as a PostgreSQL array, and I used PostgreSQL’s ntile window function to return the right bucket for each polygon. The results surprised me, shown here in a raster with 500 pixels on each side: It’s easy to see in that image that some pixels are colored to represent schools, but where did all the vertical bands come from? Much of Mexico is very sparsely populated, so vast swaths of our pixels have no schools at all. This means that the pixels with a school count of zero fill most of the available buckets. Pixels with schools in them are so much in the minority, for any raster of sufficient resolution, as to place all non-zero pixels in a single bucket. So I modified my approach slightly: I made my histogram only from those pixels with non-zero school counts. That query is below.

``````WITH colors AS (
SELECT ARRAY[
ARRAY[x'd4'::INT, x'ea'::INT, x'd7'::INT],
ARRAY[x'a9'::INT, x'd6'::INT, x'af'::INT],
ARRAY[x'7e'::INT, x'c1'::INT, x'88'::INT],
ARRAY[x'29'::INT, x'99'::INT, x'39'::INT]] AS colors
),
p AS (
SELECT x, y, geom, raster_id, count
FROM polys WHERE raster_id = \$id AND count IS NOT NULL AND count != 0
),
colorval AS (
SELECT
x, y, geom, raster_id,
NTILE(ARRAY_LENGTH(colors, 1)) OVER (ORDER BY count ASC) AS ntile,
colors[NTILE(ARRAY_LENGTH(colors, 1)) OVER (ORDER BY count ASC)] AS r,
colors[NTILE(ARRAY_LENGTH(colors, 1)) OVER (ORDER BY count ASC)] AS g,
colors[NTILE(ARRAY_LENGTH(colors, 1)) OVER (ORDER BY count ASC)] AS b,
255 AS a
FROM colors, p
WHERE raster_id = \$id
AND count != 0
AND count IS NOT NULL
)
UPDATE polys
SET r_val = r, g_val = g, b_val = b
FROM colorval c
WHERE c.x = polys.x AND c.y = polys.y AND c.raster_id = polys.raster_id
``````

A second, very simple query not included here found any pixels with zero schools and colored them white.

### Packaging the result

The final bit of required scripting to generate a usable image is to use `ST_SetValues` again, in a somewhat more complex form than last time, creating an array of `geomval` objects and using them to update the original raster.

``````UPDATE rasters
SET r = ST_SetValues(ST_SetValues(ST_SetValues(ST_SetValues(r, 1, r_geomvalset), 2, g_geomvalset), 3, b_geomvalset), 4, a_geomvalset)
FROM (
SELECT
ARRAY_AGG((ST_Centroid(geom), r_val)::GEOMVAL) FILTER (WHERE r_val IS NOT NULL) AS r_geomvalset,
ARRAY_AGG((ST_Centroid(geom), g_val)::GEOMVAL) FILTER (WHERE g_val IS NOT NULL) AS g_geomvalset,
ARRAY_AGG((ST_Centroid(geom), b_val)::GEOMVAL) FILTER (WHERE b_val IS NOT NULL) AS b_geomvalset,
ARRAY_AGG((ST_Centroid(geom), 255)::GEOMVAL) AS a_geomvalset
FROM polys
WHERE raster_id = \$id
) foo
WHERE id = \$id
``````

This gives me something like the image below, with 500 pixels on each edge of the raster: This image does the job, but contains an awful lot of empty space. I mentioned above that instead of calculating the exact number of schools within a pixel’s polygon, I thought I might try calculating the number of schools within a specific distance of the pixel’s center; perhaps that technique would lead to a more visually appealing result. This is simple enough, using the `ST_PixelAsCentroids()` function to get the centroids of the pixels’ polygons, and then modifying the `schoolcounts` CTE as shown here. The key modification—​in fact, the only modification from the earlier version shown above—​is in the `JOIN` clause, which uses `ST_Buffer` to create a circle surrounding a point, in this case, with an arbitrarily chosen radius, and counts the number of schools found within that circle.

``````schoolcounts AS (
SELECT x AS sx, y AS sy, COUNT(*) AS schoolcount
FROM polygons p
JOIN sip ON (ST_Within(sip.geom, ST_Buffer(p.geom, 0.4)) AND sip.geografico = 'Escuela')
GROUP BY x, y, p.geom
)
``````

The resulting image, again with 500x500 pixel resolution, shows evidence of its circle-based counting heritage. Obviously at this point, a person could mess with any number of variables to get all sorts of results. But for me it was enough to embed this as a GroundOverlay object within a simple KML file, to see the result in Google Earth: 