Logo preload
close Logo

Some Tips on Using Arrays in PostgreSQL

October 9, 2018

PostgreSQL is an amazing open-source database that has a lot of features that many users may not use or even know exist. One of these features is the ability to store data as arrays in a column.

blogpost header image

Once you have collected data in Fulcrum, you may want to export it into a local database within your organization. Fulcrum provides a few tools to accomplish this, including Fulcrum Desktop, the Fulcrum Export Wizard, and the Fulcrum API.

PostgreSQL is a popular relational database with many Fulcrum users. If you’ve exported your data to PostgreSQL, you’ve probably noticed that many Fulcrum field types, including multiple choice fields, classification fields, and media fields (photo, video, audio) produce array columns in your database. In fact, media fields may create multiple arrays if you use captions. This post will provide a few techniques for efficiently working with array types in PostgreSQL to help you get the most out of the data you’ve collected in Fulcrum.

‍Arrays and Array Functions

PostgreSQL allows a table column to contain multi-dimensional arrays that can be of any built-in or user-defined data type. The official documentation for arrays can be found here. Arrays have an advantage over large plain text fields in that data remains in a discreet and addressable form. PostgreSQL also has a lot of functions dealing with arrays that can be found here.

‍Create an Array Column in PostgreSQL

For purposes of this post, we will assume that we are going to create a table of people that contains a column called aliases that is a text array of various other names for a person. Creating an array column in PostgreSQL is as easy as appending the [ ] operator after your data type in a create statement such as this:

CREATE TABLE people
(
id serial,
full_name text,
aliases text[] );

Here, the aliases column is specified as our array column by using the text[ ] specifier. Note you can also do multi-dimensional arrays by appending additional [ ] specifiers together.

‍Inserting and Updating Array Data

Inserting data into a table with an array column can be as easy as:

insert into people (full_name, aliases) values (‘Abraham Lincoln’, ‘{“President Lincoln”, “Honest Abe”}’);

Doing an insert into an array column requires using the { } operator to specify the array and using double quotes (“) and a comma to delimit the separate values going into the array. We can see how the data went into our table by running:

blogtest=# select * from people;
id |    full_name    |              aliases
—-+—————–+————————————
1 | Abraham Lincoln | {“President Lincoln”,”Honest Abe”}
(1 row)

blogtest=#

Our data are indeed separate inside of PostgreSQL looking at the aliases column. You can access specific elements in the array by also using the [ ] notation in selects.

blogtest=# select full_name, aliases[1] from people;
full_name    |      aliases
—————–+——————-
Abraham Lincoln | President Lincoln
(1 row)

blogtest=#

If you are a software developer, please note that array notation in PostgreSQL starts at element one and not zero as it does in most programming languages. You can use some of the more basic operators in PostgreSQL against array elements.

blogtest=# select * from people where aliases[1] <> aliases[2];
id |    full_name    |              aliases
—-+—————–+————————————
1 | Abraham Lincoln | {“President Lincoln”,”Honest Abe”}
(1 row)

PostgreSQL lets us update and replace existing elements in an array using fairly standard SQL syntax as well.

blogtest=# update people set aliases[1] = ‘President Abraham Lincoln’ where id = 1;
UPDATE 1
blogtest=# select * from people;
id |    full_name    |                  aliases
—-+—————–+——————————————–
1 | Abraham Lincoln | {“President Abraham Lincoln”,”Honest Abe”}
(1 row)

blogtest=#

If you already know how many elements are in the array, you can add to it using standard syntax.

blogtest=# update people set aliases[3] = ‘President Lincoln’ where id = 1;
UPDATE 1
blogtest=# select * from people;
id |    full_name    |                            aliases
—-+—————–+—————————————————————-
1 | Abraham Lincoln | {“President Abraham Lincoln”,”Honest Abe”,”President Lincoln”}
(1 row)

blogtest=#

However, this is not always convenient as you may not be operating on a single entity or you may have a table with varying numbers of array sizes. In this case, we have to use the array functions that PostgreSQL provides. For example, if we want to append a value to an array we would use a query that makes use of the array_append function.

blogtest=# update people set aliases = array_append(aliases, ‘Not Abe Vigoda’) where id = 1;
blogtest=# UPDATE 1
blogtest=# select * from people;
id |    full_name    |                                     aliases
—-+—————–+———————————————————————————
1 | Abraham Lincoln | {“President Abraham Lincoln”,”Honest Abe”,”President Lincoln”,”Not Abe Vigoda”}
(1 row)

Similarly, we can remove array elements using the array_remove function so we do not have to specify the exact element position.

blogtest=# update people set aliases = array_remove(aliases, ‘Not Abe Vigoda’) where id = 1;
UPDATE 1
blogtest=# select * from people;
id |          full_name          |                            aliases
—-+—————————–+—————————————————————-
1 | Abraham Lincoln             | {“President Abraham Lincoln”,”Honest Abe”,”President Lincoln”}
(1 rows)

‍Indexing and Searching

As arrays are a different type from ordinary columns, we have to do some things a bit differently to optimize their use. Fortunately, PostgreSQL adds some special operators for searching arrays that make use of certain types of indices to speed things up.

To continue with the examples above, we need to add some more entries to our people database.

blogtest=# insert into people (full_name, aliases) values (‘Henry Charles Albert David’, ‘{“Prince Harry”, “Duke of Sussex”}’);
INSERT 0 1
blogtest=# insert into people (full_name, aliases) values (‘William Arthur Philip Louis’, ‘{“Prince William”, “Duke of Cambridge”}’);
INSERT 0 1
blogtest=# insert into people (full_name, aliases) values (‘George Alexander Louis’, ‘{“Prince George”, “Prince George of Cambridge”}’);
INSERT 0 1
blogtest=#

Normal SQL search operations will not properly work on array columns. If you try to find all the rows that have an alias of Prince Harry, you will get the following:

blogtest=# select * from people where aliases = ‘Prince Harry’;
ERROR:  malformed array literal: “Prince Harry”
LINE 1: select * from people where aliases = ‘Prince Harry’;
^
DETAIL:  Array value must start with “{” or dimension information.
blogtest=#

PostgreSQL provides the contains operator (@>) for this type of search. If we rewrite the above query, we get this:

blogtest=# select * from people where aliases @> ‘{“Prince Harry”}’;
id |         full_name          |              aliases
—-+—————————-+———————————–
3 | Henry Charles Albert David | {“Prince Harry”,”Duke of Sussex”}
(1 row)

In the real world we would use an index to speed up searches on a column. Array columns are no different in this respect. However, PostgreSQL recommends that we use the GIN index type for static column data and the GIST index type for dynamic data. The contains operator will make use of a GIN- or GIST-type index if they are present.

We can test this with our small example database. First we run the explain command for the above query that searches for anyone with an alias of Prince Harry.

blogtest=# explain select * from people where aliases @> ‘{“Prince Harry”}’;
QUERY PLAN
——————————————————–
Seq Scan on people  (cost=0.00..20.62 rows=4 width=68)
Filter: (aliases @> ‘{“Prince Harry”}’::text[])
(2 rows)

We see that this query has to walk every row in the table to find what we are seeking. Now we create a GIN index on the aliases column and see what happens.

blogtest=# create index idx_gin_aliases on people using GIN(aliases);
CREATE INDEX
blogtest=# explain select * from people where aliases @> ‘{“Prince Harry”}’;
QUERY PLAN
——————————————————-
Seq Scan on people  (cost=0.00..1.05 rows=1 width=68)
Filter: (aliases @> ‘{“Prince Harry”}’::text[])
(2 rows)

blogtest=#

The cost does indeed go down for our simple test database. This can make a big difference if you have many rows in your database as the GIN index allows the contains operator to do a simple lookup to find the row results.

‍Applying Functions Against an Array Column

We will wrap up this blog post by showing how you can run functions in PostgreSQL against an array column. Most functions normally expect each column to only contain a single value. Trying to run these functions against an array will result in an error. Consider the example of trying to use the similarity function to check which strings are alike.

newsarticles=> select id from people where similarity(aliases, ‘Prince Harry’) > 0.5;
ERROR:  function similarity(text[], unknown) does not exist
LINE 1: select id from people where similarity(aliases, ‘Prince Harr…
^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

This does not work as the similarity function has no idea how to handle an array data type. For this to work, we have to make use of the unnest function in PostgreSQL. This function takes an array and converts it into a standard set of individual roles, allowing us to run functions against it. We need to modify our query to look something like this:

newsarticles=> SELECT * FROM people, UNNEST(people.aliases) p WHERE SIMILARITY(p, ‘Prince Harry’) > 0.5;
id    |         full_name          |              aliases              |      p
———+—————————-+———————————–+————–
1002056 | Henry Charles Albert David | {“Duke of Sussex”,”Prince Harry”} | Prince Harry
(1 row)

Running explain against the query shows that PostgreSQL does indeed use unnest to convert the array into a series of separate rows.

newsarticles=> explain SELECT * FROM people, UNNEST(people.aliases) p WHERE SIMILARITY(p, ‘Prince Harry’) > 0.5;
QUERY PLAN
———————————————————————————
Nested Loop  (cost=0.00..2449266.80 rows=43604385 width=104)
->  Seq Scan on people  (cost=0.00..31205.45 rows=1321345 width=72)
->  Function Scan on unnest p  (cost=0.00..1.50 rows=33 width=32)
Filter: (similarity(p, ‘Prince Harry’::text) > ‘0.5’::double precision)
(4 rows)

We hope this post gives you a solid introduction to using arrays in PostgreSQL and speeding up queries against them. The techniques discussed here can help you work more efficiently with data you export from Fulcrum and minimize data transformations you may implement to capitalize on your data once it’s on your systems.

Fulcrum is a data collection platform that enables businesses to reduce costs, access critical data in real time, and improve decision making at every level. With Fulcrum, you can create custom apps using our simple drag-and-drop builder to turn your paper documents into digital forms that your field teams can quickly complete on mobile devices.