GIN as a substitute for bitmap indexes

On some other DBMS’s, bitmap indexes are often used for columns that have only a few distinct values. Boolean columns, status codes and such. Compared to normal B-tree indexes, bitmap indexes are very efficient at packing duplicates.

PostgreSQL doesn’t have bitmap indexes, but GIN is also very good at handling duplicates. The internal structure of a GIN index is actually quite similar to a bitmap index. It doesn’t pack items in a bitmap, but in the upcoming 9.4 version it uses so-called varbyte encoding, which in the best case uses only a single byte per indexed row. 2-3 bytes is typical. In earlier versions, each item was packed using 6 bytes, which is still a lot less than the per-tuple overhead of a B-tree index.

For example, let’s create a table with a column, and fill it with some test data:

postgres=# create table numbers (n int4, mod2 int4);
postgres=# insert into numbers (n, mod2) select n, n % 2 from generate_series(1, 10000000) n;
INSERT 0 10000000

To create a GIN index as a substitute for a regular B-tree index, use the btree_gin extension that comes with PostgreSQL:

postgres=# create extension btree_gin;
postgres=# create index numbers_mod2 on numbers using gin (mod2);

For comparison, let’s also create a regular B-tree index on it:

postgres=# create index numbers_mod2_btree on numbers using btree (mod2);

And check the sizes of the indexes:

postgres=# \\\\di+
                               List of relations
 Schema |        Name        | Type  | Owner  |  Table  |  Size  | Description 
 public | numbers_mod2_btree | index | heikki | numbers | 214 MB | 
 public | numbers_mod2_gin   | index | heikki | numbers | 10 MB  | 
(2 rows)

GIN is 20x smaller than a B-tree index!

The above was from a 9.4devel checkout. On 9.3, the GIN index takes 58 MB instead of 10 MB. That’s still a lot smaller than a B-tree, but 9.4 will really up the ante.