On some other DBMS’s, bitmap indexes are often used for columns that have only a few distinct values. Boolean columns, status codes and such. Compared to normal B-tree indexes, bitmap indexes are very efficient at packing duplicates.
PostgreSQL doesn’t have bitmap indexes, but GIN is also very good at handling duplicates. The internal structure of a GIN index is actually quite similar to a bitmap index. It doesn’t pack items in a bitmap, but in the upcoming 9.4 version it uses so-called varbyte encoding, which in the best case uses only a single byte per indexed row. 2-3 bytes is typical. In earlier versions, each item was packed using 6 bytes, which is still a lot less than the per-tuple overhead of a B-tree index.
For example, let’s create a table with a column, and fill it with some test data:
postgres=# create table numbers (n int4, mod2 int4); CREATE TABLE postgres=# insert into numbers (n, mod2) select n, n % 2 from generate_series(1, 10000000) n; INSERT 0 10000000
To create a GIN index as a substitute for a regular B-tree index, use the btree_gin extension that comes with PostgreSQL:
postgres=# create extension btree_gin; CREATE EXTENSION postgres=# create index numbers_mod2 on numbers using gin (mod2); CREATE INDEX
For comparison, let’s also create a regular B-tree index on it:
postgres=# create index numbers_mod2_btree on numbers using btree (mod2); CREATE INDEX
And check the sizes of the indexes:
postgres=# \\\\di+ List of relations Schema | Name | Type | Owner | Table | Size | Description --------+--------------------+-------+--------+---------+--------+------------- public | numbers_mod2_btree | index | heikki | numbers | 214 MB | public | numbers_mod2_gin | index | heikki | numbers | 10 MB | (2 rows)
GIN is 20x smaller than a B-tree index!
The above was from a 9.4devel checkout. On 9.3, the GIN index takes 58 MB instead of 10 MB. That’s still a lot smaller than a B-tree, but 9.4 will really up the ante.