Columnar Data Storage

Storing data on disk in a columnar form is widely regarded as delivering better query performance for analytic query processing workloads. PADB’s columnar storage implementation is transparent to applications and users – data is retrieved via SQL as with any relational database.

The catalyst for column-oriented storage is that functions like scans, joins, aggregations, and sorts are cumbersome for standard row-wise databases to perform because they must read an entire record to use any of its fields. Row-wise storage is highly appropriate for transaction processing, which involves quickly writing or retrieving a whole record by its ID (e.g., a customer account record). During analytic processing, however, when a business analyst is determining the number of customers with a certain characteristic in common, this would result in what is known as a “full table scan” (all columns and all rows must be read).

Columnar Data Storage

Row-wise databases offer workarounds to improve performance for read-mostly queries (e.g., indexes, materialized views, summary tables, subsets), but with increased costs. The costs are attributable to increased system and administration resource requirements as well as system size and data redundancy. Specifically, adding indexes and other workarounds reduces load performance and requires attention to design, implement, and maintain the structures. These elements add latency to data availability. According to Forrester’s Boris Evelson, these types of structures and redundancy can increase system storage requirements by as much as 8-to-1 or 16-to-1.

In sharp contrast, column-wise storage is very much aligned with analytic queries because they are typically concerned with only a fraction of the columns defined in a table. By reducing the processing to only the columns of interest, PADB greatly reduces I/O processing, which directly improves performance. Furthermore, PADB avoids indexes and other data structures because columnar storage, by design, lends itself to the same kind of performance that these structures are intended to provide.

How Does Columnar Orientation Impact Design Considerations?

PADB’s columnar implementation is transparent to applications so designers can focus on application functionality and not physical design. Tables are created with SQL, the same as with any relational database, but the data blocks are written by column instead of row. PADB automatically converts the records into columns as it writes the data to disk. To an end user, it behaves no differently than a row-wise database, except it’s faster.