
Compression

Field values in MetaChunks and DataChunks, as well as output cohorts, are compressed to reduce the storage footprint. COOL uses different strategies tailored to the type of data being compressed.

Compressor

A Compressor contains one specific compression strategy for a list of values. COOL currently provides the following compressors.

BitVectorCompressor

A list of non-negative integers is compressed into a bit vector of n bits, where n is the maximum value in the list. A bit is set if the corresponding integer is present in the list.
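The idea can be illustrated with a minimal sketch built on java.util.BitSet. The class name BitVectorSketch and its methods are hypothetical and are not COOL's BitVectorCompressor API; the sketch only shows how presence bits replace the integers themselves.

```java
import java.util.BitSet;

// Hypothetical illustration of bit-vector encoding; not COOL's implementation.
final class BitVectorSketch {

    // Sets bit i for every value i in the input. Duplicates collapse into one bit.
    static BitSet compress(int[] values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
            bits.set(v);
        }
        return bits;
    }

    // Recovers the distinct values in ascending order.
    static int[] decompress(BitSet bits) {
        return bits.stream().toArray();
    }
}
```

Calling BitVectorSketch.compress(new int[]{3, 0, 5, 3}) turns on bits 0, 3, and 5, which fit in a single byte. Note that this representation keeps only which values are present: the original order and any duplicates are not recoverable, so it suits set-like data.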

DeltaCompressor

A list of integers is compressed with delta encoding. The min and max values are recorded, and the deltas are fed into a ZIntCompressor. It is applied to Metric fields.
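A minimal sketch of this step follows. It assumes that each delta is taken relative to the recorded minimum (the text above does not say whether deltas are relative to the minimum or to the previous value), and the names DeltaSketch and Encoded are hypothetical rather than part of COOL.

```java
// Hypothetical illustration of delta encoding against the minimum; not COOL's implementation.
final class DeltaSketch {

    // Holds the recorded bounds and the per-value deltas.
    static final class Encoded {
        final int min;
        final int max;
        final int[] deltas;

        Encoded(int min, int max, int[] deltas) {
            this.min = min;
            this.max = max;
            this.deltas = deltas;
        }
    }

    // Records min and max, then stores each value as its offset from min.
    // Assumes max - min fits in an int.
    static Encoded encode(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        int[] deltas = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - min;
        }
        return new Encoded(min, max, deltas);
    }

    // Adds min back to every delta to restore the original values.
    static int[] decode(Encoded e) {
        int[] out = new int[e.deltas.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = e.deltas[i] + e.min;
        }
        return out;
    }
}
```

For the input {1000, 1003, 1001, 1010}, the deltas are {0, 3, 1, 10}. Every delta lies between 0 and max - min, so a narrow value range yields small numbers that a byte-packing stage can store in fewer bytes than the raw values.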

LZ4JavaCompressor

LZ4 is used to compress string values in COOL.

RLECompressor

Run length encoding (RLE) is applied to an integer sequence in which the same data value occurs consecutively. It is applied to UserKey.
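As a rough illustration, the sketch below encodes an integer sequence as (value, run length) pairs. The class RleSketch and the pair layout are assumptions made for the example; the layout COOL actually writes may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of run-length encoding; not COOL's implementation.
final class RleSketch {

    // Emits a flat array of (value, runLength) pairs, one pair per run.
    static int[] encode(int[] values) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            int j = i;
            // Advance j to the end of the run that starts at i.
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            out.add(values[i]);
            out.add(j - i);
            i = j;
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Expands each (value, runLength) pair back into repeated values.
    static int[] decode(int[] pairs) {
        int total = 0;
        for (int k = 1; k < pairs.length; k += 2) {
            total += pairs[k];
        }
        int[] out = new int[total];
        int pos = 0;
        for (int k = 0; k < pairs.length; k += 2) {
            Arrays.fill(out, pos, pos + pairs[k + 1], pairs[k]);
            pos += pairs[k + 1];
        }
        return out;
    }
}
```

The sequence {7, 7, 7, 2, 2, 9} becomes {7, 3, 2, 2, 9, 1}. The encoding pays off only when runs are long, which is why it fits a field whose rows are grouped by the same key.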

SimpleBitSetCompressor

It compresses bit sets with RLE and is used to store a pre-calculated bit set of a field.

ZIntCompressor

The ZIntCompressor uses the fewest bytes (1, 2, or 4) needed to pack integers into a buffer. It is used to compress the values of a field in a DataChunk.
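The width selection can be sketched as follows, assuming non-negative inputs and a single width chosen from the largest value. The class ZIntSketch and the header layout (one width byte followed by a 4-byte count) are invented for this example and do not describe COOL's actual buffer format.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of fixed-width byte packing; not COOL's implementation.
final class ZIntSketch {

    // Returns the smallest of 1, 2, or 4 bytes that can hold max as an unsigned value.
    static int widthFor(int max) {
        if (max < 0) {
            throw new IllegalArgumentException("negative value: " + max);
        }
        if (max <= 0xFF) {
            return 1;
        }
        if (max <= 0xFFFF) {
            return 2;
        }
        return 4;
    }

    // Layout assumed here: [width: 1 byte][count: 4 bytes][values: width bytes each].
    static ByteBuffer pack(int[] values) {
        int max = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
            max = Math.max(max, v);
        }
        int width = widthFor(max);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + width * values.length);
        buf.put((byte) width);
        buf.putInt(values.length);
        for (int v : values) {
            switch (width) {
                case 1: buf.put((byte) v); break;
                case 2: buf.putShort((short) v); break;
                default: buf.putInt(v);
            }
        }
        buf.flip();
        return buf;
    }

    // Reads the header, then widens each stored value back to an int.
    static int[] unpack(ByteBuffer buf) {
        ByteBuffer in = buf.duplicate();
        int width = in.get();
        int count = in.getInt();
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            switch (width) {
                case 1: out[i] = in.get() & 0xFF; break;
                case 2: out[i] = in.getShort() & 0xFFFF; break;
                default: out[i] = in.getInt();
            }
        }
        return out;
    }
}
```

With this layout, {1, 200, 255} packs into 8 bytes (5 header bytes plus 3 one-byte values), while {1, 300} needs two bytes per value because 300 exceeds 255.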

ZIntBitCompressor

It is used to compress integer field values when the other integer compressors are not suitable. It is also currently used to compress the cohort result in cohort selection.

Histogram

The Histogram describes the characteristics of a sequence of values, such as whether it is sorted and its min, max, and count.

CompressorAdvisor and CompressorFactory

CompressorAdvisor picks a suitable compressor based on the Histogram, and CompressorFactory creates it.
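The division of labour between the three components might look like the sketch below. Everything in it is hypothetical: the Histogram fields, the three-value Codec enum, the selection rules in advise, and the placeholder Compressor interface are invented to show the flow from statistics to codec to compressor, and they are not COOL's real rules or signatures. Values are assumed to be non-negative.

```java
// Hypothetical illustration of the histogram -> advisor -> factory flow; not COOL's implementation.
final class AdvisorSketch {

    enum Codec { RLE, DELTA, ZINT } // illustrative subset only

    interface Compressor { String name(); } // placeholder for a real compressor

    // Summary statistics gathered in one pass over the values.
    static final class Histogram {
        final int min;
        final int max;
        final int count;
        final boolean sorted;

        Histogram(int min, int max, int count, boolean sorted) {
            this.min = min;
            this.max = max;
            this.count = count;
            this.sorted = sorted;
        }
    }

    static Histogram describe(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int min = values[0];
        int max = values[0];
        boolean sorted = true;
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
            if (values[i] < values[i - 1]) {
                sorted = false; // not in non-decreasing order
            }
        }
        return new Histogram(min, max, values.length, sorted);
    }

    // Bytes (1, 2, or 4) needed to store a non-negative value.
    static int width(int v) {
        return v <= 0xFF ? 1 : (v <= 0xFFFF ? 2 : 4);
    }

    // Illustrative rules: sorted data groups equal values into runs, so use RLE;
    // otherwise use DELTA only when offsets from min need fewer bytes than raw values.
    static Codec advise(Histogram h) {
        if (h.sorted) {
            return Codec.RLE;
        }
        if (width(h.max - h.min) < width(h.max)) {
            return Codec.DELTA;
        }
        return Codec.ZINT;
    }

    // The factory maps the chosen codec to a compressor instance.
    static Compressor create(Codec codec) {
        switch (codec) {
            case RLE: return () -> "RLECompressor";
            case DELTA: return () -> "DeltaCompressor";
            default: return () -> "ZIntCompressor";
        }
    }
}
```

Under these made-up rules, {1000, 1003, 1001, 1010} is unsorted with a range of 10, so advise returns DELTA and create returns the placeholder named DeltaCompressor. Keeping the decision in the advisor and the construction in the factory means a new codec touches each in one place.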

OutputCompressor

OutputCompressor writes out data in compressed format.

Extension

Developers are welcome to introduce new compression methods by implementing the Compressor interface, modifying the CompressorFactory with a matching Codec, and changing the codec assignment for an internal data structure in CompressorAdvisor.