Abstract
We present Pcodec (Pco), a format and algorithm for losslessly compressing numerical (float or integer) sequences. Pco's core and most novel component is a binning algorithm that quickly converges to the true entropy of smoothly, independently, and identically distributed (SIID) integers. We mathematically prove this convergence with a practical bound. To accommodate data this is not SIID, Pco has two opinionated preprocessing steps. The first step, Pco's mode, decomposes the numbers into more smoothly distributed integer latent variables. The second step, delta encoding, makes the latents more independently and identically distributed. We demonstrate that Pco achieves 29-94% higher compression ratio than other numerical codecs on six real-world columnar datasets while using less compression time.