Bayesian Attention Networks for Data Compression
Michael Tetelman · Published Mar 29, 2021 · arXiv
Abstract
A lossless data compression algorithm based on Bayesian Attention Networks is derived from first principles. Bayesian Attention Networks are defined by introducing an attention factor per training-sample loss as a function of two inputs: the training sample and the prediction sample. Using a sharpened Jensen's inequality, we show that the attention factor is completely determined by a correlation function of the two samples with respect to the model weights. Because of the attention factor, the solution for a prediction sample is mostly defined by the few training samples that are correlated with it. Finding a specific solution per prediction sample couples training and prediction together. To make the approach practical, we introduce a latent space: each prediction sample is mapped into the latent space, and all possible solutions are learned as a function of the latent space, along with attention as a function of the latent space and a training sample. The latent space plays the role of a context representation, with a prediction sample defining a context and a learned context-dependent solution used for the prediction.