Learning Through Differences: Reducing Data Movement in Modern AI Training
- Ville Karlsson

Training modern AI systems is becoming increasingly expensive. The largest models today require enormous amounts of electricity, and if the trends continue, the models of tomorrow will require even more. While it is natural to think that the main expense comes from doing computation, a significant part of the cost actually comes from moving data around inside computers. During training, models constantly send large amounts of information back and forth between memory and processing units, which quickly racks up the energy costs.
There are two main ways to reduce these energy costs: optimizing the hardware, or optimizing the software.

One promising direction comes from looking at how the brain works. Biological neurons do not constantly transmit detailed numerical values to each other; instead, they communicate using short electrical events called action potentials, or spikes. These spikes occur only when something important changes, and most of the time the neurons remain silent. Inspired by this, researchers have developed Spiking Neural Networks (SNNs), which aim to process information in a similarly event-driven way to the human brain.
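As a toy illustration of what "event-driven" means here (this is not a model from the paper, and the threshold value is an arbitrary choice), the sketch below transmits an event only when a signal has moved far enough since the last transmission; small fluctuations stay silent:

```python
import numpy as np

def events_from_signal(signal, threshold=0.5):
    """Emit an event only when the signal has changed by more than
    `threshold` since the last event; otherwise stay silent."""
    events = []
    last_sent = signal[0]
    for t, x in enumerate(signal[1:], start=1):
        if abs(x - last_sent) > threshold:
            events.append((t, x - last_sent))  # (time, size of change)
            last_sent = x
    return events

signal = np.array([0.0, 0.1, 0.15, 1.0, 1.05, 1.1, 0.2])
print(events_from_signal(signal))  # only two events for seven samples
```

Only the two large jumps (at t=3 and t=6) produce any communication at all; the rest of the trace costs nothing to transmit.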
However, training these networks efficiently is no easy task. Most modern AI systems rely on a method called backpropagation, which does yield great results, but is also very communication-heavy. Backpropagation requires sending detailed “error” information backward through the entire network so that each connection can be adjusted. This process typically involves dense, high-precision numerical values and repeated full updates. That style of training does not naturally fit with event-driven, sparse communication.
An alternative is predictive coding, a computational framework inspired by theories of learning in the brain. In predictive coding, each layer of a network tries to predict the activity of the next layer. Instead of sending full information forward and backward, the network focuses on sending only the difference between what was predicted and what actually happened. Learning happens by gradually reducing these differences.
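A minimal sketch of that idea, under simplifying assumptions (a single pair of layers, a fixed target activity, and a plain delta-rule update rather than a full predictive coding network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer 0 tries to predict layer 1's activity through weights W.
x0 = rng.normal(size=4)            # activity of the lower layer
x1_target = rng.normal(size=3)     # observed activity of the next layer
W = rng.normal(size=(3, 4)) * 0.1  # prediction weights (assumed init)

lr = 0.05
for step in range(200):
    prediction = W @ x0
    error = x1_target - prediction    # only this difference is communicated
    W += lr * np.outer(error, x0)     # adjust weights to shrink the error

print(np.abs(x1_target - W @ x0).max())  # prediction error shrinks toward 0
```

The key point is in the loop: learning is driven entirely by `error`, the mismatch between prediction and reality, rather than by the full activity values themselves.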
But even standard predictive coding often relies on dense numerical messages during training, which limits its potential efficiency gains. This is where our work comes in.
In our recent work, we introduce Difference Predictive Coding (DiffPC), a version of predictive coding designed to work more naturally with spiking, event-driven networks. The core idea is simple: instead of repeatedly transmitting full numerical states, the network communicates only small incremental changes, and it does so using sparse spike-like signals. If nothing significant changes, nothing is sent. In this way, communication becomes tied directly to meaningful updates.
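To make the difference-communication idea concrete, here is a toy sketch (the actual DiffPC algorithm uses spike-like discrete messages inside a predictive coding network; the threshold here is an arbitrary illustrative choice): a sender transmits only the entries of its state that have changed significantly since they were last communicated, and the receiver reconstructs the state by accumulating those sparse deltas.

```python
import numpy as np

def send_differences(states, threshold=0.1):
    """For each state, transmit a sparse delta containing only the
    entries that changed by more than `threshold` since they were
    last sent; the receiver accumulates deltas to track the state."""
    last_sent = np.zeros_like(states[0])
    messages = []
    for state in states:
        delta = state - last_sent
        mask = np.abs(delta) > threshold          # significant changes only
        sparse_delta = np.where(mask, delta, 0.0)
        messages.append(sparse_delta)
        last_sent = last_sent + sparse_delta      # receiver-side reconstruction
    return messages, last_sent

states = [np.array([0.0, 0.5, 0.0]),
          np.array([0.05, 0.5, 0.9]),    # small drift + one big change
          np.array([0.05, 0.5, 0.95])]
msgs, reconstructed = send_differences(states)
print([int((m != 0).sum()) for m in msgs])  # → [1, 1, 0]
```

The third message is empty: nothing significant changed, so nothing is sent, yet the receiver's reconstruction stays within the threshold of the true state.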
We demonstrate that predictive coding can be reformulated to operate using discrete, sparse signals while maintaining competitive accuracy on standard image classification tasks. In our experiments, this approach reduces the amount of information transmitted during training by orders of magnitude when compared to conventional predictive coding implementations.
More broadly, the goal is to better align learning algorithms with the hardware they run on. Energy-efficient, brain-inspired computing systems are designed to exploit sparse, event-driven communication. Training methods that depend on dense, high-precision data exchange limit that potential. By structuring learning around the communication of differences, our work shows a concrete path toward reducing data movement, one of the key contributors to the cost of training modern neural networks.



