About

elwood-spatial is an information-theoretic outlier detection library for temporal-spatial networks. It was originally developed for detecting outliers in air-sensor networks (measuring PM2.5 during smoke events), but it can be applied more broadly to other use cases in environmental monitoring and beyond. This package implements the methods described in the paper Detecting outliers in PM2.5 air sensor networks during smoke events using information theory and machine learning by Stuart J. Illson and Karoline K. Barkjohn.

Citing This Package

If you use elwood-spatial in your research, please cite it as follows:

Illson, S. (2026). elwood-spatial: Information-Theoretic Outlier Detection
for Spatial Networks (v0.1.1). Zenodo. https://doi.org/10.5281/zenodo.18856270

DOI: 10.5281/zenodo.18856270

Why Information Theory?

Information theory, first introduced by Claude (Elwood) Shannon in 1948, provides a mathematical framework for quantifying uncertainty and surprise in data. Originally developed to optimize the transmission of messages over communication channels, it has since been extended to fields as diverse as genetics, neuroscience, climate science, and machine learning.

The two principal concepts, entropy and information content, offer potential utility in interpreting sensor behavior within a network. Entropy captures the degree of disorder, or unpredictability, within a system. In a sensor network, high entropy reflects instability, such as conditions where smoke intrusion causes nearby devices to report drastically different measurements. Conversely, low entropy indicates broad agreement and observed stability across the network.

The information content of a device within its network reflects the unexpectedness, or surprise, of a given value being seen relative to its neighbors. This can characterize the state of the network as a whole, as well as identify individual devices that deviate from their local network.
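As a concrete illustration of these two quantities (a minimal sketch, not the package's API; the function names and the categorical bins are assumptions):

```python
import numpy as np
from collections import Counter

def shannon_entropy(bins):
    """Shannon entropy (in bits) of a set of binned readings."""
    counts = np.array(list(Counter(bins).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_content(device_bin, neighbor_bins):
    """Surprise (in bits) of a device's bin relative to its neighbors."""
    p = Counter(neighbor_bins)[device_bin] / len(neighbor_bins)
    if p == 0:
        return float("inf")  # never observed among neighbors: maximally surprising
    return -float(np.log2(p))

# Neighbors largely agree (bin "low"), so network entropy is low and a
# device reporting "high" carries a lot of surprise:
neighbors = ["low", "low", "low", "low", "high"]
print(shannon_entropy(neighbors))              # ~0.72 bits: ordered network
print(information_content("high", neighbors))  # ~2.32 bits: surprising reading
```

When every neighbor falls in the same bin, entropy drops to zero and any disagreeing device becomes infinitely surprising, which is the intuition behind pairing the two measures.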

These information-theoretic concepts work in concert: they accommodate networks of varying size and device type, while providing measures of disorder that remain meaningful during rapidly evolving events, such as wildfire smoke intrusion.

How It Works in Practice

Information-theoretic methods inherently scale with network density, adapting to networks where any node may have many or few neighboring observations, allowing adoption without reparameterization. They can detect non-linear relationships in the data, making them robust to complex and shifting patterns.

One of the key advantages is that the framework defers classification during periods of high uncertainty: when the network becomes disordered, premature characterizations are avoided. Because the approach is contingent on network agreement, the network size and parameter sensitivity can be tuned to tolerate varying levels of disorder across spatial proximities or event types, from highly localized anomalies to very large events. This prevents the removal of potentially valuable data under unstable conditions.

One caveat is that measurements must be discretized into categorical bins; this can be done according to your domain-specific criteria. The detection rule flags a device as an outlier only when three conditions are simultaneously met:

  1. High information content: the device's measurement is surprising relative to its network.
  2. Low network entropy: neighboring devices are in broad agreement (an ordered network).
  3. Bin deviation: the device's bin classification is sufficiently distant from the network average in measurement space.

The full outlier detection equation, its component equations, and the variable definitions are given in the paper.

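The rule above can be sketched as a conjunction of three thresholded checks. This is an illustrative sketch, not the package's API: the helper names and the threshold values (`i_min`, `h_max`, `d_min`) are placeholders, not the paper's parameters.

```python
import numpy as np
from collections import Counter

def entropy_bits(bins):
    """Shannon entropy (bits) of the neighbors' bin distribution."""
    counts = np.array(list(Counter(bins).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def surprise_bits(device_bin, neighbor_bins):
    """Information content (bits) of the device's bin among its neighbors."""
    p = Counter(neighbor_bins)[device_bin] / len(neighbor_bins)
    return float("inf") if p == 0 else -float(np.log2(p))

def is_outlier(device_bin, neighbor_bins, i_min=2.0, h_max=1.0, d_min=2.0):
    """Flag a device only when all three conditions hold simultaneously."""
    high_information = surprise_bits(device_bin, neighbor_bins) >= i_min
    low_entropy = entropy_bits(neighbor_bins) <= h_max
    far_from_mean = abs(device_bin - np.mean(neighbor_bins)) >= d_min
    return bool(high_information and low_entropy and far_from_mean)

# Neighbors agree on bin 1; a device reporting bin 4 is flagged.
print(is_outlier(4, [1, 1, 1, 1, 2]))  # True
```

Requiring all three conditions at once is what lets the rule stand down during disordered periods: when network entropy rises, the second check fails and no flag is raised, however surprising any single reading looks.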
Temporal Context and Machine Learning

While the rule-based equation operates on a single time step, the research also explored extending the framework by incorporating temporal behavior through a gradient-boosted decision tree (XGBoost) model. By adding features that capture recent measurement history, the model can identify faulty devices whose behavior is diverging from their neighbors over time, even when the instantaneous deviation is subtle.
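As a sketch of the feature-engineering idea (not the package's or the paper's actual feature set; the window length, column names, and sample values are assumptions), recent measurement history can be captured with rolling statistics per device:

```python
import pandas as pd

# Hypothetical hourly PM2.5 readings for one device
readings = pd.DataFrame({"pm25": [10.0, 11.0, 10.5, 55.0, 60.0, 58.0]})

# Temporal features summarizing recent measurement history,
# suitable as inputs to a gradient-boosted tree model
window = 3
readings["rolling_mean"] = readings["pm25"].rolling(window, min_periods=1).mean()
readings["rolling_std"] = readings["pm25"].rolling(window, min_periods=1).std()
readings["delta"] = readings["pm25"].diff()

print(readings)
```

Features like these let a model distinguish a device that jumped with its neighbors from one that is steadily drifting away from them.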

Amending with Coefficient of Variation

The coefficient of variation (CV) can be used to amend the outlier list after the information-theoretic rule has run. CV measures how dynamic a device's readings have been over a recent window (typically 24 hours), expressed as a percentage of the mean.

This amendment works in two directions:

  • Adding stuck sensors: A device reporting a high value with very low CV (e.g. ≤ 7.5%) is likely flatlined or stuck: it hasn't changed over time. These sensors may not be caught by the information-theoretic rule if their network is in broad agreement with the stuck reading. The CV check catches them independently.
  • Removing dynamic sensors: Conversely, a flagged sensor with high CV is actively changing. This suggests it may be responding to a real event, even if its readings are anomalous. Removing it from the outlier list gives the event time to resolve naturally.

Choosing an approach depends on your use case. For example, when considering public health messaging, removing high-CV sensors from the outlier list allows genuine anomalous events to resolve without prematurely discarding valid data. For sensor maintenance and QA/QC, adding low-CV sensors helps catch stuck devices early, before they degrade data quality over time. The CV threshold and value criteria should be tuned to your domain; they are not universal constants. See the Inspecting Parameters guide for code examples.
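A minimal sketch of the two-way CV amendment (the 7.5% CV threshold comes from the example above, but the `high_value` cutoff, function names, and sample data are illustrative assumptions, not package defaults):

```python
import numpy as np

def coefficient_of_variation(window_values):
    """CV over a recent window (e.g. 24 hours), as a percentage of the mean."""
    values = np.asarray(window_values, dtype=float)
    mean = values.mean()
    return float(values.std() / mean * 100.0) if mean != 0 else 0.0

def amend_outliers(flagged, history, cv_stuck=7.5, high_value=100.0):
    """Add stuck sensors (high value, low CV); remove dynamic flagged sensors."""
    amended = set(flagged)
    for device, window in history.items():
        cv = coefficient_of_variation(window)
        if window[-1] >= high_value and cv <= cv_stuck:
            amended.add(device)       # likely flatlined at a high value
        elif device in amended and cv > cv_stuck:
            amended.discard(device)   # actively changing: may be a real event
    return amended

history = {
    "stuck": [150.0] * 24,             # flatlined high reading: CV = 0
    "dynamic": [10, 30, 80, 200] * 6,  # rapidly changing: high CV
}
print(amend_outliers({"dynamic"}, history))  # → {'stuck'}
```

Here the stuck device is added even though the rule never flagged it, and the dynamic device is removed despite having been flagged, matching the two directions described above.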

Author

Stuart Illson.

License

elwood-spatial is released under the BSD 3-Clause License.

Huxley the dog and Elwood the crow
Huxley (dog) and Elwood (crow). Bonded over a shared love of dog treats and hanging out in the front yard in the sun.