elwood_spatial.features
Feature engineering utilities for ML-based outlier detection. Adds temporal, network, and neighbor-deviation features to a DataFrame for use with models like XGBoost.
coefficient_of_variation(values)
Compute the coefficient of variation as a percentage: CV = (std / mean) × 100.
| Parameter | Type | Description |
|---|---|---|
| values | pd.Series or np.ndarray | Numeric values |
Returns float. Returns NaN if mean is zero.
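The formula above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the library's implementation: the name `cv_percent` and the `ddof` parameter are hypothetical, and the source does not say whether the standard deviation is the population (`ddof=0`) or sample (`ddof=1`) form, so the sketch assumes the population form by default.

```python
import numpy as np


def cv_percent(values, ddof=0):
    """Hypothetical sketch of CV = (std / mean) * 100.

    Assumption: population standard deviation (ddof=0) unless stated otherwise.
    """
    arr = np.asarray(values, dtype=float)
    mean = arr.mean()
    if mean == 0:
        # Mirrors the documented behavior: NaN when the mean is zero.
        return float("nan")
    return float(arr.std(ddof=ddof) / mean * 100.0)


# Mean is 5 and population std is 2, so the CV is about 40 percent.
print(cv_percent([2, 4, 4, 4, 5, 5, 7, 9]))
# Mean is zero, so the result is NaN.
print(cv_percent([-1.0, 1.0]))
```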
add_rolling_features(df, ...)
Add temporal rolling-window features per device.
| Parameter | Type | Default | Description |
|---|---|---|---|
| df | pd.DataFrame | | Input data |
| id_column | str | "id" | Device ID column |
| time_column | str | "timestamp" | Timestamp column |
| value_column | str | "value" | Measurement column |
| windows | list[str] | ["3h", "6h"] | Rolling window sizes |
For each window W, adds the columns cv_{W}, range_{W}, and std_{W}. Also adds the columns delta, directionality, and outage.
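To make the per-window columns concrete, here is a minimal pandas sketch of how such features could be computed. It is an assumption-laden illustration, not the library's code: the function name `rolling_features_sketch` is hypothetical, the rolling standard deviation uses the pandas default (sample form), and the `delta`, `directionality`, and `outage` columns are left out because the source does not define them.

```python
import numpy as np
import pandas as pd


def rolling_features_sketch(df, id_column="id", time_column="timestamp",
                            value_column="value", windows=("3h", "6h")):
    """Hypothetical sketch: per-device time-based rolling std, range, and CV."""
    # Sort so that rows within each device are in time order and the
    # grouped rolling result lines up row-for-row with `out`.
    out = df.sort_values([id_column, time_column]).reset_index(drop=True)
    grouped = out.set_index(time_column).groupby(id_column)[value_column]
    for w in windows:
        rolled = grouped.rolling(w)  # offset window, e.g. "3h"
        std = rolled.std().to_numpy()
        mean = rolled.mean().to_numpy()
        out[f"std_{w}"] = std
        out[f"range_{w}"] = (rolled.max() - rolled.min()).to_numpy()
        with np.errstate(divide="ignore", invalid="ignore"):
            # CV as a percentage; NaN where the rolling mean is zero.
            out[f"cv_{w}"] = np.where(mean != 0, std / mean * 100.0, np.nan)
    return out


demo = pd.DataFrame({
    "id": ["a", "a", "b"],
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 00:00"]),
    "value": [1.0, 3.0, 5.0],
})
print(rolling_features_sketch(demo, windows=["3h"]))
```

A window containing a single observation has no sample standard deviation, so `std_{W}` and `cv_{W}` are NaN for the first row of each device in this sketch.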

    df = es.add_rolling_features(df, windows=["3h", "6h", "12h"])

add_network_features(df, network, bins, ...)
Add information-theoretic network features at each timestep.
| Parameter | Type | Default | Description |
|---|---|---|---|
| df | pd.DataFrame | | Input data |
| network | Network | | Spatial network |
| bins | BinSpec | | Bin specification |
| id_column | str | "id" | Device ID column |
| time_column | str | "timestamp" | Timestamp column |
| value_column | str | "value" | Measurement column |
Adds columns: network_entropy, information, bin_deviation, neighbor_count.
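The source does not define these columns, so the following is only one plausible reading, sketched for a single device at a single timestep. It assumes that `network_entropy` is the Shannon entropy (in bits) of the neighbors' binned values and that `information` is the self-information of the device's own bin under that neighbor distribution. The function name `entropy_and_information` and the use of plain bin edges in place of `BinSpec` are hypothetical, and `bin_deviation` is not attempted.

```python
import numpy as np


def entropy_and_information(value, neighbor_values, bin_edges):
    """Hypothetical sketch of entropy and self-information over binned neighbors."""
    neighbor_values = np.asarray(neighbor_values, dtype=float)
    n = int(neighbor_values.size)
    if n == 0:
        return {"network_entropy": float("nan"),
                "information": float("nan"),
                "neighbor_count": 0}
    # np.digitize yields indices 0..len(bin_edges), i.e. len(bin_edges) + 1 bins.
    n_bins = len(bin_edges) + 1
    counts = np.bincount(np.digitize(neighbor_values, bin_edges),
                         minlength=n_bins)
    probs = counts / n
    nonzero = probs[probs > 0]
    # Shannon entropy of the neighbor bin distribution, in bits.
    entropy = float(-np.sum(nonzero * np.log2(nonzero)))
    # Self-information of the device's own bin; infinite if no neighbor shares it.
    p_own = probs[int(np.digitize(value, bin_edges))]
    information = float(-np.log2(p_own)) if p_own > 0 else float("inf")
    return {"network_entropy": entropy,
            "information": information,
            "neighbor_count": n}


# Neighbors fall into bins with probabilities 0.25, 0.5, 0.25.
print(entropy_and_information(15.0, [5.0, 15.0, 15.0, 25.0], [10.0, 20.0]))
```

Under this reading, a device whose value lands in a bin that few neighbors occupy gets a high `information` score, which is what would make the feature useful for outlier detection.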
add_neighbor_deviation_features(df, network, feature_columns, ...)
For each feature column, add a column measuring deviation from the neighborhood mean.
| Parameter | Type | Default | Description |
|---|---|---|---|
| df | pd.DataFrame | | Input data (must have network features already) |
| network | Network | | Spatial network |
| feature_columns | list[str] | | Columns to compute deviations for |
| id_column | str | "id" | Device ID column |
| time_column | str | "timestamp" | Timestamp column |
For each column col, adds {col}_neighbor_dev.
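As an illustration of the idea, the sketch below computes a signed deviation from the mean of a device's neighbors at the same timestamp. Several details are assumptions rather than documented behavior: the function name `neighbor_deviation_sketch` is hypothetical, a plain dict of neighbor IDs stands in for the `Network` object, the neighborhood mean excludes the device itself, and each (device, timestamp) pair is assumed to appear at most once.

```python
import numpy as np
import pandas as pd


def neighbor_deviation_sketch(df, neighbors, feature_columns,
                              id_column="id", time_column="timestamp"):
    """Hypothetical sketch: value minus the mean of neighbors at the same time."""
    out = df.copy()
    for col in feature_columns:
        # Wide layout: one row per timestamp, one column per device.
        wide = out.pivot(index=time_column, columns=id_column, values=col)
        dev = pd.DataFrame(np.nan, index=wide.index, columns=wide.columns)
        for device in wide.columns:
            nbrs = [n for n in neighbors.get(device, []) if n in wide.columns]
            if nbrs:
                # Missing neighbor values are skipped by the row-wise mean.
                dev[device] = wide[device] - wide[nbrs].mean(axis=1)
        # Back to long layout and join onto the original rows.
        long = dev.reset_index().melt(id_vars=[time_column],
                                      var_name=id_column,
                                      value_name=f"{col}_neighbor_dev")
        out = out.merge(long, on=[id_column, time_column], how="left")
    return out


t0 = pd.Timestamp("2024-01-01")
demo = pd.DataFrame({"id": ["a", "b", "c"],
                     "timestamp": [t0, t0, t0],
                     "temp": [10.0, 20.0, 30.0]})
# Hypothetical adjacency: "c" has no neighbors, so its deviation stays NaN.
print(neighbor_deviation_sketch(demo, {"a": ["b", "c"], "b": ["a"], "c": []},
                                ["temp"]))
```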