Adding Isolation Forest Anomaly Scoring to FlowEnricher: practical, fast NetFlow Machine Learning

Teaching FlowEnricher to Spot Weirdos: Isolation Forest Joins the Party

tl;dr: We added unsupervised anomaly detection to FlowEnricher using an Isolation Forest microservice. It scores per-IP behavior in real time and helps catch stealthy port scans and low-and-slow DoS bursts that signatures miss. Yep, machine learning on NetFlow.

Why Isolation Forest?

Rule engines are great at “known patterns.” But attackers get creative. Isolation Forest learns what’s normal for your network and flags outliers—no labels required.

How it works

  • FlowEnricher aggregates flows per source and builds compact feature vectors (packets/sec, bytes/sec, unique destinations, SYN ratio, entropies…).

  • Vectors are POSTed to a tiny Python service (FastAPI + scikit-learn). It maintains an Isolation Forest model.

  • The service replies with an anomaly score (0..1). FlowEnricher can log it, visualize it in ClickHouse, or use it directly in rules.
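To make the first step concrete, here is a minimal sketch of how one source's flows in a window could be collapsed into a feature vector. The `Flow` record, its field names, and the exact feature order are assumptions for illustration, not FlowEnricher's actual schema (and FlowEnricher itself is written in Go; Python is used here only to match the scorer side).

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are illustrative only.
    src: str
    dst: str
    dst_port: int
    packets: int
    octets: int
    syn: bool  # True if the flow carried a SYN flag


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def feature_vector(flows, window_seconds):
    """Collapse one source's flows from a single window into a fixed-length vector."""
    packets = sum(f.packets for f in flows)
    octets = sum(f.octets for f in flows)
    syn_flows = sum(1 for f in flows if f.syn)
    return [
        packets / window_seconds,                    # packets/sec
        octets / window_seconds,                     # bytes/sec
        float(len({f.dst for f in flows})),          # unique destinations
        syn_flows / len(flows) if flows else 0.0,    # SYN ratio
        shannon_entropy(f.dst for f in flows),       # destination entropy
        shannon_entropy(f.dst_port for f in flows),  # destination-port entropy
    ]
```

A fixed-length list of floats like this can be serialized as JSON and POSTed to the scorer unchanged.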

High-level architecture

NetFlow/IPFIX -> flowenricher (Go) -> Detector
  - enrichment (ASN/GeoIP/PTR/SNMP) -> ClickHouse
  - feature extraction per window -> POST to ML scorer /score
  - receives ml_score -> feed into detection engine
  - detection rules can use Isolation Forest alongside rule conditions
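As a rough illustration of that last point, a rule might treat ml_score as one more condition next to the classic thresholds. Everything below (the `WindowStats` type, the `scan_rule` function, and all threshold values) is hypothetical and untuned; it only shows the shape of the combination, not FlowEnricher's rule syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowStats:
    # Hypothetical per-source summary handed to the detection engine.
    unique_dsts: int
    syn_ratio: float
    ml_score: Optional[float]  # None when the scorer is disabled or unreachable


def scan_rule(stats, min_dsts=200, min_syn_ratio=0.8, ml_threshold=0.65):
    """Return (fired, reason). All thresholds here are illustrative."""
    # Classic signature-style condition: works with or without the scorer.
    if stats.unique_dsts >= min_dsts and stats.syn_ratio >= min_syn_ratio:
        return True, "classic"
    # Without a score, fall back to the classic rule only.
    if stats.ml_score is None:
        return False, "none"
    # With a high anomaly score, accept a lower destination count.
    if stats.unique_dsts >= min_dsts // 4 and stats.ml_score >= ml_threshold:
        return True, "ml_assisted"
    return False, "none"
```

Keeping the classic branch independent of the score means the rule still behaves sensibly when the scorer is switched off.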

Ops, not research

  • No GPU, no massive frameworks. A ~30 MB container scores each vector in under a millisecond.

  • Retraining is cheap: refit the model on a rolling baseline every 5–15 minutes.

  • It’s optional—feature-flagged and hot-reloadable. You can A/B it alongside the classic rules.
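A minimal sketch of the rolling-baseline idea with scikit-learn follows. The `RollingScorer` class, its buffer sizes, and the mapping to a 0..1 score are assumptions, not the service's actual code. The mapping relies on scikit-learn's `score_samples` returning the negated anomaly score from the original Isolation Forest paper, so flipping the sign gives a value between 0 and 1 where higher means more unusual.

```python
from collections import deque

import numpy as np
from sklearn.ensemble import IsolationForest


class RollingScorer:
    """Keeps recent feature vectors and refits an Isolation Forest on demand."""

    def __init__(self, max_vectors=50_000, min_vectors=256, seed=0):
        self.baseline = deque(maxlen=max_vectors)  # oldest vectors fall off
        self.min_vectors = min_vectors
        self.seed = seed
        self.model = None

    def observe(self, vector):
        """Add one feature vector to the rolling baseline."""
        self.baseline.append(list(vector))

    def retrain(self):
        """Refit on the current baseline; meant to be called from a timer."""
        if len(self.baseline) < self.min_vectors:
            return False
        X = np.asarray(list(self.baseline), dtype=float)
        model = IsolationForest(n_estimators=100, random_state=self.seed)
        model.fit(X)
        self.model = model  # swap in the new model only after fitting succeeds
        return True

    def score(self, vector):
        """Anomaly score between 0 and 1 (higher = more unusual), or None if untrained."""
        if self.model is None:
            return None
        x = np.asarray([vector], dtype=float)
        # score_samples is the negated paper score, so flip the sign.
        return float(-self.model.score_samples(x)[0])
```

Returning `None` before the first successful fit lets the caller fall back to the classic rules until a baseline exists.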

What it catches well

  • Horizontal scans: one source, many destinations → many unique destinations and high destination entropy.

  • Vertical scans: many ports on one host → many unique ports and a high SYN ratio.

  • Weird mixes: atypical packet sizes / protocol shares.
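To see why the two scan types separate so cleanly in feature space, the sketch below profiles three synthetic sources. The traffic is made up for illustration, and `entropy_bits` and `profile` are hypothetical helpers rather than FlowEnricher functions.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def profile(pairs):
    """Summarize a list of (dst_ip, dst_port) pairs for one source."""
    return {
        "unique_dsts": len({ip for ip, _ in pairs}),
        "unique_ports": len({port for _, port in pairs}),
        "dst_entropy": entropy_bits(ip for ip, _ in pairs),
        "port_entropy": entropy_bits(port for _, port in pairs),
    }


# Synthetic traffic: one port across 1024 hosts, 1024 ports on one host,
# and a source that mostly talks HTTPS to one server plus some DNS.
horizontal = [(f"10.0.{i // 256}.{i % 256}", 445) for i in range(1024)]
vertical = [("10.0.0.5", port) for port in range(1, 1025)]
normal = [("10.0.0.10", 443)] * 900 + [("10.0.0.53", 53)] * 124

for name, pairs in [("horizontal", horizontal), ("vertical", vertical), ("normal", normal)]:
    print(name, profile(pairs))
```

With 1024 equally likely values the entropy is log2(1024) = 10 bits, so the horizontal scan maxes out destination entropy while the vertical scan maxes out port entropy; the ordinary source stays below 1 bit on both.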

What it won’t solve

  • Encrypted exfiltration that mimics business traffic perfectly (no silver bullets).

  • Poor baselines: a model trained on a window that already contains attack traffic learns that traffic as normal (train on clean intervals!).