Introducing the New CFM Web Detector

A Production-Grade, ML-Ready Traffic Analytics & Abuse-Detection Engine for Nginx, Apache & LiteSpeed

CFM (Configurable Firewall Manager) started as a modern nftables-first firewall manager designed for high-security hosting and infrastructure operators.
Over time, it evolved into a complete security platform: dynamic firewalling, live log-driven detection, autoblocking, system hardening, notifications, DNS/GeoIP enrichment, and API integration.

Today, CFM takes another major step forward with the introduction of the Unified Web Detector — a near-real-time HTTP analytics and suspicious-behavior engine that works across:

  • Nginx

  • Apache HTTPD

  • LiteSpeed

It ingests access logs (file or journald), computes metrics in one-minute sliding windows, enriches the data (ASN, GeoIP, PTR), and exposes this information via CLI, API, and a future web UI.

This post explains what it is, how it works, how to use it, and how it is ML-ready for the next phase of anomaly detection (HBOS, eHBOS, Isolation Forest).

☑️ In Short — What Is CFM?

CFM is an all-in-one firewall + intrusion detection + autoblock manager for Linux servers.
It combines:

  • nftables policy control

  • dynamic ALLOW/BLOCK mechanisms

  • log-driven detection (SSH, Exim, MySQL, FTP, ModSecurity, cPanel)

  • portscan & flood detection

  • autoblocking with TTL

  • enrichment (ASN, Country, PTR)

  • notifications (email, Slack, SMTP)

  • MaxMind auto-updates

  • API integration with external systems

It is designed for shared hosting providers, WordPress/WHM/cPanel infrastructure, self-hosted web stacks, and bare-metal servers that need strong security automation.


🔥 Introducing the Unified Web Detector

Previously, CFM had separate detectors for Nginx and Apache.
Both have now been replaced by one unified engine.

It implements a high-performance sliding-window analytics engine that:

  • Parses web server access logs in real time

  • Maintains per-vhost metrics

  • Computes suspiciousness scores

  • Tracks IPs, user agents, referrers, and paths

  • Provides instant drill-down reports

  • Is fully ML-ready (HBOS/eHBOS/iForest)
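To make the sliding-window idea concrete, here is a minimal sketch of a per-vhost one-minute window. It is not CFM's actual code: the SlidingWindow class, its method names, and the assumption that events arrive in timestamp order are all illustrative.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # one-minute window, as described in this post


class SlidingWindow:
    """Keeps (timestamp, status) events per vhost for the last `window` seconds.

    Hypothetical helper for illustration; assumes events arrive in time order.
    """

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = defaultdict(deque)  # vhost -> deque of (ts, status)

    def add(self, vhost, status, ts=None):
        ts = time.time() if ts is None else ts
        self.events[vhost].append((ts, status))
        self._evict(vhost, ts)

    def _evict(self, vhost, now):
        q = self.events[vhost]
        # Drop events that have fallen out of the window.
        while q and q[0][0] <= now - self.window:
            q.popleft()

    def snapshot(self, vhost, now=None):
        """Return status-class counts (2xx, 4xx, ...) plus the total."""
        now = time.time() if now is None else now
        self._evict(vhost, now)
        counts = defaultdict(int)
        for _, status in self.events[vhost]:
            counts[f"{status // 100}xx"] += 1
        counts["total"] = len(self.events[vhost])
        return dict(counts)
```

A deque per vhost keeps eviction cheap, because expired events are always at the left end.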

Supported input sources

CFM currently supports:

  • file tailing (e.g., /var/log/nginx/access.log)

  • journald streams (the equivalent of journalctl -u nginx or docker logs)

The detector supports Nginx, Apache, and LiteSpeed via sample configs shipped in the package (under /usr/share/cfm).
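As a rough illustration of the parsing step, the sketch below turns one access-log line into a record. The line layout is an assumption: the vhost first, then the common "combined" format, then the request time in seconds. The format in the shipped sample configs may differ, and LINE_RE and parse_line are hypothetical names.

```python
import re

# ASSUMED layout: "<vhost> <combined log format> <request_time_seconds>".
LINE_RE = re.compile(
    r'(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)" (?P<rt>[\d.]+)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["rt"] = float(rec["rt"])
    # A missing referrer is logged as "-" in the combined format.
    rec["direct"] = rec["referrer"] in ("", "-")
    return rec
```

Lines that do not match return None rather than raising, so a stray malformed entry does not stop the tailer.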

📊 What the Web Detector Tracks

For every vhost per minute, CFM computes:

✔ Request metrics

  • total — total requests

  • 2xx, 3xx, 4xx, 5xx

  • 401 and 499 (499 is Nginx's nonstandard "client closed request" status; on Apache it appears only if logged)

✔ Rates & ratios

  • RPS (requests-per-second)

  • err% (error ratio)

  • auth401 ratio

  • avg response time (avg_rt)
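The rates and ratios above are simple arithmetic over one window's counters. The sketch below shows one plausible set of definitions. Treating err% as (4xx + 5xx) / total is an assumption, as are the function name and dictionary keys.

```python
def window_rates(counts, rt_sum, window_seconds=60):
    """Derive per-window rates from raw counters.

    counts: dict with "total", "4xx", "5xx", "401" keys (hypothetical layout)
    rt_sum: sum of response times (seconds) for requests in the window
    """
    total = counts.get("total", 0)
    if total == 0:
        return {"rps": 0.0, "err_pct": 0.0, "auth401_ratio": 0.0, "avg_rt": 0.0}
    # ASSUMPTION: an "error" is any 4xx or 5xx response.
    errors = counts.get("4xx", 0) + counts.get("5xx", 0)
    return {
        "rps": total / window_seconds,
        "err_pct": 100.0 * errors / total,
        "auth401_ratio": counts.get("401", 0) / total,
        "avg_rt": rt_sum / total,
    }
```

For example, 120 requests in a 60-second window with 18 4xx and 6 5xx responses gives 2.0 RPS and a 20% error ratio.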

✔ Unique metrics

  • number of unique source IPs

  • direct traffic % (no referrer)

  • bot traffic % (Googlebot, Facebook, etc.)

✔ Top lists

For each vhost:

  • Top Source IPs

  • Top User Agents

  • Top Referrers

  • Top Paths

All of these are exposed via both the CLI and the JSON API.
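The unique metrics and top lists can be sketched with plain counters. The record fields (ip, path, referrer, ua), the BOT_MARKERS list, and the summarize function are illustrative assumptions, not CFM's real data model or bot classifier.

```python
from collections import Counter

# Illustrative substrings only; a real bot list would be much longer.
BOT_MARKERS = ("googlebot", "facebookexternalhit", "bingbot")


def summarize(records, top_n=3):
    """Summarize one vhost's records for a single window."""
    total = len(records)
    ips = Counter(r["ip"] for r in records)
    paths = Counter(r["path"] for r in records)
    # "Direct" traffic: no referrer, logged as "" or "-".
    direct = sum(1 for r in records if r["referrer"] in ("", "-"))
    bots = sum(
        1 for r in records
        if any(marker in r["ua"].lower() for marker in BOT_MARKERS)
    )
    return {
        "unique_ips": len(ips),
        "direct_pct": 100.0 * direct / total if total else 0.0,
        "bot_pct": 100.0 * bots / total if total else 0.0,
        "top_ips": ips.most_common(top_n),
        "top_paths": paths.most_common(top_n),
    }
```

The same Counter pattern extends to user agents and referrers.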


🔎 Suspicious Vhost Detection

CFM evaluates each vhost once per sliding window and assigns a Suspicious Score.
This is not yet traditional anomaly detection: the score is rule-based for now, but the design is ML-ready and can be upgraded to HBOS/eHBOS later.

The engine detects vhosts that might be attacked, scanned, brute-forced, or otherwise behaving strangely.

Suspicious score inputs include:

  • High 4xx/5xx ratio

  • High 3xx ratio from a single IP

  • Unusual spike in UniqueIPs

  • Very high average response time

  • Bad-Agent concentration

  • High direct traffic %

  • Repeated hits from same AS / Country

Each vhost gets:

  • score (0.0–1.0)

  • reasons (human-readable list)

  • Additional metrics (unique IPs, rps, 3xx/4xx/5xx counts, 401 ratio)
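A rule-based scorer of this kind can be sketched as a weighted checklist that returns a capped score plus the reasons that fired. Every threshold, weight, and metric key below is an assumption for illustration, and so are the high_err and slow_rt reason names. CFM's real rules are not published in this post.

```python
# (reason, predicate, weight) -- all values are illustrative assumptions.
RULES = [
    ("high_err", lambda m: m["err_pct"] > 15.0, 0.30),
    ("high_3xx", lambda m: m["ratio_3xx"] > 0.50, 0.25),
    ("spike_ips", lambda m: m["unique_ips"] > 3 * max(m["baseline_ips"], 1), 0.25),
    ("slow_rt", lambda m: m["avg_rt"] > 2.0, 0.20),
]


def suspicious_score(metrics):
    """Return (score in 0.0-1.0, list of reasons) for one vhost window."""
    score = 0.0
    reasons = []
    for reason, predicate, weight in RULES:
        if predicate(metrics):
            score += weight
            reasons.append(reason)
    return round(min(score, 1.0), 2), reasons
```

Keeping the rules in a table means a later ML scorer only has to return the same (score, reasons) pair.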

The CLI (cfm httpd-top, cfm nginx-top) displays:

Suspicious vhosts

--- Suspicious vhosts ---
HOST         SCORE  REASONS             RPS  3xx  4xx  5xx  uniqIP  err%
example.com  0.82   high_3xx,spike_ips  6.2  3.8  1.1  0.2  41      17.4

 

This immediately shows which vhosts might be under:

  • bot probing

  • credential stuffing

  • brute-force scans

  • misconfiguration loops

  • plugin/theme issues


🧪 ML-Ready Architecture

The suspicious scoring infrastructure is intentionally built so that ML can be slotted in without rewriting code.

Future algorithm plug-ins:

  • HBOS — Histogram-Based Outlier Score

  • eHBOS — Ensemble HBOS

  • Isolation Forest — tree-based anomaly scoring

  • Z-Score / statistical baselines

  • PCA pre-processing

  • Auto-thresholding per vhost

CFM can easily emit feature vectors:

  • RPS

  • 3xx/4xx/5xx ratios

  • unique IPs

  • bot percentage

  • RT average

  • entropy of UAs/referrers
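To show how such feature vectors could feed an outlier model, here is a small, self-contained sketch of Shannon entropy (for the UA/referrer features) and a static-bin HBOS scorer. It is a simplified teaching version written for this post, not CFM's planned implementation. The bin count, the density floor, and the class and function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy in bits of a list of labels (e.g. user agents)."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values()) if n else 0.0


class HBOS:
    """Histogram-Based Outlier Score with equal-width bins per feature."""

    def __init__(self, bins=5):
        self.bins = bins
        self.hists = []  # per feature: (lo, hi, width, normalized heights)

    def fit(self, rows):
        self.hists = []
        for col in zip(*rows):
            lo, hi = min(col), max(col)
            width = (hi - lo) / self.bins or 1.0  # avoid zero width
            heights = [0] * self.bins
            for x in col:
                heights[min(int((x - lo) / width), self.bins - 1)] += 1
            peak = max(heights)
            # Normalize so the tallest bin has height 1.0.
            self.hists.append((lo, hi, width, [h / peak for h in heights]))
        return self

    def score(self, row, floor=1e-3):
        """Sum of log(1/density) over features; higher means more unusual."""
        total = 0.0
        for x, (lo, hi, width, heights) in zip(row, self.hists):
            if lo <= x <= hi:
                idx = min(int((x - lo) / width), self.bins - 1)
                density = max(heights[idx], floor)
            else:
                density = floor  # outside the range seen during fit
            total += math.log(1.0 / density)
        return total
```

Fitting on a history of (RPS, error ratio) vectors and scoring a new window gives a number that grows as the window falls into sparser histogram bins.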

🔧 HTTP Debug API (for dashboards & Grafana)

CFM exposes metrics on 127.0.0.1:6060:

/httpd/top

/nginx/top

JSON output for dashboards.

/httpd/host?name=example.com

Full drill-down JSON.

/httpd/suspicious?min=0.6

Machine-readable suspicious vhosts.

/debug/pprof/*

Full pprof profiler for tuning performance.

This allows:

  • building a Grafana dashboard

  • building a Web UI

  • piping metrics to ClickHouse

  • testing ML algorithms live
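A small client for these endpoints might look like the sketch below. The address and paths come from the list above. The plain-HTTP scheme, the helper names, and the assumption that every endpoint returns JSON are illustrative, and the response schema is not documented in this post, so the code does not rely on any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:6060"  # address from this post; http:// is assumed


def endpoint(path, **params):
    """Build a debug-API URL, e.g. endpoint("/httpd/suspicious", min=0.6)."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def fetch_json(path, timeout=2.0, **params):
    """GET an endpoint and decode the body as JSON (schema not assumed)."""
    with urlopen(endpoint(path, **params), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running CFM instance on this host.
    print(json.dumps(fetch_json("/httpd/suspicious", min=0.6), indent=2))
```

The same two helpers are enough to poll /httpd/top on a timer and forward the result to a dashboard or a datastore.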


🧩 Architecture Flow (High Level)

+----------------------------+
| Access Logs                |
| nginx / apache / litespeed |
+-------------+--------------+
              |
              v
    (Tailing or Journald)
              |
              v
+----------------------------+
| Web Detector               |
| - per-vhost metrics        |
| - sliding windows          |
+-------------+--------------+
              |
              v
+----------------------------+
| Suspicious Scoring Engine  |
| (Rules now, ML later)      |
+-------------+--------------+
              |
              v
+---------------+  +----------------+
| CLI (cfm top) |  | Debug JSON API |
+---------------+  +----------------+

🎯 Why This Matters

Modern websites—especially WordPress, WooCommerce, cPanel/DirectAdmin deployments—are hit constantly by:

  • automated crawlers

  • low-skill scanners

  • credential stuffing

  • plugin exploit sweeps

  • global botnets

  • brute-force HTTP auth

  • SEO spam attacks

Traditional firewalls don’t see these because:

  • they act per-packet, not per-request

  • they don’t understand vhosts

  • they don’t understand user agents

  • they don’t track 3xx/4xx/5xx patterns

  • they don’t track referrers

  • they don’t track paths

  • they don’t enrich IPs

CFM fills that gap.

It gives you hosting-grade, NOC-grade visibility into your web traffic — instantly — with no external dependencies.


🏁 Final Notes

With this new web detector, CFM now offers:

✔ Real-time vhost analytics

✔ Suspicious activity detection

✔ Per-IP HTTP classification

✔ Response-time performance insights

✔ Enriched visibility (ASN, PTR, Country)

✔ ML-ready feature vectors

✔ Unified behavior for Nginx, Apache, LiteSpeed

✔ Dramatically better debugging & monitoring

You now have the foundation for:

  • HTTP anomaly detection

  • Auto-block policies based on vhost behavior

  • Web application threat detection

  • Performance & tuning analytics

  • Abuse detection with minimal CPU overhead