Multi-recipient routing. Per-account or org-wide. Severity threshold per recipient.
Product · AWS Advanced Tier Services Partner
Catch unusual AWS spend before it compounds
Statistical baselines per service plus an AI commentary layer that explains why the line moved. Free, multi-account, multi-region.
By HabileLabs
HIGH · NAT Gateway · us-east-1 · +218% vs baseline
“Started 09:14 with the image-resize deploy — a retry loop is hammering NAT. VPC endpoints would cut ~$764/mo.”
Flagged the morning it started — 19 days before the invoice.
How It Works
Statistical Baseline + AI Commentary
Fewer false positives than naive threshold alerts. Plain-English explanations on every alert so on-call does not have to dig.
1. Establish Baseline
Rolling 30-day window per service per account, calibrated to ignore weekday-vs-weekend and end-of-month invoice patterns.
2. Detect Deviations
Robust statistical thresholds (median absolute deviation, not naive mean) so noisy workloads do not bury real signals.
3. Apply AI Commentary
A natural-language layer turns the raw anomaly into a one-sentence explanation: "spike in running instances, possible missed shutdown scheduling."
4. Route To Recipients
Severity-aware routing to the email addresses you configure per account. Slack, Teams, and PagerDuty land when the integrations ship.
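For readers who want to see the mechanics, here is a minimal sketch of steps 1, 2, and 4 in Python. It is an illustration under stated assumptions, not Refine's implementation: the function names, the `Recipient` type, and the 3.5 and 6.0 cutoffs are all hypothetical. It handles the weekday-vs-weekend pattern by comparing each day only against days of the same type, and it leaves end-of-month invoice patterns out for brevity.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median

# All names and thresholds here are illustrative assumptions, not Refine's API.
SEVERITY_RANK = {"info": 0, "medium": 1, "high": 2}


@dataclass
class Recipient:
    email: str
    min_severity: str  # lowest severity this recipient wants to receive


def baseline_window(daily_costs, today, days=30):
    """Costs from the trailing `days` days that share today's day type.

    Comparing weekdays with weekdays (and weekends with weekends) is one
    simple way to keep the weekly rhythm from looking like an anomaly.
    """
    is_weekend = today.weekday() >= 5
    start = today - timedelta(days=days)
    return [
        cost
        for day, cost in daily_costs.items()
        if start <= day < today and (day.weekday() >= 5) == is_weekend
    ]


def robust_score(window, observed):
    """Modified z-score: distance from the median in units of scaled MAD."""
    center = median(window)
    mad = median(abs(cost - center) for cost in window)
    if mad == 0:
        # Perfectly flat baseline: any change at all counts as extreme.
        if observed == center:
            return 0.0
        return math.copysign(math.inf, observed - center)
    return 0.6745 * (observed - center) / mad


def classify(score, medium_at=3.5, high_at=6.0):
    """Map a robust score to a severity, or None when nothing is unusual."""
    magnitude = abs(score)
    if magnitude >= high_at:
        return "high"
    if magnitude >= medium_at:
        return "medium"
    return None


def recipients_for(severity, recipients):
    """Email addresses whose severity threshold this alert meets."""
    rank = SEVERITY_RANK[severity]
    return [r.email for r in recipients if rank >= SEVERITY_RANK[r.min_severity]]
```

Because the median and the median absolute deviation barely move when a few days are extreme, one past spike does not inflate the baseline the way it would with a mean and standard deviation.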
Examples from the product
What an Anomaly looks like in your Inbox
Each card is the exact shape of the notification you receive — service + region + delta + AI commentary.
Amazon EC2
us-east-1 · Medium · +7% week-over-week
Spike in running instances starts Friday 19:00 and continues through weekend. Possible missed shutdown scheduling on the staging cluster.
AWS Lambda
us-east-1 · High · +152% in 24 hours
Function `nightly-report-runner` invoked at 2.5× normal volume. Likely a runaway loop — concurrency limit not set. Worth checking the invocation source.
Amazon S3
eu-central-1 · Medium · +23% on storage
Standard-tier growth on logs-prod bucket. Lifecycle policy may have regressed — last successful Glacier transition was 14 days ago.
Data Transfer
cross-region · High · +34% cross-region
New traffic flow us-east-1 → ap-south-1 starting Tuesday. Check VPC peering, CloudFront origin, or recent deployment touching cross-region replication.
Amazon RDS
us-east-1 · Info · -100% (likely deletion)
Instance `analytics-prod-replica` disappeared from the bill. If intentional, dismiss. If not, this is a costly accident — check CloudTrail before backups expire.
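The cards above all share one shape, which can be sketched as a small record. This is a hypothetical model for illustration only: the `AnomalyCard` type, its field names, and the `format_delta` helper are assumptions, not a published Refine schema.

```python
from dataclasses import dataclass


def format_delta(baseline, observed):
    """Signed percentage change of observed cost against the baseline."""
    if baseline == 0:
        return "new"  # no prior spend to compare against
    change = (observed - baseline) / baseline * 100
    return f"{change:+.0f}%"


@dataclass(frozen=True)
class AnomalyCard:
    service: str
    region: str
    severity: str    # "Info", "Medium", or "High"
    delta: str       # e.g. the output of format_delta
    commentary: str  # the plain-English explanation

    def render(self):
        """Header line of service, region, severity, delta, then commentary."""
        header = " · ".join([self.service, self.region, self.severity, self.delta])
        return f"{header}\n{self.commentary}"
```

A service that vanishes from the bill has an observed cost of zero, so its delta comes out as -100%, matching the RDS card.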
Notification channels
Email Today. The Rest Is on the Roadmap.
We list channels honestly — what works now, and what is shipping next. No false claims to lure a click.
Slack
Channel routing by severity. Sandbox + production splits supported.
Microsoft Teams
Adaptive card alerts with one-click drill into the dashboard.
PagerDuty
On-call routing for high-severity anomalies. Auto-resolve when the next baseline window closes.
Tune sensitivity per service
Some workloads are inherently noisy — batch jobs, ML training, dev environments. Refine lets you tune thresholds per service so the noise stays quiet and the real signal cuts through.
- Per-service sensitivity (Low / Medium / High / Off)
- Per-account overrides
- Quiet hours for non-production accounts
- Mute by tag — e.g. Environment=dev
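One way to picture how these four controls combine is as a single policy check that runs before an alert is sent. The sketch below is hypothetical: `AlertPolicy`, its fields, and the score cutoff behind each sensitivity level are assumptions made for illustration, and it treats "High" sensitivity as the lowest cutoff (the most alerts).

```python
from dataclasses import dataclass, field
from datetime import time

# Hypothetical robust-score cutoffs; "off" disables alerts entirely.
SENSITIVITY_THRESHOLDS = {"low": 6.0, "medium": 3.5, "high": 2.5, "off": None}


def in_window(now, start, end):
    """True when `now` is inside [start, end), including overnight windows."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


@dataclass
class AlertPolicy:
    service_sensitivity: dict                              # service -> level
    account_overrides: dict = field(default_factory=dict)  # (account, service) -> level
    quiet_hours: dict = field(default_factory=dict)        # account -> (start, end)
    muted_tags: set = field(default_factory=set)           # {("Environment", "dev")}

    def sensitivity(self, account, service):
        """A per-account override wins over the per-service setting."""
        default = self.service_sensitivity.get(service, "medium")
        return self.account_overrides.get((account, service), default)

    def should_alert(self, account, service, score, tags, now):
        """Apply tag mutes, then quiet hours, then the sensitivity cutoff."""
        if any(pair in self.muted_tags for pair in tags.items()):
            return False
        window = self.quiet_hours.get(account)
        if window is not None and in_window(now, *window):
            return False
        threshold = SENSITIVITY_THRESHOLDS[self.sensitivity(account, service)]
        return threshold is not None and abs(score) >= threshold
```

Checking mutes and quiet hours first means a noisy dev account never reaches the threshold comparison at all.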
Frequently Asked Questions
- Refine builds a 30-day rolling baseline once you connect. New accounts see initial low-confidence alerts within 7 days; full-confidence anomalies follow after 30 days of history. You can also import historical AWS Cost and Usage Report (CUR) data to bootstrap the baseline immediately.
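The warm-up schedule in that answer can be read as a simple rule on days of history. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes low-confidence alerts begin at exactly 7 days, where the text only says "within 7 days".

```python
def alert_confidence(days_of_history):
    """Confidence tier for alerts, given how many days of cost data exist."""
    if days_of_history >= 30:
        return "full"  # a complete 30-day baseline window is available
    if days_of_history >= 7:
        return "low"   # assumed starting point for early alerts
    return None        # too little history to alert on
```

Importing historical CUR data raises the day count up front, so an account with a month of imported history would start in the full-confidence tier.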
Stop finding cost spikes on the invoice
60-second setup. Free forever. Read-only access.
Refine is built and supported by HabileLabs, an AWS Advanced Tier Services Partner.