Automated Anomaly Detection for GA4 with BigQuery
Traffic dropped 40% and no one noticed for three days. A tracking script broke and conversions vanished. Bot traffic inflated your metrics by 200%. These problems are preventable. This guide shows how to build automated anomaly detection using BigQuery ML and Cloud Functions to catch issues before they become disasters.
Why Anomaly Detection Matters
Every analytics team has a horror story.
Traffic dropped 60% because a developer removed the GTM snippet during a deploy. No one noticed for a week because “traffic fluctuates.” By the time someone checked, a month of campaign data was compromised.
Or the opposite: traffic spiked 300% from bot traffic. The team celebrated their “viral moment” while reporting garbage data to stakeholders.
These scenarios share a common failure: no one was watching the data in real time.
GA4’s built-in alerts are basic. They trigger on simple thresholds that either fire constantly (too sensitive) or miss real problems (too loose). What you need is statistical anomaly detection that understands your traffic patterns and alerts only when something is genuinely wrong.
BigQuery ML makes this possible with time series forecasting.
How Anomaly Detection Works
The approach:
- Build a forecast model that learns your normal traffic patterns (daily cycles, weekly seasonality, trends)
- Compare actual data to what the model predicted
- Flag anomalies when actual values fall outside the prediction interval
- Alert automatically when anomalies occur
If your model predicts 10,000 sessions on Tuesday (±1,500), and you get 4,000, that’s an anomaly worth investigating. If you get 9,200, that’s normal variation.
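The decision rule itself is trivial; what the model provides is a margin that adapts to your traffic. A minimal sketch in plain Python, using the hypothetical numbers from the example above:

```python
def classify(actual, predicted, margin):
    """Flag a day as LOW/HIGH/NORMAL based on the prediction interval."""
    lower, upper = predicted - margin, predicted + margin
    if actual < lower:
        return "LOW"
    if actual > upper:
        return "HIGH"
    return "NORMAL"

# Tuesday forecast: 10,000 sessions, interval +/- 1,500
print(classify(4_000, 10_000, 1_500))   # LOW: well below 8,500
print(classify(9_200, 10_000, 1_500))   # NORMAL: inside the interval
```

The rest of this guide is about producing a good `predicted` and `margin` for every day, which is exactly what the ARIMA model gives you.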
Prerequisites
- GA4 BigQuery Export enabled (setup guide)
- At least 60 days of historical data (more is better for seasonality)
- Cloud Functions access for automated alerting (optional)
Step 1: Prepare Daily Metrics
First, aggregate GA4 events into the daily metrics you want to monitor:
CREATE OR REPLACE TABLE `project.dataset.daily_metrics` AS
SELECT
PARSE_DATE('%Y%m%d', event_date) as date,
-- Traffic metrics
COUNT(DISTINCT user_pseudo_id) as users,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)) as sessions,
COUNT(*) as total_events,
-- Engagement metrics
COUNTIF(event_name = 'page_view') as page_views,
COUNTIF(event_name = 'scroll') as scroll_events,
-- Conversion metrics
COUNTIF(event_name = 'purchase') as purchases,
SUM(IF(event_name = 'purchase',
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'),
0)) as revenue,
COUNTIF(event_name = 'generate_lead') as leads,
-- Quality indicators
COUNTIF(event_name = 'page_view') / NULLIF(COUNT(DISTINCT user_pseudo_id), 0) as pages_per_user,
COUNTIF(event_name = 'purchase') / NULLIF(COUNT(DISTINCT user_pseudo_id), 0) as conversion_rate
FROM `project.analytics_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY event_date
ORDER BY date;

This gives you a clean time series table with one row per day.
Step 2: Build ARIMA Forecast Models
ARIMA (AutoRegressive Integrated Moving Average) models learn patterns in time series data:
- Daily patterns: Traffic is higher on weekdays
- Weekly seasonality: Mondays differ from Saturdays
- Trends: Gradual growth or decline over time
- Holiday effects: Spikes during sales periods
BigQuery ML handles the complexity automatically.
Model for Session Count
CREATE OR REPLACE MODEL `project.dataset.sessions_forecast_model`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='sessions',
auto_arima=TRUE,
data_frequency='DAILY',
holiday_region='IT' -- Adjust to your region: US, GB, DE, FR, etc.
) AS
SELECT date, sessions
FROM `project.dataset.daily_metrics`
WHERE date < CURRENT_DATE(); -- Train on historical data

Model for Conversions
CREATE OR REPLACE MODEL `project.dataset.conversions_forecast_model`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='purchases',
auto_arima=TRUE,
data_frequency='DAILY',
holiday_region='IT'
) AS
SELECT date, purchases
FROM `project.dataset.daily_metrics`
WHERE date < CURRENT_DATE();

Model for Revenue
CREATE OR REPLACE MODEL `project.dataset.revenue_forecast_model`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='revenue',
auto_arima=TRUE,
data_frequency='DAILY',
holiday_region='IT'
) AS
SELECT date, revenue
FROM `project.dataset.daily_metrics`
WHERE date < CURRENT_DATE();

Training takes 5-15 minutes per model. Create models for each metric you want to monitor.
Step 3: Generate Forecasts with Confidence Intervals
The model predicts expected values plus upper/lower bounds:
CREATE OR REPLACE TABLE `project.dataset.sessions_forecast` AS
SELECT
forecast_timestamp as date,
forecast_value as predicted_sessions,
prediction_interval_lower_bound as lower_bound,
prediction_interval_upper_bound as upper_bound,
standard_error
FROM ML.FORECAST(
MODEL `project.dataset.sessions_forecast_model`,
STRUCT(
30 AS horizon, -- Forecast 30 days ahead
0.95 AS confidence_level -- 95% confidence interval
)
);

The confidence interval is key: values outside this range are statistically unusual given your historical patterns.
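For intuition, the bounds behave approximately like a normal interval around the forecast: forecast_value ± z × standard_error, where z depends on the confidence level. A rough illustration in plain Python (the standard_error value here is made up; BigQuery computes the actual bounds for you):

```python
from statistics import NormalDist

def approx_interval(forecast_value, standard_error, confidence_level=0.95):
    """Approximate a symmetric prediction interval from the standard error."""
    # Two-sided z-score: ~1.96 for 95%, ~2.58 for 99%
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return forecast_value - z * standard_error, forecast_value + z * standard_error

lower, upper = approx_interval(10_000, 1_150)
print(round(lower), round(upper))  # roughly 7746 12254
```

This is why raising confidence_level widens the bounds: the multiplier z grows, so fewer days fall outside the interval.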
Step 4: Detect Anomalies
Compare actual values to forecasts:
CREATE OR REPLACE TABLE `project.dataset.anomalies` AS
WITH actuals AS (
SELECT date, sessions as actual_sessions
FROM `project.dataset.daily_metrics`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
),
forecasts AS (
SELECT
DATE(date) as date,
predicted_sessions,
lower_bound,
upper_bound
FROM `project.dataset.sessions_forecast`
)
SELECT
a.date,
a.actual_sessions,
f.predicted_sessions,
f.lower_bound,
f.upper_bound,
-- Calculate deviation
ROUND((a.actual_sessions - f.predicted_sessions) / NULLIF(f.predicted_sessions, 0) * 100, 1) as percent_deviation,
-- Flag anomalies
CASE
WHEN a.actual_sessions < f.lower_bound THEN 'LOW'
WHEN a.actual_sessions > f.upper_bound THEN 'HIGH'
ELSE 'NORMAL'
END as anomaly_status,
-- Severity (how far outside bounds)
CASE
WHEN a.actual_sessions < f.lower_bound THEN
ROUND((f.lower_bound - a.actual_sessions) / f.lower_bound * 100, 1)
WHEN a.actual_sessions > f.upper_bound THEN
ROUND((a.actual_sessions - f.upper_bound) / f.upper_bound * 100, 1)
ELSE 0
END as severity_percent
FROM actuals a
JOIN forecasts f USING (date)
ORDER BY date DESC;

Query results:
| date | actual | predicted | lower | upper | deviation | status | severity |
|---|---|---|---|---|---|---|---|
| 2026-01-31 | 4,200 | 9,800 | 7,500 | 12,100 | -57.1% | LOW | 44.0% |
| 2026-01-30 | 10,100 | 9,600 | 7,300 | 11,900 | +5.2% | NORMAL | 0 |
| 2026-01-29 | 9,400 | 9,500 | 7,200 | 11,800 | -1.1% | NORMAL | 0 |
January 31st shows a clear anomaly: actual sessions are 44% below the lower bound.
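You can sanity-check the SQL's arithmetic by hand. The same deviation and severity formulas in plain Python, applied to the January 31st row from the table above:

```python
def deviation_and_severity(actual, predicted, lower, upper):
    """Percent deviation from the forecast, and how far outside the bounds."""
    deviation = round((actual - predicted) / predicted * 100, 1)
    if actual < lower:
        severity = round((lower - actual) / lower * 100, 1)
    elif actual > upper:
        severity = round((actual - upper) / upper * 100, 1)
    else:
        severity = 0
    return deviation, severity

print(deviation_and_severity(4_200, 9_800, 7_500, 12_100))  # (-57.1, 44.0)
```

Deviation measures distance from the forecast; severity measures distance past the bound, which is why a normal day scores 0 even with nonzero deviation.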
Step 5: Automated Alerting
Option A: Scheduled Query + Email
Create a scheduled query that runs daily and sends results:
-- Save this as a scheduled query with email notification
SELECT
date,
actual_sessions,
predicted_sessions,
percent_deviation,
anomaly_status,
severity_percent
FROM `project.dataset.anomalies`
WHERE anomaly_status != 'NORMAL'
AND date = CURRENT_DATE() - 1 -- Yesterday's data
ORDER BY severity_percent DESC;

In BigQuery:
- Click Schedule on the query
- Set to run daily at 9 AM
- Enable Email notifications for query results
Option B: Cloud Function for Slack/PagerDuty
For real-time alerts, trigger a Cloud Function:
1. Create the Cloud Function (Python):
import functions_framework
from google.cloud import bigquery
import requests
import os
SLACK_WEBHOOK = os.environ.get('SLACK_WEBHOOK')
@functions_framework.http
def check_anomalies(request):
    client = bigquery.Client()
    query = """
        SELECT date, actual_sessions, predicted_sessions,
               percent_deviation, anomaly_status, severity_percent
        FROM `project.dataset.anomalies`
        WHERE anomaly_status != 'NORMAL'
          AND date >= CURRENT_DATE() - 1
        ORDER BY severity_percent DESC
        LIMIT 5
    """
    results = client.query(query).result()
    anomalies = list(results)
    if anomalies:
        message = "🚨 *GA4 Anomaly Detected*\n\n"
        for row in anomalies:
            emoji = "📉" if row.anomaly_status == "LOW" else "📈"
            message += f"{emoji} *{row.date}*: {row.actual_sessions:,} sessions "
            message += f"({row.percent_deviation:+.1f}% from expected)\n"
        requests.post(SLACK_WEBHOOK, json={"text": message})
    return f"Found {len(anomalies)} anomalies"

2. Schedule with Cloud Scheduler:
gcloud scheduler jobs create http anomaly-check \
--schedule="0 9 * * *" \
--uri="https://REGION-PROJECT.cloudfunctions.net/check_anomalies" \
--http-method=GET

Now you get Slack alerts whenever traffic deviates significantly from expected.
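Before deploying, you can verify the message formatting locally with a stand-in row (SimpleNamespace mimics the attribute access on a BigQuery Row; the values are hypothetical):

```python
from types import SimpleNamespace

# Fake query result shaped like the anomalies table
rows = [SimpleNamespace(date="2026-01-31", actual_sessions=4200,
                        percent_deviation=-57.1, anomaly_status="LOW")]

message = "🚨 *GA4 Anomaly Detected*\n\n"
for row in rows:
    emoji = "📉" if row.anomaly_status == "LOW" else "📈"
    message += f"{emoji} *{row.date}*: {row.actual_sessions:,} sessions "
    message += f"({row.percent_deviation:+.1f}% from expected)\n"

print(message)  # 📉 *2026-01-31*: 4,200 sessions (-57.1% from expected)
```

Catching a formatting bug here is much faster than redeploying the function and waiting for the next scheduled run.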
Detecting Specific Problems
Bot Traffic Detection
Bots often show unusual patterns: high page views per session, zero scroll events, specific user agents.
-- Identify suspicious traffic patterns
SELECT
date,
users,
sessions,
page_views,
scroll_events,
-- Bot indicators
ROUND(page_views / NULLIF(sessions, 0), 1) as pages_per_session,
ROUND(scroll_events / NULLIF(page_views, 0) * 100, 1) as scroll_rate,
-- Flag potential bot days
CASE
WHEN page_views / NULLIF(sessions, 0) > 20 THEN 'High pages/session'
WHEN scroll_events / NULLIF(page_views, 0) < 0.1 THEN 'Low scroll rate'
ELSE 'Normal'
END as bot_indicator
FROM `project.dataset.daily_metrics`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY date DESC;

Real users scroll. Real users don’t view 50 pages per session. Sudden changes in these ratios indicate bot activity.
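The same thresholds can be expressed as a quick Python check for use in an alerting function (the 20 pages/session and 10% scroll-rate cutoffs mirror the CASE above; tune them to your site):

```python
def bot_indicator(page_views, sessions, scroll_events):
    """Mirror the SQL CASE: flag days whose engagement ratios look non-human."""
    pages_per_session = page_views / sessions if sessions else 0
    scroll_rate = scroll_events / page_views if page_views else 0
    if pages_per_session > 20:
        return "High pages/session"
    if scroll_rate < 0.1:
        return "Low scroll rate"
    return "Normal"

print(bot_indicator(52_000, 1_000, 100))   # High pages/session
print(bot_indicator(9_000, 3_000, 5_400))  # Normal
```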
Tracking Implementation Issues
When tracking breaks, specific events disappear while others remain:
-- Compare event ratios over time
WITH daily_events AS (
SELECT
PARSE_DATE('%Y%m%d', event_date) as date,
COUNTIF(event_name = 'page_view') as page_views,
COUNTIF(event_name = 'scroll') as scrolls,
COUNTIF(event_name = 'purchase') as purchases,
COUNTIF(event_name = 'add_to_cart') as cart_adds
FROM `project.analytics_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY))
GROUP BY event_date
)
SELECT
date,
page_views,
scrolls,
purchases,
cart_adds,
-- Ratios that should be stable
ROUND(scrolls / NULLIF(page_views, 0) * 100, 1) as scroll_to_pv_ratio,
ROUND(cart_adds / NULLIF(page_views, 0) * 100, 2) as cart_to_pv_ratio,
ROUND(purchases / NULLIF(cart_adds, 0) * 100, 1) as purchase_to_cart_ratio
FROM daily_events
ORDER BY date DESC;

If scroll_to_pv_ratio suddenly drops to zero, your scroll tracking broke. If purchase_to_cart_ratio drops to zero but cart adds continue, the purchase event is failing.
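A simple guard for these ratios: alert when today's value falls below half its recent baseline. A sketch in plain Python (the 0.5 floor is an arbitrary starting point; tune it per ratio):

```python
def ratio_broke(today, baseline, floor=0.5):
    """True when a normally stable event ratio collapses versus its baseline."""
    if baseline == 0:
        return False  # nothing to compare against
    return today < baseline * floor

# scroll_to_pv_ratio fell from 62% to 0%: scroll tracking is broken
print(ratio_broke(0.0, 62.0))   # True
print(ratio_broke(58.0, 62.0))  # False
```

Because this checks ratios rather than volumes, it stays quiet during ordinary traffic swings and fires only when one event type breaks relative to the others.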
Conversion Rate Anomalies
Monitor conversion rate independently of volume:
CREATE OR REPLACE MODEL `project.dataset.conversion_rate_forecast`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='conversion_rate',
auto_arima=TRUE,
data_frequency='DAILY'
) AS
SELECT
date,
SAFE_DIVIDE(purchases, users) as conversion_rate
FROM `project.dataset.daily_metrics`
WHERE users > 100 -- Exclude low-traffic days
AND date < CURRENT_DATE();

A sudden drop in conversion rate—even if traffic is normal—indicates checkout issues, payment failures, or site problems.
Building a Monitoring Dashboard
Create a Looker Studio dashboard that shows:
1. Actual vs Predicted (time series chart)
SELECT
a.date,
a.actual_sessions,
f.predicted_sessions,
f.lower_bound,
f.upper_bound
FROM `project.dataset.daily_metrics` a
LEFT JOIN `project.dataset.sessions_forecast` f ON a.date = DATE(f.date)
WHERE a.date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY date;

2. Anomaly Summary (scorecard)
SELECT
COUNTIF(anomaly_status = 'LOW') as low_anomalies,
COUNTIF(anomaly_status = 'HIGH') as high_anomalies,
MAX(CASE WHEN anomaly_status != 'NORMAL' THEN severity_percent END) as max_severity
FROM `project.dataset.anomalies`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);

3. Event Ratio Health (table)

This assumes the event-ratio query from the tracking section is saved as a table named project.dataset.event_ratios:
SELECT
date,
scroll_to_pv_ratio,
cart_to_pv_ratio,
purchase_to_cart_ratio
FROM `project.dataset.event_ratios`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY date DESC;

Maintenance
Retrain Models Monthly
Traffic patterns change. Retrain models to capture new baselines:
-- Schedule this query to run on the 1st of each month
CREATE OR REPLACE MODEL `project.dataset.sessions_forecast_model`
OPTIONS(
model_type='ARIMA_PLUS',
time_series_timestamp_col='date',
time_series_data_col='sessions',
auto_arima=TRUE,
data_frequency='DAILY',
holiday_region='IT'
) AS
SELECT date, sessions
FROM `project.dataset.daily_metrics`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 180 DAY)
AND date < CURRENT_DATE();

Tune Sensitivity
If you get too many alerts, widen the confidence interval:
-- 99% confidence = fewer alerts, only extreme anomalies
SELECT * FROM ML.FORECAST(
MODEL `project.dataset.sessions_forecast_model`,
STRUCT(30 AS horizon, 0.99 AS confidence_level)
);

If you’re missing problems, tighten it:
-- 90% confidence = more sensitive
SELECT * FROM ML.FORECAST(
MODEL `project.dataset.sessions_forecast_model`,
STRUCT(30 AS horizon, 0.90 AS confidence_level)
);

Cost Estimate
| Component | Monthly Cost |
|---|---|
| ARIMA model training (3 models) | ~$3-5 |
| Daily forecast queries | ~$1-2 |
| Daily metrics table | ~$0.50 |
| Cloud Function (if used) | ~$0 (free tier) |
| Total | ~$5-10/month |
Far cheaper than the cost of not noticing a tracking failure for a week.
Summary
Automated anomaly detection transforms GA4 from passive reporting to active monitoring:
- Aggregate daily metrics from GA4 export
- Train ARIMA models that learn your traffic patterns
- Generate forecasts with confidence intervals
- Flag anomalies when actuals fall outside bounds
- Alert automatically via email, Slack, or PagerDuty
The system catches problems humans miss: gradual declines, subtle tracking failures, bot traffic that inflates metrics.
Start with session monitoring. Once that’s running, add conversion rate and revenue models. The goal isn’t perfect prediction—it’s catching problems before they become disasters.
Related Resources
- Predicting Customer Value with BigQuery ML
- AI-Powered Audience Segmentation: K-Means Clustering - Use clustering to filter anomalies by audience segment
- GA4 Debug Mode: Complete Troubleshooting Guide
- Tracking Strategy: Build a Measurement System That Works