Predicting Customer Value with BigQuery ML and GA4 Data
Your GA4 data contains patterns that predict which users will convert, churn, or become high-value customers. BigQuery ML lets you build machine learning models using SQL, no Python required. This guide walks through building a purchase propensity model and using predictions for Google Ads audiences.
Why Predict Customer Behavior?
Every user on your site leaves behavioral signals. Page views, scroll depth, time on site, products viewed, cart additions. These signals correlate with outcomes: purchases, sign-ups, churn.
Traditional analytics tells you what happened. Predictive analytics tells you what will happen.
With predictions, you can:
- Target high-propensity users with Google Ads before they convert elsewhere
- Identify churn risk and trigger retention campaigns
- Allocate budget toward users likely to have high lifetime value
- Personalize experiences based on predicted interests
The challenge has always been building these models. Until BigQuery ML.
What is BigQuery ML?
BigQuery ML lets you create and execute machine learning models using standard SQL. No Python, no TensorFlow, no ML infrastructure to manage.
```sql
CREATE MODEL `project.dataset.my_model`
OPTIONS(model_type='LOGISTIC_REG') AS
SELECT * FROM training_data;
```

That's it. BigQuery handles feature engineering, training, hyperparameter tuning, and model deployment.
Supported model types include:
- Linear regression - Predict continuous values (revenue, LTV)
- Logistic regression - Predict binary outcomes (will purchase / won’t purchase)
- K-means clustering - Segment users into groups
- XGBoost - High-accuracy predictions with gradient boosting
- Deep Neural Networks - Complex pattern recognition
- ARIMA - Time series forecasting
For most marketing use cases, logistic regression and XGBoost cover 90% of needs.
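Swapping model types is just a change to the `model_type` option — for example, the XGBoost variant (which BigQuery ML exposes as `BOOSTED_TREE_CLASSIFIER`), sketched in the same minimal form as the example above:

```sql
-- Same statement, different model_type: gradient-boosted trees (XGBoost)
CREATE MODEL `project.dataset.my_xgb_model`
OPTIONS(model_type='BOOSTED_TREE_CLASSIFIER') AS
SELECT * FROM training_data;
```

Boosted trees usually take longer to train than logistic regression but handle non-linear feature interactions without manual feature crosses.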
Prerequisites
1. GA4 BigQuery Export
Your GA4 property must export data to BigQuery. This is free and takes 5 minutes to set up:
- In GA4, go to Admin > BigQuery Links
- Click Link and select your Google Cloud project
- Choose Daily export (streaming adds cost)
- Enable Include advertising identifiers if you want to build Google Ads audiences
Data starts flowing within 24 hours. You'll get tables like:

```
analytics_PROPERTY_ID.events_YYYYMMDD
analytics_PROPERTY_ID.events_intraday_YYYYMMDD
```

2. BigQuery Access
You need a Google Cloud project with BigQuery enabled. If you linked GA4, you already have this.
Billing must be enabled, but model training on GA4 data typically costs $1-10 depending on data volume.
Building a Purchase Propensity Model
Let’s build a model that predicts which users will purchase in the next 7 days.
Step 1: Understand Your Data
First, explore what’s available:
```sql
-- See all event types in your data
SELECT
  event_name,
  COUNT(*) as event_count
FROM `project.analytics_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260130'
GROUP BY event_name
ORDER BY event_count DESC
LIMIT 20;
```

Common events useful for prediction:

- `page_view` - Engagement level
- `scroll` - Content consumption
- `view_item` - Product interest
- `add_to_cart` - Purchase intent
- `begin_checkout` - Strong purchase intent
- `purchase` - Our target variable
Step 2: Create Training Features
We need to transform raw events into features (input variables) that predict purchase behavior.
```sql
CREATE OR REPLACE TABLE `project.dataset.training_data` AS
WITH user_features AS (
  SELECT
    user_pseudo_id,
    -- Engagement metrics
    COUNT(*) as total_events,
    COUNT(DISTINCT event_date) as active_days,
    COUNTIF(event_name = 'page_view') as page_views,
    COUNTIF(event_name = 'scroll') as scroll_events,
    -- Purchase intent signals
    COUNTIF(event_name = 'view_item') as items_viewed,
    COUNTIF(event_name = 'add_to_cart') as cart_additions,
    COUNTIF(event_name = 'begin_checkout') as checkout_starts,
    -- Session metrics
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) as sessions,
    -- Device and traffic
    MAX(device.category) as device_category,
    MAX(traffic_source.medium) as traffic_medium,
    -- Time-based
    MIN(PARSE_DATE('%Y%m%d', event_date)) as first_seen,
    MAX(PARSE_DATE('%Y%m%d', event_date)) as last_seen
  FROM `project.analytics_PROPERTY_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260123' -- Training period
  GROUP BY user_pseudo_id
),
-- Label: Did user purchase in the following 7 days?
purchase_labels AS (
  SELECT DISTINCT
    user_pseudo_id,
    1 as purchased
  FROM `project.analytics_PROPERTY_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260124' AND '20260130' -- Prediction window
    AND event_name = 'purchase'
)
SELECT
  f.*,
  DATE_DIFF(f.last_seen, f.first_seen, DAY) as days_active,
  COALESCE(l.purchased, 0) as will_purchase -- Target variable
FROM user_features f
LEFT JOIN purchase_labels l USING (user_pseudo_id);
```

This creates a table where each row is a user with:
- Features: Their behavior during the training period
- Label: Whether they purchased in the following 7 days (1 or 0)
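Before training, it's worth checking how imbalanced the label is, since purchasers are usually a small minority. A quick check against the table just created:

```sql
-- Check the class balance of the target variable
SELECT
  will_purchase,
  COUNT(*) AS users,
  ROUND(COUNT(*) / SUM(COUNT(*)) OVER (), 4) AS share
FROM `project.dataset.training_data`
GROUP BY will_purchase;
```

If purchasers are well under 1% of rows, the `auto_class_weights` option used in the next step matters a lot.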
Step 3: Train the Model
Now create the model:
```sql
CREATE OR REPLACE MODEL `project.dataset.purchase_propensity_model`
OPTIONS(
  model_type='LOGISTIC_REG',
  input_label_cols=['will_purchase'],
  auto_class_weights=TRUE, -- Handle imbalanced data (few purchasers vs many non-purchasers)
  enable_global_explain=TRUE -- See feature importance
) AS
SELECT
  -- Exclude user_pseudo_id and date fields from features
  total_events,
  active_days,
  page_views,
  scroll_events,
  items_viewed,
  cart_additions,
  checkout_starts,
  sessions,
  device_category,
  traffic_medium,
  days_active,
  will_purchase
FROM `project.dataset.training_data`;
```

Training takes 2-10 minutes depending on data size. BigQuery handles the details: feature scaling, regularization, and an automatic train/evaluation data split.
Step 4: Evaluate the Model
Check how well the model performs:
```sql
SELECT *
FROM ML.EVALUATE(MODEL `project.dataset.purchase_propensity_model`);
```

Key metrics to look at:
| Metric | What It Means | Good Value |
|---|---|---|
| precision | Of users predicted to purchase, how many actually did? | > 0.3 |
| recall | Of users who purchased, how many did we predict? | > 0.5 |
| accuracy | Overall correct predictions | > 0.7 |
| f1_score | Balance of precision and recall | > 0.4 |
| roc_auc | Model’s ability to distinguish purchasers | > 0.7 |
For marketing use cases, roc_auc > 0.7 indicates a useful model.
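If you need a concrete probability cutoff for audience building, `ML.ROC_CURVE` lets you inspect the trade-off at each threshold rather than guessing. A sketch:

```sql
-- Inspect candidate thresholds to choose a cutoff for audiences
SELECT
  threshold,
  recall,
  true_positives,
  false_positives
FROM ML.ROC_CURVE(MODEL `project.dataset.purchase_propensity_model`)
ORDER BY threshold;
```

Pick the lowest threshold where the false-positive volume is still acceptable for your ad budget.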
Step 5: Understand Feature Importance
See which behaviors most predict purchases:
```sql
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL `project.dataset.purchase_propensity_model`)
ORDER BY attribution DESC;
```

Typical findings:

- `cart_additions` and `checkout_starts` are the strongest predictors (obviously)
- `items_viewed` matters more than `page_views`
- `active_days` shows engaged users convert more
- `traffic_medium` reveals which channels bring high-intent users
Making Predictions
Score Current Users
Apply the model to users from the last 7 days:
```sql
CREATE OR REPLACE TABLE `project.dataset.user_propensity_scores` AS
WITH recent_users AS (
  SELECT
    user_pseudo_id,
    COUNT(*) as total_events,
    COUNT(DISTINCT event_date) as active_days,
    COUNTIF(event_name = 'page_view') as page_views,
    COUNTIF(event_name = 'scroll') as scroll_events,
    COUNTIF(event_name = 'view_item') as items_viewed,
    COUNTIF(event_name = 'add_to_cart') as cart_additions,
    COUNTIF(event_name = 'begin_checkout') as checkout_starts,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) as sessions,
    MAX(device.category) as device_category,
    MAX(traffic_source.medium) as traffic_medium,
    DATE_DIFF(MAX(PARSE_DATE('%Y%m%d', event_date)), MIN(PARSE_DATE('%Y%m%d', event_date)), DAY) as days_active
  FROM `project.analytics_PROPERTY_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  GROUP BY user_pseudo_id
)
SELECT
  user_pseudo_id,
  -- Match on label = 1 explicitly; the order of the probs array is not guaranteed
  (SELECT p.prob FROM UNNEST(predicted_will_purchase_probs) p WHERE p.label = 1) as purchase_probability
FROM ML.PREDICT(
  MODEL `project.dataset.purchase_propensity_model`,
  (SELECT * FROM recent_users)
);
```

Each user now has a `purchase_probability` score from 0 to 1.
Create Audience Segments
Segment users by propensity:
```sql
SELECT
  CASE
    WHEN purchase_probability >= 0.7 THEN 'High Intent'
    WHEN purchase_probability >= 0.4 THEN 'Medium Intent'
    WHEN purchase_probability >= 0.1 THEN 'Low Intent'
    ELSE 'Very Low Intent'
  END as segment,
  COUNT(*) as users,
  AVG(purchase_probability) as avg_probability
FROM `project.dataset.user_propensity_scores`
GROUP BY 1
ORDER BY avg_probability DESC;
```

Exporting to Google Ads
The real power comes from acting on predictions. Export high-propensity users to Google Ads for targeted campaigns.
Option 1: GA4 Audiences via BigQuery
If you have GA4 360 or use the Measurement Protocol, you can send propensity scores back to GA4 as user properties, then build audiences in GA4’s UI.
Option 2: Google Ads Data Manager
- In Google Ads, go to Tools > Data Manager
- Connect your BigQuery project
- Create an audience from your propensity table
```sql
-- Create a table formatted for Google Ads
SELECT
  user_pseudo_id as user_id,
  'High Intent Purchasers' as audience_name
FROM `project.dataset.user_propensity_scores`
WHERE purchase_probability >= 0.6;
```

Option 3: Customer Match
Export emails/phone numbers (if you have them linked to user_pseudo_id) for Customer Match upload.
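If your CRM stores emails keyed by the GA4 client ID, a join like the following produces the upload list. Note that the `crm_users` table and its columns are hypothetical placeholders for your own mapping:

```sql
-- Hypothetical: crm_users(user_pseudo_id, email) is your own CRM mapping table
SELECT
  c.email
FROM `project.dataset.user_propensity_scores` s
JOIN `project.dataset.crm_users` c USING (user_pseudo_id)
WHERE s.purchase_probability >= 0.6;
```

Remember that Customer Match requires identifiers to be hashed (SHA-256) before upload.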
Going Further: LTV Prediction
Purchase propensity predicts if someone will buy. Lifetime Value (LTV) predicts how much they’ll spend over time.
LTV Model
```sql
CREATE OR REPLACE MODEL `project.dataset.ltv_model`
OPTIONS(
  model_type='LINEAR_REG', -- Regression for continuous values
  input_label_cols=['future_revenue']
) AS
WITH user_history AS (
  -- Features from first 30 days
  SELECT
    user_pseudo_id,
    COUNTIF(event_name = 'purchase') as purchases_30d,
    -- Sum the 'value' param on purchase events only, so non-purchase events don't inflate revenue
    SUM(IF(event_name = 'purchase',
           IFNULL((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'), 0),
           0)) as revenue_30d,
    COUNT(DISTINCT event_date) as active_days_30d
  FROM `project.analytics_PROPERTY_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260130'
  GROUP BY user_pseudo_id
),
future_value AS (
  -- Revenue in next 90 days
  SELECT
    user_pseudo_id,
    SUM(IFNULL((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'), 0)) as future_revenue
  FROM `project.analytics_PROPERTY_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260131' AND '20260430'
    AND event_name = 'purchase'
  GROUP BY user_pseudo_id
)
SELECT
  h.*,
  COALESCE(f.future_revenue, 0) as future_revenue
FROM user_history h
LEFT JOIN future_value f USING (user_pseudo_id);
```

Now you can identify not just who will buy, but who will become your most valuable customers.
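Scoring works the same way as with the classifier: for a regression model, `ML.PREDICT` returns a `predicted_future_revenue` column. A sketch, where `recent_user_history` is a hypothetical table built exactly like the `user_history` CTE above but over a recent 30-day window:

```sql
-- Score users; input must contain the same feature columns used in training
SELECT
  user_pseudo_id,
  predicted_future_revenue
FROM ML.PREDICT(
  MODEL `project.dataset.ltv_model`,
  (SELECT user_pseudo_id, purchases_30d, revenue_30d, active_days_30d
   FROM `project.dataset.recent_user_history`)
)
ORDER BY predicted_future_revenue DESC;
```

Extra columns like `user_pseudo_id` pass through to the output, so you can join the scores back to audience tables.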
Best Practices
Data Quality
- Minimum 30 days of training data
- At least 1000 conversion events for reliable models
- Consistent tracking - model breaks if events change
Feature Engineering
- Ratios often outperform counts: `cart_additions / items_viewed` (cart rate) can be stronger than raw counts
- Recency matters: Weight recent behavior higher than old behavior
- Exclude obvious signals: Don't include `begin_checkout` if you want to catch users earlier in the funnel
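As an example of a ratio feature, `SAFE_DIVIDE` avoids division-by-zero errors for users who viewed no items. A sketch of how the training table could be extended:

```sql
-- Ratio feature: cart rate, with SAFE_DIVIDE returning NULL for zero denominators
SELECT
  user_pseudo_id,
  cart_additions,
  items_viewed,
  SAFE_DIVIDE(cart_additions, items_viewed) AS cart_rate
FROM `project.dataset.training_data`;
```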
Model Maintenance
- Retrain monthly - user behavior shifts over time
- Monitor performance - track precision/recall on a holdout set
- A/B test campaigns - verify that targeting high-propensity users actually improves ROI
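Holdout monitoring can reuse `ML.EVALUATE` with an explicit table. Here `holdout_data` is a hypothetical table built the same way as `training_data`, just from a later date range the model never saw:

```sql
-- Evaluate the existing model against fresh, labeled data
SELECT precision, recall, roc_auc
FROM ML.EVALUATE(
  MODEL `project.dataset.purchase_propensity_model`,
  (SELECT * FROM `project.dataset.holdout_data`)
);
```

If `roc_auc` drifts noticeably below the value from Step 4, that's the signal to retrain.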
Cost Considerations
BigQuery ML pricing:
| Operation | Cost |
|---|---|
| Model training | $250 per TB processed |
| Predictions | $5 per TB processed |
| Storage | $0.02 per GB/month |
For a typical GA4 export (1M users, 30 days):
- Training: ~$2-5
- Daily predictions: ~$0.50
- Storage: ~$1/month
Far cheaper than external ML platforms, and no data leaves Google’s infrastructure.
Summary
BigQuery ML transforms GA4 data from historical reporting into predictive intelligence:
- Export GA4 to BigQuery (free, 5 minutes)
- Create training features from user behavior
- Train a model with one SQL statement
- Score users to get propensity/LTV predictions
- Export to Google Ads for targeted campaigns
The models aren’t perfect, but they don’t need to be. A model that identifies high-intent users with 70% accuracy still dramatically improves campaign ROI compared to broad targeting.
Start with purchase propensity. Once that’s working, expand to churn prediction, LTV modeling, and product recommendations.
Related Resources
- Automated Anomaly Detection for GA4 with BigQuery - Monitor your models and data quality with statistical anomaly detection
- AI-Powered Audience Segmentation: K-Means Clustering - Discover natural user groups to complement propensity scoring
- Tracking Strategy: How to Build a Measurement System That Works
- GA4 Custom Dimensions & Metrics
- Google Cloud BigQuery ML Documentation