BigQuery Scheduled Queries for GA4 Automated Reporting

Raw GA4 event data sitting in BigQuery is only useful once it becomes clean, automated reporting tables. This guide covers how to set up BigQuery scheduled queries for GA4 — from timing and pricing to MERGE patterns and when to move to Dataform.

Why Automate GA4 Reporting in BigQuery

The GA4 BigQuery export gives you raw event-level data. Every pageview, click, and purchase lands in events_YYYYMMDD tables as nested, repeated rows. That is genuinely useful — but it is useless for stakeholders until it is transformed into clean, queryable reporting tables.

Most teams start the same way: run a query manually, copy the results into Sheets, email them around. That pattern breaks the moment someone asks for last week’s data broken out by campaign, or when the person who maintained the Sheet leaves the team. It does not scale, it is not reproducible, and it creates invisible single points of failure.

Scheduled queries solve this. You write standard SQL, configure a recurring schedule, and BigQuery writes results directly into destination tables. Those tables feed Looker Studio dashboards, Connected Sheets reports, and alerting pipelines — without anyone manually running anything.

This article covers how to set up BigQuery scheduled queries specifically for GA4 data. It includes timing considerations around GA4’s export schedule, incremental loading patterns, working SQL for common reporting tables, and an honest assessment of when scheduled queries are not enough and you should move to Dataform.


[Pipeline overview diagram]

GA4 exports raw events daily into date-sharded BigQuery tables (events_YYYYMMDD). A scheduled query runs a MERGE with a 3-day lookback into clean reporting tables, which feed Looker Studio dashboards, Connected Sheets, and email alerts.


How BigQuery Scheduled Queries Work

BigQuery scheduled queries run on top of the Data Transfer Service (DTS). You write a standard SQL query, assign a schedule, and configure how results are written.

The minimum schedule interval is 15 minutes. The default is every 24 hours. Schedules are defined in the Data Transfer Service's own English-like syntax, for example every 24 hours, every monday 09:00, or every day 10:00. Standard Unix cron expressions such as 0 10 * * * are not supported.

Write dispositions for SELECT queries:

  • WRITE_TRUNCATE — overwrites the destination table on each run
  • WRITE_APPEND — appends new rows to the destination table

DDL/DML queries (CREATE TABLE, INSERT, MERGE, DELETE) do not use a write disposition because they modify tables directly. You leave the destination table blank and let the SQL handle everything.
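
For example, a small scheduled DDL statement can rebuild a lookup table on each run with no destination configuration at all (a sketch; the project and dataset names are placeholders):

CREATE OR REPLACE TABLE `your_project.your_dataset.ga4_event_names` AS
SELECT DISTINCT event_name
FROM `your_project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));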

Parameterized date references are the most useful feature for GA4 pipelines. BigQuery injects two parameters into scheduled query runs:

  • @run_date — the logical date the query runs for (a DATE type)
  • @run_time — a TIMESTAMP representing the scheduled run time

Using @run_date in your WHERE clause makes queries idempotent and avoids hardcoding dates. For example:

WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', @run_date)

Other configuration options worth knowing: an optional Pub/Sub topic for run completion notifications (important for monitoring and chaining), an optional service account for authentication (strongly recommended for production), and the destination dataset (must be in the same project as the query).


Pricing — What Scheduled Queries Cost

Scheduled queries have no separate pricing tier. They are billed exactly like interactive queries under whichever pricing model your project uses.

On-demand pricing charges $6.25 per TiB of data processed, with the first 1 TiB per month free. A GA4 property generating one million events per day produces roughly 1–2 GB of raw export data per day. A daily summary query that scans a single day's data therefore processes about 2 GB, which works out to 2/1,024 TiB × $6.25 ≈ $0.01 per run: about a cent a day, and typically $0 under the free tier.

Capacity (Editions) pricing uses reserved slots at a flat rate regardless of bytes scanned. If you already have slot commitments for other workloads, scheduled queries run within those reservations at no additional cost.

The real cost risk is bad query patterns running on a tight schedule. A query that scans 90 days of unpartitioned GA4 data every hour will exhaust your free tier quickly and generate unexpected on-demand charges.

Partition pruning is mandatory. The GA4 export tables are date-sharded (events_YYYYMMDD), so restrict the scan with a _TABLE_SUFFIX filter: WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', @run_date) reads a single day's shard. Filtering only on the event_date column does not prune anything, because BigQuery still opens every shard matched by the events_* wildcard. Without a suffix filter, every run scans your entire historical dataset.
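
For a sense of scale, here is the pruned pattern in isolation (project and dataset IDs are placeholders):

-- Scans a single day's shard: roughly 1-2 GB for a property at 1M events/day
SELECT event_name, COUNT(*) AS event_count
FROM `your_project.analytics_XXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
GROUP BY event_name
ORDER BY event_count DESC;

Remove the _TABLE_SUFFIX line and the same query reads every events_ shard in the dataset.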


GA4 Export Timing — Intraday vs Daily Tables

This is where scheduled queries break in production if you get it wrong.

GA4 exports data to BigQuery in two table types:

  • events_intraday_YYYYMMDD — updated continuously throughout the day, typically with a lag of minutes to about an hour
  • events_YYYYMMDD — the finalized daily table, written after the day ends in the property’s configured timezone

For standard GA4 properties, the daily table is typically ready by mid-morning the following day. For GA4 360 properties, Google sends a Pub/Sub message when the daily export is confirmed complete — typically around 5am in the property timezone.

The timing trap: Once the daily table is finalized, the intraday table for that date is deleted. If you schedule a query at 2am expecting yesterday’s data in events_20260405, it may not exist yet. Your query either returns no rows or fails with a table-not-found error.

A safe default: Schedule daily reporting queries at 10am–12pm in the property’s timezone, targeting the previous day (DATE_SUB(@run_date, INTERVAL 1 DAY)). This gives the daily export sufficient time to finalize in the overwhelming majority of cases.

The late-event problem: GA4 can backfill daily tables for up to 72 hours after the date. Mobile SDK event batches and Measurement Protocol hits can arrive late. If your pipeline only processes event_date = yesterday, you will miss these events. The production-grade solution is to reprocess the last 3 days on every run and use a MERGE to deduplicate — covered in the incremental patterns section below.


SQL Pattern — Daily Session Summary Table

The session summary table is the foundation of most GA4 reporting setups. It aggregates event-level data into one row per session with traffic source and key engagement metrics.

GA4 does not store sessions as first-class entities — a session is a derived concept based on user_pseudo_id and the ga_session_id event parameter. Here is a production-ready scheduled query:

MERGE `your_project.your_dataset.ga4_daily_sessions` AS target
USING (
  WITH raw_sessions AS (
    SELECT
      event_date,
      user_pseudo_id,
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
      collected_traffic_source.manual_source AS source,
      collected_traffic_source.manual_medium AS medium,
      collected_traffic_source.manual_campaign_name AS campaign,
      MAX(IF(event_name = 'session_start', 1, 0)) AS has_session_start,
      MAX(IF(event_name IN ('purchase', 'generate_lead'), 1, 0)) AS has_conversion,
      SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue, 0)) AS revenue,
      -- session_engaged is exported as a string ('1') on most events
      COUNTIF(
        (SELECT COALESCE(value.string_value, CAST(value.int_value AS STRING))
         FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
      ) AS engaged_event_count,
      SUM(
        COALESCE(
          (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec'),
          0
        )
      ) / 1000 AS engagement_time_sec,
      COUNT(*) AS event_count
    FROM `your_project.analytics_XXXXXXXXX.events_*`
    WHERE
      _TABLE_SUFFIX BETWEEN
        FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 3 DAY))
        AND FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
    GROUP BY event_date, user_pseudo_id, session_id, source, medium, campaign
  )
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    COALESCE(source, '(direct)') AS source,
    COALESCE(medium, '(none)') AS medium,
    COALESCE(campaign, '(not set)') AS campaign,
    COUNT(*) AS sessions_count,
    COUNTIF(engaged_event_count > 0) AS engaged_sessions,
    SUM(engagement_time_sec) AS total_engagement_time_sec,
    SUM(has_conversion) AS conversions,
    SUM(revenue) AS revenue
  FROM raw_sessions
  WHERE has_session_start = 1
  GROUP BY event_date, source, medium, campaign
) AS source_data
ON target.event_date = source_data.event_date
   AND target.source = source_data.source
   AND target.medium = source_data.medium
   AND target.campaign = source_data.campaign
WHEN MATCHED THEN UPDATE SET
  sessions_count = source_data.sessions_count,
  engaged_sessions = source_data.engaged_sessions,
  total_engagement_time_sec = source_data.total_engagement_time_sec,
  conversions = source_data.conversions,
  revenue = source_data.revenue
WHEN NOT MATCHED THEN INSERT (
  event_date, source, medium, campaign,
  sessions_count, engaged_sessions,
  total_engagement_time_sec, conversions, revenue
)
VALUES (
  source_data.event_date, source_data.source, source_data.medium, source_data.campaign,
  source_data.sessions_count, source_data.engaged_sessions,
  source_data.total_engagement_time_sec, source_data.conversions, source_data.revenue
);

The query uses a 3-day lookback window to handle late-arriving events. The MERGE on event_date + source + medium + campaign makes reruns idempotent. This table becomes the backbone for Looker Studio dashboards and weekly email digests.
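
Downstream consumers then read this compact table instead of the raw export. A Looker Studio custom query or ad-hoc check might look like this (the dataset name is a placeholder):

SELECT
  event_date,
  source,
  medium,
  SUM(sessions_count) AS sessions,
  SAFE_DIVIDE(SUM(conversions), SUM(sessions_count)) AS conversion_rate
FROM `your_project.your_dataset.ga4_daily_sessions`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY event_date, source, medium
ORDER BY event_date;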


SQL Pattern — Conversion Attribution Table

GA4’s standard attribution model in the UI gives you limited control. Building attribution logic in BigQuery gives you full control.

The following query implements last-non-direct-click attribution. For each converting session, it walks back through the user’s session history and credits the most recent non-direct touchpoint:

MERGE `your_project.your_dataset.ga4_conversions_attributed` AS target
USING (
  WITH sessions AS (
    SELECT
      event_date,
      user_pseudo_id,
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
      MIN(event_timestamp) AS session_start_ts,
      collected_traffic_source.manual_source AS source,
      collected_traffic_source.manual_medium AS medium,
      MAX(IF(event_name IN ('purchase', 'generate_lead'), 1, 0)) AS is_conversion,
      SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue, 0)) AS revenue,
      MIN(
        IF(event_name IN ('purchase', 'generate_lead'), event_timestamp, NULL)
      ) AS conversion_ts
    FROM `your_project.analytics_XXXXXXXXX.events_*`
    WHERE
      _TABLE_SUFFIX BETWEEN
        FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 3 DAY))
        AND FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
    GROUP BY event_date, user_pseudo_id, session_id, source, medium
  ),
  converting_sessions AS (
    SELECT * FROM sessions WHERE is_conversion = 1
  ),
  attributed AS (
    SELECT
      c.event_date,
      c.user_pseudo_id,
      c.session_id AS conversion_session_id,
      c.conversion_ts,
      c.revenue,
      COALESCE(
        FIRST_VALUE(
          IF(
            NOT (
              COALESCE(s.source, '(direct)') = '(direct)'
              AND COALESCE(s.medium, '(none)') = '(none)'
            ),
            s.source, NULL
          ) IGNORE NULLS
        ) OVER (
          PARTITION BY c.user_pseudo_id, c.session_id
          ORDER BY s.session_start_ts DESC
          -- the frame must span the whole partition so every row gets the same value
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ),
        '(direct)'
      ) AS attributed_source,
      COALESCE(
        FIRST_VALUE(
          IF(
            NOT (
              COALESCE(s.source, '(direct)') = '(direct)'
              AND COALESCE(s.medium, '(none)') = '(none)'
            ),
            s.medium, NULL
          ) IGNORE NULLS
        ) OVER (
          PARTITION BY c.user_pseudo_id, c.session_id
          ORDER BY s.session_start_ts DESC
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ),
        '(none)'
      ) AS attributed_medium
    FROM converting_sessions c
    JOIN sessions s
      ON c.user_pseudo_id = s.user_pseudo_id
     AND s.session_start_ts <= c.conversion_ts
  )
  SELECT DISTINCT
    event_date,
    user_pseudo_id,
    conversion_session_id,
    conversion_ts,
    revenue,
    attributed_source,
    attributed_medium
  FROM attributed
) AS source_data
ON target.user_pseudo_id = source_data.user_pseudo_id
   AND target.conversion_session_id = source_data.conversion_session_id
WHEN MATCHED THEN UPDATE SET
  attributed_source = source_data.attributed_source,
  attributed_medium = source_data.attributed_medium,
  revenue = source_data.revenue
WHEN NOT MATCHED THEN INSERT ROW;

This model can be extended: swap the FIRST_VALUE window for a SUM across all sessions for linear attribution, or add position weighting for U-shaped models. The key advantage over doing this in Looker Studio is that the logic is centralized, versioned, and runs on fresh data automatically.
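
As an illustration, a linear model swaps the attributed CTE for fractional credit per touchpoint. This is a sketch that reuses the sessions and converting_sessions CTEs above; note the grain changes to one row per conversion-touchpoint pair, so the MERGE key must also include the attributed source and medium:

attributed AS (
  SELECT
    c.event_date,
    c.user_pseudo_id,
    c.session_id AS conversion_session_id,
    c.conversion_ts,
    -- split revenue evenly across all touchpoints preceding the conversion
    c.revenue / COUNT(*) OVER (PARTITION BY c.user_pseudo_id, c.session_id) AS revenue_credit,
    COALESCE(s.source, '(direct)') AS attributed_source,
    COALESCE(s.medium, '(none)') AS attributed_medium
  FROM converting_sessions c
  JOIN sessions s
    ON c.user_pseudo_id = s.user_pseudo_id
   AND s.session_start_ts <= c.conversion_ts
)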


SQL Pattern — E-commerce Product Performance Table

For e-commerce properties, the raw GA4 export stores item data in a repeated items array field. You need to unnest it before aggregating at the product level:

MERGE `your_project.your_dataset.ga4_product_performance` AS target
USING (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    item.item_id,
    MAX(item.item_name) AS item_name,
    MAX(item.item_category) AS item_category,
    COUNTIF(event_name = 'view_item') AS item_views,
    COUNTIF(event_name = 'add_to_cart') AS add_to_carts,
    COUNTIF(event_name = 'purchase') AS purchases,
    SUM(IF(event_name = 'purchase', item.item_revenue, 0)) AS revenue,
    SAFE_DIVIDE(
      COUNTIF(event_name = 'purchase'),
      COUNTIF(event_name = 'view_item')
    ) AS view_to_purchase_rate
  FROM `your_project.analytics_XXXXXXXXX.events_*`,
    UNNEST(items) AS item
  WHERE
    event_name IN ('view_item', 'add_to_cart', 'purchase')
    AND _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 3 DAY))
      AND FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
  GROUP BY event_date, item.item_id
) AS source_data
ON target.event_date = source_data.event_date
   AND target.item_id = source_data.item_id
WHEN MATCHED THEN UPDATE SET
  item_views = source_data.item_views,
  add_to_carts = source_data.add_to_carts,
  purchases = source_data.purchases,
  revenue = source_data.revenue,
  view_to_purchase_rate = source_data.view_to_purchase_rate
WHEN NOT MATCHED THEN INSERT ROW;

The resulting table is compact and cheap to query from Looker Studio — a product performance dashboard scanning this pre-aggregated table reads kilobytes instead of gigabytes.


Incremental Patterns — Append vs Replace vs MERGE

There are three ways to write data from a scheduled query:

WRITE_TRUNCATE (replace) is the simplest. Each run overwrites the destination table completely. Fine for small tables or when you always reprocess everything, but wasteful for large historical tables.

WRITE_APPEND adds new rows each run without touching existing data. It is efficient but creates duplicates if the same date is processed twice — which happens whenever a query overlaps with a previously processed range, or when you rerun a failed query. For GA4 specifically, the 72-hour late-event window means WRITE_APPEND almost always produces incorrect aggregates unless you manually manage partition deletes.

MERGE is the right pattern for production GA4 pipelines. It handles late-arriving events cleanly and is idempotent — running the same query ten times produces the same result as running it once. Use a DML query (no destination table setting needed in the console) and a 3-day lookback window:

WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 3 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))

Combined with a MERGE keyed on the natural grain of the output table, this handles the late-event problem automatically.


Sending Results to Google Sheets and Email

Scheduled queries write to BigQuery tables. They do not write directly to Sheets or send email. Getting data to end users requires one more step.

Connected Sheets is the most native option. Link a Google Sheet to the BigQuery table your scheduled query populates, configure a scheduled refresh (hourly, daily, or weekly), and the Sheet always reflects the latest data. No code required beyond the initial setup. The main limitation is the 10 million cell limit, which is not a constraint for most reporting tables.

Apps Script with the BigQuery connector is more flexible for smaller teams. A script runs on a time-based trigger, reads from your pre-built reporting table, and writes results to a Sheet. More customizable than Connected Sheets but requires some scripting.

Email delivery requires more infrastructure. The Google Cloud reference architecture uses Cloud Scheduler to trigger a Cloud Function that queries BigQuery and sends results via the Gmail API or SendGrid. If you have already configured a Pub/Sub topic on your scheduled query, trigger the Cloud Function on query completion instead — this way the email only goes out when data is confirmed ready.


Error Handling and Monitoring

A scheduled query that fails silently is worse than no automation at all.

Email notifications are enabled by default and send on failure. Useful for catching problems, but too slow for production alerting if data freshness matters hour-to-hour.

Pub/Sub notifications are the production-grade option. Configure a Pub/Sub topic when creating the scheduled query, and BigQuery publishes a message on every run completion — success or failure. The message includes the run status, start and end times, error messages, and bytes processed.

To monitor across all scheduled queries, use the bigquerydatatransfer.googleapis.com/transfer_config/completed_runs metric in Cloud Monitoring. Set an alert policy that fires when completion_state != SUCCEEDED for any critical query. The BigQuery documentation on scheduled query alerts covers the exact metric filters needed.

Common failure modes for GA4 scheduled queries:

  • Query runs before the daily export table exists (fix: adjust the schedule time or add a table existence guard; see the sketch after this list)
  • Destination dataset permissions changed or were revoked
  • Service account token expired (reason to use a dedicated service account from day one)
  • Query exceeds on-demand quota or slot capacity
  • The GA4 property’s BigQuery dataset was moved or renamed
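
The first failure mode is the most common, and you can guard against it inside the query itself. A minimal sketch, assuming the scheduled query is written as a multi-statement script and using a placeholder dataset ID; the run fails loudly instead of silently processing zero rows:

-- First statement of the scheduled script: abort if yesterday's table is missing
SELECT IF(
  COUNT(*) = 0,
  ERROR(CONCAT(
    'Daily export events_',
    FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY)),
    ' has not landed yet'
  )),
  'ok'
)
FROM `your_project.analytics_XXXXXXXXX.INFORMATION_SCHEMA.TABLES`
WHERE table_name = CONCAT(
  'events_', FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
);

A failed run then fires the email or Pub/Sub notification instead of quietly writing an empty table.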

When to Graduate to Dataform

Scheduled queries work well for up to about five independent queries with no dependencies between them. Beyond that, you start adding buffer time between queries to ensure each one finishes before the next starts. A 30-minute buffer becomes 2 hours after the second incident. Your pipeline takes 6 hours to complete 20 minutes of actual work, and you still cannot guarantee correctness.

Dataform solves the dependency problem. You declare that ga4_conversion_attribution depends on ga4_daily_sessions, and Dataform runs them in the correct order, only starting the dependent table after the upstream table completes. It also adds:

  • Version-controlled SQL — transformations live in git, not the BigQuery console
  • Incremental table definitions — built-in MERGE patterns without writing raw DML (see the sketch after this list)
  • Assertions — data quality tests that fail the pipeline if row counts drop or NULL rates spike
  • Documentation — column descriptions, lineage graphs
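
To make the incremental point concrete, here is roughly what a minimal definition looks like in Dataform's SQLX format. This is a sketch: ga4_sessions_staging is an assumed upstream declaration, and a real model would select more columns.

config {
  type: "incremental",
  uniqueKey: ["event_date", "source", "medium", "campaign"]
}

-- Dataform compiles this into a MERGE keyed on uniqueKey
SELECT
  event_date,
  source,
  medium,
  campaign,
  sessions_count
FROM ${ref("ga4_sessions_staging")}
-- on incremental runs, reprocess a 3-day window; the MERGE deduplicates
${when(incremental(), `WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)`)}

The uniqueKey merge replaces the hand-written MERGE statements above, and the incremental guard replaces the lookback WHERE clause.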

Dataform is free within Google Cloud. You pay only for the BigQuery compute it triggers, at standard on-demand rates. For most GA4 reporting setups with fewer than 10 queries, start with scheduled queries and migrate to Dataform when you hit dependency or maintainability pain — not before.

Other orchestration options: dbt (richer ecosystem, excellent testing framework, but not GCP-native), Cloud Composer/Airflow (appropriate for multi-system pipelines, significant overhead for pure-BQ work), Cloud Workflows (lightweight option for chaining a handful of steps).


Step-by-Step Setup Walkthrough

BigQuery Console

  1. Write and test your SQL query. Confirm it returns expected results against a known date range.
  2. Click Schedule > Create new scheduled query in the query editor toolbar.
  3. Name it descriptively: ga4_daily_sessions_by_source, ga4_product_performance_merge.
  4. Set the schedule. For GA4 daily reporting: every day 10:00 in the property’s timezone.
  5. For SELECT queries: set destination table and write preference. For DML queries: leave destination blank.
  6. Under Advanced options: configure a Pub/Sub topic for notifications.
  7. Configure a service account with BigQuery Data Editor on the destination dataset and BigQuery Job User on the project.
  8. Save, then click Run now to trigger a manual run. Verify results before relying on the schedule.

bq CLI

bq mk \
  --transfer_config \
  --project_id=your_project \
  --data_source=scheduled_query \
  --display_name='GA4 Daily Sessions by Source' \
  --target_dataset=your_dataset \
  --params='{
    "query": "SELECT ...",
    "destination_table_name_template": "ga4_daily_sessions",
    "write_disposition": "WRITE_TRUNCATE",
    "partitioning_field": "event_date"
  }' \
  --schedule='every day 10:00'

Terraform

resource "google_bigquery_data_transfer_config" "ga4_daily_sessions" {
  display_name           = "GA4 Daily Sessions by Source"
  location               = "US"
  data_source_id         = "scheduled_query"
  schedule               = "every day 10:00"
  destination_dataset_id = google_bigquery_dataset.reporting.dataset_id
  service_account_name   = google_service_account.bq_scheduled.email

  params = {
    query                           = file("${path.module}/sql/ga4_daily_sessions.sql")
    destination_table_name_template = "ga4_daily_sessions"
    write_disposition               = "WRITE_TRUNCATE"
    partitioning_field              = "event_date"
  }
}

Using Terraform gives you the same infrastructure-as-code benefits as the rest of your GCP stack: code review, change history, and the ability to recreate your pipeline from scratch in a new project.


Conclusion — Building a Lightweight GA4 Data Pipeline

Scheduled queries are the lowest-friction way to turn raw GA4 event data into automated, reliable reporting tables. They require no infrastructure beyond BigQuery, cost nothing beyond standard query pricing, and you can have a working pipeline in under 30 minutes.

Three decisions determine whether your pipeline is solid or fragile:

Timing. Schedule after the GA4 daily export completes — 10am–12pm in the property’s timezone targeting the previous day. Do not assume the daily table exists at 2am.

Write strategy. Use MERGE with a 3-day lookback window for production tables. It handles late-arriving GA4 events, is idempotent, and costs very little extra compared to debugging duplicate or missing rows.

Monitoring. Configure a Pub/Sub topic on every scheduled query you care about and set up a Cloud Monitoring alert for failed runs. Silent failures are the only kind that are truly dangerous.

For most teams, scheduled queries feeding Connected Sheets or Looker Studio cover 80% of what a full data pipeline would deliver at a fraction of the complexity. When dependency chains emerge, when you need data quality tests, or when you want SQL under version control, Dataform is the natural next step.

Start simple. Automate the queries your team runs manually every week. Build from there.

