GA4 Data Retention Workarounds with BigQuery
GA4 deletes raw event data after 14 months maximum — but if you have BigQuery export enabled, that limit simply does not apply to you. Here is everything you need to know to keep your data, query it efficiently, and recover what you can if you started late.
The GA4 Data Retention Problem
GA4 retains user-level and event-level data for a maximum of 14 months on standard properties (GA4 360 properties can extend this, up to 50 months). The default setting is two months. You can change this under Admin > Data collection and modification > Data retention, but for standard properties 14 months is the ceiling; there is no way to extend it further within GA4 itself. Once that window closes, granular data is permanently deleted from Google’s servers.
What this means in practice: after your retention window expires, Exploration reports stop working for that historical period. Funnel analyses, path reports, segment overlap, cohort analysis, User Lifetime reports — all of these rely on raw event-level data. When the data is gone, those reports simply return no results for the affected date range. You cannot retroactively apply a new segment to 18 months of user behavior. You cannot build a cohort of users who completed a specific event 16 months ago. That analysis is gone.
There is a nuance here that most guides get wrong. Standard aggregated reports — the ones under the Reports section, including Traffic Acquisition, Engagement, and Monetization — are not affected by data retention settings at all. These reports pull from pre-aggregated tables that Google computes and stores separately. They will continue to show historical trend data as long as your GA4 property exists. So if you only need to know that organic sessions were up 12% year-over-year, you will still see that in a standard report. What you lose is the ability to do anything ad hoc with that history: apply segments retroactively, drill into user-level journeys, or answer questions you did not think to ask in advance.
One additional limitation worth noting: certain user-level dimensions — age, gender, interests — are always capped at two months of retention regardless of your retention setting. The 14-month maximum only applies to event and user data in the general sense.
The Google Analytics help documentation on data retention confirms these limits, but downplays the operational consequences for anyone doing serious analysis.
What You Actually Lose (and What You Keep)
To be concrete about what the retention limit does and does not affect:
Unaffected by retention:
- All standard reports in the Reports section (Traffic Acquisition, Engagement, Monetization, Retention overview, etc.)
- Any metric that relies on pre-aggregated data
- Historical trend lines in standard dimensions
- Audience counts and remarketing lists (governed separately)
Affected by retention:
- Exploration reports of all types: Free Form, Funnel Exploration, Path Exploration, Segment Overlap, Cohort Exploration, User Lifetime
- Any analysis requiring event-level granularity
- Custom segments applied to historical data
- User-scoped dimensions and user properties over time
- Any query or export that needs raw event rows
For surface-level reporting — “how did traffic perform last quarter?” — you will be fine. For anything involving user journeys, behavioral cohorts, multi-step funnels, or retroactive segmentation, the 14-month ceiling is a hard wall. Year-over-year cohort comparisons, long-term attribution analysis, historical user journey mapping — none of this is possible within GA4 alone once data ages out.
The practical consequence is that most analytics teams hit this wall at a specific moment: when a stakeholder asks a question that requires historical event-level data that no longer exists. At that point, if you have no BigQuery export running, the answer is simply: that data is gone.
BigQuery Export as the Primary Workaround
The BigQuery export is the correct solution to this problem. When you link a GA4 property to a BigQuery project, Google writes a raw, event-level copy of every event to tables you own in your own Google Cloud project. This data is entirely decoupled from GA4’s retention settings. You could set GA4 retention to two months tomorrow, and your BigQuery data would remain intact indefinitely. GA4’s deletion logic has no reach into BigQuery.
Google’s BigQuery export documentation describes two export modes:
Daily export creates events_YYYYMMDD tables that finalize approximately 10 hours after the day ends. There is a limit of one million events per day on this export mode. For most sites, this is sufficient.
Streaming export creates events_intraday_YYYYMMDD tables with near-real-time latency (seconds, not hours) and no daily event cap. The cost is an additional $0.05 per GB of streaming inserts; a GA4 event averages on the order of 1 to 2 KB, so that works out to roughly $0.05 per 600,000 events. When both export modes are active, the intraday table is replaced by the finalized daily table at the end of the day, so you do not end up with duplicate data.
For most properties, enabling both makes sense. The streaming export covers you if you exceed one million daily events, and it provides intraday visibility for operational monitoring. For smaller sites, daily export alone is usually enough.
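If you do run both, one wrinkle worth knowing: the wildcard events_* also matches the events_intraday_ table names (the wildcard is a prefix match), although a numeric _TABLE_SUFFIX range will exclude them. A minimal sketch for combining finalized history with today’s intraday data, assuming placeholder project and dataset names:

-- Finalized daily tables, excluding today
SELECT event_date, COUNT(*) AS events
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101'
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_date
UNION ALL
-- Today's events from the streaming export
SELECT event_date, COUNT(*) AS events
FROM `your_project.your_dataset.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY event_date
ORDER BY event_date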
The most important property of BigQuery export is this: once your data is in BigQuery, it belongs to you. You set the retention policies. You decide what to keep and for how long. You are not subject to GA4’s server-side deletion. As Graphed’s analysis of BigQuery export and data retention confirms, BigQuery export data is completely independent of GA4 retention settings.
Setting Up the BigQuery Export
The setup path is: GA4 Admin > Product Links > BigQuery Links > Link.
You will need a Google Cloud project with billing enabled. The BigQuery API must be active on that project. Walk through the prompts:
- Select your Google Cloud project.
- Choose a data location. This is the region where your BigQuery dataset will be created. Pick the region closest to where you will run queries or where your other data infrastructure lives. This choice cannot be changed after linking — if you pick the wrong region, you will need to unlink and re-link, and you will lose any data accumulated in the interim.
- Select export frequency: Daily, Streaming, or both.
- Select which data streams to include (relevant if you have multiple streams on the property).
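Once the link is active, the export writes to a dataset named analytics_<property_id> in your project. A quick sanity check that daily tables are arriving is to list them from the dataset’s INFORMATION_SCHEMA (a sketch; the project and property ID are placeholders):

SELECT table_name, creation_time
FROM `your_project.analytics_123456789.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'events_%'
ORDER BY table_name DESC
LIMIT 10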
One critical limitation: the export is not retroactive. It starts from the moment of linking. Whatever data existed in GA4 before you set up the link stays in GA4 (subject to its retention settings) and does not get exported to BigQuery automatically.
This creates a strong incentive to act immediately. Every day you run a GA4 property without BigQuery export enabled is a day of raw event data that is accumulating only in GA4, subject to deletion. Storage costs in BigQuery for a site generating a few hundred thousand events per day are effectively negligible — the cost of having the data far exceeds the cost of storing it. Enable the export on day one, even if you have no immediate use for it. You will eventually want it.
Cost Breakdown: It Is Cheaper Than You Think
BigQuery pricing for GA4 export data has three components. The current pricing page has the full details, but here is the summary relevant to GA4:
Storage:
- Active storage (data modified in the last 90 days): $0.02 per GB per month
- Long-term storage (data not modified for 90+ days): $0.01 per GB per month
- First 10 GB per month: free
Query processing (on-demand pricing):
- $6.25 per TB scanned
- First 1 TB per month: free
Streaming inserts (if using streaming export):
- $0.05 per GB inserted
Real-world estimates for GA4 data: a medium-traffic site generating around 500,000 events per day produces roughly 25 GB per month of raw event data in BigQuery. At active storage rates, each month of accumulated data adds about $0.50 per month to the bill, falling to $0.25 per month once it crosses into long-term pricing after 90 days. A high-traffic site generating five million events per day produces around 250 GB per month, adding roughly $5 per month of storage cost each month. For most properties, the BigQuery free tier (10 GB of storage, 1 TB of queries per month) covers all costs for the first several months of operation.
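If you want real numbers rather than estimates, BigQuery’s INFORMATION_SCHEMA.TABLE_STORAGE view reports how much of your data sits in each storage tier. A sketch that approximates the monthly storage bill at the US-region rates above (project, region, and dataset names are placeholders):

SELECT
  ROUND(SUM(active_logical_bytes) / POW(1024, 3), 2) AS active_gb,
  ROUND(SUM(long_term_logical_bytes) / POW(1024, 3), 2) AS long_term_gb,
  ROUND(SUM(active_logical_bytes) / POW(1024, 3) * 0.02
      + SUM(long_term_logical_bytes) / POW(1024, 3) * 0.01, 2) AS est_monthly_storage_usd
FROM `your_project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'your_dataset'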
The cost argument for enabling BigQuery export is straightforward: for small to medium sites, the cost is zero or near-zero for an extended period. The question is never really “can we afford BigQuery export?” — it is “can we afford not to have it?”
Querying Historical Data: Essential Patterns
GA4 data in BigQuery is fully denormalized and event-level. One row per event. Parameters, user properties, and e-commerce items are stored as nested, repeated fields (arrays of structs). You extract values from these using UNNEST(). The Google Developers documentation on basic queries for GA4 export data covers the fundamentals, but here are the patterns you will actually use for historical analysis.
Basic event query across a date range
Use wildcard table names with _TABLE_SUFFIX to query across multiple daily tables without scanning everything:
SELECT
event_date,
event_name,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM
`your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20260101'
AND event_name = 'page_view'

The _TABLE_SUFFIX BETWEEN filter is important. Without it, BigQuery scans every table in the dataset matching the wildcard. With it, only the relevant date-range tables are scanned, which directly controls your query cost.
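The same nested-field pattern applies to e-commerce data: the items array is a repeated field that you expand with UNNEST in the FROM clause rather than in a correlated subquery. A minimal sketch of revenue by product for one month, assuming standard purchase events and placeholder project and dataset names:

SELECT
  item.item_name,
  SUM(item.quantity) AS units_sold,
  SUM(item.item_revenue) AS revenue
FROM `your_project.your_dataset.events_*`, UNNEST(items) AS item
WHERE
  _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  AND event_name = 'purchase'
GROUP BY item.item_name
ORDER BY revenue DESC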
Year-over-year comparison
WITH current_year AS (
SELECT
FORMAT_DATE('%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS month_day,
COUNT(*) AS sessions_this_year
FROM `your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20251231'
AND event_name = 'session_start'
GROUP BY 1
),
prior_year AS (
SELECT
FORMAT_DATE('%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS month_day,
COUNT(*) AS sessions_prior_year
FROM `your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
AND event_name = 'session_start'
GROUP BY 1
)
SELECT
c.month_day,
c.sessions_this_year,
p.sessions_prior_year,
ROUND((c.sessions_this_year - p.sessions_prior_year) / p.sessions_prior_year * 100, 1) AS pct_change
FROM current_year c
LEFT JOIN prior_year p USING (month_day)
ORDER BY 1

Long-term cohort retention
This query identifies users by their first-event month and measures how many return in subsequent months, an analysis the GA4 UI cannot run at all beyond the retention window:
WITH first_touch AS (
SELECT
user_pseudo_id,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS first_date
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20260101'
GROUP BY user_pseudo_id
),
activity AS (
SELECT DISTINCT
user_pseudo_id,
PARSE_DATE('%Y%m%d', event_date) AS activity_date
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20260101'
)
SELECT
FORMAT_DATE('%Y-%m', f.first_date) AS cohort_month,
DATE_DIFF(a.activity_date, f.first_date, MONTH) AS months_since_first,
COUNT(DISTINCT a.user_pseudo_id) AS active_users
FROM first_touch f
JOIN activity a USING (user_pseudo_id)
GROUP BY 1, 2
ORDER BY 1, 2

Historical conversion funnel
WITH user_events AS (
SELECT
user_pseudo_id,
event_name,
TIMESTAMP_MICROS(event_timestamp) AS event_time
FROM `your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20260101'
AND event_name IN ('session_start', 'view_item', 'add_to_cart', 'purchase')
),
funnel AS (
SELECT
user_pseudo_id,
MAX(CASE WHEN event_name = 'session_start' THEN 1 ELSE 0 END) AS reached_session,
MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) AS reached_view_item,
MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) AS reached_add_to_cart,
MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS reached_purchase
FROM user_events
GROUP BY user_pseudo_id
)
SELECT
SUM(reached_session) AS users_with_session,
SUM(reached_view_item) AS users_viewed_item,
SUM(reached_add_to_cart) AS users_added_to_cart,
SUM(reached_purchase) AS users_purchased
FROM funnel

Note that this is an open funnel: it counts users who reached each step in any order, not a strictly sequential path.

A note on query cost management: an unoptimized query scanning 500 GB of raw GA4 data costs roughly $3.10 at standard on-demand rates. The same query with proper _TABLE_SUFFIX filtering that reduces the scan to 5 GB costs $0.03. Always filter by date range first. For dashboards that run the same queries repeatedly, materialize results into summary tables rather than scanning raw event tables each time.
Backfilling: What If You Did Not Export From Day One
If BigQuery export was not enabled from the beginning, your options narrow considerably, but they are not zero.
BigQuery Data Transfer Service backfill
The BigQuery Data Transfer Service supports manual backfill runs for GA4 data. The catch: it can only backfill data that GA4 still has. If your GA4 property has 14-month retention and you are backfilling within that window, this works. If data has already been deleted by GA4’s retention policy, the transfer service has nothing to pull from. Google Cloud’s documentation on GA4 data transfers covers the specifics of triggering backfill runs.
Google Analytics Data API (Reporting API)
For data beyond what native backfill can reach, the GA4 Reporting API lets you pull aggregated historical data into BigQuery. This is not raw event-level data — you get summarized metrics and dimensions — but it can preserve trend data that would otherwise be inaccessible.
Limitations: a per-request row cap (the GA4 Data API currently returns at most 250,000 rows per request), daily quota limits, and you must specify exactly which dimensions and metrics you want upfront. If you did not think to request a dimension at the time of the API call, you cannot add it retroactively. The Backfill-GA4-to-BigQuery project on GitHub provides an open-source tool to automate this process, and workflow automation tools like n8n can orchestrate the same pattern.
Third-party ETL tools
Fivetran, Stitch, and OWOX all offer connectors that can pull GA4 data into BigQuery. These tools have their own retention and freshness constraints, and they generally wrap the same Reporting API under the hood for historical data. They add operational overhead and cost, but they can be worth it if you need a managed solution and do not want to build the pipeline yourself.
The honest assessment: backfilled API data will never match the granularity of native BigQuery export. You are recovering summaries, not events. The lesson here is always the same — enable the export proactively. Every month you delay is a month that cannot be recovered at event-level granularity.
Long-Term Maintenance and Optimization
Once you have accumulated months or years of data in BigQuery, a few housekeeping practices make the difference between a manageable dataset and one that generates surprise query bills.
Storage tiering happens automatically
BigQuery automatically transitions tables to long-term storage pricing after 90 days of no modifications. Since GA4 export tables are written once and not updated, they will move to long-term pricing ($0.01/GB/month) after 90 days without any action on your part. This is a meaningful cost reduction over time and requires no configuration.
Partitioning is already handled for you
GA4 daily export creates one table per day by design (events_YYYYMMDD). This means your data is already effectively partitioned by date. When you filter on _TABLE_SUFFIX, BigQuery reads only the relevant daily tables and ignores the rest. This is the single most impactful cost-control mechanism for GA4 BigQuery queries, and it is built into the export schema.
Materialize frequent queries
If dashboards or recurring reports scan raw event tables repeatedly, create materialized views or scheduled summary tables. A query that scans 200 GB of raw events to compute daily session counts by channel costs money every time it runs. A scheduled query that pre-aggregates that same data into a small summary table — and that runs once nightly — reduces all subsequent dashboard queries to scanning kilobytes.
Example pattern for a nightly scheduled summary:
-- Run nightly via BigQuery scheduled queries
-- Writes to: your_project.your_dataset.sessions_daily_summary
SELECT
event_date,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
COUNT(DISTINCT CONCAT(
user_pseudo_id, '-',
CAST(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
AS STRING
)
)) AS sessions,
COUNT(DISTINCT user_pseudo_id) AS users
FROM `your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_name = 'session_start'
GROUP BY 1, 2, 3
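Dashboard queries then read the small summary table instead of the raw export. For example, against the hypothetical summary table above:

SELECT source, medium, SUM(sessions) AS sessions
FROM `your_project.your_dataset.sessions_daily_summary`
WHERE event_date BETWEEN '20250101' AND '20251231'
GROUP BY source, medium
ORDER BY sessions DESC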
Cost monitoring
Set up Cloud Billing budget alerts so you get notified if query spend spikes unexpectedly. Use BigQuery’s INFORMATION_SCHEMA.JOBS view to audit which queries are scanning the most data:
SELECT
user_email,
query,
total_bytes_processed / POW(1024, 3) AS gb_scanned,
total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY'
ORDER BY total_bytes_processed DESC
LIMIT 20

This query (substituting the correct region for your dataset) shows you the most expensive queries run in the past week. It is the fastest way to identify queries that are missing date filters or otherwise scanning more data than necessary.
The Bottom Line
GA4’s data retention limit is a real constraint that affects anyone doing serious analytics work. The 14-month ceiling makes long-term cohort analysis, historical user journey mapping, and retroactive segmentation impossible within GA4 alone. Standard aggregated reports survive, but raw event-level analysis does not.
BigQuery export is the definitive solution. It decouples your event data from GA4’s retention policies entirely. The data lives in your Google Cloud project, under your control, for as long as you want to keep it. For most sites, the cost is negligible — often zero for the first several months within BigQuery’s free tier. The setup takes 15 minutes.
If you have a GA4 property and BigQuery export is not enabled, that is the most important thing to fix today. Every day without it is event-level data that will eventually be deleted. The cost of enabling it now is near-zero. The cost of not having it when you eventually need it is high.
If you are already past the point where data has been lost, the Reporting API backfill approach recovers aggregated trends but not raw events. Use it to patch the gap, and treat it as a reminder to never be in that position again.
Related Resources
- Dataform for GA4: Build Your First BigQuery Pipeline — Automate GA4 data transformations with Dataform
- dbt for GA4 BigQuery Exports: Modeling Guide — Structure your GA4 BigQuery data with dbt
- GA4 BigQuery Export Cost Optimization — Keep BigQuery costs under control as data grows
Sources
- GA4 Data Retention — Google Analytics Help
- BigQuery Export — Google Analytics Help
- BigQuery Pricing — Google Cloud
- Basic Queries for GA4 Export Data — Google Developers
- GA4 Data Transfer — Google Cloud Docs
- Introduction to GA4 Export Data in BigQuery — ga4bigquery.com
- Using BigQuery to Overcome GA4 Data Retention Limits — Cypress North
- How to Backfill GA4 Data in BigQuery — Optimize Smart
- Backfill-GA4-to-BigQuery — GitHub
- Does GA4 Data Retention Affect BigQuery Export? — Graphed