Tracking AI referral traffic
ChatGPT, Gemini, Perplexity, and Claude are sending real traffic to real websites. GA4 buckets most of it as Direct or Referral and tells you almost nothing about which assistant sent it, what the user asked, or whether they converted. This guide covers the actual mechanics: what to match on, what to ignore, and what the traffic predicts.
Why GA4 doesn't show ChatGPT traffic by default
Three separate things break the default channel grouping at once. None of them are GA4 bugs — they are how the AI assistants are built.
1. Stripped referrers. Most assistants open a clickable citation in a way that strips the HTTP Referer header. The browser arrives at your site with no referrer at all, so GA4 falls back to (direct) / (none). That session looks identical to someone who typed your URL into the address bar.
2. In-app browsers. The ChatGPT and Perplexity mobile apps open links inside an embedded WebView. That WebView may or may not send a referrer, may identify with a non-standard user-agent string, and frequently does not run third-party cookies. GA4 sees the session, but the dimensions that would normally tell you where it came from are empty.
3. No UTM tags by default. Search engines append nothing — they rely on the referrer. AI assistants behave the same way. Unless someone explicitly tags a link with ?utm_source=chatgpt, you get no UTM signal. A small number of newer Perplexity and Copilot surfaces do tag outbound links, but you cannot count on it.
AI-referred traffic that arrives without a referrer or UTM tag lands in (direct) / (none) and is indistinguishable from real direct traffic without active instrumentation. If your direct channel has grown over the last 12 months and you can't explain why, this is almost always part of it.
The four detection signals
There is no single field that tells you “this came from ChatGPT.” You combine four signals, in priority order, and take the first hit.
Signal 1 — Referrer domain
Match against sessionSource (or the raw document.referrer if you're instrumenting client-side). This is the cleanest signal when the referrer is present.
Signal 2 — Hostname suffix
Subdomains drift. Match by suffix instead of exact equality so that copilot.microsoft.com and m365.cloud.microsoft (Copilot for Microsoft 365) both resolve to Copilot. Same for Google's Gemini / Bard rebrand.
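A minimal sketch of Signals 1 and 2 together, assuming you are reading the raw document.referrer. The suffix table and the name matchReferrer are illustrative assumptions, not SignalGuide's maintained list.

```javascript
// Illustrative suffix table (assumed entries; extend with your own list).
const AI_HOST_SUFFIXES = [
  ["chatgpt.com", "chatgpt"],
  ["chat.openai.com", "chatgpt"],
  ["perplexity.ai", "perplexity"],
  ["gemini.google.com", "gemini"],
  ["bard.google.com", "gemini"],
  ["claude.ai", "claude"],
  ["copilot.microsoft.com", "copilot"],
  ["m365.cloud.microsoft", "copilot"],
];

// Returns an assistant label for a referrer URL, or null when nothing matches.
function matchReferrer(referrer) {
  if (!referrer) return null; // stripped referrer: nothing to match on
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    return null; // not a parseable URL
  }
  for (const [suffix, label] of AI_HOST_SUFFIXES) {
    // Accept the exact host or any subdomain of it, never a bare substring.
    if (host === suffix || host.endsWith("." + suffix)) return label;
  }
  return null;
}
```

Anchoring the suffix on a dot boundary keeps a lookalike host such as notperplexity.ai from matching perplexity.ai.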
Signal 3 — UTM parameter
If the AI surface tagged the outbound link, you'll get a clean UTM. Treat this as authoritative — it overrides referrer guessing.
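One way to read the UTM signal, sketched under assumptions: the alias values below are hypothetical examples of what a tagged link might carry, and matchUtmSource is an illustrative name.

```javascript
// Hypothetical utm_source values mapped to a normalized label (assumed, not exhaustive).
const UTM_ALIASES = new Map([
  ["chatgpt", "chatgpt"],
  ["chatgpt.com", "chatgpt"],
  ["perplexity", "perplexity"],
  ["copilot", "copilot"],
  ["gemini", "gemini"],
  ["claude", "claude"],
]);

// Returns the assistant label carried in utm_source, or null if absent or unknown.
function matchUtmSource(pageUrl) {
  let value;
  try {
    value = new URL(pageUrl).searchParams.get("utm_source");
  } catch {
    return null; // not a parseable URL
  }
  if (!value) return null;
  return UTM_ALIASES.get(value.trim().toLowerCase()) ?? null;
}
```

Because a UTM hit is treated as authoritative, a caller would check this before falling back to referrer matching.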
Signal 4 — User-agent string
Last-resort signal, mostly useful for separating bots from humans. The ChatGPT crawler (GPTBot), Perplexity crawler (PerplexityBot), and Claude crawler (ClaudeBot) all identify themselves. These are training/indexing crawlers, not real users, so you may want to exclude them from session counts entirely. Real assistant traffic (a person clicking through) comes from a normal browser UA.
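A small sketch of the crawler check, using only the three tokens named above. The function name isAiCrawler is an assumption, and a real exclusion list would likely need more tokens.

```javascript
// Crawler tokens named in the text; the match is a case-insensitive substring test.
const AI_CRAWLER_TOKENS = ["GPTBot", "PerplexityBot", "ClaudeBot"];

// True when the user-agent string self-identifies as one of the listed crawlers.
function isAiCrawler(userAgent) {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return AI_CRAWLER_TOKENS.some((token) => ua.includes(token.toLowerCase()));
}
```

A false result says nothing about the source: a person clicking through from an assistant sends an ordinary browser UA.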
The actual domain list
This is the working list SignalGuide matches against. It changes — we keep it updated as new surfaces appear (Apple Intelligence, Meta AI, You.com, Mistral Le Chat). Treat this as a starting point, not a fixed spec.
Categorizing the traffic: three buckets, not one
Lumping all of this together as “AI traffic” loses the most interesting signal. There are three categorically different things going on.
Bucket 1 — Assistant traffic
A user is in a conversation with ChatGPT, Gemini, Claude, or Copilot. The assistant generated an answer and cited your page, and the user clicked the citation. This is the highest-intent AI traffic: the user already saw a summary and clicked because they wanted depth. Conversion rates here typically beat organic on the same pages.
Bucket 2 — Search-with-AI traffic
Perplexity, You.com, Phind, and SearchGPT are AI-flavored search engines — the user typed a query, got an AI summary plus a ranked list of sources, and clicked. Intent looks much more like organic search: query terms are public, you can usually see the query in the referrer (Perplexity passes the query string), and you should treat this traffic the way you treat Google organic.
Bucket 3 — AI overview traffic
Google's AI Overviews and Bing's AI snapshots sit on top of normal SERPs. When a user clicks a citation inside an overview, the referrer is still google.com — there is no separate domain to match. This traffic is invisible to referrer matching. The only way to isolate it today is to look at GSC: pages whose impressions jumped but CTR collapsed are the strongest indicator that they're being summarized inside an overview, not clicked through.
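The GSC heuristic above can be sketched as a filter over two periods of page-level data. The row shape, the function name, and both thresholds are assumptions chosen for illustration; tune them against your own data.

```javascript
// rows: [{ page, prev: { impressions, clicks }, curr: { impressions, clicks } }]
// Flags pages whose impressions grew by at least minImpressionGrowth (0.5 = +50%)
// while CTR fell to maxCtrRatio or less of its previous value.
function flagLikelyOverviewPages(rows, { minImpressionGrowth = 0.5, maxCtrRatio = 0.6 } = {}) {
  const ctr = (p) => (p.impressions > 0 ? p.clicks / p.impressions : 0);
  return rows
    .filter((r) => {
      if (r.prev.impressions === 0) return false; // no baseline to compare against
      const prevCtr = ctr(r.prev);
      if (prevCtr === 0) return false; // CTR cannot collapse from zero
      const growth = (r.curr.impressions - r.prev.impressions) / r.prev.impressions;
      return growth >= minImpressionGrowth && ctr(r.curr) / prevCtr <= maxCtrRatio;
    })
    .map((r) => r.page);
}
```

A flagged page is a candidate for manual review, not proof that it is being summarized inside an overview.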
What we see in the wild
Some patterns hold up across the sites SignalGuide watches. None of this is gospel — your mileage will vary by vertical, audience, and content depth.
- Sites with strong technical / how-to content typically see AI referrals account for 5–30% of total referral traffic. Documentation sites, developer blogs, and depth-of-explanation content punch above their weight.
- Pure local-services sites (plumbers, dentists, regional retail) see near-zero AI referral traffic. Assistants tend to summarize and recommend rather than link out for that category.
- ChatGPT and Perplexity together usually account for 70–90% of identifiable AI traffic. Gemini is growing but still trails.
- AI-referral sessions skew shorter on average but show higher pages-per-session depth on content sites: the visitor is verifying a claim, not browsing.
- Conversion pages reached via AI referral often outperform the same pages reached via generic organic — pre-qualified intent.
What AI referral traffic predicts
Two things to look at, both of which are actionable.
Query intent. When Perplexity passes the query in the referrer, you get free product research. The queries are usually more specific than what shows up in GSC — “best lightweight task runner for a small DevOps team” vs “task runner.” Mine these for content gaps and product positioning.
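A hedged sketch of pulling that query out of a referrer. The parameter name q is an assumption about how the query is passed, so it is exposed as an argument; extractReferrerQuery is an illustrative name.

```javascript
// Returns the decoded query from a Perplexity referrer, or null when unavailable.
// paramName defaults to "q", which is an assumption; check real referrers first.
function extractReferrerQuery(referrer, paramName = "q") {
  if (!referrer) return null;
  let url;
  try {
    url = new URL(referrer);
  } catch {
    return null; // not a parseable URL
  }
  const host = url.hostname.toLowerCase();
  if (host !== "perplexity.ai" && !host.endsWith(".perplexity.ai")) return null;
  const query = (url.searchParams.get(paramName) || "").trim();
  return query || null;
}
```

Logging the result next to the landing page gives you a running list of queries to mine for content gaps.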
Conversion rate vs organic on the same pages. Compute sessions-to-conversion-page-view for AI-referral sessions and compare to organic for the same landing page. If AI converts higher (it usually does, when present in non-trivial volume), the case for investing in AI-citable content writes itself.
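The comparison can be sketched as a per-landing-page tally. The session record shape (landingPage, channel, converted) and the function name are assumptions; in practice these rows would come from your GA4 export.

```javascript
// sessions: [{ landingPage, channel: "ai" | "organic", converted: boolean }]
// Returns one row per landing page with the conversion rate for each channel.
function compareConversionByPage(sessions) {
  const byPage = new Map();
  for (const s of sessions) {
    if (s.channel !== "ai" && s.channel !== "organic") continue; // ignore other channels
    if (!byPage.has(s.landingPage)) {
      byPage.set(s.landingPage, {
        ai: { sessions: 0, conversions: 0 },
        organic: { sessions: 0, conversions: 0 },
      });
    }
    const bucket = byPage.get(s.landingPage)[s.channel];
    bucket.sessions += 1;
    if (s.converted) bucket.conversions += 1;
  }
  // A rate is null when the channel has no sessions on that page.
  const rate = (b) => (b.sessions === 0 ? null : b.conversions / b.sessions);
  return [...byPage].map(([landingPage, b]) => ({
    landingPage,
    aiSessions: b.ai.sessions,
    aiRate: rate(b.ai),
    organicSessions: b.organic.sessions,
    organicRate: rate(b.organic),
  }));
}
```

Keeping the session counts next to the rates makes it easy to discard pages where AI volume is too small for the comparison to mean anything.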
Instrumenting it yourself in GA4
If you want to wire this up by hand, here is the minimum viable setup.
Create a custom dimension on session source
In GA4 Admin, open Custom definitions → Create custom dimension. Name it ai_source, set the scope to Event (GA4 custom dimensions can be scoped to Event, User, or Item; there is no Session scope), and bind it to the user-defined event parameter you'll send from the page (e.g. ai_source).
Detect and stamp the parameter client-side
Add a small snippet to your site (or a Google Tag Manager custom HTML tag) that runs before gtag('event', 'page_view', ...). It inspects document.referrer and the URL, matches against the domain list, and pushes the value.
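A minimal sketch of such a snippet, assuming gtag.js is already loaded on the page. The source table is illustrative rather than SignalGuide's maintained list, and detectAiSource and stampAiSource are hypothetical helper names. It checks the UTM first, since that signal is treated as authoritative, then falls back to the referrer host.

```javascript
// Illustrative entries only (assumed); swap in your own maintained list.
const AI_SOURCES = [
  { label: "chatgpt", hosts: ["chatgpt.com", "chat.openai.com"] },
  { label: "perplexity", hosts: ["perplexity.ai"] },
  { label: "gemini", hosts: ["gemini.google.com"] },
  { label: "claude", hosts: ["claude.ai"] },
  { label: "copilot", hosts: ["copilot.microsoft.com"] },
];

// Hostname of a URL string, or "" when it is empty or unparseable.
function hostOf(url) {
  try {
    return new URL(url).hostname.toLowerCase();
  } catch {
    return "";
  }
}

// Returns the assistant label for this page view, or null.
function detectAiSource(referrer, pageUrl) {
  let utm = "";
  try {
    utm = (new URL(pageUrl).searchParams.get("utm_source") || "").toLowerCase();
  } catch {
    utm = "";
  }
  // UTM first: an explicit tag overrides referrer guessing.
  for (const { label, hosts } of AI_SOURCES) {
    if (utm === label || hosts.includes(utm)) return label;
  }
  // Then the referrer host, matched by suffix on a dot boundary.
  const refHost = hostOf(referrer);
  for (const { label, hosts } of AI_SOURCES) {
    if (hosts.some((h) => refHost === h || refHost.endsWith("." + h))) return label;
  }
  return null;
}

// Sends the ai_referral event with the ai_source parameter when a source is found.
function stampAiSource(gtagFn, referrer, pageUrl) {
  const source = detectAiSource(referrer, pageUrl);
  if (source) gtagFn("event", "ai_referral", { ai_source: source });
  return source;
}

// Browser wiring; skipped when the code runs outside a page (for example in Node).
if (typeof window !== "undefined" && typeof window.gtag === "function") {
  stampAiSource(window.gtag, document.referrer, window.location.href);
}
```

Passing gtag in as an argument keeps the detection logic testable without a browser.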
Verify it shows up
Open a private window, load your site with ?utm_source=chatgpt appended, and watch GA4 Realtime for the ai_referral event with ai_source = chatgpt. Custom dimensions take up to 24 hours to surface in standard reports, so Realtime is your fast check.
Build a custom report
In Explore, create a free-form report with ai_source as a dimension and Sessions, Engaged sessions, and Conversions as metrics. Filter to rows where ai_source is not empty. That's your dashboard.
How SignalGuide does it automatically
We pull GA4 by sessionSource on every analysis run and match against the maintained domain list — no custom dimension, no GTM tag, no waiting 24 hours. The list is kept current as new surfaces appear, and per-AI-source numbers show up in every briefing alongside total sessions, top landing pages, and conversion rate against your conversion pages. The full pull is documented in Connecting your data and surfaces in your scheduled briefings.
We also flag the pattern that referrer matching misses: pages whose GSC impressions jumped sharply while clicks did not. That's the strongest proxy we have for the AI-overview category until Google exposes it as a first-class GSC dimension.
What's next
AI overviews in SERPs become measurable. Google is already labeling AI Overview impressions in GSC for some properties. Once that's broadly available, the “invisible bucket” problem goes away and impression-to-click gaps become explicit.
Agent-driven traffic. Operator-style agents (Computer Use, ChatGPT's Operator, browser-using agents) will start hitting sites with non-human session patterns: zero scroll, sub-second time on page, immediate form submission. Treat agent sessions as a fourth bucket, not a bot exclusion — they're completing real tasks on behalf of real users.
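As a rough illustration of the three patterns just listed, a session could be tagged like this. The field names, the one-second threshold, and the labels are all assumptions; nothing here is an established detection method.

```javascript
// session: { maxScrollPercent, timeOnPageMs, submittedForm }
// Labels a session "possible_agent" only when all three patterns from the text hold.
function classifySession(session, { maxDwellMs = 1000 } = {}) {
  const noScroll = session.maxScrollPercent === 0;
  const subSecond = session.timeOnPageMs < maxDwellMs;
  const submitted = session.submittedForm === true;
  return noScroll && subSecond && submitted ? "possible_agent" : "unclassified";
}
```

Tagging these sessions instead of filtering them out keeps the completed tasks in your conversion numbers.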
Citation rates as a metric. The next frontier is understanding not just “did AI send traffic” but “how often did AI cite my page when asked the queries I care about.” We're building toward this. The honest answer today: nobody has a clean way to measure it at scale.