Field guide

Tracking AI referral traffic

ChatGPT, Gemini, Perplexity, and Claude are sending real traffic to real websites. GA4 buckets most of it as Direct or Referral and tells you almost nothing about which assistant sent it, what the user asked, or whether they converted. This guide covers the actual mechanics: what to match on, what to ignore, and what the traffic predicts.

Why GA4 doesn't show ChatGPT traffic by default

Three separate things break the default channel grouping at once. None of them are GA4 bugs — they are how the AI assistants are built.

1. Stripped referrers. Most assistants open a cited link in a way that strips the HTTP Referer header. The browser arrives at your site with no referrer at all, so GA4 falls back to (direct) / (none). That session looks identical to someone who typed your URL into the address bar.

2. In-app browsers. The ChatGPT and Perplexity mobile apps open links inside an embedded WebView. That WebView may or may not send a referrer, may identify with a non-standard user-agent string, and frequently blocks third-party cookies. GA4 sees the session, but the dimensions that would normally tell you where it came from are empty.

3. No UTM tags by default. Search engines append nothing — they rely on the referrer. AI assistants behave the same way. Unless someone explicitly tags a link with ?utm_source=chatgpt, you get no UTM signal. A small number of newer Perplexity and Copilot surfaces do tag outbound links, but you cannot count on it.

The net effect

On a typical site, somewhere between 40% and 80% of AI-assistant traffic arrives as (direct) / (none) and is indistinguishable from real direct traffic without active instrumentation. If your direct channel has grown over the last 12 months and you can't explain why, this is almost always part of it.

The four detection signals

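There is no single field that tells you “this came from ChatGPT.” You combine four signals and take the first hit. An explicit UTM tag (Signal 3) is the most trustworthy when present, so check it first; then fall back to referrer domain, hostname suffix, and finally user agent.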

Signal 1 — Referrer domain

Match against sessionSource (or the raw document.referrer if you're instrumenting client-side). This is the cleanest signal when the referrer is present.

TEXT

sessionSource = "chat.openai.com"   → ChatGPT (web)
sessionSource = "chatgpt.com"       → ChatGPT (web)
sessionSource = "perplexity.ai"     → Perplexity
sessionSource = "www.perplexity.ai" → Perplexity
sessionSource = "gemini.google.com" → Gemini
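As a minimal sketch of the exact-match step, the table above can be turned into a lookup. The function name `labelFromReferrer` and the label strings are illustrative, not part of GA4. Note that sessionSource is already a hostname, while document.referrer is a full URL and has to be parsed first.

```javascript
// Exact-match lookup for Signal 1. Hosts are taken from the table above;
// the labels and the function name are illustrative choices.
const EXACT_REFERRER_HOSTS = new Map([
  ["chat.openai.com", "chatgpt"],
  ["chatgpt.com", "chatgpt"],
  ["perplexity.ai", "perplexity"],
  ["www.perplexity.ai", "perplexity"],
  ["gemini.google.com", "gemini"],
]);

function labelFromReferrer(referrer) {
  if (!referrer) return null; // stripped referrer: nothing to match
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return null; // not a parseable absolute URL
  }
  return EXACT_REFERRER_HOSTS.get(host) || null;
}
```

Anything that returns null here falls through to the next signal.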

Signal 2 — Hostname suffix

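Subdomains drift. Match by suffix instead of exact equality so that new subdomains of a known assistant domain still resolve to the right source. Two caveats: a suffix with a leading dot does not match the bare domain (".claude.ai" misses claude.ai), so keep bare domains in the exact-match list; and m365.cloud.microsoft (Copilot for Microsoft 365) sits under the .microsoft top-level domain, not microsoft.com, so it needs its own rule. Google's Bard-to-Gemini rebrand is handled the same way: map both hostnames to Gemini.

TEXT

endsWith(".openai.com")       → ChatGPT family
endsWith(".perplexity.ai")    → Perplexity
endsWith(".microsoft.com")    → Copilot (filter further by subdomain or path)
endsWith(".cloud.microsoft")  → Copilot for Microsoft 365
endsWith(".claude.ai")        → Claude
endsWith(".anthropic.com")    → Claude (admin/docs)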
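One way to sketch suffix matching so that it covers both the bare domain and its subdomains is shown below. The rule list is an assumption for illustration: it narrows Microsoft to copilot.microsoft.com and m365.cloud.microsoft rather than all of microsoft.com, and the names `hostMatchesDomain` and `labelFromHost` are hypothetical.

```javascript
// True when host is the domain itself or any subdomain of it.
// "evilclaude.ai" does not match "claude.ai" because the dot is required.
function hostMatchesDomain(host, domain) {
  const h = host.toLowerCase();
  return h === domain || h.endsWith("." + domain);
}

// Illustrative rules; adjust to the surfaces you care about.
const SUFFIX_RULES = [
  { domain: "openai.com", label: "chatgpt" },
  { domain: "chatgpt.com", label: "chatgpt" },
  { domain: "perplexity.ai", label: "perplexity" },
  { domain: "copilot.microsoft.com", label: "copilot" },
  { domain: "m365.cloud.microsoft", label: "copilot" },
  { domain: "claude.ai", label: "claude" },
  { domain: "anthropic.com", label: "claude" },
];

function labelFromHost(host) {
  for (const rule of SUFFIX_RULES) {
    if (hostMatchesDomain(host, rule.domain)) return rule.label;
  }
  return null;
}
```

Matching on "domain or dot-plus-domain" avoids the two common failure modes of a plain endsWith: missing the bare domain and matching lookalike hosts.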

Signal 3 — UTM parameter

If the AI surface tagged the outbound link, you'll get a clean UTM. Treat this as authoritative — it overrides referrer guessing.

TEXT

utm_source = chatgpt        → ChatGPT
utm_source = perplexity     → Perplexity
utm_source = copilot        → Copilot
utm_source = gemini         → Gemini
utm_source = claude         → Claude
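A small sketch of UTM normalization follows. It compares the whole lowercased value rather than searching for a substring, so that a value such as youtube is not mistaken for you. The alias entry chatgpt.com is an assumption about how a surface might tag links, and `labelFromUtm` is a hypothetical name.

```javascript
// Known utm_source values mapped to a canonical label.
// "chatgpt.com" is an assumed alias, included for illustration.
const UTM_LABELS = new Map([
  ["chatgpt", "chatgpt"],
  ["chatgpt.com", "chatgpt"],
  ["perplexity", "perplexity"],
  ["copilot", "copilot"],
  ["gemini", "gemini"],
  ["claude", "claude"],
]);

function labelFromUtm(pageUrl) {
  let raw;
  try {
    raw = new URL(pageUrl).searchParams.get("utm_source");
  } catch (e) {
    return null; // pageUrl was not an absolute URL
  }
  if (!raw) return null;
  return UTM_LABELS.get(raw.trim().toLowerCase()) || null;
}
```

In the browser you would pass window.location.href as pageUrl.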

Signal 4 — User-agent string

Last-resort signal, mostly useful for separating bots from humans. The ChatGPT crawler (GPTBot), Perplexity crawler (PerplexityBot), and Claude crawler (ClaudeBot) all identify themselves. These are training/indexing crawlers, not real users, so you may want to exclude them from session counts entirely. Real assistant traffic (a person clicking through) comes from a normal browser UA.

Don't use UA alone

A normal Chrome UA from the ChatGPT in-app browser is indistinguishable from any other Chrome session. UA is a backstop, not a primary signal.
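As a sketch of the backstop role, the check below only answers "is this one of the self-identifying crawlers named above?" and says nothing about human sessions. The function name `isAiCrawler` is illustrative.

```javascript
// Crawler tokens named in this guide. Matching is case-insensitive and
// requires the token to stand as a whole word in the UA string.
const AI_CRAWLER_UA = /\b(GPTBot|PerplexityBot|ClaudeBot)\b/i;

function isAiCrawler(userAgent) {
  return typeof userAgent === "string" && AI_CRAWLER_UA.test(userAgent);
}
```

A false result means "not a known crawler", which is not the same as "came from an assistant".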

The actual domain list

This is the working list SignalGuide matches against. It changes — we keep it updated as new surfaces appear (Apple Intelligence, Meta AI, You.com, Mistral Le Chat). Treat this as a starting point, not a fixed spec.

TEXT

# ChatGPT (OpenAI)
chat.openai.com
chatgpt.com
auth0.openai.com         # sometimes appears in the chain

# Gemini / Bard (Google)
gemini.google.com
bard.google.com          # legacy, still appears
aistudio.google.com      # AI Studio sessions

# Perplexity
perplexity.ai
www.perplexity.ai

# Claude (Anthropic)
claude.ai
anthropic.com

# Copilot (Microsoft)
copilot.microsoft.com
bing.com/chat            # path-scoped — Bing Chat surface
m365.cloud.microsoft     # Copilot for Microsoft 365
edgeservices.bing.com    # Edge sidebar Copilot

# You.com
you.com

# Meta AI
meta.ai

# Mistral
chat.mistral.ai

# Other surfaces worth catching
poe.com                  # Quora Poe (multi-model)
huggingface.co/chat      # HuggingChat (path-scoped)
phind.com                # AI-powered dev search
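The path-scoped entries (bing.com/chat, huggingface.co/chat) need a host check plus a path check. The sketch below shows one way to do that for a few entries from the list; `SURFACES` and `matchSurface` are hypothetical names. A caveat worth stating as an assumption: many browsers send only the origin on cross-site navigation, in which case the path is absent and path-scoped rules cannot fire.

```javascript
// A few entries from the list above. pathPrefix is optional.
const SURFACES = [
  { host: "copilot.microsoft.com", label: "copilot" },
  { host: "bing.com", pathPrefix: "/chat", label: "copilot" },
  { host: "huggingface.co", pathPrefix: "/chat", label: "huggingchat" },
  { host: "poe.com", label: "poe" },
];

function pathHasPrefix(pathname, prefix) {
  // "/chat" matches "/chat" and "/chat/..." but not "/chatter".
  return pathname === prefix || pathname.startsWith(prefix + "/");
}

function matchSurface(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null;
  }
  const host = url.hostname.toLowerCase().replace(/^www\./, "");
  for (const s of SURFACES) {
    if (host !== s.host) continue;
    if (s.pathPrefix && !pathHasPrefix(url.pathname, s.pathPrefix)) continue;
    return s.label;
  }
  return null;
}
```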

Categorizing the traffic: three buckets, not one

Lumping all of this together as “AI traffic” loses the most interesting signal. There are three categorically different things going on.

Bucket 1 — Assistant traffic

A user is in a conversation with ChatGPT, Gemini, Claude, or Copilot. The assistant generated an answer and cited your page, and the user clicked the citation. This is the highest-intent AI traffic: the user already saw a summary and clicked because they wanted depth. Conversion rates here typically beat organic on the same pages.

Bucket 2 — Search-with-AI traffic

Perplexity, You.com, Phind, and SearchGPT are AI-flavored search engines — the user typed a query, got an AI summary plus a ranked list of sources, and clicked. Intent looks much more like organic search: query terms are public, you can usually see the query in the referrer (Perplexity passes the query string), and you should treat this traffic the way you treat Google organic.

Bucket 3 — AI overview traffic

Google's AI Overviews and Bing's AI snapshots sit on top of normal SERPs. When a user clicks a citation inside an overview, the referrer is still the search engine's own domain (google.com for AI Overviews), so there is no separate domain to match and this traffic is invisible to referrer matching. The only way to isolate it today is to look at Google Search Console (GSC): pages whose impressions jumped while CTR collapsed are the strongest indicator that they are being summarized inside an overview rather than clicked through.
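The impressions-up / CTR-down check can be sketched over two periods of exported GSC data. The row shape, the function name `flagLikelyOverviewPages`, and both thresholds (50% impression growth, 30% relative CTR drop) are assumptions for illustration, not recommended values.

```javascript
// rows: [{ page, before: { impressions, clicks }, after: { impressions, clicks } }]
// Thresholds are hypothetical defaults; tune them against your own data.
function flagLikelyOverviewPages(rows, opts = {}) {
  const minImpressionGrowth = opts.minImpressionGrowth ?? 0.5;
  const minCtrDrop = opts.minCtrDrop ?? 0.3;

  return rows
    .filter((r) => {
      // Without a baseline of impressions and clicks, neither ratio is defined.
      if (r.before.impressions <= 0 || r.before.clicks <= 0) return false;
      const ctrBefore = r.before.clicks / r.before.impressions;
      const ctrAfter =
        r.after.impressions > 0 ? r.after.clicks / r.after.impressions : 0;
      const impressionGrowth =
        (r.after.impressions - r.before.impressions) / r.before.impressions;
      const ctrDrop = (ctrBefore - ctrAfter) / ctrBefore;
      return impressionGrowth >= minImpressionGrowth && ctrDrop >= minCtrDrop;
    })
    .map((r) => r.page);
}
```

A flagged page is a candidate, not a confirmation: ranking changes and seasonality can produce the same shape.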

Why the buckets matter

Assistant traffic is small but high-intent. Search-with-AI is mid-volume, mid-intent. AI overviews are mostly invisible — but they explain the impressions-up / clicks-down pattern that has been showing up across sites since mid-2024. Different buckets, different actions.
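Once a session has a source label, assigning a bucket is a lookup. The sketch below follows the grouping described above; the label strings and the name `bucketFor` are illustrative. AI overview traffic is deliberately absent, since it cannot be identified from a referrer-derived label.

```javascript
// Source label -> bucket, following the three-bucket split in this guide.
const BUCKET_BY_LABEL = new Map([
  ["chatgpt", "assistant"],
  ["gemini", "assistant"],
  ["claude", "assistant"],
  ["copilot", "assistant"],
  ["perplexity", "search_with_ai"],
  ["you", "search_with_ai"],
  ["phind", "search_with_ai"],
  ["searchgpt", "search_with_ai"],
]);

function bucketFor(label) {
  return BUCKET_BY_LABEL.get(label) || "unclassified";
}
```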

What we see in the wild

Some patterns hold up across the sites SignalGuide watches. None of this is gospel — your mileage will vary by vertical, audience, and content depth.

Sites with strong technical / how-to content typically see AI referrals account for 5–30% of total referral traffic. Documentation sites, developer blogs, and in-depth explanatory content punch above their weight.
Pure local-services sites (plumbers, dentists, regional retail) see near-zero AI referral traffic. Assistants tend to summarize and recommend rather than link out for that category.
ChatGPT and Perplexity together usually account for 70–90% of identifiable AI traffic. Gemini is growing but still trails.
AI-referral sessions skew shorter on average but show higher pages-per-session depth on content sites: the visitor is verifying a claim, not browsing.
Conversion pages reached via AI referral often outperform the same pages reached via generic organic — pre-qualified intent.

What AI referral traffic predicts

Two things to look at, both of which are actionable.

Query intent. When Perplexity passes the query in the referrer, you get free product research. The queries are usually more specific than what shows up in GSC — “best lightweight task runner for a small DevOps team” vs “task runner.” Mine these for content gaps and product positioning.
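If the query does arrive in the referrer, pulling it out is a URL-parsing exercise. In the sketch below the parameter names (q, query) are guesses, not documented behavior of any assistant, and `queryFromReferrer` is a hypothetical helper. When the browser sends only the origin, there is nothing to extract and the function returns null.

```javascript
// Hypothetical parameter names: which key (if any) carries the query
// is an assumption and should be checked against real referrers.
const QUERY_PARAM_CANDIDATES = ["q", "query"];

function queryFromReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or unparseable referrer
  }
  for (const name of QUERY_PARAM_CANDIDATES) {
    const value = url.searchParams.get(name);
    if (value && value.trim()) return value.trim();
  }
  return null;
}
```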

Conversion rate vs organic on the same pages. Compute sessions-to-conversion-page-view for AI-referral sessions and compare to organic for the same landing page. If AI converts higher (it usually does, when present in non-trivial volume), the case for investing in AI-citable content writes itself.
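The comparison can be sketched as a per-channel ratio for a single landing page. The session shape (landingPage, channel, reachedConversionPage) and the function name are assumptions; in practice these fields would come from a GA4 export.

```javascript
// sessions: [{ landingPage, channel, reachedConversionPage }]
// Returns { channel: conversionRate } for one landing page.
function conversionRateByChannel(sessions, landingPage) {
  const totals = new Map();
  for (const s of sessions) {
    if (s.landingPage !== landingPage) continue;
    const t = totals.get(s.channel) || { sessions: 0, converted: 0 };
    t.sessions += 1;
    if (s.reachedConversionPage) t.converted += 1;
    totals.set(s.channel, t);
  }
  const rates = {};
  for (const [channel, t] of totals) {
    rates[channel] = t.converted / t.sessions;
  }
  return rates;
}
```

With only a handful of AI sessions per page the ratio is noisy, which is why the comparison is only meaningful at non-trivial volume.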

Instrumenting it yourself in GA4

If you want to wire this up by hand, here is the minimum viable setup.

Create a custom dimension on session source

In GA4 Admin, open Custom definitions → Create custom dimension. Name it ai_source, set the scope to Event (GA4 offers Event, User, and Item scopes for custom dimensions; there is no Session scope), and bind it to the event parameter you'll send from the page (e.g. ai_source).

Detect and stamp the parameter client-side

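Add a small snippet to your site (or a Google Tag Manager custom HTML tag) that runs after the GA4 tag has loaded, so that gtag is defined. It inspects document.referrer and the URL, matches against the domain list, and sends the value as a parameter on an ai_referral event. If GA4 is deployed only through Google Tag Manager, window.gtag may not exist; in that case push the value to dataLayer and forward it with a GA4 event tag.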

JAVASCRIPT

(function () {
  // Each pattern matches the bare domain or a subdomain of it, so
  // www.you.com matches but notgemini.google.com does not.
  var AI_DOMAINS = [
    { match: /(^|\.)openai\.com$|(^|\.)chatgpt\.com$/, label: "chatgpt" },
    { match: /(^|\.)perplexity\.ai$/, label: "perplexity" },
    { match: /^(gemini|bard)\.google\.com$/, label: "gemini" },
    { match: /(^|\.)claude\.ai$/, label: "claude" },
    { match: /^copilot\.microsoft\.com$/, label: "copilot" },
    { match: /(^|\.)you\.com$/, label: "you" },
    { match: /(^|\.)meta\.ai$/, label: "meta_ai" },
    { match: /^chat\.mistral\.ai$/, label: "mistral" },
    { match: /(^|\.)poe\.com$/, label: "poe" },
    { match: /(^|\.)phind\.com$/, label: "phind" }
  ];

  // utm_source values accepted as labels. The whole value is compared,
  // so utm_source=youtube is not mistaken for "you".
  var UTM_LABELS = ["chatgpt", "perplexity", "gemini", "claude", "copilot", "you", "meta_ai"];

  function detect() {
    // 1) UTM wins
    var utm = new URLSearchParams(window.location.search).get("utm_source");
    utm = utm ? utm.toLowerCase() : "";
    if (UTM_LABELS.indexOf(utm) !== -1) return utm;

    // 2) Referrer
    var ref = document.referrer;
    if (!ref) return null;
    try {
      var host = new URL(ref).hostname.toLowerCase();
      for (var i = 0; i < AI_DOMAINS.length; i++) {
        if (AI_DOMAINS[i].match.test(host)) return AI_DOMAINS[i].label;
      }
    } catch (e) {
      // Unparseable referrer: treat it as no signal.
    }
    return null;
  }

  var src = detect();
  // Nothing is sent unless gtag has already been defined on the page.
  if (src && window.gtag) {
    window.gtag("event", "ai_referral", { ai_source: src });
  }
})();

Verify it shows up

Open a private window, load your site with ?utm_source=chatgpt appended, and watch GA4 Realtime for the ai_referral event with ai_source = chatgpt. Custom dimensions take up to 24 hours to surface in standard reports, so Realtime is your fast check.

Build a custom report

In Explore, create a free-form report with ai_source as a dimension and Sessions, Engaged sessions, and Conversions (labeled Key events in newer GA4 properties) as metrics. Filter to rows where ai_source is not empty. That's your dashboard.

Cookie consent and CMPs

If you run a consent banner that blocks GA4 until the user accepts, you'll miss every AI-referral session where the user bounces before consenting. AI traffic skews toward single-page visits, so this is a larger blind spot than for organic. Consider a server-side variant if you need full coverage.
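A server-side variant can be as simple as tallying the Referer header on incoming requests. The sketch below assumes Node-style request objects (header names lowercased, as Node's http module provides them) and a short illustrative host list; `aiSourceFromHeaders` and `tallyAiSources` are hypothetical names, and how you store or report the counts is left open.

```javascript
// Illustrative host list; reuse whatever list you maintain client-side.
const SERVER_SIDE_HOSTS = new Map([
  ["chatgpt.com", "chatgpt"],
  ["chat.openai.com", "chatgpt"],
  ["perplexity.ai", "perplexity"],
  ["gemini.google.com", "gemini"],
  ["claude.ai", "claude"],
  ["copilot.microsoft.com", "copilot"],
]);

function aiSourceFromHeaders(headers) {
  const referer = headers["referer"]; // Node lowercases header names
  if (!referer) return null;
  try {
    const host = new URL(referer).hostname.toLowerCase().replace(/^www\./, "");
    return SERVER_SIDE_HOSTS.get(host) || null;
  } catch (e) {
    return null; // malformed Referer header
  }
}

// requests: iterable of objects with a `headers` property.
function tallyAiSources(requests) {
  const counts = {};
  for (const req of requests) {
    const source = aiSourceFromHeaders(req.headers || {});
    if (source) counts[source] = (counts[source] || 0) + 1;
  }
  return counts;
}
```

This counts requests rather than GA4 sessions, so the numbers are a coverage check, not a replacement for the GA4 report.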

How SignalGuide does it automatically

We pull GA4 by sessionSource on every analysis run and match against the maintained domain list — no custom dimension, no GTM tag, no waiting 24 hours. The list is kept current as new surfaces appear, and per-AI-source numbers show up in every briefing alongside total sessions, top landing pages, and conversion rate against your conversion pages. The full pull is documented in Connecting your data and surfaces in your scheduled briefings.

We also flag the pattern that referrer matching misses: pages whose GSC impressions jumped sharply while clicks did not. That's the strongest proxy we have for the AI-overview category until Google exposes it as a first-class GSC dimension.

What's next

AI overviews in SERPs become measurable. Google is already labeling AI Overview impressions in GSC for some properties. Once that's broadly available, the “invisible bucket” problem goes away and impression-to-click gaps become explicit.

Agent-driven traffic. Operator-style agents (Computer Use, ChatGPT's Operator, browser-using agents) will start hitting sites with non-human session patterns: zero scroll, sub-second time on page, immediate form submission. Treat agent sessions as a fourth bucket, not a bot exclusion — they're completing real tasks on behalf of real users.
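A rough heuristic for tagging such sessions might count how many of the three patterns are present. The session fields, the 2-second cutoff for "immediate" form submission, and the two-of-three rule are all assumptions for illustration; the sub-second time-on-page cutoff follows the description above.

```javascript
// session: { maxScrollPercent, timeOnPageMs, formSubmitted, msToFormSubmit }
// Field names and the 2000 ms form cutoff are hypothetical.
function agentSignalCount(session) {
  let signals = 0;
  if (session.maxScrollPercent === 0) signals += 1; // zero scroll
  if (session.timeOnPageMs < 1000) signals += 1; // sub-second time on page
  if (session.formSubmitted && session.msToFormSubmit < 2000) signals += 1; // immediate submit
  return signals;
}

function looksAgentDriven(session, minSignals = 2) {
  return agentSignalCount(session) >= minSignals;
}
```

Tagging, rather than excluding, keeps these sessions available as their own bucket.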

Citation rates as a metric. The next frontier is understanding not just “did AI send traffic” but “how often did AI cite my page when asked the queries I care about.” We're building toward this. The honest answer today: nobody has a clean way to measure it at scale.

See your AI referral traffic in 60 seconds

Connect Google once and SignalGuide will show you ChatGPT, Gemini, Perplexity, Claude, and Copilot sessions on your first analysis run, with top landing pages and conversion attribution.