AI, Can I Have Your Attention Please?

[Image: Greg & Lex, attention of AI]

What it takes for your content to be seen, stored, and surfaced in the age of AI

What does it take to get your content discovered by AI — and how can you even measure it? This article is for those who want to build structured, meaningful content that becomes reusable by the next generation of AI models. (Yes, the next — more on that later.)

Think of AI as a new kind of super-user. One that crawls, consumes, summarizes, and sometimes stores your content to answer future queries.

But here’s the kicker: you don’t always get to see when or how that happens. So the real question becomes…

Can the Real AI Please Stand Up? #Eminem

They’re all real — let’s get that out of the way. But why does getting the attention of AI matter more than ever?

Let’s break down how your website can be discovered by AI today.

  1. A user enters a query on an AI platform such as ChatGPT or Perplexity
  2. The AI chooses how to answer it, using:
    1. A snapshot of the model
    2. An internal curated index (Perplexity)
    3. A web search index (Bing, Baidu, Google)
  3. The answer shows up in the AI conversation
  4. Website visit (when the user clicks a cited source)

[Figure: AI website visits graph]

The 3 Ways AI Visits Your Website

  1. Snapshot Training
  2. Internal Indexing
  3. Web Search Indexes

Let’s unpack each.

1. Snapshot Training (The AI’s “Memory”)

A snapshot is a frozen-in-time dataset used to train AI. It contains books, websites, code, etc. Once the model is trained, that data is baked into its neural network.

It’s not searchable. Not editable. Just… remembered.

Used when:

  • There’s no web access
  • The question is timeless
  • The model is confident in its prior knowledge

Model | Release Date | Knowledge Cutoff | Update Frequency | Next Expected Update
GPT-4 | Mar 2023 | Sep 2021 | ~12–18 months | Retired (Apr 2025)
GPT-4 Turbo | Nov 2023 | Apr 2023 at launch (Dec 2023 from the Apr 2024 update) | ~6–12 months | Possibly late 2025
GPT-4o | May 2024 | Oct 2023 | ~12 months | Possibly late 2025
GPT-4.1 | Apr 2025 | Jun 2024 | ~12 months | Possibly mid-2026
Claude 3 (Opus, Sonnet, Haiku) | Mar 2024 | Aug 2023 | ~6 months | Possibly late 2024
Claude 3.5 Sonnet | Jun 2024 | Apr 2024 | ~6 months | Possibly late 2024
Claude 3.5 Haiku | Oct 2024 | Jul 2024 | ~4 months | Possibly late 2024
Claude 3.7 Sonnet | Feb 2025 | Nov 2024 | ~6 months | Possibly mid-2025
Gemini 1.5 Pro | May 2024 | May 2024 | ~6 months | Possibly late 2024
Gemini 2.5 Pro | Mar 2025 | Jan 2025 | ~6 months | Possibly mid-2025
Perplexity AI | Aug 2022 | No fixed cutoff | Continuous real-time updates | Not applicable
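
A snapshot can only contain what was published before its knowledge cutoff. The sketch below compares a page's publish date against a few cutoffs copied from the table above. It is an illustration, not how any vendor selects training data: the function name `models_that_could_include` and the choice of the first day of the cutoff month are assumptions.

```python
from datetime import date

# Knowledge cutoffs copied from the table above.
# Using the first day of the month is an assumption made for this sketch.
KNOWLEDGE_CUTOFFS = {
    "GPT-4": date(2021, 9, 1),
    "GPT-4o": date(2023, 10, 1),
    "Claude 3.5 Sonnet": date(2024, 4, 1),
    "GPT-4.1": date(2024, 6, 1),
}


def models_that_could_include(published: date) -> list[str]:
    """Return models whose cutoff falls on or after the publish date.

    Being published before a cutoff is necessary, not sufficient:
    the page still has to be crawled and selected for training.
    """
    return [name for name, cutoff in KNOWLEDGE_CUTOFFS.items() if published <= cutoff]


if __name__ == "__main__":
    print(models_that_could_include(date(2024, 1, 15)))
```

Anything published after the latest cutoff in the list can only reach users through an index or a live search, which is why the other two routes matter.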

2. Internal Indexes (AI’s Own Reference System)

These are curated collections of trusted sources, stored in a format that AIs can search quickly.

Used when:

  • Fast, accurate citations are needed
  • Specific queries benefit from authoritative sources

It’s like keeping bookmarked PDFs next to your memory.

Examples:

  • ChatGPT (SearchBot) indexes select sources
  • Perplexity (PerplexityBot) builds a semantic index
  • You.com structures results specifically for AI digestion

These aren’t visible to users, but AI queries them to supplement its answers.

3. Web Search Indexes (External)

These are real-time lookups — your typical Google/Bing/Baidu indexes — used when freshness matters.

Used when:

  • The question is recent or news-based
  • Snapshot or internal data isn’t sufficient

This is like Googling during a conversation.

Examples:

  • ChatGPT with browsing uses Bing
  • Claude fetches via API
  • Gemini taps Google Search
  • Perplexity default mode uses Bing live

AI Tool | Uses Own Index | Uses External Search Engine | Acknowledges llms.txt
Perplexity | | ✅ (Bing) | ✅ (Publishes llms-full.txt)
Grok | | ✅ (Bing) |
Claude | Possibly | ✅ (via API) | ✅ (Publishes llms.txt)
LLaMA | | Depends on integration |
DeepSeek | | ✅ (Baidu, others) |
ChatGPT | | ✅ (Bing) |
Gemini | | ✅ (Google Index) |

If you were paying attention to the last table, you will have noticed that the Bing index is the one used most.

Under the radar, this makes Bing more important than Google as the search engine behind AI answers.

I personally think this is one of the reasons why Google is pushing the rollout of Gemini and letting it access Google’s own search index, instead of building a completely new one: to reclaim its lost monopoly.
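
To recap the three routes, here is a toy decision function built from the "Used when" rules listed for each route. Real assistants do not publish their routing logic, so treat this purely as a mental model; the `Query` fields and the `choose_source` name are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    needs_fresh_info: bool = False  # recent or news-based question
    needs_citations: bool = False   # answer benefits from authoritative sources
    web_access: bool = True         # whether the assistant can reach the web at all


def choose_source(query: Query) -> str:
    """Pick one of the three routes using the article's 'Used when' rules."""
    if not query.web_access:
        return "snapshot"           # no web access: only the model's memory is left
    if query.needs_fresh_info:
        return "web search index"   # freshness matters: live lookup
    if query.needs_citations:
        return "internal index"     # fast citations from curated sources
    return "snapshot"               # timeless question the model already knows
```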

How Do You Know AI Is Visiting You?

Through Server Logs!

Each AI has its own crawler and user agent. Here are the key ones to watch for:

A Breakdown of AI Bots, Their Purpose & User Agents (2025)

AI isn’t just reading the web — it’s crawling it, copying it, and in some cases, training on it. These bots may show up in your server logs or silently use your content. Here’s a practical list of who they are, what they do, and how to recognize them.

Who’s Really Crawling Your Website?

The age of AI means your content isn’t just visited by Google or Bing anymore. Today, a silent parade of AI-powered bots is constantly crawling, indexing, and repackaging your content — often without you even knowing it.

If you’re wondering who’s behind the curtain, here’s a detailed overview of the most active AI user agents and what they’re really doing on your site.


OpenAI / ChatGPT

Bot | Purpose | User Agent
GPTBot | Gathers training data for ChatGPT | GPTBot/1.1
ChatGPT-User | Handles user interaction sessions | ChatGPT-User/1.0
OAI-SearchBot | Indexes content for on-demand research tools | OAI-SearchBot/1.0

Anthropic / Claude

Bot | Purpose | User Agent
Anthropic AI Bot | Crawls for training Claude’s foundation model | anthropic-ai/1.0
ClaudeBot | Used in real-time Claude queries | ClaudeBot/1.0 (+claudebot@anthropic.com)
Claude Web | Web data ingestion for Claude training | claude-web/1.0

Google / Gemini

Bot | Purpose | User Agent
Google-Extended | Collects content for Gemini training & answers | Google-Extended/1.0

Apple

Bot | Purpose | User Agent
Applebot | Powers Siri & Spotlight answers | Applebot/1.0
Applebot-Extended | Extended capabilities | Applebot-Extended/1.0

Microsoft / Copilot

Bot | Purpose | User Agent
BingBot | Microsoft search (used by Copilot AI) | BingBot/1.0

Meta (Facebook & Instagram)

Bot | Purpose | User Agent
FacebookBot | Crawls URLs for previews | FacebookBot/1.0
Meta External Fetcher | Fetches data for previews | meta-externalagent/1.1

Amazon

Bot | Purpose | User Agent
Amazonbot | Crawls for Alexa and Echo-related content | Amazonbot/0.1

ByteDance / TikTok

Bot | Purpose | User Agent
Bytespider | Discovery engine for TikTok | Bytespider/1.0

Perplexity AI

Bot | Purpose | User Agent
PerplexityBot | Crawls and retrieves content for AI answers | PerplexityBot/1.0

Others

Bot | Purpose | User Agent
YouBot | Used by You.com AI assistant | YouBot
DuckAssistBot | Powers DuckDuckGo AI answers | DuckAssistBot/1.0
AI2Bot | Allen Institute research bot | AI2Bot/1.0
CCBot | Common Crawl’s data archive builder | CCBot/1.0
Cohere AI | Trains Cohere’s LLM models | cohere-ai/1.0
Omgili Bot | Scrapes forums and discussions | omgili/1.0
TimpiBot | Crawls decentralized web content | Timpibot/0.8
DiffBot | Extracts structured data for AI/knowledge graphs | Diffbot/0.1

Why This Matters

Knowing which bots are crawling your content isn’t just a technical curiosity — it’s a strategic insight:

  • Compliance: Tools like GPTBot and ClaudeBot may be using your content for training unless you opt out via robots.txt. The llms.txt file is a newer, optional addition that not all AIs honor yet.
  • Measurement: You can trace real-time visits via server logs by tracking these user agents.
  • Visibility: Some AI bots are better at turning your content into citations or direct answers. Want to show up in Perplexity or Claude? You need crawlable, structured, and valuable content.

Tip: Monitor Your Logs

If you’re running a brand, media platform, or e-commerce site, it’s worth setting up alerts and logs for these bots. This lets you:

  • Spot which AIs are using your content
  • Test the effectiveness of your structured data
  • Inform decisions around AI visibility and content licensing

Why It Matters for SEO

To influence:

  • Snapshots → Publish before next training cycle
  • Internal Indexes → Appear on trusted hubs
  • Web Search → Be crawlable, structured, and fast

Here is a quick check on one of my own websites. I searched the server log to see which OpenAI agents had visited. Two of them showed up:

  • OAI-SearchBot
  • ChatGPT-User

GPTBot (the one that gathers data for snapshots) has not visited yet.

OAI-SearchBot = someone asked a question with search enabled within ChatGPT.

ChatGPT-User = someone used ChatGPT to visit the page and analyze its content.

Use your server logs and tools like the Log File Analyzer to make sense of this activity.

Not all user agents are in the tool yet, so you may have to add user agent strings manually or read the activity directly from your server log.
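
If your tool does not know a bot yet, a few lines of Python can do the counting. The sketch below assumes the common "combined" access-log format, where the user agent is the last quoted field on each line. The sample lines, IP addresses, and the token list are illustrative; the full user agent strings in your own logs may look different.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (taken from the bot tables above).
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider",
]

# Combined log format: the user agent is the last quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot token across an iterable of access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


# Hypothetical sample lines. With a real log, pass an open file object instead.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/May/2025:10:00:00 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [01/May/2025:10:02:11 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '198.51.100.7 - - [01/May/2025:10:05:42 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0"',
]

if __name__ == "__main__":
    for bot, count in count_ai_bot_hits(SAMPLE_LOG).most_common():
        print(bot, count)
```

A bot that never appears in the result, like GPTBot in my own check, simply has not visited in the period the log covers.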

How Do I Know I Got an AI Referral? (Google Analytics)

The best thing is that AI can generate traffic for your business. As mentioned before, AI will either use your website as information in its snapshot, or use search to display your information and give users the opportunity to click through to the source.

When that click happens, web analytics tools (let’s take GA4 for now) show it as “Referral” traffic.

What Is Referral Traffic in GA4?

Referral traffic is traffic that was referred by a different web source, such as another website. If I click a link on one website and land on another, that is “referral traffic”.

Even if the “medium” displays “(not set)”, it is still a referral.

What do you know when you see chatgpt.com / (not set) or chatgpt.com / referral in your analytics?

  • A link to your website was clicked from within ChatGPT

What you don’t know:

  • Whether the URL was displayed because of a snapshot, an index, or a search

It is also possible that someone pasted your URL into the AI and the link was clicked on later.

As a business, the aim should be to be part of the AI index AND to be a top search result.
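
If you export session sources from your analytics tool, a small helper can group the AI referrals. This is a minimal sketch: the hostname list is an assumption that you should extend yourself, and `classify_referrer` is a made-up name. It tells you which assistant sent the click, not whether a snapshot, an index, or a search put your URL in the answer.

```python
from urllib.parse import urlparse

# Hostnames treated as AI assistants. This list is an assumption; extend it as needed.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
}


def classify_referrer(referrer: str) -> str:
    """Map a referrer URL or bare hostname to an AI assistant name, or 'other'."""
    # urlparse only fills .hostname when a network location is marked with "//".
    if "//" not in referrer:
        referrer = "//" + referrer
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host, "other")


if __name__ == "__main__":
    for source in ["chatgpt.com", "https://www.perplexity.ai/search?q=x", "google.com"]:
        print(source, "->", classify_referrer(source))
```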

What Marketers, Businesses & Webmasters Need To Do

First, realise that being authoritative enough to be used by AI only works if you actually care about the content you share. Short content that fills a useless gap in the content space just to build backlinks? That’s a dying industry. (Sorry backlinking friends — I’m team AI now.)

You must create content that AI cannot replicate but can only curate. This can only be done by being a thought leader. You need to be the source that thinks ahead and shares original perspectives in your niche.

AI is the curator of your content. It will analyze, compare and summarize it and “calculate” its truthfulness. 

Ranking in search is about being one of many. Being referenced by AI is about being the authority.

With that said, let’s get practical. Here’s how to get and keep AI’s attention:

  • Google Search Console & Bing Webmaster Tools
    Why? Because Bing is the backbone of most external AI search integrations, and Gemini has access to Google’s index. Your site needs to be clearly visible to both.
  • robots.txt + llms.txt
    Why? Guide AI crawlers on what they can access. The llms.txt file is optional — not all AIs honor it (yet) — but it’s a step in the right direction.
  • Rendered HTML Structure Check
    Why? AI crawlers often don’t render JavaScript. Ensure your content is visible as clean, server-side HTML.
  • HTML Basics
    Why? Headings (<h1>, <h2>, etc.), lists, quotes, and tables help AI parse your content with precision.
  • HTML Meta Tags
    Why? Title, meta description, canonical, and Open Graph tags summarize your content and improve understanding.
  • Natural Content Flow
    Why? Headline → Problem → Explanation → Solution. A clear structure helps AI map intent and topic coverage.
  • Semantic HTML
    Why? Tags like <article>, <section>, <header>, and <footer> give meaning to your layout — clarity AI appreciates.
  • Structured Data
    Why? JSON-LD markup using schema.org vocab tells AI who, what, where, and how your content fits into a broader context. Use it for blogs, products, organizations, FAQs, and more.
  • Speed Matters
    Why? Fast-loading pages are easier to crawl and process. AI won’t wait 10 seconds for your script-heavy homepage.
  • Content Worth Visiting
    Why? If you’re not a trusted source, AI won’t learn from you — not even about your own business.
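
Before relying on robots.txt, it helps to test what your rules actually allow. The sketch below uses Python's standard `urllib.robotparser` to check a few bots against an example file. The rules and the example.com URL are illustrative, not a recommendation, and a bot is of course free to ignore robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block the training crawler, allow the search bot.
# These rules are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed_bots(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Return, per bot, whether the given robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    report = allowed_bots(
        ROBOTS_TXT,
        ["GPTBot", "OAI-SearchBot", "PerplexityBot"],
        "https://example.com/blog/post",
    )
    for bot, ok in report.items():
        print(f"{bot}: {'allowed' if ok else 'blocked'}")
```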

So yes — SEO still matters. But now it’s not just for human visitors. It’s for the next generation of machine readers too.

If you want AI’s attention, stop trying to “rank” and start trying to be useful.

Here’s how:

  • Be the original voice in your niche. Don’t echo — lead.
  • Avoid content-for-content’s-sake. Thin articles for backlinks are fading fast.
  • Think like Wikipedia. Be thorough, accurate, and factual.
  • Prioritize authority, trust, structure, clarity.
  • Write for understanding, not clicks.
  • Build internal link hubs and topic clusters.
  • Publish on domains and platforms AI already trusts.
  • Share original data, insights, or frameworks — things AI can’t invent.

If your content adds zero value to the AI’s memory, it won’t get remembered.

How to Make Your Content AI-Friendly

1. Semantic HTML Structure

Use proper tags: <header>, <section>, <article>, <h1>–<h3>, <p>, <footer>
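
A quick way to audit a page is to count these tags in the HTML your server delivers. The sketch below uses the standard `html.parser` module. The "exactly one <h1>" check is a common convention rather than a rule any AI vendor has published, and the sample page is made up.

```python
from html.parser import HTMLParser

SEMANTIC_TAGS = {"header", "section", "article", "footer", "h1", "h2", "h3", "p"}


class SemanticTagCounter(HTMLParser):
    """Count how often each semantic tag opens in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {tag: 0 for tag in SEMANTIC_TAGS}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1


def semantic_report(html: str) -> dict:
    """Return tag counts plus a flag for the 'exactly one <h1>' convention."""
    counter = SemanticTagCounter()
    counter.feed(html)
    counter.close()
    return {"counts": counter.counts, "single_h1": counter.counts["h1"] == 1}


# Made-up sample page for illustration.
SAMPLE_PAGE = """
<article>
  <header><h1>AI, Can I Have Your Attention Please?</h1></header>
  <section><h2>Snapshot Training</h2><p>A frozen-in-time dataset.</p></section>
  <footer>Written by the author</footer>
</article>
"""

if __name__ == "__main__":
    print(semantic_report(SAMPLE_PAGE))
```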

2. Structured Data (JSON-LD)

Use Schema.org markup:

  • Article
  • Organization
  • FAQPage
  • Product
  • Review

Validate it regularly.
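
As an illustration, an Article block can be generated from a few fields with the standard `json` module. The property names come from the schema.org vocabulary; the function name and the values in the usage example (date and URL) are placeholders for this sketch, not real data.

```python
import json


def article_json_ld(headline: str, author: str, date_published: str, url: str) -> str:
    """Build a schema.org Article as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2025-05-01"
        "mainEntityOfPage": {"@type": "WebPage", "@id": url},
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder date and URL.
    print(article_json_ld(
        "AI, Can I Have Your Attention Please?",
        "Gregory Pinas",
        "2025-05-01",
        "https://example.com/ai-attention",
    ))
```

Paste the output into the page's <head> and run it through a structured data validator before publishing.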

3. Semantic Clarity in Content

  • Logical content flow: Headline → Problem → Solution → Proof
  • Use natural subheadings, not robotic phrases
  • Include FAQs, lists, tables, and blockquotes

4. Internal Linking

  • Link with descriptive anchor text
  • Build topic clusters
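
To find links with weak anchor text, you can scan a page for anchors that say nothing about their target. This is a rough sketch with a hand-picked list of generic phrases, which is an assumption you will want to tune for your own language and site.

```python
from html.parser import HTMLParser

# Hand-picked generic phrases; this list is an assumption.
GENERIC_ANCHORS = {"click here", "here", "read more", "more", "link"}


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def generic_anchor_links(html: str) -> list:
    """Return links whose anchor text says nothing about the target page."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.links
            if text.lower() in GENERIC_ANCHORS]
```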

5. Accurate Meta Tags

  • Title and Description = Honest summary
  • No overpromising
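
A small script can confirm that the basic head tags are present before you worry about their wording. The sketch below checks for a title, a meta description, a canonical link, and og:title; which tags count as "expected" is an assumption of this example.

```python
from html.parser import HTMLParser


class HeadTagExtractor(HTMLParser):
    """Pull title, meta description, canonical URL and Open Graph tags from HTML."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.found["description"] = attrs.get("content") or ""
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.found[attrs["property"]] = attrs.get("content") or ""
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.found["canonical"] = attrs.get("href") or ""

    def handle_data(self, data):
        if self._in_title:
            self.found["title"] = self.found.get("title", "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def missing_head_tags(html: str) -> list:
    """Return which of the expected head tags are absent or empty."""
    extractor = HeadTagExtractor()
    extractor.feed(html)
    extractor.close()
    expected = ["title", "description", "canonical", "og:title"]
    return [name for name in expected if not extractor.found.get(name, "").strip()]
```

Whether the summary is honest still needs a human read; the script only tells you that something is there.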

6. Consistent Entity Usage

  • Stick to one naming format (e.g. “Gregory Pinas”)
  • Use sameAs, mainEntityOfPage for clarity
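
In JSON-LD terms, consistency means describing the person once and pointing sameAs at the other profiles of that same person. A minimal sketch follows; the helper name is made up and the profile URLs in the usage example are placeholders, not real accounts.

```python
import json


def person_json_ld(name: str, profile_urls: list) -> dict:
    """Describe one person once, with sameAs pointing at profiles of the same entity."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,  # use this exact spelling everywhere on the site
        "sameAs": sorted(set(profile_urls)),  # de-duplicated, stable order
    }


if __name__ == "__main__":
    # Placeholder profile URLs.
    person = person_json_ld("Gregory Pinas", [
        "https://example.com/about",
        "https://social.example/gregorypinas",
    ])
    print(json.dumps(person, indent=2))
```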

7. Fast, Crawlable Pages

  • Pre-rendered HTML
  • Avoid JS-dependence for critical content
  • Fast load speed = more accessible to AI
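
One way to test JS-dependence is to look only at the HTML the server sends, the way a non-rendering crawler would, and check whether your key phrases are in it. The sketch below skips script and style contents and reports the phrases it cannot find. The sample markup and phrases are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text a non-rendering crawler would see, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def phrases_missing_from_raw_html(html: str, phrases: list) -> list:
    """Return the phrases that are not present in the server-delivered HTML text."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(" ".join(extractor.parts).split()).lower()
    return [p for p in phrases if p.lower() not in text]


# Made-up page: the heading is in the HTML, the offer is injected by JavaScript.
SAMPLE_HTML = (
    '<html><body><h1>Pricing</h1><div id="app"></div>'
    '<script>document.getElementById("app").innerText = "Free trial";</script>'
    '</body></html>'
)

if __name__ == "__main__":
    print(phrases_missing_from_raw_html(SAMPLE_HTML, ["Pricing", "Free trial"]))
```

Any phrase in the returned list only exists after JavaScript runs, so a crawler that does not render will never read it.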

8. E-E-A-T Signals

  • Author bios, credentials
  • Cited sources
  • Clear brand identity

Marketers: What To Do Next

Here’s your checklist:

  1. Register with Google Search Console + Bing Webmaster Tools
  2. Use robots.txt and (optionally) llms.txt
  3. Serve pre-rendered, semantic HTML
  4. Use proper <h1>–<h3> hierarchy and formatting
  5. Optimize meta tags, canonical URLs, and OG tags
  6. Add JSON-LD structured data across key pages
  7. Build content with depth, links, and clarity
  8. Ensure fast load time + mobile rendering

Note that SEO blogs tend to list all of these as “quick wins”, but let’s call them “foundational necessities” from now on.

The Hard Truth

AI doesn’t search for content — it selects it. Make your website worth selecting.

There’s no trick or hack to get into an AI model’s attention span.

You need:

  • High-value content
  • Deep topical authority
  • Semantic and structural clarity

To be selected, you need to:

  • Be trustworthy
  • Be understandable
  • Be worth remembering

That’s how you get AI’s attention. Not by shouting louder. But by saying something worth listening to — clearly, structurally, and consistently.

Think less about ranking — and more about deserving a place in AI’s answers.

CATEGORIES:

GEO
