Should You Add llms.txt? How to Check If AI Crawlers Actually Use It

Learn what llms.txt is, what to include, what not to include, and how to monitor whether AI crawlers request it and follow its linked pages.

Brittany Jiao · SEO Guides

llms.txt is becoming one of the most discussed files in AI search and technical SEO.

The idea is simple:

Put a Markdown file at:

https://example.com/llms.txt

Use it to give large language models and AI agents a curated map of the most important pages, docs, products, policies, and resources on your website.

But the practical question is not only:

Should we add llms.txt?

The better question is:

If we add llms.txt, can we tell whether AI crawlers actually request it and follow the pages it points to?

That is where most implementation guides stop too early.

This guide explains what llms.txt is, what to include, what not to include, and how to monitor whether AI crawlers use it after it goes live.

What llms.txt Is

llms.txt is a proposed Markdown-based file that helps large language models and AI agents understand a website.

The proposal, published by Jeremy Howard in September 2024, describes /llms.txt as a way to provide concise information that helps LLMs use a website at inference time.

In plain language:

  • robots.txt tells crawlers what access is acceptable.
  • sitemap.xml lists URLs for search engines.
  • llms.txt gives AI systems a curated, human-readable and machine-readable guide to important content.

It is not a replacement for SEO.

It is not a replacement for robots.txt.

It is not a guarantee that ChatGPT, Claude, Perplexity, Gemini, or any other AI system will cite your site.

It is a structured orientation file.

That can still be useful, especially for sites with docs, tools, product pages, APIs, comparison pages, or agent-readable workflows.

What llms.txt Should Contain

The original llms.txt proposal describes a Markdown file with a specific structure.

A practical version usually includes:

  • one H1 with the site or project name
  • a short blockquote summary
  • short context about the site
  • H2 sections grouping important resources
  • Markdown links with useful descriptions
  • optional sections for lower-priority resources

Example:

# CrawlConsole

> CrawlConsole helps teams monitor AI crawler activity, understand agent discovery, and improve website visibility across AI search and agent workflows.

## Core Resources

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of AI and search crawler user agents.
- [WebMCP](https://crawlconsole.com/webmcp): Resources for making websites more agent-readable.
- [MCP Finder](https://crawlconsole.com/mcp-finder): Tool for discovering MCP servers and agent tools.

## Crawler Guides

- [PerplexityBot](https://crawlconsole.com/web-crawlers/perplexitybot): Profile for identifying Perplexity crawler traffic.
- [GPTBot](https://crawlconsole.com/web-crawlers/gptbot): Profile for identifying OpenAI crawler traffic.

The descriptions matter.

Do not just dump URLs.

The point is to help an AI system understand which page is useful for which job.
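A quick structural check can catch a missing summary or a link without a description before the file ships. The sketch below is one possible approach, not part of the proposal: the function name `check_llms_txt` and the exact rules are assumptions based on the list above.

```python
import re

# Matches "- [Label](https://url): optional description"
LINK_RE = re.compile(r"^- \[([^\]]+)\]\((https?://[^)\s]+)\)(?::\s*(.+))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return human-readable problems found in an llms.txt body."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []

    # Exactly one H1 with the site or project name.
    h1_count = sum(1 for line in lines if line.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")

    # A short blockquote summary.
    if not any(line.startswith("> ") for line in lines):
        problems.append("missing blockquote summary")

    # At least one H2 section grouping resources.
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections")

    # Every list link should carry a description after the colon.
    for line in lines:
        if line.startswith("- ["):
            match = LINK_RE.match(line)
            if match is None:
                problems.append(f"malformed link line: {line}")
            elif not match.group(3):
                problems.append(f"link has no description: {match.group(1)}")
    return problems
```

An empty list means the file passed these basic checks. It does not mean the descriptions are good, only that they exist.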

What llms.txt Should Not Contain

Because llms.txt is public, treat it like a public webpage.

Do not include:

  • API keys
  • private endpoints
  • staging URLs
  • customer data
  • unpublished product plans
  • internal admin links
  • private docs
  • login-only resources
  • anything you would not want indexed or copied
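A rough pre-publish scan can flag some of the items above. This is a heuristic sketch only: the patterns in `RISKY_PATTERNS` are hypothetical examples, will produce false positives (a public URL containing "internal" would match), and will miss anything they do not name. A human review is still needed.

```python
import re

# Hypothetical patterns; tune them to your own hostnames and key formats.
RISKY_PATTERNS = {
    "staging or internal host": re.compile(
        r"https?://[^\s)]*(staging|internal|localhost)", re.I
    ),
    "admin or login path": re.compile(
        r"https?://[^\s)]*/(admin|login|wp-admin)\b", re.I
    ),
    "possible secret": re.compile(
        r"(api[_-]?key|secret|token)\s*[:=]\s*\S+", re.I
    ),
}


def find_risky_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines that match a risky pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```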

Also avoid stuffing it with every page on the site.

That is what a sitemap is for.

llms.txt should be curated.

A useful file answers:

  • What does this site do?
  • What pages are most important?
  • Which resources should agents read first?
  • Which pages explain the main entities?
  • Which tools or actions are available?
  • Which pages are optional context?

If your llms.txt becomes a giant URL dump, it stops being helpful.

How llms.txt Is Different From robots.txt

This distinction matters.

robots.txt is about crawler access.

Example:

User-agent: GPTBot
Disallow: /private
Allow: /

llms.txt is about context.

Example:

## AI Crawler Resources

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of crawler profiles.
- [PerplexityBot](https://crawlconsole.com/web-crawlers/perplexitybot): How to identify Perplexity crawler visits.

One file tells automated systems what they are allowed to fetch.

The other file tells AI systems what your most useful resources are.

They should work together.

If your llms.txt points to a page that your robots.txt blocks, CDN blocks, or WAF challenges, the file may describe a resource that crawlers cannot actually access.

That is why monitoring matters.
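The robots.txt part of that conflict can be checked offline. The sketch below uses Python's standard `urllib.robotparser` to list llms.txt links that a given user agent may not fetch. The function name `blocked_links` is an assumption, and the check says nothing about CDN or WAF rules, which need real requests to test.

```python
import re
from urllib.robotparser import RobotFileParser

# Pulls the URL out of Markdown links such as [Label](https://example.com/page)
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_links(llms_txt: str, robots_txt: str, user_agent: str) -> list[str]:
    """Return llms.txt URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = LINK_URL_RE.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

One caveat: the standard-library parser applies the first rule that matches, in file order. Some crawlers use the most specific matching rule instead, so results can differ when Allow and Disallow rules overlap.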

How llms.txt Is Different From sitemap.xml

Your sitemap is broad.

It usually lists every indexable URL that search engines should know about.

Your llms.txt should be selective.

It should highlight the pages that are most useful for AI systems and agents:

  • key docs
  • product pages
  • API references
  • crawler profiles
  • tool pages
  • buying guides
  • comparison pages
  • policies
  • FAQs
  • original research
  • category explainers

Think of the sitemap as the full inventory.

Think of llms.txt as the curated reading list.

For CrawlConsole, a good llms.txt would not list every blog post.

It would prioritize resources like Web Crawlers, WebMCP, MCP Finder, WebMCP Checker, and important crawler profiles such as GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.

Should Every Website Add llms.txt?

Not every site needs it immediately.

It is most useful when:

  • your site has important docs
  • your product is technical
  • your pages explain complex workflows
  • your site has many resource pages
  • agents may need to understand your tools
  • you want AI systems to understand specific entities
  • you care about AI search visibility
  • you already monitor crawler behavior

It is less urgent if:

  • your site is very small
  • you have no public docs or resources
  • your key pages are not crawlable
  • your content is thin or outdated
  • you cannot monitor whether anything changes after launch

The file is not a magic visibility switch.

If your pages are weak, blocked, duplicated, or poorly linked, llms.txt will not fix that by itself.

Step 1: Choose The Pages Worth Highlighting

Start with 10-30 URLs.

Pick pages that answer important questions:

  • What is the company?
  • What does the product do?
  • Who is it for?
  • What tools are available?
  • What docs explain the workflow?
  • What pages support commercial discovery?
  • What pages explain key entities?
  • What pages should agents use as references?

For a SaaS company, this might include:

  • homepage
  • product overview
  • docs
  • API reference
  • pricing
  • comparison pages
  • use-case pages
  • integration pages
  • security page
  • support docs

For an ecommerce company, this might include:

  • category pages
  • product search
  • product guides
  • sizing or compatibility guides
  • shipping and return policies
  • high-intent product pages
  • buying guides
  • FAQs

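For CrawlConsole, useful candidates include Web Crawlers, WebMCP, MCP Finder, WebMCP Checker, and key crawler profiles such as GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.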

Step 2: Group Pages By Agent Job

Do not group only by website navigation.

Group by the job an AI agent or LLM might need to complete.

Examples:

| Agent job | Useful section |
|---|---|
| Identify crawler traffic | AI Crawler Resources |
| Check whether agents can use a site | Agent-Readable Website Tools |
| Discover MCP servers | MCP Discovery |
| Understand ecommerce AI discovery | Agentic Commerce |
| Test prompts about brand visibility | Prompt Workflows |
| Troubleshoot crawler blocking | Crawler Access Guides |

This is more useful than generic sections like "Company" and "Resources."

The structure should help an AI system choose the right page for the task.

Step 3: Write Useful Link Descriptions

A link without context is weaker than a link with context.

Weak:

- [Web Crawlers](https://crawlconsole.com/web-crawlers)

Better:

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of AI and search crawler user agents, including OpenAI, Anthropic, Perplexity, and Google crawlers.

Descriptions should explain:

  • what the page is
  • who it is for
  • what problem it solves
  • when an agent should use it

Avoid hype.

Use clear labels.
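If the page list lives in a spreadsheet or config file, the file can be generated rather than hand-edited. The sketch below is a minimal illustration: the function `render_llms_txt` and its input shape are assumptions, not a standard tool. It groups links under job-based headings and refuses to write a link without a description.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body.

    sections maps an H2 heading to a list of (label, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for label, url, description in links:
            # A bare link is the "weak" form; fail early instead of emitting it.
            if not description.strip():
                raise ValueError(f"missing description for {url}")
            lines.append(f"- [{label}]({url}): {description.strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Usage, with one section from the table of agent jobs:

```python
body = render_llms_txt(
    "CrawlConsole",
    "Short summary of the site.",
    {
        "AI Crawler Resources": [
            (
                "Web Crawlers",
                "https://crawlconsole.com/web-crawlers",
                "Directory of crawler profiles.",
            ),
        ],
    },
)
print(body)
```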

Step 4: Support It With Internal Links

Your llms.txt should not be the only place that connects important pages.

If a page is important enough to include in llms.txt, it should usually also be linked from:

  • relevant blog posts
  • product pages
  • docs
  • resource hubs
  • navigation or footer
  • related guide sections
  • sitemap

For example, if your llms.txt highlights PerplexityBot, then related articles like How to Check If PerplexityBot Crawled Your Website should also link to it.

If your llms.txt highlights WebMCP Checker, then WebMCP guides should link to it naturally.

llms.txt should reinforce the site graph.

It should not be a workaround for poor internal linking.
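Sitemap coverage is the easiest of these to verify automatically. The sketch below compares the links in llms.txt with the `<loc>` entries of a standard sitemap. The function name is an assumption, it handles a plain URL sitemap rather than a sitemap index, and it does not check internal links from blog posts or navigation.

```python
import re
import xml.etree.ElementTree as ET

LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def links_missing_from_sitemap(llms_txt: str, sitemap_xml: str) -> list[str]:
    """Return llms.txt URLs that do not appear in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    # Normalize trailing slashes so /page and /page/ compare as equal.
    sitemap_urls = {
        loc.text.strip().rstrip("/")
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    linked = LINK_URL_RE.findall(llms_txt)
    return [url for url in linked if url.rstrip("/") not in sitemap_urls]
```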

Step 5: Publish It At The Root

Publish the file at:

https://example.com/llms.txt

Then verify:

  • it returns 200
  • it is publicly accessible
  • it is not blocked by robots.txt
  • it is not behind authentication
  • it is served as plain text or Markdown (for example, a text/plain or text/markdown content type)
  • it does not redirect unexpectedly
  • it does not include private information
  • it contains absolute links where useful
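Several of these checks can be scripted. The sketch below, using only the standard library, covers status, redirects, and content type. The names `evaluate_response` and `check_llms_txt_url`, the accepted content types, and the `llms-txt-check/0.1` user agent string are all assumptions for illustration.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed set of acceptable types; adjust if your server uses another one.
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def evaluate_response(requested_url, final_url, status, content_type):
    """Return a list of problems for one fetch result."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    if final_url != requested_url:
        problems.append(f"redirected to {final_url}")
    if content_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {content_type}")
    return problems


def check_llms_txt_url(url):
    """Fetch url and evaluate the response. Requires network access."""
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        # urlopen follows redirects; geturl() reports the final URL.
        with urlopen(request, timeout=10) as response:
            # get_content_type() falls back to text/plain if the header is absent.
            return evaluate_response(
                url,
                response.geturl(),
                response.status,
                response.headers.get_content_type(),
            )
    except HTTPError as error:
        # urlopen raises for 4xx and 5xx responses.
        return [f"status is {error.code}, expected 200"]
```

A request from your own machine may be treated differently from a crawler's request, so a clean result here does not prove that a CDN or WAF lets crawlers through.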

Also consider whether you need an expanded version, such as llms-full.txt, if your docs or product information require more context.

For many marketing sites, start simple.

The first version should be useful, short, and maintainable.

Step 6: Monitor Whether AI Crawlers Request It

This is the CrawlConsole-specific step most llms.txt guides skip.

After publishing, watch for requests to:

/llms.txt
/llms-full.txt

Track:

  • crawler name
  • timestamp
  • status code
  • IP or ASN if available
  • redirect behavior
  • whether the same crawler later requests linked pages
  • whether important linked pages get revisited

Look for crawlers such as GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.

The important question is not only:

Did a crawler request llms.txt?

It is also:

Did the crawler later request the important pages linked inside it?

That gives you a stronger signal that the file may be part of a discovery path.
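If you have raw access logs, that sequence can be approximated with a short script. The sketch below assumes logs in the common "combined" format, sorted oldest first, and matches crawlers by substring in the user agent. The token list and the function name `follow_ups` are assumptions, and user agents can be spoofed, so treat matches as leads rather than proof.

```python
import re
from datetime import datetime

# Assumed crawler tokens; extend the tuple to match the crawlers you track.
CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Combined log format: ip - - [time] "METHOD path proto" status bytes "ref" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def follow_ups(log_lines, linked_paths):
    """Map each crawler that fetched llms.txt to linked paths it requested later."""
    first_seen = {}  # crawler -> time of its first /llms.txt request
    followed = {}    # crawler -> linked paths requested after that time
    for line in log_lines:
        match = LOG_RE.match(line)
        if match is None:
            continue
        agent = match["agent"].lower()
        crawler = next((t for t in CRAWLER_TOKENS if t.lower() in agent), None)
        if crawler is None:
            continue
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        path = match["path"].split("?")[0]
        if path in ("/llms.txt", "/llms-full.txt"):
            first_seen.setdefault(crawler, when)
            followed.setdefault(crawler, [])
        elif (
            crawler in first_seen
            and when >= first_seen[crawler]
            and path in linked_paths
        ):
            followed[crawler].append(path)
    return followed
```

A crawler that appears with an empty list requested the file but none of the linked pages afterward, which is itself worth noting.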

Step 7: Compare Before And After Publishing

Before adding llms.txt, record a baseline.

For your most important pages, note:

  • AI crawler visits
  • status codes
  • revisit frequency
  • last crawl date
  • prompt test results
  • AI referral traffic, if any
  • GSC impressions, if relevant

After publishing, compare:

  • Did /llms.txt get requested?
  • Which crawlers requested it?
  • Did crawlers revisit linked pages?
  • Did crawl frequency change on highlighted pages?
  • Did prompt test answers improve?
  • Did any pages get blocked or redirected?
  • Did any linked pages remain ignored?

Do not over-attribute.

If crawler visits increase, llms.txt may be one factor. Internal links, new content, external mentions, sitemaps, and normal crawler schedules may also play a role.

The goal is evidence, not certainty.
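The crawl-frequency part of that comparison is simple counting. The sketch below assumes you already have crawler visits as (date, path) pairs, for example from a log export. The function name `crawl_counts` and the 30-day default window are arbitrary choices for illustration.

```python
from collections import Counter
from datetime import date


def crawl_counts(visits, publish_date, window_days=30):
    """Count visits per path in equal windows before and after publish_date.

    visits is an iterable of (date, path) pairs.
    Returns {path: (visits_before, visits_after)}.
    """
    before, after = Counter(), Counter()
    for day, path in visits:
        delta = (day - publish_date).days
        if -window_days <= delta < 0:
            before[path] += 1
        elif 0 <= delta < window_days:
            after[path] += 1
    paths = sorted(set(before) | set(after))
    return {path: (before[path], after[path]) for path in paths}
```

Equal windows keep the comparison fair, but a higher "after" count still only shows correlation, for the reasons given above.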

Step 8: Use Prompt Tests Carefully

Prompt tests are useful, but they are not crawler logs.

After publishing llms.txt, run repeatable prompts:

  • "What does [brand] do?"
  • "What tools does [brand] offer?"
  • "How can I monitor AI crawler traffic?"
  • "Which pages explain [product category]?"
  • "What is the best resource for [specific workflow]?"

Use the Prompt Library to keep these tests consistent.

Track:

  • whether the answer mentions your brand
  • whether it describes the right pages
  • whether it cites or references the right resources
  • whether it invents details or repeats outdated information
  • whether competitors appear instead
  • whether answers change after content updates

Prompt tests tell you what AI systems say.

Crawler logs tell you what automated systems requested.

Use both, but do not confuse them.
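To keep prompt tests comparable across runs, record the same fields every time. The sketch below scores a saved answer with plain substring checks. It does not call any AI system; you paste in the answer text yourself. The `PromptResult` fields and the `score_answer` helper are assumptions, and substring matching is crude, so review the results by hand.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    mentions_brand: bool
    cited_urls: list[str]
    competitors_named: list[str]


def score_answer(prompt, answer, brand, expected_urls, competitors):
    """Record which tracked signals appear in one saved answer."""
    lowered = answer.lower()
    return PromptResult(
        prompt=prompt,
        mentions_brand=brand.lower() in lowered,
        cited_urls=[url for url in expected_urls if url.lower() in lowered],
        competitors_named=[name for name in competitors if name.lower() in lowered],
    )
```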

Step 9: Maintain llms.txt Like A Product Surface

Do not publish llms.txt once and forget it.

Update it when:

  • new product pages launch
  • docs change
  • crawler resources are added
  • WebMCP pages are updated
  • MCP tools are added
  • high-value blog posts are published
  • ecommerce product guides change
  • important URLs are redirected
  • pages are removed or consolidated

Treat it like a lightweight product surface for agents.

If the file links to stale pages, broken pages, or outdated descriptions, it can create confusion instead of clarity.

Common Mistakes

Mistake 1: Treating llms.txt Like A Ranking Factor

There is no guarantee that adding llms.txt will improve AI search visibility.

Treat it as a discovery and context layer, not a ranking lever.

Mistake 2: Copying Your Sitemap

Do not list everything.

Curate the most useful resources.

Mistake 3: Including Private Information

The file is public.

Do not include anything confidential.

Mistake 4: Linking To Blocked Pages

If llms.txt links to pages that robots.txt, CDN rules, WAF rules, or auth walls block, the file may point crawlers toward dead ends.

Mistake 5: Never Monitoring It

If you cannot tell whether /llms.txt gets requested, you cannot learn much from the implementation.

Monitor crawler behavior after publishing.

llms.txt Checklist

Before publishing:

  • Purpose: the file explains what the site does and which resources matter.
  • Structure: it uses one H1, a short summary, and clear H2 sections.
  • Links: every important link has a useful description.
  • Curation: it lists important pages, not every page.
  • Safety: it contains no private data, keys, or internal URLs.
  • Access: it returns 200 at /llms.txt.
  • Consistency: linked pages are also supported by internal links and sitemap coverage.
  • Monitoring: crawler requests to /llms.txt and linked pages are tracked after publishing.
  • Prompt tests: answer behavior is tested separately from crawler access.
  • Maintenance: the file is updated when important pages change.

Where CrawlConsole Fits

CrawlConsole fits after the file goes live.

Use Web Crawlers to identify AI crawler user agents.

Then monitor:

  • whether /llms.txt gets requested
  • which crawler requested it
  • whether the crawler received 200
  • whether linked pages were visited after the request
  • whether important pages stayed ignored
  • whether crawler behavior changed after updates

For agent-readable site work, pair llms.txt with WebMCP and the WebMCP Checker.

For MCP discovery workflows, connect relevant pages to MCP Finder.

For content planning, use the workflow in How to Use AI Crawler Logs to Find Content Ideas.

The practical point:

Do not just add llms.txt. Watch what happens after you add it.

The Bottom Line

llms.txt is worth testing if your website has important resources that AI systems should understand.

But it should be treated as an emerging convention, not a guaranteed ranking shortcut.

Use it to provide a curated, readable map of your most important resources.

Then monitor:

  • whether AI crawlers request it
  • whether they receive a clean response
  • whether they visit linked pages
  • whether important pages get revisited
  • whether prompt answers improve over time

That is the difference between adding a file and building an AI crawler visibility workflow.