Should You Add llms.txt? How to Check If AI Crawlers Actually Use It

Learn what llms.txt is, what to include, what not to include, and how to monitor whether AI crawlers request it and follow its linked pages.

June 11, 2026 | Brittany Jiao | SEO Guides

llms.txt is becoming one of the most discussed files in AI search and technical SEO.

The idea is simple:

Put a Markdown file at:

https://example.com/llms.txt

Use it to give large language models and AI agents a curated map of the most important pages, docs, products, policies, and resources on your website.

But the practical question is not only:

Should we add llms.txt?

The better question is:

If we add llms.txt, can we tell whether AI crawlers actually request it and follow the pages it points to?

Most implementation guides stop before answering that question.

This guide explains what llms.txt is, what to include, what not to include, and how to monitor whether AI crawlers use it after it goes live.

What llms.txt Is

llms.txt is a proposed Markdown-based file that helps large language models and AI agents understand a website.

The proposal, published by Jeremy Howard in September 2024, describes /llms.txt as a way to provide concise information that helps LLMs use a website at inference time.

In plain language:

robots.txt tells crawlers what access is acceptable.
sitemap.xml lists URLs for search engines.
llms.txt gives AI systems a curated, human-readable and machine-readable guide to important content.

It is not a replacement for SEO.

It is not a replacement for robots.txt.

It is not a guarantee that ChatGPT, Claude, Perplexity, Gemini, or any other AI system will cite your site.

It is a structured orientation file.

That can still be useful, especially for sites with docs, tools, product pages, APIs, comparison pages, or agent-readable workflows.

What llms.txt Should Contain

The original llms.txt proposal describes a Markdown file with a specific structure.

A practical version usually includes:

one H1 with the site or project name
a short blockquote summary
short context about the site
H2 sections grouping important resources
Markdown links with useful descriptions
optional sections for lower-priority resources

Example:

# CrawlConsole

> CrawlConsole helps teams monitor AI crawler activity, understand agent discovery, and improve website visibility across AI search and agent workflows.

## Core Resources

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of AI and search crawler user agents.
- [WebMCP](https://crawlconsole.com/webmcp): Resources for making websites more agent-readable.
- [MCP Finder](https://crawlconsole.com/mcp-finder): Tool for discovering MCP servers and agent tools.

## Crawler Guides

- [PerplexityBot](https://crawlconsole.com/web-crawlers/perplexitybot): Profile for identifying Perplexity crawler traffic.
- [GPTBot](https://crawlconsole.com/web-crawlers/gptbot): Profile for identifying OpenAI crawler traffic.

The descriptions matter.

Do not just dump URLs.

The point is to help an AI system understand which page is useful for which job.
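The structure described above can be checked mechanically before publishing. The following is a minimal sketch, assuming a simplified reading of the format (one H1, a blockquote summary, H2 sections containing `- [Title](url): description` links); the function names `parse_llms_txt` and `structure_problems` are illustrative and not part of the proposal.

```python
import re

# Matches "- [Title](url): description"; the description part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)


def parse_llms_txt(text):
    """Collect the H1 title, blockquote summary, and links under each H2."""
    doc = {"h1_count": 0, "title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            doc["h1_count"] += 1
            doc["title"] = doc["title"] or line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith(">") and doc["summary"] is None:
            doc["summary"] = line.lstrip("> ").strip()
        else:
            match = LINK_RE.match(line)
            if match and current is not None:
                doc["sections"][current].append(match.groupdict())
    return doc


def structure_problems(doc):
    """Return readable problems; an empty list means the basics are in place."""
    problems = []
    if doc["h1_count"] != 1:
        problems.append(f"expected exactly one H1, found {doc['h1_count']}")
    if not doc["summary"]:
        problems.append("missing blockquote summary")
    if not doc["sections"]:
        problems.append("no H2 sections")
    for name, links in doc["sections"].items():
        if not links:
            problems.append(f"section '{name}' has no links")
    return problems


SAMPLE = """# CrawlConsole

> CrawlConsole helps teams monitor AI crawler activity.

## Core Resources

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of AI and search crawler user agents.
"""

print(structure_problems(parse_llms_txt(SAMPLE)))  # prints []
```

This only checks shape. Whether the descriptions are actually useful still needs a human read.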

What llms.txt Should Not Contain

Because llms.txt is public, treat it like a public webpage.

Do not include:

API keys
private endpoints
staging URLs
customer data
unpublished product plans
internal admin links
private docs
login-only resources
anything you would not want indexed or copied
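A rough pre-publish scan can catch some of the items in this list. The sketch below is a heuristic only: the patterns in `RISK_PATTERNS` are assumptions to tune for your own hostnames and secret formats, and a clean result does not prove the file is safe.

```python
import re

# Hypothetical patterns; adjust to your own hostnames, paths, and key formats.
RISK_PATTERNS = {
    "staging or internal host": re.compile(
        r"https?://[^\s)]*(staging|internal|localhost|\.local)\b", re.I
    ),
    "admin or login path": re.compile(
        r"https?://[^\s)]*/(admin|login|wp-admin)\b", re.I
    ),
    "possible secret": re.compile(r"(api[_-]?key|secret|token)\s*[:=]", re.I),
}


def find_risky_lines(text):
    """Return (line_number, label) pairs for lines that match a risk pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


DRAFT = """# Example

- [Docs](https://example.com/docs): Public documentation.
- [Admin](https://example.com/admin): Dashboard.
- [Preview](https://staging.example.com/docs): Preview build.
"""

for number, label in find_risky_lines(DRAFT):
    print(f"line {number}: {label}")
```

Expect false positives (a public page about login flows would be flagged), so treat each finding as a prompt for review rather than an automatic rejection.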

Also avoid stuffing it with every page on the site.

That is what a sitemap is for.

llms.txt should be curated.

A useful file answers:

What does this site do?
What pages are most important?
Which resources should agents read first?
Which pages explain the main entities?
Which tools or actions are available?
Which pages are optional context?

If your llms.txt becomes a giant URL dump, it stops being helpful.

How llms.txt Is Different From robots.txt

This distinction matters.

robots.txt is about crawler access.

Example:

User-agent: GPTBot
Disallow: /private
Allow: /

llms.txt is about context.

Example:

## AI Crawler Resources

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of crawler profiles.
- [PerplexityBot](https://crawlconsole.com/web-crawlers/perplexitybot): How to identify Perplexity crawler visits.

One file tells automated systems what they are allowed to fetch.

The other file tells AI systems what your most useful resources are.

They should work together.

If your llms.txt points to a page that your robots.txt blocks, CDN blocks, or WAF challenges, the file may describe a resource that crawlers cannot actually access.

That is why monitoring matters.
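One way to catch the robots.txt part of this conflict is to test every URL in llms.txt against your robots.txt rules. This is a minimal sketch using Python's standard `urllib.robotparser`; the sample file contents and the `blocked_links` helper are illustrative, and CDN or WAF blocks are not covered because they are not visible in robots.txt.

```python
import re
from urllib.robotparser import RobotFileParser

# Pulls absolute URLs out of Markdown links such as [Title](https://...).
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_links(llms_text, robots_text, user_agent):
    """Return llms.txt URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    urls = LINK_URL_RE.findall(llms_text)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


ROBOTS = """User-agent: GPTBot
Disallow: /private
Allow: /
"""

LLMS = """## Docs

- [Guide](https://example.com/docs/guide): Public setup guide.
- [Roadmap](https://example.com/private/roadmap): Planned features.
"""

print(blocked_links(LLMS, ROBOTS, "GPTBot"))
# prints ['https://example.com/private/roadmap']
```

The standard-library parser may resolve overlapping Allow and Disallow rules differently from a given crawler, so treat the output as a hint and confirm against real crawler logs.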

How llms.txt Is Different From sitemap.xml

Your sitemap is broad.

It usually lists every indexable URL that search engines should know about.

Your llms.txt should be selective.

It should highlight the pages that are most useful for AI systems and agents:

key docs
product pages
API references
crawler profiles
tool pages
buying guides
comparison pages
policies
FAQs
original research
category explainers

Think of the sitemap as the full inventory.

Think of llms.txt as the curated reading list.

For CrawlConsole, a good llms.txt would not list every blog post.

It would prioritize resources like Web Crawlers, WebMCP, MCP Finder, WebMCP Checker, and important crawler profiles such as GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, and Applebot-Extended.

Should Every Website Add llms.txt?

Not every site needs it immediately.

It is most useful when:

your site has important docs
your product is technical
your pages explain complex workflows
your site has many resource pages
agents may need to understand your tools
you want AI systems to understand specific entities
you care about AI search visibility
you already monitor crawler behavior

It is less urgent if:

your site is very small
you have no public docs or resources
your key pages are not crawlable
your content is thin or outdated
you cannot monitor whether anything changes after launch

The file is not a magic visibility switch.

If your pages are weak, blocked, duplicated, or poorly linked, llms.txt will not fix that by itself.

Step 1: Choose The Pages Worth Highlighting

Start with 10-30 URLs.

Pick pages that answer important questions:

What is the company?
What does the product do?
Who is it for?
What tools are available?
What docs explain the workflow?
What pages support commercial discovery?
What pages explain key entities?
What pages should agents use as references?

For a SaaS company, this might include:

homepage
product overview
docs
API reference
pricing
comparison pages
use-case pages
integration pages
security page
support docs

For an ecommerce company, this might include:

category pages
product search
product guides
sizing or compatibility guides
shipping and return policies
high-intent product pages
buying guides
FAQs

For CrawlConsole, useful candidates include Web Crawlers, WebMCP, MCP Finder, WebMCP Checker, and the main crawler profiles.

Step 2: Group Pages By Agent Job

Do not group only by website navigation.

Group by the job an AI agent or LLM might need to complete.

Examples:

| Agent job | Useful section |
|---|---|
| Identify crawler traffic | AI Crawler Resources |
| Check whether agents can use a site | Agent-Readable Website Tools |
| Discover MCP servers | MCP Discovery |
| Understand ecommerce AI discovery | Agentic Commerce |
| Test prompts about brand visibility | Prompt Workflows |
| Troubleshoot crawler blocking | Crawler Access Guides |

This is more useful than generic sections like "Company" and "Resources."

The structure should help an AI system choose the right page for the task.
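If the job-based grouping lives in data rather than a hand-edited file, the llms.txt can be generated from it. The sketch below assumes a simple mapping of section heading to `(title, url, description)` tuples; `render_llms_txt` is a hypothetical helper, not a standard tool.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document from job-based sections.

    sections maps an H2 heading to a list of (title, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, description in links:
            lines.append(f"- [{name}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


SECTIONS = {
    "AI Crawler Resources": [
        (
            "Web Crawlers",
            "https://crawlconsole.com/web-crawlers",
            "Directory of crawler profiles.",
        ),
    ],
    "MCP Discovery": [
        (
            "MCP Finder",
            "https://crawlconsole.com/mcp-finder",
            "Tool for discovering MCP servers and agent tools.",
        ),
    ],
}

print(render_llms_txt("CrawlConsole", "Monitor AI crawler activity.", SECTIONS))
```

Keeping the sections in one structure also makes it easier to review the grouping itself, separate from the wording of individual links.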

Step 3: Add Short Descriptions To Every Link

A link without context is weaker than a link with context.

Weak:

- [Web Crawlers](https://crawlconsole.com/web-crawlers)

Better:

- [Web Crawlers](https://crawlconsole.com/web-crawlers): Directory of AI and search crawler user agents, including OpenAI, Anthropic, Perplexity, and Google crawlers.

Descriptions should explain:

what the page is
who it is for
what problem it solves
when an agent should use it

Avoid hype.

Use clear labels.
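Links with missing or very short descriptions are easy to find automatically. This is a small sketch; the `min_words` threshold of four is an arbitrary assumption, and it measures length only, not quality.

```python
import re

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$")


def weak_links(text, min_words=4):
    """Return URLs whose description is missing or shorter than min_words."""
    weak = []
    for line in text.splitlines():
        match = LINK_RE.match(line)
        if not match:
            continue
        url = match.group(2)
        description = (match.group(3) or "").strip()
        if len(description.split()) < min_words:
            weak.append(url)
    return weak


LINKS = """- [Web Crawlers](https://crawlconsole.com/web-crawlers)
- [WebMCP](https://crawlconsole.com/webmcp): Agent resources.
- [MCP Finder](https://crawlconsole.com/mcp-finder): Tool for discovering MCP servers and agent tools.
"""

print(weak_links(LINKS))
# prints ['https://crawlconsole.com/web-crawlers', 'https://crawlconsole.com/webmcp']
```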

Step 4: Keep It Consistent With Your Internal Links

Your llms.txt should not be the only place that connects important pages.

If a page is important enough to include in llms.txt, it should usually also be linked from:

relevant blog posts
product pages
docs
resource hubs
navigation or footer
related guide sections
sitemap

For example, if your llms.txt highlights PerplexityBot, then related articles like How to Check If PerplexityBot Crawled Your Website should also link to it.

If your llms.txt highlights WebMCP Checker, then WebMCP guides should link to it naturally.

llms.txt should reinforce the site graph.

It should not be a workaround for poor internal linking.
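Sitemap coverage is the easiest item in that list to verify automatically. The following sketch compares the URLs in llms.txt with the `<loc>` entries of a standard sitemap; it assumes a single sitemap file (not a sitemap index) and ignores trailing slashes, and `links_missing_from_sitemap` is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def links_missing_from_sitemap(llms_text, sitemap_xml):
    """Return llms.txt URLs that do not appear in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    sitemap_urls = {
        loc.text.strip().rstrip("/")
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    return [
        url
        for url in LINK_URL_RE.findall(llms_text)
        if url.rstrip("/") not in sitemap_urls
    ]


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs</loc></url>
  <url><loc>https://example.com/pricing/</loc></url>
</urlset>"""

LLMS = """- [Docs](https://example.com/docs): Product documentation.
- [Pricing](https://example.com/pricing): Plans and limits.
- [API](https://example.com/api): API reference.
"""

print(links_missing_from_sitemap(LLMS, SITEMAP))
# prints ['https://example.com/api']
```

Internal links from posts, docs, and navigation need a crawl of your own site to verify, which is outside the scope of this sketch.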

Step 5: Publish It At The Root

Publish the file at:

https://example.com/llms.txt

Then verify:

it returns 200
it is publicly accessible
it is not blocked by robots.txt
it is not behind authentication
it is served as plain text or Markdown (for example, a Content-Type of text/plain or text/markdown)
it does not redirect unexpectedly
it does not include private information
it contains absolute links where useful
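Several of these checks (status, redirects, content type) can be scripted. Below is a minimal sketch using only the standard library; the accepted content types are an assumption, and `check_llms_response` and `fetch_and_check` are hypothetical helper names. The checking logic is kept separate from the network call so it can be tested on its own.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed acceptable media types; adjust if your server uses something else.
ACCEPTED_TYPES = ("text/plain", "text/markdown")


def check_llms_response(requested_url, final_url, status, content_type):
    """Return a list of problems with a fetched llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    if final_url != requested_url:
        problems.append(f"redirected to {final_url}")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {media_type or 'missing'}")
    return problems


def fetch_and_check(url, timeout=10):
    """Fetch the URL (following redirects) and run the checks above."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return check_llms_response(
                url,
                response.geturl(),  # final URL after any redirects
                response.status,
                response.headers.get("Content-Type", ""),
            )
    except HTTPError as error:
        return [f"expected 200, got {error.code}"]


# Usage (performs a real request, so it is left commented out here):
# print(fetch_and_check("https://example.com/llms.txt"))
```

A request from your own machine does not prove that AI crawlers receive the same response, since CDN and WAF rules can treat them differently. Server logs remain the source of truth.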

Also consider whether you need an expanded version, such as llms-full.txt, if your docs or product information require more context.

For many marketing sites, start simple.

The first version should be useful, short, and maintainable.

Step 6: Monitor Whether AI Crawlers Request It

This is the CrawlConsole-specific step most llms.txt guides skip.

After publishing, watch for requests to:

/llms.txt
/llms-full.txt

Track:

crawler name
timestamp
status code
IP or ASN if available
redirect behavior
whether the same crawler later requests linked pages
whether important linked pages get revisited
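If you have raw access logs, the first few fields above can be extracted with a short script. This sketch assumes the common "combined" log format and matches crawlers by user-agent substring; the sample log lines are made up (documentation IP ranges, simplified user-agent strings), and the token list is taken from the crawler names mentioned in this guide. User-agent strings can be spoofed, so verify by IP or ASN where possible.

```python
import re
from datetime import datetime

# Combined log format: ip - - [time] "METHOD path PROTO" status bytes "ref" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
AI_CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")
WATCHED_PATHS = {"/llms.txt", "/llms-full.txt"}


def llms_txt_hits(log_lines):
    """Return AI crawler requests to the watched llms.txt paths."""
    hits = []
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match:
            continue
        path = match["path"].split("?")[0]
        if path not in WATCHED_PATHS:
            continue
        agent = match["agent"].lower()
        crawler = next((t for t in AI_CRAWLER_TOKENS if t.lower() in agent), None)
        if crawler is None:
            continue
        hits.append({
            "crawler": crawler,
            "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
            "status": int(match["status"]),
            "ip": match["ip"],
            "path": path,
        })
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [11/Jun/2026:10:15:32 +0000] "GET /llms.txt HTTP/1.1" 200 1432 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [11/Jun/2026:10:16:02 +0000] "GET /llms.txt HTTP/1.1" 200 1432 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/125.0"',
    '203.0.113.9 - - [11/Jun/2026:10:20:00 +0000] "GET /llms-full.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

for hit in llms_txt_hits(SAMPLE_LOG):
    print(hit["crawler"], hit["path"], hit["status"])
```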

Look for crawlers such as GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.

The important question is not only:

Did a crawler request llms.txt?

It is also:

Did the crawler later request the important pages linked inside it?

That gives you a stronger signal that the file may be part of a discovery path.
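The follow-through question can be sketched as a simple ordering check over already-parsed log events. The example below assumes events are `(crawler, timestamp, path)` tuples and that you have the set of paths linked from llms.txt; both the event data and the `follow_through` helper are hypothetical.

```python
from datetime import datetime


def follow_through(events, linked_paths):
    """For each crawler that requested /llms.txt, list linked pages it
    requested afterwards. Events are (crawler, timestamp, path) tuples."""
    seen_llms = set()
    followed = {}
    for crawler, _when, path in sorted(events, key=lambda event: event[1]):
        if path == "/llms.txt":
            seen_llms.add(crawler)
            followed.setdefault(crawler, set())
        elif crawler in seen_llms and path in linked_paths:
            followed[crawler].add(path)
    return followed


LINKED = {"/web-crawlers", "/web-crawlers/gptbot", "/webmcp"}

# Made-up events for illustration.
EVENTS = [
    ("GPTBot", datetime(2026, 6, 11, 10, 0), "/web-crawlers"),  # before llms.txt
    ("GPTBot", datetime(2026, 6, 11, 10, 5), "/llms.txt"),
    ("GPTBot", datetime(2026, 6, 11, 10, 6), "/web-crawlers/gptbot"),
    ("GPTBot", datetime(2026, 6, 11, 10, 7), "/blog/unrelated"),
    ("ClaudeBot", datetime(2026, 6, 11, 10, 10), "/llms.txt"),
    ("PerplexityBot", datetime(2026, 6, 11, 10, 12), "/webmcp"),  # never saw llms.txt
]

print(follow_through(EVENTS, LINKED))
# prints {'GPTBot': {'/web-crawlers/gptbot'}, 'ClaudeBot': set()}
```

A later request is consistent with the crawler using llms.txt, but it is not proof: the same page may have been discovered through a sitemap or an internal link.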

Step 7: Compare Before And After Publishing

Before adding llms.txt, record a baseline.

For your most important pages, note:

AI crawler visits
status codes
revisit frequency
last crawl date
prompt test results
AI referral traffic, if any
Google Search Console (GSC) impressions, if relevant

After publishing, compare:

Did /llms.txt get requested?
Which crawlers requested it?
Did crawlers revisit linked pages?
Did crawl frequency change on highlighted pages?
Did prompt test answers improve?
Did any pages get blocked or redirected?
Did any linked pages remain ignored?

Do not over-attribute.

If crawler visits increase, llms.txt may be one factor. Internal links, new content, external mentions, sitemaps, and normal crawler schedules may also play a role.

The goal is evidence, not certainty.
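A before-and-after comparison can be as simple as counting crawler visits per page in equal windows on either side of the publish date. This is a minimal sketch; the 14-day window is an arbitrary assumption, and the visit data and `compare_windows` helper are illustrative.

```python
from datetime import date, timedelta


def compare_windows(visits, publish_date, window_days=14):
    """Count visits per path in equal windows before and after publish_date.

    visits is a list of (day, path) tuples; returns {path: (before, after)}.
    """
    start = publish_date - timedelta(days=window_days)
    end = publish_date + timedelta(days=window_days)
    counts = {}
    for day, path in visits:
        if start <= day < publish_date:
            counts.setdefault(path, [0, 0])[0] += 1
        elif publish_date <= day < end:
            counts.setdefault(path, [0, 0])[1] += 1
    return {path: tuple(pair) for path, pair in counts.items()}


# Made-up visit data for illustration.
VISITS = [
    (date(2026, 4, 1), "/docs"),      # outside both windows, ignored
    (date(2026, 6, 1), "/docs"),
    (date(2026, 6, 12), "/docs"),
    (date(2026, 6, 15), "/docs"),
    (date(2026, 6, 13), "/pricing"),
]

print(compare_windows(VISITS, publish_date=date(2026, 6, 11)))
# prints {'/docs': (1, 2), '/pricing': (0, 1)}
```

Counts this small are noise. The comparison becomes more meaningful over longer windows and alongside a note of what else changed in the same period.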

Step 8: Use Prompt Tests Carefully

Prompt tests are useful, but they are not crawler logs.

After publishing llms.txt, run repeatable prompts:

"What does [brand] do?"
"What tools does [brand] offer?"
"How can I monitor AI crawler traffic?"
"Which pages explain [product category]?"
"What is the best resource for [specific workflow]?"

Use the Prompt Library to keep these tests consistent.

Track:

whether the answer mentions your brand
whether it describes the right pages
whether it cites or references the right resources
whether it invents details or repeats outdated information
whether competitors appear instead
whether answers change after content updates

Prompt tests tell you what AI systems say.

Crawler logs tell you what automated systems requested.

Use both, but do not confuse them.
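To keep prompt tests comparable over time, record the same fields for every answer. The sketch below scores a saved answer with plain substring checks; it does not call any AI system, and the brand, URL, competitor name, and answer text are all made-up placeholders.

```python
def score_answer(answer, brand, expected_urls, competitors):
    """Record basic, repeatable observations about one prompt-test answer."""
    text = answer.lower()
    return {
        "mentions_brand": brand.lower() in text,
        "cited_urls": [url for url in expected_urls if url.lower() in text],
        "competitors_mentioned": [
            name for name in competitors if name.lower() in text
        ],
    }


# Made-up answer text for illustration.
ANSWER = (
    "You can monitor AI crawler traffic with CrawlConsole "
    "(https://crawlconsole.com/web-crawlers) or with ExampleRival."
)

print(score_answer(
    ANSWER,
    brand="CrawlConsole",
    expected_urls=[
        "https://crawlconsole.com/web-crawlers",
        "https://crawlconsole.com/webmcp",
    ],
    competitors=["ExampleRival", "OtherTool"],
))
```

Substring matching cannot judge whether the description is accurate or outdated, so a human review of each answer is still needed.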

Step 9: Maintain llms.txt Like A Product Surface

Do not publish llms.txt once and forget it.

Update it when:

new product pages launch
docs change
crawler resources are added
WebMCP pages are updated
MCP tools are added
high-value blog posts are published
ecommerce product guides change
important URLs are redirected
pages are removed or consolidated

Treat it like a lightweight product surface for agents.

If the file links to stale pages, broken pages, or outdated descriptions, it can create confusion instead of clarity.
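A periodic stale-link check helps with this. The sketch below takes a `lookup` callable that returns `(status, final_url)` for a URL, so the logic can be run against saved crawl results or a live fetcher; the `stale_links` helper and the sample results are hypothetical.

```python
import re

LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def stale_links(llms_text, lookup):
    """Return (url, reason) pairs for links that are broken or redirected.

    lookup(url) must return a (status_code, final_url) tuple.
    """
    stale = []
    for url in LINK_URL_RE.findall(llms_text):
        status, final_url = lookup(url)
        if status != 200:
            stale.append((url, f"status {status}"))
        elif final_url != url:
            stale.append((url, f"redirects to {final_url}"))
    return stale


LLMS = """- [Docs](https://example.com/docs): Product documentation.
- [Old guide](https://example.com/old-guide): Setup guide.
- [Removed](https://example.com/removed): Retired feature.
"""

# Made-up crawl results standing in for real HTTP checks.
CRAWL_RESULTS = {
    "https://example.com/docs": (200, "https://example.com/docs"),
    "https://example.com/old-guide": (200, "https://example.com/guide"),
    "https://example.com/removed": (404, "https://example.com/removed"),
}

for url, reason in stale_links(LLMS, CRAWL_RESULTS.__getitem__):
    print(url, "->", reason)
```

Redirected entries are worth updating to the final URL so the file keeps pointing at canonical pages.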

Common Mistakes

Mistake 1: Treating llms.txt Like A Ranking Factor

There is no guarantee that adding llms.txt will improve AI search visibility.

Treat it as a discovery and context layer, not a ranking lever.

Mistake 2: Copying Your Sitemap

Do not list everything.

Curate the most useful resources.

Mistake 3: Including Private Information

The file is public.

Do not include anything confidential.

Mistake 4: Linking To Blocked Pages

If llms.txt links to pages that robots.txt, CDN rules, WAF rules, or auth walls block, the file may point crawlers toward dead ends.

Mistake 5: Never Monitoring It

If you cannot tell whether /llms.txt gets requested, you cannot learn much from the implementation.

Monitor crawler behavior after publishing.

llms.txt Checklist

Before publishing:

Purpose: the file explains what the site does and which resources matter.
Structure: it uses one H1, a short summary, and clear H2 sections.
Links: every important link has a useful description.
Curation: it lists important pages, not every page.
Safety: it contains no private data, keys, or internal URLs.
Access: it returns 200 at /llms.txt.
Consistency: linked pages are also supported by internal links and sitemap coverage.
Monitoring: crawler requests to /llms.txt and linked pages are tracked after publishing.
Prompt tests: answer behavior is tested separately from crawler access.
Maintenance: the file is updated when important pages change.

Where CrawlConsole Fits

CrawlConsole fits after the file goes live.

Use Web Crawlers to identify AI crawler user agents.

Then monitor:

whether /llms.txt gets requested
which crawler requested it
whether the crawler received 200
whether linked pages were visited after the request
whether important pages stayed ignored
whether crawler behavior changed after updates

For agent-readable site work, pair llms.txt with WebMCP and the WebMCP Checker.

For MCP discovery workflows, connect relevant pages to MCP Finder.

For content planning, use the workflow in How to Use AI Crawler Logs to Find Content Ideas.

The practical point:

Do not just add llms.txt. Watch what happens after you add it.

The Bottom Line

llms.txt is worth testing if your website has important resources that AI systems should understand.

But it should be treated as an emerging convention, not a guaranteed ranking shortcut.

Use it to provide a curated, readable map of your most important resources.

Then monitor:

whether AI crawlers request it
whether they receive a clean response
whether they visit linked pages
whether important pages get revisited
whether prompt answers improve over time

That is the difference between adding a file and building an AI crawler visibility workflow.

Distribution Copy

X Post

Adding llms.txt is not the finish line.

After publishing it, check:

did AI crawlers request /llms.txt?
did they receive 200?
did they later visit the linked pages?
did prompt answers improve?
are stale or blocked URLs listed?

Drafted a practical monitoring checklist for llms.txt.

LinkedIn Post

llms.txt is getting more attention in AI search and technical SEO.

But the useful question is not only whether you should add it.

The better question is:

After you add it, can you tell whether AI crawlers actually request it and follow the resources it links to?

I drafted a practical guide covering:

what llms.txt should include
how it differs from robots.txt and sitemap.xml
what not to include
how to publish it safely
how to monitor crawler requests after it goes live
how to combine crawler logs with prompt tests

The main point: treat llms.txt as an agent-readable discovery layer, not a magic AI ranking tag.