Crawl Control

Robots.txt Tester

Validate whether one crawler can fetch one URL path from pasted rules or a bounded live fetch of the published robots file.

Use notes

Robots.txt controls crawling, not indexing. A blocked URL can still appear in search if engines discover it elsewhere, and a `noindex` directive takes effect only if crawlers are allowed to fetch the page and read it.

Crawler test

Use paste mode for drafts and pull requests. Use live mode to fetch the currently published robots file for the target origin.

Paste the draft or deployed file contents exactly as the crawler would see them.

Trust

How this tool handles the task

How it runs

Paste mode evaluates the rules you provide. Live mode fetches only the published `robots.txt` file for the entered origin.

Current limits

Each run checks one crawler against one path. Unsupported directives are surfaced as warnings instead of silently changing the result.
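As a rough illustration of "one crawler, with unsupported directives surfaced as warnings", the sketch below picks the rule group for a single crawler and collects a warning for every directive it does not evaluate. It is not this tool's implementation: `parse_group` is a hypothetical name, and the exact-name match with a fallback to `*` is a simplifying assumption (real crawlers match product tokens more loosely).

```python
def parse_group(text: str, crawler: str):
    """Return (rules, warnings) for the group that applies to `crawler`.

    Hypothetical helper: exact, case-insensitive name match, falling back to '*'.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    warnings: list[str] = []
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule line

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: not a 'field: value' line")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if not agents:
                warnings.append(f"line {number}: rule before any user-agent")
            for agent in agents:
                groups[agent].append((field, value))
        else:
            # Not evaluated here: report it instead of changing the result.
            warnings.append(f"line {number}: unsupported directive '{field}'")

    rules = groups.get(crawler.lower(), groups.get("*", []))
    return rules, warnings
```

Given a file with a `*` group, a `Googlebot` group, a `Crawl-delay` line, and a `Sitemap` line, this returns only the `Googlebot` rules for that crawler plus two warnings, one per unevaluated directive.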

Privacy

Only the URL, crawler name, and optional pasted rules are used for the current run.

Examples

How to use this tool

  1. Use paste mode for draft rules and pull requests.
  2. Use live mode when you need the currently published `origin/robots.txt` file for the tested URL.
  3. Review the matched rule, warnings, and fetched source before deciding whether the target path is safely crawlable.

Common mistakes and limits

  • The tool fetches only one live file: `origin/robots.txt`, not linked sitemaps or downstream pages.
  • `Disallow` affects crawling, not whether a discovered URL can still appear in search.
  • If the target page stays blocked, crawlers never fetch it, so its canonical and `noindex` signals go unread and later checks of them will not reflect what search crawlers can see.

Next steps

Generate or revise the rule

Return to rule editing when the live rule is wrong and needs to change.

Check the indexation layer next

Once the page is crawlable, verify the live search signals that depend on that access.

FAQ

Does `Disallow` keep a URL out of Google?

No. `Disallow` affects crawling, not indexing. A blocked URL can still be listed in search if Google discovers it from links, sitemaps, or previous crawls.

Why does live mode fetch only one file?

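The tester intentionally fetches only `origin/robots.txt` for the supplied URL. That keeps the validation bounded and matches how crawlers locate the file: a robots file applies only to the origin (scheme, host, and port) that serves it, at the top-level path `/robots.txt`.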
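To make `origin/robots.txt` concrete, here is a minimal sketch of deriving that URL from a tested page URL with Python's standard `urllib.parse`. The function name `robots_url` is hypothetical and this is not necessarily how the tester builds the address.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin of `page_url`."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    # Keep scheme and host[:port]; drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For example, `robots_url("https://example.com/blog/post?x=1#top")` returns `https://example.com/robots.txt`, and a URL on port 8080 maps to the robots file on that same port rather than the default one.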

Which rule patterns does this validator understand?

It evaluates `Allow`, `Disallow`, wildcard `*`, and end-anchor `$`. When several rules match a path, the longest matching pattern wins; when an `Allow` and a `Disallow` of equal length both match, `Allow` wins. Unsupported directives are ignored and surfaced as warnings.
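The precedence described here can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the validator's code: `pattern_to_regex` and `is_allowed` are hypothetical names, rules are assumed to be already filtered to the group for one crawler, and "longest" is measured as the character length of the pattern.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; every other character is literal.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (directive, pattern) pairs for the group that applies to the crawler."""
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path) is None:
            continue
        is_allow = directive.lower() == "allow"
        # Longest pattern wins; on equal length, Allow beats Disallow.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed
```

With the rules `Disallow: /private/`, `Allow: /private/public/`, and `Disallow: /*.pdf$`, the path `/private/public/page` is allowed because the longer `Allow` pattern wins, `/docs/file.pdf` is blocked, and `/docs/file.pdf?x=1` is allowed because `$` requires the path to end at `.pdf`.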