Crawl Control
Robots.txt Tester
Check whether one crawler may fetch one URL path, using either pasted rules or a bounded live fetch of the published `robots.txt` file.
Use notes
Robots.txt controls crawling, not indexing. A blocked URL can still appear in search if engines discover it elsewhere, and a `noindex` directive only takes effect if the page stays crawlable so the crawler can fetch it and read the directive.
Crawler test
Use paste mode for drafts and pull requests. Use live mode to fetch the currently published robots file for the target origin.
Paste the draft or deployed file contents exactly as the crawler would see them.
Trust
How this tool handles the task
How it runs
Paste mode evaluates the rules you provide. Live mode fetches only the published `robots.txt` file for the entered origin.
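As a rough illustration of what "only the published `robots.txt` file for the entered origin" means, the sketch below derives that single URL from a tested URL. The function name `robots_url_for` is hypothetical and is not taken from this tool; the sketch assumes the input is an absolute `http` or `https` URL.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(target_url: str) -> str:
    """Return the robots.txt URL for the origin of target_url (hypothetical helper)."""
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {target_url!r}")
    # Keep scheme and host[:port]; drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?x=1#top"))
```

Because the scheme and port are part of the origin, `http://example.com:8080/a` and `https://example.com/a` resolve to different robots files under this sketch.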
Current limits
Each run checks one crawler against one path. Unsupported directives are surfaced as warnings instead of silently changing the result.
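One way a tester could surface unsupported directives as warnings is sketched below. This is an assumption about the general approach, not this tool's actual parser: `parse_robots` is a hypothetical name, only `User-agent`, `Allow`, and `Disallow` are treated as supported, and a new group is assumed to start at the first `User-agent` line that follows a rule line.

```python
def parse_robots(text: str):
    """Parse robots.txt text into (groups, warnings).

    groups maps a lowercased user-agent token to a list of
    (directive, pattern) pairs, where directive is "allow" or "disallow".
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    warnings: list[str] = []
    current: list[str] = []   # user-agents of the group being read
    seen_rule = False         # True once the current group has a rule line

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {lineno}: not a 'field: value' line")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line closed the previous group
                current, seen_rule = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            if not current:
                warnings.append(f"line {lineno}: rule before any User-agent")
                continue
            for agent in current:
                groups[agent].append((field, value))
        else:
            # Unsupported directives never change the result; they are reported.
            warnings.append(f"line {lineno}: unsupported directive '{field}'")
    return groups, warnings
```

With input containing `Crawl-delay: 10` on its third line, this sketch leaves the parsed rules untouched and records a warning for that line.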
Privacy
Only the URL, crawler name, and optional pasted rules are used for the current run.
Examples
How to use this tool
- Use paste mode for draft rules and pull requests.
- Use live mode when you need the currently published `origin/robots.txt` file for the tested URL.
- Review the matched rule, warnings, and fetched source before deciding whether the target path is safely crawlable.
Common mistakes and limits
- The tool fetches only one live file: `origin/robots.txt`, not linked sitemaps or downstream pages.
- `Disallow` affects crawling, not whether a discovered URL can still appear in search.
- If the target page stays blocked, crawlers cannot read its canonical or `noindex` signals, so later checks of those signals will not reflect what search crawlers can see.
Next steps
Generate or revise the rule
Return to this step when the live rule is wrong and needs editing.
Check the indexation layer next
Once the page is crawlable, verify the live search signals that depend on that access.
FAQ
Does `Disallow` keep a URL out of Google?
No. `Disallow` affects crawling, not indexing. A blocked URL can still be listed in search if Google discovers it from links, sitemaps, or previous crawls.
Why does live mode fetch only one file?
The tester intentionally fetches only `origin/robots.txt` for the supplied URL. That keeps the validation bounded and matches how robots files are published: one file at the root of each origin.
Which rule patterns does this validator understand?
It evaluates `Allow`, `Disallow`, wildcard `*`, end-anchor `$`, longest-match precedence, and allow-over-disallow tie breaks. Unsupported directives are ignored and surfaced as warnings.
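The matching behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the validator's own code: `pattern_to_regex` and `evaluate` are hypothetical names, rules are assumed to be `(directive, pattern)` pairs already selected for one crawler, and "longest match" is assumed to mean the longest pattern string.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then let each '*' match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def evaluate(rules, path):
    """Return (allowed, matched_rule) for path.

    rules is a list of (directive, pattern) pairs with directive
    "allow" or "disallow". No matching rule means the path is allowed.
    """
    best = None  # (pattern_length, is_allow, rule)
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow", (directive, pattern))
            # Longer pattern wins; on equal length, allow (True) beats disallow.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, None
    return best[1], best[2]


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public/"),
    ("disallow", "/*.pdf$"),
]
print(evaluate(rules, "/private/public/a"))
print(evaluate(rules, "/docs/a.pdf"))
```

Under these assumptions, `/private/public/a` is allowed because the `Allow` pattern is longer than the `Disallow` pattern, and `/docs/a.pdf` is blocked by the wildcard rule, while `/docs/a.pdf?x=1` is not, because `$` requires the path to end in `.pdf`.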