Robots.txt Tester

Test robots.txt rules with RFC 9309 group selection and Allow/Disallow precedence. Bulk-check URLs per bot, export CSV. Blocking is not deindexing.

Robots.txt Tester - Test & Validate Crawler Rules

This robots.txt tester and validator checks whether specific URLs are allowed or blocked by your robots.txt rules for different search engine crawlers. Test Google, Bing, Yahoo, and custom user-agents to confirm your robots.txt file behaves as intended before you deploy it.

What is robots.txt?

Robots.txt is a text file placed in the root directory of a website (e.g., example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they can or cannot access. It uses simple directives:

- User-agent: Specifies which crawler the rules apply to (* means all)
- Disallow: Tells crawlers not to access specific paths
- Allow: Explicitly permits access to paths (overrides a Disallow whose path is the same length or shorter)
- Sitemap: Points crawlers to your XML sitemap
- Crawl-delay: Specifies delay between requests (not supported by all bots)

Robots.txt is part of the Robots Exclusion Protocol and is respected by reputable search engines like Google, Bing, Yahoo, and others. However, it's not a security measure - malicious bots can ignore it.
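As an illustration of how the directives above fit together, here is a minimal parsing sketch. It is not this tool's actual code. The function name `parse_robots` and the returned structure are assumptions made for the example, and treating Crawl-delay as an ordinary group member is a simplification.

```python
def parse_robots(text):
    """Parse robots.txt text into (groups, sitemaps).

    Each group is a dict: {"agents": [...], "rules": [(directive, value), ...]}.
    Hypothetical helper for illustration only.
    """
    groups, sitemaps = [], []
    current = None
    seen_rule = False  # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()  # names are case-insensitive
        if name == "user-agent":
            # A User-agent line after rules starts a new group;
            # consecutive User-agent lines share one group.
            if current is None or seen_rule:
                current = {"agents": [], "rules": []}
                groups.append(current)
                seen_rule = False
            current["agents"].append(value.lower())
        elif name in ("allow", "disallow", "crawl-delay"):
            if current is not None:  # rules before any User-agent are ignored
                current["rules"].append((name, value))  # paths keep their case
                seen_rule = True
        elif name == "sitemap":
            sitemaps.append(value)  # Sitemap is independent of groups
    return groups, sitemaps


sample = """
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
Allow: /admin/public/

user-agent: *
DISALLOW: /tmp/
Sitemap: https://example.com/sitemap.xml
"""
groups, sitemaps = parse_robots(sample)
```

With this sample, `groups` holds two entries: one shared by `googlebot` and `bingbot`, and one for `*`.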

How do I use this robots.txt tester?

Using the tester is simple:

1. Paste your robots.txt content into the text area (or click 'Load Sample' for an example)
2. Select a User-Agent (Googlebot, Bingbot, etc.) or choose 'Custom' for specific bots
3. Enter the URL path you want to test (e.g., /admin/dashboard)
4. Click 'Test' to see if the path is allowed or disallowed

The tool will:
- Parse all robots.txt rules
- Apply the correct precedence rules
- Show whether the URL is allowed or blocked
- Display which specific rule matched
- Show all parsed directives for reference

You can test multiple paths and user-agents to ensure your robots.txt works as intended.
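Checking many paths against several bots amounts to a nested loop that records one verdict per pair. The sketch below writes those verdicts as CSV. The function name `bulk_check_csv`, the column names, and the `check` callback are all hypothetical and do not describe this tool's export format.

```python
import csv
import io


def bulk_check_csv(paths, agents, check):
    """Return CSV text with one row per (agent, path) pair.

    `check(agent, path)` is any callable returning True when the path
    is allowed. It stands in for a real robots.txt matcher.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["user_agent", "path", "verdict"])
    for agent in agents:
        for path in paths:
            verdict = "allowed" if check(agent, path) else "blocked"
            writer.writerow([agent, path, verdict])
    return buffer.getvalue()


# Stand-in matcher for the demo: block everything under /admin/.
def demo_check(agent, path):
    return not path.startswith("/admin/")


report = bulk_check_csv(["/admin/dashboard", "/blog"], ["Googlebot"], demo_check)
```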

What are the robots.txt precedence rules?

When multiple rules match a URL, robots.txt follows these precedence rules:

1. Most Specific Path Wins: A longer, more specific rule overrides a shorter one
- Disallow: /admin/ + Allow: /admin/public/
- /admin/public/ is allowed because the Allow path is longer than the Disallow path

2. Allow Beats Disallow: When rules are equally specific, Allow wins
- Disallow: /page + Allow: /page
- /page is allowed because both paths are the same length

3. User-Agent Specificity: Specific user-agent rules override wildcard (*)
- User-agent: Googlebot rules take precedence over User-agent: * for Google

4. Default Allow: If no rule matches, access is allowed by default

Our tester correctly implements these rules to give you accurate results that match how search engines interpret your robots.txt file.
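The path-level rules above (longest match wins, Allow wins ties, default allow) can be sketched in a few lines. This is an illustration, not the tester's own code. It assumes the rules for the right user-agent group have already been selected, and it uses plain prefix matching with no wildcard handling. The name `is_allowed` is hypothetical.

```python
def is_allowed(rules, path):
    """Decide whether `path` is allowed.

    rules: list of (kind, pattern) pairs, kind being "allow" or "disallow".
    Plain prefix matching only; wildcards are not handled in this sketch.
    """
    best_len, verdict = -1, True  # default allow when nothing matches
    for kind, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # an empty pattern matches nothing
        allow = kind == "allow"
        # A longer pattern wins; on equal length, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, verdict = len(pattern), allow
    return verdict


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
is_allowed(rules, "/admin/settings")           # False: only the Disallow matches
is_allowed(rules, "/admin/public/index.html")  # True: the longer Allow wins
is_allowed(rules, "/blog")                     # True: no rule matches
```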

How does this tester pick the matching user-agent group?

Per RFC 9309 (the standardized Robots Exclusion Protocol) and Google's implementation, a crawler obeys ONLY the single most specific matching group. Rules from groups that name different agents (for example User-agent: * and User-agent: Googlebot) are never combined.

- If a group names your bot (e.g. User-agent: Googlebot), that group is authoritative and the User-agent: * group is ignored entirely for Googlebot.
- The most specific match is the one whose agent token is the longest prefix of the crawler name.
- User-agent: * is used only as a fallback when no named group matches.
- Consecutive User-agent lines before a block share that block's rules (one group, multiple agents).

This tester implements exactly this group-selection logic, so a rule under User-agent: * will not leak into a bot that has its own group. This is the most common source of wrong verdicts in naive testers.
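The group-selection steps described above can be sketched as follows. This is an illustrative approximation, not the tester's source. The name `select_rules` and the group structure are assumptions. Merging groups that repeat the same agent token reflects RFC 9309's instruction to combine multiple groups matching one user-agent, so treat that part as an assumption about this tool.

```python
def select_rules(groups, crawler):
    """Return the rules of the single most specific group for `crawler`.

    groups: list of {"agents": [token, ...], "rules": [...]}.
    Tokens are compared case-insensitively; "*" is only a fallback.
    """
    name = crawler.lower()
    tokens = {agent.lower() for group in groups for agent in group["agents"]}
    # Named tokens that are a prefix of the crawler name.
    named = [t for t in tokens if t and t != "*" and name.startswith(t)]
    if named:
        chosen = max(named, key=len)  # longest prefix is the most specific
    elif "*" in tokens:
        chosen = "*"
    else:
        return []  # no group applies, so everything is allowed
    rules = []
    for group in groups:
        if chosen in (agent.lower() for agent in group["agents"]):
            rules.extend(group["rules"])  # merge groups repeating the token
    return rules


groups = [
    {"agents": ["*"], "rules": [("disallow", "/private/")]},
    {"agents": ["googlebot"], "rules": [("disallow", "/drafts/")]},
    {"agents": ["googlebot-image", "bingbot"], "rules": [("disallow", "/photos/")]},
]
select_rules(groups, "Googlebot")    # [("disallow", "/drafts/")], "*" is ignored
select_rules(groups, "DuckDuckBot")  # [("disallow", "/private/")], fallback to "*"
```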

Does robots.txt remove a page from Google?

No. Disallowing a URL in robots.txt only asks crawlers not to FETCH it - it does not remove the URL from Google's index. A blocked URL can still appear in search results (often with no description) if Google discovers it through links from other pages.

To actually keep a page out of the index, you must let crawlers reach the page and use one of:

- A meta robots noindex tag: <meta name="robots" content="noindex">
- An X-Robots-Tag: noindex HTTP response header

Important: if the page is blocked by robots.txt, Google cannot crawl it and therefore cannot see the noindex directive. So never block a URL in robots.txt that you also want to deindex - allow crawling and use noindex instead. This is a critical distinction professionals must get right.
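The interaction between blocking and noindex can be made concrete with a small check. The sketch below is hypothetical: `has_noindex` and `noindex_is_visible` are names invented for this example, and the plain substring test for "noindex" is a simplification of how crawlers parse these directives.

```python
from html.parser import HTMLParser


class _MetaRobots(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(headers, html):
    """True if the response carries noindex in a header or a meta tag."""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _MetaRobots()
    parser.feed(html)
    return parser.noindex


def noindex_is_visible(blocked_by_robots, headers, html):
    """A crawler can only see noindex on a page it is allowed to fetch."""
    return (not blocked_by_robots) and has_noindex(headers, html)


page = '<html><head><meta name="robots" content="noindex"></head></html>'
noindex_is_visible(False, {}, page)  # True: crawlable, so noindex is seen
noindex_is_visible(True, {}, page)   # False: blocked, so noindex is never read
```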

Does Google support Crawl-delay or Noindex in robots.txt?

No to both, for Googlebot:

- Crawl-delay: Google ignores it. Crawl rate is managed automatically (and previously via Search Console). Bing and Yandex do honor Crawl-delay, so keep it if you target those engines, but it has no effect on Google.
- Noindex in robots.txt: Google dropped support on 1 September 2019. The unofficial 'Noindex:' directive no longer works; use a meta robots noindex tag or an X-Robots-Tag HTTP header instead.

Also note paths are CASE-SENSITIVE (/Admin/ and /admin/ are different), while directive names like Disallow / User-agent are case-INSENSITIVE. Our tester reflects this behavior.

Can I test different search engine bots?

Yes! The tool supports testing with many popular search engine crawlers:

- Googlebot: Google's main web crawler
- Googlebot-Image: For Google Image Search
- Googlebot-News: For Google News
- Googlebot-Video: For Google Video Search
- Bingbot: Microsoft Bing's crawler
- Slurp: Yahoo's web crawler
- DuckDuckBot: DuckDuckGo's crawler
- Baiduspider: Baidu (Chinese search engine)
- YandexBot: Yandex (Russian search engine)
- Social media bots: Facebook, Twitter, LinkedIn
- Custom: Test any user-agent string

Different bots can have different rules in your robots.txt, and this tool lets you test each one individually to ensure they behave as expected.

What are wildcards in robots.txt?

Robots.txt supports two important wildcards:

1. Asterisk (*) - Matches any sequence of characters
Examples:
- Disallow: /*.pdf$ (blocks all PDF files)
- Disallow: /admin/* (blocks everything under /admin/; the trailing * is redundant, since Disallow: /admin/ matches the same URLs)
- Allow: /public/*.html (allows all HTML in /public/)

2. Dollar Sign ($) - Matches end of URL
Examples:
- Disallow: /*.pdf$ (blocks URLs ending in .pdf)
- Disallow: /admin$ (blocks /admin but not /admin/page)
- Allow: /search$ (allows exactly /search, not /search/results)

Without $, a rule matches any URL starting with that pattern:
- Disallow: /admin (matches /admin, /admin/, /admin/page, /administrator)
- Disallow: /admin$ (matches only /admin)

Our tester correctly handles both wildcards to accurately test your rules.
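One common way to implement these two wildcards is to translate each pattern into a regular expression. The sketch below takes that approach as an assumption. It is not necessarily how this tester works, and the names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any sequence of characters; a trailing "$" anchors the
    end of the URL. Everything else is matched literally, case-sensitively.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""), re.DOTALL)


def matches(pattern, path):
    # re.match anchors at the start, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


matches("/*.pdf$", "/docs/file.pdf")      # True
matches("/*.pdf$", "/docs/file.pdf?x=1")  # False: "$" requires the URL to end
matches("/admin$", "/admin/page")         # False
matches("/admin", "/administrator")       # True: prefix match without "$"
```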

Common robots.txt mistakes

Avoid these common robots.txt errors:

1. Blocking CSS/JS files: Don't block resources Google needs to render pages
- Bad: Disallow: /*.css$
- This can hurt SEO as Google can't render your site properly

2. Confusing case-sensitivity: PATHS are case-sensitive, directive NAMES are not
- /Admin/ and /admin/ are different paths and must match exactly
- Directive names like Disallow:, Allow:, and User-agent: are case-insensitive, but stick to the canonical spelling for clarity

3. Blocking entire site unintentionally:
- Disallow: / (blocks everything!)
- Make sure this is intentional

4. Using robots.txt for security: It's not a security tool
- Malicious bots ignore it
- Use proper authentication instead

5. Forgetting the Allow directive:
- You can unblock subdirectories of blocked directories
- Disallow: /admin/ then Allow: /admin/public/

Use this tester to catch these mistakes before deploying your robots.txt!
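Some of the mistakes above can be flagged mechanically. The following sketch is a rough heuristic linter, not a feature of this tool. The name `lint_robots` and the warning texts are invented for the example, and the CSS/JS check is a simple substring test that can produce false positives (for instance on .json paths).

```python
def lint_robots(text):
    """Return a list of (line_number, warning) pairs for risky lines."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()  # directive names are case-insensitive
        if name == "disallow" and value == "/":
            warnings.append((number, "blocks the entire site for this group"))
        elif name == "disallow" and (".css" in value.lower() or ".js" in value.lower()):
            warnings.append((number, "may block CSS/JS needed for rendering"))
        elif name == "noindex":
            warnings.append((number, "Noindex in robots.txt is not supported by Google"))
    return warnings


sample = "User-agent: *\nDisallow: /\nDisallow: /*.css$\nNoindex: /old/\nAllow: /public/\n"
for number, message in lint_robots(sample):
    print(f"line {number}: {message}")
```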

Is my data safe?

Yes, your data is completely safe:

- All testing happens in your browser
- No robots.txt content is sent to any server
- We don't store or log any data you test
- Works completely offline after page load
- No tracking or analytics on your test data
- Open-source client-side processing

You can verify privacy by checking your browser's network tab - no requests are made when testing robots.txt rules.

Key Features

  • Test any robots.txt content
  • Bulk URL testing with CSV export
  • RFC 9309 most-specific-group selection
  • Support for all major search engine bots
  • Custom user-agent testing
  • Accurate rule precedence implementation
  • Wildcard support (* and $)
  • Visual parsing of all rules
  • Allow/Disallow detection
  • Matched rule highlighting
  • Sample robots.txt for quick testing
  • Real-time validation
  • Dark mode support
  • 100% client-side processing
  • No data sent to servers
  • Works offline
  • Mobile-friendly design