Robots.txt Tester

Free online robots.txt tester and validator. Test if URLs are allowed or blocked by robots.txt rules for different search engine bots. Ideal for SEO specialists and web developers.


Robots.txt Tester - Test & Validate Crawler Rules

A powerful robots.txt tester and validator tool that helps you test if specific URLs are allowed or blocked by robots.txt rules for different search engine crawlers. Test Google, Bing, Yahoo, and custom user-agents to ensure your robots.txt file works correctly. Essential for SEO optimization and website crawl management.

What is robots.txt?

Robots.txt is a text file placed in the root directory of a website (e.g., example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they can or cannot access. It uses simple directives:

- User-agent: Specifies which crawler the rules apply to (* means all)
- Disallow: Tells crawlers not to access specific paths
- Allow: Explicitly permits access to paths (can override a broader Disallow)
- Sitemap: Points crawlers to your XML sitemap
- Crawl-delay: Specifies delay between requests (not supported by all bots)
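
For illustration, a small robots.txt that combines these directives might look like this (the paths and sitemap URL are placeholders):

    User-agent: *
    Disallow: /private/
    Crawl-delay: 10

    User-agent: Googlebot-Image
    Disallow: /photos/drafts/

    Sitemap: https://example.com/sitemap.xml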

Robots.txt is part of the Robots Exclusion Protocol and is respected by reputable search engines like Google, Bing, Yahoo, and others. However, it's not a security measure - malicious bots can ignore it.

How do I use this robots.txt tester?

Using the tester is simple:

1. Paste your robots.txt content into the text area (or click 'Load Sample' for an example)
2. Select a User-Agent (Googlebot, Bingbot, etc.) or choose 'Custom' for specific bots
3. Enter the URL path you want to test (e.g., /admin/dashboard)
4. Click 'Test' to see if the path is allowed or disallowed

The tool will:
- Parse all robots.txt rules
- Apply the correct precedence rules
- Show whether the URL is allowed or blocked
- Display which specific rule matched
- Show all parsed directives for reference

You can test multiple paths and user-agents to ensure your robots.txt works as intended.
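
Under the hood, the parsing step can be thought of as grouping rules by user-agent. The TypeScript below is a simplified sketch of that idea, not this tool's actual source; the Rule and RobotsGroup types and the parseRobotsTxt function are names invented here for illustration:

    // Simplified sketch: parse robots.txt text into per-user-agent rule groups.
    interface Rule {
      type: "allow" | "disallow";
      path: string;
    }

    interface RobotsGroup {
      userAgents: string[]; // lowercased user-agent tokens this group applies to
      rules: Rule[];
    }

    function parseRobotsTxt(content: string): RobotsGroup[] {
      const groups: RobotsGroup[] = [];
      let current: RobotsGroup | null = null;
      let lastWasUserAgent = false;

      for (const rawLine of content.split(/\r?\n/)) {
        const line = rawLine.split("#")[0].trim(); // strip comments and whitespace
        if (!line) continue;

        const colon = line.indexOf(":");
        if (colon === -1) continue; // ignore malformed lines
        const field = line.slice(0, colon).trim().toLowerCase();
        const value = line.slice(colon + 1).trim();

        if (field === "user-agent") {
          // Consecutive User-agent lines share one rule group.
          if (!lastWasUserAgent || current === null) {
            current = { userAgents: [], rules: [] };
            groups.push(current);
          }
          current.userAgents.push(value.toLowerCase());
          lastWasUserAgent = true;
        } else if ((field === "allow" || field === "disallow") && current !== null) {
          current.rules.push({ type: field, path: value });
          lastWasUserAgent = false;
        } else {
          lastWasUserAgent = false; // Sitemap, Crawl-delay, etc. are not needed for matching
        }
      }
      return groups;
    }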

What are the robots.txt rule precedence rules?

When multiple rules match a URL, robots.txt follows these precedence rules:

1. Most Specific Path Wins: A longer, more specific rule overrides a shorter one
- Disallow: /admin/ + Allow: /admin/public/
- /admin/public/ is allowed because the longer Allow path takes precedence

2. Allow Beats Disallow: When matching rules are equally specific, the less restrictive Allow wins
- Disallow: /page + Allow: /page
- /page is allowed because both rules match with the same path length

3. User-Agent Specificity: Specific user-agent rules override wildcard (*)
- For Google, the User-agent: Googlebot group applies and the User-agent: * group is ignored entirely (the groups are not merged)

4. Default Allow: If no rule matches, access is allowed by default

Our tester correctly implements these rules to give you accurate results that match how search engines interpret your robots.txt file.
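
As a rough sketch of how these precedence rules translate into code, the TypeScript below applies longest-match-wins with Allow breaking ties. The isAllowed function and Rule shape are illustrative assumptions, and wildcard matching is omitted for brevity; this is not the tool's exact implementation:

    type Rule = { type: "allow" | "disallow"; path: string }; // same shape as in the parsing sketch

    // Decide whether a URL path is allowed under one group of rules.
    function isAllowed(rules: Rule[], urlPath: string): boolean {
      let best: Rule | null = null;

      for (const rule of rules) {
        if (rule.path === "") continue;               // an empty Disallow blocks nothing
        if (!urlPath.startsWith(rule.path)) continue; // plain prefix match (wildcards omitted)

        const moreSpecific = best === null || rule.path.length > best.path.length;
        const tieBrokenByAllow =
          best !== null && rule.path.length === best.path.length && rule.type === "allow";

        if (moreSpecific || tieBrokenByAllow) {
          best = rule; // rules 1 and 2: longest path wins, Allow wins ties
        }
      }

      return best === null || best.type === "allow"; // rule 4: default allow when nothing matches
    }

    // Example: Disallow: /admin/ plus Allow: /admin/public/ allows /admin/public/page
    isAllowed(
      [
        { type: "disallow", path: "/admin/" },
        { type: "allow", path: "/admin/public/" },
      ],
      "/admin/public/page"
    ); // true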

Can I test different search engine bots?

Yes! The tool supports testing with many popular search engine crawlers:

- Googlebot: Google's main web crawler
- Googlebot-Image: For Google Image Search
- Googlebot-News: For Google News
- Googlebot-Video: For Google Video Search
- Bingbot: Microsoft Bing's crawler
- Slurp: Yahoo's web crawler
- DuckDuckBot: DuckDuckGo's crawler
- Baiduspider: Baidu (Chinese search engine)
- YandexBot: Yandex (Russian search engine)
- Social media bots: Facebook, Twitter, LinkedIn
- Custom: Test any user-agent string

Different bots can have different rules in your robots.txt, and this tool lets you test each one individually to ensure they behave as expected.
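
Before any path rules are applied, a tester also has to decide which rule group applies to the chosen bot. The sketch below assumes case-insensitive substring matching of the user-agent token, which is a common simplification; the RobotsGroup shape and selectGroup function are illustrative names, not this tool's API:

    interface RobotsGroup {
      userAgents: string[]; // lowercased tokens, e.g. ["googlebot"] or ["*"]
      rules: { type: "allow" | "disallow"; path: string }[];
    }

    // Pick the group with the most specific matching User-agent token;
    // fall back to the wildcard "*" group when no named token matches.
    function selectGroup(groups: RobotsGroup[], userAgent: string): RobotsGroup | null {
      const ua = userAgent.toLowerCase();
      let best: RobotsGroup | null = null;
      let bestTokenLength = -1;

      for (const group of groups) {
        for (const token of group.userAgents) {
          if (token === "*") {
            if (bestTokenLength < 0 && best === null) best = group; // wildcard only if nothing better yet
          } else if (ua.includes(token) && token.length > bestTokenLength) {
            best = group;
            bestTokenLength = token.length;
          }
        }
      }
      return best;
    }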

What are wildcards in robots.txt?

Robots.txt supports two important wildcards:

1. Asterisk (*) - Matches any sequence of characters
Examples:
- Disallow: /*.pdf$ (blocks all PDF files)
- Disallow: /admin/* (blocks everything under /admin/)
- Allow: /public/*.html (allows all HTML in /public/)

2. Dollar Sign ($) - Matches end of URL
Examples:
- Disallow: /*.pdf$ (blocks URLs ending in .pdf)
- Disallow: /admin$ (blocks /admin but not /admin/page)
- Allow: /search$ (allows exactly /search, not /search/results)

Without $, a rule matches any URL starting with that pattern:
- Disallow: /admin (matches /admin, /admin/, /admin/page, /administrator)
- Disallow: /admin$ (matches only /admin)

Our tester correctly handles both wildcards to accurately test your rules.
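
One common way to implement both wildcards is to translate each rule path into a regular expression before matching. The TypeScript sketch below shows that approach; the patternToRegExp function is an illustrative name, and this is one way to do it rather than necessarily how this tool does it:

    // Convert a robots.txt path pattern (with * and $) into a RegExp.
    function patternToRegExp(pattern: string): RegExp {
      // $ is only special as the final character of the pattern.
      const anchored = pattern.endsWith("$");
      const body = anchored ? pattern.slice(0, -1) : pattern;

      const source = body
        .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters literally
        .replace(/\*/g, ".*");                 // * matches any sequence of characters

      return new RegExp("^" + source + (anchored ? "$" : "")); // rules match from the start of the path
    }

    patternToRegExp("/*.pdf$").test("/files/report.pdf"); // true  (URL ends in .pdf)
    patternToRegExp("/admin").test("/administrator");     // true  (prefix match without $)
    patternToRegExp("/admin$").test("/admin/page");       // false (exact end required)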

Common robots.txt mistakes

Avoid these common robots.txt errors:

1. Blocking CSS/JS files: Don't block resources Google needs to render pages
- Bad: Disallow: /*.css$
- This can hurt SEO as Google can't render your site properly

2. Typos and case errors: URL paths in robots.txt rules are case-sensitive
- Disallow: /Admin/ does not block /admin/
- Directive names like 'Disallow:' and 'User-agent:' are accepted in any casing by major crawlers, but misspellings such as 'Dissallow:' are silently ignored

3. Blocking entire site unintentionally:
- Disallow: / (blocks everything!)
- Make sure this is intentional

4. Using robots.txt for security: It's not a security tool
- Malicious bots ignore it
- Use proper authentication instead

5. Forgetting the Allow directive:
- You can unblock subdirectories of blocked directories
- Disallow: /admin/ then Allow: /admin/public/

Use this tester to catch these mistakes before deploying your robots.txt!
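
Put together, a robots.txt that avoids these pitfalls might look something like this (the paths and sitemap URL are placeholders):

    User-agent: *
    Disallow: /admin/
    Allow: /admin/public/
    # CSS and JS are deliberately left crawlable so pages can be rendered

    Sitemap: https://example.com/sitemap.xml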

Is my data safe?

Yes, your data is completely safe:

- All testing happens in your browser
- No robots.txt content is sent to any server
- We don't store or log any data you test
- Works completely offline after page load
- No tracking or analytics on your test data
- Open-source client-side processing

You can verify privacy by checking your browser's network tab - no requests are made when testing robots.txt rules.

Key Features

  • Test any robots.txt content
  • Support for all major search engine bots
  • Custom user-agent testing
  • Accurate rule precedence implementation
  • Wildcard support (* and $)
  • Visual parsing of all rules
  • Allow/Disallow detection
  • Matched rule highlighting
  • Sample robots.txt for quick testing
  • Real-time validation
  • Dark mode support
  • 100% client-side processing
  • No data sent to servers
  • Works offline
  • Mobile-friendly design