How to whitelist our audit crawler
When your site is behind a bot-protection layer (Cloudflare, Vercel, DataDome or similar), our crawler may not be able to fetch your pages and the audit will fail. This page explains how to let it through.
What our crawler is
We run a self-hosted Crawl4AI instance to fetch the pages listed in your audit order. The crawl is a one-shot operation: it runs once per audit, respects your robots.txt rules as well as noai and noimageai directives, and uses the fetched content solely to generate your report. The content is not shared with any third-party training pipeline.
Whitelist by IP
The most reliable way to allow our crawler is to whitelist it by IP address. Our egress IPs are published via DNS -- the hostname crawler.geo.gg always resolves to the current IPs of our crawler. You can verify the current IPs at any time with:
dig +short crawler.geo.gg
nslookup crawler.geo.gg
Current IPs as of this page load:
195.201.165.51
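If you maintain your allowlist by hand, a small script can flag when the published IPs no longer match it. The following is a minimal sketch in Python that assumes only the hostname crawler.geo.gg from above; the function names and the idea of a locally stored allowlist are illustrative, not part of our tooling.

```python
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] holds the address for both IPv4 and IPv6 results.
    infos = resolver(hostname, None)
    return sorted({info[4][0] for info in infos})


def missing_from_allowlist(published, allowlist):
    """Return the published IPs that are not yet in the local allowlist."""
    allowed = set(allowlist)
    return [ip for ip in published if ip not in allowed]
```

Calling missing_from_allowlist(resolve_ips("crawler.geo.gg"), your_allowlist) returns any published IPs you have not yet allowed, where your_allowlist is a hypothetical collection of the addresses already entered in your WAF. The resolver parameter exists only so the lookup can be replaced in tests.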
Why there is no single stable User-Agent
Our crawler uses a browser engine (Chromium) to handle JavaScript-rendered sites. It rotates through real browser User-Agent strings to pass basic bot-detection heuristics, so there is no single User-Agent to match on, and pinning one would make the crawler trivially blockable. The IP address is the only stable, verifiable identifier for our crawler, which is why we recommend whitelisting by IP.
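If the allow decision is made in your own application or proxy code rather than in a hosted WAF, the same principle applies: match on the source address and ignore the User-Agent header. Below is a minimal sketch using Python's standard ipaddress module; the function name and the constant are hypothetical, and the set would normally be filled from the addresses that crawler.geo.gg resolves to.

```python
import ipaddress

# Addresses you have chosen to allow (here: the IP listed on this page).
ALLOWED_CRAWLER_IPS = {ipaddress.ip_address("195.201.165.51")}


def is_allowed_crawler(remote_addr, allowed=ALLOWED_CRAWLER_IPS):
    """Decide by source IP only; the User-Agent is deliberately not consulted."""
    try:
        return ipaddress.ip_address(remote_addr) in allowed
    except ValueError:
        # Malformed address strings are never treated as the crawler.
        return False
```

Pass the peer address of the connection as remote_addr. A client-supplied header such as X-Forwarded-For should only be used when it is set by a proxy you control, since it can otherwise be forged.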
Where to add the whitelist
- Cloudflare WAF: Security > WAF > Tools > IP Access Rules -- add each IP with action Allow and apply it to your zone.
- Vercel Firewall: Project Settings > Security > Firewall -- add a rule with condition IP Address equals each crawler IP and action Allow.
- DataDome: DataDome dashboard > Allowlist -- add each crawler IP to the IP allowlist.
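If you prefer to script the Cloudflare step instead of using the dashboard, the rule can also be created through Cloudflare's API. The sketch below only builds the request path and JSON body for one IP Access Rule. It assumes the zone-level IP Access Rules endpoint of the Cloudflare v4 API, so check Cloudflare's API reference before relying on it; the function name, the default note text and the zone ID placeholder are hypothetical.

```python
import ipaddress

# Assumed path of the zone-level IP Access Rules endpoint (Cloudflare v4 API).
API_PATH = "/client/v4/zones/{zone_id}/firewall/access_rules/rules"


def build_allow_rule(ip, note="audit crawler"):
    """Build the JSON body for an IP Access Rule that allows a single IP."""
    # Cloudflare distinguishes IPv4 ("ip") from IPv6 ("ip6") targets.
    version = ipaddress.ip_address(ip).version
    target = "ip" if version == 4 else "ip6"
    return {
        "mode": "whitelist",  # the API's name for the dashboard's Allow action
        "configuration": {"target": target, "value": ip},
        "notes": note,
    }
```

The body would be sent as a POST to API_PATH on api.cloudflare.com with your own zone ID filled in and an API token in the Authorization header, once per crawler IP.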
After whitelisting, use the Retry button on your audit page to re-run the crawl. If you have trouble or your WAF provider is not listed above, reply to your audit confirmation email and we will help.