A research crawler that checks websites for a small set of standard, publicly served “well-known” files. This page explains what it does and how to opt out.
The crawler measures the adoption of AI-readiness and security-related “well-known” files across the public web, as part of non-commercial security research. For each website it makes a small number of HTTP(S) requests for these standard files, which are intended to be publicly fetched:
/llms.txt
/llms-full.txt
/AGENTS.md
/.well-known/security.txt
/robots.txt
Not every run requests all of these; many runs request only /llms.txt.
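As a rough illustration of the request set described above, the following Python sketch builds the URLs that one run might fetch for a single domain. The function name `urls_for_site` and the `full_run` flag are hypothetical and are not taken from the crawler's own code; the sketch assumes only the paths listed on this page.

```python
from urllib.parse import urljoin

# Paths listed on this page; all are intended to be publicly fetched.
WELL_KNOWN_PATHS = [
    "/llms.txt",
    "/llms-full.txt",
    "/AGENTS.md",
    "/.well-known/security.txt",
    "/robots.txt",
]


def urls_for_site(domain: str, full_run: bool = False) -> list[str]:
    """Return the URLs a single run would request for `domain`.

    `full_run` is a hypothetical switch: many runs request only /llms.txt.
    """
    base = f"https://{domain}"
    paths = WELL_KNOWN_PATHS if full_run else ["/llms.txt"]
    return [urljoin(base, path) for path in paths]
```

Under these assumptions, a default run for a domain yields a single URL, and a full run yields at most five.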
A site is visited only occasionally, and requests to a given site are spread out
to keep the load negligible.
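One way to picture "spread out" is a schedule that leaves a gap between consecutive requests to the same site. The sketch below is an assumption-laden illustration, not the crawler's actual scheduler: the name `spread_schedule` and the gap and jitter values are invented for the example, and this page does not state the real intervals.

```python
import random


def spread_schedule(urls, min_gap_seconds=30.0, jitter_seconds=10.0, rng=None):
    """Assign each URL a start offset (in seconds) so requests are spaced apart.

    The default gap and jitter are illustrative placeholders only.
    """
    rng = rng or random.Random()
    schedule = []
    offset = 0.0
    for url in urls:
        schedule.append((offset, url))
        # Each subsequent request waits at least min_gap_seconds, plus jitter.
        offset += min_gap_seconds + rng.uniform(0.0, jitter_seconds)
    return schedule
```

Returning offsets instead of sleeping keeps the sketch easy to inspect; a real crawler would hand these offsets to whatever task queue or timer it uses.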
The crawler accesses only publicly available resources. It does not attempt to reach anything behind a login, paywall, or other access control, and it performs no write actions.
If you would prefer that we not request these files from your domain, email opt-out@llmstxtscan.org with the domain name(s) you want excluded, and we will remove them from future scans.
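A minimal sketch of how an exclusion list could be applied before a scan, assuming opted-out domains are kept as a plain collection of hostnames. The names `normalize`, `is_excluded`, and `filter_targets` are hypothetical, and treating an opt-out as covering subdomains is an assumption of this example, not a statement of the project's policy.

```python
def normalize(domain: str) -> str:
    """Lower-case a hostname and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def is_excluded(domain: str, opted_out: set[str]) -> bool:
    """Return True if `domain` matches an opted-out entry.

    Assumption: an opt-out for example.org also covers its subdomains.
    `opted_out` is expected to hold already-normalized hostnames.
    """
    d = normalize(domain)
    return any(d == entry or d.endswith("." + entry) for entry in opted_out)


def filter_targets(domains: list[str], opted_out: list[str]) -> list[str]:
    """Drop every opted-out domain from the list of scan targets."""
    excluded = {normalize(entry) for entry in opted_out}
    return [d for d in domains if not is_excluded(d, excluded)]
```

Matching on a leading dot means that an opt-out for example.org does not accidentally exclude an unrelated domain such as notexample.org.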
llms.txt and related files are an emerging convention for making websites
more usable by AI systems. This project studies how widely they have been adopted,
what their quality is, and how they co-occur with security files such as
security.txt. The research is non-commercial.
General enquiries: contact@llmstxtscan.org
Opt-out requests: opt-out@llmstxtscan.org