llms.txt crawler bot

A research crawler that checks websites for a small set of standard, publicly-served “well-known” files. This page explains what it does and how to opt out.

Identify this crawler

Requests come with the User-Agent:
llms.txt crawler bot (+mailto:opt-out@llmstxtscan.org)
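
For site operators who want to spot these requests in their own logs or middleware, a minimal Python sketch follows. It is an illustration, not code published by this project: the function name `is_llmstxt_crawler` is hypothetical, and matching case-insensitively on the leading product string is an assumption rather than documented behaviour.

```python
# Hypothetical helper: recognise the crawler from a request's User-Agent header.
# Assumption: the header contains the product string shown above; matching is
# done case-insensitively in case a proxy or log pipeline alters the case.
CRAWLER_TOKEN = "llms.txt crawler bot"


def is_llmstxt_crawler(user_agent):
    """Return True if the User-Agent string appears to belong to this crawler."""
    return CRAWLER_TOKEN in (user_agent or "").lower()


if __name__ == "__main__":
    sample = "llms.txt crawler bot (+mailto:opt-out@llmstxtscan.org)"
    print(is_llmstxt_crawler(sample))
    print(is_llmstxt_crawler("Mozilla/5.0 (X11; Linux x86_64)"))
```

A User-Agent header can be set by any client, so a match is only an indication that a request came from this crawler, not proof.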

What it does

The crawler measures the adoption of AI-readiness and security-related “well-known” files across the public web, as part of non-commercial security research. For each website it makes a small number of HTTP(S) requests for these standard files, which are intended to be publicly fetched:

/llms.txt
/llms-full.txt
/AGENTS.md
/.well-known/security.txt
/robots.txt

Not every run requests all of these; many runs request only /llms.txt. A site is visited only occasionally, and requests to a given site are spread out to keep the load negligible.
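
To make the request footprint concrete, the Python sketch below lists the URLs that a run could request for one domain, based only on the paths listed above. It is not the crawler's own code: `candidate_urls`, `WELL_KNOWN_PATHS`, and the default `https` scheme are names and choices assumed here for illustration.

```python
from urllib.parse import urlunsplit

# The standard files listed above. A given run may request only a subset,
# often just /llms.txt.
WELL_KNOWN_PATHS = [
    "/llms.txt",
    "/llms-full.txt",
    "/AGENTS.md",
    "/.well-known/security.txt",
    "/robots.txt",
]


def candidate_urls(domain, scheme="https"):
    """Build the full URL for each listed file on the given domain."""
    return [urlunsplit((scheme, domain, path, "", "")) for path in WELL_KNOWN_PATHS]


if __name__ == "__main__":
    for url in candidate_urls("example.com"):
        print(url)
```

Comparing this list with the paths recorded in your access logs is one way to check that requests carrying the crawler's User-Agent stay within the files named on this page.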

The crawler accesses only publicly available resources — it does not attempt to reach anything behind a login, paywall, or other access control, and it performs no write actions.

How to opt out

If you would prefer that we not request these files from your domain, email opt-out@llmstxtscan.org with the domain name(s) you want excluded, and we will remove them from future scans.

Why this research

llms.txt and related files are an emerging convention for making websites more usable by AI systems. This project studies how widely they have been adopted, their quality, and how they co-occur with security files such as security.txt. The research is non-commercial.

Contact

General enquiries: contact@llmstxtscan.org
Opt-out requests: opt-out@llmstxtscan.org