Robots.txt file
Understanding Web Robots
Web robots, also referred to as crawlers, web wanderers, or spiders, are software programs designed to automatically navigate the internet. They serve various purposes, with search engines utilizing them to index web content.
The robots.txt file implements the Robots Exclusion Protocol (REP), which lets website administrators declare which parts of a site should be off-limits to specific robot user agents. This gives administrators control over robot access, for example allowing robots to crawl the public content of a site while keeping them out of cgi directories, private areas, or temporary folders.
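As a minimal sketch of what such a file can look like, the following rules keep all compliant robots out of a cgi-bin and a temporary directory while leaving the rest of the site open (the directory names here are illustrative):

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/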
Placing the robots.txt File
A standard robots.txt file is included in the root directory of your Origen installation. It must be located in the root of the domain or subdomain and named robots.txt.
Origen in a Subdirectory
Placing the robots.txt file in a subdirectory is not valid. Web robots only check for this file in the root directory of the domain. If your Origen site is installed within a folder like example.com/origen/, the robots.txt file must be moved to the site's root directory at example.com/robots.txt.
Note: The name of the Origen folder must be included as a prefix in each disallowed path. For instance, the Disallow rule for the /backend/ folder should be modified to read Disallow: /origen/backend/.
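For example, with Origen installed at example.com/origen/, the adjusted rules in example.com/robots.txt would begin as follows (only the first few rules are shown; the remaining Disallow lines are prefixed the same way):

User-agent: *
Disallow: /origen/backend/
Disallow: /origen/api/
Disallow: /origen/cache/
Disallow: /origen/tmp/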
Origen robots.txt Contents
These are the contents of the standard Origen robots.txt file:
User-agent: *
Disallow: /backend/
Disallow: /api/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /apps/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /widgets/
Disallow: /extenders/
Disallow: /tmp/
Robot Exclusion
To exclude directories or prevent robots from accessing specific areas of your website, add a Disallow directive to the robots.txt file beneath the relevant User-agent line. For instance, to restrict all robots from accessing the /tmp directory, add the following rule to the User-agent: * group:
Disallow: /tmp/
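Rules can also target a single crawler rather than all robots. A minimal sketch, using a hypothetical crawler named ExampleBot (the name is illustrative only), would be:

User-agent: ExampleBot
Disallow: /tmp/
Disallow: /logs/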
See also:
- Block access to your content at Google's Help Center.
Syntax Checking
For syntax checking, you can use a validator for robots.txt files. Try one of these:
- Test your robots.txt with the robots.txt Tester at Google.
- robots.txt Checker by Search Engine Promotion Help.
General Information
- The Web Robots Pages: the main website about robots.txt.
- A Standard for Robot Exclusion: the original standard.
- Robots meta tag, data-nosnippet, and X-Robots-Tag specifications
- Robots.txt and Search Indexing