
Robots.txt file

Understanding Web Robots

Web robots, also referred to as crawlers, web wanderers, or spiders, are software programs designed to automatically navigate the internet. They serve various purposes, with search engines utilizing them to index web content.

The robots.txt file implements the Robots Exclusion Protocol (REP), which lets website administrators specify which parts of the site should be off-limits to particular robot user agents. For example, an administrator can allow robots to crawl the public content of a site while keeping them out of directories such as cgi-bin, private areas, or temporary folders.

Placing the robots.txt File

A standard robots.txt file is included in the root directory of your Origen installation. It must be located in the root of the domain or subdomain and named robots.txt.

Origen in a Subdirectory

Placing the robots.txt file in a subdirectory is not valid. Web robots only check for this file in the root directory of the domain. If your Origen site is installed within a folder like example.com/origen/, the robots.txt file must be moved to the site's root directory at example.com/robots.txt.

Note: The name of the Origen folder must be included as a prefix in the disallowed path. For instance, the Disallow rule for the /backend/ folder should be modified to read Disallow: /origen/backend/.
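As an illustrative sketch, if Origen were installed in a folder named origen, the relocated file at example.com/robots.txt would begin like this (only the first few rules are shown, each prefixed with the folder name):

```text
User-agent: *
Disallow: /origen/backend/
Disallow: /origen/api/
Disallow: /origen/tmp/
```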

Origen robots.txt Contents

These are the contents of a standard Origen robots.txt file:

    User-agent: *
    Disallow: /backend/
    Disallow: /api/
    Disallow: /bin/
    Disallow: /cache/
    Disallow: /cli/
    Disallow: /apps/
    Disallow: /includes/
    Disallow: /installation/
    Disallow: /language/
    Disallow: /layouts/
    Disallow: /libraries/
    Disallow: /logs/
    Disallow: /widgets/
    Disallow: /extenders/
    Disallow: /tmp/

Robot Exclusion

To exclude directories or prevent robots from accessing specific areas of your website, add a Disallow directive under a User-agent line in the robots.txt file. For instance, to restrict all robots from accessing the /tmp/ directory, you can add the following rule:

    User-agent: *
    Disallow: /tmp/
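As a quick sketch, you can confirm how a crawler interprets such a rule with Python's standard-library urllib.robotparser; the URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules matching the example above
rules = """User-agent: *
Disallow: /tmp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved robot may not fetch anything under /tmp/ ...
print(rp.can_fetch("*", "https://example.com/tmp/file.html"))  # False
# ... but the rest of the site remains crawlable
print(rp.can_fetch("*", "https://example.com/index.html"))     # True
```

This is the same logic a compliant crawler applies: the longest matching rule for the requesting user agent decides whether a URL may be fetched.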


Syntax Checking

To check the syntax of your robots.txt file, you can use one of the many online robots.txt validators.
