Search Google Appliance


FAQs

How can I prevent a site/page from showing up in PSU search results?

There are three ways to keep a website or page out of the Portland State search index, and thus out of PSU search results.

  1. Preferred: robots.txt file
  2. OK: <meta name="robots"> tags
  3. Last Resort: notify Web Communications

Note that if a website or page is already in the PSU search index, it will stay there until you implement one of these solutions and the index is rebuilt.

How to use a robots.txt file

You can place a robots.txt file in the root directory of a website to tell search engine indexing robots (a.k.a. "spiders") exactly which directories and files you want them to include or exclude.

This solution for excluding web resources from the PSU search index is preferred because you have direct control over it and because your robots.txt file manages other robots as well. For example, if you want to exclude a web page, directory, or site from the PSU search index because it is not appropriate for public consumption, you probably also want to exclude it from general public web indexes (e.g., Google).

Implementation Notes

  • The file must be named exactly robots.txt and must be saved as plain text, the same way HTML pages are.
  • You must place the robots.txt file at the root of a website for robots to find it, but it can include instructions about pages/directories anywhere in the site.
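
As a rough illustration of the notes above, the following Python sketch embeds a small robots.txt file and checks how a well-behaved robot would read it, using the standard library's urllib.robotparser. The host www.example.edu, the /private/ directory, and the /drafts/notes.html page are hypothetical and stand in for your own site's paths; the is_allowed helper is also just a name chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The paths are placeholders only;
# substitute the directories and files you actually want to exclude.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""

# Parse the rules from the string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, agent="*"):
    """Return True if a robot identifying as `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.edu/index.html",
        "https://www.example.edu/private/report.html",
        "https://www.example.edu/drafts/notes.html",
    ):
        print(url, "allowed" if is_allowed(url) else "excluded")
```

The three lines in ROBOTS_TXT are what would go in the actual robots.txt file: "User-agent: *" addresses all robots, and each "Disallow" line names a directory or file to exclude. Everything not listed stays available to robots.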

Check out these resources to learn how to use robots.txt files:

How to use <meta name="robots"> tags

You can place <meta name="robots"> tags on individual web pages to tell search engine indexing robots (a.k.a. "spiders") exactly what you want them to include or exclude on each page.

This solution for excluding web resources from the PSU search index is a good choice if you want finer, page-level control over what robots do with your web pages. Compared to a robots.txt file, <meta name="robots"> tags are harder to use for site-wide instructions because the tags must be included on each individual page. You can use both a robots.txt file and <meta name="robots"> tags on the same website to give indexing robots both broad and very detailed instructions.
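
As a minimal sketch of what such a tag looks like, the Python snippet below embeds a hypothetical page whose head contains a <meta name="robots"> tag and reads back the directives it declares, using the standard library's html.parser. The page content and the names RobotsMetaParser and robots_directives are made up for illustration; "noindex" and "nofollow" are the commonly used directive values for asking robots not to index a page and not to follow its links.

```python
from html.parser import HTMLParser

# Hypothetical page. The <meta name="robots"> tag belongs inside <head>.
PAGE = """\
<html>
  <head>
    <title>Internal notes</title>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>Not for the search index.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A check like this can be run over a set of pages to confirm that each page you meant to exclude actually carries the tag.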

Check out these resources to learn how to use <meta name="robots"> tags:

Notify Web Communications to exclude from index

You can notify Web Communications to exclude specific web resources from the PSU search index.

This solution for excluding web resources from the PSU search index is discouraged because it requires Web Communications to make a manual entry in a list of exclusions, which adds an extra step to the process. If your exclusion requirements change, you will have to remember to notify Web Communications, and they will have to update the PSU search index exclusion list manually again. Use this solution only if the two solutions above are impractical.