Introduction
govService Customer Portal sites are indexed by search engines such as Google. This means that when people search for your organization's name along with relevant terms, e.g. 'Abandoned Vehicle', the results could include a link to your 'Report an Abandoned Vehicle' form.
You can control how your Customer Portal site is indexed with what is known as a 'robots.txt' file.
What is Robots.TXT? - from robotstxt.org:
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
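You can also disallow only part of a site rather than all of it. As an illustrative sketch (the paths here are hypothetical, not ones from your site), a robots.txt that lets all robots crawl everything except an /admin/ section would look like this:

```
# Applies to all robots
User-agent: *
# Do not crawl anything under /admin/
Disallow: /admin/
# Everything else on the site remains crawlable
```

Each Disallow line matches URL paths by prefix, so Disallow: /admin/ covers /admin/login, /admin/settings, and so on.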
There are two important considerations when using /robots.txt:
- robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
- the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.
Editing the Robots.TXT in Self Admin
You can change your robots.txt file in Self Admin > Site Settings.
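As a sketch of what you might enter there, the two extremes look like this (choose one; a single file should contain one set of rules per user agent):

```
# Option A: block all compliant crawlers from the whole site
User-agent: *
Disallow: /

# Option B: allow all compliant crawlers everywhere
User-agent: *
Disallow:
```

An empty Disallow value means nothing is disallowed, which is the same as allowing the whole site to be crawled.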
Editing Search Engine Result Content
Sometimes, search results don't retrieve the most important or useful summary text for the page. For example, when searching for your organization's abandoned vehicle process, you could see the following as the search result's summary:
Please view our Terms and privacy policy. Return to website. This site is powered by Self. JavaScript is not enabled! Some page elements may not display correctly or not display at all! Please enable JavaScript in the browser settings. Here is more information about how to enable it in your browser:.
This is because the robots.txt file only lets you tell crawlers what to index and what not to index. It does not let you control how crawlers display the indexed results.
Crawlers retrieve all content, not just the visible content. In Self sites, forms are placed in an iframe, which a crawler does not register as page content. This means that the only content the crawler sees is the content outside of the iframe. In our example, the JavaScript-not-enabled message appears because there is no visible content on the page (outside of the forms iframe) before the <noscript> tag, which marks an element to be displayed when JavaScript is disabled in the browser.
The only way to influence the content Google returns is to ensure your Self site has plaintext content on the page before any elements that you don't want shown in the search engine results. You could add some alt text to a logo, or some more page elements around the header, for example.
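As a minimal sketch of that idea, assuming your site template lets you edit the header markup (the file names and wording below are illustrative, not taken from an actual Self site):

```html
<header>
  <!-- Alt text is plain text to a crawler, so it can surface in the result summary -->
  <img src="/images/logo.png" alt="Example City Council - Online Services">
  <!-- A short visible sentence before the forms iframe gives the crawler
       meaningful summary text instead of the noscript fallback message -->
  <p>Report an abandoned vehicle to Example City Council online in minutes.</p>
</header>
```

Because this text appears in the page source before the forms iframe and the <noscript> element, a crawler is more likely to pick it up as the result summary.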