WDT: Tools Path Crawler

The Path Crawler is an interface element that allows users to easily navigate a site. It provides quick feedback on the user's location, the overall structure of the site, and the possible link options within it.

The Path Crawler is required for Departments, Faculties, and Administrative pages.

The Path Crawler is a hierarchical listing of a web page's path. It starts by listing the root category of the page, which in most cases will be the actual office or department; in other cases uprm.edu will be the root category. It continues by listing each subcategory until it reaches the page itself, showing a descriptive name for each one. Each item in the list is a link to that category.


A web page author can build their own path crawler by following the guidelines above. The WDT has also developed an online tool that automatically creates a path crawler in any web page, regardless of where it is hosted. The tool works by adding a JavaScript snippet to your pages that calls the crawler tool. The crawler tool is a PHP CGI available at http://www.uprm.edu/app/crawler.php.

Getting Ready:
Before you use the crawler tool you must create an infrastructure for it in your website. This is a one-time requirement and it is used for all of your pages. The crawler tool requires you to create a text file named "crawler.txt" inside each directory (folder) in your website. This file provides information about the folders that represent a category: a descriptive name and a link for that folder.

Each directory in the path must have a file named "crawler.txt". This file is used to retrieve the information printed for that directory. You can create it in any text editor. The format of crawler.txt is very simple: one field per line, in the form field name=field value.

The field "author" is the email address of the person in charge of the contents of that directory.

The field "titlesp" is the name users will see in the path crawler when the language is Spanish. The field "titleen" is the same as "titlesp" but in English.

Likewise, the fields "linken" and "linksp" set the URL (the actual link) the user is taken to when he or she clicks on that item, in English and Spanish respectively.

Fields can be in any order. If your pages only use one language, you only need the fields for that language.

Example of a "crawler.txt" file:
titleen=Department of Biology
titlesp=Departamento de Biologia
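
A complete crawler.txt using all the fields described above might look like this (the email address and URLs are hypothetical):

```
author=jdoe@uprm.edu
titleen=Department of Biology
titlesp=Departamento de Biologia
linken=http://www.uprm.edu/biology/index-en.html
linksp=http://www.uprm.edu/biology/index-sp.html
```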

Put a crawler.txt file only in folders that contain pages using the path crawler tool. You must set the permissions on crawler.txt so that the crawler tool can read the file. On a Unix or Mac OS X system you can do this with the chmod command in a Telnet or SSH session. Some web publishing tools and FTP applications also provide a dialog to change a file's permissions.
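
For example, from an SSH session on your server you might set this up as follows (the file contents and mode shown here are just a sketch):

```shell
# Create a minimal crawler.txt and make it world-readable so the crawler
# CGI (which runs as the web server user) can read it.
printf 'titleen=Department of Biology\ntitlesp=Departamento de Biologia\n' > crawler.txt
chmod 644 crawler.txt   # owner: read/write; group and others: read-only
```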

Using the crawler tool in your pages:
Once your site has all the required crawler.txt files, you can use the crawler tool in any page inside any of those directories.

The crawler is placed on either the right or left side of your page, right after the banner. To include a path crawler in a web page you must add one JavaScript snippet to the page.

This script is the one that calls the path crawler tool and generates the menu. The actual code is not included in your page; instead you call an external script. When you call the external script you must provide some arguments so the crawler tool can create your path.

This is an example of the JavaScript needed in the index page of our news section:
<script type="text/javascript" src="http://www.uprm.edu/app/crawler.php?lang=en&root=news/index.html"></script>

The arguments to the crawler tool are the ones after the question mark (?), just like in any other CGI. Next we will discuss the different arguments and their uses.

The crawler tool requires two arguments to be able to create the Path Crawler:


The lang argument sets the language of the path crawler. Possible values are "sp" for Spanish and "en" for English. If lang is not specified, the default language is English.

The root argument is the location (path) of the document requesting the path crawler: the path relative to your domain, that is, the part after the domain in the URL. For example, if my URL is http://www.uprm.edu/news/index.html then my root will be news/index.html. Notice that you do not need a leading slash in root or in any other argument that asks for a path.

The crawler tool also provides optional arguments that allow you to fine-tune the path crawler:

The domain argument is very important if your page is served under a domain other than www.uprm.edu, such as grad.uprm.edu or enterprise.uprm.edu. The domain argument tells the tool that your pages are found in a different domain; without it, the crawler will not work on pages hosted in your own server. The domain is nothing else than the actual domain name of your server. For example, if I use the crawler tool in the pages hosted in ADEM, my domain argument will be domain=enterprise.uprm.edu.

NOTE: The order in which the arguments appear is irrelevant to the tool.

The bgcolor argument sets the color of the path crawler. The value must be in hex format, just like any color on the web, but without the pound sign (#). For example, if I want the path to be in blue I'll add the bgcolor argument like this: bgcolor=0000FF.


Notice that arguments are appended with the & character.
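
Putting several arguments together, a script tag that requests an English path with a blue background might look like this (the root path reuses the news example above):

```
<script type="text/javascript"
  src="http://www.uprm.edu/app/crawler.php?lang=en&root=news/index.html&bgcolor=0000FF">
</script>
```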

The startat argument is very handy because it allows the crawler to make it seem like a page is under another unit even when it is not physically at the same location. For example, the website for Nursing is at http://www.uprm.edu/nursing, yet it belongs to Arts & Sciences, which is at ac.uprm.edu. Using the startat argument we can make the path of Nursing appear under Arts & Sciences. To do so we set startat to the full URL, without the "http://", of the parent unit. In the example given before, the startat argument will be something like this: startat=ac.uprm.edu/departments. The startat must point to a directory, not a page, and this directory must have a crawler.txt.
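
For the Nursing example, the whole crawler address might then be (the exact root path here is hypothetical):

```
http://www.uprm.edu/app/crawler.php?lang=en&root=nursing/index.html&startat=ac.uprm.edu/departments
```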

Last Revision: Nov. 02, WDT, wmaster@uprm.edu