Excite, Inc. Excite for Web Servers Help
Using The Forms-Based Administration Tools
What Does This Software Do?
Excite for Web Servers makes it easy for you to add searching --
Excite, Inc.'s advanced concept-based searching -- to your Web site.
Excite for Web Servers provides a simple Web-browser interface for
doing all the things necessary to enable concept-based searching of
collections of documents -- administering, indexing, and searching
over the collections. In particular, one can:
- define a document collection -- that is, specify a set of
documents to be considered a single collection over which one can
search,
- design customized pages for displaying to users who wish to search
over that collection,
- index that collection, monitoring the progress, and
- search the collection.
With Excite for Web Servers, it's easy to set up
concept-based-searchable Web sites in minutes.
Installation Information
During installation of the Excite for Web Servers software, an HTML
file containing information about the location of certain Excite for
Web Servers-related files is generated for you. If you would like to
find certain Excite for Web Servers files, you can track them down by
looking at your installation information file.
Main Administration Page
The main administration page is accessible via the AT-admin script.
From this main form you can create collections, configure existing
collections, change passwords, and
configure URL mappings for your Web Server.
Changing the Password
If you would like to change the password that allows access to the
administration pages, you can do so by pressing the Password
button on the main administration page.
Configuring URL Mappings
Pressing the Configure URL Mappings button on the main administration
page allows you to tell Excite for Web Servers (EWS) where the files
served by your Web server are stored, and how URLs correspond to them.
This feature ensures that the URLs in your search-results lists
point to the right place.
For example, suppose you had files on your site accessed
with the URL http://foo.bar.com/root1/ that were
stored in the directory /usr/local/www/html on your
machine, while files accessed with the URL http://foo.bar.com/root2/
were stored in /usr/docs/html. You could add the following entries
to the mappings:
/root1/ /usr/local/www/html/
/root2/ /usr/docs/html/
You can also use the URL mappings to deal with aliases to your
server. Suppose all the files in /usr/surveys/html were
served from the same server, but you wanted the URLs to
appear with a different alias, http://survey.bar.com/.
You could add this entry to the mappings:
http://survey.bar.com/ /usr/surveys/html
NOTE: If you are indexing users' public_html directories,
it is not necessary to set up mappings by hand for those
directories, because EWS handles them automatically.
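The following Python sketch is one way to picture how a mapping table
like the ones above could turn a file path into a URL. It is not EWS
code: the function name url_for_file, the default_host parameter, and
the first-match-wins rule are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (URL prefix, directory) pairs, copied from the examples above.
MAPPINGS: List[Tuple[str, str]] = [
    ("/root1/", "/usr/local/www/html/"),
    ("/root2/", "/usr/docs/html/"),
    ("http://survey.bar.com/", "/usr/surveys/html"),
]


def url_for_file(path: str,
                 mappings: List[Tuple[str, str]],
                 default_host: str = "http://foo.bar.com") -> Optional[str]:
    """Return a URL for `path`, or None if no mapping covers it.

    Assumption: the first mapping whose directory contains the file wins.
    """
    for url_prefix, directory in mappings:
        base = directory.rstrip("/") + "/"
        if path.startswith(base):
            rest = path[len(base):]
            url = url_prefix.rstrip("/") + "/" + rest
            # A prefix without a host name is served from the default host.
            if url.startswith("/"):
                url = default_host + url
            return url
    return None


if __name__ == "__main__":
    print(url_for_file("/usr/local/www/html/a/b.html", MAPPINGS))
    print(url_for_file("/usr/surveys/html/q1.html", MAPPINGS))
```

Under these assumptions, a file below /usr/surveys/html is reported
with the survey.bar.com alias, while files under the other two
directories are reported relative to the main host.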
Document Collections
A document collection is the specification of a set of HTML or
plain-text files over which one would like to do concept-based
searches. Or more simply, it can be thought of as the searchable
documents themselves.
Besides a name, each document collection has associated with it a set
of configurable attributes -- information about the documents in the
collection and the index to be built on those documents. These
attributes, described in more detail below, include:
- document information -- The CollectionContents. A
specification of which files are to be included in the collection,
by means of either:
- an explicit list of documents, or
- a set of rules describing the documents.
- index information
- The CollectionIndex. The directory in which the index for the
collection should be stored.
- IndexingContact. Optionally, the email address (or hostname in NT
versions) of the person to be notified when indexing of this collection
is complete.
Each document collection has a collection-name.conf file where its
configurable attributes are stored. The
Collection File Format is described more fully in
the documentation on the command-line applications.
New Collection
Creating a new document collection is a two-step process: naming and
configuring.
First, you must give the new collection a name. Simply enter a name
in the field provided on the main Excite for Web Servers
Administration page.
Once named, the new collection must be "configured" -- that is, have
its other attributes defined. Click the Configure New Collection
button to bring up a page on which to provide values for those
attributes. (Defaults are provided, which you can change as
appropriate.) See the next section for more information on these
attributes.
When you're done configuring the new collection, you may then index
it, generate search/result pages for it, and then search over it.
Configurable Attributes
There are a number of attributes associated with each collection that
you can configure using the Configure New Collection form. These
attributes provide information about the documents in the collection,
the location of the index to be built on those documents, and whom to
contact when indexing is complete. These attributes are explained in
detail directly below.
CollectionIndex
When you index a collection for searching, Excite for Web Servers
generates several files which make up that index. The indexing and
searching applications need to know where these index files are
located. The CollectionIndex is simply the name of a directory
where the index files will be stored. If you don't particularly care
where the index goes, you may simply leave the default value provided.
CollectionContents
There are three methods available to describe the collection of files
you wish to have included in the index: Enter the Files Directly, Index
Using File List, and Index ~user Directories. None of these options
are exclusive: you can use any combination of them to pick the files
you want to index.
Enter the Files Directly
The first option is to Enter the Files Directly, and while it's a little
more complicated than using a file list, it's more useful. It
allows you first to specify where the indexer should look for files to
be indexed, and then it allows you to give the indexer rules about
which of the files it finds there to include in or exclude from
indexing, based on file name and content.
To specify where the indexer should look for files to index, you
simply provide a list of file/directory names. Each file listed
will be a candidate for indexing, and each directory will be
"expanded" so that all the files "below" it -- that is, all the files
contained in it and in its traversed subdirectories -- will be
candidates for indexing as well. If a path name contains special
non-alphanumeric characters such as whitespace, be sure to enclose
the full path name in single quotes, because whitespace separates
individual file/directory names.
Once a set of candidate files as described by your list is created,
then inclusion/exclusion rules are applied to each candidate to see
whether it will be indexed. The set of inclusion/exclusion rules is
the IndexFilter, described below.
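The expansion step described above can be sketched in Python as
follows. This is only an illustration of the idea, not the indexer's
actual code; the function name candidate_files is hypothetical, and
the sketch assumes the list follows shell-like quoting rules so that
shlex.split can handle single-quoted path names.

```python
import os
import shlex
from typing import List


def candidate_files(spec: str) -> List[str]:
    """Expand a whitespace-separated list of files and directories.

    Files are kept as they are; directories are walked so that every
    file below them becomes a candidate as well.
    """
    candidates: List[str] = []
    # shlex.split honours single quotes, so 'my docs' stays one name.
    for name in shlex.split(spec):
        if os.path.isdir(name):
            for root, _dirs, files in os.walk(name):
                for filename in sorted(files):
                    candidates.append(os.path.join(root, filename))
        else:
            candidates.append(name)
    return candidates
```

The resulting list is only the set of candidates; the
inclusion/exclusion rules still decide which of them are indexed.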
Index Using File List
The second option for describing the collection of files you wish to
have indexed is to Index Using File List. Simply provide
the indexer with a filename containing a list of files you wish to
index. Remember that you must use absolute pathnames.
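Before handing a file list to the indexer, it can be worth checking
that every entry really is an absolute pathname. The Python sketch
below does that check. It assumes one Unix-style pathname per line,
which is an assumption about the list format and is not stated in this
help text; the function name check_file_list is hypothetical.

```python
import posixpath
from typing import Iterable, List, Tuple


def check_file_list(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file-list entries into (absolute, not_absolute).

    Assumes one Unix-style pathname per line; blank lines are skipped.
    """
    absolute: List[str] = []
    not_absolute: List[str] = []
    for line in lines:
        path = line.strip()
        if not path:
            continue
        if posixpath.isabs(path):
            absolute.append(path)
        else:
            not_absolute.append(path)
    return absolute, not_absolute
```

Any entries returned in the second list would need to be rewritten as
absolute pathnames before indexing.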
Index ~user Directories
When you choose this option on the configuration page for a collection,
the indexer finds every user's home directory that contains
an appropriately named subdirectory (usually called public_html),
and indexes the files in each of those subdirectories.
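The lookup described above can be pictured with the following Python
sketch. It is an illustration only and not how the indexer is
implemented; the function names are hypothetical, and all_home_dirs
relies on the Unix-only pwd module.

```python
import os
from typing import Iterable, List


def public_html_dirs(home_dirs: Iterable[str],
                     dirname: str = "public_html") -> List[str]:
    """Return the <home>/<dirname> directories that actually exist."""
    found: List[str] = []
    for home in home_dirs:
        candidate = os.path.join(home, dirname)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def all_home_dirs() -> List[str]:
    """List every account's home directory (Unix only)."""
    import pwd  # imported here so the module still loads elsewhere
    return sorted({entry.pw_dir for entry in pwd.getpwall()})
```

On a Unix machine, public_html_dirs(all_home_dirs()) would give the
directories whose files become candidates for indexing.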
IndexFilter
Your means for specifying which of the candidate files to index (or
not) is the IndexFilter. The IndexFilter provides you with some
generic options for inclusion and also allows you to create a more
complicated Custom Filter File.
There are three generic inclusion rules; of the files which are
candidates for indexing, you may include:
- some non-binary candidate files:
- all those files whose names match the expression *.htm* (and no
others) AND/OR
- all those files whose names match either of the expressions
- all non-binary candidate files.
In addition to generic inclusion rules, you may create a Custom Filter
File to specify other inclusion rules or exclusion rules.
Custom Filter File
If the generic IndexFilter inclusion rules are not enough to
specify which files to index and how (and which not to index), you may
provide further specifications or restrictions on which
files to index by use of a Custom Filter File. The rules provided in
this file override the generic rules.
Inclusion/exclusion rules have three columns:
- the type of expression being used to match against the candidate
filenames (to be explained below),
- the expression itself, and
- the format in which to index the files which match the expression --
either HTML, TEXT, or nothing (don't index it).
There are two different categories of expressions one can use to match
against filenames: regular expressions and Unix-style globbing
expressions.
Regular expressions are very powerful, giving you access to very
terse, expressive rules for matching filenames. (If you'd like to
learn about regular expressions, there is a Unix man page on
"regexp".) If you'd like to use a regular expression, simply put the
token regexp in the first column of that line.
While regular expressions are very powerful, they're often just enough
rope to hang yourself with. As a nice alternative, Unix-style
globbing expressions may be less powerful, but they're relatively safe
and comfortable. We allow three different types of Unix-style
expressions for matching against filenames:
- dir -- will match only root-level directory names. This is not
likely to be all that useful, as you're requiring that the
expression you give match the pathname starting from its absolute
beginning, but we thought we'd let you have that choice.
- subdir -- will match against any directory in the pathname.
- file -- will match only simple filenames.
These three different types of Unix-style expressions should give you
as much flexibility in specifying files as you'll need.
An example Custom Filter File:
# don't index any ".pl" files in directories called "bin"
regexp \/bin\/.*\.pl$
# index all stuff in /usr/local/www/html/text-files as plain text.
# the next four lines are all equivalent.
dir usr/local/www/html/text-files TEXT
dir usr/local/www/html/text-files/ TEXT
dir /usr/local/www/html/text-files TEXT
dir /usr/local/www/html/text-files/ TEXT
# don't index anything below any directory with "old" in its name.
# again, the next four lines are all equivalent.
subdir *old*
subdir *old*/
subdir /*old*
subdir /*old*/
# override the general '*.htm*' rule. there happen to be some
# files ending with '.html.C' -- index them as text instead of
# HTML.
# these two lines are equivalent.
file *.html.C TEXT
file /*.html.C TEXT
After you've created a Custom Filter File, simply enter the name of
the file in the entry field provided.
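To try out a Custom Filter File before indexing, a small stand-in
matcher can help. The Python sketch below parses the three-column
rules and reports how a given path would be treated. It is an
approximation, not the indexer's own logic: the first-match-wins
ordering, the use of Python's re and fnmatch modules in place of the
indexer's matching, and the NO_MATCH sentinel are all assumptions.

```python
import fnmatch
import re
from typing import List, NamedTuple, Optional

NO_MATCH = "NO_MATCH"  # sentinel: no custom rule applied (assumption)


class Rule(NamedTuple):
    kind: str            # "regexp", "dir", "subdir", or "file"
    expr: str            # the expression in the second column
    fmt: Optional[str]   # "HTML", "TEXT", or None (don't index)


def parse_rules(text: str) -> List[Rule]:
    """Parse filter lines, skipping blank lines and # comments."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError("rule needs a type and an expression: " + line)
        fmt = parts[2] if len(parts) > 2 else None
        rules.append(Rule(parts[0], parts[1], fmt))
    return rules


def matches(rule: Rule, path: str) -> bool:
    """Approximate the four rule types for an absolute path."""
    parts = path.strip("/").split("/")
    dirs, name = parts[:-1], parts[-1]
    if rule.kind == "regexp":
        return re.search(rule.expr, path) is not None
    pattern = rule.expr.strip("/")  # leading/trailing slashes are optional
    if rule.kind == "dir":
        # Must match the directory components starting from the root.
        want = pattern.split("/")
        return len(dirs) >= len(want) and all(
            fnmatch.fnmatchcase(d, w) for d, w in zip(dirs, want))
    if rule.kind == "subdir":
        # May match any single directory component of the path.
        return any(fnmatch.fnmatchcase(d, pattern) for d in dirs)
    if rule.kind == "file":
        return fnmatch.fnmatchcase(name, pattern)
    return False


def classify(rules: List[Rule], path: str) -> Optional[str]:
    """Return "HTML", "TEXT", None (don't index), or NO_MATCH."""
    for rule in rules:
        if matches(rule, path):
            return rule.fmt
    return NO_MATCH
```

With the example file above loaded through parse_rules, classify
returns None for a ".pl" file in a "bin" directory and "TEXT" for a
file ending in ".html.C". A result of NO_MATCH means that only the
generic rules would apply to that path.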
Summary Mode
Summaries are generated at indexing time, and therefore affect indexing speed.
Fast summaries add no significant time to indexing, since
the first few lines of the file are used as the document summary. Quality
summaries are calculated using Excite's summarization technology. These are
generally better descriptions than the first few lines of the file, but do
slow down indexing a bit. If you would like to use your own description as
the summary of a document, you can do so by adding the following META tag
to the document:
<META NAME="DESCRIPTION" CONTENT="This is my own summary.">
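To check which of your documents already carry such a tag, a short
script can read the description back out. The Python sketch below
uses the standard html.parser module; it only illustrates the tag
format and says nothing about how the indexer itself reads the tag.
The names DescriptionFinder and meta_description are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class DescriptionFinder(HTMLParser):
    """Remember the CONTENT of the first DESCRIPTION META tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content")


def meta_description(html_text: str) -> Optional[str]:
    """Return the document's own summary, or None if it has none."""
    finder = DescriptionFinder()
    finder.feed(html_text)
    return finder.description
```

A return value of None indicates that the document has no description
of its own, so its summary would come from the selected summary mode.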
IndexingContact
This field is optional, and allows the administrator to specify an
email address of a user or hostname for a machine which should receive
a notification upon the completion of an indexing process for this
document collection. If you wish to index a collection previously
configured by someone other than yourself, you will probably wish to
change the configuration so that you are notified instead of the
previous administrator. Important: Note that this field has
different uses, depending on your operating system. UNIX users can
provide an email address at which they can be reached, while NT users
should provide the name of a machine that is running the NT Messenger
Service, where they will receive notice when the indexing process
finishes.
Query and Query-Results Page Generation
Once you've configured a document collection, you can then create
query and query-results pages to be used specifically with that
document collection.
The query page is the page a user sees when searching, and,
predictably, the query-results page is used to return the results of a
query to a user. So that these pages can look different
for different document collections, a new query page and a new
query-results page are "generated" for each document collection.
To generate a collection-specific query page or query-results page,
you have two options:
- You may use a stock template page provided for you, changing certain
modifiable appearance attributes as you wish:
- the Banner Image to display at the top of both the query and
query-results pages, and
- the Brief Description of the contents of the document
collection to appear at the top of the query (but not
query-results) page. OR
- You may supply your own template page for either or both the query
page and query-results page.
Specific information about how to generate query and query-results
pages directly follows.
Banner Image
If you are using the stock query page, the stock query-results
page, or both, you may still provide your own image to be
displayed at the top of each page. This image is called the Banner
Image. (By default, it is the Excite, Inc. logo as it appears at the top
of the admin pages.)
To display an image of your own design, simply put the filename of
your image in the entry field provided on the generation form.
(Hint: You may also edit the generated .html and .cgi scripts if
you wish to change the appearance of those pages. However, take care not
to change the search form itself. If you "break" things, just
re-generate the pages.)
Brief Description
Think of this description as a sub-title to the collection name. If
you use the stock query page provided for you, you may provide a brief
description (size, age, contents) of the document collection to be
displayed there. This description will give the searcher further
guidance as to the contents of your document collection.
It's good user-interface policy to tell users what they are searching
over. If your collection contains information on economic conditions
in Malaysia (and that's not obvious from the collection name), tell
them this, so they don't spend their time doing queries on "Barbie
Dolls" or "hydroponic gardening", which would no doubt provide
disappointing results.
Backlink
If you generate a page that allows searches to be forwarded to
Excite, you can provide a URL in the field provided so that the
search results page will contain a link back to your site.
Linktext
The linktext field accompanies the backlink field and lets you provide
text (up to 20 characters) for the URL. If you do not provide linktext,
the URL itself will be used as the text for the link.
Query Template
Instead of using the stock query and query-results pages provided for
you, you may create your own, completely customized pages. In the
case of the query page, you create your own query page by supplying a
Query Template file.
A Query Template file is a regular HTML file which contains the
line
###EXCITE###
in the place where you want the query form to be inserted. (Put
nothing else on this line, or it may confuse the script which
automatically generates the new page.)
After creating this file, indicate its location in the field provided,
and that will be enough to override the use of the stock query page
template provided for you.
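The marker substitution can be pictured with the Python sketch below.
It is a stand-in for the generation script, not the script itself;
the function name fill_template and the sample form fragment in the
usage note are hypothetical.

```python
MARKER = "###EXCITE###"


def fill_template(template: str, fragment: str) -> str:
    """Replace the marker line in `template` with `fragment`.

    The marker must be alone on its line (surrounding whitespace is
    tolerated here, which is an assumption). Raises ValueError if the
    template contains no marker line.
    """
    output = []
    found = False
    for line in template.splitlines():
        if line.strip() == MARKER:
            output.append(fragment)
            found = True
        else:
            output.append(line)
    if not found:
        raise ValueError("template has no " + MARKER + " line")
    return "\n".join(output)
```

For example, calling fill_template with a template whose body is the
single marker line and a fragment such as "<form>...</form>" yields
the same page with the form in place of the marker. A template with
extra text on the marker line would be rejected by this sketch, which
is why the marker should stand alone.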
Query-Results Template
Just as you can create your own, completely customized query page,
you may create your own completely customized query-results page --
the page a user sees when getting back the results of a query on a
particular document collection.
To do so, you must provide a template file (known as a Query-Results
Template file) and indicate in the field provided where this file is
located.
Like the Query Template, the Query-Results Template is a normal
HTML file which contains, on a line by itself,
###EXCITE###
indicating, in this case, where you want the query results to
appear.