Excite, Inc. Excite for Web Servers Help
Preferences and Customization
Introduction
There are a number of different customizations one can make to the
search/retrieval behavior of Excite for Web Servers. These
customizations affect such things as how the query results look,
whether certain features are enabled, and (if enabled) how those
features behave. All customizations affect the query-results and
summarization pages.
Perhaps the most important thing to understand about these
customizations is that they are of two classes:
- those which affect the generation of new query-results pages (and
which will, in the future, be options provided during the
page-generation process), and
- those which apply to all query-results pages, regardless of when
generated.
For a better idea of the possible customizations available (and which
above-mentioned category they fall into), here's a complete breakdown:
- The generation of all new query-results pages. (Previously
  generated query-results pages will be unaffected by these changes
  unless re-generated.)
  - legend -- $show_legend
    - whether a legend appears at all
    - if one appears, whether it's at the top or bottom of the page
  - whether a special graphic which enables the regrouping of query
    results by subject will be added to the page --
    $subject_group_mode
- The display of query-results pages at query time. (These
  changes will affect all query-results pages, regardless of when
  generated.)
  - the maximum number of documents to return for each (concept-based
    or query-by-example) query -- $max_docs_to_return
  - automatic summarization
    - whether the '(summary)' link or inline summaries will appear
      in the results list -- $summary_mode and $inline_summaries
    - if enabled, then on the summary-results page...
      - the maximum number of sentences in the summary --
        $number_of_summary_sentences
      - the maximum total length (in characters) of all the sentences
        in the summary -- $maximum_summary_length
  - for automatic subject grouping (if enabled),
    - the maximum number of groups generated --
      $number_of_subject_groups
    - whether to show more than just the top results documents when
      grouping -- $show_additional_docs_in_grouping
  - whether the relevance-indicating character or graphic in the
    results list will also serve as a link enabling query-by-example
    -- $query_by_example_mode
  - whether the relevance indicator will be a graphic (1) or simply a
    character (0) -- '+' or '-' -- $graphic_relevance_mode
  - what else appears in each line of the results list -- the
    subroutines customize_result_list_line() and
    customize_grouping_line()
The names noted in the outline above are the Perl variables or
subroutines in a particular script which must be changed in order for
you to effect the customization -- that's right, until we have time to
get a forms-based interface on these preferences, you'll have to do a
little hacking to make them work to your liking.
A description of how to do all this customization follows directly.
afeatures.pl
This file is located in the perllib directory (a subdirectory of
the one in which you installed this software), and it is the file
you'll have to modify in order to make your customizations. By
changing the values of certain variables and modifying the return
values of certain subroutines in this file, you can customize
Excite for Web Servers' behavior to suit your needs. Keep reading...
Generation-Time Options
The options listed in this section affect the generation of
query-results pages.
Remember that any query-results pages generated before modifications
to these preferences will not be affected by the modifications. If
you wish for these preferences to affect old query-results pages as
well, you must regenerate those pages.
$show_legend
There are three options for showing the legend:
- at the bottom of the page (default):
$show_legend = 'bottom';
- at the top of the page:
$show_legend = 'top';
- or not at all:
#$show_legend (commented out)
$subject_group_mode
The value of $subject_group_mode determines whether a special
graphic will appear at the top of the query-results page allowing
one to group the results by subject as well as by confidence.
Options:
- $subject_group_mode = 1; -- add the "Grouped by Confidence/Subject"
  graphic
- $subject_group_mode = 0; -- do not add the graphic
Query-Time Options
The options listed in this section affect the display of all
query-results pages (and summary results), regardless of generation
time.
$graphic_relevance_mode
In addition to numeric scores, Excite for Web Servers uses either a
color-coded graphic or a '+'/'-' character which indicates the
relevance of a particular document to a query (and also serves as the
query-by-example link, if $query_by_example_mode == 1). The value
of the variable $graphic_relevance_mode determines whether the
graphic or the character is used:
- $graphic_relevance_mode = 1; -- display red (confident) or black
(not confident) graphic [default]
- $graphic_relevance_mode = 0; -- display '+' (confident) or '-' (not
confident)
$query_by_example_mode
When Query By Example is enabled, the relevance indicator -- either
a black/red graphic or a '+'/'-' character -- is also a
query-by-example link. By clicking this link, one can submit an
entire document as a query -- "give me other documents like this one."
The value of the variable $query_by_example_mode determines whether or
not Query By Example is enabled:
- $query_by_example_mode = 1; -- enabled (default)
- $query_by_example_mode = 0; -- disabled
$inline_summaries
The default value of $inline_summaries is 1 ('On'), in which case each
document's summary appears directly below its title in the results
list instead of a separate '(summary)' link, regardless of the
$summary_mode setting (see below).
Turning this variable on also adds a Summary Mode option to the
collection configuration forms interface, which lets the user choose
between a fast summary (the first two lines of a document) and a
slower, computed, higher-quality summary for the documents in a
collection. Refer to the Summary Mode section in the Using The
Forms-Based Administration Tools documentation for further information.
$summary_mode
The value of the variable $summary_mode determines whether or not
Automatic Summarization is enabled:
- $summary_mode = 1; -- summary enabled (default)
- $summary_mode = 0; -- summary disabled
If Automatic Summarization is enabled and the variable
$inline_summaries is set to 0 ('Off'), then the text '(summary)'
is displayed to the right of each document title in a results list. By
clicking this link, one can request a short summarization of the
document. If both of the variables $inline_summaries and $summary_mode
are set to 0 ('Off'), then neither the '(summary)' link nor inline
document summaries will appear on the results list.
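The interaction between the two variables can be sketched as a small
decision function. This is only an illustration of the rules described
above: summary_display is a hypothetical helper written for this page,
not a subroutine that ships in afeatures.pl.

```perl
# Hypothetical helper (not part of Excite for Web Servers): reports what
# the results list shows for a given pair of settings.
sub summary_display {
    local($inline, $mode) = @_;
    # Inline summaries win regardless of the $summary_mode setting.
    return 'inline summary' if $inline;
    # Otherwise the '(summary)' link appears only if summarization is on.
    return '(summary) link' if $mode;
    # Both off: neither appears.
    return 'nothing';
}

print &summary_display(1, 0), "\n";   # inline summary
print &summary_display(0, 1), "\n";   # (summary) link
print &summary_display(0, 0), "\n";   # nothing
```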
$summary_link_mode
This variable is normally 'on'. It determines whether or not
a link to the original document is available on the summary page.
$number_of_summary_sentences
If Automatic Summarization is enabled, then one can specify the
maximum number of sentences which will be used to create summaries by
setting the value of $number_of_summary_sentences. The default is
5 sentences. If Automatic Summarization is disabled, this variable
has no effect.
$maximum_summary_length
If Automatic Summarization is enabled, then one can specify the
maximum number of characters to be used in the creation of summaries.
This limit takes precedence over $number_of_summary_sentences,
reducing the number of sentences displayed if necessary.
To set a maximum summary length, set the value of
$maximum_summary_length. If this variable is unset (that is,
commented out by preceding it with a '#'), then no maximum limit
is applied. (This is the default.)
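One way to picture how the two limits combine is the sketch below. It
is an assumption-laden illustration, not the product's summarizer: the
helper name limit_summary is made up, and the real algorithm may count
characters or choose sentences differently. The sketch keeps sentences
in order until either the sentence limit or the character limit would
be exceeded, and treats an undefined length as "no limit", like a
commented-out $maximum_summary_length.

```perl
# Hypothetical helper: trims a list of candidate sentences to fit both
# a sentence-count limit and a total-length limit (in characters).
sub limit_summary {
    local($max_sentences, $max_length, @candidates) = @_;
    local(@kept, $total);
    $total = 0;
    foreach $sentence (@candidates) {
        # Stop once the sentence limit is reached.
        last if @kept >= $max_sentences;
        # The length limit takes precedence: stop early if adding this
        # sentence would exceed it. An undefined limit never applies.
        last if defined($max_length)
             && $total + length($sentence) > $max_length;
        push(@kept, $sentence);
        $total += length($sentence);
    }
    return @kept;
}

@demo = ('One fish.', 'Two fish.', 'Red fish.');   # 9 characters each
@short = &limit_summary(5, 20, @demo);
print scalar(@short), "\n";   # prints 2
```

With a limit of 20 characters, only two of the three 9-character
sentences fit, even though up to 5 sentences are allowed.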
$number_of_subject_groups
If Automatic Subject Grouping is enabled, then one can specify the
maximum number of groups into which a set of query results will be
divided by setting the value of $number_of_subject_groups.
Default: $number_of_subject_groups = 6;
(Note: Logically, this number should be much less than the number of
documents returned from a query. Setting it higher than the number of
returned documents will produce the same behavior as setting it equal
to the number of returned documents.)
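The note above amounts to a simple clamp on the upper bound. A minimal
sketch, using a hypothetical helper name (effective_group_limit is not
part of the software):

```perl
# Hypothetical helper: the largest number of groups that can actually
# be produced for a given setting and result count.
sub effective_group_limit {
    local($requested, $docs_returned) = @_;
    # Asking for more groups than documents behaves like asking for
    # exactly as many groups as there are documents.
    return $docs_returned if $requested > $docs_returned;
    return $requested;
}

print &effective_group_limit(6, 20), "\n";   # prints 6
print &effective_group_limit(6, 4), "\n";    # prints 4
```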
$show_additional_docs_in_grouping
Automatic subject grouping works best when it has a large number of
documents to work with, so by default, documents beyond those
originally displayed in "Grouped by Confidence" mode are brought in
for the groupings. However, some people find this confusing. With
$show_additional_docs_in_grouping, you can control whether this
happens:
- $show_additional_docs_in_grouping = 1; -- show additional docs
  (default)
- $show_additional_docs_in_grouping = 0; -- use only original docs
$max_docs_to_return
The value of $max_docs_to_return determines the upper limit on the
number of documents that are returned by a query. By default this
variable is set to 20.
$log_searches
Normally, this variable is 'off'. However, if you set it to
a non-zero value, every search done on Excite for Web Servers will be
logged to a file in the install directory called query.log. If you
wish to change the name of the log file, you can do this on
a case-by-case basis by editing the generated CGI script.
Changing the last argument in the call to
&ArchitextQuery'directQuery() will change the file that queries are
logged to.
$maximum_query_time
You may wish to limit the amount of time, in seconds, that a search can last.
The default is 60 seconds (which should always be more than
sufficient), but if you want to adjust it, just change the value of
this variable.
$stem_by_default
This variable will affect how an index is generated. The default for this
variable is set to 1, which causes only the roots of terms to be included
in an index. Thus the keyword "smiles" would be indexed as "smile" and so
on. Performing the query "smile" on a stemmed index would return the
documents containing the term "smiles" as well as those containing "smile".
$index_html_comments
This variable also affects how an index is generated. By default it
is commented out, which effectively turns it off. If it is
un-commented, the text occurring between HTML comment tags is
included in the index. Note that this variable does not in any way
affect the summarization of a document. The summarization algorithm may
still use HTML comments in a document's summary.
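Taken together, the variables described above might look something
like this in afeatures.pl. Treat it as a sketch rather than a copy of
the shipped file: the ordering and comments are made up, the values are
the defaults stated on this page where one is given, the use of 1 and 0
for 'on' and 'off' in $summary_link_mode and $log_searches is an
assumption, and the value shown for the commented-out
$maximum_summary_length is only a placeholder.

```perl
# --- Generation-time options ---
$show_legend = 'bottom';                # 'bottom', 'top', or comment out
$subject_group_mode = 1;                # default not stated on this page

# --- Query-time options ---
$graphic_relevance_mode = 1;            # 1 = graphic, 0 = '+'/'-'
$query_by_example_mode = 1;             # 1 = enabled
$inline_summaries = 1;                  # 1 = summaries below each title
$summary_mode = 1;                      # 1 = summarization enabled
$summary_link_mode = 1;                 # assumed: 1 means 'on'
$number_of_summary_sentences = 5;
#$maximum_summary_length = 500;         # unset = no limit (placeholder value)
$number_of_subject_groups = 6;
$show_additional_docs_in_grouping = 1;
$max_docs_to_return = 20;
$log_searches = 0;                      # assumed: 0 means 'off'
$maximum_query_time = 60;               # seconds

# --- Indexing options ---
$stem_by_default = 1;                   # index word roots only
#$index_html_comments = 1;              # commented out = off
```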
Result List Customization
By default, Excite for Web Servers displays each document's score and
title in the results lists for regular queries, providing a link to
the document itself. We think it's a pretty good thing to do (we
made it the default behavior, right?), but maybe you'd like
something else. Perhaps you'd prefer to display each document's first
three lines. Or maybe you'd prefer to have the link be one which
invokes a CGI script to format the document in a special way. Both
those things, and more, are possible. With a little Perl programming,
you may customize results lists -- both regular and group-by-subject
-- to your liking.
That Perl programming involves changing subroutines which determine
the display of the results. Since there are two different types of
results -- those for regular queries, and those for group-by-subject
queries -- there is a subroutine for each. If "activated", the
appropriate subroutine is called for each document in the list, and it
specifies what should appear on the line for that document.
$customize_result_list
In order to activate the subroutines which produce the results-list
lines, set the variable $customize_result_list to 1. The
subroutines are described below.
customize_result_list_line
The subroutine customize_result_list_line is for specifying the
format of lines on regular query-results pages. By default, this
subroutine appears as follows:
sub customize_result_list_line {
    local($collection_name, $file, $doc_root, $relevance_qbe, $score,
          $title, $summary, $original_line) = @_;
    return "$original_line";
}
The variables $collection_name, $file, $doc_root, $relevance_qbe,
$score, $title, $summary, and $original_line are the actual arguments
to this subroutine. Here is a brief description of each:
- $collection_name is the name of the collection being searched.
  This is useful if you wish to customize the results of a query
  differently for different collections.
- $file is the full pathname of the document.
- $doc_root is the pathname of the Web server's document root.
  This is useful for converting the value of the $file variable to
  a URL by removing the $doc_root prefix.
- $relevance_qbe is the filename of the appropriate
  confidence-of-relevance indicator for that document, which also
  serves as the query-by-example button if $query_by_example_mode
  is enabled.
- $score is the document's floating-point relevancy score (maximum,
  10.0). [Note that this argument is not passed to the
  customize_grouping_line function.]
- $title is a slight misnomer. There are two cases:
  - If the document is HTML and it has a title, then $title is
    that title.
  - If the document isn't HTML, or it's HTML without a title, then
    $title is the document's filename.
- $summary depends on whether inline summaries are in use:
  - If you are not using inline summaries (i.e., $inline_summaries = 0),
    then $summary is the text for the summary link. By default this is
    '(summary)' if summarization is enabled and '(empty)' if it is
    disabled.
  - If you are using inline summaries (i.e., $inline_summaries = 1),
    then $summary is the actual summary text for that particular
    returned document.
- $original_line is the text for the default result line.
The subroutine -- really a function -- simply returns the string to be
used as the result line: in this case just the $original_line (that
is, the default output).
You can use the information provided to you to create whatever string
you like, then return that value for the result.
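As an illustration, here is one way the subroutine might be rewritten
to show the score followed by a link built from $file and $doc_root.
This is a sketch under several assumptions, not a recommended
replacement: it assumes the returned string is placed into the HTML
results page as-is, the collection name 'manuals' is invented, and no
HTML escaping is applied to $title. It uses local() to match the style
of the default subroutine.

```perl
$customize_result_list = 1;   # activate the customization subroutines

sub customize_result_list_line {
    local($collection_name, $file, $doc_root, $relevance_qbe, $score,
          $title, $summary, $original_line) = @_;
    local($url);

    # Leave every collection except the (hypothetical) 'manuals'
    # collection with its default output.
    return "$original_line" unless $collection_name eq 'manuals';

    # Convert the pathname to a server-relative URL by removing the
    # document-root prefix, if the pathname starts with it.
    $url = $file;
    if (index($file, $doc_root) == 0) {
        $url = substr($file, length($doc_root));
    }

    return "$score <a href=\"$url\">$title</a>";
}
```

For a document /usr/web/docs/guide/intro.html under a document root of
/usr/web/docs, the link would point to /guide/intro.html.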
customize_grouping_line
The subroutine customize_grouping_line is for specifying the format
of lines on group-by-subject results pages. The default format of
the lines displayed on group-by-subject results pages is different
from that of the lines on regular results pages. (In particular,
relevance scores are not displayed.) For this reason, and because
it's nice to have the added flexibility which doing so provides, we
have a different subroutine for customizing subject-grouped lines than
for regular query-results lines.
By default, this subroutine is defined as follows:
sub customize_grouping_line {
    local($collection_name, $file, $doc_root, $relevance_qbe,
          $title, $summary, $original_line) = @_;
    return "$original_line";
}
All of these arguments have the same values and meanings as those
described above for the customize_result_list_line function. The
only difference between the two routines is that the $score argument
is omitted here, since scores are not displayed in the subject
grouping output.
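A small sketch of a customized grouping line follows. It simply appends
the document's filename (the last component of $file) to the default
line; the bracketed format is an arbitrary choice for illustration.
Note that the argument list has seven entries, not eight, so copying
the list from customize_result_list_line without removing $score would
shift every later argument by one.

```perl
sub customize_grouping_line {
    local($collection_name, $file, $doc_root, $relevance_qbe,
          $title, $summary, $original_line) = @_;
    local($basename);

    # Keep only the last component of the pathname.
    $basename = $file;
    $basename =~ s#.*/##;

    return "$original_line [$basename]";
}
```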
Command Line Applications
Documentation for accessing the functionality offered by the forms
from the command line is also available.