Most of the functionality offered from the browser is also accessible using command-line versions of the index and query tools.
When you create a collection file, it is given the name databasename.conf. (When you pass in a database name as an argument to either the index or query scripts, leave off the .conf suffix.) The .conf file contains a list of attribute-value pairs. Each attribute must have a value, with the exception of the ExclusionRules attribute, which can be left blank, but which must appear as the last attribute line in the file.
The scripts that read these files do so in a case-sensitive manner, so be sure your .conf matches the case usage of the default file Architext.conf if you are creating it by hand.
Below is a sample .conf file:
<Collection foobar>
IndexExecutable /usr/bin/Architext/architextIndex
SearchExecutable /usr/bin/Architext/architextSearch
StemTable /usr/bin/Architext/stem.tbl
StopTable /usr/bin/Architext/stop.tbl
CollectionContents /usr/local/www/html
CollectionIndex /usr/bin/Architext/foobar
CollectionInfo /usr/bin/Architext/foobar.cf
ExclusionRules /home/u/johndoe/exclude
</Collection>
Note that the <Collection> and </Collection> tags at the start and end of the file are also required.
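The rules above (required opening and closing tags, a value for every attribute except ExclusionRules, and ExclusionRules as the last attribute line) can be checked with a short script. The following is a minimal sketch in Python, not the parser used by the Architext scripts; the function name parse_collection_conf is hypothetical, and it assumes one attribute per line with the name and value separated by whitespace.

```python
def parse_collection_conf(text):
    """Parse a <Collection name> ... </Collection> block into (name, attrs).

    Hypothetical helper: assumes one attribute per line, with the
    attribute name and its value separated by whitespace.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("file is too short to hold a collection block")

    first, last = lines[0], lines[-1]
    if not (first.startswith("<Collection") and first.endswith(">")):
        raise ValueError("missing opening <Collection ...> tag")
    if last != "</Collection>":
        raise ValueError("missing closing </Collection> tag")
    name = first[len("<Collection"):-1].strip()

    attrs = {}  # dicts keep insertion order, so the last key is the last line
    for ln in lines[1:-1]:
        parts = ln.split(None, 1)
        key = parts[0]  # attribute names are case-sensitive, so no folding
        value = parts[1] if len(parts) > 1 else ""
        if not value and key != "ExclusionRules":
            raise ValueError("attribute %s has no value" % key)
        attrs[key] = value

    keys = list(attrs)
    if not keys or keys[-1] != "ExclusionRules":
        raise ValueError("ExclusionRules must be the last attribute line")
    return name, attrs


if __name__ == "__main__":
    sample = """<Collection foobar>
StemTable /usr/bin/Architext/stem.tbl
ExclusionRules
</Collection>"""
    print(parse_collection_conf(sample))
```

Because the scripts read these files case-sensitively, the sketch compares attribute names exactly as written rather than normalizing them.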
From the browser, only a few of the attributes of a document collection are visible. If you access the collection configuration files directly, you can see the full range of configurable attributes for a particular document collection. Below are definitions of all of the attributes stored in collection configuration files.
Below is the list of attributes accessible from the forms interface. They are defined in the admin tools in AT-helpdoc.html documentation.
aindex.pl databasename
The information about the database you wish to index should be stored in a file called databasename.conf. The format of the .conf file is described above in the Collection File Format section.
architextIndex [flags] [index rootname] [file/dir names to index...]
It is also possible to direct a list of files and directories to index through stdin, with files separated by newlines. Example usages of architextIndex are given below.
-stem "stemmer name"
-stop "stop name"
-C "config filename"
If this flag is not given, the program checks the environment variable ARCHITEXT_CONFIG for the name of a configuration file. If it is undefined, the program will look in the local directory for the file .architextinit. Finally, if no config file is found, then the program will still function as expected, assuming that all the necessary arguments appear on the command line.
The intended use of the config file is to define arguments that will remain static across all your indexing and querying operations. The -stop and -stem arguments definitely fall into this category, and the -R argument could arguably be placed in a config file as well.
Any operators repeated on the command line will override those that appear in a config file.
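The lookup order for the configuration file, and the rule that command-line arguments win over config-file arguments, can be sketched as follows. This is an illustration in Python under stated assumptions, not the code inside architextIndex: it assumes ARCHITEXT_CONFIG is an environment variable and that a -C value, when given, is used first. The names find_config and merge_args are hypothetical.

```python
import os


def find_config(cli_config=None, env=None, cwd="."):
    """Return the config file to use, or None if there is none.

    Assumed order: -C value, then $ARCHITEXT_CONFIG, then
    .architextinit in the local directory.
    """
    env = os.environ if env is None else env
    if cli_config:
        return cli_config
    from_env = env.get("ARCHITEXT_CONFIG")
    if from_env:
        return from_env
    local = os.path.join(cwd, ".architextinit")
    if os.path.isfile(local):
        return local
    # No config file: every needed argument must be on the command line.
    return None


def merge_args(config_args, cli_args):
    """Combine arguments so that command-line values override config values."""
    merged = dict(config_args)
    merged.update(cli_args)
    return merged


if __name__ == "__main__":
    print(find_config(env={"ARCHITEXT_CONFIG": "/tmp/my.conf"}))
    print(merge_args({"-stem": "stem.tbl", "-stop": "stop.tbl"},
                     {"-stop": "other.tbl"}))
```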
-R "rootname"
This flag is optional. If none is given, the indexing or searching executable assumes that the first non-flag argument that appears on the command line is meant to be the rootname. Thus, the following commands are semantically identical, and all cause each file under the directory /data/literature to be indexed, with the index files given the rootname of foo:
echo "/data/literature" | architextIndex foo
echo "/data/literature" | architextIndex -R foo
architextIndex -R foo /data/literature
architextIndex foo /data/literature
Additionally, suppose you wanted to index three files and a directory: eenie.html, meenie.html, miney.html, and htmlDir/. The following commands are all equivalent (assuming that the file flist contains the four names listed above, one per line):
architextIndex foo eenie.html meenie.html miney.html htmlDir
architextIndex -R foo eenie.html meenie.html miney.html htmlDir
cat flist | architextIndex foo
cat flist | architextIndex -R foo
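The rootname rule described above (use -R if present, otherwise take the first non-flag argument) can be modeled in a few lines. This Python sketch only mirrors the behavior described in the text and is not the executable's own argument parser; the function name split_index_args is hypothetical, and it assumes that -R, -stem, -stop, and -C each take exactly one value. A rootname supplied through a config file is not modeled.

```python
# Flags assumed to consume the argument that follows them.
VALUE_FLAGS = {"-R", "-stem", "-stop", "-C"}


def split_index_args(argv):
    """Return (rootname, flags, paths) for an architextIndex-style argv."""
    flags = {}
    positional = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in VALUE_FLAGS:
            if i + 1 >= len(argv):
                raise ValueError("%s needs a value" % arg)
            flags[arg] = argv[i + 1]
            i += 2
        else:
            positional.append(arg)
            i += 1

    if "-R" in flags:
        rootname = flags.pop("-R")
    elif positional:
        # No -R: the first non-flag argument is taken as the rootname.
        rootname = positional.pop(0)
    else:
        raise ValueError("no rootname given")
    # An empty paths list means the names are expected on stdin.
    return rootname, flags, positional


if __name__ == "__main__":
    print(split_index_args(["foo", "/data/literature"]))
    print(split_index_args(["-R", "foo", "/data/literature"]))
```

Both calls in the usage example yield the same rootname and path list, which is the equivalence the command listings illustrate.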
aquery.pl zymurgy "top-fermented ales"
architextSearch -R "rootnames for index files" -stem "stemmername" -stop "stoppername" -q "query string" -C "config filename"
-R "rootname"
-stem "stemmer name"
-stop "stop name"
-C "config filename"
If this flag is not given, the program checks the environment variable ARCHITEXT_CONFIG for the name of a configuration file. If it is undefined, the program will look in the local directory for the file .architextinit. Finally, if no config file is found, then the program will still function as expected, assuming that all the necessary arguments appear on the command line.
The intended use of the config file is to define arguments that will remain static across all your indexing and querying operations. The -stop and -stem arguments definitely fall into this category, and the -R argument could arguably be placed in a config file as well.
Any operators repeated on the command line will override those that appear in a config file.
-q "query string"
DocNum  Relevance Score  RelNum  Filename                    <title>...</title>
------  ---------------  ------  --------------------------  ------------------
6       0.00541587       1       /www/burnt/html/index.html  Architext Software
14      0.00180529       1       /www/html/demo.html         Online Demo
12      0.000848499      2       /www/html/company.html      About Architext
There is a second possible output format, which occurs only when the query uses the gather operator (explained below), which is used to do Automatic Subject Grouping on the results of a query. Here is sample output:
====================
Summary words: Architext demo perl query magazine
  8 /home/www/burnt/html/old-index.html	Architext SGI Page
  25 /home/www/html/ems/index.html	IDG Demo
  31 /home/www/html/infoworld/index.html	IDG Demo
  32 /home/www/html/infoworld-new/index.html	IDG Demo
  71 /home/www/html/docs.html	Architext Documentation
All: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 23 24 25 26 27 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 47 48 49 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
====================
Summary words: manager ben cardiff compton's concurrency
  28 /home/www/html/ryan.html	Architext Tasks
  20 /home/www/html/tasks.html	Architext Tasks
All: 20 28
====================
Summary words: yet everything faq seem ok
  22 /home/www/html/faq/index.html	Architext Public Demo
All: 22
In the output above, there are three subject groups, each introduced by a line of equals signs (====================). First the summary words are presented. These are the words that are most central to that particular cluster of documents. Next, the top articles from that group are presented in a three-column format: two leading spaces ('  '), then the Document Number, another space (' '), the Filename, a tab character ('\t'), and then the Title. Finally, on the line labeled All:, the numbers of all the relevant documents in that group are output. Oftentimes, the group contains more articles than the default gather parameters display.
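The gather output format described above is regular enough to parse line by line. The following Python sketch is one possible reading of that format, not code shipped with Architext: the function name parse_gather_output is hypothetical, and it assumes that each group starts with a line of at least ten equals signs and that the summary words, article lines, and All: list each sit on their own line.

```python
def parse_gather_output(text):
    """Split gather output into a list of subject groups.

    Each group is a dict with the summary words, the top articles as
    (docnum, filename, title) tuples, and the full list of doc numbers.
    """
    groups = []
    current = None
    for line in text.splitlines():
        if line.startswith("=" * 10):
            # A separator line opens a new subject group.
            current = {"summary": [], "top": [], "all": []}
            groups.append(current)
        elif current is None or not line.strip():
            continue
        elif line.startswith("Summary words:"):
            current["summary"] = line[len("Summary words:"):].split()
        elif line.startswith("All:"):
            current["all"] = [int(n) for n in line[len("All:"):].split()]
        elif line.startswith("  "):
            # Two spaces, doc number, one space, filename, tab, title.
            head, _, title = line[2:].partition("\t")
            docnum, _, filename = head.partition(" ")
            current["top"].append((int(docnum), filename, title))
    return groups


if __name__ == "__main__":
    sample = (
        "====================\n"
        "Summary words: manager ben cardiff\n"
        "  28 /home/www/html/ryan.html\tArchitext Tasks\n"
        "  20 /home/www/html/tasks.html\tArchitext Tasks\n"
        "All: 20 28\n"
    )
    for group in parse_gather_output(sample):
        print(group["summary"], group["top"], group["all"])
```

Since the All: line can list more documents than the article lines show, a caller that needs every document in a group should read the "all" list rather than counting the "top" entries.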
There are two types of documents that you would want to create scripts for. One is for the results of a query and one is for the results of an ASG (Automatic Subject Grouping). The motivation for this extended html syntax is to allow you to tailor the query result and subject grouping result pages to your needs. The four tags available in the extended syntax are the following:
<ARCHITEXT-RESULT DB="database name" UNPREFIX="path prefix to remove" AHOME="location of Architext html directory (relative to html root)">
<ARCHITEXT-MAKE-GATHER DB="database name" UNPREFIX="path prefix to remove" AHOME="location of Architext html directory">
<ARCHITEXT-GATHER DB="database name" UNPREFIX="path prefix to remove" AHOME="location of Architext html directory">
<ARCHITEXT-LEGEND DB="db name" AHOME="location of Architext html directory">
Placing these tags within your html documents and then using architextify-html to parse them will create .cgi scripts that perform queries and subject grouping, parse the results, and present them in html format for use with your Web browser.
The ARCHITEXT-RESULT and ARCHITEXT-MAKE-GATHER tags will appear in the document that you wish to have become your query script, while the ARCHITEXT-GATHER tag will reside in the file that you wish to become the ASG script. The ARCHITEXT-LEGEND tag can appear in either the query script or the ASG script and will display a legend at the bottom of a results page explaining the terms and icons used. The legend appears by default when the scripts are generated using the forms interface.
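To check which extended tags a page contains before running architextify-html on it, the tags and their attributes can be listed with a short script. This Python sketch is an illustration only and does not reproduce how architextify-html works; the names find_architext_tags and unprefix are hypothetical, and unprefix assumes that UNPREFIX is removed from the front of each result filename.

```python
import re

# Matches tags such as <ARCHITEXT-RESULT DB="..." UNPREFIX="...">.
TAG_RE = re.compile(r'<(ARCHITEXT-[A-Z-]+)((?:\s+[A-Z]+="[^"]*")*)\s*>')
ATTR_RE = re.compile(r'([A-Z]+)="([^"]*)"')


def find_architext_tags(html):
    """Return a list of (tag name, attribute dict) for each extended tag."""
    return [(m.group(1), dict(ATTR_RE.findall(m.group(2))))
            for m in TAG_RE.finditer(html)]


def unprefix(path, prefix):
    """Drop prefix from the front of path (assumed meaning of UNPREFIX)."""
    if prefix and path.startswith(prefix):
        return path[len(prefix):]
    return path


if __name__ == "__main__":
    page = '<UL><ARCHITEXT-RESULT DB="mydatabase" UNPREFIX="/usr/local/www/html/"></UL>'
    for name, attrs in find_architext_tags(page):
        print(name, attrs)
    print(unprefix("/usr/local/www/html/docs.html", "/usr/local/www/html/"))
```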
search field in the form. This page is just standard html and there is no need to use architextify-html to parse this form. The query.cgi script, however, is generated by the user with the help of architextify-html. A Sample Query Script appears later in this document.
<html><head><title>Architext Querying</title></head>
<body><h1> <img src="/Architext/Block-logo.gif"> </h1>
<p><b>Database description:</b> stuff and poop
<p> Enter a natural-language query in the form below.
<HR>
<FORM ACTION="/cgi-bin/query.cgi" METHOD="POST">
<INPUT TYPE="submit" VALUE="Search For Concept">
<TEXTAREA NAME="search" COLS=80 ROWS=4></TEXTAREA><P>
</FORM>
</body>
</html>
ARCHITEXT-RESULT and ARCHITEXT-MAKE-GATHER tags.
<html>
<head>
<title> This is a query-result document. </title>
</head><body>
<hr>
<h1> A Query </h1>
Here are the results of your query.
<UL><ARCHITEXT-RESULT DB="mydatabase" UNPREFIX="/usr/local/www/html/"></UL>
<FORM ACTION="gather.cgi" METHOD="POST">
<INPUT TYPE="submit" NAME="submit" VALUE="submit">
<ARCHITEXT-MAKE-GATHER DB="mydatabase" UNPREFIX="/usr/local/www/html/">
</FORM>
</body>
</html>
Incidentally, the ARCHITEXT-MAKE-GATHER tag and the form it appears in are optional in this page, and are only required if the user wishes to perform ASG (Automatic Subject Grouping) on the results of queries. After this html page has been processed by architextify-html, it will become a perl script that obeys the conventions of .cgi scripts. Note that the gather.cgi script that is invoked by the form above is another script that must be generated by the user. The format for a Sample Gather Script is described below. Once the query script is installed in the cgi-bin directory, queries can be invoked using a standard html form. A Sample Query Page is described above.
<html>
<head>
<title> This is an Automatic Subject Grouping result document. </title>
</head><body>
<hr>
<h1> A Gather </h1>
Here are the results of gathering the results of your query.
<UL><ARCHITEXT-GATHER DB="mydatabase" UNPREFIX="/usr/local/www/html"></UL>
</body>
</html>
$root variable in all the perl scripts shipped with the distribution.
$root variable in any perl scripts you specify with a new one. The syntax for this is as follows:
scripts-fix-root <new root dir> <files to update...>
There is also a script, scripts-fix-perl, that will update the #! notation at the head of each perl script specified on the command line, should the location of your perl interpreter change. Both of these scripts are used at install time, and you are unlikely to need them again unless your install home or your perl interpreter changes location.
install-admin <Architext config directory> <cgi-bin directory>