Google

Isearch Quick Start Guide

or, Installing and Using Isearch for the Hopelessly Impatient
with Slightly More Thorough Instructions for Those who Care.

Erik Scott, Scott Technologies, Inc.

Isearch is a text search system developed by CNIDR. CNIDR is the Center for Networked Information Discovery and Retrieval. They are part of a non-profit corporation known as MCNC. MCNC used to stand for "Microelectronics Center of North Carolina", but they dropped the long form and now they're legally just MCNC. Anyway, the lawyers thought you'd like to know that.

Isearch pretty much does one thing: it lets the user search through a bunch of text by hunting for words. It doesn't "understand" your text collection, it doesn't try to parse it, it just uses slightly-less-than-brute-force statistical methods to hunt for words in documents. It does this by building indexes. An Isearch index is a just like the index in a book: if you look for a word in the index of a book, it tells you a page number to look at. Likewise, if you search for a word with Isearch, it looks in the index for a filename associated with that word and then shows that file to you. It can do more complicated things, but that's the basic game plan.

Installing Isearch requires five steps:

  1. Get Isearch. If you're reading this, there's a good chance you've already finished this step and the next one.
  2. Uncompress and un-tar the distribution. Again, unless someone printed this and put it on your desk, you're through with this step, too.
  3. Edit the Makefile. This is where you tell Isearch what kind of machine you have. If you have a fairly mainstream Unix box, there won't be any problem. If you have a strange machine, then you've probably had a lot of practice porting software by now anyway.
  4. Compile. If you did Step 3 correctly, then this is a no-brainer. On the other hand, this is rarely a no-brainer in the real world.
  5. Index and search some sample text so you know you did things right.

Really, installation usually isn't tricky at all. Start to finish should take around thirty minutes the first time, and about ten when you download new versions.

STEP 0: GET GCC and GZIP

Okay, we lied: There's an extra step. You really should have gcc and g++ on your machine, and you must have gzip (and its companion gunzip) so you can unpack the archive that Isearch comes in. You can get gzip and gcc via anonymous ftp from prep.ai.mit.edu in the directory /pub/gnu. Gzip installation is fairly simple. Unpack the tar file and follow the instructions in the file "README". Installing gcc and g++ is a lot more complicated. In principle, you could use a vendor-supplied C++ compiler. The Solaris SparcWorks C++ compiler is known to work, and the CenterLine C++ compiler has also compiled Isearch successfully. The bad news is that there are several subtly different versions of the C++ specification and each compiler views it a little differently. Getting Isearch to work with a different compiler is about like porting Isearch to another machine. Isearch is also fairly dependent on compiler versions, especially with gcc. Gcc users should make sure they are using at least 2.7.X. Note that the old, pre-compiled gcc for Solaris is not meant to be a general-purpose compiler: it's just enough of a compiler to compile a newer version of gcc.

Incidentally: You're also going to have to install libg++ if you haven't already. SAme ftp site, same directory. Install it after you install g++.

STEP 1: GET ISEARCH

You should always work from the latest version of Isearch available. Bugs are being corrected and new features are being added daily (literally). You can always get the newest version from ftp.cnidr.org. Here's a sample session where we will download today's version of Isearch from ftp.cnidr.org by logging in as "anonymous", sending our email address as our password, changing the current directory to "/pub/NIDR.tools/Isearch", making sure we're in binary mode for ftp, and getting the file:


sti-gw% ftp ftp.cnidr.org

Connected to kudzu.cnidr.org.

220 kudzu FTP server (Version wu-2.4(1) Sun Jan 1 17:43:49 EST 1995) ready.

Name (ftp.cnidr.org:escott): anonymous

331 Guest login ok, send your complete e-mail address as password.

Password:

230- Welcome to kudzu.cnidr.org

230- 

230- ftp logged in from sti-gw.sti-ext.com at Mon Mar 25 11:14:59 1996

230- 

230- There are currently 7 users of the maximum 25 logged on.

230- 

230- If you have trouble using this experimental FTP client, try including

230- a dash in front of your username (as your login password).  This will

230- disable informational messages such as these.

230- 

230- Comments???  Send mail to `bmv@k12.cnidr.org`

230-

230-

230 Guest login ok, access restrictions apply.

ftp> cd /pub/NIDR.tools/Isearch   

250-Please read the file README

250-  it was last modified on Fri Mar 22 15:41:06 1996 - 3 days ago

250 CWD command successful.

ftp> dir

200 PORT command successful.

150 Opening ASCII mode data connection for /bin/ls.

total 4995

-rw-rw-r--   1 138      30     1026816 Mar 22 20:40 Isearch-1.13-aix.tar.gz

-rw-rw-r--   1 138      30      480568 Mar 22 20:40 Isearch-1.13-linux.tar.gz

-rw-rw-r--   1 138      30     1176479 Mar 22 20:40 Isearch-1.13-osf1.tar.gz

-rw-rw-r--   1 138      30      667405 Mar 22 20:40 Isearch-1.13-solaris.tar.gz

-rw-rw-r--   1 138      30      741496 Mar 22 20:40 Isearch-1.13-sunos.tar.gz

-rw-rw-r--   1 138      30      808411 Mar 22 20:41 Isearch-1.13-ultrix.tar.gz

-rw-rw-r--   1 138      30      125536 Mar 22 20:41 Isearch-1.13.tar.gz

-rw-rw-r--   1 138      30         970 Mar 22 20:41 README

drwxrwxr-x  11 138      30         512 Mar 22 20:16 archive

drwxrwxr-x   2 138      30        1024 Mar 22 20:43 untested

226 Transfer complete.

772 bytes received in 0.031 seconds (24 Kbytes/s)

ftp> binary

200 Type set to I.

ftp> get Isearch-1.13.tar.gz

200 PORT command successful.

150 Opening BINARY mode data connection for Isearch-1.13.tar.gz (125536 bytes).

226 Transfer complete.

local: Isearch-1.13.tar.gz remote: Isearch-1.13.tar.gz

125536 bytes received in 0.12 seconds (9.9e+02 Kbytes/s)

ftp> quit

sti-gw% 

Notice a few things:

  1. There are precompiled binary kits for IBM RS6000, Linux on an Intel box, DEC OSF/1 (now called Digital Unix, soon to be called "The Operating System Formerly Known as 'Prince'") for Alpha (now know as Alpha AXP), Solaris 2.X for Sparc, SunOS 4.1.X for Sparc, and Ultrix for DEC PMAX architectures (DECstation 3100s, 5000s, and their kin). If you have one of these machines and you really can't get Isearch to compile in step 4, consider using one of these binary kits. If you don't have one of the above architectures then you're going to have to build from source code anyway.
  2. There is a directory named "untested". The files in that directory are, well, untested. If you need an absolutely newest and greatest version, then look in here.

STEP 2: UNCOMPRESS AND UN-TAR THE DISTRIBUTION

You should now have a copy of Isearch as a gzipped tar tar. The ".gz" suffix indicates a gzipped file, and the ".tar" suffix indicates it's tarred. The first thing to do is to uncompress the file:

sti-gw% gunzip Isearch-1.13.tar.gz

You now have a (much larger) file called "Isearch-1.13.tar".

The next step is to un-tar the file we just uncompressed. In our example, we're going to want to install Isearch so it has the path name "/local/project/Isearch-1.13". To do this, copy or move the file to "/local/project":

sti-gw% mv Isearch-1.13 /local/project

Finally, we'll un-tar the distribution:

sti-gw$ cd /local/project

sti-gw% tar xf Isearch-1.13.tar

There should now be a directory named /local/project/Isearch-1.13, and it should contain the Isearch distribution:


sti-gw% ls /local/project/Isearch-1.13

CHANGES    Makefile   TUTORIAL   doctype

COPYRIGHT  README     bin        src

If you see all of that, then you probably got it right.

STEP 3: EDIT THE MAKEFILE

This is the part that makes people nervous, but it really isn't that bad at all. You need to edit the Makefile with your favorite editor, and essentially just follow the instructions. Edit /local/project/Isearch-1.13/Makefile in this example (you won't have to edit the makefiles in the subdirectories, they automatically inherit what they need).

The first choice is for compiler. Probably 99% of the world should leave the default "g++". Isearch is developed by people who use g++, so that's your best bet. It's also hard to beat the price.

The second choice is for "CFLAGS", which are the options to pass to the compiler. There are canned selections for you to choose from. If you don't know what to select, then "CFLAGS=-g -DUNIX -Wall" is a good starting guess.

The third choice is for the location to install the finished programs. The default "/usr/local/bin" is a good guess. Note that if you don't elect to actually "make install" in step 4 then you'll never have to set this.

The rest of the choices probably should be left alone.

While on the subject of editing files, it's worth noting that the file /local/project/Isearch-1.13/doctype/dtconf.inf describe how many "doctypes" your copy of Isearch will be aware of. A doctype is essentially a file type handler; there are doctypes for simple ASCII files, files of separate paragraphs, and so forth. You can almost always just leave this file alone and take the default, which is "Give me all of 'em!".

STEP 4: COMPILE

At this point, you should be able to


sti-gw% cd /local/project/Isearch-1.13

sti-gw% make

And all the right things will happen. Note that many compilers will print a few warning lines along the way, but they're warning about pretty harmless stuff. If there are any errors, then the compilation will grind to a halt. Otherwise, when you're done there will be new files in the "bin" subdirectory:


sti-gw% ls bin

Iindex        Isearch       Iutil         libIsearch.a

If you got that far, then you can (optionally) :


sti-gw% make install

and the compiled executables will be copied to /usr/local/bin.

STEP 5: INDEX SOME TEXT AND SEE HOW IT WORKS

Now let's index a little text and see if Isearch really works. First, let's pick a couple of files to index. In this example, we'll index the files "CHANGES" and "COPYRIGHT" since we know everyone has them.


sti-gw% cd /local/project/Isearch-1.13

sti-gw% mkdir testIndex

sti-gw% cd testIndex

sti-gw% /local/project/Isearch-1.13/bin/Iindex -d tester ../CHANGES ../COPYRIGHT

Iindex 1.13

Building document list ...

Building database tester:

   Parsing files ...

   Parsing /local/project/Isearch-1.13/CHANGES ...

   Parsing /local/project/Isearch-1.13/COPYRIGHT ...

   Indexing 1870 words ...

   Merging index ...

Database files saved to disk.

sti-gw% ls

tester.dbi  tester.inx  tester.mdg  tester.mdk  tester.mdt

That created five index files to describe the two files we indexed. Don't worry, we could have indexed every file on your system and still only had five index files. Now let's do a little searching and see how well we did:


sti-gw% /local/project/Isearch-1.13/bin/Isearch -d tester RSET table

../bin/Isearch -d tester RSET table

Isearch 1.13

Searching database tester:

1 document(s) matched your query, 1 document(s) displayed.

      Score   File

   1.   100   /local/project/Isearch-1.13/CHANGES

Select file #: 

At this point, if you enter "1" and press return, it will display the contents of "CHANGES". If you press return without a number, Isearch will exit. This is sort of a trivial example, but if you index thousands of files then the above search will list the ones that contain either "RSET" or "table". If you run Isearch without any arguments then it will give a list of all of its options, what they mean, and some examples of their use. For more information, see the Isearch Tutorial included with the distribution.