Writing External Programs

General Purpose

External programs are the main method that MetaLib’s gateway uses to interact with targets/resources within the HTTP world.

The goal of the external programs is to simulate a Web browser, which may be done using open-source APIs/modules.

The most common way is known as “screen scraping,” in which an external program retrieves an HTML page and parses the relevant data with regular expressions.
It is recommended that you use Perl as the programming language to implement external programs.

Interface Type

Adapter

Implementation Steps

MetaLib’s external programs are divided into three main steps:

1. FIND

2. PRESENT

3. PRESENT-SINGLE (FULL).

Each step is presented by a separate program, which is named according to the following convention:
<target_name>_<step_extension>
(for example: google_scholar_find, google_scholar_present and google_scholar_present_single)
The external programs reside in the vir_ext directory:

cd $metalib_dev/dat01/vir_ext

After writing the programs, you must define them as the appropriate modules for a KnowledgeBase configuration.

Important: All three steps/programs must include a mechanism that will ensure the termination of the program. This means that if there are errors or a predefined timeout is reached, the program will end.

FIND

Find Program – Standard Input
The input for this step may include the following fields:
1. Configuration-name:  BASE=GOOGLE_SCHOLAR
2. Find request:            FIND-REQUEST=WTI = (perl LWP)
3. User-Id (internal):     METALIB-USERNAME=PATRON1
4. Authentication (user-id/passwd):  SEARCH-AUTHENTICATION=u_name/password
5. Search address:       SEARCH-ADDRESS=http://scholar.google.com
6. Database code: SEARCH-DATABASE=unicorn

#reading the params

   ($error_code, $find_request, $host_name, $cgi_name,

   $proxy_type, $proxy_address, $proxy_credentials)=&readParam;

Find Program – Standard Output
The following fields should be written to the standard output:

1. Error status: ERROR-CODE=0200 or ERROR=text
2. Number Of hits found: SET-RESULT=68
3. HTTP request: FIND-REQUEST=http:// …; May be used by the present program to jump to the native interface.
4. Set number in the native’s back-end:          SET-NUMBER=1
5. Set entry in the native’s back-end:              SET-ENTRY=1
Fields 3-5 will be sent as the input to the present program.

#################################################

sub displayResults {

    my ($l_result,$l_url_query, $proxy_type, $proxy_address, $proxy_credentials) = @_;

    print "SET-RESULT=$l_result\n";

    print "FIND-REQUEST=$l_url_query\n";

    print "SET-ENTRY=1\n";

}

Find Program Structure

The FIND program can be divided into three main sub-parts: authentication, query normalization, and pattern matching.

Authentication: Used to get a ticket via a cookie or any other container.

The authentication phase is usually performed in the FIND program, when a communication is first initiated. The following are a few known authentication methods: basic, secure, SOAP, or IP.
The output of such a process will usually be a session-id or a cookie. The same session-id should be used between the different programs, which means that in addition to other fields, the session-id field should be passed.
The following is another view of a cookie, this time as part of the HTML header:

Query normalization: Transforms MetaLib’s end-user query into the target search syntax terminology. For example:

WRD = (black bird)

becomes:  

http://www.scirus.com/srsapp/search?dt=all&ff=all&ds=sd&sa=all&g=a&q=black+bird&cn=all&t=phrase

Pattern matching: This part should track down the text describing the number of hits found in the HTML page and transfer this text to MetaLib’s user interface.


#################################################

sub parseResult {

    my ($in_response) = @_;

    my $counter = 0;

    my $hit = 'of (\d+) total';

    if ($in_response =~ /$hit/){

        $counter = $1;

    }

    if ($counter == 0){

        return 0;

    }

    return ($counter);

}

PRESENT

Present Program – Standard I/O
The PRESENT program uses parameters 3-5 that were sent by the find program and an additional parameter, MAX-RECORD, that is defined by the configuration.
The program’s output is the records themselves, in MetaLib’s standard format. Each payload will be started with the tag RECORD-FORMAT=”PLAIN” and will be closed with the tag END-OF-DATA.
Each record will be started with the tag RECORD and closed with the tag END-RECORD.

Present Program – Content
Two major actions take place in the PRESENT program:
1. Retrieving the next X records from the native target by sending a formatted http-request.
2. Converting the http-response from HTML to MetaLib’s format.

###############################################################

sub parseResults {

    my ($response) = @_;

    my ($date, $year, $full_year, $title);

    my (@present);

    my $prefix = 'http://prola.asp.org';

    $response =~ s/\n//g;

    $response =~ s/\r//g;

    $response =~ s/\t//g;

    my $counter = 0;

    while ($response =~ /<li.*?><span.*?>\s*?\d+\.(.*?)<\/li>/){

        $response = $';

        my $result_line = $1;

        my $docx=[];

        #if ($result_line =~ /href="(.*?)"/){

        #    push (@$docx, "856 \$\$u $prefix$1");

        #    $result_line = $';

        #}

        if ($result_line =~ /"aps-boldfont">(.*?)<\/span>/){

            $title= $1;

            $result_line = $';

            $title =~ s/<.*?>//g;

            push (@$docx, "245 \$\$a $title");

        }

        if ($result_line =~ /abstract<\/span>:(.*?)<\/span><br \/>/i){

            $title= $1;

            $result_line = $';

            $title =~ s/<.*?>//g;

            push (@$docx, "520 \$\$a $title");

        }

        if ($result_line =~ /"aps-mediumfont">(.*?)</){

            push (@$docx, "100 \$\$a $1");

            $result_line = $';

        }

        if ($result_line =~ /"aps-mediumfont">(.*?)</){

            push (@$docx, "773 \$\$t $1");

            $result_line = $';

        }

        if ($result_line =~ /"aps-boldfont">(.*?)</){

            push (@$docx, "773 \$\$u $1");

            $result_line = $';

        }

        if ($result_line =~ /span>, (\d+?)-?(\d+?) \((\d+)\)/){

            if (!$1){

                push (@$docx, "773 \$\$v $2");

            }

            else{

                push (@$docx, "773 \$\$v $1");

                push (@$docx, "773 \$\$w $2");

            }

            push (@$docx, "YR \$\$a $3");

        }

           

        if (scalar @$docx) {

            push(@present,$docx);

        }

    }

    return (@present);

}

To reduce processing time, it is necessary to parse only the fields presented in the Brief Display
(title, author, issn, year and citation).

PRESENT SINGLE

The PRESENT-SINGLE program activates the link attached to each record under the field MORE. In other words, the PRESENT program simulates a click of the Full View button.

Important: Most sites hold enough data in the Brief view, so the PRESENT-SINGLE program is usually not used. Note that using the PRESENT-SINGLE program will affect performance.

The program should parse all fields that are relevant to MetaLib’s standard format. The incoming values will override the values that were fetched in the previous stage.

Maintenance

The external programs are rapidly changing due to changes in the target APIs and presentation layers. A few programming principles may reduce the overall maintenance time, and will help to accomplish a high level of data integrity.
For example:
1. Keep a uniform structure and style between the various programs.
2. Construct a robust debugging mechanism, that will monitor essential messages and diagnostics.
3. When scraping HTML tags or any other form of metadata, try to minimize the use of style tags or display elements within your regular expression. Emphasize on anchors such as >, <, href=, or input type=checkbox and not style=’text-align: center’ or <span CLASS=.
4. Do not write long regular expression. Try to phrase as short and unique expressions as possible.

Error Messages

Each external program may send an error code or message back to MetaLib. The variety of codes are derived from the table www_heading, under the TAB directory in the VIR00 library. The range of numbers that is defined for external use is between 2500 to 3000.
The syntax should be ERROR-CODE=0211.

Running the Programs

It is possible to run the programs from a UNIX command line. This can help stabilize the code before it is connected to a real configuration. Before running the program from a UNIX command line, you need to prepare a command file to hold the input parameters of the findprogram.

>>cat imdb_find_command

SEARCH-ADDRESS=http://www.imdb.com/find?

FIND-REQUEST=WRD=(batman)

You can find the imdb_find program in the examples section.

To run the find program from the command line, pipe the command file’s content into the program.

>>cat imdb_find_command | ./imdb_find

You can also pipe the output of the find program into the present program.

Adapter Examples

The examples should be used as reference to writing your own programs, If you want to run a program use the imdb_find program as the other programs require authentication.

prola_find

#!/exlibris/metalib/m4_a/product/bin/perl
#####################################################################
#################-----PROLA FIND-----################################
#####################################################################
sub BEGIN {
    unshift (@INC, $ENV{'aleph_ext'});
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005");
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005/i686-linux");
}
          
use strict;
require "call_httpd_LWP.ML";
use URI::Escape;
my $debug;
my $error_code;
##################the program begins here#######################
{
    #init
    my ($host_name, $method, $find_request, $set_entry, $command);
    my ($response, $result, $cookie_jar, $view, $url_query);
    my ($search_key, $search_time, $proxy_type, $proxy_address,
    $proxy_credentials);
    my ($cgi_name, $cgi_param);
  
    $debug = substr(uc($ENV{'VIR_EXT_DEBUG'}),0,1);
    if ($debug ne "Y"){
        $debug ="";
    }
    #reading the params
    ($error_code, $find_request, $host_name, $cgi_name,
    $proxy_type, $proxy_address, $proxy_credentials)=&readParam;
  
    if ($debug) {
        print STDERR "\n\nfind_request  = $find_request\n";
        print STDERR "error_code        = $error_code\n";
        print STDERR "host_name         = $host_name\n";
        print STDERR "proxy_type        = $proxy_type\n";
        print STDERR "proxy_address     = $proxy_address\n";
        print STDERR "proxy_credentials = $proxy_credentials\n";
    }             
  
    if ($error_code) {
        print STDOUT "$error_code\n";
        exit(0);
    }
  
    #authenticating the data
    ($search_key, $search_time) =
    &authen($host_name, $cgi_name, $proxy_type, $proxy_address,
    $proxy_credentials);
  
    #constructing the query
    ($error_code, $command) =
    &constructQuery($find_request, $search_key, $search_time);
    #error check
    if ($error_code) {
        print STDOUT "$error_code";
        exit(0);
    }
  
    #constructing the url
    $url_query = $host_name.$cgi_name.$command;
    #error check
    if ($debug) {
        print STDERR "\n\ncookie_jar = $cookie_jar \n"
        ."\nurl_query = $url_query\n";
    }
  
    #getting the http page      
    $method = "POST";
  
    ($response) =
    &call_httpd($method, $url_query, $cookie_jar,
    $proxy_type, $proxy_address, $proxy_credentials);
  
    #debug option
    if (0){
        print "$response\n";
    }
  
    #finding the number of results
    $result= &parseResult($response);
    #displaying the results
    &displayResults($result, $url_query, $proxy_type, $proxy_address, $proxy_credentials);
}

#########The Subroutines#########################
#################################################
sub readParam {
    my ($find, $host, $line, $base, $host_name, $proxy_type,
    $proxy_address, $proxy_credentials);
    my ($cgi_name, $cgi_param);
    while ($line = <STDIN>) {
        if ($line =~ /^FIND-REQUEST=/){
            ($find) = $line =~ /FIND-REQUEST=\s?(.*)/;
        }
        elsif ($line =~ /^SEARCH-ADDRESS=/){
            ($host) = $line =~ /SEARCH-ADDRESS=\s?(.*)/;
        }
        elsif ($line =~ /^BASE=/){
            ($base) = $line =~ /BASE=\s?(.*)/;  
        }
        elsif ($line =~ /^PROXY-TYPE=/){
            ($proxy_type) = $line =~ /PROXY-TYPE=\s?(.*)/;
        }
        elsif ($line =~ /^PROXY-ADDRESS=/){
            ($proxy_address) = $line =~ /PROXY-ADDRESS=\s?(.*)/;
        }
        elsif ($line =~ /^PROXY-CREDENTIALS=/){
            ($proxy_credentials) = $line =~ /PROXY-CREDENTIALS=\s?(.*)/;
        }
    }
      
    if (length $host == 0){
        print STDOUT "Error : " .
                    "SEARCH-ADDRESS parameter is empty\n";
        return ("ERROR-CODE=0211","","");
    }
    if ($host =~ m{(http://.*?/)(.*?\?)}i){
        $host_name = $1;
        $cgi_name = $2;
    }
  
    return ("", $find, $host_name, $cgi_name, $proxy_type,
    $proxy_address, $proxy_credentials);
}
#################################################
sub authen{
    my ($host_name, $cgi_name, $proxy_type,
        $proxy_address, $proxy_credentials) = @_;
    my ($response, $cookie, $cookie_jar, $search_key,$search_time);
    my $in_url = $host_name.$cgi_name;
    my $in_method = "POST";
    #getting the response page
    ($response)=&call_httpd($in_method,$in_url,
    $cookie_jar, $proxy_type, $proxy_address, $proxy_credentials);
    $response =~ s/\n//g;
    if ($response =~ /"searchKey"\s*?value="(.*?)"/){
        $search_key = $1;
        $search_key = uri_escape($search_key);
    }
    if ($response =~ /"searchTime" value="(.*?)"/){
        $search_time = $1;
    }
    return ($search_key, $search_time);
}
#################################################
sub constructQuery {
    my ($find_request, $search_key, $search_time) = @_;
    my ($cgi_parameters, $code1, $code2, $op, $value1, $value2,
        $year, $year2);
    my $h = {
        'WRD' => 'all',
        'WTI' => 'title',
        'WSU' => 'abstitle',
        'WAU' => 'author',
        'WYR' => 'year',
     };
  $cgi_parameters  = "journal=PRA&journal=PRB&journal=PRC&journal=PRD&journal=PRE&journal=PRL&journal=PR&journal=PRI&journal=RMP&journal=PRSTAB&journal=PRSTPER&";
  $cgi_parameters .= "year=YEAR&field_1=CODE1&unary_1=EXACT&query_1=TERM1&boolean_1=BOOL1&field_2=CODE2&unary_2=EXACT&query_2=TERM2&searchKey=$search_key&searchTime=$search_time";
  
    if ($find_request =~ /\s*(\w+)\s*=\s*\(\s*(.*?)\s*\)\s*(AND|OR|NOT)\s*(\w+)\s*=\s*\(\s*(.*?)\s*\)/) {
        $code1 = $h->{$1};
        if (!$code1){
            $code1 = 'all';
        }
        ($value1, $year) = normalizeValue($1,$2);
        if ($year){
            $code1 = "";
        }  
        $op     = $3;
        if ($op eq "NOT"){
            $op = "AND+!";
        }
        $code2  = $h->{$4};
        if (!$code2){
            $code2 = 'all';
        }
        ($value2,$year2) = normalizeValue($4,$5);
        if ($year2){
            if ($year){
                print "Error: This resource doesn't support
                searches where both search fields are WYR\n";
                exit (0);
            }
            $code2 = "";
            $year = $year2;
        }
     
    }
    elsif ($find_request =~ /\s*(\w{3})\s*=\s*\((.*?)\)\s*/ ) {
        $code1  = $h->{$1};
        ($value1,$year) = normalizeValue($1,$2);
        if ($year){
            print "Error: no query specified\n";
            exit (0);
        }
        $op     = "AND";
        $code2  = "all";
        $value2 = "";
    }
    else {
        print STDERR "Error (prola_find) : ";
        print STDERR "Could not parse find request: $find_request\n";
        return ("ERROR-CODE=0209","");
    }
    $cgi_parameters =~ s/CODE1/$code1/;
    $cgi_parameters =~ s/CODE2/$code2/;
    $cgi_parameters =~ s/TERM1/$value1/;
    $cgi_parameters =~ s/TERM2/$value2/;
    $cgi_parameters =~ s/BOOL1/$op/;
    $cgi_parameters =~ s/YEAR/$year/;
    return ("",$cgi_parameters);
}

#################################################
sub normalizeValue(){
    my ($code,$in_value) = @_;
    my $in_year = "0";
    $in_value =~ s/\?/\*/;
    if ($code eq "WYR") {
        $in_year = $in_value;
        $in_value = "";
    } else {
        $in_value =~ s/\s/+/g;
    }
    return ($in_value,$in_year);
}
#################################################
sub parseResult {
    my ($in_response) = @_;
    my $counter = 0;
    my $hit = 'of (\d+) total';
    if ($in_response =~ /$hit/){
        $counter = $1;
    }
    if ($counter == 0){
        return 0;
    }
    return ($counter);
}
  
#################################################
sub displayResults {
    my ($l_result,$l_url_query, $proxy_type, $proxy_address, $proxy_credentials) = @_;
    print "SET-RESULT=$l_result\n";
    print "FIND-REQUEST=$l_url_query\n";
    print "SET-ENTRY=1\n";
    if ($proxy_type){
        print "PROXY-TYPE=$proxy_type\n";
        print "PROXY-ADDRESS=$proxy_address\n";
        print "PROXY-CREDENTIALS=$proxy_credentials\n";
    }
}
###################End of Program###############

prola_present

#!/exlibris/metalib/m4_a/product/bin/perl
#####################################################################
#####################-----Prola Present-------#######################
#####################################################################

sub BEGIN {
    unshift (@INC, $ENV{'aleph_ext'});
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005");
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005/i686-linux");
}
use strict;
require "call_httpd_LWP.ML";
use URI::Escape;
my $debug;
my ($proxy_type, $proxy_address, $proxy_credentials);

{
    #init
    my ($error_code, $response, $set_number, $set_entry,
    $max_record, $find_request, $cookie_jar, $url_query);
    my @docx_arr;
    
    $debug = substr(uc($ENV{'VIR_EXT_DEBUG'}),0,1);
    if ($debug ne "Y") {
        $debug = "";
    }
    
    #getting the params
    ($set_number,$set_entry,$max_record,$find_request)=&readParam();
    
    #getting the qeury and cookies
    ($url_query) = &constructQuery($find_request,$set_entry);
    
    #getting the http page
    my $method = "POST";
    #($response) = &call_httpd($method,$url_query, $cookie_jar);
    ($response) = &call_httpd($method,$url_query, $cookie_jar,
    $proxy_type, $proxy_address, $proxy_credentials);
    
    #debug option
    if (0){
        print "$response\n";
    }
    
    #parsing the results
    @docx_arr= &parseResults($response);
    
    #displaying the results
    &displayResults($set_entry, @docx_arr);
    
}
     
######################The Subroutines##########################
sub readParam{

    my ($l_set_number, $l_set_entry,$l_max_record,
    $l_find_request, $line);
    while ($line = <STDIN>) {
        if ($line =~ /^SET-NUMBER/) {
            ($l_set_number) = $line =~ /SET-NUMBER=(.*)/;
        }
        if ($line =~ /^SET-ENTRY/) {
          ($l_set_entry) = $line =~ /SET-ENTRY=(.*)/;
        }
        if ($line =~ /^MAX-RECORD/) {
            ($l_max_record) = $line =~ /MAX-RECORD=(.*)/;
        }
        if ($line =~ /^FIND-REQUEST/) {
        ($l_find_request) = $line =~ /FIND-REQUEST=(.*)/;
        }
        if ($line =~ /^PROXY-TYPE=/){
            ($proxy_type) = $line =~ /PROXY-TYPE=\s?(.*)/;
        }
        if ($line =~ /^PROXY-ADDRESS=/){
            ($proxy_address) = $line =~ /PROXY-ADDRESS=\s?(.*)/;
        }
        if ($line =~ /^PROXY-CREDENTIALS=/){
            ($proxy_credentials) = $line =~ /PROXY-CREDENTIALS=\s?(.*)/;
        }
    }
    
    if ($debug) {
        print STDERR "\n\n".
        "           ---- prola_present Input params ----\n".
        "        set_number   = $l_set_number\n".
        "        set_entry    = $l_set_entry\n".
        "        max_record   = $l_max_record\n".
        "        find_request = $l_find_request\n";
    }
    return ($l_set_number, $l_set_entry,
    $l_max_record, $l_find_request);
}
                                                        
################################################################
sub constructQuery {
    my ($find_request,$set_entry) = @_;
    $set_entry--;
    $find_request .= '&skip='.$set_entry;
    return ($find_request);
}
             
###############################################################
sub parseResults {
    my ($response) = @_;
    my ($date, $year, $full_year, $title);
    my (@present);
    my $prefix = 'http://prola.asp.org';
    $response =~ s/\n//g;
    $response =~ s/\r//g;
    $response =~ s/\t//g;
    my $counter = 0;
    while ($response =~ /<li.*?><span.*?>\s*?\d+\.(.*?)<\/li>/){
        $response = $';
        my $result_line = $1;
        my $docx=[];
        #if ($result_line =~ /href="(.*?)"/){
        #    push (@$docx, "856 \$\$u $prefix$1");
        #    $result_line = $';
        #}
        if ($result_line =~ /"aps-boldfont">(.*?)<\/span>/){
            $title= $1;
            $result_line = $';
            $title =~ s/<.*?>//g;
            push (@$docx, "245 \$\$a $title");
        }
        if ($result_line =~ /abstract<\/span>:(.*?)<\/span><br \/>/i){
            $title= $1;
            $result_line = $';
            $title =~ s/<.*?>//g;
            push (@$docx, "520 \$\$a $title");
        }
        if ($result_line =~ /"aps-mediumfont">(.*?)</){
            push (@$docx, "100 \$\$a $1");
            $result_line = $';
        }
        if ($result_line =~ /"aps-mediumfont">(.*?)</){
            push (@$docx, "773 \$\$t $1");
            $result_line = $';
        }
        if ($result_line =~ /"aps-boldfont">(.*?)</){
            push (@$docx, "773 \$\$u $1");
            $result_line = $';
        }
        if ($result_line =~ /span>, (\d+?)-?(\d+?) \((\d+)\)/){
            if (!$1){
                push (@$docx, "773 \$\$v $2");
            }
            else{
                push (@$docx, "773 \$\$v $1");
                push (@$docx, "773 \$\$w $2");
            }
            push (@$docx, "YR \$\$a $3");
        }
            
        if (scalar @$docx) {
            push(@present,$docx);
        }
    }
    return (@present);
}
    
##############################################################
sub displayResults {
    my ($set_entry, @l_docx_arr) = @_;
    my ($n, $i, $line);
    my $length = scalar(@l_docx_arr);
    print "RECORD-FORMAT=\"PLAIN\"\n";
    if ((scalar @l_docx_arr)==0) {
        print STDOUT "RECORD\n";
        print STDOUT "END-RECORD\n";
    }
    else {
        for ($i=0 ;$i < $length; $i++) {
            my $docx = $l_docx_arr[$i];
            print "RECORD\n";
            foreach $line (@$docx) {
                print STDOUT "$line\n";
            }
            print "END-RECORD\n";
        }
    }
    print "END-OF-DATA\n";
}

#####################END of Program###########################

agricola_cat_present_single

#!/exlibris/metalib/m4_a/product/bin/perl

sub BEGIN {
  unshift (@INC, $ENV{'aleph_ext'});
}

require "call_httpd_timeout";
require "clean_name";
$host_name  = 'www.nal.usda.gov';
$port       = 80;
$protocol   = " HTTP/1.0\n\n";
$command_find  = "GET /cgi-bin/agricola-cat?";
$go_on = 1;
$prefix = "\$\$a";
$rec_begin="RECORD\n";
$rec_end="END-RECORD\n";

while ($line = <STDIN>) {
  if ($line =~ /^FIND-REQUEST/) {
    ($find_request) = $line =~ /FIND-REQUEST=\s?(.*)/;
  }
}

$command = $command_find.$find_request.'&screen=MA'.$protocol;
$command = $command_find.$find_request.$protocol;
#print "command:$command\n";

@response = &call_httpd_timeout($command,20);

&parse_result(@response);
print "END-OF-DATA\n";

## ---------------------------------------------------------
## Subroutine definitions start here

sub parse_result {
# Print results received from the server on the screen

  my (@res) = @_;
  my ($go_on) = 1;
  my ($line, $line_buf,$tmp);
  my %dictionary;
  $dictionary{"Author"} = 100;
  $dictionary{"Title"} = 245;
  $dictionary{"Transl Title"} = 246;
  $dictionary{"Other Title"} = 246;
  $dictionary{"Description"} = 300;
  $dictionary{"Series"} = 490;
  $dictionary{"Note"} = 500;
  $dictionary{"SPECIAL NOTE"} = 500;
  $dictionary{"Abstract"} = 520;
  $dictionary{"LC Subject"} = 650;
  $dictionary{"Other Author"} = 700;
  $dictionary{"Publisher"} = 773;
#  $dictionary{"NAL CALL NO"} = ;
#  $dictionary{"Other Loc"} = ;

  print "$rec_begin\n";
  while ($go_on && ($line = shift(@res))) {
    chomp($line);

    if ($line =~ /.*\<TR\> \<TD\> \<B\>.*$/i) {
       $line_buf = &clean_name($line);
       $line_buf =~ s/\s\s//g;
       ($tmp) = $line_buf =~ /^\s*(.*)$/ ;
       $line = shift(@res);
       chomp($line);
       $line = &clean_name($line);
       $line =~ s/\s\s//g;
       if ($dictionary{$tmp}){
         $result_buf = $dictionary{$tmp}." ".$prefix.$line;
         print "$result_buf\n";
       }
    }
    if ($line =~ /.*\<\/TABLE\>.*$/i) {
       $go_on = 0;
    }
  }
print "$rec_end";
}

imdb_find

#!/exlibris/metalib/m4_e/product/bin/perl
sub BEGIN {
    unshift (@INC, $ENV{'aleph_ext'});
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005");
    unshift (@INC, "$ENV{'aleph_product'}"."/perl/lib/site_perl/5.005/i686-linux");
}
          
use strict;
require "call_httpd_LWP.ML";
use URI::Escape;
my $debug;
my $error_code;
##################the program begins here#######################
{
    #init
    my ($host_name, $method, $find_request, $set_entry, $command);
    my ($response, $result, $cookie_jar, $view, $url_query);
    my ($session_id);
    my ($cgi_name, $cgi_param);
  
    $debug = substr(uc($ENV{'VIR_EXT_DEBUG'}),0,1);
    if ($debug ne "Y"){
        $debug ="";
    }
    #reading the params
    ($error_code, $find_request,
    $host_name, $cgi_name)=&readParam;
  
    if ($debug) {
        print STDERR "\n\nfind_request = $find_request\n";
        print STDERR "error_code       = $error_code\n";
        print STDERR "host_name        = $host_name\n";
    }             
  
    if ($error_code) {
        print STDOUT "$error_code\n";
        exit(0);
    }
  
    #authenticating the data
    ($cookie_jar, $session_id) =
    &authen($host_name, $cgi_name);
  
    #constructing the query
    ($error_code, $command) =
    &constructQuery($find_request);
    #error check
    if ($error_code) {
        print STDOUT "$error_code";
        exit(0);
    }
  
    #constructing the url
    $url_query = $host_name.$cgi_name.$command;
  
    #error check
    if ($debug) {
        print STDERR "\n\ncookie_jar = $cookie_jar \n"
        ."\nurl_query = $url_query\n";
    }
  
    #getting the http page      
    $method = "GET";
    ($response, $cookie_jar) =
    &call_httpd1($method, $url_query, $cookie_jar);
  
    #finding the number of results
    $result = &parseResult($response);
  
    #displaying the results
    &displayResults($result, $url_query, $cookie_jar);
}

#########The Subroutines#########################
#################################################
sub readParam {
    my ($find, $host, $line, $base, $host_name);
    my ($cgi_name, $cgi_param);
    while ($line = <STDIN>) {
        if ($line =~ /^FIND-REQUEST=/){
            ($find) = $line =~ /FIND-REQUEST=\s?(.*)/;
        }
        elsif ($line =~ /^SEARCH-ADDRESS=/){
            ($host) = $line =~ /SEARCH-ADDRESS=\s?(.*)/;
        }
  
        elsif ($line =~ /^BASE=/){
            ($base) = $line =~ /BASE=\s?(.*)/;  
        }
    }
      
    if (length $host == 0){
        print STDOUT "Error (totaltelecom find) : " .
                    "SEARCH-ADDRESS parameter is empty\n";
        return ("ERROR-CODE=0211","","");
    }
    if ($host =~ m{(http://.*?/)(.*?\?)}i){
        $host_name = $1;
        $cgi_name = $2;
    }
  
    return ("", $find, $host_name, $cgi_name);
}
#################################################
sub authen{
    my ($response, $cookie, $cookie_jar) = "";
    my ($host_name, $cgi_name, $cgi_param) = @_;
    my $in_url = $host_name.$cgi_name.$cgi_param;
    my $in_method = "GET";
    ($response, $cookie)=
    &call_httpd1($in_method,$in_url,"");
    $cookie_jar .= $cookie;  
    return ($cookie_jar);
}
#################################################
sub constructQuery {
    my ($find_request) = @_;
    my ($search_type, $search_value, $params) ="";
    my ($session_id, $cgi_params, $new_session_id)= "";
    #the rest of the params
    $params = 's=all&q=ML_SEARCH';
    if ($find_request =~ /\s*(\w{3})\s*=\s*\((.*?)\)\s*/ ){
        $search_value = $2;
    }
    else {
        print STDOUT "Error (find) : ";
        print STDOUT "Could not parse find"
        ."request: find_request\n";
        return ("ERROR-CODE=0209","");
    }
                           
    $params =~ s/ML_SEARCH/$search_value/;
    return ("",$params);
}

#################################################
sub normalizeValue(){
    my ($value1,$bool,$value2) = @_;
    if ($bool eq 'NOT'){
        return ($value1);
    }
    return "$value1+$bool+$value2";
}

#################################################
sub parseResult {
    my ($in_response) = @_;
    my $counter = 0;
    my $hit = 'Displaying (\d+) Result';
    if ($in_response =~ /$hit/){
        $counter = $1;
    }
    if ($counter == 0){
        return 1;
    }
    return ($counter);
}
  
#################################################
sub displayResults {
    my ($result, $url_query, $cookies) = @_;
    print "SET-RESULT=$result\n";
    print "FIND-REQUEST=$url_query\n";
    print "SET-ENTRY=1\n";
}
###################End of Program###############

Notes

I don’t see it documented that also FIND program may return search results in the same format PRESENT does. This is useful in cases where the target service returns records right away when the search is initiated and can save round trips to the service.