
PERLFAQ9(1)      Perl Programmers Reference Guide     PERLFAQ9(1)



NAME
       perlfaq9 - Networking ($Revision: 1.28 $, $Date:
       2005/12/31 00:54:37 $)

DESCRIPTION
       This section deals with questions related to networking,
       the internet, and a few on the web.

       What is the correct form of response from a CGI script?

       (Alan Flavell answers...)

       The Common Gateway Interface (CGI) specifies a software
       interface between a program ("CGI script") and a web
       server (HTTPD). It is not specific to Perl, and has its
       own FAQs and tutorials, and a Usenet group,
       comp.infosystems.www.authoring.cgi.

       The CGI specification is outlined in an informational RFC:
       http://www.ietf.org/rfc/rfc3875

       Other relevant documentation is listed in:
       http://www.perl.org/CGI_MetaFAQ.html

       These Perl FAQs very selectively cover some CGI issues.
       However, Perl programmers are strongly advised to use the
       CGI.pm module, to take care of the details for them.

       The similarity between CGI response headers (defined in
       the CGI specification) and HTTP response headers (defined
       in the HTTP specification, RFC2616) is intentional, but
       can sometimes be confusing.

       The CGI specification defines two kinds of script: the
       "Parsed Header" script, and the "Non Parsed Header" (NPH)
       script. Check your server documentation to see what it
       supports. "Parsed Header" scripts are simpler in various
       respects. The CGI specification allows any of the usual
       newline representations in the CGI response (it's the
       server's job to create an accurate HTTP response based on
       it). So "\n" written in text mode is technically correct,
       and recommended. NPH scripts are more tricky: they must
       put out a complete and accurate set of HTTP transaction
       response headers; the HTTP specification calls for records
       to be terminated with carriage-return and line-feed, i.e.
       ASCII \015\012 written in binary mode.

       Using CGI.pm gives excellent platform independence,
       including EBCDIC systems. CGI.pm selects an appropriate
       newline representation ($CGI::CRLF) and sets binmode as
       appropriate.
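
       As a minimal sketch of the two response styles described
       above, written without CGI.pm: the helper names
       parsed_header_response and nph_response are hypothetical,
       and the status line and headers are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a "Parsed Header" CGI response.
# Plain "\n" line endings are acceptable here because the
# server builds the real HTTP response from this output.
sub parsed_header_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

# Hypothetical helper: a "Non Parsed Header" (NPH) response.
# The script must supply the full HTTP status line and headers,
# each terminated by CR LF (\015\012).
sub nph_response {
    my ($body) = @_;
    my $CRLF = "\015\012";
    return join($CRLF,
        "HTTP/1.0 200 OK",
        "Content-Type: text/plain",
        "Content-Length: " . length($body),   # assumes $body holds bytes
        "", "") . $body;
}

# A parsed-header script can simply print in text mode.
print parsed_header_response("Hello, world\n");
```

       An NPH script would call binmode(STDOUT) before printing
       the result of nph_response(), so that the \015\012 pairs
       reach the client unchanged.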

       My CGI script runs from the command line but not the
       browser.  (500 Server Error)

       Several things could be wrong.  You can go through the
       "Troubleshooting Perl CGI scripts" guide at

               http://www.perl.org/troubleshooting_CGI.html

       If, after that, you can demonstrate that you've read the
       FAQs and that your problem isn't something simple that can
       be easily answered, you'll probably receive a courteous
       and useful reply to your question if you post it on
       comp.infosystems.www.authoring.cgi (if it's something to
       do with HTTP or the CGI protocols).  Questions that appear
       to be Perl questions but are really CGI ones that are
       posted to comp.lang.perl.misc are not so well received.

       The useful FAQs, related documents, and troubleshooting
       guides are listed in the CGI Meta FAQ:

               http://www.perl.org/CGI_MetaFAQ.html

       How can I get better error messages from a CGI program?

       Use the CGI::Carp module.  It replaces "warn" and "die",
       plus the normal Carp module's "carp", "croak", and
       "confess" functions, with more verbose and safer versions.
       It still sends them to the normal server error log.

           use CGI::Carp;
           warn "This is a complaint";
           die "But this one is serious";

       The following use of CGI::Carp also redirects errors to a
       file of your choice, placed in a BEGIN block to catch
       compile-time warnings as well:

           BEGIN {
               use CGI::Carp qw(carpout);
               open(LOG, ">>/var/local/cgi-logs/mycgi-log")
                   or die "Unable to append to mycgi-log: $!\n";
               carpout(*LOG);
           }

       You can even arrange for fatal errors to go back to the
       client browser, which is nice for your own debugging, but
       might confuse the end user.

           use CGI::Carp qw(fatalsToBrowser);
           die "Bad error here";

       Even if the error happens before you get the HTTP header
       out, the module will try to take care of this to avoid the
       dreaded server 500 errors.  Normal warnings still go out
       to the server error log (or wherever you've sent them with
       "carpout") with the application name and date stamp
       prepended.

       How do I remove HTML from a string?

       The most correct way (albeit not the fastest) is to use
       HTML::Parser from CPAN.  Another mostly correct way is to
       use HTML::FormatText which not only removes HTML but also
       attempts to do a little simple formatting of the resulting
       plain text.

       Many folks attempt a simple-minded regular expression
       approach, like "s/<.*?>//g", but that fails in many cases
       because the tags may continue over line breaks, they may
       contain quoted angle-brackets, or HTML comments may be
       present.  Plus, folks forget to convert entities--like
       "&lt;" for example.

       Here's one "simple-minded" approach, that works for most
       files:

           #!/usr/bin/perl -p0777
           s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

       If you want a more complete solution, see the 3-stage
       striphtml program in
       http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz .
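
       To see where the regular expression approach holds up and
       where it does not, here is a small sketch that wraps the
       substitution above in a hypothetical strip_tags function
       and feeds it three made-up inputs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# A quoted ">" inside an attribute value is handled.
print strip_tags(qq{<IMG SRC="foo.gif" ALT="A > B">picture\n});

# A comment that contains a tag is not: the match ends at the
# first unquoted ">", so part of the comment leaks through.
print strip_tags(qq{<!-- <B>hidden</B> -->after\n});

# Entities are left untouched; decoding them is a separate step.
print strip_tags(qq{1 &lt; 2\n});
```

       The second call leaves "hidden -->after" behind, and the
       third still contains "&lt;", which is why a real parser is
       the safer choice for arbitrary input.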

       Here are some tricky cases that you should think about
       when picking a solution:

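           <IMG SRC = "foo.gif" ALT = "A > B">

           <IMG SRC = "foo.gif"
                ALT = "A > B">

           <!-- <A comment> -->

           <script>if (a<b && a>c)</script>

           <# Just data #>

           <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>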

       If HTML comments include other tags, those solutions would
       also break on text like this:

           <!-- This section commented out.
               <B>You can't see me!</B>
           -->

       How do I extract URLs?

       You can easily extract all sorts of URLs from HTML with
       "HTML::SimpleLinkExtor" which handles anchors, images,
       objects, frames, and many other tags that can contain a
       URL.  If you need anything more complex, you can create
       your own subclass of "HTML::LinkExtor" or "HTML::Parser".
       You might even use "HTML::SimpleLinkExtor" as an example
       for something specifically suited to your needs.

       You can use URI::Find to extract URLs from an arbitrary
       text document.

       Less complete solutions involving regular expressions can
       save you a lot of processing time if you know that the
       input is simple.  One solution from Tom Christiansen runs
       100 times faster than most module-based approaches but
       only extracts URLs from anchors where the first attribute
       is HREF and there are no other attributes.

               #!/usr/bin/perl -n00
               # qxurl - tchrist@perl.com
               print "$2\n" while m{
                   < \s*
                     A \s+ HREF \s* = \s* (["']) (.*?) \1
                   \s* >
               }gsix;
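
       The same pattern can be used inside a program rather than
       as a filter.  The following is only a sketch: the function
       name extract_hrefs and the sample HTML are made up for
       illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the HREF values that the
# pattern above recognizes in a string of HTML.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;
    }
    return @urls;
}

my $html = <<'HTML';
<a href="http://www.example.com/">one</a>
<A HREF='/two.html'>two</A>
<a class="nav" href="/three.html">three</a>
HTML

print "$_\n" for extract_hrefs($html);
```

       Only the first two links are printed.  The third anchor is
       skipped because HREF is not its first attribute, which is
       exactly the limitation described above.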

       How do I download a file from the user's machine?  How do
       I open a file on another machine?

       In this case, download means to use the file upload
       feature of HTML forms.  You allow the web surfer to specify a
       file to send to your web server.  To you it looks like a
       download, and to the user it looks like an upload.  No
       matter what you call it, you do it with what's known as
       multipart/form-data encoding.  The CGI.pm module (which
       comes with Perl as part of the Standard Library) supports
       this in the start_multipart_form() method, which isn't the
       same as the startform() method.

       See the section in the CGI.pm documentation on file
       uploads for code examples and details.

       How do I make a pop-up menu in HTML?

       Use the