NAME
    WebFetch - Perl module to download and save information from the
    Web

SYNOPSIS
      use WebFetch;

DESCRIPTION
    The WebFetch module is a general framework for downloading and
    saving information from the web, and for displaying it on the
    web. It requires another module to inherit from it and fill in
    the specifics of what and how to download. WebFetch provides a
    generalized interface for saving to a file while keeping the
    previous version as a backup. It is intended for periodically
    updated information, fetched by a program run as a cron job.

INSTALLATION
    After unpacking the module sources from the tar file, run

    `perl Makefile.PL'

    `make'

    `make install'

    Or from a CPAN shell you can simply type "`install WebFetch'"
    and it will download, build and install it for you.

    If you need help setting up a separate area to install the
    modules (i.e. if you don't have write permission where perl
    keeps its modules) then see the Perl FAQ.

    To begin using the WebFetch modules, you will need to test your
    fetch operations manually, put them into a crontab, and then use
    server-side include (SSI) or a similar server configuration to
    include the files in a live web page.

  MANUALLY TESTING A FETCH OPERATION

    Select a directory which will be the storage area for files
    created by WebFetch. This is an important administrative
    decision - keep the volatile automatically-generated files in
    their own directory so they'll be separated from manually-
    maintained files.

    Choose the specific WebFetch-derived modules that do the work
    you want. See their particular manual/web pages for details on
    command-line arguments. Test run them first before committing to
    a crontab.

  SETTING UP CRONTAB ENTRIES

    First of all, if you don't have crontab access or don't know
    what crontabs are, contact your site's system administrator(s).
    Only local help will do any good on local-configuration issues.
    No one on the Internet can help. (If you are the administrator
    for your system, see the crontab(1) and crontab(5) manpages and
    nearly any book on Unix system administration.)

    Since the WebFetch command lines are usually very long, you may
    prefer to make one or more scripts as front-ends so your crontab
    entries aren't so huge.
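
    A front-end can be as small as a Perl script that keeps the long
    argument list in one place. The sketch below is only an
    illustration: the `-MModule -e fetch_main' invocation, the
    directory path and the build_command helper are assumptions, so
    check the manual page of the module you actually use for its
    real command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the command line for one fetch job.
# The "-M<module> -e fetch_main" form is an assumption; see the
# module's own manual page for its real invocation.
sub build_command {
    my ( $module, %opt ) = @_;
    my @cmd = ( $^X, "-M$module", '-e', 'fetch_main', '--' );
    for my $name ( sort keys %opt ) {
        push @cmd, "--$name", $opt{$name};
    }
    return @cmd;
}

my @cmd = build_command(
    'WebFetch::Slashdot',
    dir => '/home/www/htdocs/fetch',    # example path, adjust locally
);

# Print the command so it can be checked by hand. In a real
# front-end, replace the print with:  exec @cmd or die "exec: $!";
print join( ' ', @cmd ), "\n";
```

    The crontab entry then only needs to name this one script.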

    Do not run the crontab entries too often - be a good net.citizen
    and do your updates no more often than necessary. Popular sites
    need their users to refrain from making automated requests too
    often because they add up on an enormous scale on the Internet.
    Some sites such as Freshmeat prefer no shorter than hourly
    intervals. Slashdot prefers no shorter than half-hourly
    intervals. When in doubt, ask the site maintainers what they
    prefer.

    (Then again, there are a very few sites like Yahoo and CNN that
    don't mind getting the extra hits if you're going to create
    links to them. Even so, more often than every 20 minutes would
    still be excessive to the biggest web sites.)
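
    One way to enforce a minimum interval, independent of how the
    crontab is written, is to check the age of the previous output
    file before fetching. This is a minimal sketch using only core
    Perl; the due_for_update helper, the file name and the interval
    are illustrative assumptions, not part of WebFetch.

```perl
use strict;
use warnings;

# Return true if $file is missing or was last modified at least
# $min_seconds ago, i.e. a new fetch would not be "too soon".
sub due_for_update {
    my ( $file, $min_seconds ) = @_;
    my @st = stat $file;
    return 1 unless @st;    # no previous output: fetch now
    return ( time - $st[9] ) >= $min_seconds;    # $st[9] is the mtime
}

my $output   = 'fetch/slashdot.html';    # example file name
my $interval = 30 * 60;                  # half-hourly, in seconds
print due_for_update( $output, $interval )
    ? "ok to fetch\n"
    : "too soon, skipping\n";
```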

  SETTING UP SERVER-SIDE INCLUDES

    See the manual for your web server to make sure you have server-
    side include (SSI) enabled for the files that need it. (It's
    wasteful to enable it for all your files so be careful.)

    When using Apache HTTPD, a line like this will include a
    WebFetch-generated file:

    <!--#include file="fetch/slashdot.html"-->

WebFetch FUNCTIONS
    The following function definitions assume `$obj' is a blessed
    reference to a module that is derived from (inherits from)
    WebFetch.

    Do not use the new() function directly from WebFetch.
        *Use the `new' function from a derived class*, not directly
        from WebFetch. The WebFetch module itself is just
        infrastructure for the other modules, and contains none of
        the details needed to complete any specific fetches.

    $obj->init( ... )
        This is called from the `new' function of all WebFetch
        modules. It takes "name" => "value" pairs which are all
        placed verbatim as attributes in `$obj'.
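
        The relationship between a derived module's `new' and
        `init' can be pictured with the self-contained sketch below.
        My::FetchBase is a stand-in that only mimics the documented
        behaviour of init; it is not the real WebFetch code, and
        My::Headlines is a hypothetical derived class.

```perl
use strict;
use warnings;

# Stand-in base class, NOT the real WebFetch code: it only mimics the
# documented behaviour of copying name => value pairs into the object.
package My::FetchBase;

sub init {
    my ( $self, %args ) = @_;
    @{$self}{ keys %args } = values %args;
    return $self;
}

# Hypothetical derived class whose new() calls init().
package My::Headlines;
our @ISA = ('My::FetchBase');

sub new {
    my ( $class, @args ) = @_;
    my $self = bless {}, $class;
    $self->init(@args);
    return $self;
}

package main;

my $obj = My::Headlines->new( dir => '/tmp/fetch', num_links => 5 );
print "$_ = $obj->{$_}\n" for sort keys %$obj;
```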

    $obj->run
        This function is exported by standard WebFetch-derived
        modules as `fetch_main'. It handles command-line processing
        for some standard options, calls the module-specific fetch
        function, and then calls WebFetch's $obj->save function to
        save the contents to one or more files.

        The standard command-line options are as follows:

    --dir *directory*
            (required) the directory in which to write output files

    --group *group*
            (optional) the group ID to set the output file(s) to

    --mode *mode*
            (optional) the file mode (permissions) to set the output
            file(s) to

    --export *export-file*
            (optional) save a portable WebFetch-export copy of the
            fetched info in the file named by this parameter. The
            contents of this file can be read by the
            WebFetch::General module. You may use this to export
            your own news to other WebFetch users. (Exports may be
            explicitly disabled by some WebFetch-derived modules
            simply by omitting the export step from their fetch()
            functions, though it works with all the modules that
            come included with the WebFetch package itself.)

    --ns_export *ns-export-file*
            (optional) save a MyNetscape export copy of the fetched
            info into the file named by this parameter. If this
            optional parameter is used, three additional parameters
            become required: --ns_site_title, --ns_site_link, and
            --ns_site_desc. If you want to include an icon in the
            channel display, you should also use --ns_image_title
            and --ns_image_url. A URL prefix must also be set for
            this to work correctly, which can be supplied via the
            --url_prefix parameter or in the *url-prefix* line of
            the WebFetch::SiteNews news input file.

            For more info see http://my.netscape.com/publish/

            *Note that MyNetscape uses the Resource Description
            Framework (RDF) for its imports, so this format may be
            readable by other sites that accept RDF channel files.
            You should use the ".rdf" suffix on file names that use
            this format.*

    --ns_site_title *site-title*
            (required if --ns_export is used) For exporting to
            MyNetscape, this sets the name of your site. It cannot
            be more than 40 characters.

    --ns_site_link *site-link*
            (required if --ns_export is used) For exporting to
            MyNetscape, this is the full URL MyNetscape will use to
            link to your site. It cannot be more than 500
            characters.

    --ns_site_desc *site-description*
            (required if --ns_export is used) For exporting to
            MyNetscape, this is a short description of your site. It
            cannot be more than 500 characters.

    --ns_image_title *image-title*
            (optional) For exporting to MyNetscape, this is the
            title (alt) text for the icon image.

    --ns_image_url *image-url*
            (optional) For exporting to MyNetscape, this is the URL
            MyNetscape will use for your icon image. If this is
            present, the link on the image will be the same as your
            --ns_site_link parameter.

    --url_prefix *url-prefix*
            (optional) include a URL prefix to use on the saved URLs
            on --ns_export output files. (It could also be used in
            the future by other output formats that need URL
            prefixes.) This is considered optional by WebFetch
            though you will probably need it for MyNetscape to
            properly link to your site. This information can also be
            supplied via the *url-prefix* line of the
            WebFetch::SiteNews news input file. If it is set in the
            WebFetch::SiteNews, it will override the --url_prefix
            command line parameter.

    --font_size *number*
            (optional) choose a font size for generated HTML text.
            This will be used in a font tag so it may be relative,
            like "-1" or "+1".

    --font_face *string*
            (optional) choose a font face for generated HTML text.
            This will be used in a font tag so it may be any
            standard font name or a list. For example, for a sans-
            serif font, use "`Helvetica,Arial,sans-serif'".

    --style *style-name-list*
            (optional) select from one or more of various HTML
            output styles for the generated HTML text. If more than
            one style name is listed, they must be separated by
            commas (no spaces).

        para    use paragraph breaks between lines/links instead of
                unordered lists

        notable usually WebFetch modules generate HTML table-formatted
                output text but this option will disable the use of
                tables

        bullet  use explicit bullet characters (HTML entity #149) and
                line breaks (br) to identify and separate each link

        ul      (default) use an HTML unnumbered list (ul) block for the
                list of links

            The *para*, *bullet* and *ul* styles are mutually
            exclusive. Others may be specified at the same time.

    --quiet (optional) suppress printed warnings for HTTP errors
            *(applies only to modules which use the WebFetch::get()
            function)* in case they are not desired for cron outputs

    --debug (optional) print verbose debugging outputs, only useful for
            developers adding new WebFetch-based modules or
            finding/reporting a bug in an existing module
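
        The `--style' value above is a comma-separated list in which
        at most one of *para*, *bullet* and *ul* may appear. A
        minimal sketch of turning such a list into a hash and
        checking that rule follows; the parse_styles helper is
        hypothetical and is not claimed to be how WebFetch itself
        parses the option.

```perl
use strict;
use warnings;

# Turn "bullet,notable" into { bullet => 1, notable => 1 } and reject
# combinations of the mutually exclusive list styles.
sub parse_styles {
    my ($list) = @_;
    my %style  = map { $_ => 1 } split /,/, $list;
    my @layout = grep { $style{$_} } qw(para bullet ul);
    die "styles @layout are mutually exclusive\n" if @layout > 1;
    return \%style;
}

my $style = parse_styles('bullet,notable');
print join( ',', sort keys %$style ), "\n";
```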

        Modules derived from WebFetch may add their own command-line
        options that WebFetch::run() will use by defining a variable
        called `@Options' in the calling module, using the
        name/value pairs defined in Perl's Getopt::Long module.
        Derived modules can also add to the command-line usage error
        message by defining a variable called `$Usage' with a string
        of the additional parameters, as they should appear in the
        usage message.
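
        As an illustration of the name/value pairs involved, the
        sketch below defines one module-specific option and parses
        it alongside two standard ones with Getopt::Long. The
        `--max_age' option is hypothetical, and the parsing call is
        only a stand-in for whatever WebFetch::run() does
        internally, which this document does not show.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical module-specific option and usage text.
my $max_age;
our @Options = ( 'max_age=i' => \$max_age );
our $Usage   = '[--max_age days]';

# Stand-in for a run() function: parse the standard and the
# module-specific options together from an example argument list.
my ( $dir, $quiet );
my @argv = qw(--dir /tmp/fetch --max_age 7 --quiet);
GetOptionsFromArray(
    \@argv,
    'dir=s' => \$dir,
    'quiet' => \$quiet,
    @Options,
) or die "usage: $0 --dir directory [--quiet] $Usage\n";

print "dir=$dir max_age=$max_age quiet=$quiet\n";
```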

    $obj->fetch
        This function must be provided by each derived module to
        perform the fetch operation specific to that module. It will
        be called from `new()' so you should not call it directly.
        Your fetch function should extract some data from somewhere
        and place it in HTML or other meaningful form in the
        "savable" array.

        Upon entry to this function, $obj must contain the following
        attributes:

    dir     The name of the directory to save in. (If called from the
            command-line, this will already have been provided by
            the required `--dir' parameter.)

    savable a reference to an array where the "savable" items will be
            placed by the $obj->fetch function. (You only need to
            provide an array reference - other WebFetch functions
            can write to it.) Each entry of the savable array is a
            hash reference with the following attributes:

        file    file name to save in

        content scalar w/ entire text or raw content to write to the
                file

        group   (optional) group setting to apply to file

        mode    (optional) file permissions to apply to file

            Contents of savable items may be generated directly by
            derived modules or with WebFetch's `html_gen',
            `html_savable' or `raw_savable' functions. These
            functions will set the group and mode parameters from
            the object's own settings, which in turn could have
            originated from the WebFetch command-line if this was
            called that way.

        Upon exit from this function, the $obj->savable array must
        contain one entry for each file to be saved. More than one
        array entry means more than one file to save. The WebFetch
        infrastructure will save them, retaining backup copies and
        setting file modes as needed.
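
        The shape of the "savable" array can be seen in the small
        sketch below. It performs no network access and uses a
        plain hash in place of a real WebFetch-derived object; the
        file name and HTML content are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical fetch function for a derived module. It only shows
# the shape of the "savable" entries; the optional group and mode
# attributes are left out.
sub fetch {
    my ($self) = @_;
    my $html = "<ul>\n<li>example headline</li>\n</ul>\n";
    push @{ $self->{savable} }, {
        file    => 'headlines.html',    # example file name
        content => $html,
    };
    return;
}

# A plain hash stands in for the object here.
my $obj = { dir => '/tmp/fetch', savable => [] };
fetch($obj);
printf "%d file(s) queued: %s\n", scalar @{ $obj->{savable} },
    $obj->{savable}[0]{file};
```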

    $obj->get
        This WebFetch utility function will get a URL and return a
        reference to a scalar with the retrieved contents. Upon
        entry to this function, `$obj' must contain the following
        attributes:

    url     the URL to get

    quiet   a flag which, when set to a non-zero (true) value,
            suppresses printing of HTTP request errors on STDERR

    $obj->wf_export ( $filename, $fields, $lines, [ $comment, [ $param ]] )
        This WebFetch utility function generates contents for a
        WebFetch export file, which can be placed on a web server to
        be read by other WebFetch sites. The WebFetch::General
        module reads this format. $obj->wf_export has the following
        parameters:

    $filename
            the file to save the WebFetch export contents to; this
            will be placed in the savable record with the contents
            so the save function knows where to write them

    $fields a reference to an array containing a list of the names of
            the data fields (in each entry of the @$lines array)

    $lines  a reference to an array of arrays; the outer array contains
            each line of the exported data; the inner array is a
            list of the fields within that line corresponding in
            index number to the field names in the @$fields array

    $comment
            (optional) a human-readable string comment (probably
            describing the purpose of the format and the definitions
            of the fields used) to be placed at the top of the
            exported file

    $param  (optional) a reference to a hash of global parameters for
            the exported data. This is currently unused but reserved
            for future versions of WebFetch.
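
        The $fields and $lines structures are easiest to see in an
        example. The field names, URLs and the check_lines helper
        below are illustrative assumptions; the final comment shows
        how the data would be handed to wf_export on a real
        WebFetch-derived object, which this sketch does not create.

```perl
use strict;
use warnings;

my $fields = [ 'title', 'url' ];
my $lines  = [
    [ 'First story',  'http://www.example.com/1.html' ],
    [ 'Second story', 'http://www.example.com/2.html' ],
];

# Hypothetical sanity check: every line needs one value per field.
sub check_lines {
    my ( $fields, $lines ) = @_;
    for my $i ( 0 .. $#$lines ) {
        return "line $i has wrong field count"
            unless @{ $lines->[$i] } == @$fields;
    }
    return;    # no problem found
}

my $problem = check_lines( $fields, $lines );
print defined $problem ? "$problem\n" : "data is consistent\n";

# With a real WebFetch-derived object this data would be passed as:
#   $obj->wf_export( 'headlines.wf', $fields, $lines, 'example comment' );
```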

    $obj->ns_export ( $filename, $lines )
        This WebFetch utility function generates contents for a
        MyNetscape export file, which can be placed on a web server
        to be read by the MyNetscape site (my.netscape.com) if you
        create a "channel" for your site at MyNetscape.

        Of the modules included with WebFetch, only
        WebFetch::SiteNews and WebFetch::General call
        $obj->ns_export(). The others will ignore it (because
        they're just obtaining data from other sites themselves).
        You may use $obj->ns_export() in your own modules which
        inherit from WebFetch.

        For more info see http://my.netscape.com/publish/

        $obj->ns_export has the following parameters:

    $filename
            the file to save the WebFetch export contents to; this
            will be placed in the savable record with the contents
            so the save function knows where to write them

    $lines  a reference to an array of arrays; the outer array contains
            each line of the exported data; the inner array is a
            list of two fields within that line consisting of a text
            title string in one entry and a URL in the second entry.

    $site_title
            For exporting to MyNetscape, this sets the name of your
            site. It cannot be more than 40 characters.

    $site_link
            For exporting to MyNetscape, this is the full URL
            MyNetscape will use to link to your site. It cannot be
            more than 500 characters.

    $site_desc
            For exporting to MyNetscape, this is a short description
            of your site. It cannot be more than 500 characters.

    $image_title
            (optional) For exporting to MyNetscape, this is the
            title (alt) text for the icon image.

    $image_url
            (optional) For exporting to MyNetscape, this is the URL
            MyNetscape will use for your icon image. If this is
            present, the link on the image will be the same as your
            $site_link parameter.
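
        Since the site title, link and description have length
        limits, it may be worth checking them before exporting. The
        sketch below does that with a hypothetical ns_problems
        helper; the limits come from the parameter descriptions
        above, everything else is made up for the example.

```perl
use strict;
use warnings;

# Limits taken from the parameter descriptions above.
my %max_len = ( site_title => 40, site_link => 500, site_desc => 500 );

# Hypothetical helper: return the names of required fields that are
# missing or too long.
sub ns_problems {
    my (%site) = @_;
    return grep {
        !defined $site{$_} || length( $site{$_} ) > $max_len{$_}
    } sort keys %max_len;
}

my @bad = ns_problems(
    site_title => 'Example User Group News',
    site_link  => 'http://www.example.com/',
    site_desc  => 'x' x 501,    # deliberately too long
);
print "problem fields: @bad\n";
```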

    $obj->html_gen( $filename, $format_func, $links, [ $style ] )
        This WebFetch utility function generates some common formats
        of HTML output used by WebFetch-derived modules. The HTML
        output is stored in the $obj->{savable} array, and all the
        files in that array can later be saved by the $obj->save
        function. It has the following parameters:

    $filename
            the file name to save the generated contents to; this
            will be placed in the savable record with the contents
            so the save function knows where to write them

    $format_func
            a reference to code that formats each entry in @$links
            into a line of HTML

    $links  a reference to an array of arrays of parameters for
            `&$format_func'; each entry in the outer array is
            contents for a separate HTML line and a separate call to
            `&$format_func'

    $style  (optional) a hash reference with style parameter
            names/values that can modify the behavior of the
            function to use different HTML styles. The recognized
            values are enumerated with WebFetch's *--style* command
            line option. (When they reach this point, they are no
            longer a comma-delimited string - WebFetch or another
            module has parsed them into a hash with the style name
            as the key and the integer 1 for the value.)

        Upon entry to this function, `$obj' must contain the
        following attributes:

    num_links
            number of lines/links to display

    savable reference to an array of hashes which this function will use
            as storage for filenames and contents to save (you only
            need to provide an array reference - the function will
            write to it)

            See $obj->fetch for details on the contents of the
            `savable' parameter

    table_sections
            (optional) if present, this specifies the number of
            table columns to use; the number of links from
            `num_links' will be divided evenly between the columns
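
        The interplay of $format_func, $links and `num_links' might
        look roughly like the sketch below. The loop is only a
        stand-in for what html_gen is described as doing (one call
        to the format function per entry, here with the default ul
        style); the (title, URL) layout of each entry and the
        example URLs are assumptions.

```perl
use strict;
use warnings;

# Format function: receives the parameters of one entry and returns
# one line of HTML. The (title, URL) layout is an assumption.
my $format_func = sub {
    my ( $title, $url ) = @_;
    return qq{<a href="$url">$title</a>};
};

my $links = [
    [ 'First story',  'http://www.example.com/1.html' ],
    [ 'Second story', 'http://www.example.com/2.html' ],
];

# Stand-in for the html_gen loop: one call per entry, limited to
# num_links entries, wrapped in an unnumbered list.
my $num_links = 1;
my $last = $num_links < @$links ? $num_links - 1 : $#$links;

my $html = "<ul>\n";
for my $entry ( @$links[ 0 .. $last ] ) {
    $html .= '<li>' . $format_func->(@$entry) . "</li>\n";
}
$html .= "</ul>\n";
print $html;
```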

    $obj->html_savable( $filename, $content )
        This WebFetch utility function stores pre-generated HTML in
        a new entry in the $obj->{savable} array, for later writing
        to a file. It's basically a simple wrapper that puts HTML
        comments warning that it's machine-generated around the
        provided HTML text. This is generally a good idea so that
        neophyte webmasters (and you know there are a lot of them in
        the world :-) will see the warning before trying to manually
        modify your automatically-generated text.

        See $obj->fetch for details on the contents of the `savable'
        parameter

    $obj->raw_savable( $filename, $content )
        This WebFetch utility function stores any raw content and a
        filename in the $obj->{savable} array, in preparation for
        writing to that file. (The actual save operation may also
        automatically include keeping backup files and setting the
        group and mode of the file.)

        See $obj->fetch for details on the contents of the `savable'
        parameter

    $obj->save
        This WebFetch utility function goes through all the entries
        in the $obj->{savable} array and saves their contents,
        providing several services such as keeping backup copies,
        and setting the group and mode of the file, if requested to
        do so.

        If you call a WebFetch-derived module from the command-line
        run() or fetch_main() functions, this will already be done
        for you. Otherwise you will need to call it after populating
        the `savable' array with one entry per file to save.

        Upon entry to this function, `$obj' must contain the
        following attributes:

    dir     directory to save files in

    savable names and contents for files to save

        See $obj->fetch for details on the contents of the `savable'
        parameter
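
        The idea of saving while keeping the previous version as a
        backup can be sketched with core Perl alone. This is not
        WebFetch's own save code: the save_with_backup function and
        the ".new" and ".bak" suffixes are assumptions chosen for
        the example, and setting the group and mode is left out.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for a save step: write each savable entry to
# "<file>.new", keep any previous version as "<file>.bak", then move
# the new file into place.
sub save_with_backup {
    my ($self) = @_;
    for my $entry ( @{ $self->{savable} } ) {
        my $path = File::Spec->catfile( $self->{dir}, $entry->{file} );
        open my $out, '>', "$path.new" or die "cannot write $path.new: $!";
        print {$out} $entry->{content};
        close $out or die "cannot close $path.new: $!";
        if ( -e $path ) {
            rename $path, "$path.bak" or die "cannot back up $path: $!";
        }
        rename "$path.new", $path or die "cannot install $path: $!";
    }
    return;
}

# Save twice into a temporary directory so a backup gets created.
my $dir = tempdir( CLEANUP => 1 );
my $obj = {
    dir     => $dir,
    savable => [ { file => 'news.html', content => "v1\n" } ],
};
save_with_backup($obj);
$obj->{savable}[0]{content} = "v2\n";
save_with_backup($obj);

my $bak = File::Spec->catfile( $dir, 'news.html.bak' );
print( ( -e $bak ) ? "backup kept\n" : "no backup\n" );
```

        Writing to a separate file first means a failed write never
        leaves a half-written file where the web server expects the
        finished one.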

  WRITING NEW WebFetch-DERIVED MODULES

    The easiest way to make a new WebFetch-derived module is to
    start from the module closest to your fetch operation and modify
    it. Make sure to change all of the following:

    fetch function
        The fetch function is the meat of the operation. Get the
        desired info from a local file or remote site and place the
        contents that need to be saved in the `savable' parameter.

    module name
        Be sure to catch and change them all.

    file names
        The code and documentation may refer to output files by
        name.

    module parameters
        Change the URL, number of links, etc as necessary.

    command-line parameters
        If you need to add command-line parameters, modify both the
        `@Options' and `$Usage' variables. Don't forget to add
        documentation for your command-line options and remove old
        documentation for any you removed.

    authors
        Add yourself as an author if you added any significant
        functionality. But if you used anyone else's code, retain
        the existing author credits in any module you modify to make
        a new one.

    export function
        If it's appropriate for users of your module to be able to
        export its data to other sites, add an export() function.
        Use the one in WebFetch::SiteNews as an example if you need
        to.

    Please consider contributing any useful changes back to the
    WebFetch project at `webfetch-maint@svlug.org'.

AUTHOR
    WebFetch was written by Ian Kluft for the Silicon Valley Linux
    User Group (SVLUG). Send patches, bug reports, suggestions and
    questions to `webfetch-maint@svlug.org'.

    WebFetch is Open Source software distributed via the
    Comprehensive Perl Archive Network (CPAN), a worldwide network
    of Perl web mirror sites. WebFetch may be copied under the same
    terms and licensing as Perl itself.

    A current copy of the source code and documentation may be found
    at http://www.svlug.org/sw/webfetch/

SEE ALSO
    perl(1), WebFetch::CNETnews, WebFetch::CNNsearch, WebFetch::COLA,
    WebFetch::Freshmeat, WebFetch::LinuxToday, WebFetch::ListSubs,
    WebFetch::PerlStruct, WebFetch::SiteNews, WebFetch::Slashdot,
    WebFetch::YahooBiz.