org.apache.lucene.ant
Class HtmlDocument
public class HtmlDocument
The HtmlDocument class creates a Lucene Document from an HTML document.
It does this by using the JTidy package. It can take input
from a File or an InputStream.
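To make the title and body accessors concrete, the following is a minimal sketch of the same idea: read HTML from an InputStream and collect the title text and the body text separately. It is an illustration only. It uses the JDK's built-in Swing HTML parser rather than JTidy, and the class name HtmlTextSketch, its nested Collector, and the UTF-8 decoding are assumptions made for this example, not part of HtmlDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical stand-in: NOT org.apache.lucene.ant.HtmlDocument and NOT JTidy-based.
final class HtmlTextSketch {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder body = new StringBuilder();

    HtmlTextSketch(InputStream is) throws IOException {
        // Assumption: the input is UTF-8; the real class may handle encodings differently.
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        // 'true' tells the parser to ignore any charset declared inside the HTML.
        new ParserDelegator().parse(reader, new Collector(), true);
    }

    // Text found inside <title>, trimmed.
    String getTitle() {
        return title.toString().trim();
    }

    // Text found inside <body>, with text chunks joined by single spaces.
    String getBody() {
        return body.toString().trim();
    }

    // Tracks whether the parser is inside <title> or <body> and routes text accordingly.
    private final class Collector extends HTMLEditorKit.ParserCallback {
        private boolean inTitle;
        private boolean inBody;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = true;
            if (tag == HTML.Tag.BODY) inBody = true;
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = false;
            if (tag == HTML.Tag.BODY) inBody = false;
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (inTitle) {
                title.append(data);
            } else if (inBody) {
                if (body.length() > 0) body.append(' ');
                body.append(data);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello world</p><p>Second paragraph</p></body></html>";
        HtmlTextSketch doc = new HtmlTextSketch(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println("title: " + doc.getTitle());
        System.out.println("body:  " + doc.getBody());
    }
}
```

With the real class, the equivalent steps would presumably be to construct an HtmlDocument from a File or InputStream and then call getTitle() and getBody(), or to use the static methods when a Lucene Document is wanted directly.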
static Document | Document(File file) - Creates a Lucene Document from a File.
String | getBody() - Gets the bodyText attribute of the HtmlDocument object.
static Document | getDocument(InputStream is) - Creates a Lucene Document from an InputStream.
String | getTitle() - Gets the title attribute of the HtmlDocument object.
static void | main(String[] args) - Runs HtmlDocument on the files specified on the command line.
HtmlDocument
public HtmlDocument(File file)
throws IOException
Constructs an HtmlDocument from a File.
file - the File containing the HTML to parse
HtmlDocument
public HtmlDocument(InputStream is)
Constructs an HtmlDocument from an InputStream.
is - the InputStream containing the HTML
Document
public static Document Document(File file)
throws IOException
Creates a Lucene Document from a File.
getBody
public String getBody()
Gets the bodyText attribute of the HtmlDocument object.
getDocument
public static Document getDocument(InputStream is)
Creates a Lucene Document from an InputStream.
getTitle
public String getTitle()
Gets the title attribute of the HtmlDocument object.
main
public static void main(String[] args)
throws Exception
Runs HtmlDocument on the files specified on the command line.
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.