org.apache.lucene.benchmark.byTask.feeds

Interface HTMLParser

public interface HTMLParser

HTML parsing interface for test purposes.
Method Summary
DocData parse(String name, Date date, Reader reader, DateFormat dateFormat)
Parse the input Reader and return DocData.
DocData parse(String name, Date date, StringBuffer inputText, DateFormat dateFormat)
Parse the inputText and return DocData.

Method Detail

parse

public DocData parse(String name, Date date, Reader reader, DateFormat dateFormat)
Parse the input Reader and return DocData. If a name or date is provided, it is used for the result; otherwise, an attempt is made to set it from the parsed data.

Parameters:
dateFormat - date formatter to use for extracting the date.
name - name of the result doc data. If null, attempt to set it from the parsed data.
date - date of the result doc data. If null, attempt to set it from the parsed data.
reader - reader of the HTML text to parse.

Returns: Parsed doc data.

Throws: IOException, InterruptedException
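
The name and date fallback described above can be illustrated with a minimal sketch. It does not use the Lucene classes: ParsedDoc is a hypothetical stand-in for DocData, and taking the name from a <title> element and the date from a <meta name="date"> element are assumptions made only for this example, not the behavior of any particular HTMLParser implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameDateFallbackSketch {

    /** Hypothetical stand-in for DocData; holds only what this sketch needs. */
    static final class ParsedDoc {
        final String name;
        final Date date;
        final String body;

        ParsedDoc(String name, Date date, String body) {
            this.name = name;
            this.date = date;
            this.body = body;
        }
    }

    // Assumed sources for the fallback values (illustration only).
    private static final Pattern TITLE = Pattern.compile(
            "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern META_DATE = Pattern.compile(
            "<meta\\s+name=\"date\"\\s+content=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    static ParsedDoc parse(String name, Date date, Reader reader, DateFormat dateFormat)
            throws IOException {
        // Read the whole input into memory.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String html = sb.toString();

        // A provided name wins; only a null name is filled from the parsed data.
        if (name == null) {
            Matcher m = TITLE.matcher(html);
            if (m.find()) {
                name = m.group(1).trim();
            }
        }

        // Same rule for the date, using the caller's DateFormat to parse it.
        if (date == null && dateFormat != null) {
            Matcher m = META_DATE.matcher(html);
            if (m.find()) {
                try {
                    date = dateFormat.parse(m.group(1).trim());
                } catch (ParseException e) {
                    // Unparseable date: leave it null in this sketch.
                }
            }
        }

        // Crude tag stripping, good enough for a demonstration.
        String body = TAGS.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
        return new ParsedDoc(name, date, body);
    }

    public static void main(String[] args) throws IOException {
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        String html = "<html><head><title>Hello</title>"
                + "<meta name=\"date\" content=\"2007-01-15\"></head>"
                + "<body><p>Some text</p></body></html>";

        ParsedDoc fromData = parse(null, null, new StringReader(html), df);
        System.out.println(fromData.name + " / " + df.format(fromData.date));

        ParsedDoc fromCaller = parse("given", new Date(0L), new StringReader(html), df);
        System.out.println(fromCaller.name + " / " + fromCaller.date.getTime());
    }
}
```

In the first call both values come from the markup; in the second call the caller-supplied name and date are kept unchanged.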

parse

public DocData parse(String name, Date date, StringBuffer inputText, DateFormat dateFormat)
Parse the inputText and return DocData.

Parameters:
inputText - the HTML text to parse.

See Also: HTMLParser
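
One plausible way for an implementation to relate the two overloads is to have the StringBuffer variant wrap its input in a StringReader and delegate to the Reader variant. The sketch below shows that pattern with a hypothetical TextParser interface that returns a plain String instead of DocData; it is an assumption about how an implementer might structure the code, not a description of the Lucene implementations.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DateFormat;
import java.util.Date;

public class OverloadDelegationSketch {

    /** Hypothetical interface mirroring the two parse signatures; returns String, not DocData. */
    interface TextParser {
        String parse(String name, Date date, Reader reader, DateFormat dateFormat)
                throws IOException;

        // The StringBuffer overload delegates to the Reader overload,
        // so an implementer only writes the parsing logic once.
        default String parse(String name, Date date, StringBuffer inputText,
                             DateFormat dateFormat) throws IOException {
            return parse(name, date, new StringReader(inputText.toString()), dateFormat);
        }
    }

    /** Trivial implementation for the demo: returns the input text unchanged. */
    static final TextParser READ_ALL = (name, date, reader, dateFormat) -> {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    };

    public static void main(String[] args) throws IOException {
        StringBuffer input = new StringBuffer("<p>hi</p>");
        // Resolves to the StringBuffer overload, which delegates to the Reader one.
        System.out.println(READ_ALL.parse(null, null, input, null));
    }
}
```

Default interface methods require Java 8 or later; on older Java versions the same delegation could live in an abstract base class.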

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.