net.datacrow.util
Class HtmlUtils

java.lang.Object
  extended by net.datacrow.util.HtmlUtils

public class HtmlUtils
extends java.lang.Object


Constructor Summary
HtmlUtils()
Method Summary
static org.w3c.dom.Document getDocument(java.lang.String html)
static org.w3c.dom.Document getDocument(java.net.URL url, int cleanupLevel)
static org.w3c.dom.Document getDocument(java.net.URL url, java.lang.String charset)
static org.w3c.dom.Document getDocument(java.net.URL url, java.lang.String charset, int cleanupLevel)
static java.lang.String getHtmlCleaned(java.net.URL url, java.lang.String charset, int cleanupLevel)
static java.lang.String toPlainText(java.lang.String html)
static java.lang.String toPlainText(java.lang.String html, java.lang.String charset)
          Cleans the string of any unwanted characters.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlUtils

public HtmlUtils()
Method Detail

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               int cleanupLevel)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               java.lang.String charset)
                                        throws java.lang.Exception
Throws:
java.lang.Exception
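The `charset` parameter presumably names the encoding used to decode the bytes fetched from `url`. The sketch below shows only that decoding step, using the standard library; the class `CharsetFetchSketch` and its method `readUrl` are hypothetical and not part of DataCrow, and whether `HtmlUtils` reads the URL this way is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical helper; NOT part of net.datacrow.util.HtmlUtils.
class CharsetFetchSketch {
    // Reads the whole resource behind `url`, decoding its bytes with the named charset.
    static String readUrl(URL url, String charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Charset.forName throws UnsupportedCharsetException for unknown names.
        try (Reader in = new BufferedReader(
                new InputStreamReader(url.openStream(), Charset.forName(charset)))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Passing the wrong charset silently garbles non-ASCII text: a UTF-8 "é" decoded as ISO-8859-1 comes back as "Ã©".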

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               java.lang.String charset,
                                               int cleanupLevel)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

getDocument

public static org.w3c.dom.Document getDocument(java.lang.String html)
                                        throws java.lang.Exception
Throws:
java.lang.Exception
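Callers receive a standard `org.w3c.dom.Document`, so the ordinary DOM API applies to the result. The sketch below builds a `Document` with the JDK's `DocumentBuilder` as a stand-in for `getDocument(String)`; unlike the cleanup `HtmlUtils` presumably performs, this stand-in accepts only well-formed markup. The class `DomWalkSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in; NOT how net.datacrow.util.HtmlUtils builds its Document.
class DomWalkSketch {
    // Parses well-formed (X)HTML into a DOM Document; malformed HTML will throw.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Collects the href attribute of every <a> element, in document order.
    static List<String> links(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            hrefs.add(a.getAttribute("href")); // "" if the attribute is absent
        }
        return hrefs;
    }
}
```

Because the real `getDocument` overloads return the same `org.w3c.dom.Document` interface, a helper such as `links` would accept their results unchanged.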

getHtmlCleaned

public static java.lang.String getHtmlCleaned(java.net.URL url,
                                              java.lang.String charset,
                                              int cleanupLevel)
                                       throws java.lang.Exception
Throws:
java.lang.Exception
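Every `getDocument` and `getHtmlCleaned` overload declares `throws java.lang.Exception`, so a caller must catch or rethrow the broad checked type. The sketch below shows one way to contain that at the call site; the class `Attempt` and its method `orDefault` are hypothetical, not part of DataCrow.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper for calls declared "throws java.lang.Exception",
// such as the getDocument and getHtmlCleaned overloads of HtmlUtils.
class Attempt {
    private static final Logger LOG = Logger.getLogger(Attempt.class.getName());

    // Runs the call; on any checked or runtime exception, logs it and returns the fallback.
    static <T> T orDefault(Callable<T> call, T fallback) {
        try {
            return call.call();
        } catch (Exception e) {
            LOG.log(Level.WARNING, "call failed, using fallback", e);
            return fallback;
        }
    }
}
```

A call might then read `Attempt.orDefault(() -> HtmlUtils.getHtmlCleaned(url, "UTF-8", level), "")`, where `level` stands for whatever cleanup level the caller needs; the valid `cleanupLevel` values are not documented on this page.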

toPlainText

public static java.lang.String toPlainText(java.lang.String html)

toPlainText

public static java.lang.String toPlainText(java.lang.String html,
                                           java.lang.String charset)
Cleans the string of any unwanted characters.

Parameters:
html - the HTML string to clean
charset - the name of the character set
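The description does not say which characters count as "unwanted". The sketch below assumes that means markup tags, a handful of common entities, and redundant whitespace; it omits the `charset` parameter, whose role is not documented here. `PlainTextSketch` is hypothetical and is not DataCrow's implementation; a regex approach is only adequate for simple fragments, not arbitrary HTML.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of HTML-to-plain-text conversion; NOT DataCrow's implementation.
class PlainTextSketch {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        if (html == null) return "";
        String text = TAG.matcher(html).replaceAll(" "); // each tag becomes a space
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // decode &amp; last so "&amp;lt;" yields "&lt;", not "<"
        return WHITESPACE.matcher(text).replaceAll(" ").trim(); // collapse runs, trim ends
    }
}
```

Replacing each tag with a space keeps words from adjacent blocks apart (`<p>a</p><p>b</p>` gives "a b" rather than "ab"), at the cost of splitting words broken by inline tags such as `wor<b>ld</b>`.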