net.datacrow.util
Class HtmlUtils

java.lang.Object
  extended by net.datacrow.util.HtmlUtils

public class HtmlUtils
extends java.lang.Object


Constructor Summary
HtmlUtils()
           
 
Method Summary
static org.w3c.dom.Document getDocument(java.lang.String html)
           
static org.w3c.dom.Document getDocument(java.net.URL url, boolean cleanup)
           
static org.w3c.dom.Document getDocument(java.net.URL url, java.lang.String charset)
           
static org.w3c.dom.Document getDocument(java.net.URL url, java.lang.String charset, boolean cleanup)
           
static java.lang.String getHtmlCleaned(java.net.URL url, java.lang.String charset, boolean cleanup)
           
static java.lang.String toPlainText(java.lang.String html)
           
static java.lang.String toPlainText(java.lang.String html, java.lang.String charset)
          Cleans the string of any unwanted characters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlUtils

public HtmlUtils()

Method Detail

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               boolean cleanup)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               java.lang.String charset)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

getDocument

public static org.w3c.dom.Document getDocument(java.net.URL url,
                                               java.lang.String charset,
                                               boolean cleanup)
                                        throws java.lang.Exception
Throws:
java.lang.Exception

getDocument

public static org.w3c.dom.Document getDocument(java.lang.String html)
                                        throws java.lang.Exception
Throws:
java.lang.Exception
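
The returned org.w3c.dom.Document can be queried with the standard DOM API. This page does not say which parser HtmlUtils uses, so the sketch below is only a stand-in: the class DomSketch and its parse method are hypothetical, they rely on the JDK's javax.xml.parsers, and they accept only well-formed XHTML (real-world HTML would likely need the cleanup step). It shows how a caller might read values out of such a Document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for HtmlUtils.getDocument(String); not the Data Crow code.
class DomSketch {

    // Parses well-formed XHTML into a DOM tree using the JDK parser.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<html><head><title>Data Crow</title></head>"
            + "<body><p>Hello</p></body></html>");
        // Read the text of the first <title> element.
        String title = doc.getElementsByTagName("title").item(0).getTextContent();
        System.out.println(title); // prints "Data Crow"
    }
}
```

Because every getDocument overload declares throws java.lang.Exception, callers have to catch or propagate it, as main does here.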

getHtmlCleaned

public static java.lang.String getHtmlCleaned(java.net.URL url,
                                              java.lang.String charset,
                                              boolean cleanup)
                                       throws java.lang.Exception
Throws:
java.lang.Exception
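
The charset parameter is not described on this page. A plausible reading, and only an assumption, is that it names the character encoding used to decode the bytes fetched from the URL. The sketch below illustrates that idea with the JDK alone; CharsetFetchSketch, decode, and fetch are hypothetical names and do not reproduce the cleanup behaviour of getHtmlCleaned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical illustration of decoding fetched bytes with a named charset.
class CharsetFetchSketch {

    // Reads the whole stream and decodes it with the given charset name.
    // Requires Java 9 or later for InputStream.readAllBytes().
    static String decode(InputStream in, String charset) throws IOException {
        return new String(in.readAllBytes(), Charset.forName(charset));
    }

    // Opens the URL and decodes its content; the stream is closed afterwards.
    static String fetch(URL url, String charset) throws IOException {
        try (InputStream in = url.openStream()) {
            return decode(in, charset);
        }
    }
}
```

Passing the wrong charset name does not fail; it silently yields garbled characters, which is one reason an explicit charset argument is useful.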

toPlainText

public static java.lang.String toPlainText(java.lang.String html)

toPlainText

public static java.lang.String toPlainText(java.lang.String html,
                                           java.lang.String charset)
Clean the string of any unwanted characters

Parameters:
html - string to clean
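
What counts as an "unwanted character" is not specified here. As a rough illustration of converting HTML to plain text, the sketch below removes tags, decodes a handful of common entities, and collapses whitespace. PlainTextSketch and stripTags are hypothetical names; this is not the Data Crow implementation, and it ignores the charset argument, whose role is undocumented.

```java
// Hypothetical stand-in for an HTML-to-plain-text conversion.
class PlainTextSketch {

    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        // Drop script and style elements together with their content.
        String text = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        // Replace every remaining tag with a space.
        text = text.replaceAll("(?s)<[^>]+>", " ");
        // Decode a few common entities; &amp; goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace and trim the ends.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // prints "Hello world"
    }
}
```

A regex-based approach like this is fragile on malformed markup; a real HTML parser is the safer choice when the input is untrusted.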