html-snippet doesn't work with Jsoup parser #90

Open
tomoharu-fujita opened this issue Jan 15, 2014 · 5 comments

@tomoharu-fujita

net.cgrand.enlive-html/html-snippet passes a java.io.StringReader instance to html-resource, but Jsoup/parse has no overload that accepts a Reader.

@dhruvbhatia

Same issue. Here's a simple example that breaks:

(enlive/set-ns-parser! net.cgrand.jsoup/parser)
(enlive/html-resource (java.io.StringReader. "<h1>Hi, cgrand!</h1>") (enlive/ns-options))

The above throws this:

CompilerException java.lang.IllegalArgumentException: No matching method found: parse

@jcromartie

This is seriously ruining my day today.

@dhruvbhatia

Here's a monkey-patch workaround I use. I basically had to redefine a bunch of core functions and then modify the parser fn to suit my needs:

(ns my.namespace
  (:import [org.jsoup Jsoup]
           [org.jsoup.nodes Attribute Attributes Comment DataNode Document
                            DocumentType Element Node TextNode XmlDeclaration]
           [org.jsoup.parser Parser Tag]))

(def ^:private ->key (comp keyword #(.. % toString toLowerCase)))

(defprotocol IEnlive
  (->nodes [d] "Convert object into Enlive node(s)."))

(extend-protocol IEnlive
  Attribute
  (->nodes [a] [(->key (.getKey a)) (.getValue a)])

  Attributes
  (->nodes [as] (not-empty (into {} (map ->nodes as))))

  Comment
  (->nodes [c] {:type :comment :data (.getData c)})

  DataNode
  (->nodes [dn] (str dn))

  Document
  (->nodes [d] (not-empty (map ->nodes (.childNodes d))))

  DocumentType
  (->nodes [dtd] {:type :dtd :data ((juxt :name :publicid :systemid) (->nodes (.attributes dtd)))})

  Element
  (->nodes [e] {:tag     (->key (.tagName e))
                :attrs   (->nodes (.attributes e))
                :content (not-empty (map ->nodes (.childNodes e)))})

  TextNode
  (->nodes [tn] (.getWholeText tn))

  nil
  (->nodes [_] nil))

;; Redefined parser fn to support jsoup
(defn parser
  "Parse a HTML document stream into Enlive nodes using JSoup."
  [stream]
  (with-open [^java.io.Closeable stream stream]
    (->nodes (Jsoup/parse stream "ISO-8859-1" ""))))

;; Then this will work: html-resource gets an InputStream and the custom parser
(net.cgrand.enlive-html/html-resource
  (java.io.ByteArrayInputStream. (.getBytes "<h1>Hi, cgrand!</h1>" "ISO-8859-1"))
  {:parser parser})
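
The call above builds the InputStream by hand from a string. When the source is already a Reader (which is what html-snippet hands to html-resource), a small bridge along these lines might help. This is only a sketch: reader->input-stream and the example.reader-bridge namespace are hypothetical names, not part of Enlive or jsoup, and it assumes the whole document fits in memory and that the charset passed in matches the one the parser decodes with.

```clojure
(ns example.reader-bridge
  (:import [java.io ByteArrayInputStream Reader]))

(defn reader->input-stream
  "Reads `reader` to the end and returns its text as a ByteArrayInputStream
  encoded with `charset`. Hypothetical helper, not part of Enlive or jsoup."
  [^Reader reader ^String charset]
  ;; slurp consumes and closes the reader, so it cannot be reused afterwards.
  (ByteArrayInputStream. (.getBytes ^String (slurp reader) charset)))

;; Usage with the custom parser from the workaround (it decodes as ISO-8859-1):
(comment
  (net.cgrand.enlive-html/html-resource
    (reader->input-stream (java.io.StringReader. "<h1>Hi, cgrand!</h1>") "ISO-8859-1")
    {:parser parser}))
```

Note that encoding with ISO-8859-1 replaces any character outside Latin-1, so a UTF-8 round trip would be safer if the parser fn is changed to decode as UTF-8 as well.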

@fdserr
Collaborator

fdserr commented Jul 10, 2015

Added to wiki, many thanks @dhruvbhatia !

@fdserr fdserr closed this as completed Jul 10, 2015
@cgrand
Owner

cgrand commented Sep 2, 2015

@JustinIAC, net.cgrand.jsoup should be fixed for handling readers before making JSoup the default

@cgrand cgrand reopened this Sep 2, 2015
@fdserr fdserr self-assigned this Sep 2, 2015
@fdserr fdserr added this to the 1.1.7 milestone Sep 2, 2015
@fdserr fdserr removed this from the 1.1.7 milestone Sep 2, 2015
fdserr pushed a commit to fdserr/enlive that referenced this issue Oct 1, 2015
- upgrade jsoup to 1.8.3
- test html-resource with jsoup (jsoup always adds a <head>, tagsoup does not)
- add jsoup/tagsoup fixtures to existing tests
- pass tests