Cleaning HTML with Nokogiri (instead of Tidy)

The tidy gem is no longer maintained and has several memory leaks.

Some people have suggested using Nokogiri instead.

I'm currently cleaning the HTML using:

Nokogiri::HTML::DocumentFragment.parse(html).to_html

I've got two issues though:

  • Nokogiri removes the DOCTYPE

  • Is there an easy way to force the cleaned HTML to have html and body tags?

asked by Phrogz, 7 April 2011 at 17:23