The tidy gem is no longer maintained and has multiple memory leak issues.
Some people suggested using Nokogiri.
I'm currently cleaning the HTML using:
Nokogiri::HTML::DocumentFragment.parse(html).to_html
I've got two issues though:
Nokogiri removes the DOCTYPE
Is there an easy way to force the cleaned HTML to have an html and a body tag?