Consider the following:
from lxml import etree
from StringIO import StringIO
x = """<?xml version="1.0" encoding="utf-8"?>\n<aa>&nbsp;â</aa>"""
p = etree.XMLParser(remove_blank_text=True, resolve_entities=False)
r = etree.parse(StringIO(x), p)
This would fail with:
lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 2, column 11
This is because resolve_entities=False doesn't make the parser ignore entities; it only stops it from resolving them, so an undeclared entity is still an error.
If I use etree.HTMLParser instead, it creates html and body tags and applies a lot of other HTML-specific handling.
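To make the wrapping concrete, here is a minimal sketch of the HTMLParser behaviour. It makes some assumptions that differ from the snippet above: it runs on Python 3, passes the fragment as bytes, and leaves out the XML declaration and the â character. The variable names are only illustrative.

```python
from lxml import etree

# Reduced fragment: only the entity that the XML parser rejects.
fragment = b"<aa>&nbsp;</aa>"

# HTMLParser knows the HTML entity set, so &nbsp; is accepted here.
root = etree.fromstring(fragment, etree.HTMLParser())

# The returned root is not <aa>: the parser has wrapped the fragment.
print(etree.tostring(root))

# The aa element has to be dug out from under the generated wrappers.
aa = root.find("body/aa")
print(repr(aa.text))
```

So the entity does get through, but the aa element ends up nested under tags that were never in the input, which is the side effect I'd like to avoid.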
What's the best way to get a &nbsp;â text child under the aa tag with lxml?