What's the best way to handle &nbsp;-like entities in XML documents with lxml?

Consider the following:

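from io import BytesIO

from lxml import etree

# Bytes input, because the document carries its own encoding declaration.
x = b"""<?xml version="1.0" encoding="utf-8"?>\n<aa>&nbsp;&acirc;</aa>"""
p = etree.XMLParser(remove_blank_text=True, resolve_entities=False)
# Raises XMLSyntaxError: nbsp and acirc are not declared anywhere.
r = etree.parse(BytesIO(x), p)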

This would fail with:
lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 2, column 11

This is because resolve_entities=False doesn't make the parser ignore entities; it only stops the parser from expanding them, so an undeclared entity is still an error.
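
As a minimal sketch of that distinction (not necessarily the best fix for the problem), the snippet below declares both entities in an internal DTD subset. The DOCTYPE block and the character values chosen for nbsp and acirc are assumptions added for illustration; they are not part of the original document.

```python
from io import BytesIO

from lxml import etree

# Assumption: we are free to prepend a DOCTYPE that declares the entities.
doc = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE aa [
  <!ENTITY nbsp "&#160;">
  <!ENTITY acirc "&#226;">
]>
<aa>&nbsp;&acirc;</aa>"""

# Default parser settings: declared internal entities are expanded into text.
resolved_root = etree.parse(BytesIO(doc)).getroot()
resolved_text = resolved_root.text

# resolve_entities=False: the same declared entities parse without error,
# but are kept as entity reference nodes instead of being expanded.
keep_parser = etree.XMLParser(resolve_entities=False)
unresolved_root = etree.parse(BytesIO(doc), keep_parser).getroot()
unresolved_xml = etree.tostring(unresolved_root)
```

Once the entities are declared, resolved_text should hold the non-breaking space followed by â, while the second parse only differs in leaving the references unexpanded.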

If I use etree.HTMLParser instead, it wraps the content in html and body tags and applies a lot of other HTML-specific handling.
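
A small sketch of the HTMLParser behaviour described here, using the same aa fragment without the XML declaration (dropping the declaration is an assumption made to keep the example short):

```python
from lxml import etree

# Parse the fragment with the HTML parser instead of the XML parser.
fragment = "<aa>&nbsp;&acirc;</aa>"
html_root = etree.fromstring(fragment, etree.HTMLParser())

# The HTML parser knows the named entities, but it also adds wrapper elements.
root_tag = html_root.tag          # the synthetic <html> element
first_child_tag = html_root[0].tag  # the synthetic <body> element

# The original element has to be looked up inside the wrappers.
aa = html_root.find(".//aa")
aa_text = aa.text
```

The entities resolve, but the aa element ends up nested inside the generated wrappers rather than being the document root.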

What's the best way to get a "\u00a0â" text child (a non-breaking space followed by â) under the aa tag with lxml?

asked by Prody on 2 March 2011 at 16:14