Handle wrongly encoded character in Python unicode string

Question

I am dealing with unicode strings returned by the python-lastfm library.

I assume that somewhere along the way the library gets the encoding wrong and returns a unicode string that may contain invalid characters.

For example, the original string I am expecting in the variable a is "Glück":

>>> a
u'Gl\xfcck'
>>> print a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)

\xfc is the escape for the value 252, which corresponds to the Latin-1 encoding of "ü". Somehow this gets embedded in the unicode string in a way Python can't handle on its own.

How do I convert this back to a normal or unicode string that contains the original "Glück"? I tried playing around with the decode/encode methods, but got either a UnicodeEncodeError or a string containing the sequence \xfc.
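
For reference, here is a minimal sketch of the two situations that can produce this symptom. It is written in Python 3 syntax (in Python 2 the literals would carry a u prefix), and the names a, mojibake and repaired are only illustrative. It assumes the traceback comes from printing to an ASCII-only terminal, in which case the string itself may already be correct, because \xfc inside a unicode string is simply how repr shows U+00FC ("ü"). The second half covers the hypothetical case where the library really did decode UTF-8 bytes as Latin-1.

```python
# Case 1: the string from the question. The escape \xfc is the code point
# U+00FC, so the value is already the expected text.
a = 'Gl\xfcck'
is_intact = (a == 'Glück')

# Printing to an ASCII-only stdout amounts to an implicit encode like this,
# which is what raises UnicodeEncodeError.
try:
    a.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to an encoding that contains "ü" succeeds.
utf8_bytes = a.encode('utf-8')
latin1_bytes = a.encode('latin-1')

# Case 2 (hypothetical): genuinely mis-decoded text, where UTF-8 bytes were
# read as Latin-1. Reversing the wrong step recovers the original.
mojibake = 'Gl\xc3\xbcck'
repaired = mojibake.encode('latin-1').decode('utf-8')

print(is_intact, ascii_ok, repaired == a)
```

If the first case applies, nothing needs converting; the usual remedy is to encode explicitly when writing to an output that cannot represent the character, or to configure the terminal for UTF-8.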

python string unicode character-encoding

asked by strfry on 22 April 2011 at 23:18
