There are a few things to consider here.
A \u2018 sequence can only show up as part of the representation (repr) of a unicode string in Python, for example if you write:
>>> text = u'‘'
>>> print repr(text)
u'\u2018'
Now, if you simply want to print the unicode string nicely, just use unicode's encode method:
>>> text = u'I don\u2018t like this'
>>> print text.encode('utf-8')
I don‘t like this
To make sure that every line read from any file is unicode, use the codecs.open function instead of plain open; it lets you specify the file's encoding:
>>> import codecs
>>> f1 = codecs.open(file1, "r", "utf-8")
>>> text = f1.read()
>>> print type(text)
<type 'unicode'>
>>> print text.encode('utf-8')
I don‘t like this
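If the codecs module is not a hard requirement, io.open (available since Python 2.6, and the same function as the built-in open in Python 3) also accepts an encoding argument. A minimal sketch, assuming a throwaway temporary file stands in for file1 (the file and its contents are made up for illustration):

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for file1: write raw UTF-8 bytes to it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(u"I don\u2018t like this\n".encode("utf-8"))

# io.open decodes while reading, so the string that comes back is unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)
print(repr(text))
```

The repr shows the escaped \u2018 form on Python 2 and the literal character on Python 3; in both cases the value is decoded text, not a byte string.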
Yes, that's expected.
Think about it: what else can the database do? Suppose you increment the column and then use that value as a foreign key in other inserts within the same transaction. If someone else commits while you are doing that, they cannot use your value, so if you then roll back, you get a gap.
Sequences in databases like Oracle work much the same way. Once a particular value has been requested, it doesn't matter whether it is then committed: it will never be reused. Sequences are also only loosely, not absolutely, ordered.
It's pretty much expected behaviour. Without it, the database would have to wait for each transaction that has inserted a record to complete before assigning a new id to the next insert.
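The trade-off can be illustrated with a toy model. This is only a sketch: ToySequence is a hypothetical in-memory counter, not a real database API, and the "transactions" are simulated by hand. It shows that handing out ids without waiting for commits necessarily leaves a gap when a transaction rolls back.

```python
import threading


class ToySequence(object):
    """Toy stand-in for an auto-increment counter (not a real database API)."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def next_value(self):
        # The lock is held only long enough to hand out a number,
        # never for the lifetime of the calling transaction.
        with self._lock:
            value = self._next
            self._next += 1
            return value


seq = ToySequence()
committed = []

a = seq.next_value()   # transaction A reserves an id
b = seq.next_value()   # transaction B reserves an id while A is still open
committed.append(b)    # B commits
# A rolls back here: its id is simply discarded and never handed out again
c = seq.next_value()   # transaction C reserves an id
committed.append(c)    # C commits

print(committed)       # prints [2, 3]
```

The value 1 is missing from the committed ids. Avoiding that would mean making B wait until A had finished, which is exactly the serialization described above.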
Yes, this is expected behaviour. This documentation explains it very well.
Beginning with MySQL 5.1.22, there are three different lock modes (selected with innodb_autoinc_lock_mode) that control how concurrent transactions get auto-increment values. All three still leave gaps for rolled-back transactions: auto-increment values used by a rolled-back transaction are thrown away.