Converting to Unicode for an application that processes text files

9
asked by Argalatyr, 17 June 2009 at 03:24

4 answers

There is no such thing as AnsiString output - every text file has a character encoding. The moment your files contain characters outside the ASCII range, you have to think about encoding, as even loading those files in different countries will produce different results - unless you happen to be using a Unicode encoding.

If you load a text file, you need to know which encoding it has. For formats like XML or HTML that information is part of the text; for Unicode there is the BOM, even though it isn't strictly necessary for UTF-8 encoded files.

Converting an application to Delphi 2009 is a chance to think about the encoding of text files and correct past mistakes. An application's data files often have a longer life than the application itself, so it pays to think about how to make them future-proof and universal. I would suggest UTF-8 as the text file encoding for all new applications; that way, porting an application to different platforms is easy. UTF-8 is the best encoding for data exchange, and for characters in the ASCII or ISO8859-1 range it also creates much smaller files than UTF-16 or UTF-32.
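In Delphi 2009 the new TEncoding class makes writing UTF-8 a one-liner. A minimal sketch (the file name and contents are made up for illustration):

uses Classes, SysUtils;

procedure SaveAsUtf8;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('Hello, wörld');  // non-ASCII characters survive intact
    // SaveToFile with an explicit encoding writes a UTF-8 file (with BOM)
    Lines.SaveToFile('data.txt', TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end;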

If your data files contain only ASCII characters, you are all set, as they are then valid UTF-8 encoded files as well. If your data files are in ISO8859-1 encoding (or any other fixed encoding), use the matching conversion while loading them into string lists and saving them back. If you don't know in advance what encoding they will have, ask the user upon loading, or provide an application setting for the default encoding.
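As a sketch of that conversion in Delphi 2009, loading a legacy file with a known encoding and writing it back as UTF-8 (code page 28591 is ISO8859-1; the file name is illustrative):

uses Classes, SysUtils;

procedure ConvertLatin1ToUtf8;
var
  Lines: TStringList;
  Latin1: TEncoding;
begin
  Latin1 := TEncoding.GetEncoding(28591); // ISO8859-1 code page
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile('legacy.txt', Latin1);       // decode with the known encoding
    Lines.SaveToFile('legacy.txt', TEncoding.UTF8); // write back as UTF-8
  finally
    Lines.Free;
    Latin1.Free; // instances from GetEncoding are owned by the caller
  end;
end;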

Use Unicode strings internally. Depending on the amount of data you need to handle, you might use UTF-8 encoded strings instead.
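In Delphi 2009 terms, that means the default string type (UnicodeString, UTF-16) for processing, or UTF8String when memory matters; the RTL converts between them with a plain cast. A minimal sketch:

procedure StorageExample;
var
  S: string;       // UnicodeString (UTF-16) in Delphi 2009
  U8: UTF8String;  // one byte per ASCII character, compact for storage
begin
  S := 'character data';
  U8 := UTF8String(S);  // UTF-16 -> UTF-8
  S := string(U8);      // and back when you need to process it
end;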

4
answered 4 December 2019 at 19:35

I suggest going fully Unicode, if the effort is justified by the requirements, and keeping the ANSI file I/O separate from the rest. But this depends heavily on your application.

4
answered 4 December 2019 at 19:35

You say:

"The app does some pretty heavy character-by-character analysis of string in objects descended from TList."

Since Windows runs Unicode natively, you may find your character analysis runs faster if you load the text file internally as Unicode.

On the other hand, if it is a large file, you will also find it takes twice as much memory.

For more about this, see Jan Goyvaert's article: "Speed Benefits of Using the Native Win32 String Type"

So it is a tradeoff you have to decide on.
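To illustrate the speed side of that tradeoff: in Delphi 2009 the native string is already UTF-16, the same representation Windows uses internally, so a per-character scan like the hypothetical one below involves no ANSI/Unicode conversion at all:

uses Character;

function CountLetters(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    // Char is WideChar in Delphi 2009, matching Windows' native UTF-16
    if TCharacter.IsLetter(S[I]) then
      Inc(Result);
end;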

3
answered 4 December 2019 at 19:35

If you are going to take Unicode input from the GUI, what is the strategy for converting it to ASCII output? (This is an assumption, since you mention writing ANSI text back out, presumably for non-Unicode applications that you are not going to rewrite and presumably don't have the source code for.) I'd suggest staying with AnsiString throughout the app until those other apps are Unicode-enabled. If the main job of your application is analyzing non-Unicode, ASCII-type files, then why switch to Unicode internally? If it mainly involves providing a better Unicode-enabled GUI, then go Unicode. I don't believe there's enough information presented here to make a proper choice.

If there is no chance of characters that don't translate easily being written back out for these non-Unicode applications, then the UTF-8 suggestion is the likely way to go. However, if there is such a chance, how are the non-Unicode applications going to handle multi-byte characters? How are you going to convert down to (presumably) the basic ASCII character set?
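For reference, in Delphi 2009 the lossy downconversion is just a cast; characters outside the target ANSI code page come out as '?', which may or may not be acceptable to the downstream applications. A sketch with an illustrative string:

procedure LossyDownConvert;
var
  Wide: string;        // UnicodeString
  Narrow: AnsiString;  // system ANSI code page
begin
  Wide := 'résumé Ω';
  // Lossy: on a Western code page 'Ω' has no mapping and becomes '?'
  Narrow := AnsiString(Wide);
end;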

1
answered 4 December 2019 at 19:35