Allegro can manipulate and display text using any character values from 0 right up to 2^32-1 (although the current implementation of the grabber can only create fonts using character values up to 2^16-1). You can choose between several text encoding formats, which control how strings are stored and how Allegro interprets the strings that you pass to it. This setting affects all aspects of the system: whenever a function returns a char *, or takes a char * as an argument, that text will be in whichever format you have told Allegro to use.
By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a variable-width format, where characters can occupy anywhere from one to six bytes. The nice thing about it is that characters ranging from 0-127 are encoded directly as themselves, so UTF-8 is upwardly compatible with 7 bit ASCII ("Hello, World!" means the same thing regardless of whether you interpret it as ASCII or UTF-8 data). Any character value of 128 or above, such as accented vowels, the UK currency symbol, and Arabic or Chinese characters, will be encoded as a sequence of two or more bytes, each in the range 128-255. This means you will never get what looks like a 7 bit ASCII character as part of the encoding of a different character value, which makes it very easy to manipulate UTF-8 strings.
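To make the byte layout concrete, here is a standalone sketch of the variable-width encoding described above. It is not Allegro's own code: utf8_encode() is a hypothetical helper that writes the one to six byte form of a character value from 0 to 0x7FFFFFFF, the range covered by the original six-byte form of UTF-8.

```c
#include <assert.h>

/* Hypothetical helper, not part of Allegro: encodes character value c
 * (0 .. 0x7FFFFFFF) as UTF-8 into out[], which must hold 6 bytes.
 * Returns the number of bytes written. */
int utf8_encode(unsigned long c, unsigned char *out)
{
    int n, i;

    assert(out != 0 && c <= 0x7FFFFFFFUL);

    if (c < 0x80) {                 /* 0-127: stored as itself */
        out[0] = (unsigned char)c;
        return 1;
    }

    if (c < 0x800) n = 2;           /* shortest length that fits */
    else if (c < 0x10000) n = 3;
    else if (c < 0x200000) n = 4;
    else if (c < 0x4000000) n = 5;
    else n = 6;

    /* Continuation bytes are 10xxxxxx, 6 payload bits each,
     * filled from the last byte backwards. */
    for (i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }

    /* Lead byte: n high bits set, a zero bit, then the remaining bits. */
    out[0] = (unsigned char)((0xFFu << (8 - n)) | c);
    return n;
}
```

For example, the UK currency symbol (0xA3) comes out as the two bytes 0xC2 0xA3. Every byte of a multi-byte sequence has its top bit set, which is why a 7 bit ASCII byte can never appear inside another character's encoding.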
There are a few editing programs that understand UTF-8 format text files. Alternatively, you can write your strings in plain ASCII or 16 bit Unicode formats, and then use the Allegro textconv program to convert them into UTF-8.
If you prefer to use some other text format, you can set Allegro to work with normal 8 bit ASCII (U_ASCII), or 16 bit Unicode (U_UNICODE) instead, or you can provide some handler functions to make it support whatever other text encoding you like (for example it would be easy to add support for 32 bit UCS-4 characters, or the Chinese GB-code format).
There is some limited support for alternative 8 bit codepages, via the U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious work, but it can be handy as an easy way to convert text between different codepages. By default the U_ASCII_CP mode is set up to reduce text to a clean 7 bit ASCII format, trying to replace any accented vowels with their simpler equivalents (this is used by the allegro_message() function when it needs to print an error report onto a text mode DOS screen). If you want to work with other codepages, you can do this by passing a character mapping table to the set_ucodepage() function.
void set_uformat(int type);
Sets the current text encoding format. This will affect all parts of
Allegro, wherever you see a function that returns a char *, or takes a
char * as a parameter. The type should be one of the values:
   U_ASCII    - fixed size, 8 bit ASCII characters
   U_ASCII_CP - alternative 8 bit codepage (see set_ucodepage())
   U_UNICODE  - fixed size, 16 bit Unicode characters
   U_UTF8     - variable size, UTF-8 format Unicode characters
Although you can change the text format on the fly, this is not a good idea. Many strings, for example the names of your hardware drivers and any language translations, are loaded when you call allegro_init(), so if you change the encoding format after this, they will be in the wrong format, and things will not work properly. Generally you should only call set_uformat() once, before allegro_init(), and then leave it on the same setting for the duration of your program.
int get_uformat(void);
Returns the currently selected text encoding format.
void register_uformat(int type,
int (*u_getc)(char *s),
int (*u_getx)(char **s),
int (*u_setc)(char *s, int c),
int (*u_width)(char *s),
int (*u_cwidth)(int c),
int (*u_isok)(int c));
Installs a set of custom handler functions for a new text encoding
format. The type is the ID code for your new format, which should be a
4-character string as produced by the AL_ID() macro, and which can later
be passed to functions like set_uformat() and uconvert(). The function
parameters are handlers that implement the character access for your new
type: see below for details of these.
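As an illustration of the handler set, the following sketch implements the six callbacks for a fixed-width 32-bit format (UCS-4 in native byte order), using the signatures listed above. The ucs4_* names and the 'UCS4' ID are hypothetical; only the registration call in the comment needs Allegro, and the handlers themselves are plain C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handlers for a UCS-4 format, native byte order.
 * memcpy is used so that strings need not be 4-byte aligned. */

int ucs4_getc(char *s)
{
    int32_t c;
    memcpy(&c, s, sizeof c);        /* read one 4-byte character */
    return (int)c;
}

int ucs4_getx(char **s)
{
    int c;
    assert(s != NULL && *s != NULL);
    c = ucs4_getc(*s);
    *s += 4;                        /* advance past the character read */
    return c;
}

int ucs4_setc(char *s, int c)
{
    int32_t v = (int32_t)c;
    memcpy(s, &v, sizeof v);
    return 4;                       /* bytes written */
}

int ucs4_width(char *s)
{
    (void)s;
    return 4;                       /* every character is 4 bytes */
}

int ucs4_cwidth(int c)
{
    (void)c;
    return 4;
}

int ucs4_isok(int c)
{
    (void)c;
    return 1;                       /* any int value fits in 32 bits */
}

/* Registration (requires Allegro; the ID is a made-up example):
 *    register_uformat(AL_ID('U','C','S','4'), ucs4_getc, ucs4_getx,
 *                     ucs4_setc, ucs4_width, ucs4_cwidth, ucs4_isok);
 */
```

Because every character has the same width, u_width and u_cwidth are constant here; a variable-width format would inspect the data instead.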
void set_ucodepage(unsigned short *table, unsigned short *extras);
When you select the U_ASCII_CP encoding mode, a set of tables are used to
convert between 8 bit characters and their Unicode equivalents. You can
use this function to specify a custom set of mapping tables, which allows
you to support different 8 bit codepages. The table parameter points to
an array of 256 shorts, which contain the Unicode value for each
character in your codepage. The extras parameter, if not NULL, points to
a list of mapping pairs, which will be used when reducing Unicode data to
your codepage. Each pair consists of a Unicode value, followed by the way
it should be represented in your codepage. The list is terminated by a
zero Unicode value. This allows you to create a many->one mapping, where
many different Unicode characters can be represented by a single codepage
value (eg. for reducing accented vowels to 7 bit ASCII).
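The sketch below shows how a table and an extras list with the layout described above could be consulted when reducing one Unicode value to a codepage byte. It is an illustration of the data layout, not Allegro's internal code; cp_from_unicode() is hypothetical, and returning -1 for unrepresentable values is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: find the codepage byte for Unicode value c.
 * table:  256 entries, table[byte] = Unicode value of that byte.
 * extras: (unicode, codepage) pairs ended by a zero Unicode value,
 *         or NULL. Returns -1 if c has no representation. */
int cp_from_unicode(int c, const unsigned short *table,
                    const unsigned short *extras)
{
    int i;

    assert(table != NULL);

    /* Exact match: the byte whose Unicode value is c. */
    for (i = 0; i < 256; i++)
        if (table[i] == c)
            return i;

    /* Many->one fallback pairs, e.g. accented vowel -> plain vowel. */
    if (extras)
        for (i = 0; extras[i]; i += 2)
            if (extras[i] == c)
                return extras[i + 1];

    return -1;
}
```

With an extras list such as { 0xE9, 'e', 0xE8, 'e', 0 }, both é and è reduce to a plain 'e', which is the many->one mapping the text describes.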
int need_uconvert(char *s, int type, int newtype);
Given a pointer to a string, a description of the type of the string, and
the type that you would like this string to be converted into, this
function tells you whether any conversion is required. No conversion will
be needed if type and newtype are the same, or if one type is ASCII, the
other is UTF-8, and the string contains only character values less than
128. As a convenience shortcut, you can pass the value U_CURRENT as
either of the type parameters, to represent whatever text format is
currently selected.
int uconvert_size(char *s, int type, int newtype);
Returns the number of bytes that will be required to store the specified
string after a conversion from type to newtype, including the zero
terminator. The type parameters can use the value U_CURRENT as a shortcut
to represent the currently selected encoding format.
void do_uconvert(char *s, int type, char *buf, int newtype, int size);
Converts the specified string from type to newtype, storing at most size
bytes into the output buf. The type parameters can use the value
U_CURRENT as a shortcut to represent the currently selected encoding
format.
char *uconvert(char *s, int type, char *buf, int newtype, int size);
Higher level function running on top of do_uconvert(). This function
converts the specified string from type to newtype, storing at most size
bytes into the output buf, but it checks before doing the conversion, and
doesn't bother if the string formats are already the same (either both
types are equal, or one is ASCII, the other is UTF-8, and the string
contains only 7 bit ASCII characters). If a conversion was performed it
returns a pointer to buf, otherwise it returns the original pointer s, so you must
use the return value rather than assuming that the string will always be
moved to buf. As a convenience, if buf is NULL it will convert the string
into an internal static buffer. You should be wary of using this feature,
though, because that buffer will be overwritten the next time this
routine is called, so don't expect the data to persist across any other
library calls.
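The return-value contract is the part that most often trips people up, so here is a sketch of it for a single direction (8 bit text, taken as Unicode values 0-255, to UTF-8). to_utf8() is a hypothetical function, not Allegro's uconvert(); the 256-byte static buffer and the truncation policy are this sketch's own choices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of uconvert()'s return-value contract, for one
 * direction only: 8 bit text (byte values taken as U+0000-U+00FF) to
 * UTF-8. Returns s itself when no conversion is needed, otherwise the
 * converted text in buf, or in a static buffer when buf is NULL. */
char *to_utf8(const char *s, char *buf, int size)
{
    static char static_buf[256];
    const unsigned char *p;
    unsigned char *out;
    int used = 0;

    assert(s != NULL);

    /* Pure 7 bit text is already valid UTF-8: hand back the input. */
    for (p = (const unsigned char *)s; *p; p++)
        if (*p >= 128)
            break;
    if (!*p)
        return (char *)s;

    if (!buf) {                     /* shared buffer, overwritten next call */
        buf = static_buf;
        size = (int)sizeof(static_buf);
    }
    assert(size > 0);
    out = (unsigned char *)buf;

    for (p = (const unsigned char *)s; *p; p++) {
        int w = (*p < 128) ? 1 : 2;
        if (used + w + 1 > size)    /* stop early, keeping room for the 0 */
            break;
        if (w == 1) {
            out[used++] = *p;
        } else {
            out[used++] = (unsigned char)(0xC0 | (*p >> 6));
            out[used++] = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    out[used] = 0;
    return buf;
}
```

Calling it on "Hello" writes nothing into buf at all and simply returns the input pointer, which is exactly why callers must use the return value.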
char *uconvert_ascii(char *s, char buf[]);
Helper macro for converting strings from ASCII into the current encoding
format. Expands to uconvert(s, U_ASCII, buf, U_CURRENT, sizeof(buf)). Because the
size comes from sizeof(buf), buf must be an array, not a pointer.
char *uconvert_toascii(char *s, char buf[]);
Helper macro for converting strings from the current encoding format into
ASCII. Expands to uconvert(s, U_CURRENT, buf, U_ASCII, sizeof(buf)).
extern char empty_string[];
You can't just rely on "" to be a valid empty string in any encoding
format. This global buffer contains a number of consecutive zeros, so it
will be a valid empty string no matter whether the program is running in
ASCII, Unicode, or UTF-8 mode.
int ugetc(char *s);
Low level helper function for reading Unicode text data. Given a pointer
to a string in the current encoding format, it returns the next character
from the string.
int ugetx(char **s);
Low level helper function for reading Unicode text data. Given the
address of a pointer to a string in the current encoding format, it
returns the next character from the string, and advances the pointer to
the character after the one just read.
int usetc(char *s, int c);
Low level helper function for writing Unicode text data. It writes the
specified character to the given address in the current encoding format,
and returns the number of bytes written.
int uwidth(char *s);
Low level helper function for testing Unicode text data. It returns the
number of bytes occupied by the first character of the specified string,
in the current encoding format.
int ucwidth(int c);
Low level helper function for testing Unicode text data. It returns the
number of bytes that would be occupied by the specified character value,
when encoded in the current format.
int uisok(int c);
Low level helper function for testing Unicode text data. Tests whether
the specified value can be correctly encoded in the current format.
int uoffset(char *s, int index);
Returns the offset in bytes from the start of the string to the character
at the specified index. A zero index parameter will return zero. If the
index is negative, it counts backward from the end of the
string, so an index of -1 will return an offset to the last character.
int ugetat(char *s, int index);
Returns the character value at the specified index within the string. A
zero index parameter will return the first character of the string. If
the index is negative, it counts backward from the end of the string, so
an index of -1 will return the last character of the string.
int usetat(char *s, int index, int c);
Replaces the character at the specified index within the string with
value c, handling any adjustments for variable width data (ie. if c
encodes to a different width than the previous value at that location).
Returns the number of bytes by which the trailing part of the string was
moved. If the index is negative, it counts backward from the end of the
string.
int uinsert(char *s, int index, int c);
Inserts the character c at the specified index within the string, sliding
the rest of the data along to make room. Returns the number of bytes by
which the trailing part of the string was moved. If the index is
negative, it counts backward from the end of the string.
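To show what "sliding the rest of the data along" involves for variable-width text, here is a sketch of an insert for UTF-8. utf8_insert() is hypothetical and deliberately simplified: non-negative indexes only, character values below 0x10000, and the caller must guarantee the buffer has room.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of uinsert() for UTF-8. Simplifications: only
 * non-negative indexes, only values below 0x10000 (1-3 bytes), and
 * the caller must guarantee the buffer has room to grow.
 * Returns how many bytes the tail of the string was moved. */
int utf8_insert(char *s, int index, int c)
{
    unsigned char enc[3];
    int w, off = 0, tail;

    assert(s != NULL && index >= 0 && c >= 0 && c < 0x10000);

    /* Encode c into 1-3 bytes. */
    if (c < 0x80) {
        enc[0] = (unsigned char)c;
        w = 1;
    } else if (c < 0x800) {
        enc[0] = (unsigned char)(0xC0 | (c >> 6));
        enc[1] = (unsigned char)(0x80 | (c & 0x3F));
        w = 2;
    } else {
        enc[0] = (unsigned char)(0xE0 | (c >> 12));
        enc[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        enc[2] = (unsigned char)(0x80 | (c & 0x3F));
        w = 3;
    }

    /* Walk to the byte offset of character number index
     * (or to the terminator if the string is shorter). */
    while (s[off] && index > 0) {
        off++;
        while (((unsigned char)s[off] & 0xC0) == 0x80)
            off++;
        index--;
    }

    /* Slide the tail, terminator included, right by w bytes. */
    tail = (int)strlen(s + off) + 1;
    memmove(s + off + w, s + off, (size_t)tail);
    memcpy(s + off, enc, (size_t)w);
    return w;
}
```

memmove() is used rather than memcpy() for the shift because the source and destination regions overlap.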
int uremove(char *s, int index);
Removes the character at the specified index within the string, sliding
the rest of the data back to fill the gap. Returns the number of bytes by
which the trailing part of the string was moved. If the index is
negative, it counts backward from the end of the string.
int ustrsize(char *s);
Returns the size of the specified string in bytes, not including the
trailing zero.
int ustrsizez(char *s);
Returns the size of the specified string in bytes, including the trailing
zero.
int utolower(int c);
int utoupper(int c);
int uisspace(int c);
int uisdigit(int c);
char * ustrdup(char *src);
char * ustrcpy(char *dest, char *src);
char * ustrcat(char *dest, char *src);
int ustrlen(char *s);
int ustrcmp(char *s1, char *s2);
char * ustrncpy(char *dest, char *src, int n);
char * ustrncat(char *dest, char *src, int n);
int ustrncmp(char *s1, char *s2, int n);
int ustricmp(char *s1, char *s2);
char * ustrlwr(char *s);
char * ustrupr(char *s);
char * ustrchr(char *s, int c);
char * ustrrchr(char *s, int c);
char * ustrstr(char *s1, char *s2);
char * ustrpbrk(char *s, char *set);
char * ustrtok(char *s, char *set);
double uatof(char *s);
long ustrtol(char *s, char **endp, int base);
double ustrtod(char *s, char **endp);
char * ustrerror(int err);
int uvsprintf(char *buf, char *format, va_list args);
int usprintf(char *buf, char *format, ...);
These all work like the equivalent ANSI C functions, but using whatever Unicode text format is currently selected. The n parameter to ustrncpy() and ustrncat() is given in bytes rather than characters (on the assumption that you will be using these routines to prevent overflowing the size of a memory buffer), while the n parameter to ustrncmp() is given in characters (because it doesn't make any sense for this to be in bytes). The usprintf() implementation complies with as much of the ANSI spec as I could remember when I wrote it, except that it doesn't support exponential notation for floating point values.
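To see why a character count is the natural unit for ustrncmp(), consider this sketch of comparing the first n characters of two UTF-8 strings. utf8_ncmp() is hypothetical and returns -1, 0 or 1; comparing byte by byte is valid here because UTF-8 preserves the ordering of character values.

```c
#include <assert.h>

/* Hypothetical sketch: compare at most n characters (not bytes) of two
 * UTF-8 strings. Byte-wise comparison gives the same order as comparing
 * character values, because UTF-8 preserves that order. */
int utf8_ncmp(const char *s1, const char *s2, int n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    assert(s1 != 0 && s2 != 0);

    while (n > 0) {
        /* Compare one whole character: lead byte, then continuations. */
        do {
            if (*a != *b)
                return (*a < *b) ? -1 : 1;
            if (*a == 0)
                return 0;           /* both strings ended together */
            a++;
            b++;
        } while ((*a & 0xC0) == 0x80);
        n--;
    }
    return 0;
}
```

Comparing the first 4 characters of "cafés" and "cafex-style" strings covers 5 bytes, since é takes two; with a byte count the caller would have to know the encoding of each string in advance.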