3.3. Locale-based character encoding


The Haskell 98 Report defines values of the Char type as the code points of Unicode (or equivalently ISO/IEC 10646). However, files and other I/O streams typically consist of bytes, with each character of a text file encoded as one or more bytes. On many systems, a similar encoding is also required for interactions with the system. At these points, therefore, Hugs converts characters to and from sequences of bytes in a manner determined by the LC_CTYPE category of the current locale.

This conversion is not applied to the contents of files opened in binary mode. It is applied to program text, so you can use any character representable in your locale within comments and string literals. However, only ISO Latin-1 characters are permitted in identifiers.
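
The difference between text mode and binary mode can be observed by writing the same string both ways and comparing the resulting file sizes. The following is a minimal sketch, not taken from the Hugs sources: the helper names binarySize and textSize and the file names are invented for illustration, and it assumes a System.IO that provides withFile and withBinaryFile (if yours lacks them, openFile and openBinaryFile with an explicit hClose do the same job).

```haskell
import Control.Exception (IOException, try)
import System.IO

-- Hypothetical helper: write the string through a binary-mode handle and
-- return the file size in bytes. No locale conversion is applied, so each
-- Char below 256 is written as exactly one byte.
binarySize :: FilePath -> String -> IO Integer
binarySize path s = do
  withBinaryFile path WriteMode (\h -> hPutStr h s)
  withBinaryFile path ReadMode hFileSize

-- Hypothetical helper: write the string through a text-mode handle, so the
-- locale's encoding is applied, then measure the file in binary mode.
textSize :: FilePath -> String -> IO Integer
textSize path s = do
  withFile path WriteMode (\h -> hPutStr h s)
  withBinaryFile path ReadMode hFileSize

main :: IO ()
main = do
  let sample = "caf\233" -- "café"; '\233' is U+00E9
  b <- binarySize "encoding-demo.bin" sample
  putStrLn ("binary mode: " ++ show b ++ " bytes")
  -- The text-mode result depends on the current locale, and the write can
  -- fail if the locale cannot represent U+00E9, so the error is caught.
  r <- try (textSize "encoding-demo.txt" sample) :: IO (Either IOException Integer)
  case r of
    Right t -> putStrLn ("text mode: " ++ show t ++ " bytes")
    Left e  -> putStrLn ("text mode failed in this locale: " ++ show e)
```

The binary-mode file holds one byte per character. The text-mode size depends on the encoding selected by the locale: a multi-byte encoding such as UTF-8 needs more than one byte for U+00E9, while a single-byte encoding such as ISO Latin-1 needs only one.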

The form of the locale string, and how it is set, vary between systems.

On POSIX systems, this value is taken from the first non-empty environment variable among LC_ALL, LC_CTYPE and LANG, checked in that order.
On Windows, this value is the “user-default ANSI code page” (not the “current OEM code page” or the “ANSI code page”). This may be set using the General tab of the “Regional Options” control panel.
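
The POSIX lookup order described above can be sketched in Haskell as follows. This only illustrates the rule and is not how Hugs implements it (the locale is normally resolved by the C library); the function name effectiveCtype is invented for this example.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import System.Environment (getEnvironment)

-- Hypothetical helper: given the environment as an association list,
-- return the value of the first of LC_ALL, LC_CTYPE and LANG that is
-- set to a non-empty string, or Nothing if none of them is.
effectiveCtype :: [(String, String)] -> Maybe String
effectiveCtype env =
  listToMaybe (mapMaybe nonEmpty ["LC_ALL", "LC_CTYPE", "LANG"])
  where
    nonEmpty name = case lookup name env of
      Just v | not (null v) -> Just v
      _                     -> Nothing

main :: IO ()
main = do
  env <- getEnvironment
  case effectiveCtype env of
    Just loc -> putStrLn ("LC_CTYPE locale taken from environment: " ++ loc)
    Nothing  -> putStrLn "None of LC_ALL, LC_CTYPE or LANG is set to a non-empty value"
```

Note that a variable set to the empty string is skipped, so LC_ALL="" with LANG=en_US.UTF-8 yields en_US.UTF-8.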
