I18n Notes
We are developing a series of forms for our foreign-language customers, which has sent me into many a forum looking for suggestions and tips on display, encoding, and so forth.
Here are my notes and resources for developing i18n (cool acronym for internationalization) forms in ColdFusion and Apache:
IANA/Unicode
Here is the IANA registry of character set names:
www.iana.org/assignments/character-sets
Here is the Unicode site, which has a detailed explanation of UTF-8, the Unicode encoding most people use for i18n:
unicode.org
Apache 2.0.x
There is a directive in httpd.conf that sets the default character encoding for all text/html and text/plain pages:
AddDefaultCharset UTF-8
Apache can do some amazing things with character sets and file extensions, like serving up a different version of a page based on the language and character encoding preferences in the browser's request headers, using mod_mime:
Apache 2 mod_mime documentation
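As a rough sketch of how that could look in httpd.conf: mod_mime maps file extensions to a language and character set, and content negotiation (MultiViews, which comes from mod_negotiation and is assumed to be loaded here) picks a variant from the browser's Accept-Language header. The directory path and file names are hypothetical, not from our actual setup.

```apache
# Sketch only: the directory path and file names below are hypothetical.
AddDefaultCharset UTF-8

# mod_mime: map file extensions to a language and a character set.
AddLanguage en .en
AddLanguage ar .ar
AddCharset UTF-8 .utf8

# mod_negotiation (assumed to be loaded): with MultiViews on, a request
# for /form.html can be answered with form.html.en or form.html.ar,
# depending on the browser's Accept-Language header.
<Directory "/var/www/forms">
    Options +MultiViews
</Directory>
```

With this in place, a browser that prefers Arabic would be expected to get form.html.ar, and one that prefers English would get form.html.en.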
Arabic
One of the languages we are developing in is Arabic, where users read right-to-left. You can set an HTML table to display right-to-left using the following:
<table border="0" cellpadding="3" cellspacing="0" dir="rtl">
Then, if you want a section of the text to still display left-to-right, like a product name, you can do this in the text:
<bdo dir="ltr">Google</bdo>
You can also simplify all of this with style sheets (of course):
W3C I18N with XHTML/CSS
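Here is a minimal sketch of the style sheet approach, assuming two hypothetical class names (rtl and ltr-override) that stand in for the dir attribute and the bdo element shown above. The CSS direction and unicode-bidi properties are standard. Note that the W3C generally suggests keeping dir in the markup for HTML, so that direction survives if the style sheet fails to load.

```html
<!-- Sketch only: the class names "rtl" and "ltr-override" are hypothetical. -->
<style>
  /* Same effect as dir="rtl" on the table */
  .rtl { direction: rtl; }
  /* Same effect as <bdo dir="ltr"> */
  .ltr-override { direction: ltr; unicode-bidi: bidi-override; }
</style>

<table class="rtl" border="0" cellpadding="3" cellspacing="0">
  <tr>
    <td>مرحبا <span class="ltr-override">Google</span></td>
  </tr>
</table>
```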
All ColdFusion pages
For all pages, I used the following code to set both the page encoding and the form data encoding to UTF-8:
<!--- Set encoding --->
<cfprocessingDirective pageencoding="utf-8">
<cfset setEncoding("form","utf-8")>
Here is the section of the ColdFusion documentation that talks about this stuff:
ColdFusion 6.1 LiveDocs