Unicode and Multilingual Editors and Word Processors for Mac OS X
Introduction
Mac OS X did not originally include support for as many languages and scripts as Mac OS 9. Mac OS X 10.1 supported Central European, Cyrillic and Japanese; Korean, Simplified Chinese and Traditional Chinese were made available as downloads. Mac OS X 10.2 introduced support for Arabic, Devanagari, Greek, Gujarati, Gurmukhi, Hebrew and Thai scripts. Mac OS X 10.3 introduced support for Armenian, Unified Canadian Aboriginal Syllabics and Cherokee scripts.
The editors listed below are those available in versions designed for Mac OS X; editors designed for Mac OS 9 can be used in Classic mode.
BBEdit
BBEdit is a text editor for Mac OS X 10.3.9 or later that includes extensive support for producing HTML files and program code, as well as plain text files. It can edit text in several left-to-right languages and scripts, including double-byte scripts, and it supports the Mac’s Unicode keyboards. Older versions allowed only one font to be active at a time, so only one non-Latin script plus unaccented Latin characters could be displayed properly at once; current versions can display multiple scripts simultaneously. It can use any installed Web browser for WYSIWYG preview. Files that contain multiple scripts can be opened and saved with UTF-8 or UTF-16 character encoding.
BBEdit displaying multiple scripts simultaneously
(screen shot courtesy of Mark Garrett)
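As a rough illustration of what saving a multi-script file as UTF-8 or UTF-16 means at the byte level, the following Python sketch encodes one short sample string both ways. The sample text and variable names are assumptions made for this example and have nothing to do with BBEdit itself.

```python
# A short multi-script sample: Latin, Greek, Cyrillic, Hebrew and Thai.
# (Hypothetical sample text chosen for this illustration.)
sample = "Latin, Ελληνικά, Русский, עברית, ไทย"

utf8_bytes = sample.encode("utf-8")
# Python's generic "utf-16" codec prepends a 2-byte byte-order mark (BOM).
utf16_bytes = sample.encode("utf-16")

# Both encodings can represent every character, so both round-trip losslessly.
assert utf8_bytes.decode("utf-8") == sample
assert utf16_bytes.decode("utf-16") == sample

print(len(sample), "characters")
print(len(utf8_bytes), "bytes as UTF-8")
print(len(utf16_bytes), "bytes as UTF-16, including the BOM")
```

UTF-8 uses one byte for unaccented Latin letters and two or three bytes for the other scripts in this sample, while UTF-16 uses two bytes for each of these characters, so which encoding is more compact depends on the mix of scripts in the file.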
HTML tags and attributes can be typed directly, or selected from a floating palette or a menu, and are shown in user-selectable colours. BBEdit includes an HTML syntax checker, and a link checker for links within your site.
It is produced by Bare Bones Software, Inc. and costs US $199.00 plus shipping. A trial copy that can be used for 30 days is available.
jEdit
jEdit is a Unicode text editor that is written in Java and can run under Mac OS X, Linux and Windows. It can be used with any text file, but is intended for editing programming and markup languages, and has syntax colouring for over 60 of these, including HTML and XML. jEdit can open and save files with any encoding that is supported by Java, including UTF-8 and UTF-16. It can use any of the normal Mac OS X keyboards, but not the Unicode Hex Input keyboard.
A multi-script HTML document with UTF-8 encoding in jEdit
For multi-script documents, it is convenient to use a large Unicode font such as Arial Unicode MS. To change the default font:
- Click the jEdit title bar, to make sure that it is the current application.
- On the Utilities menu, select 'Global Options…'.
- In the Global Options dialog box, select 'Text Area' under jEdit Options.
- Click the font name in the box to the right of 'Text font:'.
- In the Font Selector dialog box, choose a 'Font family' (e.g. Arial Unicode MS), and optionally choose a font size and style.
- Click 'OK' to close the Font Selector dialog box.
- Click 'OK' to close the Global Options dialog box.
jEdit is produced by Slava Pestov and is freeware. For more information and to download the software, visit the jEdit - Open Source programmer's text editor Web site.
Mellel
Mellel is a Unicode-aware word processor that is designed for Mac OS X and supports many scripts and languages including Latin, Cyrillic, Greek, Arabic, Farsi, Hebrew, Chinese, Japanese and Korean. In addition to its native format, it can import and export RTF files (including multi-script files from Word for Windows) and plain text files with Mac, Windows and ISO encodings. It can use the normal Mac OS X keyboards and the Unicode Hex Input keyboard.
A multi-script document in Mellel
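The choice between Mac, Windows and ISO encodings matters when plain text is imported or exported, because the same accented letter is stored as a different byte in each. A minimal Python sketch, assuming the Western variants of these encodings (Mac Roman, Windows-1252 and ISO 8859-1) and a sample word chosen for this example; it does not describe how Mellel handles the conversion internally:

```python
text = "café"  # hypothetical sample word with one accented letter

mac_bytes = text.encode("mac_roman")    # classic Mac OS Western encoding
win_bytes = text.encode("cp1252")       # Windows Western European encoding
iso_bytes = text.encode("iso-8859-1")   # ISO Latin-1

# The unaccented letters share the same bytes; only the "é" differs.
print("Mac Roman: ", mac_bytes.hex())
print("Windows:   ", win_bytes.hex())
print("ISO 8859-1:", iso_bytes.hex())

# Reading Mac-encoded bytes with the Windows table gives the wrong letter.
misread = mac_bytes.decode("cp1252")
print("Mac bytes read as Windows text:", misread)

# Reading them with the matching table restores the original text.
assert mac_bytes.decode("mac_roman") == text
```

This is why an import dialog needs to know which encoding a plain text file was saved with: the bytes alone do not say.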
The program is still being developed, and future plans include HTML import and export.
Mellel is produced by RedleX and costs US $39; a free trial version is available. More information and downloads are available from the Welcome to RedleX - Creators of Mellel Web site. The optional downloads include Arabic and Hebrew keyboards and Persian fonts.
Mozilla Composer
The Composer component of Mozilla is a multilingual HTML editor that supports Unicode and can edit files in WYSIWYG, WYSIWYG plus tags and plain HTML modes. It supports Apple’s Unicode Hex Input and Extended Roman keyboards.
Mozilla Composer can produce files that include multiple scripts and languages, and it can save HTML files with UTF-8 character encoding.
By default, Mozilla Composer re-formats your HTML code to conform to its idea of good style. To turn off this option, so that HTML formatting is left alone:
- Click the Mozilla title bar to ensure that it is the current application.
- Click “Mozilla” on the menu bar at the top of the screen.
- Click “Preferences…” on the Mozilla menu.
- In the Preferences dialog box, click “Composer” in the list of categories.
- In the When Saving Files section, click the radio button for “Retain original source formatting”.
- Click the “OK” button to close the Preferences dialog box.
Mozilla Composer is available only as part of Mozilla, which includes the Mozilla Navigator Web browser and can be downloaded free of charge from http://www.mozilla.org/releases/.
Netscape Composer 6.2
The Composer component of Netscape 6.2 is a multilingual HTML editor that supports Unicode and can edit files in WYSIWYG, WYSIWYG plus tags and plain HTML modes. It does not yet support Apple’s Unicode Hex Input and Extended Roman keyboards.
Composer 6.2 can produce files that include multiple scripts and languages, and it can save HTML files with UTF-8 character encoding.
By default, Netscape Composer re-formats your HTML code to conform to its idea of good style. To turn off this option, so that HTML formatting is left alone:
- Click the Netscape title bar to ensure that it is the current application.
- Click 'Edit' on the menu bar at the top of the screen.
- Click 'Preferences…' on the Edit menu.
- In the Preferences dialog box, click 'Composer' in the list of categories.
- In the When Saving Files section, click the radio button for 'Retain original source formatting'.
- Click the 'OK' button to close the Preferences dialog box.
Netscape Composer is available only as part of Netscape 6.2, which includes Netscape Navigator and can be downloaded free of charge from Netscape 6 Release.
Nisus Writer Express
Nisus Writer Express is a word processor for Mac OS X 10.3 or later. Its preferred file format is Rich Text Format (RTF), but it can also open and save Rich Text Format Directory (RTFD), Microsoft Word, WordPerfect, AbiWord and HTML files. It can open and save text files in UTF-8, UTF-16 and several other encodings. It supports all of the keyboards for left-to-right scripts, the IMEs for CJK, and Apple’s Unicode Hex Input keyboard driver, which allows you to enter a Unicode character by holding down the Option key while typing its 4-character hexadecimal character reference, e.g. 0E05 for the Thai character kho khon. From version 2.5, it supports editing of Arabic and Hebrew.
Unicode text displayed in Nisus Writer Express
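The mapping behind the Unicode Hex Input keyboard mentioned above can be checked outside the word processor. The Python sketch below converts a 4-digit hexadecimal reference into its character and looks up the official character name; the helper name char_from_hex is an assumption made for this example.

```python
import unicodedata


def char_from_hex(hex_code: str) -> str:
    """Return the character whose code point is given as a hex string."""
    return chr(int(hex_code, 16))


# 0E05 is the reference used in the Nisus Writer Express example above.
ch = char_from_hex("0E05")
print(ch, unicodedata.name(ch))
```

Running it with 0E05 prints the Thai letter together with its Unicode name, which is a quick way to confirm a reference before typing it.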