Standard Compression Scheme for Unicode

The Standard Compression Scheme for Unicode (“SCSU”) is a compact text encoding for Unicode, specified in Unicode Technical Standard #6 (UTS #6).

Python has no built-in text codec for SCSU, so I wrote this module.

Installation

To install my SCSU module, install the package or clone the Git repository.

Usage

Import the module, then pass "SCSU" as the encoding name, just as you would for any other text codec.

import scsu

s = "Esto es texto en español. これは日本語です。"
b = s.encode("SCSU")
assert b.decode("SCSU") == s  # round trip back to the original string
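
If you are curious how importing a module can make a new name work with str.encode() and bytes.decode(), the standard-library mechanism is codecs.register(). The sketch below illustrates that general technique under stated assumptions; it is not this module's code. It registers a hypothetical toy codec named scsudemo that only understands SCSU's initial state (ASCII, the pass-through controls, and U+0080 to U+00FF through the default dynamic window 0 at offset 0x0080) and rejects anything that would need a tag byte.

```python
import codecs

# Hypothetical codec name for this illustration only (no hyphens, so
# codecs.lookup() normalization only lowercases it).
DEMO_NAME = "scsudemo"

# In SCSU's initial single-byte state, NUL, TAB, LF and CR pass through;
# the other C0 bytes are tags, which this toy codec does not support.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def _is_direct(value: int) -> bool:
    # 0x20-0x7F is ASCII; 0x80-0xFF maps to U+0080-U+00FF via window 0,
    # so the byte value equals the code point.
    return value in PASS_THROUGH_CONTROLS or 0x20 <= value <= 0xFF


def demo_encode(text: str, errors: str = "strict") -> tuple[bytes, int]:
    # The errors argument is ignored: this toy codec is always strict.
    out = bytearray()
    for i, ch in enumerate(text):
        if not _is_direct(ord(ch)):
            raise UnicodeEncodeError(DEMO_NAME, text, i, i + 1,
                                     "needs an SCSU tag; not supported here")
        out.append(ord(ch))
    return bytes(out), len(text)


def demo_decode(data: bytes, errors: str = "strict") -> tuple[str, int]:
    data = bytes(data)  # accept memoryview or other bytes-like input
    for i, byte in enumerate(data):
        if not _is_direct(byte):
            raise UnicodeDecodeError(DEMO_NAME, data, i, i + 1,
                                     "SCSU tag byte; not supported here")
    return data.decode("latin-1"), len(data)


def _search(name: str):
    # Search functions receive the requested name already lowercased.
    if name == DEMO_NAME:
        return codecs.CodecInfo(demo_encode, demo_decode, name=DEMO_NAME)
    return None


codecs.register(_search)
```

After registration, "Año".encode("scsudemo") gives b"A\xf1o" and decoding those bytes with "scsudemo" gives "Año" back; a full SCSU codec would additionally handle tags, window switching and Unicode mode.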

Command line

Input files can be named on the command line. If none are given, input is read from stdin, so text can also be piped in.
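
The file-or-stdin behavior described above can be sketched as follows. This is a hypothetical helper (read_input is not part of the module), and it assumes that several named files are simply concatenated in order, which may not match what the actual command line does.

```python
import sys
from typing import BinaryIO, Optional, Sequence


def read_input(paths: Sequence[str], stdin: Optional[BinaryIO] = None) -> bytes:
    """Return the bytes of the named files, or of stdin when no paths are given."""
    if not paths:
        # No files named: read everything from stdin (piped or typed input).
        source = stdin if stdin is not None else sys.stdin.buffer
        return source.read()
    chunks = []
    for path in paths:
        # Read in binary mode, since SCSU input is not text in any locale encoding.
        with open(path, "rb") as f:
            chunks.append(f.read())
    return b"".join(chunks)
```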

To see all the options, use the -h option.
To see the module version, use the -V option.

By default, the command transcodes between SCSU and the encoding returned by locale.getpreferredencoding(), and no signature (byte order mark) is inserted when encoding or removed when decoding.
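
To check which encoding that default resolves to on your system, you can call the same standard-library function directly. This only inspects the locale and does not import scsu; it should match what the command line uses, assuming it calls locale.getpreferredencoding() as described above.

```python
import codecs
import locale

# The encoding the command line falls back to when -e is not given.
preferred = locale.getpreferredencoding()

# codecs.lookup() resolves aliases to a canonical name, e.g. "UTF-8" -> "utf-8".
print(preferred, "->", codecs.lookup(preferred).name)
```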

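The -e option sets the encoding of the non-SCSU side: the input encoding for encode and the output encoding for decode.
The -s option inserts a byte order mark (U+FEFF) as a signature when encoding and removes it when decoding. In SCSU, this signature is the byte sequence 0x0E 0xFE 0xFF.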
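
The signature is 3 bytes because 0x0E is the SCSU tag SQU ("quote Unicode"), which is followed by one UTF-16 code unit in big-endian order, here U+FEFF as 0xFE 0xFF. A minimal sketch of adding and removing it, mirroring the documented -s behavior; add_signature and strip_signature are hypothetical names, only the exact 3-byte form is recognized, and bytes.removeprefix() needs Python 3.9 or later.

```python
# SQU (0x0E) followed by the code unit U+FEFF, written big-endian.
SCSU_SIGNATURE = b"\x0e\xfe\xff"


def add_signature(data: bytes) -> bytes:
    """Prefix SCSU data with the signature, as -s does when encoding."""
    return SCSU_SIGNATURE + data


def strip_signature(data: bytes) -> bytes:
    """Remove one leading signature if present, as -s does when decoding."""
    return data.removeprefix(SCSU_SIGNATURE)
```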

Output is always written to stdout.

Encoding

Use the encode subcommand to transcode text from the -e encoding (or the default encoding) into SCSU.

python3 -m scsu encode -e UTF-8 -s utf8.txt > scsu.txt
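
The encode subcommand uses the module's own encoder, which this README does not describe. To give a feel for what SCSU bytes look like, here is a deliberately minimal encoder sketch (encode_scsu_minimal is a hypothetical name, not part of the module). It never leaves SCSU's initial single-byte state: ASCII and the pass-through controls NUL, TAB, LF and CR are copied as-is, U+0080 to U+00FF go through the default dynamic window 0 (offset 0x0080), and every other UTF-16 code unit is quoted with SQU (0x0E). The output is valid SCSU, but usually much larger than a real encoder's, because it never switches windows or enters Unicode mode.

```python
SQU = 0x0E  # tag: the next two bytes are one big-endian UTF-16 code unit
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def encode_scsu_minimal(text: str) -> bytes:
    """Encode text using only SCSU's initial state plus SQU quoting."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in PASS_THROUGH_CONTROLS or 0x20 <= cp <= 0xFF:
            # ASCII passes through; U+0080 to U+00FF use window 0,
            # so the byte value equals the code point.
            out.append(cp)
        else:
            # Quote each UTF-16 code unit: one for BMP characters,
            # two (a surrogate pair) for supplementary characters.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append(SQU)
                out += units[i:i + 2]
    return bytes(out)
```

For example, encode_scsu_minimal("Año") returns b"A\xf1o", and encode_scsu_minimal("\ufeff") returns b"\x0e\xfe\xff", the same bytes the -s option uses as a signature.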

Decoding

Use the decode subcommand to transcode SCSU text into the -e encoding (or the default encoding).

python3 -m scsu decode -e UTF-8 -s scsu.txt > utf8.txt
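
As a counterpart to the encoding side, here is a minimal decoder sketch for the same restricted subset of SCSU: initial state, window 0, and SQU quoting only. decode_scsu_minimal is a hypothetical name, not the module's decoder; it raises ValueError on any other tag byte, so it cannot decode general SCSU such as the output of the encode subcommand, which may switch windows or modes.

```python
SQU = 0x0E  # tag: the next two bytes are one big-endian UTF-16 code unit
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def decode_scsu_minimal(data: bytes) -> str:
    """Decode SCSU that stays in the initial state and only uses SQU."""
    units = bytearray()  # collected UTF-16-BE code units
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SQU:
            if i + 3 > len(data):
                raise ValueError("truncated SQU sequence")
            units += data[i + 1:i + 3]
            i += 3
        elif byte in PASS_THROUGH_CONTROLS or byte >= 0x20:
            # ASCII, or window 0 (offset 0x0080) for 0x80 to 0xFF:
            # the code point equals the byte value.
            units += bytes((0, byte))
            i += 1
        else:
            raise ValueError(
                f"tag byte 0x{byte:02X} at offset {i} is not supported by this sketch")
    # Decoding as UTF-16-BE also joins quoted surrogate pairs.
    return units.decode("utf-16-be")
```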