Standard Compression Scheme for Unicode
The Standard Compression Scheme for Unicode (“SCSU”) is a text encoding defined in Unicode Technical Standard #6.
There was no Python text codec support for SCSU, so I decided to write a module myself.
Installation
To install my SCSU module, install the package or clone the Git repository.
Usage
Import the module and use it like you would any other text codec.
import scsu
s = "Esto es texto en español. これは日本語です。"
b = s.encode("SCSU")
Command line
Input files can be specified on the command line, piped in, or omitted completely to read from stdin.
- To see all the options, use the -h option.
- To see the module version, use the -V option.
By default, text is transcoded between SCSU and the codec returned by locale.getpreferredencoding(), and no signature byte string is inserted (when encoding) or removed (when decoding).
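To check which codec that default resolves to on a given machine, the standard library can be queried directly. This is a minimal sketch that uses only locale and codecs; the helper name default_codec_name is made up for illustration and is not part of the scsu module.

```python
import codecs
import locale


def default_codec_name() -> str:
    """Return the normalized name of the locale's preferred text codec."""
    # getpreferredencoding() reports the encoding the platform prefers for
    # text data; the value depends on the operating system and locale settings.
    preferred = locale.getpreferredencoding()
    # codecs.lookup() raises LookupError if Python has no codec by that name.
    return codecs.lookup(preferred).name


if __name__ == "__main__":
    print(default_codec_name())
```

If the printed name is not the encoding of your files, pass the right one explicitly with -e.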
- The -e option specifies the non-SCSU encoding to use instead of the locale default.
- The -s option inserts a byte order mark when encoding and removes it when decoding. In SCSU, this signature is encoded as 0x0E 0xFE 0xFF.
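The three signature bytes are not arbitrary: in SCSU, 0x0E is the SQU (Quote Unicode) tag, which is followed by one UTF-16 code unit in big-endian order, here U+FEFF. The sketch below shows what inserting and removing that signature amounts to at the byte level. The names SCSU_SIGNATURE, add_signature, and strip_signature are hypothetical and not part of the scsu module; this illustrates the idea and is not the module's implementation.

```python
# SQU tag (0x0E) followed by U+FEFF as one big-endian UTF-16 code unit.
SQU = 0x0E
SCSU_SIGNATURE = bytes([SQU]) + "\ufeff".encode("utf-16-be")


def add_signature(data: bytes) -> bytes:
    """Prepend the SCSU signature unless it is already present."""
    if data.startswith(SCSU_SIGNATURE):
        return data
    return SCSU_SIGNATURE + data


def strip_signature(data: bytes) -> bytes:
    """Remove a leading SCSU signature if there is one."""
    if data.startswith(SCSU_SIGNATURE):
        return data[len(SCSU_SIGNATURE):]
    return data


if __name__ == "__main__":
    print(SCSU_SIGNATURE.hex(" "))  # 0e fe ff
```

Both helpers leave data unchanged when there is nothing to do, so applying either one twice gives the same result as applying it once.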
Output is always written to stdout.
Encoding
Use the encode subcommand to transcode text from another encoding to SCSU.
python3 -m scsu encode -e UTF-8 -s utf8.txt > scsu.txt
Decoding
Use the decode subcommand to transcode text from SCSU to another encoding.
python3 -m scsu decode -e UTF-8 -s scsu.txt > utf8.txt