Standard Compression Scheme for Unicode
The Standard Compression Scheme for Unicode (“SCSU”) is a text encoding defined in Unicode Technical Standard #6.
There was no Python text codec support for SCSU, so I decided to write a module myself.
Installation
To install my SCSU module, install the package or clone the Git repository.
Usage
Import the module and use it like you would any other text codec.
import scsu
s = "Está es texto en español. これは日本語です。"
b = s.encode("SCSU")
Command line
Input files can be specified on the command line, piped in, or omitted completely to read from stdin.
- To see all the options, use the -h option.
- To see the module version, use the -V option.
By default, transcoding is done between SCSU and the codec returned by the locale.getpreferredencoding() function, and no signature byte string is inserted (when encoding) or removed (when decoding).
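To check which codec that default resolves to on a given machine, here is a minimal sketch using only the Python standard library. It does not call the scsu module, and the variable names are illustrative.

```python
import codecs
import locale

# Name of the text encoding Python considers preferred for the current
# locale, e.g. "UTF-8" on many Linux systems or "cp1252" on some Windows ones.
default_name = locale.getpreferredencoding()

# codecs.lookup() resolves aliases to a canonical codec name and raises
# LookupError if Python has no codec registered under that name.
default_codec = codecs.lookup(default_name)

print(f"locale default: {default_name} (codec: {default_codec.name})")
```

If the printed codec is not the one your files use, pass the encoding explicitly rather than relying on the default.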
- The -e option specifies the non-SCSU encoding: the encoding of the input when encoding, and of the output when decoding.
- The -s option inserts a byte order mark when encoding and removes it when decoding. The byte order mark is encoded as 0x0E 0xFE 0xFF in SCSU.
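The signature bytes are the SCSU "quote Unicode" tag (0x0E) followed by U+FEFF in big-endian order. As a rough illustration of what inserting and removing the signature means at the byte level, the sketch below works on raw bytes only. The helper names are hypothetical and this is not how the scsu module is implemented.

```python
# SCSU signature: SQU tag (0x0E) followed by U+FEFF as 0xFE 0xFF.
SCSU_SIGNATURE = bytes([0x0E, 0xFE, 0xFF])


def has_signature(data: bytes) -> bool:
    """Return True if the byte string starts with the SCSU signature."""
    return data.startswith(SCSU_SIGNATURE)


def add_signature(data: bytes) -> bytes:
    """Prepend the signature unless it is already present."""
    return data if has_signature(data) else SCSU_SIGNATURE + data


def strip_signature(data: bytes) -> bytes:
    """Remove a leading signature, leaving other data untouched."""
    return data[len(SCSU_SIGNATURE):] if has_signature(data) else data


# ASCII text is stored byte-for-byte in SCSU's initial state, so b"abc"
# stands in here for a small SCSU payload.
payload = b"abc"
signed = add_signature(payload)
print(signed.hex(" "))
print(strip_signature(signed) == payload)
```

Checking for these three leading bytes is also a quick way to tell whether a file was written with the signature.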
Output is always written to stdout.
Encoding
Use the encode subcommand to transcode encoded text to SCSU.
python3 -m scsu encode -e UTF-8 -s utf8.txt > scsu.txt
Decoding
Use the decode subcommand to transcode encoded text from SCSU.
python3 -m scsu decode -e UTF-8 -s scsu.txt > utf8.txt