Creating automatic Unicode VT340 translation

March 26, 2025

This should be possible using a charmap, localedef, a gconv module, etc.

These notes are a bit scattered as I learn more.

VT340 settings

  • The VT340 has four slots G0 to G3, to make it easy to swap between four different character sets.

  • Setting the User Preferred Character Set to "Latin 1" on the VT340 causes Latin-1 to be loaded into G2.

  • Using Latin-1 and setting LANG to en_US.iso8859-1 gets most of what I want.

  • There appear to be only four useful character sets built into my North American VT340+G2:

    1. US-ASCII
    2. Latin-1
    3. DEC Special Graphics (AKA VT100 box drawing)
    4. DEC Technical Character Set
  • Of those, US-ASCII and Latin-1 are already used well by setting the LANG environment variable. So, the questions are:

    1. what would be the benefit of enabling the other two?
    2. how would output (UTF8->VT340) work? charmap? luit? screen?
    3. how would input work? Can the VT340 input more than just Latin-1?
    4. how do I ensure the correct charsets are loaded before shifting them?
  • Also, my VT340 reports having two "Alternate Fonts" that are exactly the same as the standard ASCII. Are these the "soft fonts" or can I replace them in firmware?

  • Each login session can have a different soft font (AKA "DRCS", "dynamically redefinable character set"), which can define 96 new characters. For example, see the APL font. If DEC's APL font is representative, then the soft font will be loaded into G1.
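On the question of loading character sets before shifting to them, the relevant escape sequences can be sketched as below. This assumes the standard DEC SCS designators (`ESC ) 0` puts DEC Special Graphics into G1, `ESC + >` puts DEC Technical into G3), the SO/SI locking shifts, and the 7-bit form of SS3 (`ESC O`). The function names are hypothetical, and the sketch has not been checked against the VT340 described here.

```shell
# Hypothetical helpers; names are illustrative, not from any existing tool.

# Designate the two "extra" built-in sets into G1 and G3.
designate_sets() {
    printf '\033)0'   # SCS: G1 <- DEC Special Graphics (line drawing)
    printf '\033+>'   # SCS: G3 <- DEC Technical Character Set
}

# Print a string using G1: SO (0x0E) shifts G1 into GL, SI (0x0F) restores G0.
boxdraw() {
    printf '\016%s\017' "$1"
}

# Print a single character from G3 using the single shift SS3 (ESC O).
tech_char() {
    printf '\033O%s' "$1"
}

# Usage on the terminal (l, q, k are corner and line glyphs in Special Graphics):
#   designate_sets; boxdraw 'lqqqk'; printf '\n'
```

Designating first and shifting afterwards keeps G0 as US-ASCII throughout, so a program that forgets to send SI leaves the terminal in line-drawing mode but does not lose the ASCII designation.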

charmap

A charmap file can be used to convert characters in a Unicode file to something that can be shown on the VT340. However, it does not seem that one can use the same file for interactive use; one must instead compile a gconv module.

Example charmap file: decapl

Here is an example charmap file that converts APL symbols, such as ⍋, into single bytes, such as 0xE6. It presumes the VT340 has the APL soft font loaded into Graphic Right (GR), which is what happens when an APL font file is sent to the screen with cat.
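For orientation, each mapping line in a glibc charmap(5) file pairs a symbolic Unicode name with the byte it encodes to, followed by an optional comment. The two lines below are a sketch built from the two mappings these notes mention (⍋ at 0xE6 and U+2339 at 0xE1). The comment text uses the Unicode character names, which is an assumption; the labels in the actual file may differ.

```text
<U234B> /xe6 APL FUNCTIONAL SYMBOL DELTA STILE
<U2339> /xe1 APL FUNCTIONAL SYMBOL QUAD DIVIDE
```

The `/x` prefix works because the file declares `/` as its escape character.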

    <code_set_name> DECAPL
    <comment_char> %
    <escape_char> /

    % decapl.charmap: Hackerb9's charmap(5) for APL on the VT340.
    % alias DEC Vax APL

    % This is useful with iconv to convert to and from the DEC's peculiar
    % 8-bit encoding for the APL character set. Does not (yet) work
    % interactively. See bottom of file for comments.

    CHARMAP
    /x00 NULL
    /x01 START OF HEADING
    /x02 START OF TEXT
    /x03 END OF TEXT
    /x04 END OF TRANSMISSION
    /x05 ENQUIRY
    /x06 ACKNOWLEDGE
    /x07 BELL
    /x08 BACKSPACE
    /x09 HORIZONTAL TABULATION
    /x0a LINE FEED
    /x0b VERTICAL TABULATION
    /x0c FORM FEED
    /x0d CARRIAGE RETURN
    /x0e SHIFT OUT
    /x0f SHIFT IN
    /x10 DATA LINK ESCAPE
    /x11 DEVICE CONTROL ONE
    /x12 DEVICE CONTROL TWO
    /x13 DEVICE CONTROL THREE
    /x14 DEVICE CONTROL FOUR
    /x15 NEGATIVE ACKNOWLEDGE
    /x16 SYNCHRONOUS IDLE
    /x17 END OF TRANSMISSION BLOCK
    /x18 CANCEL
    /x19 END OF MEDIUM
    /x1a SUBSTITUTE
    /x1b ESCAPE
    /x1c FILE SEPARATOR
    /x1d GROUP SEPARATOR
    /x1e RECORD SEPARATOR
    /x1f UNIT SEPARATOR
    /x20 SPACE
    /x21 Shriek
    /x22 QUOTATION MARK
    /x23 NUMBER SIGN
    /x24 DOLLAR SIGN
    /x25 PERCENT SIGN
    /x26 AMPERSAND
    /x27 APOSTROPHE
    /x28 LEFT PARENTHESIS
    /x29 RIGHT PARENTHESIS
    /x2a Star
    /x2b Plus
    /x2c COMMA
    /x2d Minus
    /x2e Dot
    /x2f Slash
    /x30 DIGIT ZERO
    /x31 DIGIT ONE
    /x32 DIGIT TWO
    /x33 DIGIT THREE
    /x34 DIGIT FOUR
    /x35 DIGIT FIVE
    /x36 DIGIT SIX
    /x37 DIGIT SEVEN
    /x38 DIGIT EIGHT
    /x39 DIGIT NINE
    /x3a COLON
    /x3b SEMICOLON
    /x3c LESS-THAN SIGN
    /x3d EQUALS SIGN
    /x3e GREATER-THAN SIGN
    /x3f Question MARK
    /x40 At
    /x41 LATIN CAPITAL LETTER A
    /x42 LATIN CAPITAL LETTER B
    /x43 LATIN CAPITAL LETTER C
    /x44 LATIN CAPITAL LETTER D
    /x45 LATIN CAPITAL LETTER E
    /x46 LATIN CAPITAL LETTER F
    /x47 LATIN CAPITAL LETTER G
    /x48 LATIN CAPITAL LETTER H
    /x49 LATIN CAPITAL LETTER I
    /x4a LATIN CAPITAL LETTER J
    /x4b LATIN CAPITAL LETTER K
    /x4c LATIN CAPITAL LETTER L
    /x4d LATIN CAPITAL LETTER M
    /x4e LATIN CAPITAL LETTER N
    /x4f LATIN CAPITAL LETTER O
    /x50 LATIN CAPITAL LETTER P
    /x51 LATIN CAPITAL LETTER Q
    /x52 LATIN CAPITAL LETTER R
    /x53 LATIN CAPITAL LETTER S
    /x54 LATIN CAPITAL LETTER T
    /x55 LATIN CAPITAL LETTER U
    /x56 LATIN CAPITAL LETTER V
    /x57 LATIN CAPITAL LETTER W
    /x58 LATIN CAPITAL LETTER X
    /x59 LATIN CAPITAL LETTER Y
    /x5a LATIN CAPITAL LETTER Z
    /x5b LEFT SQUARE BRACKET
    /x5c Slope
    /x5d RIGHT SQUARE BRACKET
    /x5e CIRCUMFLEX ACCENT
    /x5f LOW LINE
    /x60 GRAVE ACCENT
    /x61 LATIN SMALL LETTER A
    /x62 LATIN SMALL LETTER B
    /x63 LATIN SMALL LETTER C
    /x64 LATIN SMALL LETTER D
    /x65 LATIN SMALL LETTER E
    /x66 LATIN SMALL LETTER F
    /x67 LATIN SMALL LETTER G
    /x68 LATIN SMALL LETTER H
    /x69 LATIN SMALL LETTER I
    /x6a LATIN SMALL LETTER J
    /x6b LATIN SMALL LETTER K
    /x6c LATIN SMALL LETTER L
    /x6d LATIN SMALL LETTER M
    /x6e LATIN SMALL LETTER N
    /x6f LATIN SMALL LETTER O
    /x70 LATIN SMALL LETTER P
    /x71 LATIN SMALL LETTER Q
    /x72 LATIN SMALL LETTER R
    /x73 LATIN SMALL LETTER S
    /x74 LATIN SMALL LETTER T
    /x75 LATIN SMALL LETTER U
    /x76 LATIN SMALL LETTER V
    /x77 LATIN SMALL LETTER W
    /x78 LATIN SMALL LETTER X
    /x79 LATIN SMALL LETTER Y
    /x7a LATIN SMALL LETTER Z
    /x7b LEFT CURLY BRACKET
    /x7c VERTICAL LINE
    /x7d RIGHT CURLY BRACKET
    /x7e TILDE
    /x7f DELETE
    /x80 PADDING CHARACTER (PAD)
    /x81 HIGH OCTET PRESET (HOP)
    /x82 BREAK PERMITTED HERE (BPH)
    /x83 NO BREAK HERE (NBH)
    /x84 INDEX (IND)
    /x85 NEXT LINE (NEL)
    /x86 START OF SELECTED AREA (SSA)
    /x87 END OF SELECTED AREA (ESA)
    /x88 CHARACTER TABULATION SET (HTS)
    /x89 CHARACTER TABULATION WITH JUSTIFICATION (HTJ)
    /x8a LINE TABULATION SET (VTS)
    /x8b PARTIAL LINE FORWARD (PLD)
    /x8c PARTIAL LINE BACKWARD (PLU)
    /x8d REVERSE LINE FEED (RI)
    /x8e SINGLE-SHIFT TWO (SS2)
    /x8f SINGLE-SHIFT THREE (SS3)
    /x90 DEVICE CONTROL STRING (DCS)
    /x91 PRIVATE USE ONE (PU1)
    /x92 PRIVATE USE TWO (PU2)
    /x93 SET TRANSMIT STATE (STS)
    /x94 CANCEL CHARACTER (CCH)
    /x95 MESSAGE WAITING (MW)
    /x96 START OF GUARDED AREA (SPA)
    /x97 END OF GUARDED AREA (EPA)
    /x98 START OF STRING (SOS)
    /x99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
    /x9a SINGLE CHARACTER INTRODUCER (SCI)
    /x9b CONTROL SEQUENCE INTRODUCER (CSI)
    /x9c STRING TERMINATOR (ST)
    /x9d OPERATING SYSTEM COMMAND (OSC)
    /x9e PRIVACY MESSAGE (PM)
    /x9f APPLICATION PROGRAM COMMAND (APC)
    /xa0 (Undefined)
    /xa1 Diaeresis
    /xa2 Less than or Equal to
    /xa3 Or
    /xa4 And
    /xa5 Not Equal To
    /xa6 Divide
    /xa7 Times
    /xa8 High Minus
    /xa9 Alpha
    /xaa Up Tack
    /xab Intersection
    /xac Downstile
    /xad Epsilon
    /xae Del
    /xaf Delta
    /xb0 Iota
    /xb1 Jot
    /xb2 Quad
    /xb3 Down Tack
    /xb4 Circle
    /xb5 Rho
    /xb6 Upstile
    /xb7 Down Arrow
    /xb8 Union
    /xb9 Omega
    /xba Right shoe
    /xbb Left shoe
    /xbc Gets
    /xbd Right Tack
    /xbe Goto
    /xbf Greater than or Equal to
    /xc0 Diamond
    /xc1 Left Tack
    /xc2 Delta Underbar
    % Is this how modern APL emits the underscore characters?
    % Mathematical Italic Capital letters (+ combining low line (U+0332))?
    /xc3 A - Alphabet is not right, should have underscore
    /xc4 B
    /xc5 C
    /xc6 D
    /xc7 E
    /xc8 F
    /xc9 G
    /xca H
    /xcb I
    /xcc J
    /xcd K
    /xce L
    /xcf M
    /xd0 N
    /xd1 O
    /xd2 P
    /xd3 Q
    /xd4 R
    /xd5 S
    /xd6 T
    /xd7 U
    /xd8 V
    /xd9 W
    /xda X
    /xdb Y
    /xdc Z
    /xdd Lamp
    /xde I-Beam
    /xdf Hydrant
    /xe0 Thorn
    /xe1 Domino
    /xe2 Quad Left Arrow
    /xe3 Quad Right Arrow
    /xe4 Quote Quad
    /xe5 Quad Down Caret
    /xe6 Grade Up
    /xe7 Grade Down
    /xe8 Del Tilde
    /xe9 Nor
    /xea Nand
    /xeb Log
    /xec Circle Bar
    /xed Transpose
    /xee Circle Stile
    /xef Comma Bar
    /xf0 Slash Bar
    /xf1 Slope Bar
    /xf2 Left Shoe Underbar
    /xf3 Right Shoe Underbar
    /xf4 Equal Underbar
    /xf5 Up Arrow
    /xf6 Squad
    /xf7 Squad
    /xf8 Squad
    /xf9 Squad
    /xfa Squad
    /xfb Squad
    /xfc Squad
    % Did /xfd, the APL OUT character, never get added to Unicode? It is
    % specified in IBM's APL Codepage, so this isn't just some random
    % thing DEC added.
    /xfd OUT - no matching unicode, substituting EOT
    /xfe Squad
    /xff (Undefined)
    END CHARMAP

    % Characters mentioned in Dyalog's help manual but not
    % available in DEC's VT340 font.
    % ≢ U+2262 Equal Underbar Slash (NOT IDENTICAL TO)
    % ⍷ U+2377 Epsilon Underbar
    % ⍸ U+2378 Iota Underbar
    % ⍬ U+236C Zilde
    % ⍨ U+2368 Tilde Diaeresis
    % ⍣ U+2363 Star Diaeresis
    % ⍠ U+2360 Variant
    % ⌸ U+2338 Quad Equal
    % ⌺ U+233A Quad Diamond
    % ⍤ U+2364 Jot Diaeresis
    % ⍥ U+2365 Circle Diaeresis

    % See:
    % http://help.dyalog.com/latest/index.htm#Language/Introduction/Language%20Elements.htm

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %
    % Example: converting from DEC's APL to Unicode UTF16
    %
    %   # NB: the slash is important to iconv to know this is a charmap file!
    %   echo -n $'\xe1' | iconv -f ./charmap.apl -t UTF16BE | hd
    %
    %   (Output shows U+2339 is correctly emitted when input is 0xE1).
    %
    % Example: converting from UTF-8 to DEC's APL
    %
    %   echo -n $'\xe2\x8c\xb9' | iconv -f utf-8 -t ./charmap.apl | hd
    %
    %   (Output shows that 0xE1 is correctly emitted).
    %
    % Example: Transforming a file which is in the DEC APL codeset into
    % something visible on modern machines.
    %
    %   iconv -f ./charmap.apl <SOMEANCIENTVAXFILE.APL >freshnewfile.apl
    %

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Reminder to self from hackerb9:
    % charmap files do not do what I think they ought to do.
    %
    % They can be used to convert files, but a charmap file is NOT what is
    % needed to show APL on a VT340 when running a version of apl that
    % outputs Unicode. For that, I'll need to build a gconv module, which
    % is much ickier.

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Note: One could compile this into the system locale definitions
    % and set LANG=en_US.APL, but there is no point until the matching
    % gconv module exists. Many programs would abort, with errors like,
    % "iconv: conversion to APL//TRANSLIT is not supported". Some versions
    % of Python will even dump core.

    % Nevertheless, for future reference, here is how it is done:
    %
    %   mkdir -p foo/usr/lib/locale/
    %   localedef -f decapl.charmap -i en_US --no-archive --prefix=foo en_US.decapl
    %
    % Test it out, by setting the LOCPATH and LANG/LC_ALL environment variables:
    %
    %   LOCPATH=$(pwd)/foo/usr/lib/locale/ LC_ALL=en_US.decapl locale charmap
    %
    % To install in the system directory, remove the --prefix option.

Charmap Usage

A charmap file can be used by iconv(1) to convert files from one character set to another as long as it is specified with at least one slash to indicate it is a filename. (E.g., ./decapl.charmap).

  • Example: converting from DEC's APL to Unicode UTF16

      # NB: the slash is important to iconv to know this is a charmap file!
      echo -n $'\xe1' | iconv -f ./decapl.charmap -t UTF16BE | hd
    

    (Output shows U+2339 is correctly emitted when input is 0xE1).

  • Example: converting from UTF-8 to DEC's APL

      echo -n $'\xe2\x8c\xb9' | iconv -f utf-8 -t ./decapl.charmap | hd
    

    (Output shows that 0xE1 is correctly emitted).

  • Example: Transforming a file which is in the DEC APL codeset into something visible on modern machines.

      iconv -f ./decapl.charmap <SOMEANCIENTVAXFILE.APL >freshnewfile.apl
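Since a bare name without a slash is looked up as an encoding name rather than a file, a small guard can avoid that mistake in scripts. The following is a minimal sketch; `charmap_arg` and `decapl_to_utf8` are hypothetical names, and the wrapper assumes UTF-8 is the desired output (the example above leaves `-t` unset and lets iconv use the locale's encoding).

```shell
# Hypothetical helper: make sure a charmap filename contains a slash so that
# iconv treats it as a file rather than as the name of an encoding.
charmap_arg() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;      # already contains a slash
        *)   printf './%s\n' "$1" ;;    # bare name: prefix ./
    esac
}

# Hypothetical wrapper: convert DEC APL bytes on stdin to UTF-8 on stdout.
# The charmap defaults to decapl.charmap in the current directory.
decapl_to_utf8() {
    iconv -f "$(charmap_arg "${1:-decapl.charmap}")" -t UTF-8
}

# Usage:
#   decapl_to_utf8 decapl.charmap <SOMEANCIENTVAXFILE.APL >freshnewfile.apl
```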
    

"Compiling" charmap using localedef

The benefit of compiling with localedef is that (theoretically) the charmap can be used interactively (by setting LANG) instead of having to run iconv.

Note: While one could compile this into the system locale definitions and set LANG=en_US.DECAPL, there is no point until the matching gconv module exists. Many programs would abort with errors like "iconv: conversion to APL//TRANSLIT is not supported". Some versions of Python will even dump core.

This page does not document how to create a gconv module. Nevertheless, for future reference, here is how to compile the charmap with localedef:

    mkdir -p foo/usr/lib/locale/
    localedef -f decapl.charmap -i en_US --no-archive --prefix=foo en_US.decapl

Test it out, by setting the LOCPATH and LANG/LC_ALL environment variables:

    LOCPATH=$(pwd)/foo/usr/lib/locale/ LC_ALL=en_US.decapl locale charmap

To install in the system directory, remove the --prefix option.