TERMINAL GRAPHICS FOR UNICODE

Frank da Cruz
The Kermit Project
Columbia University
New York City USA
fdc@columbia.edu
http://www.columbia.edu/kermit/

D R A F T   # 2

Wed Oct  7 18:32:05 1998

THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT.
IT IS DESIGNED TO BE VIEWED AS-IS IN A FIXED-PITCH FONT.
ITS WIDEST LINE IS 79 COLUMNS.  IT CONTAINS NO TABS.
IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP A CLEAN COPY AT:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt

Previous drafts are at:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal_nn.txt

where nn is the draft number, e.g. "01".

ABSTRACT

A selection of terminal graphics characters is proposed for Unicode [24]
and ISO 10646 [19] to allow Unicode-based terminal emulation software to
(a) display glyphs that are found on popular types of terminals but
currently are not available in Unicode, (b) debug terminal and other data
streams, and (c) interoperate with other Unicode applications.

CONTENTS

   1. Introduction
   2. Scope
   3. Organization
   4. Hex Bytes
   5. Graphic Representation of Control Characters
   6. Math Symbols
   7. Line and Box Drawing Characters
   8. Unfinished Business
   9. Summary of Proposed Additional Characters
  10. References

Tables:

  4.1.  Hex Byte Characters
  5.0.  Unicode Control Characters
  5.1.  C0 Control Characters
  5.2.  C1 Control Characters
  5.3.  EBCDIC Control Characters
  5.3A. Obsolete EBCDIC Control Characters
  5.4.  3270 Control Characters
  5.5.  3270 Terminal Operator Status Indicators
  5.6.  Additional Control-Like Pictures
  6.1.  Math Symbols for Terminals
  7.1.  Additional Line, Box, and Block Characters
  9.1.  Census of New Characters

Figures:

  4.1.  Hex Byte Pictures
  5.1.  Control Picture Display
  5.5.  Connected Rectangles
  7.1.  "Framus" Glyphs

Grateful acknowledgements to those whose comments on the first draft are
reflected in the second: Kevin Bracey, Asmus Freytag, Tony Harminc,
Elliotte Rusty Harold, Paul Keinanen, Karlsson Kent, Rick McGowan,
Kenneth Whistler.

1. INTRODUCTION

Terminal-host communication was the dominant form of interaction between
human and computer from about 1974 (when CRTs became affordable)(1) to
about 1994 (when the Web and Windows took over the mass market).
Terminal-host communication is still widespread and is expected to remain
so for decades to come, particularly in large organizations such as
universities, hospitals, government agencies, and corporations with
central computing facilities, where it serves applications ranging from
software development and system/network administration, to email and
text-based Web access, to data entry and inquiry, to transaction
processing.

A terminal, for purposes of this document, is a device for entry and
display of text in a fixed-pitch font on a screen (or on paper) in which
graphic characters are displayed as glyph images in rows and columns of
fixed-size "cells", one glyph image per cell.

Terminals generally display (or otherwise handle) the characters of ASCII
[1] or EBCDIC [13], and often also accented or non-Roman letters (or
ideograms), and often also "graphic" (2) (non-alphabetic, non-digit,
non-punctuation) characters for purposes of line- and box-drawing,
mathematics, or other special effects.

In recent years, physical terminals have largely disappeared from the
scene, their functions subsumed into PCs running terminal-emulation
software alongside other applications.

Unicode has effectively met the need for encoding the earth's writing
systems, but it is not well suited to terminal emulation since it lacks
some of the required graphics characters.  Without a standard encoding
for the missing glyphs, each maker of terminal emulation software must
create or contract for custom fonts with private encodings.  Such fonts
are not compatible with other (otherwise compatible) fonts on the same
platform (e.g. when copying from a terminal window and pasting to a word
processor), nor with each other.
Furthermore, should Unicode printers become standard equipment on PCs,
terminal graphics characters will not print correctly on them.

Meanwhile, in the interest of "show[ing] the presence of ... control
codes and the SPACE unequivocally when data is displayed" [24,p.6-84],
Unicode includes a selection of control pictures.  Makers (and
supporters, and users) of terminal emulators and most other types of
software could use this feature of Unicode to better advantage if it were
extended to cover a greater portion of the "control space", or even to
allow pictorial representation of any code at all.

This document proposes a repertoire of terminal graphics and debugging
characters to be added to Unicode and ISO 10646 to which all makers of
fonts, code pages, and printers can refer when designing their products,
and upon which all makers of terminal emulation and/or debugging software
can base their screen displays.

Notes:

(1) Strictly speaking, terminals predate electronic computers by some
    decades; the Teletype (used as the control terminal on many
    mainframes and most minicomputers in the 1950s through 1970s) dates
    back to 1929.

(2) Note the distinction between "graphic" meaning "printing" (as in
    "ISO 8859-1 is a graphic character set") versus "graphics" meaning
    having something to do with pictures.

2. SCOPE

This document represents a survey of the following terminals:

  Digital Equipment Corporation VT100 through VT520 [3-9]
  Heath / Zenith 19 [10]
  Hewlett Packard HP-2621 and HP-2648 [11,12]
  IBM 3164 and 3270 [15,16,27]
  Siemens Nixdorf 97801 [21]
  Televideo 922 and 965 [22,23]
  Wyse 60 and 370 [25,26]

as well as:

  IBM PC code page 437 [14]

which is the basis for numerous PC-oriented so-called ANSI emulations.

2.1. Problems

Even within this fairly narrow scope, arriving at a sufficient set of
character-cell terminal graphics for Unicode is complicated by the
well-known problems that affect other preexisting character sets to
varying degrees:

 1. Lack of official names for the characters of some of the sets.
 2. Lack of definitive, high-quality pictures of the glyphs in some
    cases.
 3. Lack of descriptions of the purpose and intended use of the glyphs.
 4. Lack of a current registration authority or owner in some cases.
 5. Questions of unification of glyphs from different terminal makers.
 6. End-user demand for specific characters or sets.

The issue of unification is complicated by the fact that some of the
terminal graphics characters are designed to join at cell boundaries to
form "pictures" (such as boxes or forms to be filled out) or large
characters (such as big math symbols) spanning multiple rows and/or
columns.  The relationship of similar-looking glyphs for different
terminals is difficult to determine -- e.g. exactly where does a line
touch an edge, and at what angle, and does it make a difference?

2.2. What This Proposal Does Not Contain

This proposal does not require any action for well-known terminal
presentation forms such as double-high and/or double-wide characters,
bold, blinking, inverse, underlining, color, etc, since these are not
encoding issues.  In particular, no special code points are needed for
double-high or double-wide characters, such as those seen on the DEC
VT100 family of terminals, nor for compressed characters as seen on Data
General and DEC terminals.

This proposal also does not cover true graphics terminals, such as
Tektronix vector graphics units, DEC ReGIS or Sixel graphics, etc, since
these graphics regimes are not character-cell based.

No attempt was made to account for the many Viewdata, Videotex, Teletex,
Minitel, NAPLPS, or other mosaic graphics character sets.  These should
be tackled, if at all, by someone who knows something about them.

Note that the graphic characters listed in this proposal rarely, if ever,
appear on keyboard key labels.
In general, these characters are never typed, not even on real terminals,
but are displayed when the terminal is commanded into a special mode; for
example, with ISO 2022 [17] character-set designation and invocation
escape sequences.

3. ORGANIZATION

This proposal groups terminal graphic characters into three major
categories.  Some categories are complete by definition (e.g. the
2-nibble hex codes, of which there can be only 256), but others should
include space for expansion as new glyphs are discovered or needed.  The
categories are:

Debugging Tools
  Graphical single-cell representation of Unicode, C0, C1, EBCDIC, and
  other control characters; hexadecimal dumps of terminal traffic:
  Sections 4 and 5.

Math Symbols
  Although most math symbols found on terminals are already in Unicode,
  certain terminal-based applications rely on the ability to construct
  large symbols (integral and summation signs, braces, brackets) from
  smaller character-cell-sized pieces.  Section 6.

Line, Box, and Block Drawing
  Used for data entry, transaction processing, forms filling, etc, in
  markets ranging from car rental and airline reservations, to 911
  operators, to medical information systems, to online library catalogs.
  Although Unicode does include a basic set (mainly those at U+2500),
  some others are missing.  Section 7.

Each category is important for terminal emulation, but the categories can
be considered separately.  The debugging tools category is not specific
to terminal emulation, but can be used with a wide variety of
applications: file analyzers, data or protocol analyzers, or for
debugging of Web pages, word processor documents, etc.

3.1. Temporary Reference Code Assignments

The characters proposed in this document are assigned temporary Unicode
values from the Private Use area, strictly for reference within (or to)
this document only.  Final values should be assigned outside the Private
Use range.
The temporary allocations are:

  E000-E08F  Control Pictures
  E0A0-E0B8  Math Symbols
  E0D0-E0EF  Line and Box Drawing
  E100-E1FF  Hex Bytes

for a total of 512 positions (E000 through E1FF), not fully populated.
Obviously the final counts, code values, and block allocations, including
reserved positions, are likely to change as this proposal evolves.

3.2. Character Properties

All new characters proposed in this document should be precomposed, since
no terminals (with the exception of certain APL and ALA terminals) are
capable of composing characters on the fly from nonspacing diacritics or
by overstriking.

All proposed characters have Combining Class 0 (although some of the
characters are designed to "combine" (connect) with other characters in
adjacent cells).

No "Letter" characters are proposed, therefore none of the proposed
additions has the Case property.

All proposed characters are strong left-to-right as to directionality,
the same as existing characters in the same categories (box drawing,
control pictures, etc).

None of the proposed characters has the Numeric Value Property, although
it might be tempting to assign it to Hex Bytes (see Section 4, Note 4).

Many of the proposed box-drawing and math-technical characters have the
Mirrored Property; this should be rather obvious when a character's name
or description contains the word "left", "right", "top", or "bottom".

I would venture that the proposed math symbols would have the
Mathematical Property, including the extensible ones, since the current
Integral Top and Bottom at U+2320, U+2321 have this property
[24,Section 1.9].

4. HEX BYTES

Hexadecimal byte values, 2 hex digits each, allow any 8-bit byte to be
displayed in hexadecimal in a single character cell (and therefore allow
any Unicode character value to be displayed in two cells), for hex
debugging in terminal emulators, line monitors, protocol analyzers, word
processors, "dump" programs, Web browsers, etc.
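As a rough illustration of the one-byte-per-cell idea, the following
Python sketch maps each byte to a hex byte symbol.  It assumes the
TEMPORARY reference range E100-E1FF from Section 3.1 and the ordering
used in Table 4.1 (byte nn at E100+nn); the function names are
hypothetical and the final code points may differ.

```python
# Sketch only: HEX_BYTE_BASE is this document's TEMPORARY Private Use
# reference value (E100-E1FF), not an assigned Unicode code point.
HEX_BYTE_BASE = 0xE100


def hex_byte_symbol(byte: int) -> str:
    """Return the single-cell symbol for one 8-bit byte (00 through FF)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte value must be in the range 00-FF")
    return chr(HEX_BYTE_BASE + byte)


def hex_cells(data: bytes) -> str:
    """Render a byte string as one hex byte symbol per character cell."""
    return "".join(hex_byte_symbol(b) for b in data)


def two_cell_value(char: str) -> str:
    """Render one 16-bit Unicode value as two cells, high byte first."""
    value = ord(char)
    if value > 0xFFFF:
        raise ValueError("only 16-bit values are handled in this sketch")
    return hex_byte_symbol(value >> 8) + hex_byte_symbol(value & 0xFF)
```

For example, hex_cells(b"\x1b[") yields the two symbols at E11B and E15B,
and two_cell_value applied to U+2500 yields the symbols for 25 and 00.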
To prevent cell-boundary ambiguity, the font designer should employ some
visual device to bind the two hex digits together in an unmistakable way,
for example by arranging them diagonally within the character cell as
shown in Figure 4.1:

Figure 4.1: Hex Byte Pictures

  +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+--+
  |0 | |0 | |0 | ... |0 | |1 | |1 | |1 | ... |E | |F | ... |F |F |
  | 1| | 2| | 3|     | F| | 0| | 1| | 2|     | F| | 0|     | E| F|
  +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+--+

One glyph is required for each hex byte code 00 through FF, or 256 glyphs
in all, as shown in Table 4.1, in which the "Code" column shows the
temporary reference value for this document.  Ideally, however, the final
8 bits of the actual code would correspond to the 8-bit value represented
by the corresponding glyph.

Table 4.1: Hex Byte Characters

  Code  Byte  Description
  E100   00   Symbol for Hex Byte 00
  E101   01   Symbol for Hex Byte 01
   :     :     :
  E1FF   FF   Symbol for Hex Byte FF

Notes:

(1) The proposal for hex byte symbols is independent of the other
    proposals within this document; however, several hex byte symbols
    are required for C1 control pictures (Section 5.2) in any case.

(2) The SNI "IBM" character set [21] contains glyphs for 01 through 1F,
    which are shown sideways (rather than upright diagonal).  I see no
    reason to encode these separately; others might disagree.

(3) Hex byte values can collide with control-character names: FF, D1,
    D2, D3, D4, etc (Section 5).  If both hex bytes and control pictures
    are implemented, the font designer should ensure they are distinct
    enough visually that they will not be confused.

(4) Should these symbols have the Numeric Value Property?  I think not,
    since, unlike digits, Roman numerals, etc, they are not normally
    used as numbers, nor to write numbers.

Summary: 256 new characters, U+E100 through U+E1FF.

5. GRAPHIC REPRESENTATION OF CONTROL CHARACTERS

Digital VT220 and higher terminals, as well as Televideo, Wyse, HP,
Perkin Elmer, and other models, allow the user to select whether control
characters are acted upon or displayed graphically.  Unicode itself
includes its own "control characters" such as line and page separators,
directionality controls, etc.

Normally control characters are used to affect the format and
presentation of glyphs on the screen.  In "display controls",
"transparent", or "debug" mode (the terminology varies with the terminal
vendor), control characters are shown graphically rather than performing
their normal functions; this allows analysis and debugging of the
host-terminal data stream using a terminal, emulator, protocol analyzer,
or line monitor.  It also allows a more readable form of file dumping and
analysis.

A block of control pictures is already found in Unicode at U+2400, but:

 a. The illustrations in the Unicode book do not look like the control
    pictures that are actually used on terminals;
 b. They are for C0 only; there is no corresponding set of C1 control
    pictures;
 c. There are no pictures for the control characters unique to EBCDIC;
 d. Certain other terminal-specific control pictures are missing.

A control picture allows the user to unequivocally determine the identity
and position of control characters in the data stream by displaying each
control character as a unique (and mnemonic) glyph in a single terminal
screen cell.  Terminals do this by arranging the letters (or letter-digit
combinations) of the official abbreviation for the control character
diagonally from upper left to lower right, as shown in Figure 5.1.

Figure 5.1: Control Picture Display

  +---+  +---+
  |L  |  |D  |   (except the two-character abbreviation appears on the
  |   |  | C |    screen with the characters closer together)
  |  F|  |  1|
  +---+  +---+

The Unicode illustration for control pictures at U+2400, however, depicts
the abbreviations horizontally.
While the description of this block [24,p.6-84] states that "only the
semantic is encoded... a particular application [can] use the graphic
representation it prefers," a horizontal arrangement is chosen in the
illustration (on p.7-188) for all characters except NL.  But if they are
implemented this way in a real font, it would be very difficult for the
user to discern the boundary between one control picture and the next.

It is suggested, therefore, that the next edition of the Unicode Standard
illustrate these characters with the diagonal representation shown in
Figure 5.1 (and in ISO 10646 [19]), since it is more likely that Unicode
font designers will follow the illustrations in the Unicode Standard than
attempt to procure the actual terminals or manuals to see how they do it.

5.0. Unicode Control Pictures

Table 5.0 lists the nonprinting Unicode characters used for spacing,
directionality control, and general formatting.  Most of these characters
are in the U+2000 block, and are indicated by mnemonics inside
broken-line squares.  The Code column contains the temporary code value
for the proposed symbol.  The Val column contains the Unicode value of
the character for which the symbolic representation is proposed.  The
Name column contains the designator shown in the broken-line square in
the Unicode code table, with a space standing for a line break (but see
Note 2).  The suggested glyphs are those shown in the Unicode Standard.

Table 5.0: Unicode Control Characters

  Code  Val   Name     Description
  E000  2000  NQ SP    Symbol for En Quad
  E001  2001  MQ SP    Symbol for Em Quad
  E002  2002  EN SP    Symbol for En Space
  E003  2003  EM SP    Symbol for Em Space
  E004  2004  3/M SP   Symbol for Three-Per-Em-Space
  E005  2005  4/M SP   Symbol for Four-Per-Em-Space
  E006  2006  6/M SP   Symbol for Six-Per-Em-Space
  E007  2007  F SP     Symbol for Figure Space
  E008  2008  P SP     Symbol for Punctuation Space
  E009  2009  TH SP    Symbol for Thin Space
  E00A  200A  H SP     Symbol for Hair Space
  E00B  200B  ZW SP    Symbol for Zero-Width Space
  E00C  200C  ZW NJ    Symbol for Zero-Width Non-Joiner
  E00D  200D  ZW J     Symbol for Zero-Width Joiner
  E00E  200E  LRM      Symbol for Left-to-Right Mark
  E00F  200F  RLM      Symbol for Right-to-Left Mark
  E010  2028  L SEP    Symbol for Line Separator
  E011  2029  P SEP    Symbol for Paragraph Separator
  E012  202A  LRE      Symbol for Left-to-Right Embedding
  E013  202B  RLE      Symbol for Right-to-Left Embedding
  E014  202C  PDF      Symbol for Pop Directional Formatting
  E015  202D  LRO      Symbol for Left-to-Right Override
  E016  202E  RLO      Symbol for Right-to-Left Override
  E017  206A  I SS     Symbol for Inhibit Symmetric Swapping
  E018  206B  A SS     Symbol for Activate Symmetric Swapping
  E019  206C  I AFS    Symbol for Inhibit Arabic Form Shaping
  E01A  206D  A AFS    Symbol for Activate Arabic Form Shaping
  E01B  206E  NA DS    Symbol for National Digit Shapes
  E01C  206F  NO DS    Symbol for Nominal Digit Shapes
  E01D  FEFF  ZWN BSP  Symbol for Zero Width No Break Space
  E01E  FFFE  FF FE    Symbol for Not A Character (Byte Order) (2)
  E01F  FFFF  FF FF    Symbol for Not A Character (2)

Notes:

(1) There is no known need for these symbols when emulating current
    terminals.  In the future, if/when terminals are based on Unicode,
    they might be useful in that context.  In the meantime, makers of
    word processors, Web browsers, etc, might have a use for these
    glyphs.

(2) No mnemonic or abbreviation is given for this "not-a-character" in
    the Unicode Standard.

Summary: 32 characters, E000-E01F.

5.1. C0 Control Pictures

Table 5.1 lists the C0 Control Characters from the ASCII Standard [1]
(and also in ISO 646 and ISO 6429).  Each C0 control character has an
official designator (from the appropriate ANSI [1] or ISO [18] standard):
a 2- or 3-character sequence of (ASCII) alphanumeric characters.  In some
terminals, such as the DEC VT220 family [5], the control picture shows
the designation in full.  In others, such as Televideo [22,23], HP [11],
and Perkin Elmer [20], each 3-character designator is replaced by a
2-character short form.

The columns are as follows:

  Code:        The Unicode value in hexadecimal.
  Val:         The value of the control character's code in hexadecimal.
  Name:        The full ASCII abbreviation for the control character's name.
  2X:          The 2-character abbreviation used on Televideo, HP, etc.
  Description: "Symbol for" followed by the character's standard name.

Table 5.1: C0 Control Characters

  Code  Val  Name  2X  Description
  2400  00   NUL   NU  Symbol for Null
  2401  01   SOH   SH  Symbol for Start of Heading
  2402  02   STX   SX  Symbol for Start of Text
  2403  03   ETX   EX  Symbol for End of Text
  2404  04   EOT   ET  Symbol for End of Transmission
  2405  05   ENQ   EQ  Symbol for Enquiry
  2406  06   ACK   AK  Symbol for Acknowledge
  2407  07   BEL   BL  Symbol for Bell
  2408  08   BS    BS  Symbol for Backspace
  2409  09   HT    HT  Symbol for Horizontal Tab
  240A  0A   LF    LF  Symbol for Line Feed
  240B  0B   VT    VT  Symbol for Vertical Tab
  240C  0C   FF    FF  Symbol for Form Feed (1)
  240D  0D   CR    CR  Symbol for Carriage Return
  240E  0E   SO    SO  Symbol for Shift Out
  240F  0F   SI    SI  Symbol for Shift In
  2410  10   DLE   DL  Symbol for Data Link Escape
  2411  11   DC1   D1  Symbol for Device Control 1 (1)
  2412  12   DC2   D2  Symbol for Device Control 2 (1)
  2413  13   DC3   D3  Symbol for Device Control 3 (1)
  2414  14   DC4   D4  Symbol for Device Control 4 (1)
  2415  15   NAK   NK  Symbol for Negative Acknowledge
  2416  16   SYN   SY  Symbol for Synchronous Idle
  2417  17   ETB   EB  Symbol for End of Transmission Block
  2418  18   CAN   CN  Symbol for Cancel
  2419  19   EM    EM  Symbol for End of Medium
  241A  1A   SUB   SU  Symbol for Substitute
  241B  1B   ESC   EC  Symbol for Escape
  241C  1C   FS    FS  Symbol for File Separator (2)
  241D  1D   GS    GS  Symbol for Group Separator (2)
  241E  1E   RS    RS  Symbol for Record Separator (2)
  241F  1F   US    US  Symbol for Unit Separator (2)
  2420  20   SP    SP  Symbol for Space (3)
  2421  7F   DEL   DT  Symbol for Delete (3)

Notes:

(1) Note the conflict/coincidence of these 2-character forms with hex
    bytes; see Note (3) in Section 4.

(2) These C0 controls have alternative names, listed in Section 5.6.

(3) Not, strictly speaking, a control character, but not a visible one
    either.

Summary: No new code points, but it is recommended that C0 control
pictures be illustrated diagonally, and that the 2-letter forms be listed
as alternatives for font designers, especially for low resolutions or
small point sizes.

5.2. C1 Control Pictures

C1 Control characters are specified in ISO 6429 [18] (ISO Registration
Number 77 [28]) and used, among other places, in the VT220 family of
terminals [5] and the Wyse 370 [26], where they are represented in the
right half of the "display controls" font as shown in Table 5.2 (DEC
terminals use the full name, Wyse terminals use the 2X name).  As with C0
controls, the "name" is displayed diagonally within the character cell.

Unicode presently includes no C1 control pictures.  The "Code" column
shows the temporary Unicode value for reference within this document
only; actual code assignments should be outside the Private Use area.
The other columns are labeled as in Table 5.1.
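
The regular layout of Tables 5.1 and 5.2 means that a "display controls"
mode can be sketched as a simple lookup.  The Python below is only an
illustration: the U+2400 block is existing Unicode, but C1_BASE and
HEX_BYTE_BASE are this document's TEMPORARY reference values, the
function names are hypothetical, replacing SPACE by its picture is one
possible choice, and showing all other bytes "as themselves" via chr()
is an assumption (it treats them as Latin-1).

```python
# Sketch only. The U+2400 block is existing Unicode; C1_BASE and
# HEX_BYTE_BASE are this document's TEMPORARY Private Use reference values.
C0_BASE = 0x2400        # U+2400-U+241F: pictures for C0 controls 00-1F
SP_PICTURE = 0x2420     # Symbol for Space
DEL_PICTURE = 0x2421    # Symbol for Delete
C1_BASE = 0xE020        # temporary codes for C1 controls 80-9F
HEX_BYTE_BASE = 0xE100  # temporary codes for hex byte symbols (Section 4)
C1_UNNAMED = {0x80, 0x81, 0x99}  # no name in ISO 6429; shown as hex bytes


def control_picture(code: int) -> str:
    """Return the picture for one 8-bit code in "display controls" mode."""
    if 0x00 <= code <= 0x1F:
        return chr(C0_BASE + code)
    if code == 0x20:
        return chr(SP_PICTURE)
    if code == 0x7F:
        return chr(DEL_PICTURE)
    if code in C1_UNNAMED:
        return chr(HEX_BYTE_BASE + code)
    if 0x80 <= code <= 0x9F:
        return chr(C1_BASE + (code - 0x80))
    return chr(code)  # assumption: any other byte is shown as itself


def display_controls(data: bytes) -> str:
    """Render a data stream with every control shown as a picture."""
    return "".join(control_picture(b) for b in data)
```

For example, display_controls(b"A\r\n") gives "A" followed by the
pictures at U+240D and U+240A, and control_picture(0x9B) (CSI) gives the
temporary code E03B.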

Table 5.2: C1 Control Characters

  Code  Val  Name  2X  Description
        80   80        (1)
        81   81        (1)
  E022  82   BPH       Symbol for Break Permitted Here (2)
  E023  83   NBH       Symbol for No Break Here (2)
  E024  84   IND   IN  Symbol for Index (3)
  E025  85   NEL   NL  Symbol for Next Line
  E026  86   SSA   SS  Symbol for Start Selected Area
  E027  87   ESA   ES  Symbol for End Selected Area
  E028  88   HTS   HS  Symbol for Character Tabulation Set
  E029  89   HTJ   HJ  Symbol for Character Tabulation with Justification
  E02A  8A   VTS   VS  Symbol for Line Tabulation Set
  E02B  8B   PLD   PD  Symbol for Partial Line Forward
  E02C  8C   PLU   PU  Symbol for Partial Line Backward
  E02D  8D   RI    RI  Symbol for Reverse Line Feed
  E02E  8E   SS2   S2  Symbol for Single Shift 2
  E02F  8F   SS3   S3  Symbol for Single Shift 3
  E030  90   DCS   DC  Symbol for Device Control String
  E031  91   PU1   P1  Symbol for Private Use 1
  E032  92   PU2   P2  Symbol for Private Use 2
  E033  93   STS   SE  Symbol for Set Transmit State
  E034  94   CCH   CC  Symbol for Cancel Character
  E035  95   MW    MW  Symbol for Message Waiting
  E036  96   SPA   SP  Symbol for Start Protected (Guarded) Area
  E037  97   EPA   EP  Symbol for End Protected (Guarded) Area
  E038  98   SOS       Symbol for Start of String (2)
        99             (1)
  E03A  9A   SCI       Symbol for Single Character Introducer (2)
  E03B  9B   CSI   CS  Symbol for Control Sequence Introducer
  E03C  9C   ST    ST  Symbol for String Terminator
  E03D  9D   OSC   OS  Symbol for Operating System Command
  E03E  9E   PM    PM  Symbol for Privacy Message
  E03F  9F   APC   AP  Symbol for Application Program Command

Notes:

(1) Undefined in ISO 6429, shown on VT220/WY370 terminal by hex byte
    symbols (see text just below these notes).

(2) Defined in ISO 6429, but shown on VT220/WY370 terminal by hex value.

(3) Removed from ISO 6429 in the third edition, but shown on VT220/WY370
    terminal.

Note that three of the C1 control pictures are unassigned (the ones
marked by "(1)", that would be at U+E020, U+E021, and U+E039 if these
were assigned).
These positions should be left vacant in case names are assigned to these
characters in a future revision of ISO 6429, or terminals are discovered
with control pictures for these codes.  In the meantime, hex bytes are
used; if a hex-byte block (Section 4) is defined, they can be taken from
that block; otherwise, the particular values shown here (80, 81, and 99,
and possibly also 98 and 9A) must be defined for this block.

As with C0 controls, it is a matter for the font designer to choose the
full designator from the Name column, or the 2-character alternatives
from the 2X column.

Summary: 29 new characters (if hex bytes are also approved) or 32 (if
they are not).

5.3. EBCDIC Control Pictures

The EBCDIC family of character sets [13,14,29] includes its own
repertoire of control characters.  Many of them, like NUL, SOH, FF, SO,
SI, and so on, are coincident with ASCII C0 controls in name and
semantics, and sometimes also in encoding.  Others are unique to EBCDIC.

Table 5.3 shows the EBCDIC control characters [29], in EBCDIC order.  The
Code column shows the Unicode value; those starting with 24 are already
in Unicode block U+2400; those starting with E need to be added.  The Val
column shows the EBCDIC value (hex).  The Name column shows the EBCDIC
abbreviation for the code, and the description lists "Symbol for" plus
the EBCDIC name.  There are no known "2X" forms in use.
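
Because EBCDIC values do not line up with the U+2400 block (LF, for
instance, is 25 in EBCDIC but its picture is U+240A), an implementation
would need a table lookup rather than a fixed offset.  The Python sketch
below copies only a handful of rows from Table 5.3; the dictionary and
function names are hypothetical, and codes of the form E0xx are this
document's TEMPORARY reference values.

```python
# A few rows copied from Table 5.3: EBCDIC value -> (name, picture code).
# Codes E040 and up are this document's TEMPORARY Private Use values.
EBCDIC_PICTURES = {
    0x00: ("NUL", 0x2400),
    0x04: ("SEL", 0xE040),
    0x05: ("HT", 0x2409),
    0x06: ("RNL", 0xE041),
    0x15: ("NL", 0x2424),
    0x25: ("LF", 0x240A),
    0x27: ("ESC", 0x241B),
}


def is_proposed(code: int) -> bool:
    """True if the code lies in the Private Use Area (U+E000-U+F8FF)."""
    return 0xE000 <= code <= 0xF8FF


def ebcdic_picture(val: int):
    """Return (name, picture, status) for an EBCDIC control value.

    Raises KeyError for values missing from this partial sample table.
    """
    name, code = EBCDIC_PICTURES[val]
    status = "proposed" if is_proposed(code) else "existing"
    return name, chr(code), status
```

Here ebcdic_picture(0x25) reports LF with the existing picture U+240A,
while ebcdic_picture(0x04) reports SEL with the proposed temporary code
E040.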

Table 5.3: EBCDIC Control Characters

  Code  Val  Name  Description
  2400  00   NUL   Symbol for Null
  2401  01   SOH   Symbol for Start of Heading
  2402  02   STX   Symbol for Start of Text
  2403  03   ETX   Symbol for End of Text
  E040  04   SEL   Symbol for Select
  2409  05   HT    Symbol for Horizontal Tab
  E041  06   RNL   Symbol for Required New Line
  2421  07   DEL   Symbol for Delete
  E042  08   GE    Symbol for Graphic Escape
  E043  09   SPS   Symbol for Superscript
  E044  0A   RPT   Symbol for Repeat
  240B  0B   VT    Symbol for Vertical Tab
  240C  0C   FF    Symbol for Form Feed (1)
  240D  0D   CR    Symbol for Carriage Return
  240E  0E   SO    Symbol for Shift Out
  240F  0F   SI    Symbol for Shift In
  2410  10   DLE   Symbol for Data Link Escape
  2411  11   DC1   Symbol for Device Control 1
  2412  12   DC2   Symbol for Device Control 2
  2413  13   DC3   Symbol for Device Control 3
  E045  14   RES   Symbol for Restore
  2424  15   NL    Symbol for New Line
  2408  16   BS    Symbol for Backspace
  E046  17   POC   Symbol for Program Operator Communication
  2418  18   CAN   Symbol for Cancel
  2419  19   EM    Symbol for End of Medium
  E047  1A   UBS   Symbol for Unit Back Space
  E048  1B   CU1   Symbol for Customer Use 1
  E049  1C   IFS   Symbol for Interchange File Separator
  E04A  1D   IGS   Symbol for Interchange Group Separator
  E04B  1E   IRS   Symbol for Interchange Record Separator
  E04C  1F   IUS   Symbol for Interchange Unit Separator (2)
  E04D  20   DS    Symbol for Digit Select
  E04E  21   SOS   Symbol for Start of Significance
  241C  22   FS    Symbol for Field Separator
  E04F  23   WUS   Symbol for Word Underscore
  E050  24   BYP   Symbol for Bypass
  240A  25   LF    Symbol for Line Feed
  2417  26   ETB   Symbol for End of Transmission Block
  241B  27   ESC   Symbol for Escape
  E051  28   SA    Symbol for Set Attribute
  E052  29   SFE   Symbol for Start Field Extended
  E053  2A   SM    Symbol for Set Mode (3)
  E054  2B   CSP   Symbol for Control Sequence Prefix
  E055  2C   MFA   Symbol for Modify Field Attribute
  2405  2D   ENQ   Symbol for Enquiry
  2406  2E   ACK   Symbol for Acknowledge
  2407  2F   BEL   Symbol for Bell
  E056  30         (Reserved by IBM for future use)
  E057  31         (Reserved by IBM for future use)
  2416  32   SYN   Symbol for Synchronous Idle
  E058  33   IR    Symbol for Index Return
  E059  34   PP    Symbol for Presentation Position
  E05A  35   TRN   Symbol for Transparent
  E05B  36   NBS   Symbol for Numeric Backspace
  2404  37   EOT   Symbol for End of Transmission
  E05C  38   SBS   Symbol for Subscript
  E05D  39   IT    Symbol for Indent Tabulation
  E05E  3A   RFF   Symbol for Reverse Form Feed
  E05F  3B   CU3   Symbol for Customer Use 3 (4)
  2414  3C   DC4   Symbol for Device Control 4
  2415  3D   NAK   Symbol for Negative Acknowledge
  E060  3E         (Reserved by IBM for future use)
  241A  3F   SUB   Symbol for Substitute

Notes:

(1) Conflict/coincidence with a hex byte; see Note (3) in Section 4.

(2) The IUS control is sometimes also labeled ITB.

(3) The SM control is sometimes also labeled SW (= Switch).

(4) Note: There is no longer a Customer Use 2 (see Table 5.3A).

Summary: 33 new characters, E040-E060, including 3 reserved.

For reference, Table 5.3A shows the original names for EBCDIC control
characters [13] that are now superseded by the names shown in Table 5.3.
It is not proposed here that these be added to Unicode.

Table 5.3A: Obsolete EBCDIC Control Characters

  Val  Name  Description              Replaced By
  04   PF    Punch Off                SEL
  06   LC    Lower Case               RNL
  0A   SMM   Start of Manual Message  RPT
  13   TM    Tape Mark                DC3
  17   IL    Idle                     POC
  1A   CC    Cursor Control           UBS
  2B   CU2   Customer Use 2           CSP
  34   PN    Punch On                 PP
  35   RS    Record Separator         TRN
  36   UC    Upper Case               NBS

5.4. IBM 3270 Terminal Orders and Controls

Names for IBM 3270 terminal orders and controls [27] that are not already
listed in Tables 5.1-5.3 are shown in Table 5.4, to be used in debugging
3270 data streams.
Columns are as in the previous tables, except the Type column, in which: O = 3270 Terminal Order [27,Table 4-1] D = 3270 Terminal Order in normal display [27,p.E-3] L = LU 1 SCS Control Codes [27,Table 8-2] F = 3270 Format Control Order [27,Table 4-3] Table 5.4: 3270 Control Characters Code Val Name Type Description E070 1D SF O Symbol for Start Field E071 11 SBA O Symbol for Set Buffer Address E072 2C MF O Symbol for Modify Field E073 13 IC O Symbol for Insert Cursor E074 05 PT O Symbol for Program Tab E075 3C RA O Symbol for Repeat to Address E076 12 EUA O Symbol for Erase to Unprotected Address E077 04 VCS L Symbol for Vertical Channel Select E078 14 ENP L Symbol for Enable Presentation E079 24 INP L Symbol for Inhibit Presentation E07A 2B FMT L Symbol for Format E07B 1C DUP F Symbol for Duplicate E07C 1C DUP D Overscore asterisk (1) E07D 1E FM F Symbol for Field Mark E07E 1E FM D Overscore semicolon (1) E07F FF EO F Symbol for Eight Ones Notes: (1) When displayed "as itself". Summary: 16 new characters, E070-E07F. 5.5. 3270 Terminal Operator Status Indicators The IBM 3270 terminal displays a variety of unique glyphs in its Operator Information Area [15, Figure A-4]. Although they are not encoded in any IBM character set (known to me), they nevertheless appear on the screen, and are therefore required for accurate terminal emulation. These glyphs are listed in Table 5.5. Table 5.5: 3270 Terminal Operator Status Indicators Code Description E080 Human stick figure E081 Human stick figure in box E082 Clock at 6:10 (or 1:30) E083 White rectangle with stroke (1) E084 Black rectangle with stroke (2) E085 Lighting with stroke (3) E086 Security key (4) E087 Black and White Right-Pointing Triangles (5) Notes: (1) A rectangle like the one at U+25AD with an oblique stroke through it. Note that "white" and "black" are used in the sense of the Unicode standard, and do not imply any particular colors or measure of goodness. 
(2) A rectangle like the one at U+25AC with an oblique stroke through it. (3) A horizontal lightning symbol with an oblique stroke through it. (4) A picture of a key (indicating the keyboard is locked). (5) Like U+25B8 and U+25B9 in the same cell, arranged horizontally, left to right, like a double right-pointing arrowhead, used as a supplementary indicator. In many cases, black and/or white rectangles (U+25AD, U+25AC, U+E083, U+E084) are connected with a centered horizontal line such as the one at U+2500; two rectangles connected this way generally symbolize a 3270 terminal with a printer attached. Figure 5.5 shows an example. The font designer must ensure that a sequence: rectangle, line, rectangle, results in a pair of connected rectangles. Figure 5.5: Connected Rectangles +--------+ +--------+ | |------| | +--------+ +--------+ Summary: 8 new characters, E080-E087 5.6. Additional Control-Like Pictures Table 5.6 shows additional characters that are (or are likely to be) included in "display controls" mode on various terminals. Table 5.6: Additional Control-Like Pictures Code Name Description E090 LS1 Symbol for Locking Shift 1 (1) E091 LS0 Symbol for Locking Shift 0 (2) E092 CEX Symbol for Control Extension (3) E093 IS4 Symbol for Information Separator 4 (4) E094 IS3 Symbol for Information Separator 3 (5) E095 IS2 Symbol for Information Separator 2 (6) E096 IS1 Symbol for Information Separator 1 (7) E097 CL Symbol for Cancel Line (8) E098 Picture of Bell (9) E099 BP Word Processing Symbol BP (10) E09A BE Word Processing Symbol BE (10,11) E09B FN Word Processing Symbol FN (10) E09C FE Word Processing Symbol FE (10,11) E09D HF Word Processing Symbol BP (10) 2426 Symbol for Substitute Form Two (Reverse Question Mark) (12) Notes: (1) ISO name for SO [18]. (2) ISO name for SI [18]. (3) From JIS C 6225-1979 / ISO # 74 [28]. (4) ISO Name for FS [18]. (5) ISO Name for GS [18]. (6) ISO Name for RS [18]. (7) ISO Name for US [18]. (8) Used on HP terminals [11.12]. 
  (9) Used on HP terminals in place of Symbol for BEL (U+2407) [11].
 (10) From the Data General Word Processing Set [2].
 (11) Conflict/Coincidence with Hex Byte; see Note (3) in Section 4.
 (12) The upright reverse question mark is used by DEC VT terminals to
      indicate that an invalid code was received.  It also stands for SUB
      and/or RS in Wyse display controls mode [25,26], and is the glyph for
      0xFF in the Televideo Multinational Character Set [23].  It is also a
      glyph in the DG Special Graphics Character Set [2].  This one is not
      in Unicode at present, but is encoded in Amendment 18 to ISO 10646 at
      the code point shown, with the requisite shape of reverse question
      mark.

Summary: 14 characters, E090-E09D.

Section 5 Summary:

  Unicode Controls:   32 new characters, E000-E01F
  C0 Controls:         0 new characters
  C1 Controls:        32 new characters, E020-E03F
  EBCDIC Controls:    33 new characters, E040-E060
  3270 Controls:      16 new characters, E070-E07F
  3270 Indicators:     8 new characters, E080-E087
  Misc Controls:      14 new characters, E090-E09D
  Total Control Pics: 135

6. MATH SYMBOLS

Unicode has a generous supply of math symbols, and no doubt more are in the
works.  And of course it also includes the Latin, Greek, Fraktur, Hebrew, and
other letters used in mathematical notation.  However, terminal emulators
also need special glyphs designed to be joined together in adjacent character
cells, vertically or horizontally, to form large math symbols such as
integrals, summation signs, braces, or brackets; the integral top and bottom
that already exist at U+2320 and U+2321 are examples of such pieces.  Several
other single-cell characters are also missing, including the small radical
sign from the DEC Technical character set.

Table 6.1 lists the needed characters, along with suggested temporary codes
for them.  At least one real terminal reference is shown for each character,
either in column/row notation or as an IBM Graphic Character Global
Identifier (GCGID) [14].
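
The column/row references in the tables that follow can be turned into byte
values mechanically.  The sketch below is illustrative only and is not part
of the proposal: it assumes the usual ISO 2022 convention [17] that the
column is the high-order four bits and the row is the low-order four bits of
the byte, and the function names and the small lookup table are hypothetical.
The four table entries are copied from Table 6.1 and use its temporary codes.

```python
def col_row_to_byte(ref: str) -> int:
    """Convert a column/row reference such as "02/15" to a byte value.

    Assumes column = high-order 4 bits, row = low-order 4 bits.
    """
    col_text, row_text = ref.split("/")
    col, row = int(col_text), int(row_text)
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must each be 0..15: {ref!r}")
    return col * 16 + row


def byte_to_col_row(value: int) -> str:
    """Convert a byte value back to column/row notation, e.g. "07/12"."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"not a byte value: {value!r}")
    return f"{value >> 4:02d}/{value & 0x0F:02d}"


# Hypothetical lookup: DEC Technical byte value -> temporary code from
# Table 6.1.  Only a few rows are shown; a real emulator would need all.
DEC_TECH_TO_PROPOSED = {
    col_row_to_byte("02/15"): 0xE0A0,  # Extensible left brace middle
    col_row_to_byte("03/00"): 0xE0A5,  # Extensible right brace middle
    col_row_to_byte("03/01"): 0xE0AD,  # Summation symbol top
    col_row_to_byte("03/02"): 0xE0AC,  # Summation symbol bottom
}
```

An emulator that has selected the DEC Technical set could then look up each
incoming byte in such a table and fall back to its normal translation when
the byte is absent.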
Legend:

  SB = Square Bracket
  UL = Upper Left
  LL = Lower Left
  UR = Upper Right
  LR = Lower Right

Table 6.1: Math Symbols for Terminals

Code  Description                             Reference
E0A0  Extensible left brace middle            DEC Tech 02/15
E0A1  Extensible left parenthesis bottom      DEC Tech 02/12, IBM SS210000
E0A2  Extensible left parenthesis top         DEC Tech 02/11, IBM SS200000
E0A3  Extensible left SB bottom               DEC Tech 02/08
E0A4  Extensible left SB top                  DEC Tech 02/07
E0A5  Extensible right brace middle           DEC Tech 03/00
E0A6  Extensible UR or LL brace section       IBM SS240000
E0A7  Extensible LR or UL brace section       IBM SS250000
E0A8  Extensible right parenthesis bottom     DEC Tech 02/14, IBM SS230000
E0A9  Extensible right parenthesis top        DEC Tech 02/13, IBM SS220000
E0AA  Extensible right SB bottom              DEC Tech 02/10
E0AB  Extensible right SB top                 DEC Tech 02/09
E0AC  Summation symbol bottom                 DEC Tech 03/02, DG Math 01/09(1)
E0AD  Summation symbol top                    DEC Tech 03/01, DG Math 01/08(1)
E0AE  Right ceiling corner                    DEC Tech 03/05
E0AF  Right floor corner                      DEC Tech 03/06
E0B0  Radical symbol, small                   DEC Tech 00/01
E0B1  Radical symbol with stroke              DG Math 01/13
E0B2  Superscript Latin small letter i        SNI Math 03/00
E0B3  Latin small letter a with underbar      SNI Math 04/04 (2)
E0B4  Latin capital letter O with underbar    SNI Math 04/09 (2)
E0B5  Superscript almost-equal-to sign        SNI IBM 06/12
E0B6  Superscript capital Greek letter Sigma  SNI IBM 06/13
E0B7  Superscript infinity sign               SNI IBM 07/12
E0B8  Superscript proportional-to sign        SNI IBM 07/13

References:

  DEC Tech = Digital Equipment Corporation Technical Character Set [5]
  SNI Math = Siemens Nixdorf Mathematisch [21]
  SNI IBM  = Siemens Nixdorf IBM [21]
  DG Math  = Data General Word-Processing, Greek, and Math Character Set [2]
  IBM      = IBM Graphic Character Global Identifier (GCGID) [14]

Notes:
  (1) Also GCGID SS280000 and SS290000.
  (2) These are like feminine and masculine ordinal, respectively, but full
      size, not superscripts.

Summary: 25 new characters, E0A0-E0B8.

7.
LINE, BOX, AND BLOCK CHARACTERS

A particular need addressed by this proposal is the continued ability to
support (sometimes mission-critical) terminal-based forms-filling
applications that also require entry and display of international characters,
as terminals are replaced by PCs.  So far, Unicode has provided the
international characters, but not necessarily all the needed character-cell
based forms-drawing capabilities.

Some terminals have vertical and horizontal lines that are not centered
within the character cell, and currently not found in Unicode.  Others have
black rectangles or other shapes not found in the U+2580 block.  Table 7.1
lists the additional line, box, and block characters needed to emulate the
target terminals.

Abbreviations:

  V  = Vertical
  H  = Horizontal
  L  = Left
  R  = Right
  LL = Lower Left
  LR = Lower Right
  UL = Upper Left
  UR = Upper Right

Terminology:

  Quadrant
    A black rectangle filling one quarter of a cell, with one corner in the
    center and the opposite corner at a corner of the cell.  So "Quadrant UL"
    is the upper left quadrant; "Quadrant UL and UR" is the top half of the
    cell (which happens to be coincident with U+2580 and so is not included
    here).

  Line
    Refers to a line that extends all the way to opposite edge(s) of a cell,
    designed to be joined to (a) line(s) in the adjacent cell(s).

  Bar
    Refers to a horizontal line that does not touch any cell edges.

  Wedge
    Refers to a character cell with a diagonal line connecting opposite
    corners, dividing it into two triangles; one black, the other white; the
    wedge is the black part.  Thus an UL Wedge is similar to U+25E9, except
    it fills the entire character cell.

  Framus
    (Pick a better word!) is a shape composed of two triangles with their
    points meeting at the center of the cell to form an X with bars across
    the top and bottom, closing the open ends.  A black framus has the two
    triangles filled in; a white one is in outline form.  A framus with
    center bar has a horizontal line through the center of the cell.
Figure 7.1: "Framus" Glyphs

     White        Black       With Bar

    *******      *******      *******
     *   *        *****        *   *
      * *          ***          * *
       *            *        *********
      * *          ***          * *
     *   *        *****        *   *
    *******      *******      *******

Table 7.1: Additional Line, Box, and Block Characters

Code Description                 References
E0D0 L V box line, extensible    H19 07/12 (1)
E0D1 R V box line, extensible    H19 07/13 (1)
E0D2 UL Wedge                    H19 07/02, IBM SF870000
E0D3 UR Wedge                    H19 05/14, IBM SF860000
E0D4 LL Wedge                    IBM SF850000
E0D5 LR Wedge                    IBM SF840000
E0D6 H line - Scan 1             DSG 06/15, H19 07/10, WG3 05/00, TVI 09/00
E0D7 H line - Scan 3             DSG 07/00, Wyse ANSI 01/01, WG3 05/00
E0D8 H line - Scan 5             DSG 07/01, Wyse ANSI 02/02 (2)
E0D9 H line - Scan 7             DSG 07/02, Wyse ANSI 01/03, WG3 05/01
E0DA H line - Scan 9             DSG 07/03, H19 07/11, WG3 05/01, TVI 09/01
E0DB Quadrant LL                 H19 06/13, WG3 05/05, TVI 09/05
E0DC Quadrant LR                 H19 06/12, WG3 05/04, TVI 09/04
E0DD Quadrant UL                 H19 06/14, WG3 05/06, TVI 09/06
E0DE Quadrant UL and LL and LR   WG3 05/11, TVI 09/11
E0DF Quadrant UL and LR          H19 06/10 (3)
E0E0 Quadrant UL and UR and LL   WG3 05/12, TVI 09/12
E0E1 Quadrant UL and UR and LR   WG3 05/13, TVI 09/13
E0E2 Quadrant UR                 H19 06/15, WG3 05/03, TVI 09/03
E0E3 Quadrant UR and LL          (for completeness)
E0E4 Quadrant UR and LL and LR   WG3 05/14, TVI 09/14
E0E5 Full black diamond          TVI 09/02 (4)
E0E6 Black framus                DGM 06/08
E0E7 Black framus + H center bar DGM 06/09
E0E8 White framus                DGM 06/10
E0E9 White framus + H center bar DGM 06/11
E0EA R & L arrow to V center bar DGM 03/13
E0EB Up arrow to H center line   DGL 02/12
E0EC R arrow to V center line    DGL 02/13
E0ED L arrow to V center line    DGL 02/14
E0EE Down arrow to H center line DGL 02/12
E0EF Box drawing double dash H   DGL 03/12 (5)

References:

  DGM = Data General Word-Processing, Greek, and Math Character Set [2]
  DGL = Data General Line Drawing Character Set [2]
  DSG = The DEC Special Graphics Character Set [5]
  H19 = The Heath/Zenith 19 Graphics Character Set [10]
  WG3 = The Wyse Graphics 3 Character Set [25]
  TVI = The Televideo 965 Multinational
        Character Set [23]
  IBM = Graphic Character Global Identifier (GCGID) [14]
  Wyse ANSI = Wyse 60 "Standard ANSI", "UK ANSI", and "ANSI Graphics" [25]

Notes:
  (1) The vertical box lines are near, but not touching, the left and right
      edges of the cell, respectively, and are two pixels thick on the H19
      screen.  Similar to IBM GCGID SF640000 and SF650000, respectively.
  (2) A centered horizontal line is already in Unicode at U+2500, but this
      one might need to be encoded separately if the existing one does not
      mesh well with other line and box characters.
  (3) Only on Zenith models, not original Heathkits.
  (4) Full black diamond, with points touching center of each cell wall.
  (5) Similar to U+2504 but double rather than triple.

Also note that Quadrants UL+UR, UR+LR, LL+LR, UL+LL (half blocks) are already
encoded at block U+2580.

Summary: 32 new glyphs, range E0D0 to E0EF.

8. UNFINISHED BUSINESS

The selection of characters presented in this draft is far from
comprehensive.  Hundreds of other terminals from the past 30+ years are
likely to have glyphs or entire character sets covered neither here nor in
Unicode, and these might or might not be important in some application
somewhere.  Readers are invited, therefore, to propose any needed additions,
bearing in mind that Unicode code space is not unlimited.

Several character sets found in the references consulted are ignored here,
fully or in part, due to lack of motivation (nobody has ever asked us, in our
role of terminal emulator maker, to support them).  Obviously these, and any
other missing sets (such as the many Videotex/Teletex/etc mosaic sets), can
be considered if there is a demand.

  Siemens Nixdorf Facet
    A set of 95 mosaic graphics, but not resembling any of the ISO Videotex
    mosaic sets; difficult to describe.
  Siemens Nixdorf Klammern (Brackets)
    A set of 95 assorted blobs, bracket and brace pieces, clocks, arrows,
    hourglasses, and Greek letters, some of which are unique; others can be
    unified with existing Unicode characters or characters in this proposal.

  Hewlett Packard Line Drawing
    Mostly coincident with Unicode box-drawing set at U+2500, but with a
    handful of unique characters, such as single-to-triple box intersections,
    single-to-double intersections with wide spacing, etc.  These should be
    mappable to existing U+25xx glyphs without causing riots in the streets.

  Hewlett Packard Big Character Pieces
    Thick line segments for drawing large characters, used on the HP-2648.

9. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS

If all the proposed new characters are added to the UCS, this will enable
terminal emulators to fully handle at least the following terminal character
sets, which were not previously covered in full:

  ASCII/ISO Display Controls for DEC, Hewlett Packard, Televideo, and others.
  EBCDIC Display Controls for the IBM 3270
  Hexadecimal debugging
  DEC Technical
  DEC Special Graphics
  Data General Word-Processing, Greek, and Math (1)
  Data General Line Drawing
  Heath/Zenith 19 Graphics
  Hewlett Packard 2621 and HPTERM
  Siemens Nixdorf's "IBM" set (plus parts of its Klammern and Facet sets)
  Televideo Multinational
  Wyse Graphics 3 (Graphics 1 and 2 were already covered)
  Wyse "Standard ANSI", "UK ANSI", and "ANSI Graphics"

Notes:
  (1) Except the DG logo character, which is presumed off limits.

Terminals supporting these character sets are numerous indeed.  An incomplete
list includes: DEC VT100, VT102, VT220/240, VT320/330/340, VT420, VT520/525;
Data General 210, 215, 217, 413, and 463; the Heath / Zenith 19; the Perkin
Elmer 550 and 1100; and numerous Televideo and Wyse models.

The new characters proposed in this document are listed in Table 9.1.

Priorities:

For terminal emulation the most important categories are, in descending
order:

  1. Line, Box, and Block characters
  2.
     Extensible math symbols
  3. C1 and EBCDIC control pictures
  4. Hex bytes

For adding debugging capabilities to Unicode applications in general:

  1. Hex bytes
  2. Unicode control pictures
  3. C1 and EBCDIC control pictures

Table 9.1: Census of New Characters

  Code  Description
  E000  Symbol for En Quad
  E001  Symbol for Em Quad
  E002  Symbol for En Space
  E003  Symbol for Em Space
  E004  Symbol for Three-Per-Em-Space
  E005  Symbol for Four-Per-Em-Space
  E006  Symbol for Six-Per-Em-Space
  E007  Symbol for Figure Space
  E008  Symbol for Punctuation Space
  E009  Symbol for Thin Space
  E00A  Symbol for Hair Space
  E00B  Symbol for Zero-Width Space
  E00C  Symbol for Zero-Width Non-Joiner
  E00D  Symbol for Zero-Width Joiner
  E00E  Symbol for Left-to-Right Mark
  E00F  Symbol for Right-to-Left Mark
  E010  Symbol for Line Separator
  E011  Symbol for Paragraph Separator
  E012  Symbol for Left-to-Right Embedding
  E013  Symbol for Right-to-Left Embedding
  E014  Symbol for Pop Directional Formatting
  E015  Symbol for Left-to-Right Override
  E016  Symbol for Right-to-Left Override
  E017  Symbol for Inhibit Symmetric Swapping
  E018  Symbol for Activate Symmetric Swapping
  E019  Symbol for Inhibit Arabic Form Shaping
  E01A  Symbol for Activate Arabic Form Shaping
  E01B  Symbol for National Digit Shapes
  E01C  Symbol for Nominal Digit Shapes
  E01D  Symbol for Zero Width No Break Space
  E01E  Symbol for Not A Character (Byte Order)
  E01F  Symbol for Not A Character
  E020  (Reserved)
  E021  (Reserved)
  E022  Symbol for Break Permitted Here
  E023  Symbol for No Break Here
  E024  Symbol for Index
  E025  Symbol for Next Line
  E026  Symbol for Start Selected Area
  E027  Symbol for End Selected Area
  E028  Symbol for Character Tabulation Set
  E029  Symbol for Character Tabulation with Justification
  E02A  Symbol for Line Tabulation Set
  E02B  Symbol for Partial Line Forward
  E02C  Symbol for Partial Line Backward
  E02D  Symbol for Reverse Line Feed
  E02E  Symbol for Single Shift 2
  E02F  Symbol for Single Shift 3
  E030  Symbol for Device Control String
  E031  Symbol for Private Use 1
  E032
        Symbol for Private Use 2
  E033  Symbol for Set Transmit State
  E034  Symbol for Cancel Character
  E035  Symbol for Message Waiting
  E036  Symbol for Start Protected (Guarded) Area
  E037  Symbol for End Protected (Guarded) Area
  E038  Symbol for Start of String
  E039  (Reserved)
  E03A  Symbol for Single Character Introducer
  E03B  Symbol for Control Sequence Introducer
  E03C  Symbol for String Terminator
  E03D  Symbol for Operating System Command
  E03E  Symbol for Privacy Message
  E03F  Symbol for Application Program Command
  E040  Symbol for Select
  E041  Symbol for Required New Line
  E042  Symbol for Graphic Escape
  E043  Symbol for Superscript
  E044  Symbol for Repeat
  E045  Symbol for Restore
  E046  Symbol for Program Operator Communication
  E047  Symbol for Unit Back Space
  E048  Symbol for Customer Use 1
  E049  Symbol for Interchange File Separator
  E04A  Symbol for Interchange Group Separator
  E04B  Symbol for Interchange Record Separator
  E04C  Symbol for Interchange Unit Separator
  E04D  Symbol for Digit Select
  E04E  Symbol for Start of Significance
  E04F  Symbol for Word Underscore
  E050  Symbol for Bypass
  E051  Symbol for Set Attribute
  E052  Symbol for Start Field Extended
  E053  Symbol for Set Mode
  E054  Symbol for Control Sequence Prefix
  E055  Symbol for Modify Field Attribute
  E056  (Reserved)
  E057  (Reserved)
  E058  Symbol for Index Return
  E059  Symbol for Presentation Position
  E05A  Symbol for Transparent
  E05B  Symbol for Numeric Backspace
  E05C  Symbol for Subscript
  E05D  Symbol for Indent Tabulation
  E05E  Symbol for Reverse Form Feed
  E05F  Symbol for Customer Use 3
  E060  (Reserved)
  E070  Symbol for Start Field
  E071  Symbol for Set Buffer Address
  E072  Symbol for Modify Field
  E073  Symbol for Insert Cursor
  E074  Symbol for Program Tab
  E075  Symbol for Repeat to Address
  E076  Symbol for Erase to Unprotected Address
  E077  Symbol for Vertical Channel Select
  E078  Symbol for Enable Presentation
  E079  Symbol for Inhibit Presentation
  E07A  Symbol for Format
  E07B  Symbol for Duplicate
  E07C  Overscore asterisk
  E07D  Symbol for Field Mark
  E07E
        Overscore semicolon
  E07F  Symbol for Eight Ones
  E080  Human stick figure
  E081  Human stick figure in box
  E082  Clock at 6:10 (or 1:30)
  E083  White rectangle with stroke
  E084  Black rectangle with stroke
  E085  Lightning with stroke
  E086  Security key
  E087  Black and White Right-Pointing Triangles
  E090  Symbol for Locking Shift 1
  E091  Symbol for Locking Shift 0
  E092  Symbol for Control Extension
  E093  Symbol for Information Separator 4
  E094  Symbol for Information Separator 3
  E095  Symbol for Information Separator 2
  E096  Symbol for Information Separator 1
  E097  Symbol for Cancel Line
  E098  Picture of Bell
  E099  Word Processing Symbol BP
  E09A  Word Processing Symbol BE
  E09B  Word Processing Symbol FN
  E09C  Word Processing Symbol FE
  E09D  Word Processing Symbol HF
  E0A0  Extensible left brace middle
  E0A1  Extensible left parenthesis bottom
  E0A2  Extensible left parenthesis top
  E0A3  Extensible left SB bottom
  E0A4  Extensible left SB top
  E0A5  Extensible right brace middle
  E0A6  Extensible UR or LL brace section
  E0A7  Extensible LR or UL brace section
  E0A8  Extensible right parenthesis bottom
  E0A9  Extensible right parenthesis top
  E0AA  Extensible right SB bottom
  E0AB  Extensible right SB top
  E0AC  Summation symbol bottom
  E0AD  Summation symbol top
  E0AE  Right ceiling corner
  E0AF  Right floor corner
  E0B0  Radical symbol, small
  E0B1  Radical symbol with stroke
  E0B2  Superscript Latin small letter i
  E0B3  Latin small letter a with underbar
  E0B4  Latin capital letter O with underbar
  E0B5  Superscript almost-equal-to sign
  E0B6  Superscript capital Greek letter Sigma
  E0B7  Superscript infinity sign
  E0B8  Superscript proportional-to sign
  E0D0  L V box line, extensible
  E0D1  R V box line, extensible
  E0D2  UL Wedge
  E0D3  UR Wedge
  E0D4  LL Wedge
  E0D5  LR Wedge
  E0D6  H line - Scan 1
  E0D7  H line - Scan 3
  E0D8  H line - Scan 5
  E0D9  H line - Scan 7
  E0DA  H line - Scan 9
  E0DB  Quadrant LL
  E0DC  Quadrant LR
  E0DD  Quadrant UL
  E0DE  Quadrant UL and LL and LR
  E0DF  Quadrant UL and LR
  E0E0  Quadrant UL and UR and LL
  E0E1  Quadrant UL and UR and LR
  E0E2  Quadrant UR
  E0E3  Quadrant UR and LL
  E0E4  Quadrant UR and LL and LR
  E0E5  Full black diamond
  E0E6  Black framus
  E0E7  Black framus + H center bar
  E0E8  White framus
  E0E9  White framus + H center bar
  E0EA  R & L arrow to V center bar
  E0EB  Up arrow to H center line
  E0EC  R arrow to V center line
  E0ED  L arrow to V center line
  E0EE  Down arrow to H center line
  E0EF  Box drawing double dash H
  E100  Symbol for Hex Byte 00
  E101  Symbol for Hex Byte 01
   :     :
  E1FF  Symbol for Hex Byte FF

Summary:

  Hex bytes:            256
  Control pictures:     135
    Unicode Controls:    32
    C0 Controls:          0
    C1 Controls:         32
    EBCDIC Controls:     33
    3270 Controls:       16
    3270 Indicators:      8
    Misc Controls:       14
  Math Symbols:          25
  Line/Box/Block:        32
  Total:                448

10. REFERENCES

 [1] American National Standards Institute, ANSI X3.4-1986, Code for
     Information Interchange (ASCII), 1986.
 [2] Data General, Programming the Display Terminal: Models D217, D413, and
     D463, Westboro, MA, 1991.
 [3] Digital Equipment Corporation, VT100 User Guide, EK-VT100-UG-002,
     Maynard, MA, 1979.
 [4] Digital Equipment Corporation, VT100 Video Terminal User Guide,
     EK-VT102-UG-003, Maynard, MA, 1982.
 [5] Digital Equipment Corporation, VT220 Owner's Manual, EK-VT220-UG-003,
     Maynard, MA, 1984.
 [6] Digital Equipment Corporation, VT220 Series Programmer Reference
     Manual, EK-VT240-RM-002, Maynard, MA, 1984.
 [7] Digital Equipment Corporation, VT330/VT340 Programmer Reference Manual,
     Volume 1: Text Programming, ED-VT3XX-TP-002, Maynard, MA, 1988.
 [8] Digital Equipment Corporation, Installing and Using the VT420 Video
     Terminal, EK-VT420-UG.002, Maynard, MA, 1988.
 [9] Digital Equipment Corporation, VT520/VT525 Video Terminal Programmer
     Information, EK-VT520-RM.A01, Maynard, MA, 1994.
[10] Heathkit Manual for the Video Terminal Model H19, The Heath Company,
     Benton Harbor, MI, 1979.
[11] Hewlett Packard 2621A/P Interactive Terminal Owner's Manual, 1978.
[12] Hewlett Packard 2648A Graphics Terminal Reference Manual, 1977.
[13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie, NY,
     1970.
[14] IBM National Language Design Guide, Volume 2: National Language Support
     Reference Manual, 4th Edition, North York, ON, 1994.
[15] IBM 3270 Information Display System, Component Description,
     GA27-2749-10, 1980.
[16] IBM 3164 ASCII Color Display Station Description, GA18-2317-1, 1986.
[17] ISO International Standard 2022, Information processing -- ISO 7-bit
     and 8-bit coded character sets -- Code extension techniques, Third
     Edition, Geneva, 1986.
[18] ISO/IEC International Standard 6429, Information technology -- Control
     functions for coded character sets, Third Edition, Geneva, 1992.
[19] ISO/IEC 10646-1, International Standard 10646, Information Processing
     -- Multiple-Octet Coded Character Set, 1993-now.
[20] Perkin Elmer Model 1100 User's Manual, Randolph, NJ, 1978.
[21] Siemens Nixdorf, Bildschirmeinheit 97801-5xx Schnittstellen,
     Benutzerhandbuch, München, 1991.
[22] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale, CA,
     1984.
[23] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale, CA,
     1988.
[24] The Unicode Standard, Version 2.0, Addison-Wesley Developers Press,
     1996.
[25] Wyse WY-60 Programmer's Guide, Wyse Technology, San Jose, CA, 1987.
[26] Wyse WY-370 Programmer's Guide, Wyse Technology, San Jose, CA, 1990.
[27] IBM 3270 Information Display System, Data Stream Programmer's
     Reference, GA23-0059-06, 1991.
[28] ISO International Register of Coded Characters to Be Used with Escape
     Sequences, European Computer Manufacturers Association (ECMA), Geneva,
     1985-present.
[29] IBM Character Data Representation Architecture, Level 1 Registry, IBM
     Canada Ltd., National Language Technical Centre, Ontario, SC09-1391-00,
     1990.

(End)