Elbrus architecture

November 16, 2020

Overview

Elbrus 2000 (Elbrus or e2k for short) is a SPARC-inspired VLIW architecture developed by the Moscow Center for SPARC Technology (MCST).

Elbrus machine code is organized into very long instruction words (VLIW), which consist of multiple so-called syllables that are executed together.

References

Several useful documents about Elbrus are available on the internet, albeit mostly in Russian.

Memory organization

Most operations in Elbrus code either:

  • Take the values of one or more registers, compute a function, and write the result to another register, or
  • Load a value from memory into a register or store a value from a register into memory.

Register file, RF (Регистровый файл, РгФ)

The 256 general-purpose registers of the Register File (RF/РгФ) are divided into two categories:

  • 224 registers are part of the procedure stack in a windowed way. They can become available or unavailable during procedure calls and returns. (See also elbrus-prog chapter 9.3.1.1)
  • 32 registers are global registers. They are available during the whole runtime of a program.

| 32-bit | 64-bit | description |
| --- | --- | --- |
| %g0 | %dg0 | Global register (0-31) |
| %r0 | %dr0 | Procedure stack register, relative to start of current window |
| %b[0] | %db[0] | Mobile base registers, relative to the start of the current window, plus BR |

TODO: last eight global registers are designated rotatable area

Changing the register window

The procedure stack contains the parameters and local data of procedures. Its top area is stored in the register file (RF). On overflow or underflow of the register file, its contents are automatically swapped in/out of memory. Launching a new procedure allocates a window on the procedure stack, which may overlap with the calling procedure's window.

Procedure chain stack (стек связующей информации)

The procedure chain stack is the stack of return addresses. It can only be manipulated by the operating system and the hardware. Its top area is stored in CF (chain file) registers.

On this stack the following information is encoded in two quad words:

  • return address
  • compilation unit index descriptor (CUIR)
  • window base (wbs) in the register file
  • presence of real 80 (?)
  • predicate file
  • user stack descriptor
  • rotatable area base
  • processor status register

On overflow or underflow of the chain file, its contents are automatically swapped in/out of memory.

Predicate file, PF (Предикатный файл, ПФ)

Comparison operations produce one-bit results (true or false) that can be stored in the predicate registers.

Predicates can be used in conditional control transfers (jumps/calls), or in the conditional execution of individual operations.

There are 32 predicate registers in the predicate file, which appear as %pred0 to %pred31 in assembly code.

Special purpose registers

Special purpose registers can be read using the rrs and rrd operations, and written using the rws and rwd operations.

| Name | Description |
| --- | --- |
| CUIR | compilation unit index register, индекс дескрипторов модуля компиляции |
| PSHTP | procedure stack hardware top pointer |
| PSP | procedure stack pointer - contains the virtual base address of the procedure stack. |
| WD | window descriptor - contains the base and the size of the current procedure's window into the procedure stack. |
| PCSHTP | procedure chain stack hardware top pointer |
| USBR | user stack base pointer, РгБСП |
| USD | user stack descriptor, ДСП |

Regular Instructions

Elbrus' wide instructions (широкая команда, ШК) consist of a header syllable and zero or more additional syllables. Wide instructions are aligned to 8 bytes and are up to 16 words (64 bytes) long.

Syllables

| Abbreviation | Description |
| --- | --- |
| HS | Header syllable - it encodes length and structure of a wide instruction |
| SS | Stubs syllable - short operations that take only a few bits to encode |
| ALS | Arithmetic logic channel syllable |
| CS | Control syllable |
| ALES | Arithmetic logic extension channel semi-syllable. They extend corresponding ALS. ALES2 and ALES5 are only available on Elbrus v4 and higher. |
| AAS | Array access semi-syllable |
| LTS | Literal syllable - literals to be used as operands |
| PLS | Predicate logic syllable - processing of boolean values |
| CDS | Conditional syllable - specifies which operations are to be executed under which condition |

The first syllable is the header syllable. It is always present. The presence of other syllables depends on the purpose of the command. Syllables occur in the following order:

  • HS
  • SS
  • ALS0, ALS1, ALS2, ALS3, ALS4, ALS5
  • CS0
  • ALES2, ALES5
  • CS1
  • ALES0, ALES1, ALES3, ALES4
  • AAS0, AAS1, AAS2, AAS3, AAS4, AAS5
  • LTS3, LTS2, LTS1, LTS0
  • PLS2, PLS1, PLS0
  • CDS2, CDS1, CDS0

Syllable packing

Semi-syllables ALES and AAS are a half-word (2 bytes) long. All other syllables are one word (4 bytes) long.

Syllables SS, ALS* and CS0 occur as indicated in the header syllable in the order described above. They are packed, e.g. if header bits indicate presence of ALS0 and ALS2 but not SS nor ALS1, then the syllable ALS0 follows directly after HS and ALS2 follows directly after ALS0.

If presence of ALES2 or ALES5 is indicated, then a whole word is allocated for them, whether both are present or not. Of the ones that are present, the first occupies the more significant half of the word and the second is encoded in the less significant half. For example, when looking at the syllables as bytes, if ALES2 and ALES5 are both present, then the first two bytes of the little-endian word contain ALES5 and the last two bytes contain ALES2. If only ALES5 is present, the first two bytes are empty and the last two bytes contain ALES5.

CS1 may follow right after the previously described syllables.

ALES{0,1,3,4} and AAS* start at the word indicated by the "middle pointer" from the header syllable. Their ordering is the same as for ALES2 and ALES5 (high half first, low half second) but they are all packed. This means that any two syllables of ALES{0,1,3,4} and AAS{0,1} may share a word. ALES* may not share a word with AAS{2,3,4,5} because presence of the latter implies presence of AAS0 and/or AAS1. For example, if ALES0, ALES1, ALES4, AAS0 and AAS2 are indicated, then they are encoded as ALES1, ALES0, AAS0, ALES4, two bytes left empty, and finally AAS2.
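
The packing rule for these semi-syllables can be sketched in Python as follows. This is only an illustration under the assumption that packing is purely sequential (two semi-syllables per word, high half first), which reproduces the example above but has not been checked against hardware. The name pack_half_syllables and the use of None for an empty half are choices made here.

```python
# Order in which the semi-syllables are listed in this document.
HALF_SYLLABLE_ORDER = [
    "ALES0", "ALES1", "ALES3", "ALES4",
    "AAS0", "AAS1", "AAS2", "AAS3", "AAS4", "AAS5",
]


def pack_half_syllables(present):
    """Return the semi-syllables in memory (byte) order, 2 bytes per entry.

    Assumption: semi-syllables are packed strictly in sequence. Within each
    little-endian word the first semi-syllable takes the high half, so it
    appears second in memory. None marks an unused half.
    """
    sequence = [name for name in HALF_SYLLABLE_ORDER if name in present]
    layout = []
    for i in range(0, len(sequence), 2):
        high = sequence[i]
        low = sequence[i + 1] if i + 1 < len(sequence) else None
        layout.extend([low, high])  # low half comes first in memory
    return layout


example = pack_half_syllables({"ALES0", "ALES1", "ALES4", "AAS0", "AAS2"})
print(example)
```

For the set used in the example above, this yields ALES1, ALES0, AAS0, ALES4, an empty half, and AAS2.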

LTS*, PLS* and CDS* are decoded starting from the end of the wide command. CDS* and PLS* are not indicated by individual flags but rather by their number. For example, there cannot be a PLS2 without a PLS0 and PLS1. LTS take any remaining words between the other syllables. For example, if after the AAS there are five words remaining in the wide command and two CDS and one PLS are indicated, then two words for LTS are left. They would be encoded as LTS1, LTS0, PLS0, CDS1, CDS0.
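
The tail layout described above can be sketched as a small Python function. The name tail_layout and its parameters are hypothetical; the cap of four literal syllables is an assumption based on this document naming only LTS0 to LTS3.

```python
def tail_layout(remaining_words, n_pls, n_cds):
    """List the tail syllables of a wide instruction in memory order.

    remaining_words: words left after the AAS semi-syllables.
    n_pls, n_cds:    syllable counts taken from the header syllable.
    """
    n_lts = remaining_words - n_pls - n_cds
    if n_lts < 0:
        raise ValueError("more PLS/CDS indicated than words available")
    if n_lts > 4:
        # Assumption: only LTS0..LTS3 exist, so more than four is unexpected.
        raise ValueError("more than four words left for literals")
    lts = [f"LTS{i}" for i in reversed(range(n_lts))]
    pls = [f"PLS{i}" for i in reversed(range(n_pls))]
    cds = [f"CDS{i}" for i in reversed(range(n_cds))]
    return lts + pls + cds


print(tail_layout(5, n_pls=1, n_cds=2))
```

With five remaining words, one PLS and two CDS, the function returns LTS1, LTS0, PLS0, CDS1, CDS0, matching the example in the text.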

We do not know what happens if more syllables are indicated than there is space allocated or if syllables are encoded to overlap.

HS - Header syllable

| Bit | Name | Description |
| --- | --- | --- |
| 31 | ALS5 | arithmetic-logic syllable 5 presence |
| 30 | ALS4 | arithmetic-logic syllable 4 presence |
| 29 | ALS3 | arithmetic-logic syllable 3 presence |
| 28 | ALS2 | arithmetic-logic syllable 2 presence |
| 27 | ALS1 | arithmetic-logic syllable 1 presence |
| 26 | ALS0 | arithmetic-logic syllable 0 presence |
| 25 | ALES5 | arithmetic-logic extension syllable 5 presence |
| 24 | ALES4 | arithmetic-logic extension syllable 4 presence |
| 23 | ALES3 | arithmetic-logic extension syllable 3 presence |
| 22 | ALES2 | arithmetic-logic extension syllable 2 presence |
| 21 | ALES1 | arithmetic-logic extension syllable 1 presence |
| 20 | ALES0 | arithmetic-logic extension syllable 0 presence |
| 19:18 | PLS | number of predicate logic syllables |
| 17:16 | CDS | number of conditional execution syllables |
| 15 | CS1 | control syllable 1 presence |
| 14 | CS0 | control syllable 0 presence |
| 13 | set_mark | |
| 12 | SS | stub syllable presence |
| 11 | -- | unused |
| 10 | loop_mode | |
| 9:7 | nop | |
| 6:4 | | Length of instruction, in multiples of 8 bytes, minus 8 bytes |
| 3:0 | | Number of words occupied by SS, ALS, CS, ALES2, ALES5 - called "middle pointer" |
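
As an illustration of the table above, the following Python sketch splits a 32-bit header syllable into its fields. The function name decode_hs, the dictionary keys, and the sample value are made up for this example and are not taken from any Elbrus tool.

```python
def decode_hs(hs):
    """Split a 32-bit header syllable into the fields of the HS table."""
    return {
        "als": [bool(hs >> (26 + i) & 1) for i in range(6)],   # ALS0..ALS5
        "ales": [bool(hs >> (20 + i) & 1) for i in range(6)],  # ALES0..ALES5
        "pls_count": hs >> 18 & 3,
        "cds_count": hs >> 16 & 3,
        "cs1": bool(hs >> 15 & 1),
        "cs0": bool(hs >> 14 & 1),
        "set_mark": bool(hs >> 13 & 1),
        "ss": bool(hs >> 12 & 1),
        "loop_mode": bool(hs >> 10 & 1),
        "nop": hs >> 7 & 7,
        # Bits 6:4 hold the length in 8-byte units, minus one unit.
        "length_bytes": ((hs >> 4 & 7) + 1) * 8,
        "middle_pointer": hs & 0xF,
    }


# Hypothetical header: ALS0 and SS present, 16 bytes long, middle pointer 2.
fields = decode_hs(0x04001012)
print(fields["length_bytes"], fields["middle_pointer"])
```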

SS - Stubs syllable

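Stubs syllable format 1 - SF1

| Bit | Name | Description |
| --- | --- | --- |
| 31:30 | ipd | instruction prefetch depth |
| 29 | eap | end array prefetch |
| 28 | bap | begin array prefetch |
| 27 | srp | |
| 26 | vfdi | |
| 25 | crp (?) | |
| 24 | abgi | |
| 23 | abgd | |
| 22 | abnf | |
| 21 | abnt | |
| 20 | type | type is 0 for SF1 |
| 19 | abpf | |
| 18 | abpt | |
| 17 | alcf | |
| 16 | alct | |
| 15 | | array access syllable 0 and 2 presence |
| 14 | | array access syllable 0 and 3 presence |
| 13 | | array access syllable 1 and 4 presence |
| 12 | | array access syllable 1 and 5 presence |
| 11:10 | ctop | ctpr number used in control transfer (ct) instructions |
| 9 | ? | |
| 8:0 | ctcond | condition code for control transfers (ct) |

Stubs syllable format 2 - SF2

| Bit | Name | Description |
| --- | --- | --- |
| 31:30 | ipd | instruction prefetch depth |
| 29:28 | | encodes invts and flushts, see below |
| 27 | srp (?) | |
| 26 | | encodes invts and flushts, see below |
| 25 | crp (?) | |
| 20 | type | type is 1 for SF2 |
| 4:0 | pred | predicate number |

The value (ss >> 27 & 6) | (ss >> 26 & 1), built from bits 29:28 and bit 26, selects the operation:

| Value | Description |
| --- | --- |
| 2 | invts |
| 3 | flushts |
| 6 | invts ? %predN |
| 7 | invts ? ~ %predN |

ct condition codes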

The condition code in the stubs syllable controls under which conditions a control transfer operation is executed.

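| Bit | description |
| --- | --- |
| 4:0 | Predicate number (from pred0 to pred31) |
| 8:5 | Condition type |

| Type | syntax | description |
| --- | --- | --- |
| 0 | -- | never |
| 1 | | always |
| 2 | ? %pred0 | if predicate is true |
| 3 | ? ~ %pred0 | if predicate is false |
| 4 | ? #LOOP_END | |
| 5 | ? #NOT_LOOP_END | |
| 6 | ? %pred0 \|\| #LOOP_END | |
| 7 | ? ~ %pred0 && #NOT_LOOP_END | |
| 8 | | (TODO, depends on syllable) |
| 9 | | (TODO, depends on syllable) |
| 10 | | (reserved) |
| 11 | | (reserved) |
| 12 | | (reserved) |
| 13 | | (reserved) |
| 14 | ? ~ %pred0 \|\| #LOOP_END | |
| 15 | ? %pred0 && #NOT_LOOP_END | |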

#LOOP_END and #NOT_LOOP_END are sometimes spelled as %LOOP_END and %NOT_LOOP_END.
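
To make the ctcond layout concrete, here is a small Python sketch that turns a 9-bit condition code into the assembly suffix listed above. The function name describe_ctcond and the "<type N>" placeholder for undocumented types are inventions of this example.

```python
# Assembly suffix per condition type; {n} is replaced by the predicate number.
_CT_TEMPLATES = {
    1: "",  # always
    2: "? %pred{n}",
    3: "? ~ %pred{n}",
    4: "? #LOOP_END",
    5: "? #NOT_LOOP_END",
    6: "? %pred{n} || #LOOP_END",
    7: "? ~ %pred{n} && #NOT_LOOP_END",
    14: "? ~ %pred{n} || #LOOP_END",
    15: "? %pred{n} && #NOT_LOOP_END",
}


def describe_ctcond(ctcond):
    """Return the condition suffix for a 9-bit ctcond, or None for 'never'."""
    pred = ctcond & 0x1F          # bits 4:0
    ctype = (ctcond >> 5) & 0xF   # bits 8:5
    if ctype == 0:
        return None
    template = _CT_TEMPLATES.get(ctype)
    if template is None:
        # Types 8-13 are TODO or reserved in this document.
        return f"<type {ctype}>"
    return template.format(n=pred)


print(describe_ctcond((2 << 5) | 3))
```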

ALS - Arithmetic-logical syllables

| Bit | Description |
| --- | --- |
| 31 | Speculative mode |
| 30:24 | Opcode |
| 23:16 | Operand src1, or opcode extension |
| 15:8 | Operand src2 |
| 7:0 | Operand src3, dst, or cmp opcode extension |

See chapter 'Arithmetic-logical operations' for more information on the operands.
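
The ALS bit layout can be illustrated with a short Python sketch. The name decode_als and the field names in the result are chosen for this example; which meaning the three operand bytes carry depends on the operation format, so they are returned as raw bytes.

```python
def decode_als(als):
    """Split a 32-bit ALS word into the fields of the table above."""
    return {
        "speculative": bool(als >> 31 & 1),
        "opcode": als >> 24 & 0x7F,
        "byte_23_16": als >> 16 & 0xFF,  # src1 or opcode extension
        "byte_15_8": als >> 8 & 0xFF,    # src2
        "byte_7_0": als & 0xFF,          # src3, dst, or cmp opcode extension
    }


# Hypothetical word: opcode 0x10 (adds) with operand bytes 0x80, 0xc1, 0x81.
print(decode_als(0x1080C181))
```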

ALES - Arithmetic-logical extension syllables

| Bit | Description |
| --- | --- |
| 15:8 | Opcode2 |
| 7:0 | src3 (in ALEF1) or opcode extension 2 or cmp opcode extension (in ALEF2) |

CS - Control syllables

CS0 and CS1 encode different operations.

| Syllable | pattern | name | description |
| --- | --- | --- | --- |
| CS0, CS1 | 0xxxxxxx | set* | setwd/setbn/setbp/settr |
| CS1 | 1xxxxxxx | vfrpsz | vfrpsz + setwd/setbn/setbp/settr |
| CS0 | 2xxxxxxx | puttsd | puttsd with a multiple-of-8 parameter relative to the start of the current instruction |
| CS1 | 200000xx | setei | |
| CS1 | 28000000 | setsft | |
| CS0, CS1 | 300000xx | wait | wait for specified kinds of operations to complete |
| CS0 | 4xxxxxxx | disp | prepare a relative jump in ctpr1 |
| CS0 | 5xxxxxxx | ldisp | prepare an array prefetch program (?) in ctpr1 |
| CS0 | 6xxxxxxx | sdisp | prepare a system call in ctpr1 |
| CS0 | 70000000 | return | prepare to return from procedure in ctpr1 |
| CS0 | 8xxxxxxx+ | -- | disp/ldisp/sdisp/return with ctpr2 |
| CS0 | cxxxxxxx+ | -- | disp/ldisp/sdisp/return with ctpr3 |
| CS1 | 6xxxx000 | setmas | Set memory address specifier for load and store operations |

set*

The set* operation sets several parameters related to register windows. Most bits are encoded in the CS0 syllable itself, but some are also read from the LTS0 syllable.

According to ldis, setwd is always performed, but settr, setbn, and setbp have to be enabled by setting the corresponding bits in CS0.

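| Syl. | bit | name | description |
| --- | --- | --- | --- |
| CS1 | 28 | | enable vfrpsz |
| CS | 27 | | enable settr |
| CS | 26 | | enable setbn |
| CS | 25 | | enable setbp |
| CS | 22:18 | | setbp psz=x |
| CS | 17:12 | | setbn rcur=x |
| CS | 11:6 | | setbn rsz=x |
| CS | 5:0 | | setbn rbs=x |
| LTS0 | 16:12 | | vfrpsz rpsz=x |
| LTS0 | 11:5 | | setwd wsz=x |
| LTS0 | 4 | | setwd nfx=x |
| LTS0 | 3 | | setwd dbl=x |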
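
A rough Python sketch of how the set* fields above could be extracted follows. The function name decode_set and the result layout are assumptions of this example, and settr is reported without parameters because this document does not list any.

```python
def decode_set(cs, lts0):
    """Extract the set* operations encoded in a CS word and in LTS0.

    Pass the CS1 word if vfrpsz (bit 28) should be recognised.
    """
    ops = {
        # setwd is always performed according to ldis.
        "setwd": {
            "wsz": lts0 >> 5 & 0x7F,
            "nfx": lts0 >> 4 & 1,
            "dbl": lts0 >> 3 & 1,
        }
    }
    if cs >> 28 & 1:
        ops["vfrpsz"] = {"rpsz": lts0 >> 12 & 0x1F}
    if cs >> 27 & 1:
        ops["settr"] = {}  # parameters are not documented here
    if cs >> 26 & 1:
        ops["setbn"] = {
            "rcur": cs >> 12 & 0x3F,
            "rsz": cs >> 6 & 0x3F,
            "rbs": cs & 0x3F,
        }
    if cs >> 25 & 1:
        ops["setbp"] = {"psz": cs >> 18 & 0x1F}
    return ops


# Hypothetical encoding: setbn rcur=3 rsz=5 rbs=2, setwd wsz=4 nfx=1.
cs = 1 << 26 | 3 << 12 | 5 << 6 | 2
lts0 = 4 << 5 | 1 << 4
print(decode_set(cs, lts0))
```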

wait

| Bit | name | description |
| --- | --- | --- |
| 5 | ma_c | wait for all previous memory access operations to complete |
| 4 | fl_c | wait for all previous cache flush operations to complete |
| 3 | ls_c | wait for all previous load operations to complete |
| 2 | st_c | wait for all previous store operations to complete |
| 1 | all_e | wait for all previous operations to issue all possible exceptions |
| 0 | all_c | wait for all previous operations to complete |
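
The wait flags can be read off a control syllable as in the following Python sketch. The name decode_wait and the sample word are hypothetical.

```python
# Flag names indexed by bit position (bit 0 first), as in the table above.
WAIT_FLAGS = ["all_c", "all_e", "st_c", "ls_c", "fl_c", "ma_c"]


def decode_wait(cs):
    """Return the names of the wait flags set in a CS word (pattern 300000xx)."""
    return [name for bit, name in enumerate(WAIT_FLAGS) if cs >> bit & 1]


# Hypothetical word with bits 0 and 5 set.
print(decode_wait(0x30000021))
```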

disp/ldisp/sdisp/return

The disp operation prepares a jump to a different location by using one of the control transfer preparation registers (ctpr1 to ctpr3).

| bit | description |
| --- | --- |
| 31:30 | can be 1, 2, or 3 for ctpr1, ctpr2, or ctpr3 respectively |
| 29:28 | can be 0, 1, 2, or 3, for disp, ldisp, sdisp, or return respectively |
| 27:0 | offset or system call number |

For disp and ldisp, the offset is relative to the start of the current instruction, and in multiples of eight bytes. For example, in an instruction at 0x1000, with CS0=40000042, we get disp %ctpr1, 0x1210.

ldisp is only allowed with ctpr2.

For sdisp, the system call number is not shifted. CS0=6000001a is sdisp %ctpr1, 0x1a.

The return operation doesn't take an offset. The offset field should be zero in this case.
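
The two worked examples above can be reproduced with a short Python sketch. The function name decode_disp is made up for this example, and the 28-bit offset is treated as unsigned because this document does not say whether it is signed.

```python
def decode_disp(cs0, insn_addr):
    """Render a disp/ldisp/sdisp/return CS0 word as assembly-like text.

    insn_addr is the address of the wide instruction containing CS0.
    Assumption: the offset field is unsigned.
    """
    ctpr = cs0 >> 30 & 3
    kind = ("disp", "ldisp", "sdisp", "return")[cs0 >> 28 & 3]
    field = cs0 & 0x0FFFFFFF
    if ctpr == 0:
        raise ValueError("bits 31:30 are zero: not a disp-family operation")
    if kind == "return":
        return f"return %ctpr{ctpr}"
    if kind == "sdisp":
        # The system call number is used as-is.
        return f"sdisp %ctpr{ctpr}, {field:#x}"
    # disp/ldisp: offset in units of 8 bytes from the instruction start.
    return f"{kind} %ctpr{ctpr}, {insn_addr + field * 8:#x}"


print(decode_disp(0x40000042, 0x1000))
print(decode_disp(0x6000001A, 0x1000))
```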

setmas (setting the memory address specifier)

Memory address specifiers control multiple aspects of load and store operations. Their 7-bit format is described elsewhere.

The MAS can be independently specified for load and store operations, in CS1:

| CS1 bits | description |
| --- | --- |
| 27:21 | MAS for load operations |
| 20:14 | MAS for store operations |
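
As a minimal sketch, the two 7-bit MAS fields could be extracted from CS1 like this in Python. The name decode_setmas and the sample MAS values are hypothetical.

```python
def decode_setmas(cs1):
    """Extract the load and store MAS fields from a setmas CS1 word."""
    return {
        "load_mas": cs1 >> 21 & 0x7F,   # bits 27:21
        "store_mas": cs1 >> 14 & 0x7F,  # bits 20:14
    }


# Hypothetical word: setmas pattern 6xxxx000 with load MAS 5, store MAS 0x7f.
word = 0x60000000 | 5 << 21 | 0x7F << 14
print(decode_setmas(word))
```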

Array Prefetch Instructions

Array prefetch instructions are run asynchronously on the array access unit. They are always 16 bytes long. To assemble array prefetch instructions, the mnemonic fapb is used. To call an array prefetch program, load its address with ldisp to %ctpr2 (no need to call or ct). Even though array prefetch instructions should only ever be called by ldisp and are not processed using the same facilities as regular instructions, they always seem to be terminated by a regular branch instruction. The maximum length of an array prefetch program is 32 instructions.

Arithmetic-logical operations

ALU operations are generally identified by several aspects:

  • The opcode field in the ALS
  • If a corresponding ALES exists, the opcode2 field in the ALES
  • Opcode extension, opcode extension 2, and cmp opcode extension, depending on the opcode
  • The ALUs in which the operation can be performed. Sometimes the same opcode can mean different operations in different ALUs (numbered from 0 to 5)

The format of an arithmetic-logical operation (ALOPF) is determined by opcode, channel, and presence of an ALES. The presence and location of additional identifying criteria of an operation as well as operands depend on the ALOPF.

Other variations:

  • Some operations require two ALS
  • Some operations require a Memory Address Specifier (MAS) in CS1
  • Some operations have predicates. Some operations require additional data from CDS.
  • ALOPF1, ALOPF2, ALOPF3, ALOPF7, ALOPF8 require no ALES, all others seem to require an ALES.

Operands and other fields

| Field | encoded in | comment |
| --- | --- | --- |
| opcode | als[30:24] | |
| opcode2 | ales[15:8] | |
| opcode extension | als[23:16] | |
| opcode extension 2 | ales[7:0] | |
| cmp opcode extension | als[7:5] or ales[7:0] | |
| src1 | als[23:16] | source operand 1 |
| src2 | als[15:8] | source operand 2 - can encode access to literal syllables (LTS) |
| src3 | als[7:0] or ales[7:0] | source operand 3 - for ALOPF3 and ALOPF13 it is in ALS, for ALOPF21 it is in ALES |
| dst | als[7:0], or als[4:0] for predicate registers | destination register |

src1 encoding

| Pattern | Range | Description |
| --- | --- | --- |
| 0xxx xxxx | 00-7f | Rotatable area procedure stack register |
| 10xx xxxx | 80-bf | procedure stack register |
| 110x xxxx | c0-df | constant between 0 and 31 |
| 111x xxxx | e0-ff | global register |
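
The src1 ranges can be expressed as a small classifier, sketched here in Python. The name decode_src1 and the category labels are choices of this example. The function returns a category and a number rather than a register name, since the mapping to assembly names is not spelled out in this table.

```python
def decode_src1(value):
    """Classify an 8-bit src1 field according to the table above."""
    if not 0 <= value <= 0xFF:
        raise ValueError("src1 is an 8-bit field")
    if value < 0x80:                      # 0xxx xxxx
        return ("rotatable", value)
    if value < 0xC0:                      # 10xx xxxx
        return ("stack", value & 0x3F)
    if value < 0xE0:                      # 110x xxxx
        return ("constant", value & 0x1F)
    return ("global", value & 0x1F)       # 111x xxxx


print(decode_src1(0xC5))
```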

src2 encoding

src2 that are not status register numbers are encoded as follows:

| Pattern | Range | Description |
| --- | --- | --- |
| 0xxx xxxx | 00-7f | Rotatable area procedure stack register |
| 10xx xxxx | 80-bf | procedure stack register |
| 1100 xxxx | c0-cf | constant between 0 and 15 |
| 1101 000x | d0-d1 | reference to 16 bit literal semi-syllable, low half of LTS0 or LTS1 |
| 1101 010x | d4-d5 | reference to 16 bit literal semi-syllable, high half of LTS0 or LTS1 |
| 1101 10xx | d8-db | reference to 32 bit literal syllable LTS0, LTS1, LTS2, or LTS3 |
| 1101 11xx | dc-de | reference to 64 bit literal syllable pair LTS1:LTS0, LTS2:LTS1, or LTS3:LTS2 |
| 111x xxxx | e0-ff | global register |

Literal half-syllables are sign-extended on access. Thus, values 0-0x7fff and 0xffff8000-0xffffffff (-0x8000 to -1) can be encoded in a literal half-syllable.
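
The following Python sketch resolves a src2 field against a list of literal words, including the sign extension of half-syllables. The names decode_src2 and sign_extend16 are made up here. It assumes that in a pair written LTS1:LTS0 the first syllable is the upper 32 bits, and it returns 32-bit and 64-bit literals as raw unsigned values.

```python
def sign_extend16(half):
    """Interpret a 16-bit value as a signed two's-complement number."""
    return half - 0x10000 if half & 0x8000 else half


def decode_src2(value, lts):
    """Classify an 8-bit src2 field; lts[i] is the 32-bit word of LTSi."""
    if value < 0x80:
        return ("rotatable", value)
    if value < 0xC0:
        return ("stack", value & 0x3F)
    if value < 0xD0:
        return ("constant", value & 0x0F)
    if value in (0xD0, 0xD1):    # low half of LTS0 or LTS1
        return ("literal16", sign_extend16(lts[value & 1] & 0xFFFF))
    if value in (0xD4, 0xD5):    # high half of LTS0 or LTS1
        return ("literal16", sign_extend16(lts[value & 1] >> 16 & 0xFFFF))
    if 0xD8 <= value <= 0xDB:    # one whole literal syllable
        return ("literal32", lts[value & 3])
    if 0xDC <= value <= 0xDE:    # pair, assumed high:low
        low = value & 3
        return ("literal64", lts[low + 1] << 32 | lts[low])
    if value >= 0xE0:
        return ("global", value & 0x1F)
    raise ValueError(f"src2 value {value:#x} is not covered by the table")


literals = [0x1234FFFE, 0x00000001]  # hypothetical LTS0 and LTS1
print(decode_src2(0xD0, literals))
```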

src3 encoding

| Pattern | Range | Description |
| --- | --- | --- |
| 0xxx xxxx | 00-7f | Rotatable area procedure stack register |
| 10xx xxxx | 80-bf | procedure stack register |
| 111x xxxx | e0-ff | global register |

dst encoding

dst that are not predicate register numbers or status register numbers are encoded as follows:

| Pattern | Range | Description |
| --- | --- | --- |
| 0xxx xxxx | 00-7f | Rotatable area procedure stack register |
| 10xx xxxx | 80-bf | procedure stack register |
| 1100 1101 | cd | %tst |
| 1100 1110 | ce | %tc |
| 1100 1111 | cf | %tcd |
| 1101 0001 | d1 | %ctpr1 |
| 1101 0010 | d2 | %ctpr2 |
| 1101 0011 | d3 | %ctpr3 |
| 1101 1110 | de | %empty.lo |
| 1101 1111 | df | %empty.hi |
| 111x xxxx | e0-ff | global register |

opcode2 values

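| Opcode2 | Name |
| --- | --- |
| 0x01 | EXT |
| 0x02 | EXT1 |
| 0x03 | EXT2 |
| 0x04 | FLB |
| 0x05 | FLH |
| 0x06 | FLW |
| 0x07 | FLD |
| 0x08 | ICMB0 |
| 0x09 | ICMB1 |
| 0x0a | ICMB2 |
| 0x0b | ICMB3 |
| 0x0c | FCMB0 |
| 0x0d | FCMB1 |
| 0x0e | PFCMB0 |
| 0x0f | PFCMB1 |
| 0x10 | LCMBD0 |
| 0x11 | LCMBD1 |
| 0x12 | LCMBQ0 |
| 0x13 | LCMBQ1 |
| 0x16 | QPFCMB0 |
| 0x17 | QPFCMB1 |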

Arithmetic-logical operation formats (ALOPF)

Several operand formats are defined.

FormatHas ALES?src1src2src3dstopcode extopcode ext 2cmp opcode extExampleComment
1xxxadds, ld{b,h,w,d}
2xxxmovx, popcnts
3xxals[7:0]st{b,h,w,d}
7xxxals[7:5]cmposbdst is a predicate register
8xxals[7:5]cctopodst is a predicate register
11xxxxxmuls
11 (with literal)xxxxpsllqhThese opcodes require a literal in ales[7:0]
12xxxxxfsqrtsOpcode pshufh is special as it requires a literal in ales[7:0].
13xxxals[7:0]xstq
15xxxxrws, rwddst is a status register; opcode2 is EXT; opcode extension 2 is 0xc0
16xxxxrrs, rrdsrc2 is a status register; opcode2 is EXT; opcode extension 2 is 0xc0
17xxxxales[7:0]pcmpeqbopdst is a predicate register; opcode2 is EXT1
21xxxales[7:0]xincs_fb
22xxxxxmovtqopcode2 is EXT; ALES opcode extension is 0xc0

For the locations of operands where none is explicitly specified here, see table 'Operands and other fields'.

TODO: ALOPF5, ALOPF6, ALOPF7, ALOPF9, ALOPF10, ALOPF19

NOTE: ALOPF9 and ALOPF10 have a 16 bit opcode extension

List of operations

The following tables are grouped by opcode2 and sorted by opcode.

Short operations (without ALES)

| Opcode | ALUs | name | ALS[23:16] | ALS[15:8] | ALS[7:0] | data width | description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0x00 | all | ands | src1 | src2 | dst | 32 bits | Compute bit-wise AND of src1 and src2, store result in dst |
| 0x01 | all | andd | src1 | src2 | dst | 64 bits | Compute bit-wise AND of src1 and src2, store result in dst |
| 0x10 | all | adds | src1 | src2 | dst | 32 bits | Compute sum of src1 and src2, store result in dst |
| 0x11 | all | addd | src1 | src2 | dst | 64 bits | Compute sum of src1 and src2, store result in dst |
| 0x24 | 2, 5 | stb | src1 | src2 | src3 | 8 bits | store 8-bit value from src3 to address at src1+src2 |
| 0x25 | 2, 5 | sth | src1 | src2 | src3 | 16 bits | store 16-bit value from src3 to address at src1+src2 |
| 0x26 | 2, 5 | stw | src1 | src2 | src3 | 32 bits | store 32-bit value from src3 to address at src1+src2 |
| 0x26 | 0, 1, 3, 4 | bitrevs | 0xc0 | src2 | dst | 32 bits | |
| 0x27 | 2, 5 | std | src1 | src2 | src3 | 64 bits | store 64-bit value from src3 to address at src1+src2 |
| 0x27 | 0, 1, 3, 4 | bitrevd | 0xc0 | src2 | dst | 64 bits | |
| 0x64 | 0, 2, 3, 5 | ldb | src1 | src2 | dst | 8 bits | load 8-bit value from address at src1+src2, store into dst |
| 0x65 | 0, 2, 3, 5 | ldh | src1 | src2 | dst | 16 bits | load 16-bit value from address at src1+src2, store into dst |
| 0x66 | 0, 2, 3, 5 | ldw | src1 | src2 | dst | 32 bits | load 32-bit value from address at src1+src2, store into dst |
| 0x67 | 0, 2, 3, 5 | ldd | src1 | src2 | dst | 64 bits | load 64-bit value from address at src1+src2, store into dst |

EXT (opcode2 = 1)

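| Opcode | ALUs | name | ALS[23:16] | ALS[15:8] | ALS[7:0] | ALES[7:0] | data width | description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x58 | 0 | getsp | 0xec | src2 | dst | unused | 32 -> 64 | Add src2 to user stack pointer, store in user stack pointer and dst |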