#This file was created by LinuxDoc-SGML
#(conversion : Frank Pavageau and Jose' Matos)
\lyxformat 2.15
\textclass linuxdoc
\language default
\inputencoding default
\fontscheme default
\papersize Default
\paperfontsize default
\spacing single
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language default
\quotes_times 2
\paperorientation portrait
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Title
\added_space_top vfill \added_space_bottom vfill

F-CPU Architecture: Register Organization
\layout Author

Andrew D. Balsa (
\begin_inset LatexDel \htmlurl{
\end_inset
mailto:andrebalsa@altern.org
\begin_inset LatexDel }{
\end_inset
andrebalsa@altern.org
\begin_inset LatexDel }
\end_inset
)
\layout Date

August 1998
\layout Abstract

A description of the register organization of the Freedom CPU architecture.
\layout Section

Introduction
\layout Standard

One of the most important features defining a CPU architecture is its internal register organization, and the F-CPU architecture breaks with tradition here: we have chosen a memory-to-memory architecture (with some twists).
\layout Standard

One of the most important advantages of this architecture is that any patents relating to register-register architectures are automatically avoided. Also, we are innovating, and innovation has value in itself.
\layout Standard

But the choice of a memory-to-memory architecture was also motivated by technical and performance reasons:
\layout Enumerate

We wanted to avoid the usual cycle where variables are loaded from memory into registers, then processed, then returned to memory. These memory-to-CPU-to-memory cycles have three disadvantages: a) they increase the latency for processing any variable, b) they do no useful work, yet increase the number of instructions needed for even the simplest assignment, and c) the compiler has to work harder at optimizing away these register load/unload cycles.
\layout Enumerate

The presence of large, fast L1 caches inside CPUs and the constant improvements in VLSI technology allow more choices when it comes to an efficient memory hierarchy, compared to the "traditional" memory hierarchy in von Neumann machines: main memory <---> caches <---> CPU registers. In fact, we skip an entire level, since all we have now is: main memory <---> caches.
\layout Enumerate

A large register set means a longer context switch latency, because of the time needed to save the clobbered registers. Without registers, this issue does not arise.
\layout Enumerate

Register windows have been researched previously, and the conclusion is that the benefits of this technique are independent of the basic architectural choice between CISC and RISC. So we decided to include this "twist" in our memory-to-memory architecture. In fact, it is very easy to implement a "memory window" feature on top of a memory-to-memory architecture.
\layout Enumerate

Since all data move instructions are now basically memory-to-memory move instructions, the resulting instruction set is simpler and more orthogonal. The general purpose registers are truly general purpose. This greatly simplifies the user-visible machine.
\layout Enumerate

We also have the advantages of low register pressure and graceful performance degradation in those special cases where many variables are needed.
\layout Enumerate

The internal CPU architecture is simplified. We have the L0 data cache and the ALU directly connected to the internal CPU bus. The control unit structure is also simplified. We get the advantages of a large register set without spending any silicon real estate on it.
\layout Enumerate

We can now tie our basic CPU clock frequency to how fast we can make the L0 data cache operate. This simplifies the basic job of the F-1 VLSI implementation team.
\layout Section

Register Organization Implementation
\layout Standard

The F-CPU architecture has the standard 64-bit Program Counter or instruction pointer (PC) register, and also a standard 32-bit Status or flags (ST) register.
\layout Standard

It also has a 64-bit Memory Window (MW) register, which contains the base address of the active memory window. This memory window is a set of 32 8-byte blocks that can be accessed using short versions of the standard instructions. Data pointed to by a memory window is usually already in the L0 data cache, providing zero-latency accesses. At any instant, 32 memory windows can be accessed with an 8KB L0 data cache (8KB divided by 256 bytes per window).
\layout Standard

The F-CPU architecture also has many dedicated registers to control:
\layout Standard

a) CPU Configuration
\layout Standard

b) Memory Regions
\layout Standard

c) FPU Control
\layout Standard

d) Multiprocessing
\layout Standard

e) Paging
\layout Standard

f) Segmentation
\layout Standard

g) Interrupt Processing
\layout Standard

h) Coprocessor Control
\layout Standard

i) Performance Monitoring
\layout Standard

j) TimeStamp Counter Control
\layout Standard

k) Reconfigurable Logic Control
\layout Section

Instruction Set
\layout Subsection

Addressing Modes
\layout Standard

In a memory-to-memory architecture there is obviously no distinction between a register-based addressing mode and a memory-based addressing mode. Another interesting feature is that immediate addressing modes make little sense, since they can just as well be thought of as PC-relative memory-to-memory operations.
\layout Standard

Consequently, the addressing modes of the F-CPU architecture are few and simple:
\layout Itemize

direct.
\layout Itemize

indirect.
\layout Itemize

PC-relative direct.
\layout Itemize

PC-relative displacement.
\layout Standard

Other addressing modes can be synthesized using parallelizable instruction sequences, and hence at near-zero performance cost.
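\layout Standard

The memory window described under Register Organization Implementation can be sketched in Python as a simple address calculation. This is only an illustration under stated assumptions: the names (slot_address, SLOT_BYTES and so on) are hypothetical, and the sketch assumes that the 32 blocks of a window are laid out contiguously starting at the base address held in MW, which the text above suggests but does not state outright.

```python
# Illustrative sketch only; names and the contiguous-layout assumption
# are hypothetical and not taken from the F-CPU specification.

SLOT_BYTES = 8                                   # each window block is 8 bytes
SLOTS_PER_WINDOW = 32                            # 32 blocks per memory window
WINDOW_BYTES = SLOT_BYTES * SLOTS_PER_WINDOW     # 256 bytes per window
L0_CACHE_BYTES = 8 * 1024                        # 8KB L0 data cache
WINDOWS_IN_L0 = L0_CACHE_BYTES // WINDOW_BYTES   # windows that fit in L0
ADDRESS_MASK = (1 << 64) - 1                     # addresses are 64 bits wide


def slot_address(mw: int, slot: int) -> int:
    """Return the byte address of block `slot` in the window based at `mw`."""
    if not 0 <= slot < SLOTS_PER_WINDOW:
        raise ValueError("slot index must be in the range 0..31")
    # Assumed layout: blocks are contiguous, starting at the MW base address.
    return (mw + slot * SLOT_BYTES) & ADDRESS_MASK
```

With these constants, a window spans 256 bytes and an 8KB L0 data cache holds 32 such windows, which matches the figure given in the text. A block index in the range 0..31 fits in 5 bits.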
\layout Subsection

Instruction Format
\layout Standard

With fixed 64-bit instructions, few addressing modes, an external FPU and coprocessors, and a generally regular instruction set, the F-CPU architecture has a very simple instruction format. All instructions are 64 bits long.
\layout Standard

Two bits decode into the following four classes of instructions:
\layout Itemize

Standard ALU and branch instructions.
\layout Itemize

Control instructions.
\layout Itemize

DMA instructions.
\layout Itemize

FPU and coprocessor instructions.
\layout Standard

Since the F-CPU instruction set is regular with respect to the size of data, all instructions can address either a byte (8 bits), a word (16 bits), a double word (or double, meaning 32 bits) or a quad word (or quad, meaning 64 bits). Two bits thus encode the operand size. There are no alignment requirements.
\layout Standard

Referencing a location in the currently active window takes 5 bits per argument. Referencing a location in any of the other 31 windows takes an extra 5 bits. Referencing an arbitrary location in memory requires a full 64-bit address.
\layout Standard

\the_end