74 points by walterbell 5 days ago
The actual simulation is at http://visual6502.org/sim/varm/armgl.html
A (extremely terse, partial) description can be found at http://www.righto.com/2016/02/reverse-engineering-arm1-proce...
I wrote several articles about the ARM1 when this simulator was released in 2015. A better article to start with is: http://www.righto.com/2015/12/reverse-engineering-arm1-ances...
Dave Mugridge also wrote some articles about the ARM1, focusing more on the ALU and registers: https://daveshacks.blogspot.com/2015/12/inside-alu-of-armv1-...
Nice! One thing though:
For http://visual6502.org/sim/varm/armgl.html, it would be much nicer if dragging panned the view rather than rotating it in 3D. Panning with WASD is too slow and doesn't work with some keyboard layouts.
And of course, zooming around the mouse cursor rather than around the center of the screen would make it much easier to zoom in on the part you want.
The 3D rotation is gimmicky and not actually useful for seeing the gates. With the current UI I can't zoom to the gates I want without a lot of effort fighting the slow panning and the fixed zoom center.
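For anyone curious what the suggested behavior involves, here is a minimal sketch assuming a simple 2D view where screen = world * scale + offset. This is not how the armgl.html viewer is implemented; the `View` class and its methods are hypothetical. Drag-to-pan just adds the mouse delta to the offset, and zoom-around-cursor picks a new offset so the world point under the cursor stays put.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical 2D view transform: screen = world * scale + offset."""
    scale: float = 1.0
    offset_x: float = 0.0
    offset_y: float = 0.0

    def to_world(self, sx, sy):
        """Convert a screen position to world (die) coordinates."""
        return ((sx - self.offset_x) / self.scale,
                (sy - self.offset_y) / self.scale)

    def pan(self, dx, dy):
        """Drag-to-pan: shift the view by the mouse delta in screen pixels."""
        self.offset_x += dx
        self.offset_y += dy

    def zoom_at(self, sx, sy, factor):
        """Zoom by `factor`, keeping the world point under (sx, sy) fixed."""
        wx, wy = self.to_world(sx, sy)
        self.scale *= factor
        # Solve screen = world * scale + offset for the new offset so that
        # (wx, wy) still lands exactly under the cursor.
        self.offset_x = sx - wx * self.scale
        self.offset_y = sy - wy * self.scale
```

For example, `View().zoom_at(100, 50, 2.0)` doubles the scale and moves the offset to (-100, -50), so `to_world(100, 50)` still returns (100, 50): the point under the cursor does not move.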
I had a question, the article states the following:
>"One very nice thing about the 32-bit instruction set is its pervasive conditional execution, which helps one avoid branching over code. For example, this sequence of instructions resets the register r0 to 0 if its value is equal to or less than zero, or forces its value to 1 if its value is greater than zero:
CMP r0, #0 ; if (r0 <= 0)
MOVLE r0, #0 ; r0 = 0;
MOVGT r0, #1 ; else r0 = 1
Without the conditional moves (MOVLE and MOVGT) after the compare (CMP), you'd have to branch after the compare, which is wasteful."
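A small Python model (an illustration, not an emulator) of what the quoted sequence does: CMP computes r0 - 0 on 32 bits and sets the N, Z, C and V flags, and each conditional MOV either executes or behaves like a NOP depending on those flags. The LE and GT formulas follow the ARM condition-code table; the helper names `cmp_flags` and `run_sequence` are made up for this sketch.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by ARM `CMP a, b` (computes a - b)."""
    a &= MASK
    b &= MASK
    res = (a - b) & MASK
    n = bool(res >> 31)            # result is negative as a signed value
    z = res == 0                   # result is zero
    c = a >= b                     # for subtraction, C means "no borrow"
    # Signed overflow: operand signs differ and the result's sign differs from a's.
    v = bool(((a ^ b) & (a ^ res)) >> 31)
    return n, z, c, v


def cond_le(n, z, c, v):
    return z or (n != v)           # signed less than or equal


def cond_gt(n, z, c, v):
    return (not z) and (n == v)    # signed greater than


def run_sequence(r0):
    """Model CMP r0,#0 / MOVLE r0,#0 / MOVGT r0,#1 on a 32-bit value."""
    flags = cmp_flags(r0, 0)
    r0 &= MASK
    if cond_le(*flags):            # MOVLE: runs only if LE holds, else a NOP
        r0 = 0
    if cond_gt(*flags):            # MOVGT: plain MOV does not touch the flags,
        r0 = 1                     # so it tests the same flags as MOVLE
    return r0
```

Note that no instruction is skipped or jumped over: all three are fetched and issued in order, and only their effect on r0 depends on the flags.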
How are those two conditional moves after the CMP more efficient than branching? Aren't they kind of branches themselves? What would the alternative "branching" sequence look like?
It'd look something like
All of that being said, the branching version tends to be nicer for OoO (out-of-order) cores, since there is no longer a data dependency on the flags register. That's why RISC ISAs designed for OoO cores have dropped conditional execution for most instructions (AArch64 and RISC-V stand out here).
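To make the contrast concrete: AArch64 keeps a few flag-reading selects (for example `cmp w0, #0` followed by `cset w0, gt`), while base RISC-V has no flags at all, so a comparison writes its 0/1 result into an ordinary register via SLT. The sketch below models RV32 `slt` and the `sgtz` pseudo-instruction (`slt rd, x0, rs`) in Python, assuming 32-bit registers; the helper names are illustrative only.

```python
MASK = 0xFFFFFFFF  # assume RV32: 32-bit registers


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x >> 31 else x


def slt(rs1, rs2):
    """RISC-V SLT: rd = 1 if rs1 < rs2 (signed), else 0.

    The result goes into a general-purpose register, so later instructions
    depend on that register, not on a shared flags register.
    """
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def sgtz(rs):
    """RISC-V SGTZ pseudo-instruction, i.e. SLT rd, x0, rs (x0 is always 0)."""
    return slt(0, rs)
```

Here `sgtz(r0)` computes the same value as the CMP/MOVLE/MOVGT sequence from the article in a single instruction, with no flags involved.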
In the ARM2 era (probably the same for ARM1?), a basic ALU instruction such as MOV took 1 cycle, and a branch took 4 cycles if taken or 1 if not. (There were also extra DRAM page cycles every 4 words.)
So for a simple if/else, it was usually both less code and faster to use a straight line of conditional instructions. In more complicated cases, if the programmer was feeling clever, it was possible to update the status flags to get three-way (or more!) conditionals in straight-line branchless code. Fun!
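Plugging the figures above into a rough cycle count, assuming a conditional instruction that fails its condition still costs one cycle and ignoring the DRAM page cycles, the straight-line version beats a plausible branching layout on both paths. The branching layout in the comments is one hypothetical way to write it, not necessarily what the commenters had in mind.

```python
ALU = 1               # MOV, CMP, or a conditional op that fails its condition
BRANCH_TAKEN = 4      # figures quoted for the ARM2 era
BRANCH_NOT_TAKEN = 1


def conditional_version_cycles(r0_positive):
    """CMP + MOVLE + MOVGT: 3 instructions, one of which acts as a NOP."""
    return ALU + ALU + ALU    # same cost whichever way the condition goes


def branching_version_cycles(r0_positive):
    """Hypothetical 5-instruction branching layout:

        CMP  r0, #0
        BGT  positive
        MOV  r0, #0
        B    done
    positive:
        MOV  r0, #1
    done:
    """
    if r0_positive:
        return ALU + BRANCH_TAKEN + ALU                  # CMP, BGT taken, MOV
    return ALU + BRANCH_NOT_TAKEN + ALU + BRANCH_TAKEN   # CMP, BGT, MOV, B
```

That gives 3 cycles either way for the conditional version, versus 6 (r0 > 0) or 7 (r0 <= 0) for the branching one, which also needs 5 instructions instead of 3.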
The conditional moves turn into NOPs when their condition is false.
The idea here is that a taken branch causes a pipeline flush, and the pipeline takes a couple of cycles to refill.
In practice, most CPUs have very good branch predictors these days, so conditional moves aren’t all that useful anymore.
That’s probably the reason why the base ISA of later designs such as RISC-V doesn’t include them.
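As a rough illustration of why prediction takes the sting out of branches, here is the textbook 2-bit saturating-counter predictor for a single branch. Real CPUs use far more elaborate history-based predictors; this sketch, with the hypothetical `count_mispredictions` helper, only shows that regular patterns cost few pipeline flushes while erratic ones can mispredict every time.

```python
def count_mispredictions(outcomes, initial_state=1):
    """Simulate one 2-bit saturating counter for a single branch.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    `outcomes` is a sequence of booleans (True = branch taken).
    """
    state = initial_state          # start in "weakly not taken"
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1            # each miss would mean a pipeline flush
        # Step one state toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses
```

A loop branch taken 9 times and then not taken, repeated 10 times, mispredicts only 11 times out of 100; a strictly alternating branch mispredicts all 100 times with this scheme. Data-dependent, hard-to-predict branches are where conditional execution or conditional selects can still pay off.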
A real branch would involve a jump instruction somewhere. With a branch, different code executes depending on the condition. With the code above, you get different data depending on the condition.
Thanks for all the great explanations and insights. I really appreciate it. Cheers.
The article states:
>"The ARM2 had pretty much the same instruction set as the ARM1, although featured new multiplication and (later) atomic swap instructions."
Does this mean that the ARM1 didn't support any atomic operations or were they using something else besides "compare and swap"?
The ARM1 did not have any atomic operations. You only need those if you have more than one processor. It also lacked the multiply and multiply-accumulate instructions, as stated above. These took multiple cycles, which is not very RISC-like. That is also true of the load multiple and store multiple instructions of the ARM2 (I don't remember if the ARM1 had them). The ARM2 also added the coprocessor interface.
Oops - in the analysis of the PLA2 in the ARM1 there are both the load/store multiple instructions and the coprocessor stuff. In fact, together they take up about half of the logic. So I was remembering it wrong, then.
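Without a multiply instruction, code for a chip like the ARM1 would fall back on shifts and adds. Below is a Python sketch of a generic shift-and-add routine computing the low 32 bits of a product, plus a multiply-by-constant trick using ARM's shifted second operand (`ADD r0, r0, r0, LSL #2` then `MOV r0, r0, LSL #1`). The function names are hypothetical, and this is not a reconstruction of any actual Acorn library routine.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits, like an ARM register


def mul32(a, b):
    """Low 32 bits of a * b using only tests, shifts and adds."""
    a &= MASK
    b &= MASK
    result = 0
    while b:
        if b & 1:                          # this multiplier bit is set:
            result = (result + a) & MASK   # add the shifted multiplicand
        a = (a << 1) & MASK                # next bit is worth twice as much
        b >>= 1
    return result


def times10(x):
    """Multiply by 10 with two barrel-shifter instructions:
    ADD r0, r0, r0, LSL #2   ; r0 = x + 4x = 5x
    MOV r0, r0, LSL #1       ; r0 = 10x
    """
    x = (x + (x << 2)) & MASK
    return (x << 1) & MASK
```

Because only the low 32 bits are kept, the same routine gives correct results for two's-complement negative operands, e.g. `mul32(-3, 7)` equals `-21 & 0xFFFFFFFF`.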
Ah, of course, there were no multi-core chips back then. That makes total sense. Thanks.
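On the compare-and-swap question: the later atomic swap (SWP) unconditionally exchanges a register with a memory word, which is already enough for a simple test-and-set spinlock. The Python sketch below emulates that; the `Memory` class uses a `threading.Lock` only to stand in for the hardware's indivisible read-then-write, and `LOCK_ADDR`, `acquire`, `release` and `demo` are hypothetical names, not a real ARM API.

```python
import threading
import time


class Memory:
    """Toy memory whose swap() is atomic, standing in for the hardware."""

    def __init__(self):
        self._cells = {}
        self._bus = threading.Lock()  # models the indivisible bus cycle

    def swap(self, addr, new):
        # Like SWP Rd, Rm, [Rn]: read the old word and write the new one
        # as a single step that no other processor can interleave with.
        with self._bus:
            old = self._cells.get(addr, 0)
            self._cells[addr] = new
            return old

    def store(self, addr, value):
        with self._bus:
            self._cells[addr] = value


LOCK_ADDR = 0x1000  # hypothetical address of the lock word (0 = free)


def acquire(mem):
    # Swap in 1; if we got 1 back, someone else already holds the lock.
    while mem.swap(LOCK_ADDR, 1) == 1:
        time.sleep(0)  # yield so the holder can run and release it


def release(mem):
    mem.store(LOCK_ADDR, 0)


def demo(threads=4, iterations=1000):
    """Increment a shared counter from several threads under the spinlock."""
    mem = Memory()
    counter = [0]

    def worker():
        for _ in range(iterations):
            acquire(mem)
            counter[0] += 1  # critical section
            release(mem)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter[0]
```

Unlike compare-and-swap, a plain swap cannot make the write conditional on the old value, so lock-free structures are harder to build with it, but for a basic lock the returned old value tells you whether you won.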
Does anyone else find it slightly entertaining that this is an article from a news outlet titled "The Register"?
I never thought about it, but now that you mention it, it is a great name for an IT news site. ;-)