Computers Introduction: Assembler

An assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic namesfor memory locations and other entities.^[1] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution—e.g., to generate common short sequences of instructions as inline, instead of called subroutines.

Assemblers are generally simpler to write than compilers for high-level languages,^{[citation needed]} and have been available since the 1950s. Modern assemblers, especially for RISC architectures, such as SPARC or POWER, as well as x86 and x86-64, optimizeInstruction scheduling to exploit the CPU pipeline efficiently.^{[citation needed]}

Number of passes

There are two types of assemblers based on how many passes through the source are needed to produce the executable program.

One-pass assemblers go through the source code once. Any symbol used before it is defined will require "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined) telling the linker or the loader to "go back" and overwrite a placeholder which had been left where the as yet undefined symbol was used.
Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.
In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary pad it with one or more "no-operation" instructions in a later pass or the errata. In an assembler with peephole optimization addresses may be recalculated between passes to allow replacing pessimistic code with code tailored to the exact distance from the target.

The original reason for the use of one-pass assemblers was speed of assembly; however, modern computers perform multi-pass assembly without unacceptable delay. The advantage of the multi-pass assembler is that the absence of a need for errata makes the linker (or the loader if the assembler directly produces executable code) simpler and faster.^[2]

High-level assemblers

More sophisticated high-level assemblers provide language abstractions such as:

Advanced control structures
High-level procedure/function declarations and invocations
High-level abstract data types, including structures/records, unions, classes, and sets
Sophisticated macro processing (although available on ordinary assemblers since late 1950s for IBM 700 series and since 1960's for IBM/360, amongst other machines)
Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance^[3]

See Language design below for more details.

Assembly language

A program written in assembly language consists of a series of (mnemonic) processor instructions and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a list of data, arguments or parameters.^[4]These are translated by an assembler into machine language instructions that can be loaded into memory and executed.

For example, the instruction that tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the followingmachine code loads the AL register with the data 01100001.^[5]

10110000 01100001

This binary computer code can be made more human-readable by expressing it in hexadecimal as follows

B0 61

Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal representation of the value 01100001, which is 97 in decimal. Intel assembly language provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code above can be written as follows in assembly language, complete with an explanatory comment if required, after the semicolon. This is much easier to read and to remember.

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)

In some assembly languages the same mnemonic such as MOV may be used for a family of related instructions for loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers. Other assemblers may use separate opcodes such as L for "move memory to register", ST for "move register to memory", LR for "move register to register", MVI for "move immediate operand to memory", etc.

The Intel opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.^[5]

MOV AL, 1h        ; Load AL with immediate value 1
MOV CL, 2h        ; Load CL with immediate value 2
MOV DL, 3h        ; Load DL with immediate value 3

The syntax of MOV can also be more complex as the following examples show.^[6]

MOV EAX, [EBX]    ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX

In each case, the MOV mnemonic is translated directly into an opcode in the ranges 88-8E, A0-A3, B0-B8, C6 or C7 by an assembler, and the programmer does not have to know or remember which.^[5]

Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by adisassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.

Each computer architecture has its own machine language. Computers differ in the number and type of operations they support, in the different sizes and numbers of registers, and in the representations of data in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.

Computers Introduction

Who Gave The Idea Of First Computer?

About Me

Chitika

Search This Blog

Pages

Saturday, 10 March 2012

Assembler

Number of passes

High-level assemblers

Assembly language

0 comments:

Post a Comment

Popular Posts

Blog Archive

Followers

Popular Posts

Popular Posts(LAST 7 DAYS)

Site Stats