Assembly Language Chapter 1

Assembly language is one of the oldest programming languages. Of all languages, it most closely resembles the native language of a computer. Learning it gives you:

  • Direct access to a computer’s hardware.

  • Insight into a great deal about your computer’s architecture and operating system.

What background should I have?

Computer Programming (C++, C#, Java, etc)

What is an Assembler?

An assembler is like a translator for computers. It takes instructions written in a language that’s easy for humans to understand (Assembly Language) and converts them into a language that computers can directly follow (Machine Code).

💡
An assembler is software that converts assembly language code into machine code. Popular assemblers include:
  • MASM (Microsoft Macro Assembler)

  • NASM (Netwide Assembler)

  • TASM (Borland Turbo Assembler)

What is a Linker?

A linker is an important utility program that takes the object files produced by the assembler and joins them into a single executable file. Linking can be static (library code is copied into the executable at build time) or dynamic (library code is located and loaded at run time).

Microsoft supplies two linkers, both used to create executable files (EXEs) or dynamic-link libraries (DLLs) from object files: LINK.EXE is the 16-bit linker and LINK32.EXE is the 32-bit linker.

What is a Debugger?

A debugger is a tool that allows you to examine the state of a running program, including the contents of memory locations and registers.

MASM CodeView, Turbo Debugger, and Visual Studio are debuggers that allow you to step through code, examine registers and memory, and set breakpoints.

What hardware/software do I need for assembly language?

  • A computer with an Intel386, Intel486 or one of the Pentium processors (IA-32 processor family).

  • OS: Microsoft Windows, MS-DOS, or Linux running a DOS emulator.

  • Software: an editor (VS Code, Sublime Text, Notepad++, etc.), an assembler, a linker, and a debugger.

What types of programs will I create?

16-Bit Real-Address Mode: In 16-bit real-address mode, you can create programs that are small and simple. These programs can access a maximum of 1 MB of memory and cannot use multitasking or memory protection features. e.g., Text editors, Calculators, Games, Device drivers etc.

32-Bit Protected Mode: In 32-bit protected mode, you can create more complex programs that can access more memory and use multitasking and memory protection features. e.g., Operating systems, Word processors, Spreadsheets, Databases, Web browsers etc.

How does assembly language (AL) relate to machine language?

Assembly language has a one-to-one relationship with machine language. Each assembly language instruction corresponds directly to a specific machine language instruction that the computer’s central processing unit (CPU) can understand and execute.
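Because of this one-to-one mapping, an assembler's core job is little more than a table lookup plus operand encoding. A minimal sketch in Python (used throughout these notes' examples, since the chapter's own tooling is assembler-based), built on the genuine IA-32 encoding of the `mov al, imm8` form (opcode B0h); the function and constant names are illustrative only:

```python
# Sketch: assembling one IA-32 instruction form, "mov al, imm8".
# B0h is the real opcode for this form; the names here are illustrative.
OPCODE_MOV_AL_IMM8 = 0xB0

def assemble_mov_al(imm8: int) -> bytes:
    """Encode 'mov al, imm8' as machine code: one opcode byte + one immediate byte."""
    return bytes([OPCODE_MOV_AL_IMM8, imm8 & 0xFF])

print(assemble_mov_al(5).hex(" ").upper())  # b0 05 -> B0 05
```

A real assembler does the same thing for hundreds of instruction forms, which is why each mnemonic line of source maps to exactly one machine instruction.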

What will I learn?

  • Basic Boolean logic: Learn the principle of Boolean algebra and how logic gates work.

  • Basic principle of computer architecture: Understand the fundamental components of a computer system and how they interact.

  • IA-32 Processors and Memory Management: Delve into the architecture of IA-32 processors and explore memory management modes.

  • High-Level Language Compilation: Discover how high-level programming languages (such as C++) are translated into machine code.

  • Machine-Level Debugging: Improve your skills in debugging programs at the machine level.

  • Interaction with the Operating System: Explore how application programs communicate with an operating system.

How do C++ and Java Relate to AL?

C++ and Java are high-level programming languages, which means they are designed to be more human-readable and easier to work with than assembly language and machine code. Programs written in high-level languages are converted into assembly code by compilers.

Is AL portable?

💡
A language whose source code can be compiled and run on a wide variety of computer systems is said to be portable.

Assembly language (AL) is generally not portable across different computer architectures or processor families. Each computer architecture (e.g., x86, ARM, MIPS) has its own specific assembly language with unique instructions and syntax.

Why learn AL?

  • Assembly language is used to write programs for embedded systems, which are small, specialized computers that control devices like cars, appliances, and medical equipment.

  • It can be used to write highly optimized programs that are very fast and efficient.

  • It can help you to understand the inner workings of the computer hardware.

  • Assembly is often used to write device drivers, which are programs that control the hardware devices that are connected to a computer.

Are there any rules in AL?

💡
Yes, there are a few rules, mainly due to the physical limitations of the processor and its native instruction set.

Assembly Language Application

It is rare to see large application programs written completely in assembly language because they would take too much time to write and maintain.

AL is used to optimize certain sections of application programs for speed and to access computer hardware directly. Typical uses include:

  • Business applications for single and multiple platforms

  • Hardware device drivers

  • Embedded systems and computer games

Comparing ASM to High-Level Languages

| Type of Application | High-Level Languages | Assembly Language |
| --- | --- | --- |
| Business application software, written for a single platform, medium to large size | Easy to organize and maintain large sections of code. Portable between operating systems. | Difficult to maintain. Not portable between operating systems. |
| Hardware device driver | May not provide for direct hardware access. Awkward coding techniques may be required. | Easy to access hardware directly. |
| Business application written for multiple platforms (different operating systems) | Portable between operating systems. | Not portable between operating systems. |
| Embedded systems and computer games requiring direct hardware access | Difficult to maintain. Not portable between operating systems. | Easy to access hardware directly. Ideal for small, efficient programs. |

Virtual Machine Concept

The virtual machine concept is an effective way to explain how a computer’s hardware and software are related.

In terms of programming languages:

  • Each computer has a native machine language (language L0) that runs directly on its hardware.

  • A more human-friendly language is usually constructed above machine language, called Language L1.

    💡
    Programs written in L1 can run in two different ways:
  • Interpretation - a program running at level L0 interprets and executes L1 instructions one by one.

  • Translation - the L1 program is completely translated into an L0 program, which then runs directly on the computer hardware.

In terms of a hypothetical computer:

  • VM1 for Language L1: First, you create a virtual machine (VM1) that understands and executes commands written in language L1.

  • VM2 for Language L2: Then, you create a virtual machine (VM2) that understands and executes commands written in language L2.

The process can repeat until a virtual machine VMn can be designed that supports a powerful, easy-to-use language. This language is easy for humans to write and understand while being able to perform complex tasks.

The Java programming language is based on the virtual machine concept.

  • A program written in the Java language is translated by a Java compiler into Java bytecode.

  • Java bytecode: a low-level language that is quickly executed at run time by the Java virtual machine (JVM).

  • The JVM has been implemented on many different computer systems, making Java programs relatively system-independent.

Specific Machine Levels

Level 0 - Digital Logic

Digital logic is the lowest level of computer architecture, consisting of the logic gates used to construct the CPU and other components.

Level 1 - Microarchitecture

Microarchitecture is the way that a computer's processor interprets and executes instructions. It is a lower level of abstraction than the instruction set architecture (ISA), which is the set of instructions that the processor can understand. The microarchitecture is typically a proprietary secret of the processor's manufacturer, and it is not generally possible for average users to write microinstructions.

Level 2 - Instruction Set Architecture

Instruction set architecture (ISA) is the set of instructions that a computer processor can understand and execute. It is a level of abstraction that is above the physical hardware, but below the programming languages that humans use.

Level 3 - Operating System

An operating system (OS) is a software program that manages the computer's hardware and software resources and provides common services for computer programs. It is a level of abstraction that sits between the hardware and the user.

Level 4 - Assembly Language

Assembly language is a low-level programming language that is used to control the hardware of a computer. It uses instruction mnemonics such as ADD, SUB, and MOV that are easily translated to the instruction set architecture level. Assembly language programs are usually translated into machine language before they begin to execute.

Level 5 - High-level Language

High-level languages are programming languages that are designed to be easy for humans to read and write. They are usually compiled into assembly language (level 4) before they can be executed. Some popular high-level languages include C++, C#, Java, and Python.

Data Representation

Binary Numbers

Binary digits (bits) are 1 and 0; they represent true and false, respectively.

  • MSB – most significant bit: the bit with the greatest weight. Bits in a binary number are numbered from right to left, and the leftmost bit is the MSB.

  • LSB – least significant bit: the bit with the least weight. The LSB is the rightmost bit.

  1. Each digit (bit) is either 1 or 0

  2. Each bit represents a power of 2.

e.g., the binary number for 143 is 10001111. It has 8 bits, each of which represents a power of 2.

| Bit | Value | Power of 2 |
| --- | --- | --- |
| 7 | 1 | 2^7 = 128 |
| 6 | 0 | 2^6 = 64 |
| 5 | 0 | 2^5 = 32 |
| 4 | 0 | 2^4 = 16 |
| 3 | 1 | 2^3 = 8 |
| 2 | 1 | 2^2 = 4 |
| 1 | 1 | 2^1 = 2 |
| 0 | 1 | 2^0 = 1 |

The value is 128 + 8 + 4 + 2 + 1 = 143.

Binary to Decimal Conversion

To convert an unsigned binary number into decimal:

bit1 * 2^0 + bit2 * 2^1 + . . .

where bit1 is the rightmost bit.

e.g., convert 101 into decimal.

1 * 2^0 + 0 * 2^1 + 1 * 2^2 = 5
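The formula can be checked mechanically; a small sketch (`bin_to_dec` is an illustrative name, not part of the chapter's toolset):

```python
def bin_to_dec(bits: str) -> int:
    """Sum bit * 2^position, with the rightmost bit at position 0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(bin_to_dec("101"))       # 5
print(bin_to_dec("10001111"))  # 143
```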

Decimal to Binary Conversion

To convert a decimal number into binary, divide the number by 2 repeatedly until the quotient is 0, then arrange the remainders from bottom to top to get the binary number.

e.g, to convert 143

| Number | Quotient | Remainder |
| --- | --- | --- |
| 143 | 71 | 1 |
| 71 | 35 | 1 |
| 35 | 17 | 1 |
| 17 | 8 | 1 |
| 8 | 4 | 0 |
| 4 | 2 | 0 |
| 2 | 1 | 0 |
| 1 | 0 | 1 |

So the binary number of 143 is 10001111.
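The repeated-division procedure can be sketched as (illustrative helper name):

```python
def dec_to_bin(n: int) -> str:
    """Repeatedly divide by 2; the remainders, read bottom to top, are the bits."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # remainder of this division step
        n //= 2                        # quotient becomes the next dividend
    return "".join(reversed(remainders))

print(dec_to_bin(143))  # 10001111
```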

Binary Addition

💡
Easy to learn!
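The notes leave the details to practice; the column rules are 0+0=0, 0+1=1, 1+1=0 carry 1, and 1+1+1=1 carry 1. A pencil-and-paper-style sketch (illustrative helper name, example values my own):

```python
def add_binary(a: str, b: str) -> str:
    """Column-by-column binary addition with a carry, like pencil and paper."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, out = 0, []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        s = int(bit_a) + int(bit_b) + carry
        out.append(str(s % 2))  # the digit written in this column
        carry = s // 2          # the carry into the next column
    if carry:
        out.append("1")
    return "".join(reversed(out))

print(add_binary("0110", "0011"))  # 1001  (6 + 3 = 9)
```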

Integer Storage Sizes

A word is a standard unit of data for a certain processor architecture. The fundamental data types of the Intel Architecture are bytes, words, doublewords, and quadwords.

💡
Largest unsigned integer that may be stored in 20 bits is 1,048,575 (2^20 - 1).

| Storage Size | Number of bits | Range of unsigned integers |
| --- | --- | --- |
| Byte | 8 | 0 to 255 (2^8 - 1) |
| Word | 16 | 0 to 65,535 (2^16 - 1) |
| Doubleword | 32 | 0 to 4,294,967,295 (2^32 - 1) |
| Quadword | 64 | 0 to 18,446,744,073,709,551,615 (2^64 - 1) |
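Every unsigned range in the table is just 0 to 2^n - 1; a quick check:

```python
# Each n-bit storage size holds unsigned values from 0 up to 2**n - 1.
for name, bits in [("byte", 8), ("word", 16), ("doubleword", 32), ("quadword", 64)]:
    print(f"{name:<10} {bits:>2} bits: 0 to {2**bits - 1:,}")
```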

Hexadecimal Integers

All values in memory are stored in binary. Because long binary numbers are hard to read, we use hexadecimal representation.

| Decimal | Binary | Hexadecimal |
| --- | --- | --- |
| 0 | 0000 | 0 |
| 1 | 0001 | 1 |
| 2 | 0010 | 2 |
| 3 | 0011 | 3 |
| 4 | 0100 | 4 |
| 5 | 0101 | 5 |
| 6 | 0110 | 6 |
| 7 | 0111 | 7 |
| 8 | 1000 | 8 |
| 9 | 1001 | 9 |
| 10 | 1010 | A |
| 11 | 1011 | B |
| 12 | 1100 | C |
| 13 | 1101 | D |
| 14 | 1110 | E |
| 15 | 1111 | F |

Binary to Hexadecimal

To convert binary to hexadecimal, make groups of 4 bits starting from the rightmost bit (pad the leftmost group with leading zeros if needed), then convert each group into a hexadecimal digit using the table above.

💡
Convert 101101010011110010100 to Hexadecimal

| Bit group | Hexadecimal |
| --- | --- |
| 0100 | 4 |
| 1001 | 9 |
| 0111 | 7 |
| 1010 | A |
| 0110 | 6 |
| 0001 | 1 |

Reading the digits from bottom to top, 16A794 is its hexadecimal equivalent.

💡
You can convert hexadecimal to binary simply by converting each hexadecimal digit into its 4-bit binary equivalent, bingo!
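The grouping procedure can be sketched as (illustrative name; Python's built-ins could do this in one line, but the loop mirrors the manual steps):

```python
def bin_to_hex(bits: str) -> str:
    """Pad on the left to a multiple of 4, then map each 4-bit group to a hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = [format(int(padded[i:i + 4], 2), "X") for i in range(0, len(padded), 4)]
    return "".join(digits)

print(bin_to_hex("101101010011110010100"))  # 16A794
```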

Hexadecimal to Decimal

You can apply the same kind of formula: digit1 * 16^0 + digit2 * 16^1 + . . .

where digit1 is the rightmost hexadecimal digit.

e.g., to convert 3BA4 4 * 16^0 + A * 16^1 + B * 16^2 + 3 * 16^3

4 * 1 + 10 * 16 + 11 * 256 + 3 * 4096

4 + 160 + 2816 + 12288 = 15268

15268 is the decimal conversion of 3BA4.
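The digit-by-digit formula, sketched (illustrative names; `int("3BA4", 16)` would do the same in one call):

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_dec(hex_str: str) -> int:
    """Sum digit * 16^position, with the rightmost digit at position 0."""
    total = 0
    for position, digit in enumerate(reversed(hex_str.upper())):
        total += HEX_DIGITS.index(digit) * 16 ** position
    return total

print(hex_to_dec("3BA4"))  # 15268
```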

Decimal to Hexadecimal

Divide the decimal number by 16 repeatedly until the quotient is 0, then arrange the remainders from bottom to top to get the hexadecimal number.

e.g., convert 422

| Number | Quotient | Remainder |
| --- | --- | --- |
| 422 | 26 | 6 |
| 26 | 1 | 10 (A) |
| 1 | 0 | 1 |

1A6 is the hexadecimal equivalent of 422.
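The same repeated-division idea, base 16 this time (illustrative helper name):

```python
def dec_to_hex(n: int) -> str:
    """Repeatedly divide by 16; the remainders, bottom to top, are the hex digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append("0123456789ABCDEF"[n % 16])  # remainder as a hex digit
        n //= 16
    return "".join(reversed(digits))

print(dec_to_hex(422))  # 1A6
```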

Hexadecimal Addition

💡
Easy to learn!
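As with binary addition, a column carries whenever its sum exceeds F (15). Python's hex literals make this easy to verify (example values are my own):

```python
# 6A2h + 49Ah, column by column: 2+A=C, A+9=13h (write 3, carry 1), 6+4+1=B -> B3Ch
print(format(0x6A2 + 0x49A, "X"))  # B3C
```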

Hexadecimal Subtraction

💡
Practical Question: The address of var1 is 00400020. The address of the next variable after var1 is 0040006A. How many bytes are used by var1?
💡
0040006A - 00400020 = 4A (74 bytes in decimal)
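The practical question can be confirmed directly, since Python accepts hex literals:

```python
var1_addr = 0x00400020
next_addr = 0x0040006A
size = next_addr - var1_addr           # bytes occupied by var1
print(format(size, "X"), size)         # 4A 74
```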

Signed Integers

The most significant bit (MSB) represents the sign of a binary number i.e., 0 is for positive and 1 is for negative.

💡
If the highest digit of a hexadecimal integer is > 7, the value is negative. Examples: 8A, C5, A2, 9D.
| Decimal | Signed Binary |
| --- | --- |
| 0 | 0000 |
| 1 | 0001 |
| 2 | 0010 |
| 3 | 0011 |
| 4 | 0100 |
| 5 | 0101 |
| 6 | 0110 |
| 7 | 0111 |
| -0 | 1000 |
| -1 | 1001 |
| -2 | 1010 |
| -3 | 1011 |
| -4 | 1100 |
| -5 | 1101 |
| -6 | 1110 |
| -7 | 1111 |

(This table shows sign-magnitude form, where the MSB only flags the sign; x86 processors actually store signed integers in two's complement, covered next.)

Two's Complement

Two's complement is a way of representing signed integers in binary.

💡
It can be used to convert a positive number into a negative.

To do this, invert all the bits (change 0s to 1s and 1s to 0s) and then add 1 to the number.

Interpreted as a signed byte, 10001111 is -113 (as an unsigned byte it is 143, which does not fit in a signed byte).

1. Invert all bits: 01110000

2. Add 1: 01110000 + 00000001 = 01110001

01110001 is +113, the negation of -113.

Note 10001111 + 01110001 = 00000000 (the carry out of the highest bit is discarded).
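The invert-and-add-one procedure, sketched for an arbitrary bit width (illustrative helper; the mask discards the carry out of the top bit):

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Invert all bits, add 1, and keep only the low `bits` bits."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask

neg = twos_complement(0b00000101)                   # negate +5
print(format(neg, "08b"))                           # 11111011
print(format((0b00000101 + neg) & 0xFF, "08b"))     # 00000000: a value plus its complement is zero
```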

Binary Subtraction Using 2's Complement

To subtract two binary numbers using 2's complement, we do the following:

  • Find the 2's complement of the subtrahend.

  • Add the 2's complement of the subtrahend to the minuend.

If there is a carry-over, ignore it.

Solve 1011 - 0101 (11 - 5):

2's complement of the subtrahend 0101 = 1011 (invert to 1010, then add 1)

1011 + 1011 = 10110; ignoring the carry gives 0110

💡
A carry out means the result is positive and is already the answer. If there is no carry, the result is negative and is stored in 2's complement form; take the 2's complement of the result to read its magnitude.

Here the carry was discarded, so the difference is 0110 (6).
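The add-the-complement trick in general form (illustrative helper; 14 - 5 used as a fresh example):

```python
def subtract(minuend: int, subtrahend: int, bits: int = 4) -> int:
    """Subtract by adding the 2's complement of the subtrahend, discarding any carry."""
    mask = (1 << bits) - 1
    complement = (~subtrahend + 1) & mask  # 2's complement of the subtrahend
    return (minuend + complement) & mask   # the mask drops the carry out

print(format(subtract(0b1110, 0b0101), "04b"))  # 1001  (14 - 5 = 9)
```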

Learn how to do the following

Forming the two's complement of a hexadecimal integer

Convert the hexadecimal number to binary, take the 2's complement of that number, and convert it back to hexadecimal.

For example, the two's complement of the hexadecimal integer AB is 55.

AB = 10101011 (binary)

~AB = 01010100 (binary)

~AB + 1 = 01010101 (binary) = 55 (hexadecimal)
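The same computation done directly on integers (Python's `~` plus a byte mask):

```python
value = 0xAB
complement = (~value + 1) & 0xFF   # invert and add 1, keep 8 bits
print(format(complement, "02X"))   # 55
```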

Converting signed binary to decimal

  1. If the most significant bit (MSB) is 0, the number is positive. Otherwise, the number is negative.

  2. If the number is positive, convert the binary number to decimal as usual.

  3. If the number is negative, take the two's complement of the binary number (invert the bits and add 1), convert the result to decimal, and attach a minus sign.
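These steps, sketched for fixed-width two's-complement values (illustrative name; neutral example values):

```python
def signed_bin_to_dec(bits: str) -> int:
    """Interpret a bit string as a two's-complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # MSB set: the value is negative
        value -= 1 << len(bits)     # equivalent to taking the two's complement and negating
    return value

print(signed_bin_to_dec("01111111"))  # 127
print(signed_bin_to_dec("10000000"))  # -128
```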

Converting signed decimal to binary

  • If the number is positive, convert it into binary as usual.

  • In case of a negative number, first ignore the sign and convert the magnitude into binary, then take the 2's complement of that binary number.

e.g., Convert -10 to binary (using 8 bits).

  • 10 is 00001010 in binary

  • 2's complement of 00001010 is 11110110

  • 11110110 is -10 in signed binary.
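A width-aware sketch (Python's masking gives the two's-complement bit pattern directly; helper name and example values are illustrative):

```python
def signed_dec_to_bin(n: int, bits: int = 8) -> str:
    """Mask to the target width; negative inputs come out in 2's-complement form."""
    return format(n & ((1 << bits) - 1), f"0{bits}b")

print(signed_dec_to_bin(-1))    # 11111111
print(signed_dec_to_bin(-128))  # 10000000
```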

Ranges of Signed Integers

| Data Type | Minimum and Maximum | Power of 2 |
| --- | --- | --- |
| signed byte | -128 to +127 | -2^7 to 2^7 - 1 |
| signed word | -32,768 to +32,767 | -2^15 to 2^15 - 1 |
| signed doubleword | -2,147,483,648 to +2,147,483,647 | -2^31 to 2^31 - 1 |
| signed quadword | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | -2^63 to 2^63 - 1 |
💡
The highest bit is reserved for the sign.

Character Storage

  • A character set is a collection of characters used to represent text. The most common character set is ASCII (American Standard Code for Information Interchange), which uses 7 bits to represent each character. This gives a total of 128 characters, including letters, numbers, punctuation marks, and control characters.

  • Standard ASCII is the first 128 characters (codes 0-127) of the ASCII character set. It includes all of the characters necessary for basic text processing.

  • Extended ASCII is the range of characters from 128 to 255. It includes characters not needed for basic text processing, such as graphics symbols and Greek characters.

  • A null-terminated string is a string of characters terminated by a special character called a null byte (value 0).

  • An ASCII table is a reference table that maps each ASCII character to its corresponding numeric value.

💡
The string "ABC123" is stored as a sequence of 6 ASCII characters: 41h, 42h, 43h, 31h, 32h, and 33h.

Numeric Data Representation

Numeric data can be represented in two ways: pure binary and ASCII digit string.

  • Pure binary is a number stored in its raw binary format, which the computer can use directly in calculations.

  • An ASCII digit string is a human-readable string of ASCII characters that represents a number.

Boolean Algebra

Boolean algebra deals with logical operations on binary variables. It was introduced by George Boole in the mid-19th century and is used in many areas of computer science, including digital logic and circuit design.

  • The basic operations of Boolean algebra are AND, OR, and NOT.

  • A truth table shows all the inputs and outputs of a Boolean function.

NOT Operation

Inverts a Boolean Value.

| Input A | NOT A |
| --- | --- |
| 0 | 1 |
| 1 | 0 |

AND Operation

Gives true if both values are true and false otherwise.

| Input A | Input B | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

OR Operation

Gives true if at least one value is true and false otherwise.

| Input A | Input B | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
💡
Operator precedence order is NOT (highest), then AND, and then OR.
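Python's `not`/`and`/`or` follow the same precedence order, which makes the rule easy to experiment with (example values are my own):

```python
a, b, c = True, False, True
expr = a or b and not c        # parsed as: a or (b and (not c))
print(expr)                    # True
print((a or b) and not c)      # False: explicit parentheses change the result
```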