LLVM IR

LLVM intermediate representation (IR) is the language LLVM uses in the middleware for passes to be performed on. One can think of it as a platform-agnostic assembly, built to interface between a wide variety of coding languages and computer architectures.

Formats

Just like with assembly, there is both a human-readable format (.ll) and a bitcode format (.bc). LLVM IR can be loaded with the LLVM C++ library where everything will be represented as objects. You can output the human-readable format with the command clang -S -emit-llvm foo.c.

Structures

IR has a hierarchical structure consisting of modules containing global variables and functions, functions contain basic blocks, and basic blocks contain instructions.

A basic block is the most fundamental unit in control flow. It consists of a list of instructions with only one entry and one exit. This means that once a basic block has been entered, every instruction in it will be executed sequentially.

Since basic blocks are the most fundamental unit in control flow, they are used as the nodes in control flow graphs (CFGs) and the edges represent how a program can jump between basic blocks.

Loops are defined as multiple basic blocks that have at least one edge back to an ancestor.

A call graph is like a CFG but the nodes are functions or subroutines.

Values

A value is not just a variable but the base construct of a lot of things in LLVM—instructions, basic blocks, and operators. You can see a graph showing its subclasses in the LLVM reference.

Iterating LLVM Units

While iterating through instructions in a basic block is simple enough as it is simply a sequence, the issue becomes more complicated when we consider the relationships between basic blocks and functions where execution is not a sequence but a tree.

More information on this [here].

Static Single-Assignment (SSA) Form

Static single-assignment form means that variables can only be assigned once as constants—there are no varying (or more precisely, mutating) variables! This can be quite annoying for performing analyses since it requires a slight shift in thinking about how data evolves in a program.

More information on this [here].

---

~ Jakob Nacanaynay
(nack-uh-nigh-nigh)
he/him/his