LLVM originally stood for low level virtual machine. Today the name refers to the LLVM project, a collection of modular and reusable compiler and toolchain technologies. The project has grown well beyond its initial scope and is no longer focused on traditional virtual machines.

Figure: LLVM as a collection of modular and reusable compiler and toolchain technologies. Image from Jacopo Mangiavacchi via Medium.


What Does LLVM Stand For?

LLVM was originally an acronym for low level virtual machine. The LLVM project started in 2000 as a research project at the University of Illinois. Its goal was to provide a modern compilation strategy based on static single assignment (SSA) form, capable of supporting both static and dynamic compilation of arbitrary programming languages. Today, LLVM is used by many academic research, commercial, and open source projects that have little to do with virtual machines, and the project now treats "LLVM" as its full name rather than as an acronym.
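In SSA form, every variable is assigned exactly once, so each use refers to a single, unambiguous definition. The hypothetical `to_ssa` function below is a minimal Python sketch of the renaming step, assuming straight-line code represented as (target, expression) string pairs. Real SSA construction, as in LLVM, must also insert phi nodes where control flow merges; this sketch handles no branches and therefore omits them.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def to_ssa(statements):
    """Rename assignments in straight-line code so each name is defined once.

    `statements` is a list of (target, expression) pairs, where expressions
    are plain strings such as "x * 2". Hypothetical helper for illustration:
    it does not handle branches, so it never needs phi nodes.
    """
    version = {}  # variable name -> latest version number

    def latest(match):
        # Rewrite each use to its most recent definition; names that were
        # never assigned (inputs such as `a`) are left unchanged.
        name = match.group(0)
        return f"{name}{version[name]}" if name in version else name

    ssa = []
    for target, expr in statements:
        renamed = IDENT.sub(latest, expr)              # rename uses first...
        version[target] = version.get(target, 0) + 1   # ...then create a new version
        ssa.append((f"{target}{version[target]}", renamed))
    return ssa

program = [("x", "a + 1"), ("x", "x * 2"), ("y", "x - a")]
for target, expr in to_ssa(program):
    print(f"{target} = {expr}")
# x1 = a + 1
# x2 = x1 * 2
# y1 = x2 - a
```

Because `x` is reassigned, the second assignment becomes a new name, `x2`, and later uses of `x` refer to it; an optimizer can then reason about each value independently.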

What is LLVM?

LLVM is a compiler and a toolkit for building compilers, which are programs that translate source code into instructions a computer can execute.

The LLVM project is a collection of modular and reusable compiler and toolchain technologies. LLVM helps developers build new programming languages and improve existing ones. It handles many of the difficult and tedious tasks involved in implementing a language, such as optimizing code and generating machine code for multiple platforms and architectures.

How Is LLVM Different From GCC?

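LLVM and the GNU Compiler Collection (GCC) are both compilers. The difference is that GCC is a collection of compilers for specific programming languages, while the LLVM core is not tied to any particular language. LLVM is a framework for generating object code from any language whose front end translates source code into LLVM's intermediate representation (IR); Clang, for example, is the front end for C-family languages.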

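While LLVM and GCC both support a wide variety of languages and libraries, they are licensed and developed differently. LLVM is released under the permissive Apache License 2.0 with LLVM Exceptions, which makes its libraries easy to reuse, while GCC is released under the GNU General Public License, which places copyleft conditions on reuse.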

In terms of the performance of generated code, GCC has historically been considered superior, but LLVM has been steadily gaining ground.

How an LLVM Compiler Works

On the front end, an LLVM-based toolchain uses a language-specific compiler such as Clang (which handles C, C++ and CUDA, among others) to turn source code into an intermediate representation called LLVM IR. LLVM's optimizer then transforms the IR, and an LLVM back end (code generator) for the target architecture turns it into final machine code.
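To illustrate why separating the front end from the back end pays off, here is a minimal Python sketch. It does not use LLVM or Clang: the three-address IR, both target instruction sets (a register machine with `ADD`/`SUB`/`MUL` and a stack machine with `push`/`store`), and all function names are made up for illustration. One front end lowers an expression tree to IR once, and each back end translates that same IR for its own target, which is the property that lets a single IR serve many architectures.

```python
from itertools import count

# Expression trees: an int, a variable name (str), or (op, lhs, rhs).
# IR: a list of three-address instructions (dest, op, a, b) meaning "dest = a op b".

def lower(expr):
    """Front end: lower an expression tree to IR; returns (ir, result)."""
    ir, temps = [], count()

    def visit(node):
        if isinstance(node, (int, str)):
            return node                  # constants and variables are used as-is
        op, lhs, rhs = node
        a, b = visit(lhs), visit(rhs)
        dest = f"t{next(temps)}"         # fresh temporary, assigned exactly once
        ir.append((dest, op, a, b))
        return dest

    result = visit(expr)
    return ir, result

def emit_register(ir, result):
    """Back end 1: a made-up three-operand register machine."""
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MUL"}
    code = [f"{mnemonic[op]} {d}, {a}, {b}" for d, op, a, b in ir]
    return code + [f"RET {result}"]

def emit_stack(ir, result):
    """Back end 2: a made-up stack machine."""
    mnemonic = {"+": "add", "-": "sub", "*": "mul"}
    code = []
    for d, op, a, b in ir:
        code += [f"push {a}", f"push {b}", mnemonic[op], f"store {d}"]
    return code + [f"push {result}", "ret"]

def run_stack(code, env):
    """Tiny interpreter for the made-up stack machine, used to check its output."""
    stack, env = [], dict(env)
    for line in code:
        instr, *arg = line.split()
        if instr == "push":
            x = arg[0]
            stack.append(int(x) if x.lstrip("-").isdigit() else env[x])
        elif instr == "store":
            env[arg[0]] = stack.pop()
        elif instr == "ret":
            return stack.pop()
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[instr])

ir, result = lower(("*", ("+", "x", 2), "y"))   # (x + 2) * y
print(emit_register(ir, result))   # ['ADD t0, x, 2', 'MUL t1, t0, y', 'RET t1']
print(emit_stack(ir, result))
print(run_stack(emit_stack(ir, result), {"x": 3, "y": 4}))  # 20
```

Adding a third target would mean writing one more `emit_*` function; the front end and the IR stay untouched.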

A compiler typically works in five basic phases:
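Lexical Analysis: Converts program text into tokens, the smallest meaningful units such as keywords, identifiers, numbers, operators, and punctuation like semicolons. Whitespace and comments are usually discarded at this stage.

Parsing: Groups the tokens into a syntax tree according to the grammar of the language.

Semantic Analysis: Checks the meaning of the program, for example that variables are declared before use and that operand types are compatible.

Optimization: Transforms the code, typically in an intermediate representation, so that it runs faster or uses less memory without changing its behavior.

Code Generation: Translates the optimized code into machine code for the target architecture, which is then linked into an executable binary.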
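The five phases can be traced end to end in a toy compiler. The Python sketch below is illustrative only and unrelated to LLVM's actual implementation: its input language (one arithmetic expression over integers and declared parameter names), the function names `lex`, `parse`, `check`, `fold`, and `compile_expr`, and the choice of Python source as output are all assumptions. For the last phase it emits Python source and hands it to Python's built-in `compile()`, which produces CPython bytecode rather than native machine code.

```python
import re

# 1. Lexical analysis: split text into (kind, value) tokens; whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def lex(src):
    return [("NUM", int(n)) if n else ("NAME", w) if w else ("SYM", s)
            for n, w, s in TOKEN_RE.findall(src)]

# 2. Parsing: recursive descent over the grammar
#    expr := term (("+" | "-") term)*    term := atom ("*" atom)*
#    atom := NUM | NAME | "(" expr ")"
# Leaves of the tree are ints or names; inner nodes are (op, lhs, rhs).
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1][1]

    def expect(sym):
        if peek() != ("SYM", sym):
            raise SyntaxError(f"expected {sym!r}, got {peek()[1]!r}")
        advance()

    def atom():
        if peek()[0] in ("NUM", "NAME"):
            return advance()
        expect("(")
        node = expr()
        expect(")")
        return node

    def term():
        node = atom()
        while peek() == ("SYM", "*"):
            advance()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() in (("SYM", "+"), ("SYM", "-")):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

# 3. Semantic analysis: every name must be a declared parameter.
def check(node, params):
    if isinstance(node, str) and node not in params:
        raise NameError(f"undeclared variable {node!r}")
    if isinstance(node, tuple):
        check(node[1], params)
        check(node[2], params)

# 4. Optimization: constant folding, i.e. evaluate constant subtrees at compile time.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def fold(node):
    if not isinstance(node, tuple):
        return node
    op, lhs, rhs = node[0], fold(node[1]), fold(node[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)

# 5. Code generation: emit Python source, then let the built-in compile()
#    turn it into a CPython code object that can be executed.
def to_source(node):
    if isinstance(node, tuple):
        return f"({to_source(node[1])} {node[0]} {to_source(node[2])})"
    return str(node)

def compile_expr(src, params=()):
    tree = parse(lex(src))
    check(tree, params)
    tree = fold(tree)
    py_src = f"def compiled({', '.join(params)}):\n    return {to_source(tree)}\n"
    namespace = {}
    exec(compile(py_src, "<toy>", "exec"), namespace)
    return namespace["compiled"], py_src

fn, py_src = compile_expr("x * (2 + 3) - 4 * 1", ["x"])
print(py_src)  # the generated function's body is: return ((x * 5) - 4)
print(fn(3))   # 11
```

Constant folding is only one of many possible optimizations; LLVM runs its optimization passes on LLVM IR rather than on a syntax tree like this one.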

How Does HEAVY.AI Work With LLVM?

HEAVY.AI has invested heavily in optimizing its code so that a wide range of analytic workloads can run optimally on GPUs. To get the most out of GPU hardware, our compilation framework is built on LLVM. LLVM allows HEAVY.AI to transform query plans into architecture-independent intermediate code (LLVM IR) and then use any of LLVM's architecture-specific back ends to compile that IR for the needed target, such as NVIDIA GPUs. Clang, the LLVM-based C++ compiler, generates the corresponding LLVM IR for code written in C++, and we combine it with our explicitly generated IR.
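The idea of compiling a query plan once instead of re-interpreting it for every row can be illustrated on a small scale. The Python sketch below does not use LLVM or any HEAVY.AI API; as a much simpler stand-in it uses closure compilation, walking the plan tree once to build a specialized function. The plan format (`("col", name)`, `("lit", value)`, `(op, left, right)`) and the functions `interpret` and `compile_plan` are hypothetical.

```python
import operator

# Hypothetical plan format: ("col", name), ("lit", value), or
# (op, left, right) where op is a comparison in CMP_OPS, "and", or "or".
CMP_OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def interpret(plan, row):
    """Interpretation: re-walk the plan tree for every row."""
    kind = plan[0]
    if kind == "col":
        return row[plan[1]]
    if kind == "lit":
        return plan[1]
    left, right = interpret(plan[1], row), interpret(plan[2], row)
    if kind == "and":
        return left and right
    if kind == "or":
        return left or right
    return CMP_OPS[kind](left, right)

def compile_plan(plan):
    """Closure compilation: walk the plan once, return a function of a row."""
    kind = plan[0]
    if kind == "col":
        name = plan[1]
        return lambda row: row[name]
    if kind == "lit":
        value = plan[1]
        return lambda row: value
    left, right = compile_plan(plan[1]), compile_plan(plan[2])
    if kind == "and":
        return lambda row: left(row) and right(row)
    if kind == "or":
        return lambda row: left(row) or right(row)
    cmp = CMP_OPS[kind]  # operator is looked up once, not once per row
    return lambda row: cmp(left(row), right(row))

# WHERE price > 10 AND region = 'EU'
plan = ("and", (">", ("col", "price"), ("lit", 10)),
               ("=", ("col", "region"), ("lit", "EU")))
rows = [{"price": 12, "region": "EU"},
        {"price": 8, "region": "EU"},
        {"price": 20, "region": "US"}]
predicate = compile_plan(plan)  # "compile" once per query
print([row for row in rows if predicate(row)])  # [{'price': 12, 'region': 'EU'}]
```

Both functions return the same results; the compiled version simply avoids re-dispatching on the plan structure for every row. Generating LLVM IR goes much further, since LLVM can optimize the generated code and emit native instructions for CPUs or GPUs, which Python closures cannot.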