Type Deduction Analysis

Reconstructing Transparent Pointer Types in LLVM-IR
Niccolò Nicolosi
Niccolò Nicolosi
Gabriele Magnani
Gabriele Magnani
Emilio Corigliano
Emilio Corigliano
Davide Baroffio
Davide Baroffio
Federico Reghenzani
Federico Reghenzani
Giovanni Agosta
Giovanni Agosta
Politecnico di Milano, Italy
Full paper at: doi.org/10.1145/3771775.3786268
CC '26 — 35th ACM SIGPLAN International Conference on Compiler Construction
Sydney, Australia · January 31 – February 1, 2026
The Problem & The Solution

LLVM 17: Opaque Pointers

Starting from LLVM 17, all pointer types are represented by a single opaque type ptr, removing explicit pointee-type information from the IR. This breaks analyses that rely on knowing what a pointer points to.

; Before (typed pointers)
%x = load i32*, i32** %a

; After LLVM 17 (opaque pointers)
%x = load ptr, ptr %a

Existing projects like TAFFO and ASPIS are forced to rely on outdated LLVM versions.

TDA: Type Deduction Analysis

A new LLVM module analysis pass that reconstructs transparent pointer types directly from opaque-pointer IR, without requiring source code or debug information.

Key properties: interprocedural, works at any stage of the middle-end optimization pipeline, fully integrated within the LLVM pipeline, no external dependencies, and robust to common coding patterns including recursion and aliasing.
Tree Type Representation

Types are represented as ordered trees. Each node belongs to one of three classes:

Primitive Type Nodes
int
float
Composite Type Nodes
Struct
Array
Pointer Node
ptr

float*

ptr float

int*[]

Array ptr int

Opaque ptr

ptr

Single leaf pointer node = opaque

Transparent type: a type whose tree contains no opaque pointer. Opaque pointer: a single leaf pointer node.

How TDA Works
STEP 1

Build Graph

Construct an interprocedural def-use graph of the program, linking call sites to function arguments and return values

STEP 2

Observe Uses

For each LLVM instruction (alloca, load, store, GEP, global), derive type evidence from how values are used

STEP 3

Merge Types

Progressively merge type evidence across the graph, refining opaque pointers into transparent types

STEP 4

Converge

Iterate until a fixed point is reached, guaranteed by the lattice structure of the type domain

TDA operates on the def-use graph, not the control-flow graph. It reconstructs a type for each SSA value by examining how values are used throughout the program and consistently merging the evidence. The analysis is interprocedural and handles recursion, aliasing rules, and TBAA metadata. See the paper for the full formalization and convergence proofs.
Experimental Results

Tested on three benchmark suites — PolyBench (30 benchmarks), AxBench (5 benchmarks), and ABSURD (6 benchmarks) — using clang-18 at O0 and O3 optimization levels.

84.72%
Harmonic Average
Transparent types recovered at O0
87.99%
Harmonic Average
Transparent types recovered at O3
100%
Peak Recovery
Achieved in most ABSURD benchmarks at both O0 and O3

Pointer Type Alias Recovery in O3 (Arithmetic Means)

ABSURD (6)
91.9%
PolyBench (30)
90.0%
AxBench (5)
80.5%

Pointer Type Alias Recovery in O0 (Arithmetic Means)

ABSURD (6)
95.3%
PolyBench (30)
86.9%
AxBench (5)
70.6%
Transparent
Partially Transparent
Opaque
8.57%
Harmonic Average
Compile-time overhead relative to O3 pipeline
Time Complexity
O(N · |V| · |T|)
N = nodes in flow-graph, |V| = number of SSA values, |T| = largest type graph size
Case Studies
TAFFO
Tuning Assistant for Floating-point to Fixed-point Optimization

A precision-tuning compiler framework that converts floating-point computations to fixed-point. Requires pointee-type information to identify convertible values, build metadata for ranges and fixed-point formats, and generate correct conversion instructions. Previously limited to LLVM ≤ 15.

Approximate Computing
ASPIS
Automatic Software-based Protection and Integrity Suite

A fault-tolerance suite implementing EDDI and REDDI algorithms for data-flow hardening against Single-Event Upsets. Needs type information to compare redundant structured data and to perform temporary argument duplication via memcpy over the correct size.

Safety-Critical Systems
By integrating TDA as the first step of each pipeline, both TAFFO and ASPIS were updated to support LLVM versions greater than 15, with their expected behavior fully restored across all benchmarks.
Full paper at: doi.org/10.1145/3771775.3786268