Type Deduction Analysis

      Full paper at: doi.org/10.1145/3771775.3786268

      CC '26 — 35th ACM SIGPLAN International Conference on Compiler Construction

      Sydney, Australia · January 31 – February 1, 2026

Motivation

The Problem & The Solution

⚠

LLVM 17: Opaque Pointers

Starting from LLVM 17, all pointer types are represented by a single opaque type ptr, removing explicit pointee-type information from the IR. This breaks analyses that rely on knowing what a pointer points to.

          ; Before (typed pointers)

          %x = load i32*, i32** %a

          ; After LLVM 17 (opaque pointers)

          %x = load ptr, ptr %a

Existing projects like TAFFO and ASPIS are forced to rely on outdated LLVM versions.

✓

TDA: Type Deduction Analysis

A new LLVM module analysis pass that reconstructs transparent pointer types directly from opaque-pointer IR, without requiring source code or debug information.

Key properties: interprocedural, works at any stage of the middle-end optimization pipeline, fully integrated within the LLVM pipeline, no external dependencies, and robust to common coding patterns including recursion and aliasing.

Formalism

Tree Type Representation

Types are represented as ordered trees. Each node belongs to one of three classes:

Primitive Type Nodes

int

float

…

Composite Type Nodes

Struct

Array

…

Pointer Node

ptr

float*

int*[]

Opaque ptr

Single leaf pointer node = opaque

Transparent type: a type whose tree contains no opaque pointer. Opaque pointer: a single leaf pointer node.

Methodology

How TDA Works

STEP 1

Build Graph

Construct an interprocedural def-use graph of the program, linking call sites to function arguments and return values

STEP 2

Observe Uses

For each LLVM instruction (alloca, load, store, GEP, global), derive type evidence from how values are used

STEP 3

Merge Types

Progressively merge type evidence across the graph, refining opaque pointers into transparent types

STEP 4

Converge

Iterate until a fixed point is reached, guaranteed by the lattice structure of the type domain

TDA operates on the def-use graph, not the control-flow graph. It reconstructs a type for each SSA value by examining how values are used throughout the program and consistently merging the evidence. The analysis is interprocedural and handles recursion, aliasing rules, and TBAA metadata. See the paper for the full formalization and convergence proofs.

Evaluation

Experimental Results

Tested on three benchmark suites — PolyBench (30 benchmarks), AxBench (5 benchmarks), and ABSURD (6 benchmarks) — using clang-18 at O0 and O3 optimization levels.

84.72%

Harmonic Average

Transparent types recovered at O0

87.99%

Harmonic Average

Transparent types recovered at O3

100%

Peak Recovery

Achieved in most ABSURD benchmarks at both O0 and O3

Pointer Type Alias Recovery in O3 (Arithmetic Means)

ABSURD (6)

91.9%

PolyBench (30)

90.0%

AxBench (5)

80.5%

Pointer Type Alias Recovery in O0 (Arithmetic Means)

ABSURD (6)

95.3%

PolyBench (30)

86.9%

AxBench (5)

70.6%

Transparent

Partially Transparent

Opaque

8.57%

Harmonic Average

Compile-time overhead relative to O3 pipeline

Time Complexity

O(N · |V| · |T|)

N = nodes in flow-graph, |V| = number of SSA values, |T| = largest type graph size

Application

Case Studies

TAFFO

Tuning Assistant for Floating-point to Fixed-point Optimization

A precision-tuning compiler framework that converts floating-point computations to fixed-point. Requires pointee-type information to identify convertible values, build metadata for ranges and fixed-point formats, and generate correct conversion instructions. Previously limited to LLVM ≤ 15.

Approximate Computing

ASPIS

Automatic Software-based Protection and Integrity Suite

A fault-tolerance suite implementing EDDI and REDDI algorithms for data-flow hardening against Single-Event Upsets. Needs type information to compare redundant structured data and to perform temporary argument duplication via memcpy over the correct size.

Safety-Critical Systems

By integrating TDA as the first step of each pipeline, both TAFFO and ASPIS were updated to support LLVM versions greater than 15, with their expected behavior fully restored across all benchmarks.

Full paper at: doi.org/10.1145/3771775.3786268