Eleventh LangSec Workshop

at IEEE Security & Privacy, May 15, 2025

Papers & Slides

Keynotes

"Parsers, the fractal attack surface", Daniel Wallach (Rice University, DARPA I2O)

Old-school security researchers have always talked about the software attack surface – the code that's immediately reachable by a potential adversary and therefore the code with the greatest need of hardening against attack. Surprisingly little research has observed that this code tends to be parsers – code that converts arrays of bytes into internal data structures of whatever sort. For old-school C programmers, this is exactly the place where ad-hoc hand-written code will make incorrect assumptions about data being well-formed, and then it's all downhill from there. This talk considers recent efforts in looking at these parsers as first-class citizens in the world of computer software. We'll discuss recent trends to identify and replace these ad-hoc parsers with high-assurance alternatives.

[slides]

"The Art of Fault Injection: Weird Machines All The Way Down", Cristofaro Mune (Raelize)

Physics underlies all computation and data representations. Fault Injection attacks have shown that data misrepresentation — faults — can be introduced in hardware logic, causing unintended system behavior, ultimately affecting computation.

Despite the unpredictability of the underlying physics, relevant categories – Fault Models – have been identified in the literature and used in powerful attacks. Powerful primitives can be built, often outside of typical software security modeling. For instance, Instruction Corruption, a common Fault Model, is capable of yielding PC control to an attacker simply from transfers of attacker-controlled data, without targeting any parser.

This talk suggests that mapping out the implications of these Fault Models throughout our computing systems may be a worthy challenge for LangSec research.

[slides]

Invited talks

"From Legacy to Verified Parsers with AI", Tahina Ramananandro (Microsoft Research)

Secure parsing is critical: improper data parsing and validation is ranked among the most dangerous software weaknesses and vulnerabilities, mostly due to the presence of unverified handwritten parsers. In this presentation, we show an end-to-end 3-layer methodology to automatically replace existing handwritten parsers with formally verified parsers:

  1. Our work builds on EverParse, which generates a formally verified parser from a data format specification.
  2. So far such specifications have been handwritten. To automate this part, we introduce 3DGen, leveraging AI and symbolic testing to automatically generate data format specifications from existing code and documentation. Such specifications are then consumed by EverParse.
  3. It then remains to replace existing parsers with the ones generated by 3DGen and EverParse. So far this has been done manually. To automate this part, we introduce AutoParse, leveraging AI to refactor existing code to separate parsing from processing. Then, the parser code isolated by AutoParse is fed to 3DGen to generate a specification matching its behavior. At the end of the day, the isolated parser code is fully replaced with the verified parser generated by EverParse.

[slides]

Papers

"Exploring Zero-Shot Prompting for Generating Data Format Descriptions", Prashant Anantharaman and Vishnupriya Varadharaju (Narf Industries)

Parsers validate and process untrusted user input and transform it into data structures that provide easier access. Software engineers either build these parsers for data formats from scratch or leverage libraries targeting specific formats. The recent surge in data description languages (DDLs) and parser combinator libraries for parsing data formats has aided developers in producing parsers using standardized tools. However, producing a parser for an unfamiliar data format in an unfamiliar DDL can be daunting, given the learning curve of understanding two specifications or manuals well.

As more researchers adopt tools such as GitHub Copilot to simplify their programming tasks, we ask whether LLMs already hold sufficient knowledge to produce valid DDL specifications for popular data formats. To explore this, we systematically prompt LLMs to provide specifications in valid DDL syntax and evaluate whether these specifications are syntactically valid and correct.

We found that while some LLMs, such as GPT-4 Turbo, Claude 3.5 Sonnet, and DeepSeek V3, can produce valid Kaitai Struct YAML files, Hammer C files, and Rust Nom files, they struggle to produce valid specifications in complex DDLs, such as DaeDaLus, Spicy, and DFDL. In general, all LLMs fare much better at producing syntactically valid C code using the Hammer library and Rust code using the Nom library, given the large corpora of valid C and Rust code available. None of the LLMs in our test were able to produce a valid DFDL file or DaeDaLus file. We also found that while providing the specification manuals for the DDLs did not help in producing more syntactically valid specifications, providing sample specification files led to minor improvements in the number of successful compilations.

[paper] [slides]

"C2VPG: Translating Practical Context-Free Grammars into Visibly Pushdown Grammars by Order-Based Tagging", Xiaodong Jia and Gang Tan (Pennsylvania State University)

Context-free grammars (CFGs) are widely used to specify the syntax of programming languages. However, their inherent complexity and lack of structural nesting information make them less suitable for certain parsing and analysis tasks. Visibly pushdown grammars (VPGs) address these limitations by introducing explicit call, return, and plain symbols, enabling efficient parsing and analysis of nested structures. Translating practical CFGs into VPGs remains challenging, especially with ambiguous constructs like the dangling-else issue, where the order of call and return symbols must be carefully managed to ensure correct parsing.

In this paper, we present C2VPG, a tool for automatically translating practical CFGs into VPGs using a novel order-based tagging method. Our approach introduces a sound algorithm that automatically determines an order on return symbols and constructs a tagger that assigns call, return, and plain tags to terminals in a CFG based on this order. This method resolves the tagging challenge posed by the dangling-else problem, where return symbols could be optional in sentences. We evaluate our approach on 396 real-world grammars from the ANTLR repository, achieving a 61% success rate in converting CFGs into VPGs.

We discuss the challenges posed by practical grammar design that prevent C2VPG from translating the remaining grammars. Our results demonstrate that C2VPG is both practical and efficient, and could assist language designers in creating more robust grammars.
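The core idea behind visibly pushdown grammars can be illustrated with a toy recognizer (this is not C2VPG's algorithm, just a minimal sketch of what tagging buys you): once every terminal is tagged call, return, or plain, the stack discipline is determined entirely by the tags, which is what makes efficient parsing and analysis of nesting possible.

```python
# Minimal sketch of the visibly pushdown idea: the stack is driven
# purely by per-terminal tags, never by parser state.
TAGS = {"(": "call", ")": "return"}  # every other terminal: plain

def vp_accepts(tokens):
    """Accept iff calls and returns are properly nested."""
    stack = []
    for t in tokens:
        tag = TAGS.get(t, "plain")
        if tag == "call":
            stack.append(t)
        elif tag == "return":
            if not stack:        # return with no matching call
                return False
            stack.pop()
    return not stack             # all calls matched

print(vp_accepts(list("(a(b))")))  # True
print(vp_accepts(list("(()")))     # False
```

The dangling-else problem discussed in the paper arises precisely when a terminal cannot be given a single fixed tag this simply, which is what the order-based tagging resolves.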

[paper] [slides]

"Email Smuggling with Differential Fuzzing of MIME Parsers", Seyed Behnam Andarzian, Martin Meyers and Erik Poll (Radboud University)

A single email gets parsed multiple times: by the mail server receiving the email, by virus or spam filters, and finally by the mail client that displays the email to the user. MIME (Multipurpose Internet Mail Extensions) is a standard that extends the format of email messages, allowing them to include multimedia content, attachments, and non-ASCII text. Ensuring that email content is correctly formatted and interpreted across different systems is crucial.

Differentials in how these applications parse the same MIME message can have security implications. For instance, a virus or spam filter might ignore part of the data that ends up being processed by the mail client.

We present experiments with differential fuzzing to discover differentials in how MIME parsers handle the same message. We investigate the root causes and see if the differentials can be exploited. Our research reveals many parser differentials in MIME parsers, including some that can be exploited to smuggle emails past virus and spam filters. On top of that, our experiment found many memory corruption bugs.
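The shape of such a differential harness can be sketched in a few lines of Python. Here the two "parsers" are just two policy configurations of the standard-library email parser, standing in for the distinct mail applications the paper fuzzes; a real harness would feed mutated messages to independent MIME implementations and flag any disagreement.

```python
import email
from email import policy

# A small multipart MIME message, built byte-for-byte.
RAW = (b"MIME-Version: 1.0\r\n"
       b'Content-Type: multipart/mixed; boundary="AAA"\r\n'
       b"\r\n"
       b"--AAA\r\n"
       b"Content-Type: text/plain\r\n"
       b"\r\n"
       b"hello\r\n"
       b"--AAA--\r\n")

def count_parts(raw, pol):
    """Parse raw bytes and count the leaf (non-multipart) parts."""
    msg = email.message_from_bytes(raw, policy=pol)
    return sum(1 for p in msg.walk() if not p.is_multipart())

# Differential check: same bytes, two parser configurations.
a = count_parts(RAW, policy.compat32)
b = count_parts(RAW, policy.default)
print("differential!" if a != b else "parsers agree:", a, b)
```

A disagreement in part counts on the same bytes is exactly the kind of signal that, in the paper's setting, lets content slip past a filter yet render in a client.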

[paper] [slides]

"Towards programming languages free of injection-based vulnerabilities by design", Eric Alata and Pierre-François Gimenez (LAAS-CNRS, INSA; Univ. Rennes, INRIA, IRISA)

Many systems are controlled via commands built upon user inputs. For systems that deal with structured commands, such as SQL queries, XML documents, or network messages, such commands are generally constructed in a "fill-in-the-blank" fashion: the user input is concatenated with a fixed part written by the developer (the template). However, the user input can be crafted to modify the semantics the developer intended for the command and lead to malicious use of the system. Such an attack, called an injection-based attack, is considered one of the most severe threats to web applications. Solutions to prevent such vulnerabilities exist but are generally ad hoc and rely on the developer's expertise and diligence. Our approach addresses these vulnerabilities from the point of view of formal language theory. We formally define two new security properties. The first, "intent-equivalence", guarantees that a developer's template cannot lead to malicious injections. The second, "intent-security", guarantees that every possible template is intent-equivalent, and therefore that the programming language itself is secure. From these definitions, we show that new design patterns can help create programming languages that are secure by design.
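The "fill-in-the-blank" failure mode is easy to demonstrate concretely. The sketch below (using Python's built-in sqlite3; the table and input are invented for illustration) shows a template whose semantics the input rewrites, next to a parameterized query where the input stays data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, secret TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("alice", "a-secret"), ("bob", "b-secret")])

user_input = "nobody' OR '1'='1"  # crafted input

# Fill-in-the-blank template: the input is concatenated into the
# command, so it can alter the command's structure (the developer's
# intent), not just its data.
unsafe = con.execute(
    "SELECT secret FROM users WHERE name = '" + user_input + "'").fetchall()

# Parameterized query: the input remains a value; intent is preserved.
safe = con.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

In the paper's terms, the concatenated template is not intent-equivalent: some input makes the resulting command mean something the developer never wrote.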

[paper] [slides]

"Large Language Models for Validating Network Protocol Parsers", Mingwei Zheng, Danning Xie and Xiangyu Zhang (Purdue University)

Network protocol parsers are essential for enabling correct and secure communication between devices. Bugs in these parsers can introduce critical vulnerabilities, including memory corruption, information leakage, and denial-of-service attacks. An intuitive way to assess parser correctness is to compare the implementation with its official protocol standard. However, this comparison is challenging because protocol standards are typically written in natural language, whereas implementations are in source code. Existing methods like model checking, fuzzing, and differential testing have been used to find parsing bugs, but they either require significant manual effort or ignore the protocol standards, limiting their ability to detect semantic violations. To enable more automated validation of parser implementations against protocol standards, we propose PARVAL, a multi-agent framework built on large language models (LLMs). PARVAL leverages the capabilities of LLMs to understand both natural language and code. It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications, and performs a differential comparison to uncover inconsistencies. We evaluate PARVAL on the Bidirectional Forwarding Detection (BFD) protocol. Our experiments demonstrate that PARVAL successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%. PARVAL uncovers seven unique bugs, including five previously unknown issues.
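The differential-comparison step can be pictured with a toy version of such a unified representation. The field names and layouts below are invented stand-ins, not PARVAL's actual format specifications or the real BFD header layout; the point is only that once both sides are reduced to a common shape, inconsistencies fall out of a simple diff:

```python
# Toy "format specifications": field name -> (byte offset, size).
# Both layouts are illustrative, not taken from RFC 5880.
rfc_spec  = {"version": (0, 1), "flags": (1, 1), "detect_mult": (2, 1)}
impl_spec = {"version": (0, 1), "flags": (1, 2)}

def diff_specs(a, b):
    """Report fields missing, extra, or laid out differently in b vs a."""
    issues = []
    for f in sorted(a.keys() | b.keys()):
        if f not in b:
            issues.append("missing in impl: " + f)
        elif f not in a:
            issues.append("extra in impl: " + f)
        elif a[f] != b[f]:
            issues.append("layout mismatch: %s %s vs %s" % (f, a[f], b[f]))
    return issues

for issue in diff_specs(rfc_spec, impl_spec):
    print(issue)
```

Each reported inconsistency then becomes a candidate semantic bug to triage, which is where the paper's 5.6% false positive rate applies.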

[paper] [slides]

Research Reports

"Parsing with the Logic FC", Owen M. Bell, Sam M. Thompson and Dominik D. Freydenberger (Loughborough University)

FC is a logic on strings that has been primarily studied in database theory in the context of information extraction. In this report, we argue that FC and its extensions can be used for more. In particular, we argue that it can be used as a unifying framework for combining parsers that aligns with the principles of Language-Theoretic Security (LangSec). We first survey the recent literature on FC and its extensions, and explain the different criteria we have for efficiency. We then explain how we can see FC and its extensions as a replacement for regex, and how it fits into LangSec. Finally, we explain how this can be pulled together into a framework for combining parsers due to the compositionality of the model.

[paper] [slides]

"AI Security is a LangSec Problem", Max von Hippel and Evan Miyazono (Benchify, Inc.; Atlas Computing)

The rapid development of Artificial Intelligence (AI) systems, and particularly Large Language Models (LLMs), has already started changing how software is written in industry. In this work, we categorize two important features of modern AI systems—structured outputs and tool-use—and explain how the security of each is, inherently, a LangSec problem. We provide anecdotal evidence from the San Francisco startup ecosystem to illustrate how companies are currently using, deploying, and securing AI systems with these features. Based on these observations and our analysis of current practices, we identify three concrete research directions where the LangSec community can contribute to securing both the parsing of LLM outputs and the safe deployment of LLM-powered tools. This work should be read as a call-to-action for the LangSec community to tackle outstanding, and growing, security problems catalyzed by AI.
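The structured-output half of that claim can be made concrete with a small hedged sketch: the tool names and message shape below are invented, but the pattern, fully recognizing an LLM's output against a closed vocabulary before acting on it rather than trusting it, is the LangSec framing at issue.

```python
import json

ALLOWED_TOOLS = {"search", "calc"}  # closed vocabulary (illustrative)

def parse_tool_call(raw):
    """Treat LLM output as untrusted input: recognize fully, then act."""
    obj = json.loads(raw)                 # syntactic check
    if set(obj) != {"tool", "args"}:      # shape check
        raise ValueError("unexpected keys: %s" % sorted(obj))
    if obj["tool"] not in ALLOWED_TOOLS:  # vocabulary check
        raise ValueError("unknown tool: %r" % obj["tool"])
    return obj

print(parse_tool_call('{"tool": "calc", "args": "2+2"}'))
# A call naming a tool outside the vocabulary raises ValueError
# instead of reaching the tool dispatcher.
```

The same recognizer-before-processor discipline applies to tool-use: anything the model emits is input to a parser, not a command.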

[paper] [slides]

"Hi-Res: Precise Exploit Detection using Object-Granular Memory Monitoring", Ziyang Yang, Saumya Solanki, Scott Rixner and Nathan Dautenhahn (Rice University; Serenitix)

Despite numerous methods for identifying, preventing, and protecting against kernel-level exploits, attacks persist. One of the key challenges is the prevalence of weird machines—unintended computational artifacts that attackers dynamically stitch together from unmonitored low-level operations. This paper presents Hi-Res, a programmable detection framework that systematically lifts high-level exploit behaviors from their low-level memory operations. Unlike traditional methods that rely on expert-driven, hand-crafted monitors, Hi-Res automatically generates a unique fingerprint of kernel execution given a specific input and execution context. Hi-Res projects memory traces into a high-resolution hyperplane, where behavioral fingerprints are constructed from observed access patterns. Using this representation, Hi-Res is able to explore the hypothesis that low-level program traces exhibit locality properties: distinct, context-sensitive memory access patterns unique to specific workloads. This locality, coupled with the concrete Hi-Res representation, enables the empirical modeling of working sets without prior knowledge of program semantics. By analyzing specific dynamic context tuples—such as system call, access-from location, allocation contexts, and call stacks—we demonstrate that these fingerprints reliably differentiate between normal and exploit behaviors. Our results confirm that locality serves as a robust signal for precise exploit detection, establishing Hi-Res as a general, data-driven framework for dynamic security monitoring.

[paper] [slides]

"Automatic Schema Inference from Unknown Protobuf Messages", Jared Chandler (Dartmouth College)

Packed binary formats present a challenge for network analysts, reverse engineers, and security researchers. Determining which fields are required, which fields are optional, and which fields can be repeated demands attention to detail, specialized human expertise, and ample time. We present an automatic schema inference approach targeting the widely used protobuf serialization format. Our approach recursively identifies field arity constraints from a collection of raw binary messages and reports them to a user as a complete protobuf schema. In our evaluation, this approach demonstrates high accuracy across a variety of inputs including real-world protobuf messages and binary files.
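The first step of any such inference, recovering field numbers and wire types from raw bytes, follows directly from the protobuf wire format and can be sketched briefly (the message bytes below are a hand-built example, not data from the paper's corpus):

```python
def read_varint(buf, i):
    """Decode a base-128 varint at offset i; return (value, next offset)."""
    shift = value = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value, i
        shift += 7

def field_keys(buf):
    """Yield (field_number, wire_type) for each top-level field."""
    i = 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        num, wt = key >> 3, key & 7
        yield num, wt
        if wt == 0:              # varint payload
            _, i = read_varint(buf, i)
        elif wt == 1:            # 64-bit payload
            i += 8
        elif wt == 2:            # length-delimited payload
            n, i = read_varint(buf, i)
            i += n
        elif wt == 5:            # 32-bit payload
            i += 4
        else:
            raise ValueError("unsupported wire type %d" % wt)

# message { field 1 = varint 150; field 2 = bytes "hi" }
msg = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69])
print(list(field_keys(msg)))  # [(1, 0), (2, 2)]
```

Aggregating these (field, wire type) observations across many messages is what lets arity constraints—required, optional, repeated—be inferred and emitted as a schema.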

[paper] [slides]

Work-in-Progress and Posters

"Removing the Vulnerable Webapp: Combining JWT and Stored Procedures to Foil SQL Injection", Falcon Darkstar Momot (Aiven Oy)

SQL injection remains an important vulnerability class for multiple reasons, including poor code-data separation in the access protocol, and excessive trust placed in diverse applications to enforce schema rules. Exploitation of SQL injection elevates privilege from the application user context into the application context. This work proposes to eliminate any distinction between those two contexts while preserving the ability to use standard web authentication mechanisms and accommodating the needs of applications that manage their own users and serve a large-scale user base. The solution places responsibility for API definition and user authorization in the database schema. A comparative advantage over existing approaches is discussed.

[paper] [slides]

"Extending OpenAPI for semantic checking of API usage", Jacob Torrey (Thinkst)

This WIP talk explores how to add semantic contracts for RESTful API types and functionality.

RPC protocols historically have supported automated synthesis of statically-typed client stubs. The move to RESTful web APIs lost that powerful feature, until the standardization of OpenAPI. However, all previous schemes at most allowed for the expression of data types from a machine perspective (str, int, etc.).

By annotating web APIs with the ontic or human-centric types of data and exposing an API contract, developers can use formal reasoning when consuming the API. This is demonstrated in a proof-of-concept model with an ontic/human-centric type registry along with verifiable conversions between them. An OpenAPI extension is used to annotate API endpoints with their registered types, and also offer semantic contracts on their behavior. A demonstration of a simple API is provided to show how such a scheme improves safety of consuming the API without adding developer effort.
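A minimal sketch of such an ontic-type registry might look as follows; the type names, predicates, and conversion shown are hypothetical illustrations, not the talk's actual registry or its OpenAPI extension syntax:

```python
# Hypothetical ontic-type registry: each human-centric type refines a
# machine type (str, int, float, ...) with a semantic validity check.
registry = {}

def register(name, base, check):
    registry[name] = (base, check)

def validate(name, value):
    """True iff value inhabits the named ontic type."""
    base, check = registry[name]
    return isinstance(value, base) and check(value)

register("port_number", int, lambda v: 0 < v < 65536)
register("temperature_celsius", float, lambda v: v >= -273.15)

# A verifiable conversion between registered types.
def celsius_to_fahrenheit(v):
    assert validate("temperature_celsius", v)
    return v * 9.0 / 5.0 + 32.0

print(validate("port_number", 8080))            # True
print(validate("temperature_celsius", -300.0))  # False
```

Annotating each endpoint's parameters with such registered names, rather than bare machine types, is what lets a client reason formally about whether two APIs can be safely composed.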