Improper parsing is one of the most frequent and dangerous causes of software security weaknesses. Many of those are due to binary data parsed by handwritten code in memory unsafe languages. In this talk, I present EverParse, a library and tool to automatically generate efficient, provably secure parsers in C. A showcase application of EverParse is its deployment in Microsoft Hyper-V, the virtualization technology underlying the Microsoft Azure cloud platform. Today, every network packet passing through Azure is first parsed by code generated by EverParse, ensuring that only well-formed packets are accepted, and all others are rejected. EverParse has also been used successfully for a variety of other applications, including TLS, QUIC, X.509, and ELF. I describe how EverParse helped us solve the challenges we encountered across these applications: 1/ how to come up with a formal specification for those data formats based on standards and existing code; 2/ how to generate formally verified parser code from such a format specification with least user proof effort; 3/ how to deploy and maintain such verified code into production. Based on our experience, I posit that formally verified binary data parsers are a sweet spot for software hardening in high-impact scenarios with least user effort. From there, I also survey some further challenges and opportunities, ranging from support for more data formats (ASN.1, CBOR, etc.) to AI-assisted format specification discovery.
Link to the EverParse project page: https://project-everest.github.io/everparse/
Tahina Ramananandro is a Principal Research Software Development Engineer in the Research in Software Engineering (RiSE) team at Microsoft Research in Redmond, WA, USA. Devoted to formal verification of industrial-grade software, Tahina is one of the main contributors of the F* programming language and proof assistant, and of the EverParse verified parsing toolchain, grown out of Project Everest, a multi-year, multi-organization research expedition for provably secure communication software. Tahina's research interests include verified parsing, program logics, formal semantics of programming languages and verified compilation. In 2012, Tahina defended his PhD thesis at INRIA and Université Paris Diderot on verified compilation for C++ objects; then he worked at Yale University on verified separate compilation for the CertiKOS formally verified operating system kernel and hypervisor. More on Tahina's webpage: https://www.microsoft.com/en-us/research/people/taramana/
Adversarial logic (AL) is a new program logic inspired from Incorrectness Logic (IL) by O'Hearn. This new type of logic provides foundations for bug finding rather than the verification of the absence of bugs, and admits no false positives while allowing false negatives.
AL, just like IL, provides under-approximation of possible executions rather than over-approximation (as generally done for verification). Indeed, exploitability is a fundamentally under-approximate property, where only one path needs to be exploitable.
In adversarial logic, the target program and the attack program are composed in parallel and assertions are attack-driven, where the adversary must prove the existence of incorrectness conditions sufficient to exploit a vulnerability. This type of program analysis is useful to determine the criticality of software bugs and distinguish benign bugs from security vulnerabilities that can lead to compromise.
Published at the International Static Analysis Symposium (Auckland, NZ, Dec 2022), https://link.springer.com/chapter/10.1007/978-3-031-22308-2_19
Julien Vanegue leads the software security strategy at Bloomberg in New York as a member of the office of the CTO. Julien's role includes research and development of program analysis solutions tailored to Bloomberg's software stack. Bloomberg is a world leader in financial information distribution.
Parser differentials emerge when two (or more) parsers interpret the same input in different ways. Differences in parsing behavior is difficult to detect due to (1) challenges in abstracting out the parser from complex code-bases and (2) proving the equivalence of parsers. Parser differentials remain understudied as they are a novel unexpected bug resulting from the interaction of software components—sometimes even independent modules— which may individually appear bug-free. We present a survey of many known parser differentials and conduct a root-cause analysis of them. We, do so with an aim to uncover insights on how we can best conceptualize the underlying causes of their emergence. In studying these differentials, we have isolated certain design anti-patterns that give rise to parser differentials in software systems. We show how these differentials do not fit nicely into the state-of-the-art model of parser differentials and thus propose improvements to it. Additionally, we provide suggestions to help prevent parser differentials in software systems.
Complex data formats implicitly demand complex logic to parse and apprehend them. The Portable Document Format (PDF) is among the most demanding formats because it is used as both a data exchange and presentation format, and it has a particularly stringent tradition of supporting interoperability and consistent presentation. These requirements create complexity that presents an opportunity for adversaries to encode a variety of exploits and attacks. To investigate whether there is an association between structural malforms and malice (using PDF files as the example challenge format), we built PolyDoc, a tool that conducts format-aware tracing of files pulled from the PolySwarm network. The PolySwarm network crowdsources threat intelligence by running files through several industry-scale threat-detection engines. The PolySwarm network provides a PolyScore, which indicates whether a file is safe or malicious, as judged by threat-detection engines. We ran PolyDoc in a live hunt mode to gather PDF files submitted to PolySwarm and then trace the execution of these PDF files through popular PDF tools such as Mutool, Poppler, and Caradoc. We collected and analyzed 58,906 files from PolySwarm. Further, we used the PDF Error Ontology to assign error categories based on tracer output and compared them to the PolyScore. Our work demonstrates three core insights. First, PDF files classified as malicious contain syntactic malformations. Second, “uncategorized” error ontology classes were common across our different PDF tools—demonstrating that the PDF Error Ontology may be underspecified for files that real-world threat engines receive. Finally, attackers leverage specific syntactic malformations in attacks: malformations that current PDF tools can detect.
Programs often combine computation from multiple components that have access to far more than needed to do their jobs. Understanding legitimate access rights as well as applying least-privilege transformations to minimize runtime access has long been an objective. Unfortunately, prior work requires expert understanding of program semantics, severely restricting privilege separation in practice We present the object-encapsulation model, a program representation and analysis framework for understanding and measuring least-privilege relationships between parts of a program without expertise. The object-encapsulation model does this by mapping program code and data into a new low-level object-oriented representation, the Encapsulated-object Capability Graph, using simple lexical scoping rules. Each member of an encapsulated-object is then classified as private or public based on whether it is accessible from another encapsulated-object based on the program’s low-level reference graph. We use this graph to introduce new metrics and methods for evaluating asymptotic privilege complexity and present a new static analysis tool that derives the graph and characterizes privilege within the program. Our deep dive of parsers reveals they are well encapsulated, requiring access to an average of 44/663 callable interfaces and 150/239 external objects. Overall, the object-encapsulation model provides an essential element to language level analysis of least-privilege in complex systems to aid in numerous program understanding and refactoring efforts.
This paper proposes an unsupervised classification method that partitions a set of files into non-overlapping dialects based upon their behaviors, determined by messages produced by a collection of programs that consume them. The pattern of messages can be used as the signature of a particular kind of behavior, with the understanding that some messages are likely to co-occur, while others are not.
Patterns of messages can be used to classify files into dialects. A dialect is defined by a subset of messages, called the required messages. Once files are conditioned upon dialect and its required messages, the remaining messages are statistically independent.
With this definition of dialect in hand, we present a greedy algorithm that deduces candidate dialects from a dataset consisting of a matrix of file-message data, demonstrate its performance on several file formats, and prove conditions under which it is optimal. We show that an analyst needs to consider fewer dialects than distinct message patterns, which reduces their cognitive load when studying a complex format.
Data format specification languages such as PDF or HTML have been used extensively for exchanging structured data over the internet. While receivers of data files (e.g., PDF viewers or web browsers) perform syntax validation of files, validating deep semantic properties has not been systematically explored in practice. However, data files that violate semantic properties may cause unintended effects on receivers, such as causing them to crash or creating security breaches, as recent attacks showed. We present our tool DISV (Domain Independent Semantic Validator). It includes a declarative specification language for users to specify semantic properties of a data format. It also includes a validator that takes a data file together with a property specification and checks if the file follows the specification. We demonstrate a rich variety of properties that can be verified by our tool using eight case studies over three data formats. We also demonstrate that our tool can be used to detect advanced attacks on PDF documents.
A blind spot is any input to a program that can be arbitrarily mutated without affecting the program's output. Blind spots can be used for steganography or to embed malware payloads. If blind spots overlap file format keywords, they indicate parsing bugs that can lead to exploitable differentials. For example, one could craft a document that renders one way in one viewer and a completely different way in another viewer. This paper formalizes the operational semantics of blind spots, leading to a technique that automatically detects blind spots based on dynamic information flow tracking. An efficient implementation is introduced and evaluated against a corpus of over a thousand diverse PDFs. There are zero false-positive blind spot classifications and the missed detection rate is bounded above by 11%. On average, at least 5% of each PDF file is completely ignored by the parser. Our results show promise that this technique is an efficient automated means to detect exploitable parser bugs and differentials. Nothing in the technique is tied to PDF in general, so it can be immediately applied to other notoriously difficult-to-parse formats like ELF, X.509, and XML.
Subtle bugs that only manifest in certain software configurations are notoriously difficult to correctly trace. Sometimes called Heisenbugs or Bohrbugs, these runtime variability flaws often result from inadvertently invoking undefined behavior in languages like C and C++. In this paper, we present a work-in-progress novel analysis technique for detecting and correctly diagnosing the causes of variability bugs through comparing control-affecting data flow across differently compiled program variants. Our UBet prototype dynamically derives a runtime control flow trace while simultaneously tracing universal data flow for a program processing a given input, operating at a level of tracing completeness not achievable through similar dynamic instrumentation means. Sans compiler bugs or undefined behavior, every compile-time configuration for a program (i.e., optimization level or other compiler flags vary) should be semantically equivalent. Therefore, any input for which a program variant produces inconsistent output indicates a variability bug. Our analysis compares control-affecting data flow traces from disagreeing program version runs to identify the most closely related input bytes and determine where in the program the processing variability originates. If an instrumented variant’s output equals that of an uninstrumented counterpart, we can reasonably conclude our instrumentation did not introduce the observer effect, which could otherwise cause false positive variability. Though here we initially demonstrate our work-in-progress technique on C++ variability bugs in Nitro, the American Department of Defense NITF (National Imagery Transmission Format) reference implementation parser, our approach applies equally to other programs and input types beyond NITF parsers. Finally, we sketch a path toward completing this work and refining our analysis, including evaluating parsers for other popular binary and text file formats.
Intrusion Detection Systems (IDS) act as a first line of defense for network infrastructure by identifying malicious traffic and reporting it to administrators. Signature-based IDS identify this traffic by attempting to parse packets according to user-supplied rules based on well-known examples of bad traffic. However, test data can be difficult to come by (due to its sensitive nature) which makes evaluating new rules difficult. In this work we discuss the limitations of an existing SMT-based synthesis approach to automatically generating malicious network traffic. We then present a survey of how IDS rules are written in practice using an open-source corpus of over 30,000 rules and discuss a road-map towards extending the existing approach with the goal of generating security test data characterizing a broad range of threats, as well as ancillary uses assisting users in writing IDS rules and identifying IDS implementation bugs. Finally, we share early results from an evaluation of one such extension which successfully generated IDS test data for over 90% of the rules evaluated.
URL parsing confusion is a perennial source of security vulnerabilities in web services. URL parser differentials are exceedingly common due to the existence of numerous incompatible URL standards. We present dippygram, a new, grammar-aware differential fuzzing framework, and demonstrate its effectiveness at discovering parsing bugs in popular URL parsers written in C, C++, and Python. While many parsing differences are to be expected due to the multitude of standards, dippygram discovers URL parsing bugs that violate all standards. This is, to our knowledge, the first application of differential fuzzing to URL parsers. Additionally, our work differentiates itself by being coverage-guided, grammar-aware, and supporting Python targets, which are not usually subject to fuzz testing.
As the number of parsers written for a data format grows, the number of interpretations of that format's specification also grows. Often, these interpretations differ in subtle, hard-to-determine ways that can result in parser differentials. For example, two widely-used HTTP parsers have been shown to process packet headers differently, allowing for the exfiltration of private files. To help find, diagnose, and mitigate the risks of parser differentials, we present the Format Analysis Workbench (FAW), a collection of tools for collecting information on parser/input interactions and analyzing those interactions to detect differentials. This tool suite supports any number of file formats through a flexible configuration, allows for processing to be scaled horizontally, and can be run offline. It has been used for results including the analysis of more than 1 million PDF files and unifying parser behaviors across these files to identify a gold standard of validity across multiple parsers. The included statistical tools have been used to identify the root causes of parser rendering differentials, including mislabeled non-embedded fonts. Tools for instrumenting existing parsers are also included, such as PolyTracker, allowing for the analysis of blind spots which might be used to craft differentials for other parsers, or to exfiltrate large quantities of data. Through allowing users to characterize parser behaviors at scale against large corpuses of inputs, the FAW helps to mitigate security risks arising from parser behaviors by lifting up their interactions with input and making it tractable to resolve differentials back to their behavioral causes.
Critical infrastructure stakeholders need to baseline their systems to understand expected protocol communications. Baseline behaviors may vary based on operational context. Expected operations during a maintenance window, for exam- ple, may be different from normal operations. Furthermore, constructing system baselines for Industrial Control Systems (ICS) is difficult and time-consuming. ICS processes generate artifacts expressed across heterogeneous data sources such as network traffic and device logs. This paper explores the hypoth- esis that such ICS artifacts form a language in the language- theoretic sense. From a theoretical perspective, the variety of implementations of ICS protocols and constrained environment of OT networks provide a rich application domain for language- theoretic approaches. We present several use cases related to the practical construction of system baselines: grammars for data fusion, language dialects for device fingerprinting, and security automata for system baselining.