The Tower of Babel

LANGSEC: Language-theoretic Security

"The View from the Tower of Babel"

The Eleventh LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2025 will be held on May 15th, 2025.

The Tenth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2024 was held on May 23rd, 2024.

The Ninth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2023 was held on May 25th, 2023.

The Eighth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2022 was held as a hybrid in-person/virtual event on May 26th, 2022.

The Seventh LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2021 was held virtually on May 27th and 28th, 2021.

The Sixth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2020 was held virtually on May 21, 2020.

The Fifth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2018 was held in San Francisco on May 24, 2018. This year's focus was on practical automation tools for designers of protocols and developers of protocol stacks facing hostile inputs.

The Fourth LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2017 was held in San Jose on May 25, 2017. That year, we planned a LangSec Hackathon to go along with the workshop: hacking on secure implementations of popular protocols with the Hammer parser construction kit, in any of the languages Hammer supports.

The Third LangSec IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2016 was held in San Jose on May 26, 2016, keynoted by Doug McIlroy. The keynote, full papers, research reports, and presentation slides are posted at http://spw16.langsec.org/papers.html.

Our presentation at S4x16 applied the LangSec design principles and the Hammer tool kit to implementing a parser for DNP3, a popular ICS/SCADA protocol. Details & code: http://langsec.org/dnp3/

The Second Language-theoretic Security (LangSec) IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2015 took place in San Jose on May 21, 2015, keynoted by Dan Geer. Workshop program and all presented papers and slides are now posted. The text of Dan Geer's keynote is also posted.

We released a series of video tutorials for Hammer, a LangSec secure(r) parser construction kit: the HammerPrimer on GitHub. Please help us beta-test this tutorial!

The First Language-theoretic Security (LangSec) IEEE S&P Workshop at the IEEE Security & Privacy Symposium 2014 took place in San Jose, May 18, 2014, keynoted by Caspar Bowden and Felix 'FX' Lindner. Workshop program and all presented papers are now posted.

The Language-theoretic approach (LANGSEC) regards the Internet insecurity epidemic as a consequence of ad hoc programming of input handling at all layers of network stacks, and in other kinds of software stacks. LANGSEC posits that the only path to trustworthy software that takes untrusted inputs is treating all valid or expected inputs as a formal language, and the respective input-handling routines as a recognizer for that language. The recognition must be feasible, and the recognizer must match the language in required computational power.
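
As a minimal sketch of this idea, assume a hypothetical one-line message format, "SET <key>=<value>", terminated by a newline. The format and the names `recognize` and `handle` are illustrative assumptions and do not come from any LangSec tool. The input language is regular, so the recognizer is a plain regular expression without backreferences, and the handler only ever sees input that the recognizer has accepted in full.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical input language (an assumption for illustration):
#   "SET " key "=" value "\n"
#   key   = 1 to 16 lowercase ASCII letters
#   value = 1 to 5 decimal digits
# The language is regular, so a regular expression (no backreferences)
# is a recognizer of matching computational power.
_MESSAGE = re.compile(rb"SET ([a-z]{1,16})=([0-9]{1,5})\n")


def recognize(data: bytes) -> Optional[Tuple[str, int]]:
    """Accept or reject the whole input; return typed fields on success."""
    match = _MESSAGE.fullmatch(data)
    if match is None:
        return None
    return match.group(1).decode("ascii"), int(match.group(2))


def handle(data: bytes, store: Dict[str, int]) -> bool:
    """Act on the input only after it has been fully recognized."""
    parsed = recognize(data)
    if parsed is None:
        return False  # rejected input never reaches the processing code
    key, value = parsed
    store[key] = value
    return True
```

Under these assumptions, `recognize(b"SET speed=42\n")` yields `("speed", 42)`, while input with trailing bytes or an uppercase key is rejected as a whole rather than partially processed.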

When input handling is done in an ad hoc way, the de facto recognizer, i.e., the input recognition and validation code, ends up scattered throughout the program, does not match the programmers' assumptions about the safety and validity of data, and thus provides ample opportunities for exploitation. Moreover, for complex input languages the problem of fully recognizing valid or expected inputs may be UNDECIDABLE, in which case no amount of input-checking code or testing will suffice to secure the program. Many popular protocols and formats have fallen into this trap, an empirical fact with which security practitioners are all too familiar.
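
The contrast can be sketched with a hypothetical length-prefixed record format: one count byte, then `count` items, each a length byte followed by that many bytes. The format and all function names here are assumptions made for illustration. In the ad hoc handler, checks are implicit and interleaved with side effects, so a malformed input changes program state before it is found to be invalid. In the second handler, a single recognizer accepts or rejects the whole input first.

```python
from typing import List, Optional


def ad_hoc_handler(data: bytes, log: List[str]) -> None:
    """Ad hoc style: parsing, checking, and side effects are interleaved."""
    count = data[0]
    pos = 1
    for _ in range(count):
        length = data[pos]                         # may raise IndexError late
        item = data[pos + 1 : pos + 1 + length]    # slicing silently truncates
        log.append(item.decode("ascii", "replace"))  # effect before full check
        pos += 1 + length


def recognize(data: bytes) -> Optional[List[bytes]]:
    """Accept the input only if it is exactly one well-formed record."""
    if len(data) < 1:
        return None
    count = data[0]
    pos = 1
    items: List[bytes] = []
    for _ in range(count):
        if pos >= len(data):
            return None
        length = data[pos]
        end = pos + 1 + length
        if end > len(data):
            return None
        items.append(data[pos + 1 : end])
        pos = end
    if pos != len(data):  # trailing bytes are not part of the language
        return None
    return items


def recognizing_handler(data: bytes, log: List[str]) -> bool:
    """Side effects happen only after the whole input has been recognized."""
    items = recognize(data)
    if items is None:
        return False
    log.extend(item.decode("ascii", "replace") for item in items)
    return True
```

Given the truncated input `b"\x02\x03abc"` (two items announced, one present), the ad hoc handler logs `"abc"` and then fails with an `IndexError`, whereas the recognizing handler rejects the input and leaves the log untouched.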

LANGSEC helps draw the boundary between protocol and API designs that can be implemented securely and those that cannot, and charts a way to building truly trustworthy protocols and systems. A longer summary of LangSec is available in this USENIX Security BoF hand-out, and in the talks, articles, and papers below.

LANGSEC in pictures: Occupy Babel!

How to get on the LANGSEC mailing list: subscribe at https://mail.langsec.org/list/

Articles:
2011 USENIX ;login:
  • "Exploit Programming: from Buffer Overflows to Weird Machines and Theory of Computation", Sergey Bratus, Michael E. Locasto, Meredith L. Patterson, Len Sassaman, Anna Shubina [PDF]
  • "The Halting Problems of Network Stack Insecurity", Len Sassaman, Meredith L. Patterson, Sergey Bratus, Anna Shubina [PDF], [PDF@USENIX]
(The first article explains the "weird machines" view of exploitation; the second starts from a computation-theoretic view. We recommend reading both, choosing the reading order based on your background.)

2012 IEEE S&P Journal:
  • "A Patch for Postel's Robustness Principle", Len Sassaman, Meredith L. Patterson, Sergey Bratus [PDF]
2014 IEEE S&P Journal:
  • "Beyond Planted Bugs in 'Trusting Trust': The Input-Processing Frontier", Sergey Bratus, Trey Darley, Michael Locasto, Meredith L. Patterson, Rebecca ".bx" Shapiro, Anna Shubina [PDF]
2015 USENIX ;login:
  • "The Bugs We Have to Kill", Sergey Bratus, Meredith L. Patterson, and Anna Shubina [PDF]
2017 USENIX ;login:
  • "Curing the Vulnerable Parser: Design Patterns for Secure Input Handling", Sergey Bratus, Lars Hermerschmidt, Sven M. Hallberg, Michael E. Locasto, Falcon D. Momot, Meredith L. Patterson, and Anna Shubina [PDF] [local PDF]

Papers:
  • Security Applications of Formal Language Theory, Len Sassaman, Meredith L. Patterson, Sergey Bratus, Michael E. Locasto, Anna Shubina [Dartmouth Computer Science Technical Report TR2011-709], published in IEEE Systems Journal, Volume 7, Issue 3, Sept. 2013

  • The Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to Expunge Them, Falcon Darkstar Momot, Sergey Bratus, Sven M. Hallberg, Meredith L. Patterson, in IEEE SecDev 2016, Nov. 2016, Boston. [PDF]. (See also Brucon 2012, Shmoocon 2013 talks for more LangSec-preventable weakness and vulnerability examples)

  • Implementing a vertically hardened DNP3 control stack for power applications, Sergey Bratus, Adam J. Crain, Sven M. Hallberg, Daniel P. Hirsch, Meredith L. Patterson, Maxwell Koo, Sean W. Smith, in the Second Annual Industrial Control System Security (ICSS) Workshop at ACSAC 2016, Dec. 2016, Los Angeles. [PDF]. In this paper, we describe how we applied LangSec to writing a hardened parser for the popular and highly complex DNP3 industrial control protocol, to obtain a parser that stood up to state-of-the-art fuzzing where other implementations didn't.

Talks:

Theory:

Vulnerabilities & bugs:

Software practice:

  • "LANGSEC 2011-2016", CONFidence 2013 Keynote, Meredith L. Patterson, [slides], [video]
  • "Cats and Dogs Living Together: LangSec is Also About Usability", Meredith L. Patterson, [slides], [video]

LangSec for ICS/SCADA applications:

  • "Building a Literate Parser and Proxy for DNP3", Sven M. Hallberg, Sergey Bratus, Adam Crain, S4x16 [slides], [demos video]
  • "Taken Out of Context: Language Theoretic Security & Potential Applications for ICS", Darren Highfill, Sergey Bratus, Meredith L. Patterson, S4x14, [slides].

Tools:
  • Hammer, https://github.com/UpstandingHackers/hammer, is a parser construction kit with bindings for C/C++, Java, Ruby, Python, Perl, Go, .NET, and PHP. Like many modern parsing libraries, it provides a parser combinator interface for writing grammars as inline domain-specific languages, but Hammer also provides five different parsing back-ends. It is also bit-oriented rather than character-oriented, making it well suited to parsing binary data such as images, network packets, audio, and executables. Hammer grammars can include single-bit flags or multi-bit constructs that span character boundaries, with no hassle. Hammer is thread-safe and reentrant. HammerPrimer is a tutorial for Hammer, in a series of YouTube videos.
  • MCHammer, https://github.com/McHammerCoder, is a binary-capable parser and unparser generator for Java. It generates a parser and unparser based on a grammar defining their input and output language. Injection attacks for the defined language are prevented by the unparser through automatic extraction and generation of an encoding.
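
To give a feel for what a bit-oriented parser combinator interface looks like, here is a minimal self-contained sketch in plain Python. It does not use Hammer and is not Hammer's API: the names `bits`, `sequence`, `end`, and `run`, the most-significant-bit-first ordering, and the 16-bit header layout are all assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A parser takes the input and a bit offset, and returns
# (value, new bit offset) on success or None on failure.
Parser = Callable[[bytes, int], Optional[Tuple[object, int]]]


def bits(n: int) -> Parser:
    """Parse an n-bit unsigned integer, most significant bit first."""
    def parse(data: bytes, pos: int) -> Optional[Tuple[object, int]]:
        if pos + n > len(data) * 8:
            return None
        value = 0
        for i in range(pos, pos + n):
            bit = (data[i // 8] >> (7 - i % 8)) & 1
            value = (value << 1) | bit
        return value, pos + n
    return parse


def sequence(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(data: bytes, pos: int) -> Optional[Tuple[object, int]]:
        values: List[object] = []
        for parser in parsers:
            result = parser(data, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def end() -> Parser:
    """Succeed only when all input bits have been consumed."""
    def parse(data: bytes, pos: int) -> Optional[Tuple[object, int]]:
        return (None, pos) if pos == len(data) * 8 else None
    return parse


def run(parser: Parser, data: bytes) -> Optional[object]:
    """Apply a parser from bit 0; return its value, or None on rejection."""
    result = parser(data, 0)
    return None if result is None else result[0]


# Hypothetical 16-bit header: 1-bit flag, 3-bit type, and a 12-bit length
# that spans the byte boundary. The grammar is written inline as code.
header = sequence(bits(1), bits(3), bits(12), end())
```

With this sketch, `run(header, bytes([0b10100001, 0b00101100]))` yields the flag `1`, the type `2`, and the length `300` (followed by `None` for the end marker), while inputs that are too short or too long are rejected.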

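The role of an unparser in preventing injection can be sketched as follows, assuming a hypothetical "key=value;key=value" wire format. This is not McHammerCoder's API, and the encoding here is written by hand rather than derived automatically from a grammar; the names `unparse`, `parse`, and the percent-style escapes are illustrative assumptions. The unparser encodes every reserved character in field values, so attacker-controlled data cannot introduce new fields.

```python
from typing import Dict, List, Tuple

# Reserved characters of the hypothetical format and their encodings.
_RESERVED: Dict[str, str] = {"%": "%25", ";": "%3B", "=": "%3D"}
_DECODE: Dict[str, str] = {enc: raw for raw, enc in _RESERVED.items()}


def _encode(text: str) -> str:
    """Replace every reserved character with its escape sequence."""
    return "".join(_RESERVED.get(ch, ch) for ch in text)


def _decode(text: str) -> str:
    """Invert _encode; reject escape sequences the unparser never emits."""
    out: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            code = text[i : i + 3]
            if code not in _DECODE:
                raise ValueError("invalid escape sequence")
            out.append(_DECODE[code])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def unparse(pairs: List[Tuple[str, str]]) -> str:
    """Serialize pairs so that data can never be read back as syntax."""
    return ";".join(f"{_encode(k)}={_encode(v)}" for k, v in pairs)


def parse(message: str) -> List[Tuple[str, str]]:
    """Strictly parse a message produced by unparse."""
    if message == "":
        return []
    pairs: List[Tuple[str, str]] = []
    for field in message.split(";"):
        parts = field.split("=")
        if len(parts) != 2:
            raise ValueError("malformed field")
        pairs.append((_decode(parts[0]), _decode(parts[1])))
    return pairs
```

Under these assumptions, `unparse([("user", "alice;role=admin")])` produces `user=alice%3Brole%3Dadmin`, which parses back to a single pair. Naive string concatenation of the same value would instead produce a message that parses as two fields.
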
Please link to this page as http://langsec.org/.