The Ninth Workshop on Language-Theoretic Security (LangSec) at the IEEE CS Security & Privacy Workshops solicits contributions related to the growing field of language-theoretic security.
In a nutshell, since any software that processes inputs is, in fact, an interpreter, and any inputs are thus its programs, input validation is not essentially different from program verification. To be trustworthy, input validation must therefore be grounded in models that describe precisely what the properties of valid inputs are. Luckily, such models exist, and can be made accessible to regular developers.
The language-theoretic approach (LangSec) is mission assurance for connected software and hardware exposed to attacks via malicious inputs--through a practical data and code co-design methodology and filtering of legacy formats down to safe subsets.
LangSec regards the Internet insecurity epidemic as a consequence of ad hoc input handling. LangSec posits that the only path to trustworthy computer software that takes untrusted inputs is treating all valid or expected inputs as a formal language, and the respective input-handling routine as a parser for that language. Only then can any correctness guarantees be assured for the input-handling code. Ambiguity of message/protocol specification is insecurity; ad hoc parsing is an engine of exploitation; overly complex syntax can make judging security properties of input impractical or even undecidable.
LangSec explains why ad hoc "input sanitization", "sanity checking", and other advice to be more careful with inputs is not enough, and why numerous secure programming initiatives have not ended input-driven exploitation. LangSec is also a code and protocol auditing methodology.
Treating input-handling code as an automaton allows the defender to reason about its behavior. The more limited computational power of the automaton, the easier the reasoning. The root cause of many bugs, memory corruptions, and exploitation is trying to validate inputs with inappropriate automata (e.g., much of XSS is due to "validating" context-free HTML with regexps). The recognizer automaton should be just as powerful as warranted by the message format, and no more; unnecessary complexity is computational power given to the attacker.
[LangSec 2014] (keynoted by Caspar Bowden and Felix 'FX' Lindner)
[LangSec 2015] (keynoted by Dan Geer)
[LangSec 2016] (keynoted by Douglas McIlroy)
[LangSec 2017] (keynoted by Perry Metzger)
[LangSec 2018] (keynoted by Mike Walker)
[LangSec 2020] (keynoted by David Walker and Jeremy Yallop)
[LangSec 2021] (keynoted by Kathleen Fisher, George Cybenko, and Joshua M. Ackerman)
[LangSec 2022] (keynoted by Ross Anderson, Nicholas Boucher, and Cristina Cifuentes)
Full papers and presentations freely available at the above URLs.