馃/ Fuzzing Ezno - Part I

Jul 06, 2024

When I heard about stc and Ezno, both attempts to rewrite TypeScript in Rust, I was dubious. Ezno exists though! I'm excited and impressed by the work that Ben has put into it, and I've tried to contribute my share by writing fuzz testing harnesses, which my friend Addison Crump helped me create.

We started with a naive string-based fuzzer, which just spewed valid UTF-8 strings into the module parser.

#![no_main]
use ezno_parser::{ASTNode, Module, ParseOutput, SourceId, ToStringSettingsAndData};
use libfuzzer_sys::{fuzz_target, Corpus};
use pretty_assertions::assert_eq;
use std::str;
/// `do_fuzz` will take an arbitrary string, parse once and see if it returned a valid AST
/// then it will print and parse that AST a second time and compare the printed outputs.
/// If the second parse has a ParseError, that's a bug!!
fn do_fuzz(data: &str) -> Corpus {
    let input = data.trim_start();
    let Ok(ParseOutput(module, state)) = Module::from_string(
        input.to_owned(),
        Default::default(),
        SourceId::NULL,
        None,
        Vec::new(),
    ) else {
        return Corpus::Reject
    };
    let output1 =
        module.to_string(&ToStringSettingsAndData(Default::default(), state.function_extractor));
    let Ok(ParseOutput(module, state)) = Module::from_string(
        output1.to_owned(),
        Default::default(),
        SourceId::NULL,
        None,
        Vec::new(),
    ) else {
        panic!("input: `{input}`\noutput1: `{output1}`\n\nThis parse should not error because it was just parsed above");
    };
    let output2 =
        module.to_string(&ToStringSettingsAndData(Default::default(), state.function_extractor));
    assert_eq!(output1, output2);
    Corpus::Keep
}
fuzz_target!(|data: &str| {
    do_fuzz(data);
});

The fuzzer harness is defined via a do_fuzz() function that takes a string as input and returns whether the run was useful (Corpus::Keep or Corpus::Reject). Because the input is so simple and the parser is written in a generally memory-safe language (Rust), crashes from memory corruption are unlikely, so we need a stronger oracle to find bugs.

To do so, we parse the string once, and reject if the parse fails. This allows us to weed out inputs that aren't acceptable to the parser, while implicitly testing that the parser does not panic on arbitrary strings.

let input = data.trim_start();
let Ok(ParseOutput(module, state)) = Module::from_string(
    input.to_owned(),
    Default::default(),
    SourceId::NULL,
    None,
    Vec::new(),
) else {
    return Corpus::Reject
};

Next, we use the built-in printer to turn the AST the parser produced back into a string. We then parse the printed string, this time panicking if the parse fails, since that would mean the parser cannot read code that its own printer produced.

let output1 =
    module.to_string(&ToStringSettingsAndData(Default::default(), state.function_extractor));
let Ok(ParseOutput(module, state)) = Module::from_string(
    output1.to_owned(),
    Default::default(),
    SourceId::NULL,
    None,
    Vec::new(),
) else {
    panic!("input: `{input}`\noutput1: `{output1}`\n\nThis parse should not error because it was just parsed above");
};

Finally, we print the second AST and compare it to the first printed output, checking that the printed representation of the code is also self-consistent. If all of these assertions pass, we return Corpus::Keep to indicate that the input exercised an interesting code path even though it didn't cause an error.

let output2 =
    module.to_string(&ToStringSettingsAndData(Default::default(), state.function_extractor));
assert_eq!(output1, output2);
Corpus::Keep
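The same round-trip oracle works for any parser that ships with a printer. As a minimal sketch that runs without the Ezno crates, here is the idea applied to a made-up grammar (a comma-separated list of integers). The names IntList, Outcome, parse, print and round_trip are all hypothetical stand-ins invented for this illustration; Outcome only mimics the role of libfuzzer's Corpus.

```rust
/// Hypothetical toy "AST": a comma-separated list of integers.
#[derive(Debug, PartialEq)]
struct IntList(Vec<i64>);

/// Stand-in for the fuzzer's keep/reject signal (not libfuzzer's `Corpus`).
#[derive(Debug, PartialEq)]
enum Outcome {
    Keep,
    Reject,
}

/// Toy parser: accepts e.g. "1, 2,3", tolerating whitespace around items.
fn parse(src: &str) -> Result<IntList, String> {
    src.split(',')
        .map(|item| item.trim().parse::<i64>().map_err(|e| format!("{item:?}: {e}")))
        .collect::<Result<Vec<_>, _>>()
        .map(IntList)
}

/// Toy printer: always emits the canonical "1, 2, 3" form.
fn print(ast: &IntList) -> String {
    ast.0.iter().map(|n| n.to_string()).collect::<Vec<_>>().join(", ")
}

/// Round-trip oracle: parse, print, reparse, reprint, compare.
fn round_trip(input: &str) -> Outcome {
    // Inputs the parser rejects tell us nothing about the printer.
    let Ok(first) = parse(input) else {
        return Outcome::Reject;
    };
    let output1 = print(&first);
    // The printer's output must itself be parseable.
    let second = parse(&output1)
        .unwrap_or_else(|e| panic!("printed output failed to reparse: {e}"));
    let output2 = print(&second);
    // Printing must reach a fixed point after one pass.
    assert_eq!(output1, output2);
    Outcome::Keep
}
```

Calling round_trip(" 1,2 , 3") returns Outcome::Keep, because the input parses and its printed form is stable, while round_trip("1,,x") returns Outcome::Reject, because the first parse fails. Note that the two printed outputs are compared with each other and never with the raw input, so the printer is free to normalize whitespace.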

This has caught a good number of incomplete bits of the parser, where a category of syntax wasn't finished yet and there were still todo!()s about. Today, the fuzzer runs in CI and doesn't find as much, but I'm still working through and helping report examples found by running the fuzzer on my M1 Max MacBook Pro.

Ben has been very receptive to these bug reports and has been able to provide fixes for them rather quickly! He fixed both issues in one PR:

In a follow-up post, I'll write about the other approach we took to fuzzing the Ezno parser.