25 June 2023

Generating Tree-sitter and Grammar Wasm Binaries with Emscripten

While developing Edita I had to get pretty familiar with Wasm concepts and with Emscripten, the toolchain that compiles source files (e.g. C) into Wasm and JavaScript files. This post documents some of the issues I encountered and serves as a guide to building and using Wasm files with recent versions of Emscripten on Linux.

Tree-sitter

Edita uses a fork of tree-sitter with the following changes:

Fork: https://github.com/gushogg-blake/tree-sitter.

emscripten/emsdk/llvm

A recent version of emcc (emscripten) must also be used, in order to get this change: https://github.com/emscripten-core/emscripten/pull/18382 (use locateFile in dynamic module loader):

emcc -v # 3.1.43-git

Otherwise you won’t be able to use locateFile to control the path that gets requested for side module wasm files. The path will default to something like /name.wasm, which won’t work if your side modules are kept somewhere like /tree-sitter/langs/name.wasm.

This probably has to be installed via emsdk.

The latest version of emcc also depends on the latest version of llvm:

./emsdk install llvm-git-main-64bit
./emsdk activate llvm-git-main-64bit

Tools available for installation with emsdk can be seen by running ./emsdk list.

The tree-sitter build scripts should now use the latest version of emcc (you’ll have to reload the shell or re-run the emsdk environment setup, for example source ./emsdk_env.sh, to get it onto $PATH).
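Before running the build scripts it can help to confirm which emcc the shell actually picks up. The following is a minimal sketch, not part of emsdk or tree-sitter. The helper names extract_version and version_ge are made up for illustration, and the 3.1.43 threshold is simply the version shown earlier in this post, which is not necessarily the first release containing the locateFile change. It assumes GNU sort (for the -V option), which is standard on Linux.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the emcc on $PATH is at least a known-good version.

# Assumption: 3.1.43 is the version shown in this post, not necessarily the
# first release that contains the locateFile change.
KNOWN_GOOD="3.1.43"

# Print the first x.y.z version number found in the given text.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed if version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v emcc >/dev/null 2>&1; then
  # The version banner may go to stderr, so merge it into stdout.
  found="$(extract_version "$(emcc -v 2>&1)")"
  if version_ge "$found" "$KNOWN_GOOD"; then
    echo "emcc $found at $(command -v emcc): OK"
  else
    echo "emcc $found is older than $KNOWN_GOOD" >&2
  fi
else
  echo "emcc is not on \$PATH" >&2
fi
```

If the script reports that emcc is missing or too old, the emsdk environment has most likely not been loaded into the current shell yet.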

Creating tree-sitter.wasm and tree-sitter.js

cd projects/tree-sitter
./script/build-wasm --static

The --static option indicates static linking. The list of grammars to statically link is hard-coded in build-wasm.

The files (tree-sitter.js and tree-sitter.wasm) are created in lib/binding_web.
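A quick sanity check on the output can catch a build that failed quietly. This is a rough sketch under the assumption that it is run from the tree-sitter checkout. The function names is_wasm and check_build_output are hypothetical, and the only thing it verifies is that the JavaScript file is non-empty and that the .wasm file starts with the standard Wasm magic bytes (00 61 73 6d).

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the build output directory.

# Succeed if the file starts with the Wasm magic bytes 00 61 73 6d ("\0asm").
is_wasm() {
  [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "0061736d" ]
}

# Check that both expected files exist in directory $1.
check_build_output() {
  dir="$1"
  [ -s "$dir/tree-sitter.js" ] || { echo "missing $dir/tree-sitter.js" >&2; return 1; }
  is_wasm "$dir/tree-sitter.wasm" || { echo "$dir/tree-sitter.wasm is missing or not Wasm" >&2; return 1; }
  echo "build output in $dir looks OK"
}

# Only run against the real directory when it exists.
if [ -d lib/binding_web ]; then
  check_build_output lib/binding_web
fi
```

The check says nothing about whether the binary was built with the right options. It only rules out an empty or truncated file.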

Creating Grammar Wasm Files

Grammar Wasm files can be created with an unpatched tree-sitter, installed via the tree-sitter-cli package:

git clone https://github.com/.../tree-sitter-[lang]
npx tree-sitter build-wasm tree-sitter-[lang]

The language wasm file will be created in the current directory.
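When several grammars are needed, the same command can be wrapped in a loop. The sketch below assumes the grammar repositories have already been cloned into the current directory as tree-sitter-[lang]. The function name build_grammars and the language names in the usage comment are placeholders, not anything provided by tree-sitter.

```shell
#!/usr/bin/env bash
# Sketch: build Wasm files for several already-cloned grammar repositories.

# Run the build command from this post for each language name given.
# Grammars whose tree-sitter-<lang> directory is missing are skipped.
build_grammars() {
  for lang in "$@"; do
    dir="tree-sitter-$lang"
    if [ ! -d "$dir" ]; then
      echo "skipping $lang: $dir not found" >&2
      continue
    fi
    echo "building $dir"
    npx tree-sitter build-wasm "$dir" || echo "build failed for $dir" >&2
  done
}

# Hypothetical usage: build_grammars javascript python
```

A failed build is reported and the loop moves on, so one broken grammar does not stop the rest from being built.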

Misc

Emscripten settings: https://github.com/emscripten-core/emscripten/blob/main/src/settings.js.