12 June 2024

Longest valid Truncate sequence - algorithm comparison

My brother and I recently started playing a turn-based strategy/territory-building word game called Truncate. In it, you place tiles on a board to create words, a bit like in Scrabble. Valid words are strong, while invalid words are vulnerable to attack, meaning you can lose those tiles and any that depend on them. This led both of us to wonder: what is the longest possible sequence of plays in a line in which every created word is valid?

Manually, I came up with a -> relapsed. The whole sequence is:

  • a
  • la (Truncate uses a weird dictionary)
  • lap
  • laps
  • lapse
  • elapse
  • elapsed
  • relapsed

… for 8 total plays.

We’re both programmers, so naturally we also both wondered about writing an algorithm to find the longest sequence.

The function signature would be longestSequence(allValidWords: string[]) -> string[], where the returned array is the sequence of played words from shortest to longest, as in the list above.

Interestingly, while I have much more experience solving leetcode-style algorithmic problems than my brother, the high-level approach he came up with turned out to be an order of magnitude faster than mine.

My approach was to go through all words and, for each one, try adding every letter on either side, for 26 * 2 = 52 possibilities per word:

import fs from "fs";

let letters = "abcdefghijklmnopqrstuvwxyz".split("");

// each line is a word plus metadata; keep the first field, minus any "*" marker
let words = fs.readFileSync("./words.txt").toString().split("\n").filter(Boolean).map(s => s.split(" ")[0].replace("*", ""));

let set = new Set(words);

function longestSequence() {
	let longest = null;
	
	for (let word of words) {
		let sequence = getLongestSequence([word]);
		
		if (!longest || sequence.length > longest.length) {
			longest = sequence;
		}
	}
	
	return longest;
}

// depth-first search: recursively try every valid one-letter
// extension of the last word in the path, keeping the longest result
function getLongestSequence(path) {
	let head = path.at(-1);
	let extensions = [...getExtensions(head)];
	let longest = path;
	
	for (let ext of extensions) {
		let sequence = getLongestSequence([...path, ext]);
		
		if (sequence.length > longest.length) {
			longest = sequence;
		}
	}
	
	return longest;
}

// yields every valid word formed by adding a single letter to either end
function *getExtensions(word) {
	for (let letter of letters) {
		let prefix = letter + word;
		let suffix = word + letter;
		
		if (set.has(prefix)) {
			yield prefix;
		}
		
		if (set.has(suffix)) {
			yield suffix;
		}
	}
}

console.time();

let seq = longestSequence();

console.timeEnd();

console.log(seq);
console.log("start word length: " + seq[0].length);
console.log("end word length: " + seq.at(-1).length);
console.log("sequence length: " + seq.length);

This takes ~1.5s on Truncate’s ~113,000-word dictionary.

My brother’s approach was to take each word and remove a letter from either side, for just 2 possibilities per word:

import fs from "fs";

let words = fs.readFileSync("./words.txt").toString().split("\n").filter(Boolean).map(s => s.split(" ")[0].replace("*", ""));

let set = new Set(words);

// walk down from `word`, removing a letter from either end at each step,
// and return the smallest valid root along with the path back up
function findSmallestRoot(word, path=null) {
	if (!path) {
		path = [word];
	}
	
	if (word.length === 1) {
		return {root: word, path};
	}
	
	let l = word.slice(0, -1);
	let r = word.slice(1);
	
	let lRoot;
	let rRoot;
	
	if (isWord(l)) {
		lRoot = findSmallestRoot(l, [l, ...path]);
	}
	
	if (isWord(r)) {
		rRoot = findSmallestRoot(r, [r, ...path]);
	}
	
	if (!lRoot && !rRoot) {
		return {root: word, path};
	} else if (lRoot && !rRoot) {
		return lRoot;
	} else if (rRoot && !lRoot) {
		return rRoot;
	}
	
	if (lRoot.root.length < rRoot.root.length) {
		return lRoot;
	}
	
	if (rRoot.root.length < lRoot.root.length) {
		return rRoot;
	}
	
	return lRoot;
}

function isWord(w) {
	return set.has(w);
}

function findLongestSequence() {
	let longestSeq = null;
	
	for (let word of words) {
		let {path} = findSmallestRoot(word);
		
		if (!longestSeq || path.length > longestSeq.length) {
			longestSeq = path;
		}
	}
	
	return longestSeq;
}

console.time();

let seq = findLongestSequence();

console.timeEnd();

console.log(seq);

This takes 89ms – unsurprising in hindsight, given that the branching factor at each step is 2 rather than 52.

(I wrote the actual code in both cases, so any bugs/inefficiencies are mine.)

The logical next step would seem to be to memoise findSmallestRoot – not for the entire path, which is only taken once, but per word: we will reach lap from both flap and clap, so we should probably save the result of that subtree (lap -> la -> a).
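
Here’s a minimal sketch of what that memoisation might look like, reusing the set and isWord helpers from the version above (cache and findChain are names I’m introducing for illustration, not part of the original code):

let cache = new Map();

// longest chain of valid words ending at `word`, from smallest root
// to `word` inclusive, memoised per word
function findChain(word) {
	if (cache.has(word)) {
		return cache.get(word);
	}
	
	let best = [word];
	
	for (let shorter of [word.slice(0, -1), word.slice(1)]) {
		if (isWord(shorter)) {
			let chain = findChain(shorter);
			
			if (chain.length + 1 > best.length) {
				best = [...chain, word];
			}
		}
	}
	
	cache.set(word, best);
	
	return best;
}

findLongestSequence would then just take the longest findChain(word) across all words, and each subtree only gets explored once.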

RIIR (“Rewrite It In Rust”)

I’ve also been learning Rust recently, so I used this problem as an exercise. The results weren’t as spectacular as I was expecting: I thought the Rust version would be tens or hundreds of times faster, but in the end the best I could get was only about twice as fast*. I won’t have made it as efficient as an experienced Rust dev would have, obviously, but still, I think this is a testament to how well-optimised V8 is.

* Runtimes just now are in fact very close to the JS versions; I’m not sure what I was doing to get a 2x speedup before. I did realise that you have to compile in release mode (cargo build --release) to get anywhere near decent performance.

My version:

use std::{
	collections::HashSet,
	env,
	fs::File,
	io::{BufRead, BufReader},
	time::Instant,
};

fn lines_from_file(path: &str) -> Vec<String> {
	BufReader::new(File::open(path).unwrap()).lines().map(|s| s.unwrap()).collect()
}

// keep the first space-separated field, minus any "*" marker
fn word_from_line(line: &str) -> String {
	line.replace("*", "").split(' ').next().unwrap().to_string()
}

const LETTERS: &str = "abcdefghijklmnopqrstuvwxyz";

fn is_word(words_set: &HashSet<String>, word: &str) -> bool {
	words_set.contains(word)
}

// depth-first search: extend the last word in `from` by one letter
// on either side, recursing on every valid extension
fn find_longest_sequence_from(
	words_set: &HashSet<String>,
	from: &mut Vec<String>,
) -> Vec<String> {
	let mut longest_sequence = from.clone();
	let word = from.last().unwrap().clone();
	
	for letter in LETTERS.chars() {
		let prefix = format!("{}{}", letter, word);
		let suffix = format!("{}{}", word, letter);
		
		for candidate in [prefix, suffix] {
			if is_word(words_set, &candidate) {
				from.push(candidate);
				
				let seq = find_longest_sequence_from(words_set, from);
				
				if seq.len() > longest_sequence.len() {
					longest_sequence = seq;
				}
				
				from.pop();
			}
		}
	}
	
	longest_sequence
}

// find longest sequence of valid Truncate moves
// - starting with a word and adding a letter either
// side to produce another valid word

fn find_longest_sequence(
	words_vec: &[String],
	words_set: &HashSet<String>,
) -> Vec<String> {
	let mut longest_sequence: Option<Vec<String>> = None;
	
	for word in words_vec {
		let seq = find_longest_sequence_from(words_set, &mut vec![word.to_string()]);
		
		if longest_sequence.as_ref().map_or(true, |l| seq.len() > l.len()) {
			longest_sequence = Some(seq);
		}
	}
	
	longest_sequence.unwrap()
}

fn main() {
	let args: Vec<String> = env::args().collect();
	
	assert!(args.len() == 2);
	
	let dictionary_filename = &args[1];
	
	let words = lines_from_file(dictionary_filename).into_iter().map(|s| word_from_line(&s));
	
	let words_set: HashSet<String> = HashSet::from_iter(
		words.clone(),
	);
	
	let words_vec: Vec<String> = words.collect();
	
	let now = Instant::now();
	
	let seq = find_longest_sequence(&words_vec, &words_set);
	
	let elapsed = now.elapsed();
	
	println!("{:.2}", elapsed.as_millis());
	
	for word in seq {
		println!("{}", word);
	}
}

This runs in ~1.2s, similar to the JS version.

My brother’s version:

use std::{
	collections::HashSet,
	env,
	fs::File,
	io::{BufRead, BufReader},
	time::Instant,
};

fn lines_from_file(path: &str) -> Vec<String> {
	BufReader::new(File::open(path).unwrap()).lines().map(|s| s.unwrap()).collect()
}

// keep the first space-separated field, minus any "*" marker
fn word_from_line(line: &str) -> String {
	line.replace("*", "").split(' ').next().unwrap().to_string()
}

fn is_word(words_set: &HashSet<String>, word: &str) -> bool {
	words_set.contains(word)
}

// depth-first search: drop a letter from either end of the last word
// in `from`, recursing on every valid shorter word
fn find_longest_sequence_from(
	words_set: &HashSet<String>,
	from: &mut Vec<String>,
) -> Vec<String> {
	let mut longest_sequence = from.clone();
	let word = from.last().unwrap().clone();
	
	let left = word[..word.len() - 1].to_string();
	let right = word[1..].to_string();
	
	for candidate in [left, right] {
		if is_word(words_set, &candidate) {
			from.push(candidate);
			
			let seq = find_longest_sequence_from(words_set, from);
			
			if seq.len() > longest_sequence.len() {
				longest_sequence = seq;
			}
			
			from.pop();
		}
	}
	
	longest_sequence
}

// find longest sequence of valid Truncate moves
// - starting with a word and removing a letter from either
// side, so that every shorter word is also valid

fn find_longest_sequence(
	words_vec: &[String],
	words_set: &HashSet<String>,
) -> Vec<String> {
	let mut longest_sequence: Option<Vec<String>> = None;
	
	for word in words_vec {
		let seq = find_longest_sequence_from(words_set, &mut vec![word.to_string()]);
		
		if longest_sequence.as_ref().map_or(true, |l| seq.len() > l.len()) {
			longest_sequence = Some(seq);
		}
	}
	
	longest_sequence.unwrap()
}

fn main() {
	let args: Vec<String> = env::args().collect();
	
	assert!(args.len() == 2);
	
	let dictionary_filename = &args[1];
	
	let words = lines_from_file(dictionary_filename).into_iter().map(|s| word_from_line(&s));
	
	let words_set: HashSet<String> = HashSet::from_iter(
		words.clone(),
	);
	
	let words_vec: Vec<String> = words.collect();
	
	let now = Instant::now();
	
	let seq = find_longest_sequence(&words_vec, &words_set);
	
	let elapsed = now.elapsed();
	
	println!("{:.2}", elapsed.as_millis());
	
	// the path is built from the starting word down to its root,
	// so reverse it to print smallest-first
	for word in seq.iter().rev() {
		println!("{}", word);
	}
}

This runs in 86ms.

Update: I’m realising that 86ms might actually be quite fast. I think I was vaguely thinking of the dictionary as being about 5,000 words for some reason when evaluating this result. Given that we’re actually doing a recursive search for each of 113,000 elements, 86ms seems more reasonable.
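
As a rough sanity check, here’s a back-of-envelope estimate in JS. The numbers are assumptions, not measurements: the depth bound comes from a typical maximum word length, the two lookups per level from the shape of findSmallestRoot, and it ignores the rare case where both the left and right removals are valid words:

let wordCount = 113000;
let maxDepth = 15; // recursion depth is bounded by word length; assume ~15
let lookupsPerLevel = 2; // one set lookup per removed-letter candidate

// ≈ 3.4 million set lookups in the worst case – comfortably within
// what a hash set can handle in under 100ms
console.log(wordCount * maxDepth * lookupsPerLevel);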