22 June 2024

Learning Rust #1

I’ve started learning Rust, and I thought it would be interesting to keep a log of things that I find surprising or hard to grok, to look back on later.

A bit of background: I’ve done tiny bits here and there in compiled/statically-typed/non-memory-managed languages, but not enough to really get a feel for them. I have a broad knowledge of computers going down to the transistor level, mostly from reading Charles Petzold’s book CODE, but I’ve spent my career doing full-stack web dev with high-level scripting languages.

:::note

Side note: I kind of dismissed the idea of languages with explicit allocation early on in my career (I was probably 15) after learning that you couldn’t just add an arbitrary number of elements to an array in C. I’ve only recently learned that you can actually reallocate in order to grow an array, and I think part of the blame for that lies with how these concepts are presented in introductory materials. Adding an arbitrary number of elements to a list is kind of the whole purpose of the list data structure, so it’s misleading for beginners to be shown what is essentially a useless data structure, the fixed-size array*, so early on, with no mention of more advanced concepts like linked lists and reallocation until later.

I think my reaction was partly down to an unusual ability I seem to have for connecting new knowledge to what I already know, concretely enough to notice when something doesn’t make sense. Introductory C material will either imply or explicitly state that you need to allocate enough slots in the array (remember, the only list-of-things data structure we know about so far) for your maximum use case. But that can’t possibly be how this works! Surely not all operating systems and programs are allocating as many array elements as they could possibly need right from the get-go; that goes against all intuitions about efficiency and sanity. You know you must be missing something, but the book keeps frustratingly quiet about what it might be, or even that there’s an issue in the first place.

* I’m not a C programmer, so I might be doing a Dunning-Kruger here, but isn’t that basically a struct where for some reason you don’t want to give any of the elements names? How often is that useful, or advisable?

Anyway, I’m much more open to learning new concepts and paradigms now than I was at 15, and Rust seems like a good place to start.

:::
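
For what it’s worth, the reallocation trick from the side note is roughly what Rust’s Vec does behind the scenes when you push past its current capacity. Here’s a minimal sketch that watches it happen. count_reallocations is a made-up helper for illustration, and the exact growth pattern is an implementation detail of the standard library, so the numbers it prints may differ between versions:

```rust
/// Pushes `n` items onto an empty Vec and counts how often its capacity changes.
/// (Hypothetical helper, not something from the standard library.)
fn count_reallocations(n: usize) -> usize {
    let mut items: Vec<u32> = Vec::new(); // no heap allocation yet
    let mut last_capacity = items.capacity();
    let mut reallocations = 0;

    for i in 0..n {
        items.push(i as u32); // grows the buffer when it is full
        if items.capacity() != last_capacity {
            // The Vec asked the allocator for a bigger buffer.
            reallocations += 1;
            last_capacity = items.capacity();
        }
    }
    reallocations
}

fn main() {
    let n = 1000;
    println!(
        "{} pushes caused {} capacity changes",
        n,
        count_reallocations(n)
    );
}
```

The point is that the number of capacity changes should be far smaller than the number of pushes, so you get a growable list without allocating for the worst case up front.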

Here’s where I’ve been puzzled or had to stop and think so far:

  • You can’t just put a global variable in for debugging/ad-hoc profiling like you can with JS. (I couldn’t get perf installed; something about specific kernel versions and no binary package available.) In fact, Rust by design makes you jump through a lot of hoops to get a global variable, and I couldn’t get it working. I ended up threading a struct through a bunch of functions instead. I suppose it makes sense that this is hard to do in Rust, given its focus on keeping track of which code is modifying what data.

  • A more general thread is the realisation that when you program in a low-level language, you’re programming the CPU and the memory, whereas in a high-level language you’re programming the interpreter/environment/VM. (Speaking very loosely, of course.) It feels like there’s another level, almost, of algorithmic thinking you have to do to maintain an efficient memory layout for your Rust program to work with.

  • You never really have to think about mutation or copying objects in memory in JS. Objects are always passed by reference, primitives are always passed by value, and within that basic framework you can basically do what you want. Lower-level languages force you to think about whether an object is on the stack or the heap, and whether you want to overwrite it or make a copy. A good illustration of this is in the different types available: in JavaScript, a string is just a string (unless it’s a String, but you never really have to think about that). In Rust, it could be:

    • &str
    • String
    • &String
    • &mut String

    … and there are others, like &'static str, that I haven’t looked into yet.

    These distinctions come into play whenever you want to manipulate and combine values, which of course happens a lot in programming. One example is combining two strings: in JavaScript this is as simple as stringA + stringB, but in Rust you have to think much more about what’s actually going on there: the bytes from each string must be read and copied from their respective memory locations into a new memory location, and there are various ways of going about that with different performance trade-offs. Calling .clone() a lot makes you think about just how much of this shuffling is required for a program to do its thing.
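
On the global variable point: one approach that might work for ad-hoc counting, assuming the thing you want to track is a simple number, is a static atomic from the standard library. This is only a sketch; CALLS, expensive_step and run are invented names, and it won’t help if the debug data is something more complex than a counter:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global counter. Atomics can be updated through a shared
// reference, so this needs neither `static mut` nor `unsafe`.
static CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for whatever function you want to profile.
fn expensive_step(x: u64) -> u64 {
    CALLS.fetch_add(1, Ordering::Relaxed); // bump the global counter
    x * 2
}

/// Calls `expensive_step` once for each value in 0..n and sums the results.
fn run(n: u64) -> u64 {
    (0..n).map(expensive_step).sum()
}

fn main() {
    let total = run(5);
    println!("total = {}, calls = {}", total, CALLS.load(Ordering::Relaxed));
}
```

Threading a struct through the call chain is still the more explicit option; the atomic just avoids changing every function signature for a throwaway measurement.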
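
To make the string example concrete, here’s a rough sketch of three ways to combine strings, each with a different answer to “who owns the bytes afterwards?”. The function names are made up for illustration; the calls they wrap (push_str, with_capacity, and + on a String) are standard library:

```rust
/// Borrows both inputs and copies their bytes into a newly allocated String.
fn concat_borrowed(a: &str, b: &str) -> String {
    // Reserve the exact size up front so the buffer is allocated once.
    let mut out = String::with_capacity(a.len() + b.len());
    out.push_str(a);
    out.push_str(b);
    out
}

/// Takes ownership of `a` and appends `b` to its existing buffer.
/// The caller can no longer use `a` afterwards.
fn concat_owned(a: String, b: &str) -> String {
    a + b
}

/// Appends to a String the caller still owns, via a mutable borrow.
fn append_in_place(a: &mut String, b: &str) {
    a.push_str(b);
}

fn main() {
    let hello = String::from("Hello, ");
    let world = "world"; // a &'static str baked into the binary

    // &String coerces to &str, so `hello` is only borrowed here.
    let copied = concat_borrowed(&hello, world);
    println!("{}", copied);

    // `hello` is moved into the function; using it after this line won't compile.
    let moved = concat_owned(hello, world);
    println!("{}", moved);

    let mut greeting = String::from("Hi, ");
    append_in_place(&mut greeting, world);
    println!("{}", greeting);
}
```

All three end up copying bytes somewhere; the difference is whether a new buffer is allocated and whether the original value survives.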

Oh, and the error messages are great! So good it almost feels like cheating, really, so I’ve made a point of trying to figure out the underlying concepts before blindly applying the patch suggestions.

This is also the first time I’ve used ChatGPT to answer basic questions about a topic I’m learning, and it has also been amazingly useful. I guess it’s like the ultimate semantic search engine.

My first proper algorithms in Rust are available in my longest valid truncate sequence post. I’m aiming to do some more IO and long-running processes for my next entry.