Are immutable hash tables constant-time insert and lookup? Actually, it looks like the answer is “no, not really”:
This picture shows insertion time per insertion, so we’d expect to see something like a flat line. Of course, there’s a lot of hand-waving involving “as n approaches infinity” and what that actually means on a machine with limited memory, but it seems clear from this picture that if you’re creating a hash table with about ten million elements, you should probably consider using a mutable hash table; the time required to create that table will be about 40 seconds for an immutable hash table vs. about 7 seconds for a mutable one.
This graph strongly suggests that the added time is entirely GC time, assuming that GC is not running in parallel, which I believe it’s not.
Drawing this on a linear-x graph suggests that the times for the immutable hash table insertion are well below linear; I’m thinking they’re somewhere between log n and (log n)^2.
What about lookup? Well, I ran two experiments; one on keys that are in the table, and one on keys that are not.
In this experiment, there are 3 million lookups on each tree size. The numbers for both of them are interesting in that they move around quite a lot; the error bars aren’t that big, and you can see (especially in the failed lookups) that there are some definite patterns. First of all, the immutable tree lookups pretty clearly display an upward trend, suggesting that lookup is actually log(n), albeit with a fairly small constant (about 20% per 10x). The lookups on the mutable hash tables also seem to be growing, though in their case there seems to be a sawtooth pattern, presumably occurring when the size of the table passes a particular threshold.
In the case of lookups, though, unlike insertion, there’s no clear mutable-vs-immutable winner, at least for the table sizes that I used. Lookups are generally around 150 microseconds, compared to about 600 microseconds to insert into a mutable hash table.
I’ve done a lot of programming in Racket. A lot. And people often ask me: “What do you think of Racket? Should I try it?”
The answer is simple: No. No, you should not.
You’re the kind of person who would do very badly with Racket. Here’s why:
All those parentheses! Good Lord, the language is swimming in parentheses. It’s not uncommon for a line to end with ten or twelve parentheses.
Almost no mutation! Idiomatic Racket code doesn’t set the values of variables in loops, and it doesn’t set the values of result variables in if branches, and you can’t declare variables without giving them values, and Racket programmers hardly ever use classes with mutable fields. There’s no return at all. It’s totally not like Java or C. It’s very strange and unsettling.
Library support. Yes, there are lots of libraries available for Racket, but there are many more in, say, Python. I think there are currently fifty-five thousand packages available for Python.
Racket is an experimental language: when the Racket team decides that the language should change, it does. Typed Racket is evolving rapidly, and even core Racket is getting fixes and new functionality every day.
Racket is not a popular language. You’re not going to be able to search for code snippets on line with anything like the success rate that you’d have for JavaScript or Python or Java.
Racket will ruin you for life as a Java developer. You will be agonizingly aware of how much boilerplate you’re cranking out, and after every hour of shoveling Java, you will sneak off to the bathroom and write a tiny beautiful macro that no one will ever be allowed to see or use.
If none of these succeed in scaring you off, well, then, go ahead and give it a try. Just remember: I warned you.
All three of these sets of passwords are randomly generated from a set of 2^56; they’re all equivalently secure. The second ones and the third ones are generated using markov models built from the text of Charles Dickens’ A Tale Of Two Cities, where transitions are made using Huffman Trees.
The secret sauce here is that since traversing a Huffman tree to a common leaf requires fewer bits than traversing that same tree to reach a deep leaf, we can drive the generating model using a pool of bits, and use varying numbers of bits depending on the likelihood of the taken transition.
This means that there’s a 1-to–1 mapping between the sequences of bits and the corresponding English-like textual fragments, thus guaranteeing the security of the passwords (or, more precisely, reducing it to the problem of generating a cryptographically secure sequence of bits, which many smart people have thought hard about already).
Another reasonable way to describe this process is that we’re just “decompressing” randomly generated crypto bits using a model trained on Dickens.
The only difference between the second and third pools is that the second one uses a 2nd-order markov model—meaning that the choice of a letter is driven by the prior 2—and that the third one uses a 3rd-order model, resulting in more Dickensian text—but also in longer text.
Naturally, you can push this further. When you get to a 5th order model, you get passwords like this:
not bitter their eyes, armed; I am natural
me. Is that. At fire, and, and—in separable;
reason off. The nailed abound tumbril o
and many more." “See, that,” return-
falls, any papers over these listen
do you, yes." "I beg to takes merc
paper movement off before," said, Charles," rejoin
that. She—had the season flung found." He o
Much more Dickensian, much longer. Same security.
You can try it out yourself; Molis Hai contains a small JS implementation of this, and a canned set of 2nd-order trees.
Please note that there’s nothing secret about the model; we’re assuming that an attacker already knows exactly how you’re generating your passwords. The only thing he or she is missing is the 56 bits you used to generate your password.
For a more carefully written paper that explains this a bit more slowly, see the preprint at ArXiv.
Naturally, you can use any corpus you like. I tried generating text using a big slab of my own e-mails, and aside from a serious tendency to follow the letter “J” with the letters “o”, “h”, and “n”, I didn’t notice a huge difference, at least not in the 2nd-order models. Well, actually, here’s an example:
0.91, Also: Zahid We rigor
argustorigoring tent r
Myrics args foling") (
can’s fortalk at html-unds
having avaScript" 0.88489232B
John? I doe.cal fluore let a
botheird, creally, there thic
to ind [(solutell wil
It’s probably true that Charles Dickens wasn’t quite so likely to type “avascript” as I am. Or “html”.
To read the Racket code I used to generate the models, see github.
And for Heaven’s sake, let me know about related work that I missed!
Ooh, just came across this today. I really liked this logo… I did this back in 2012, if the date stamp is to be believed. I think this struck a nice balance between the letter “r” and the lambda (reversed, yes).
To be honest, though, I’m probably flattering myself if I think that this blog post is being read by anyone who doesn’t already know lots about Rust.
One of the key requirements of a language like Rust is that it be embeddable; that is, it should be possible to call Rust code from another language just as it’s possible to call C code from another language.
This is now possible.
To illustrate this, Brian Anderson posted a lovely example of embedding Rust in Ruby. But of course, embedding Rust in Ruby is pretty much exactly the same as embedding Rust in any other language.
So, without further ado, here’s the setup. You just happen to have a small web app written in Racket that performs a Gaussian Blur. You decide to optimize the performance by porting your code to Rust. Then you want to plug your Rust code into your Racket application. Done! Here’s the github repo that contains all of the code.
Let’s see that again in slow motion.
First, here’s the gaussian blur function, written in Racket. We’re going to stick with a grayscale image. It works fine in color, but the code is just that much harder to read.
;; the gaussian filter used in the racket blur.;; boosted center value by 1/1000 to make sure that whites stay white.(definefilter'[[0.0110.0840.011][0.0840.6200.084][0.0110.0840.011]]);; racket-blur: blur the image using the gaussian filter;; number number list-of-bytes -> vector-of-bytes(define(racket-blurwidthheightdata)(definedata-vec(list->vectordata));; ij->offset : compute the offset of the pixel data within the buffer(define(ij->offsetij)(+i(*jwidth)))(definebytes-len(*widthheight))(definenew-bytes(make-vectorbytes-len0))(definefilter-x(length(carfilter)))(definefilter-y(lengthfilter))(defineoffset-x(/(sub1filter-x)2))(defineoffset-y(/(sub1filter-y)2));; compute the filtered byte array(for*([xwidth][yheight])(definenew-val(for*/fold([sum0.0])([dxfilter-x][dyfilter-y])(definesample-x(modulo(+dx(-xoffset-x))width))(definesample-y(modulo(+dy(-yoffset-y))height))(definesample-value(vector-refdata-vec(ij->offsetsample-xsample-y)))(defineweight(list-ref(list-reffilterdy)dx))(+sum(*weightsample-value))))(vector-set!new-bytes(ij->offsetxy)new-val))(vector->listnew-bytes))
Suppose we want to rewrite that in Rust. Here’s what it might look like:
Pretty similar. Of course, it uses curly braces, so it runs about three times faster…
So: what kind of glue code is necessary to link the Rust code to the Racket code? Not a lot. On the Rust side, we need to create a pointer to the C data, then copy the result back into the source buffer when we’re done with the blur: