Posts tagged Racket

hash table timings

:: Racket

By: John Clements

Are immutable hash tables constant-time insert and lookup? Actually, it looks like the answer is “no, not really”:

This picture shows insertion time per insertion, so we’d expect to see something like a flat line. Of course, there’s a lot of hand-waving involving “as n approaches infinity” and what that actually means on a machine with limited memory, but it seems clear from this picture that if you’re creating a hash table with about ten million elements, you should probably consider using a mutable hash table; the time required to create that table will be about 40 seconds for an immutable hash table vs. about 7 seconds for a mutable one.

This graph strongly suggests that the added time is entirely GC time, assuming that GC is not running in parallel, which I believe it’s not.

Drawing this on a linear-x graph suggests that the times for the immutable hash table insertion are well below linear; I’m thinking they’re somewhere between log n and (log n)^2.

What about lookup? Well, I ran two experiments; one on keys that are in the table, and one on keys that are not.

In this experiment, there are 3 million lookups on each tree size. The numbers for both of them are interesting in that they move around quite a lot; the error bars aren’t that big, and you can see (especially in the failed lookups) that there are some definite patterns. First of all, the immutable tree lookups pretty clearly display an upward trend, suggesting that lookup is actually log(n), albeit with a fairly small constant (about 20% per 10x). The lookups on the mutable hash tables also seem to be growing, though in their case there seems to be a sawtooth pattern, presumably occurring when the size of the table passes a particular threshold.

In the case of lookups, though, unlike insertion, there’s no clear mutable-vs-immutable winner, at least for the table sizes that I used. Lookups are generally around 150 microseconds, compared to about 600 microseconds to insert into a mutable hash table.

Racket Sucks, Don’t Try It

:: Racket, Programming Languages

By: John Clements

I’ve done a lot of programming in Racket. A lot. And people often ask me: “What do you think of Racket? Should I try it?”

The answer is simple: No. No, you should not.

You’re the kind of person who would do very badly with Racket. Here’s why:

  • All those parentheses! Good Lord, the language is swimming in parentheses. It’s not uncommon for a line to end with ten or twelve parentheses.

  • Almost no mutation! Idiomatic Racket code doesn’t set the values of variables in loops, and it doesn’t set the values of result variables in if branches, and you can’t declare variables without giving them values, and Racket programmers hardly ever use classes with mutable fields. There’s no return at all. It’s totally not like Java or C. It’s very strange and unsettling.

  • Library support. Yes, there are lots of libraries available for Racket, but there are many more in, say, Python. I think there are currently fifty-five thousand packages available for Python.

  • Racket is an experimental language: when the Racket team decides that the language should change, it does. Typed Racket is evolving rapidly, and even core Racket is getting fixes and new functionality every day.

  • Racket is not a popular language. You’re not going to be able to search for code snippets on line with anything like the success rate that you’d have for JavaScript or Python or Java.

  • Racket will ruin you for life as a Java developer. You will be agonizingly aware of how much boilerplate you’re cranking out, and after every hour of shoveling Java, you will sneak off to the bathroom and write a tiny beautiful macro that no one will ever be allowed to see or use.

If none of these succeed in scaring you off, well, then, go ahead and give it a try. Just remember: I warned you.

Molis Hai: generating Passwords using Charles Dickens

:: Racket, Security

By: John Clements

TL;DR: Molis Hai

Randomly generated passwords:

  • dMbcGp=A(
  • 9eMRV7N%[
  • R]eJxx68v
  • GVUN#ek5z
  • ms8AG09-h
  • sVh2TT4wx
  • Y]sa7-b(f
  • BOrnNGLqk

More randomly generated passwords:

  • wargestood hury on,
  • wealenerity," stp
  • twould, aftilled himenu
  • Whaideve awasaga
  • andir her hing ples. F
  • spe it humphadeas a
  • to and ling, ace upooke,
  • Mr. Syd, why.’ tred. "D

Yet More randomly generated passwords

  • brothe aponder and," reasun
  • ther atternal telle is be
  • his me, he foundred, id
  • allant our faces of rai
  • time! What it of vail
  • sourned," reate." Manybody.
  • they would reck," read-doom
  • raise thack ther meant,

Which of these look easiest to remember?

All three of these sets of passwords are randomly generated from a set of 2^56; they’re all equivalently secure. The second ones and the third ones are generated using markov models built from the text of Charles Dickens’ A Tale Of Two Cities, where transitions are made using Huffman Trees.

The secret sauce here is that since traversing a Huffman tree to a common leaf requires fewer bits than traversing that same tree to reach a deep leaf, we can drive the generating model using a pool of bits, and use varying numbers of bits depending on the likelihood of the taken transition.

This means that there’s a 1-to–1 mapping between the sequences of bits and the corresponding English-like textual fragments, thus guaranteeing the security of the passwords (or, more precisely, reducing it to the problem of generating a cryptographically secure sequence of bits, which many smart people have thought hard about already).

Another reasonable way to describe this process is that we’re just “decompressing” randomly generated crypto bits using a model trained on Dickens.

The only difference between the second and third pools is that the second one uses a 2nd-order markov model—meaning that the choice of a letter is driven by the prior 2—and that the third one uses a 3rd-order model, resulting in more Dickensian text—but also in longer text.

Naturally, you can push this further. When you get to a 5th order model, you get passwords like this:

  • not bitter their eyes, armed; I am natural
  • me. Is that. At fire, and, and—in separable;
  • reason off. The nailed abound tumbril o
  • and many more." “See, that,” return-
  • falls, any papers over these listen
  • do you, yes." "I beg to takes merc
  • paper movement off before," said, Charles," rejoin
  • that. She—had the season flung found." He o

Much more Dickensian, much longer. Same security.

You can try it out yourself; Molis Hai contains a small JS implementation of this, and a canned set of 2nd-order trees.

Please note that there’s nothing secret about the model; we’re assuming that an attacker already knows exactly how you’re generating your passwords. The only thing he or she is missing is the 56 bits you used to generate your password.

For a more carefully written paper that explains this a bit more slowly, see the preprint at ArXiv.

Naturally, you can use any corpus you like. I tried generating text using a big slab of my own e-mails, and aside from a serious tendency to follow the letter “J” with the letters “o”, “h”, and “n”, I didn’t notice a huge difference, at least not in the 2nd-order models. Well, actually, here’s an example:

  • 0.91, Also: Zahid We rigor
  • argustorigoring tent r
  • Myrics args foling") (
  • can’s fortalk at html-unds
  • having avaScript" 0.88489232B
  • John? I doe.cal fluore let a
  • botheird, creally, there thic
  • to ind [(solutell wil

It’s probably true that Charles Dickens wasn’t quite so likely to type “avascript” as I am. Or “html”.

To read the Racket code I used to generate the models, see github.

And for Heaven’s sake, let me know about related work that I missed!

my old racket logo

:: Racket

By: John Clements

Ooh, just came across this today. I really liked this logo… I did this back in 2012, if the date stamp is to be believed. I think this struck a nice balance between the letter “r” and the lambda (reversed, yes).

a proposed racket logo

a proposed racket logo

Embedding Rust in Racket

:: Rust, Racket, Programming Languages

By: John Clements

Is this post a thinly disguised ripoff of Brian Anderson’s post about embedding Rust in Ruby? Why yes. Yes it is.

Okay, let me start with a little background. Rust is a magnificent language that comes from Mozilla; it’s targeted at programmers who want

  • high and predictable performance,
  • control over memory layout,
  • good support for concurrency, and
  • safety.

I think the Mozilla Research homepage is probably the best place to start learning about Rust.

To be honest, though, I’m probably flattering myself if I think that this blog post is being read by anyone who doesn’t already know lots about Rust.

One of the key requirements of a language like Rust is that it be embeddable; that is, it should be possible to call Rust code from another language just as it’s possible to call C code from another language.

This is now possible.

To illustrate this, Brian Anderson posted a lovely example of embedding Rust in Ruby. But of course, embedding Rust in Ruby is pretty much exactly the same as embedding Rust in any other language.

Say, for instance, Racket.

So, without further ado, here’s the setup. You just happen to have a small web app written in Racket that performs a Gaussian Blur. You decide to optimize the performance by porting your code to Rust. Then you want to plug your Rust code into your Racket application. Done! Here’s the github repo that contains all of the code.

Let’s see that again in slow motion.

First, here’s the gaussian blur function, written in Racket. We’re going to stick with a grayscale image. It works fine in color, but the code is just that much harder to read.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
;; the gaussian filter used in the racket blur.
;; boosted center value by 1/1000 to make sure that whites stay white.
(define filter '[[0.011 0.084 0.011]
                 [0.084 0.620 0.084]
                 [0.011 0.084 0.011]])

;; racket-blur: blur the image using the gaussian filter
;; number number list-of-bytes -> vector-of-bytes
(define (racket-blur width height data)
  (define data-vec (list->vector data))
  ;; ij->offset : compute the offset of the pixel data within the buffer
  (define (ij->offset i j)
    (+ i (* j width)))
  (define bytes-len (* width height))
  (define new-bytes (make-vector bytes-len 0))
  (define filter-x (length (car filter)))
  (define filter-y (length filter))
  (define offset-x (/ (sub1 filter-x) 2))
  (define offset-y (/ (sub1 filter-y) 2))
  ;; compute the filtered byte array
  (for* ([x width]
         [y height])
    (define new-val
      (for*/fold ([sum 0.0])
        ([dx filter-x]
         [dy filter-y])
        (define sample-x (modulo (+ dx (- x offset-x)) width))
        (define sample-y (modulo (+ dy (- y offset-y)) height))
        (define sample-value (vector-ref data-vec (ij->offset sample-x sample-y)))
        (define weight (list-ref (list-ref filter dy) dx))
        (+ sum (* weight sample-value))))
    (vector-set! new-bytes (ij->offset x y) new-val))
  (vector->list new-bytes))

Suppose we want to rewrite that in Rust. Here’s what it might look like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
fn blur_rust(width: uint, height: uint, data: &[u8]) -> ~[u8] {

    let filter = [[0.011, 0.084, 0.011],
                  [0.084, 0.620, 0.084],
                  [0.011, 0.084, 0.011]];

    let mut newdata = ~[];

    for uint::range(0, height) |y| {
        for uint::range(0, width) |x| {
            let mut new_value = 0.0;
            for uint::range(0, filter.len()) |yy| {
                for uint::range(0, filter.len()) |xx| {
                    let x_sample = x - (filter.len() - 1) / 2 + xx;
                    let y_sample = y - (filter.len() - 1) / 2 + yy;
                    let sample_value = data[width * (y_sample % height) + (x_sample % width)];
                    let sample_value = sample_value as float;
                    let weight = filter[yy][xx];
                    new_value += sample_value * weight;
                }
            }
            newdata.push(new_value as u8);
        }
    }

    return newdata;
}

Pretty similar. Of course, it uses curly braces, so it runs about three times faster…

So: what kind of glue code is necessary to link the Rust code to the Racket code? Not a lot. On the Rust side, we need to create a pointer to the C data, then copy the result back into the source buffer when we’re done with the blur:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
#[no_mangle]
pub extern fn blur(width: c_uint, height: c_uint, data: *mut u8) {
    let width = width as uint;
    let height = height as uint;

    unsafe {
        do vec::raw::mut_buf_as_slice(data, width * height) |data| {
            let out_data = blur_rust(width, height, data);
            vec::raw::copy_memory(data, out_data, width * height);
        }
    }
}

On the Racket side, it’s just a question of making an ffi call, which is super-concise:

1
2
3
4
5
6
7
8
;; link to the rust library:
(define rust-lib (ffi-lib (build-path here "libblur-68a2c114141ca-0.0")))
(define rust-blur-fun (get-ffi-obj "blur" rust-lib (_fun _uint _uint _cvector -> _void)))

(define (rust-blur width height data)
  (define cvec (list->cvector data _byte))
  (rust-blur-fun width height cvec)
  (cvector->list cvec))

And away you go!

I’ve got this code running live at FIXME. What’s that you say? You can’t seem to find FIXME?