All three of these sets of passwords are randomly generated from a set of 2^56; they’re all equivalently secure. The second and third sets are generated using Markov models built from the text of Charles Dickens’ A Tale of Two Cities, where transitions are made using Huffman trees.
The secret sauce here is that traversing a Huffman tree to a shallow, common leaf requires fewer bits than traversing that same tree to a deep, rare leaf. We can therefore drive the generating model using a pool of random bits, consuming a varying number of bits depending on the likelihood of each transition taken.
This means that there’s a 1-to-1 mapping between sequences of bits and the corresponding English-like textual fragments, thus guaranteeing the security of the passwords (or, more precisely, reducing it to the problem of generating a cryptographically secure sequence of bits, which many smart people have thought hard about already).
Another reasonable way to describe this process is that we’re just “decompressing” randomly generated crypto bits using a model trained on Dickens.
The only difference between the second and third pools is that the second uses a 2nd-order Markov model, meaning that each letter is chosen based on the prior two, while the third uses a 3rd-order model, resulting in text that is more Dickensian, but also longer.
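To make the scheme concrete, here’s a minimal Python sketch of the idea (this is not the Molis Hai code; the function names and the toy corpus are mine): build a Huffman tree over the successors of each n-gram, then “decompress” a pool of bits by walking those trees.

```python
# Sketch: decompress a pool of random bits through an order-n Markov model
# whose transitions are Huffman-coded. Likely transitions cost fewer bits.
import heapq
import itertools

def huffman_tree(freqs):
    """Build a Huffman tree from {symbol: count}; internal nodes are pairs."""
    counter = itertools.count()  # tie-breaker so heapq never compares trees
    heap = [(n, next(counter), sym) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)
        n2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(counter), (t1, t2)))
    return heap[0][2]

def decode_one(tree, bits):
    """Walk the tree, consuming bits from the pool; common symbols sit
    near the root, so likely transitions consume fewer bits."""
    node = tree
    while isinstance(node, tuple):
        node = node[bits.pop(0)]
    return node

def generate(corpus, bits, order=2, length=20):
    """Drive an order-n Markov model over `corpus` with the bit pool `bits`."""
    counts = {}
    for i in range(len(corpus) - order):
        ctx, nxt = corpus[i:i + order], corpus[i + order]
        counts.setdefault(ctx, {}).setdefault(nxt, 0)
        counts[ctx][nxt] += 1
    trees = {ctx: huffman_tree(f) for ctx, f in counts.items()}
    out = corpus[:order]
    while len(out) < length:
        out += decode_one(trees[out[-order:]], bits)
    return out
```

Because the walk from root to leaf is deterministic given the bits, every distinct bit string yields a distinct text fragment, which is exactly the 1-to-1 property described above.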
Naturally, you can push this further. When you get to a 5th-order model, you get passwords like these:
not bitter their eyes, armed; I am natural
me. Is that. At fire, and, and—in separable;
reason off. The nailed abound tumbril o
and many more." “See, that,” return-
falls, any papers over these listen
do you, yes." "I beg to takes merc
paper movement off before," said, Charles," rejoin
that. She—had the season flung found." He o
Much more Dickensian, much longer. Same security.
You can try it out yourself; Molis Hai contains a small JS implementation of this, and a canned set of 2nd-order trees.
Please note that there’s nothing secret about the model; we’re assuming that an attacker already knows exactly how you’re generating your passwords. The only thing he or she is missing is the 56 bits you used to generate your password.
For a more carefully written paper that explains this a bit more slowly, see the preprint on arXiv.
Naturally, you can use any corpus you like. I tried generating text using a big slab of my own e-mails, and aside from a serious tendency to follow the letter “J” with the letters “o”, “h”, and “n”, I didn’t notice a huge difference, at least not in the 2nd-order models. Well, actually, here’s an example:
0.91, Also: Zahid We rigor
argustorigoring tent r
Myrics args foling") (
can’s fortalk at html-unds
having avaScript" 0.88489232B
John? I doe.cal fluore let a
botheird, creally, there thic
to ind [(solutell wil
It’s probably true that Charles Dickens wasn’t quite so likely to type “avascript” as I am. Or “html”.
To read the Racket code I used to generate the models, see github.
And for Heaven’s sake, let me know about related work that I missed!
One of my children is in third grade. As part of a “back-to-school” night this year, I sat in a very small chair while a teacher explained to me the “Math Practices” identified as part of the new Common Core standards for math teaching.
Perhaps the small chair simply made me more receptive, taking me back to third grade myself, but as she ran down the list, I found myself thinking: “gosh, these are exactly the same skills that I want to impart to beginning programmers!”
Make sense of problems and persevere in solving them.
Reason abstractly and quantitatively.
Construct viable arguments and critique the reasoning of others.
Model with mathematics.
Use appropriate tools strategically.
Attend to precision.
Look for and make use of structure.
Look for and express regularity in repeated reasoning.
Holy Moley! Those are incredibly relevant in teaching programming. Furthermore, they sound like they were written by someone intimately familiar with the How To Design Programs or Bootstrap curricula. Indeed, in the remainder of my analysis, I’ll be referring specifically to steps 1–4 of the design recipe proposed by HtDP (as, e.g., “step 2 of DR”).
I can’t think of any relevant tags for this, and I’d like to think it’s because this is such a broadly applicable idea that it spans most categories.
The idea is this: many systems take inputs and produce outputs. Often, users of these systems would like to know WHY these systems produced these outputs.
I have two examples:
First, home automation. There’s a bunch of work on home automation and how it’s perceived, and one thing that comes across clearly is the frustration and unhappiness that users experience when the system acts in a way that isn’t what they expect.1
In response to this, it seems that the most obvious first step is to have a WHY button. In this case: why did you just turn all the lights off? Why is the thermostat cranked up to a hundred?
Second Example: programming, and more specifically debugging. This is a fairly obvious domain, and there’s already lots of work on time-travel debugging. The basic idea is the same: you have an outcome, it’s not the outcome you want, and you want to try to understand why. In this case, it’s not that we don’t know that we want a WHY button, it’s just really hard to implement.
In between these two extremes, there are lots of other examples. One of the areas that gives me the most trouble is system configuration. Why isn’t SpamAssassin running? (Shouldn’t be impossible.) Why is distnoted taking so much memory? (Harder.) Why does my mouse freeze whenever I hit a key in an xterm window behind VNC? (Okay, that one is just a debugging question.)
Actually, these last issues are interesting ones, because they dip into the space of “search”. That is, I would try to solve all of these last three by just using a search engine. At the moment, though, search doesn’t help me figure out why my program isn’t working (much) or why my lights aren’t on.
SO: what is it that makes the WHY button possible? In general, it appears to me that the answer is simply: declarative programming. When your program is written in a declarative language, it’s much more likely that you can get a good non-brain-bending “why” out of it.
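As a toy illustration of why declarative rules make a WHY button cheap, here’s a Python sketch (entirely hypothetical names: `rules`, `run`, `why`): because each rule is a piece of data rather than buried control flow, the engine can simply record which rule caused each action and report it on demand.

```python
# Hypothetical sketch: a tiny declarative rule engine with a WHY button.
# Each rule is data, so the engine can report which rule caused an action.

rules = [
    {"name": "night mode", "when": lambda s: s["hour"] >= 23, "then": ("lights", "off")},
    {"name": "away mode",  "when": lambda s: not s["home"],   "then": ("thermostat", 55)},
]

def run(state):
    log = {}  # target -> name of the rule that last set it
    for rule in rules:
        if rule["when"](state):
            target, value = rule["then"]
            state[target] = value
            log[target] = rule["name"]
    return state, log

def why(log, target):
    """The WHY button: explain who touched `target`, in the rules' own terms."""
    return log.get(target, "no rule touched it")

state, log = run({"hour": 23, "home": False, "lights": "on", "thermostat": 68})
# why(log, "lights") -> "night mode"
```

In an imperative program, recovering the same answer means reconstructing the execution history; here the explanation falls out of the representation for free.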
Go, declarative languages!
Okay, now you can share all of the existing Human Interface research that already covers this topic.
Thanks!
1 : Brush, A. J., et al. “Home automation in the wild: challenges and opportunities.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011.
This summer I started raising wild yeast. It was more or less a biology experiment, and it was extremely hit and miss, mostly miss, for the first few months.
For one thing, it took me quite a while to figure out that you actually do have to dump 4/5 of the starter down the drain every day. I was trying to feed an ever-growing bowl, so the food-to-yeast ratio was nowhere near high enough, and the resulting bacteria were really… um… not the ones I was looking for.
Fortunately, I enjoy strange smells, but the rest of my family started moving out of the room when I would start feeding my yeast.
Anyhow, I’ve now figured out more or less how to keep a culture alive.
The next big step was reading Chad Robertson’s Tartine Bread. It turns out that if you have a reasonably lively yeast culture, and a little patience, it’s totally possible to make bread that the rest of your family actually likes a lot. Here’s a loaf from two weeks ago:
loaf of bread
I also got an interesting lesson last week, when I’d been away for two days and was trying to jump-start my starter again. The starter was actually just fine, but my “help” was doing terrible things to it: I started feeding it every eight hours, and I realize now that the fast feeding (or, more precisely, the fast discarding of 4/5) was annihilating the population of my microbiological zoo.
Anyhow, it’s hard to knock the stuff out completely, and this week it’s bubbling away as well as ever.
Next week, maybe I start experimenting with higher hydration.
Ooh, just came across this today. I really liked this logo… I did this back in 2012, if the date stamp is to be believed. I think this struck a nice balance between the letter “r” and the lambda (reversed, yes).
So, I’d like to do some statistical analysis. I hear that R is really good at this. Let’s download it and take a look.
(Ten minutes later)
AAAHHH! MY EYES! THEY’RE BLEEDING!
What about Matlab? It’s the same story.1 As a programming languages person, these languages make me … well, angry.
Why?
Well, after thinking about this for a while, it seems to me that what I hate most about these languages is their complete lack of a small semantic core.
Take a language like Racket, JavaScript, Java, or C— these languages don’t have a heck of a lot in common, but they share
…
is this all just library design? Most of the things I really hate can easily be constructed in any dynamic library through a suitable application of
Terrible Library Design (tm)
… except that when it applies to things like vector dereference, it feels like fairly ‘core’ syntax.
…
Example time! First, R does this crazy thing where it distinguishes logical from numeric vectors. When the index expression is a numeric vector, its values pick out the elements of a by position. When the index expression is a ‘logical vector’, though, it is tiled to the length of the original vector and used as a mask, deciding whether to take each corresponding element.
If you imagine this as part of a language semantics, you’d see this horrible side-condition attached to these rules, where array deref’ing works in totally different ways depending on the kind of argument it gets.
To say nothing of the silent tiling, which seems like an open invitation to horrible bugs.
But wait, we can compound this problem with some nasty coercion:
> a[c(4,c(FALSE,TRUE,TRUE))]
[1] "d" "a" "a"
What on earth is going on here? First of all, vectors get silently flattened, so that c(3,c(4,5)) is the same as c(3,4,5) — ugh — but then, the logical values are coerced into numeric ones, so the index vector that’s produced is actually c(4,0,1,1), which is then used to index the vector a. But why are there only three values? Oh, well, there’s no index 0, so let’s just skip that one, shall we?
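The dispatch described above can be sketched in Python (a hypothetical `r_index` helper of mine, not anything from R itself), which makes the side-condition painfully explicit:

```python
# Sketch of R's vector-indexing dispatch (1-based, like R).
def r_index(vec, idx):
    if idx and all(isinstance(i, bool) for i in idx):
        # logical vector: recycled ("tiled") to the length of vec, used as a mask
        mask = [idx[i % len(idx)] for i in range(len(vec))]
        return [v for v, m in zip(vec, mask) if m]
    # mixed/numeric vector: TRUE/FALSE silently coerce to 1/0, and index 0 is dropped
    nums = [int(i) for i in idx]
    return [vec[i - 1] for i in nums if i != 0]

a = ["a", "b", "c", "d"]
r_index(a, [True, False, True, False])  # logical mask -> ["a", "c"]
r_index(a, [4, False, True, True])      # coerced to [4, 0, 1, 1] -> ["d", "a", "a"]
```

One stray non-boolean in the index vector flips the whole operation from masking to positional selection, which is exactly the kind of rule that would show up as an ugly side-condition in a formal semantics.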
Honestly, I guess the real problem is in thinking of something like R as a programming language; it’s not. It’s a statistical analysis tool, with a rich text-based interface. After all, would I get upset if Photoshop used ‘b’ for blur and ‘s’ for sharpen and I couldn’t nest them the way that I wanted, using parentheses? Probably not.
And finally: apologies for everything I’ve written. I’ve used R for about fifteen minutes, and this piece is really just me blowing off a small amount of steam. Not well written, not well thought-out. Meh.
Actually, maybe not; I spoke with a friend yesterday, and I get the impression that Matlab may not be as horrible as R, here. ↩
Well, it’s 10:47 in the morning, and everyone’s still alive.
This is only news because today was the twentieth running of the long island challenge granite mon thing, and everyone made it safe and sound.
As usual, we got up early; we met at the yacht club at 6:00 AM, which (I see) is actually an hour later than last year. In fact, we might have done well to start an hour earlier, as we would have had a wee bit more tidal assist and a wee bit less chop.
Be that as it may, it was an absolutely gorgeous morning, and the water was warm. Actually, if you must know, it was way too warm. Not for the swimmers, but rather for the aquatic life. I’ve been doing this swim for about twenty years, now, and the water just keeps getting warmer and warmer. I think it was 65 degrees, but apparently a few days ago Henry Becton recorded a temperature of 75 degrees. This is why all of our aquatic life is dying. Nice for human swimmers, though….
I think this might be a good time to give a shout-out to MERI, which has been monitoring the Blue Hill watershed since 2004, among many other projects. They report that the ocean temperature has risen by an average of 1.56 degrees Celsius, which is … a lot.
So, the death of the planet notwithstanding, we had a really nice swim.
Here we are before the swim (apologies to Tricia, who is entirely hidden here):
pre-swim
Here we are after the swim:
post-swim
From left to right:
Alice Clements
Henry Becton (didn’t swim, but he looks great in this picture)
Charlotte Clews Lawther
Tricia Sawyer
Jerome Lawther
Moira McMahon
John Clements
Samantha Lee
Mary Clews
We would never have attempted this without the astonishing volunteers, including:
Sara Becton
Ethan Coit
Kitty Clements
Robin Clements
Tom Clements
Molly Cooper
Henry Clews
Amanda Herman
John Jeffrey
Deborah Miller-Little
Wing Taylor
Will Taylor
Following the swim, Charlotte and Jerome biked up to Millinocket, and the next day, climbed Katahdin. Guys, may I include your picture at the top?
Thanks to Wing and Alice for pictures. And finally, Alice Clements once again gets credit for organizing the event. Thanks!
Let me start by saying this: I can’t possibly be the first one to think about this.
I’ve been working up in Mountain View, this week. It was a great trip, and I stayed in a totally gorgeous little place. When I got home from work, I either went running or biking. After that, I cooked myself a small dinner, ate it outside, then went into my room to get a bit of work done or get ready for bed.
Macros are a wonderful thing. A hygienic macro system puts language extensions within the reach of non-compiler-hackers.
However, to date, most modern, hygienic macro systems are associated with languages like Scheme and Racket that are quite high-level. My claim is that macros are probably much more useful in low-level languages. Here’s why: