I love knowing that I have powers and knowledge that other people don’t.
I want to talk about that last one. It’s the one I’m least comfortable talking about, and I think it’s somewhere near the center of a number of discussions I’ve had with others and with myself about motivation, teaching, and the future of Computer Science.
A few days ago, I picked up G.H. Hardy’s A Mathematician’s Apology again. He is, to be blunt, an unreconstructed bigot in the area of Native Mathematical Ability. He believes that either you got it or you ain’t, and if you do, it is a thing that sets you apart from other men. (Not from other women, because in the 1920s, everyone was a man. Look it up.)
On the one hand, I can see the appalling lack of mental rigor in some of his arguments. I won’t go point by point, but it appears to me that the nailed-down-mathematical-argument part of his brain simply wasn’t engaged when he made arguments like the assertion that in general those with mathematical talent aren’t likely to have any other useful one.
Nevertheless, much of his thinking pervades mine. I think that thinking such as his fuels much of my own drive (such as it is), and indeed is relatively close to my central joy in life.
Let me put it differently, and more positively: Richard Feynman writes about the simple joy of discovering things for oneself. I completely subscribe to this. Yesterday, of my own accord, I examined the notion of continuity on functions from the reals to the reals and the more general one of continuity on topological spaces, and convinced myself that they’re perfectly aligned (on the functions from reals to reals). That’s a really beautiful thing, and I wanted to run around and tell everyone about it (and here I am doing that). I got to figure that out for myself, and that was a beautiful thing.
Let’s dig a little deeper, though, and try to understand why that should be a beautiful thing. Why is it thrilling to know things, or even better to discover them for ourselves?
Ultimately, I think that it comes down to this: it gives us power, and fuels our egos.
That doesn’t sound very nice, does it?
Well, in many ways it’s the same thing that drives us to work hard at anything. Why do we run races? Why do we compete at Golf? Why do we work hard for promotions, or try to explore new lands? Why do we … do anything?
Well, on a basic level, we’re trying to survive as a species. In order to do this, every individual is programmed to pull hard, to do her very best work, to train harder than the other guy.
When we put it that way, it doesn’t sound so bad. Survival of the species, and more locally, survival of our own DNA. I want to be the best at something, in order to give my DNA a competitive advantage. This is locally visible as an advantage over other people, but manifests itself globally as a path toward improving the survival of the species.
Looking at it in the global context, though, tempers the drive for individual success; my success benefits the species only when it doesn’t come at the expense of the species as a whole. If I try to ensure the success of my genes by eliminating those around me, I’m not benefitting the species, I’m hurting it, and I won’t be around very long.
So, what do we have? We’ve developed a system where we compete—to a point. We compete in venues where our individual success leads to improvement for our families, our communities, and our species.
But enough about you.
What does this mean for me?
When I was a child, I did well at math. I understood things easily, and none of the concepts gave me trouble. When there was an interesting new concept, I spent a bit of time thinking about it, and then I understood it. I got the same kind of satisfaction from math that you might from a video game: a bit of work, and lots of instant gratification. To be sure: I had a privileged upbringing, and lots of attention from teachers to help me succeed. I have relatively few illusions about the part that my background made in my success.
It’s no surprise, then, that I chose to focus on math; Math class was my favorite, and until college, I could convince myself that I was the best in every class (following Hardy here, I’ll suggest that some mild egotism is not out of place here—it’s helpful to believe that you’re the best, even when the evidence is weak).
In college, I discovered lots of folks that were better mathematicians than I was.
Hmm… I can see that I’m going astray here; from a discussion intended to be about the motivations of programmers and programming students, I’m drifting over into nature-vs-nurture.
Today was a spectacular beautiful Maine morning on Blue Hill bay. Especially at 7:00 AM. How do I know this? Today was the 21st running of the long island swim. Or the Granite (Wo)Mon. Or whatever you want to call it. This means that it was also the TWENTIETH ANNIVERSARY of the silly idea. Only five more years until the quarter-century….
This year’s edition was organized by organizer extraordinaire Mary Clews, and it went off without a hitch. The water was extremely warm (and yes, this is not good for the fauna), and the water was pretty much glassy smooth. There were enough chase boats, and everyone finished in a reasonable time. Yay!
Special mention this year is made of Amanda Herman, who is a first time swimmer and I have no idea how fast she was because she was way way ahead of me the whole time. Great Job!
Traveling across the country with yeast. Hard. On the way out here I slopped a bit in a ziploc bag, and though I’d been planning to give the TSA guys some lame story when they pointed out that the bag was way bigger than 3.5 ounces, in the event I actually forgot about it completely, and they missed it on the scan or decided to give me a pass.
I got to Connecticut, and it was still pretty healthy. I feel self-righteous about only feeding my yeast unbleached flour, but basically, it was bubbling along just fine.
So here I am grading another exam. This exam question asks students to imagine what would happen to an interpreter if environments were treated like stores. Then, it asks them to construct a program that would illustrate the difference.
They fail, completely.
(Okay, not completely.)
By and large, it’s pretty easy to characterize the basic failing: these students are unwilling to break the rules.
So, I just bought a copy of Screenflow, an apparently reputable piece of screencasting software for the mac. They sent me a license code. Now it’s time to enter it. Here’s the window:
Hmm… says here all I need to do is… enter my admin password, and then the license code.
Wait… what?
Why in heaven’s name would Screenflow need to know my admin password here?
My guess is that it’s because it wants to put something in the keychain for me. That’s not a very comforting thought; it also means that it could delete things from my keychain, copy the whole thing, etc. etc.
This is a totally unacceptable piece of UI. I think it’s probably both Apple’s and Telestream’s fault. Really, though, I just paid $100 and now I have to decide whether to try to get my money back, or just take the chance that Telestream isn’t evil.
Good news, students. I can’t accurately predict your final grades based solely on your first two assignments, quizzes, and labs.
I tried, though…
First, I took the data from Winter 2015. I separated the students into a training set (70%) and a validation set (30%). I used ordinary least-squares approximation to learn a linear weighting of the scores on the first two labs, the first two assignments, and the first two quizzes. I then applied this weighting to the validation set, to see how accurate it was.
Short story: not accurate enough.
On the training set, the RMS error is 7.9% of the final grade, which is not great but might at least tell you whether you’re going to get an A or a C. Here’s a picture of the distribution of the errors on the training set:
distribution of errors on training set
The x axis is labeled in tenths of a percentage point. This is the density of the errors, so the y axis is somewhat unimportant.
Unfortunately, on the validation set, things fell apart. Specifically, the RMS error was 19.1%, which is pretty terrible. Here’s the picture of the distribution of the errors:
distribution of errors on validation set
Ah well… I guess I’m going to have to grade the midterm.
All three of these sets of passwords are randomly generated from a set of 2^56; they’re all equivalently secure. The second ones and the third ones are generated using markov models built from the text of Charles Dickens’ A Tale Of Two Cities, where transitions are made using Huffman Trees.
The secret sauce here is that since traversing a Huffman tree to a common leaf requires fewer bits than traversing that same tree to reach a deep leaf, we can drive the generating model using a pool of bits, and use varying numbers of bits depending on the likelihood of the taken transition.
This means that there’s a 1-to–1 mapping between the sequences of bits and the corresponding English-like textual fragments, thus guaranteeing the security of the passwords (or, more precisely, reducing it to the problem of generating a cryptographically secure sequence of bits, which many smart people have thought hard about already).
Another reasonable way to describe this process is that we’re just “decompressing” randomly generated crypto bits using a model trained on Dickens.
The only difference between the second and third pools is that the second one uses a 2nd-order markov model—meaning that the choice of a letter is driven by the prior 2—and that the third one uses a 3rd-order model, resulting in more Dickensian text—but also in longer text.
Naturally, you can push this further. When you get to a 5th order model, you get passwords like this:
not bitter their eyes, armed; I am natural
me. Is that. At fire, and, and—in separable;
reason off. The nailed abound tumbril o
and many more." “See, that,” return-
falls, any papers over these listen
do you, yes." "I beg to takes merc
paper movement off before," said, Charles," rejoin
that. She—had the season flung found." He o
Much more Dickensian, much longer. Same security.
You can try it out yourself; Molis Hai contains a small JS implementation of this, and a canned set of 2nd-order trees.
Please note that there’s nothing secret about the model; we’re assuming that an attacker already knows exactly how you’re generating your passwords. The only thing he or she is missing is the 56 bits you used to generate your password.
For a more carefully written paper that explains this a bit more slowly, see the preprint at ArXiv.
Naturally, you can use any corpus you like. I tried generating text using a big slab of my own e-mails, and aside from a serious tendency to follow the letter “J” with the letters “o”, “h”, and “n”, I didn’t notice a huge difference, at least not in the 2nd-order models. Well, actually, here’s an example:
0.91, Also: Zahid We rigor
argustorigoring tent r
Myrics args foling") (
can’s fortalk at html-unds
having avaScript" 0.88489232B
John? I doe.cal fluore let a
botheird, creally, there thic
to ind [(solutell wil
It’s probably true that Charles Dickens wasn’t quite so likely to type “avascript” as I am. Or “html”.
To read the Racket code I used to generate the models, see github.
And for Heaven’s sake, let me know about related work that I missed!