Happy Hexadecimal New Year!

:: Time

By: John Clements

Happy Hexadecimal New Year!

Actually, I missed it. It’s now

57:00:40:6d

… so the new year, the 88th since the epoch, was a full 4 hexadecimal hours ago.

All of you TAI64 log watchers can put on your party hats and blow your tooters.

One additional handy attribute of Hexadecimal new year is that it doesn’t generally happen in the middle of the night.

Yahoo!

Confessions of a Programmer

:: Programming

By: John Clements

I am a programmer.

Here are some of the things that I love.

  • I love programming.
  • I love abstraction and purity.
  • I love complexity, and fiddly details.
  • I love math.
  • I love knowing that I have powers and knowledge that other people don’t.

I want to talk about that last one. It’s the one I’m least comfortable talking about, and I think it’s somewhere near the center of a number of discussions I’ve had with others and with myself about motivation, teaching, and the future of Computer Science.


A few days ago, I picked up G.H. Hardy’s A Mathematician’s Apology again. He is, to be blunt, an unreconstructed bigot in the area of Native Mathematical Ability. He believes that either you got it or you ain’t, and if you do, it is a thing that sets you apart from other men. (Not from other women, because in the 1920s, everyone was a man. Look it up.)

On the one hand, I can see the appalling lack of mental rigor in some of his arguments. I won’t go point by point, but it appears to me that the nailed-down-mathematical-argument part of his brain simply wasn’t engaged when he made arguments like the assertion that in general those with mathematical talent aren’t likely to have any other useful one.

Nevertheless, much of his thinking pervades mine. I think that thinking such as his fuels much of my own drive (such as it is), and indeed is relatively close to my central joy in life.

Let me put it differently, and more positively: Richard Feynman writes about the simple joy of discovering things for oneself. I completely subscribe to this. Yesterday, of my own accord, I examined the notion of continuity on functions from the reals to the reals and the more general one of continuity on topological spaces, and convinced myself that they’re perfectly aligned (on the functions from reals to reals). That’s a really beautiful thing, and I wanted to run around and tell everyone about it (and here I am doing that). I got to figure that out for myself, and that was a beautiful thing.

Let’s dig a little deeper, though, and try to understand why that should be a beautiful thing. Why is it thrilling to know things, or even better to discover them for ourselves?

Ultimately, I think that it comes down to this: it gives us power, and fuels our egos.

That doesn’t sound very nice, does it?

Well, in many ways it’s the same thing that drives us to work hard at anything. Why do we run races? Why do we compete at Golf? Why do we work hard for promotions, or try to explore new lands? Why do we … do anything?

Well, on a basic level, we’re trying to survive as a species. In order to do this, every individual is programmed to pull hard, to do her very best work, to train harder than the other guy.

When we put it that way, it doesn’t sound so bad. Survival of the species, and more locally, survival of our own DNA. I want to be the best at something, in order to give my DNA a competitive advantage. This is locally visible as an advantage over other people, but manifests itself globally as a path toward improving the survival of the species.

Looking at it in the global context, though, tempers the drive for individual success; my success benefits the species only when it doesn’t come at the expense of the species as a whole. If I try to ensure the success of my genes by eliminating those around me, I’m not benefitting the species, I’m hurting it, and I won’t be around very long.

So, what do we have? We’ve developed a system where we compete—to a point. We compete in venues where our individual success leads to improvement for our families, our communities, and our species.

But enough about you.

What does this mean for me?

When I was a child, I did well at math. I understood things easily, and none of the concepts gave me trouble. When there was an interesting new concept, I spent a bit of time thinking about it, and then I understood it. I got the same kind of satisfaction from math that you might from a video game: a bit of work, and lots of instant gratification. To be sure: I had a privileged upbringing, and lots of attention from teachers to help me succeed. I have relatively few illusions about the part that my background made in my success.

It’s no surprise, then, that I chose to focus on math; Math class was my favorite, and until college, I could convince myself that I was the best in every class (following Hardy here, I’ll suggest that some mild egotism is not out of place here—it’s helpful to believe that you’re the best, even when the evidence is weak).

In college, I discovered lots of folks that were better mathematicians than I was.

Hmm… I can see that I’m going astray here; from a discussion intended to be about the motivations of programmers and programming students, I’m drifting over into nature-vs-nurture.

Further Editing Required.

Or, as Piet Hein put it,

Things Take Time.

Granite Mon 2015

:: granitemon

By: John Clements

Today was a spectacular beautiful Maine morning on Blue Hill bay. Especially at 7:00 AM. How do I know this? Today was the 21st running of the long island swim. Or the Granite (Wo)Mon. Or whatever you want to call it. This means that it was also the TWENTIETH ANNIVERSARY of the silly idea. Only five more years until the quarter-century….

This year’s edition was organized by organizer extraordinaire Mary Clews, and it went off without a hitch. The water was extremely warm (and yes, this is not good for the fauna), and the water was pretty much glassy smooth. There were enough chase boats, and everyone finished in a reasonable time. Yay!

Special mention this year is made of Amanda Herman, who is a first time swimmer and I have no idea how fast she was because she was way way ahead of me the whole time. Great Job!

Swimmers:

  • John Clements
  • Charlotte Clews Lawther
  • Mary Clews
  • Chris Guinness
  • Sean Guinness
  • Amanda Herman
  • Tricia Sawyer
  • Pat Starkey

Crew:

  • Henry Clews
  • Jerome Lawther
  • Jenney Wilder
  • Eliza Wilmerding
  • Sara Ardrey
  • Hal Clews
  • Jeff Rushton
  • Ann Luskey
  • Georgia Clews
  • Lucy Clews
  • Xavier Clements
  • Neddie Clews
  • Charlotte Weir

End-of-swim Hosts (tea and towels, many thanks!):

  • Henry Becton
  • Jeannie Becton
  • Guy Ardrey

what are you doing to my yeast, blue hill?

:: Bread

By: John Clements

Traveling across the country with yeast. Hard. On the way out here I slopped a bit in a ziploc bag, and though I’d been planning to give the TSA guys some lame story when they pointed out that the bag was way bigger than 3.5 ounces, in the event I actually forgot about it completely, and they missed it on the scan or decided to give me a pass.

I got to Connecticut, and it was still pretty healthy. I feel self-righteous about only feeding my yeast unbleached flour, but basically, it was bubbling along just fine.

Then I brought it to Maine.

Break it! Confrontational thinking in computer science

:: Programming Languages, Teaching

By: John Clements

So here I am grading another exam. This exam question asks students to imagine what would happen to an interpreter if environments were treated like stores. Then, it asks them to construct a program that would illustrate the difference.

They fail, completely.

(Okay, not completely.)

By and large, it’s pretty easy to characterize the basic failing: these students are unwilling to break the rules.

my root password? oh sure, here it is.

:: Mac, Security, Programming Languages

By: John Clements

So, I just bought a copy of Screenflow, an apparently reputable piece of screencasting software for the mac. They sent me a license code. Now it’s time to enter it. Here’s the window:

Hmm… says here all I need to do is… enter my admin password, and then the license code.

Wait… what?

Why in heaven’s name would Screenflow need to know my admin password here?

My guess is that it’s because it wants to put something in the keychain for me. That’s not a very comforting thought; it also means that it could delete things from my keychain, copy the whole thing, etc. etc.

This is a totally unacceptable piece of UI. I think it’s probably both Apple’s and Telestream’s fault. Really, though, I just paid $100 and now I have to decide whether to try to get my money back, or just take the chance that Telestream isn’t evil.

It’s hard to predict student scores

:: Machine Learning

By: John Clements

Good news, students. I can’t accurately predict your final grades based solely on your first two assignments, quizzes, and labs.

I tried, though…

First, I took the data from Winter 2015. I separated the students into a training set (70%) and a validation set (30%). I used ordinary least-squares approximation to learn a linear weighting of the scores on the first two labs, the first two assignments, and the first two quizzes. I then applied this weighting to the validation set, to see how accurate it was.

Short story: not accurate enough.

On the training set, the RMS error is 7.9% of the final grade, which is not great but might at least tell you whether you’re going to get an A or a C. Here’s a picture of the distribution of the errors on the training set:

distribution of errors on training set

distribution of errors on training set

The x axis is labeled in tenths of a percentage point. This is the density of the errors, so the y axis is somewhat unimportant.

Unfortunately, on the validation set, things fell apart. Specifically, the RMS error was 19.1%, which is pretty terrible. Here’s the picture of the distribution of the errors:

distribution of errors on validation set

distribution of errors on validation set

Ah well… I guess I’m going to have to grade the midterm.

Molis Hai: generating Passwords using Charles Dickens

:: Racket, Security

By: John Clements

TL;DR: Molis Hai

Randomly generated passwords:

  • dMbcGp=A(
  • 9eMRV7N%[
  • R]eJxx68v
  • GVUN#ek5z
  • ms8AG09-h
  • sVh2TT4wx
  • Y]sa7-b(f
  • BOrnNGLqk

More randomly generated passwords:

  • wargestood hury on,
  • wealenerity," stp
  • twould, aftilled himenu
  • Whaideve awasaga
  • andir her hing ples. F
  • spe it humphadeas a
  • to and ling, ace upooke,
  • Mr. Syd, why.’ tred. "D

Yet More randomly generated passwords

  • brothe aponder and," reasun
  • ther atternal telle is be
  • his me, he foundred, id
  • allant our faces of rai
  • time! What it of vail
  • sourned," reate." Manybody.
  • they would reck," read-doom
  • raise thack ther meant,

Which of these look easiest to remember?

All three of these sets of passwords are randomly generated from a set of 2^56; they’re all equivalently secure. The second ones and the third ones are generated using markov models built from the text of Charles Dickens’ A Tale Of Two Cities, where transitions are made using Huffman Trees.

The secret sauce here is that since traversing a Huffman tree to a common leaf requires fewer bits than traversing that same tree to reach a deep leaf, we can drive the generating model using a pool of bits, and use varying numbers of bits depending on the likelihood of the taken transition.

This means that there’s a 1-to–1 mapping between the sequences of bits and the corresponding English-like textual fragments, thus guaranteeing the security of the passwords (or, more precisely, reducing it to the problem of generating a cryptographically secure sequence of bits, which many smart people have thought hard about already).

Another reasonable way to describe this process is that we’re just “decompressing” randomly generated crypto bits using a model trained on Dickens.

The only difference between the second and third pools is that the second one uses a 2nd-order markov model—meaning that the choice of a letter is driven by the prior 2—and that the third one uses a 3rd-order model, resulting in more Dickensian text—but also in longer text.

Naturally, you can push this further. When you get to a 5th order model, you get passwords like this:

  • not bitter their eyes, armed; I am natural
  • me. Is that. At fire, and, and—in separable;
  • reason off. The nailed abound tumbril o
  • and many more." “See, that,” return-
  • falls, any papers over these listen
  • do you, yes." "I beg to takes merc
  • paper movement off before," said, Charles," rejoin
  • that. She—had the season flung found." He o

Much more Dickensian, much longer. Same security.

You can try it out yourself; Molis Hai contains a small JS implementation of this, and a canned set of 2nd-order trees.

Please note that there’s nothing secret about the model; we’re assuming that an attacker already knows exactly how you’re generating your passwords. The only thing he or she is missing is the 56 bits you used to generate your password.

For a more carefully written paper that explains this a bit more slowly, see the preprint at ArXiv.

Naturally, you can use any corpus you like. I tried generating text using a big slab of my own e-mails, and aside from a serious tendency to follow the letter “J” with the letters “o”, “h”, and “n”, I didn’t notice a huge difference, at least not in the 2nd-order models. Well, actually, here’s an example:

  • 0.91, Also: Zahid We rigor
  • argustorigoring tent r
  • Myrics args foling") (
  • can’s fortalk at html-unds
  • having avaScript" 0.88489232B
  • John? I doe.cal fluore let a
  • botheird, creally, there thic
  • to ind [(solutell wil

It’s probably true that Charles Dickens wasn’t quite so likely to type “avascript” as I am. Or “html”.

To read the Racket code I used to generate the models, see github.

And for Heaven’s sake, let me know about related work that I missed!