Whoops! The day is now past. The annual Granite Wo-Mon swim took place on August 6th.
Tricia Sawyer got the ball rolling this year, as she has in the past two or three, by asking when the swim was going to be. Favorable tides suggested either mid-July or early August, if I recall correctly, and we settled on the 6th.
We met on the dock at 6:30, and headed out to Long Island. I think we began the swim at around 7:15.
It was an absolutely gorgeous day, and the water was drop-dead gorgeous. That is, it’s so warm that it’s killing lots of marine life. Nice for swimmers, maybe not so nice for everyone else. I swam without a wet suit for the second year running, and I have to say; it wasn’t even cold. Even jumping into the ocean at 7:15 in the morning. Not cold. I’m guessing it was above 20 celsius. Very warm.
Also, there was a good breeze at 7 AM, and it picked up steadily. The chop grew accordingly, and it took me a full 2 hours to finish, compared to 1:38 and 1:15 (!) in 2016 and 2015 respectively. Even discounting the ongoing decrepitude of advancing age (and a near-total lack of preparation), I’d say it was VERY CHOPPY.
PICTURES GO HERE…NEED PICTURES.
- John Clements
- Mary Clews
- Chris Guinness
- Amanda Herman
- Lane Murnik
- Tricia Sawyer
- Pat Starkey
Kudos to first-time swimmer Lane Murnik! Hope to see you many more times. Hopefully it’ll be a bit smoother next year.
Amazing Awesome Chase boats and welcoming committee:
- Sara Ardrey
- Guy Ardrey
- Henry Becton
- Jeannie Becton
- Alice Clements
- Anika Clements
- Xavier Clements
- Henry Clews
- Hal Clews
- Martha Faye (sp?)
- Sean Guinness
- Brennan Starkey
- Eliza Wilmerding
- Oliver (Wilmerding-Herman?)
- His sister?
- (?) Fellow with Lane Murnik aagh sorry I don’t know your name.
Are immutable hash tables constant-time insert and lookup? Actually, it looks like the answer is “no, not really”:
This picture shows insertion time per insertion, so we’d expect to see something like a flat line. Of course, there’s a lot of hand-waving involving “as n approaches infinity” and what that actually means on a machine with limited memory, but it seems clear from this picture that if you’re creating a hash table with about ten million elements, you should probably consider using a mutable hash table; the time required to create that table will be about 40 seconds for an immutable hash table vs. about 7 seconds for a mutable one.
This graph strongly suggests that the added time is entirely GC time, assuming that GC is not running in parallel, which I believe it’s not.
Drawing this on a linear-x graph suggests that the times for the immutable hash table insertion are well below linear; I’m thinking they’re somewhere between log n and (log n)^2.
What about lookup? Well, I ran two experiments; one on keys that are in the table, and one on keys that are not.
In this experiment, there are 3 million lookups on each tree size. The numbers for both of them are interesting in that they move around quite a lot; the error bars aren’t that big, and you can see (especially in the failed lookups) that there are some definite patterns. First of all, the immutable tree lookups pretty clearly display an upward trend, suggesting that lookup is actually log(n), albeit with a fairly small constant (about 20% per 10x). The lookups on the mutable hash tables also seem to be growing, though in their case there seems to be a sawtooth pattern, presumably occurring when the size of the table passes a particular threshold.
In the case of lookups, though, unlike insertion, there’s no clear mutable-vs-immutable winner, at least for the table sizes that I used. Lookups are generally around 150 microseconds, compared to about 600 microseconds to insert into a mutable hash table.
I’ve updated the dependency chart to match the 2017–2019 catalog.
This now includes DATA classes as well as CPE and CSC classes.
Why, professor Clements, why? Why are you taking all of our loops away?
I feel like the Grinch, sometimes, taking all of the pretty little candy loops away from the wide-eyed first-year students.
But the fact is… that I really don’t like loops.
Loops in every popular language simply execute a block of code repeatedly. This means that your code has to perform mutation in order to allow a value to escape from the code. This requires before-and-after reasoning, and introduces a huge new source of bugs into your program. By contrast, there’s almost no chance to tangle the action of the caller and the callee in a recursive call.
Corollary of this: you can easily update loop variables in the wrong order: e.g.: i += 1, sum += arr[i]. Oops, bug.
Loops can’t easily be debugged; in a functional style, each loop iteration is a recursive call for which the student can write an explicit test. Not possible using loops.
In functional style, you can’t forget to update a variable; each recursive call must contain values for all loop variables, or you get a sensible (and generally statically detectable) error.
You can’t write a loop in a non-tail-calling fashion. You have to do all of the work on the way down. Traversing a tree is basically impossible (to do it, you’re going to wind up building a model of the stack, turing tar pit etc.).
Finally, writing in a functional style is about 80% of the way to proving your code correct; you have to choose a name for the action of the recursive call, and you can usually state an invariant on the relation between the inputs and the outputs without difficulty.
Nothing here is new; it’s just a way of blowing off steam while grading my students’ terrible, terrible code. Sigh.
Okay, it’s week four of data structures in Python. In the past few days, I’ve read a lot of terrible code. Here’s a beautiful, horrible, example:
# An IntList is one of
# - None, or
# - Pair(int, IntList)
def __init__(self, first, rest):
self.first = first
self.rest = rest
# standard definitions of __eq__ and __repr__ ...
# A Position is one of
# - an int, representing a list index, or
# - None
# IntList int -> Position
# find the position of the sought element in the list, return None if not found.
def search(l, sought):
if l == None:
rest_result = search(l.rest, sought)
if (l.first == sought or rest_result) != None:
if l.first == sought:
return 1 + rest_result
This code works correctly. It searches a list to find the position of a given element. Notice anything interesting about it?
Take a look at the parentheses in the
if line. How do you feel about this code now?
(Spoilers after the jump. Figure out why it works before clicking, and how to avoid this problem.)
It’s much closer to ‘go’ time now with Python, and I must say, getting to know Python better is not making me like it better. I know it’s widely used, but it really has many nasty bits, especially when I look toward using it for teaching. Here’s my old list:
- Testing framework involves hideous boilerplate.
- Testing framework has standard problems with floating-point numbers.
- Scoping was clearly designed by someone who’d never taken (or failed to pay attention in) a programming languages course.
- The vile ‘return’ appears everywhere.
But wait, now I have many more, and I’m a bit more shouty:
- Oh dear lord, I’m going to have to force my students to implement their own equality method in order to get test-case-intensional checking. Awful. Discovering this was the moment when I switched from actually writing Python to writing Racket code that generates Python. Bleah.
- Python’s timing mechanism involves a hideously unhygienic “pass me a string representing a program” mechanism. Totally dreadful. Worse than C macros.
- Finally, I just finished reading Guido Van Rossum’s piece on tail-calling, and I find his arguments not just unconvincing, not just wrong, but sort of deliberately insulting. His best point is his first: TRE (or TCO or just proper tail-calling) can reduce the utility of stack traces. However, the solution of translating this code to loops destroys the stack traces too! You can argue that you lose stack frames in those instances in which you make tail calls that are not representable as loops, and in that case I guess I’d point you to our work with continuation marks. His next point can be paraphrased as “If we give them nice things, they might come to depend on them.” Well, yes. His third point suggests to me that he’s tired of losing arguments with Scheme programmers. Fourth, and maybe this is the most persuasive, he points out that Python is a poorly designed language and that it’s not easy for a compiler to reliably determine whether a call is in tail position. Actually, it looks like he’s wrong even here; I read it more carefully, and he’s getting hung up on some extremely simple scoping issues. I’m really not impressed by GvR as a language designer.
Some data: how long is it taking our students to graduate? How is that changing over time?
It looks like CSC and CPE are improving. It looks like CPE students are having trouble getting out in 4 years. On the other hand, the 5-year graduation rate for CPE majors is steadily rising.
It looks like lots of students graduate in four years and one quarter.
It looks like very few students graduate in their sixth year; that may be because there’s nobody left; this graph doesn’t distinguish between those who haven’t yet graduated, and those who are discontinued or disqualified. However, given the very small numbers for six years, it seems reasonable to assume that most of those that haven’t finished after six years already gave up, possibly years earlier.
Note that for students admitted in 2012, none of them are shown as having graduated in 4 years plus 2 quarters or more… because it hasn’t yet been that long since they were admitted. By the same token, we don’t see six-year numbers for 2011 admits, etc.
Also, all of this data uses “adjusted cohorts”. That is, those that change majors out of the program are removed from the denominator and the numerator, and those changing majors into the program are added to the denominator and the numerator.
(EDIT: images updated with 2172 data.)
The TL/DR: we’re teaching way fewer sections of service courses.
Here’s a picture of (lecture, non-supervisory) sections taught by CSSE faculty since Fall 2008:
Because the number of sections taught has gone up rather dramatically, here’s the same data, normalized by sections per year, to show the fraction of classes taught in each category:
There are many caveats and things to explain about these pictures.
Following Zoe’s excellent suggestion, I added color to the course dependency chart:
The arrows should be the same as the previous chart, though you’ll notice that because of the way I generated it, the chart now includes the classes that don’t have any edges connected to them.
The colors relate to the number of students that have enrolled in each class. Specifically, the log of the number of students that have taken the class from fall 2010 through spring 2016. Classes with zero enrollment are given log(0.5). The class with the highest value is red, the one with the lowest value is green. Yes, there should be a legend. Classes that aren’t in the Computer Science catalog are gray.
Note that I have no idea how many of these enrolls are repeats. That would be interesting, but I’ll need a different data source to answer that question.
The numbers for each class are interesting, and I think I’ll just publish them separately.
(EDIT: updated with 2174 data.)
This picture shows the evolution of class sizes in classes taught by Computer Science faculty since Fall 2008. Specifically, it shows the likelihood that a randomly chosen scu will belong to a class of a given size.
So, for instance, you can see that in Fall 2008, more than 50% of our SCUs were delivered in classes of size less than 30, and that in Winter 2017 (2172), about 90% of our SCUs were delivered in classes of size 40 or less.
Here’s the picture for the College as a whole:
You may be wondering why I keep yammering on about SCUs, rather than saying things like “50% of classes…”.