Posts tagged Programming Languages

DWIM vs. Small Semantic Core

:: R, Programming Languages

By: John Clements

So, I’d like to do some statistical analysis. I hear that R is really good at this. Let’s download it and take a look.

(Ten minutes later)

AAAHHH! MY EYES! THEY’RE BLEEDING!

What about Matlab? It’s the same story.1 As a programming languages person, these languages make me … well, angry.

Why?

Well, after thinking about this for a while, it seems to me that what I hate most about these languages is their complete lack of a small semantic core.

Take a language like Racket, JavaScript, Java, or C— these languages don’t have a heck of a lot in common, but they share

is this all just library design? Most of the things I really hate can easily be constructed in any dynamic library through a suitable application of

Terrible Library Design (tm)

… except that when it applies to things like vector dereference, it feels like fairly ‘core’ syntax.

Example time! First, R does this crazy thing in distinguishing logical from numeric vectors.

1
2
3
4
5
6
> a
[1] "a" "b" "c" "d"
> a[c(2,4,3)]
[1] "b" "d" "c"
> a[c(FALSE,TRUE)]
[1] "b" "d"

In the first of these two array derefs, we’re using the indices from the vector to decide what elements of a to take. In the second case, though, the index expression is a ‘logical vector’ and is therefore tiled to the length of the original one, and used to decide whether to take the corresponding element.

If you imagine this as part of a language semantics, you’d see this horrible side-condition attached to these rules, where array deref’ing works in totally different ways depending on the kind of argument it gets.

To say nothing of the silent tiling, which seems like an open invitation to horrible bugs.

But wait, we can compound this problem with some nasty coercion:

1
2
> a[c(4,c(FALSE,TRUE,TRUE))]
[1] "d" "a" "a"

What on earth is going on here? First of all, vectors get silently flattened, so that c(3,c(4,5)) is the same as c(3,4,5) — ugh — but then, the logical values are coerced into numeric ones, so the index vector that’s produced is actually c(4,0,1,1), which is then used to index the vector a. But why are there only three values? Oh, well, there’s no index 0, so let’s just skip that one, shall we?

Honestly, I guess the real problem is in thinking of something like R as a programming language; it’s not. It’s a statistical analysis tool, with a rich text-based interface. After all, would I get upset if Photoshop used ‘b’ for blur and ‘s’ for sharpen and I couldn’t nest them the way that I wanted, using parentheses? Probably not.

And finally: apologies for everything I’ve written. I’ve used R for about fifteen minutes, and this piece is really just me blowing off a small amount of steam. Not well written, not well thought-out. Meh.

  1. Actually, maybe not; I spoke with a friend yesterday, and I get the impression that Matlab may not be as horrible as R, here. 

Macros: more important in low-level languages?

:: Macros, Programming Languages

By: John Clements

Macros are a wonderful thing. A hygienic macro system puts language extensions within the reach of non-compiler-hackers.

However, to date, most modern, hygienic macro systems are associated with languages like Scheme and Racket that are quite high-level. My claim is that macros are probably much more useful in low-level languages. Here’s why:

Too Elegant For September

:: Programming Languages, Teaching

By: John Clements

Being on sabbatical has given me a bit of experience with other systems and languages. Also, my kids are now old enough to “mess around” with programming. Learning from both of these, I’d like to hazard a bit of HtDP heresy: students should learn for i = 1 to 10 before they learn

1
2
3
(define (sum lon)
  (cond [(empty? lon) 0]
        [else (+ (first lon) (sum (rest lon)))]))

To many of you, this may seem obvious. I’m not writing to you. Or maybe you folks can just read along and nod sagely.

HtDP takes this small and very lovely thing—recursive traversals over inductively defined data—and shows how it covers a huge piece of real estate. Really, if students could just understand how to write this class of programs effectively, they would have a vastly easier time with much of the rest of their programming careers, to say nothing of the remainder of their undergraduate tenure. Throw a few twists in there—a bit of mutation for efficiency, some memoization, some dynamic programming—and you’re pretty much done with the programming part of your first four years.

The sad thing is that many, many students make it through an entire four-year curriculum without ever really figuring out how to write a simple recursive traversal of an inductively defined data structure. This makes professors sad.

Among the Very Simple applications of this nice idea is that of “indexes.” That is, the natural numbers can be regarded as an inductively defined set, where a natural number is either 0 or the successor of a natural number. This allows you to regard any kind of indexing loop as simply a special case of … a recursive traversal of an inductively defined data structure.

So here’s the problem: in September, you face a bunch of bright-eyed, enthusiastic, deeply forgiving first-year college students. And you give them the recursive traversal of the inductively defined data structure. A very small number of them get it, and they’re off to the races. The rest of them struggle, and struggle, and finally get their teammates to help them write the code, and really wish they’d taken some other class.

NB: the rest of this makes less sense… even to me. Not finished.

However, another big part of the problem is … well, monads are like burritos.

Let me take a step back.

The notion of repeated action is a visceral and easily-understood one. Here’s what I mean. “A human can multiply a pair of 32-bit integers in about a minute. A computer can multiply 32-bit integers at a rate of several billion per second, or about a hundred billion times as fast as a person.” That’s an easily-understood claim: we understand what it means to the same thing a whole bunch of times really fast.

So, when I write

for i=[1..100] multiply_two_numbers();

It’s pretty easy to understand that I’m doing something one hundred times.

Embedding Rust in Racket

:: Rust, Racket, Programming Languages

By: John Clements

Is this post a thinly disguised ripoff of Brian Anderson’s post about embedding Rust in Ruby? Why yes. Yes it is.

Okay, let me start with a little background. Rust is a magnificent language that comes from Mozilla; it’s targeted at programmers who want

  • high and predictable performance,
  • control over memory layout,
  • good support for concurrency, and
  • safety.

I think the Mozilla Research homepage is probably the best place to start learning about Rust.

To be honest, though, I’m probably flattering myself if I think that this blog post is being read by anyone who doesn’t already know lots about Rust.

One of the key requirements of a language like Rust is that it be embeddable; that is, it should be possible to call Rust code from another language just as it’s possible to call C code from another language.

This is now possible.

To illustrate this, Brian Anderson posted a lovely example of embedding Rust in Ruby. But of course, embedding Rust in Ruby is pretty much exactly the same as embedding Rust in any other language.

Say, for instance, Racket.

So, without further ado, here’s the setup. You just happen to have a small web app written in Racket that performs a Gaussian Blur. You decide to optimize the performance by porting your code to Rust. Then you want to plug your Rust code into your Racket application. Done! Here’s the github repo that contains all of the code.

Let’s see that again in slow motion.

First, here’s the gaussian blur function, written in Racket. We’re going to stick with a grayscale image. It works fine in color, but the code is just that much harder to read.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
;; the gaussian filter used in the racket blur.
;; boosted center value by 1/1000 to make sure that whites stay white.
(define filter '[[0.011 0.084 0.011]
                 [0.084 0.620 0.084]
                 [0.011 0.084 0.011]])

;; racket-blur: blur the image using the gaussian filter
;; number number list-of-bytes -> vector-of-bytes
(define (racket-blur width height data)
  (define data-vec (list->vector data))
  ;; ij->offset : compute the offset of the pixel data within the buffer
  (define (ij->offset i j)
    (+ i (* j width)))
  (define bytes-len (* width height))
  (define new-bytes (make-vector bytes-len 0))
  (define filter-x (length (car filter)))
  (define filter-y (length filter))
  (define offset-x (/ (sub1 filter-x) 2))
  (define offset-y (/ (sub1 filter-y) 2))
  ;; compute the filtered byte array
  (for* ([x width]
         [y height])
    (define new-val
      (for*/fold ([sum 0.0])
        ([dx filter-x]
         [dy filter-y])
        (define sample-x (modulo (+ dx (- x offset-x)) width))
        (define sample-y (modulo (+ dy (- y offset-y)) height))
        (define sample-value (vector-ref data-vec (ij->offset sample-x sample-y)))
        (define weight (list-ref (list-ref filter dy) dx))
        (+ sum (* weight sample-value))))
    (vector-set! new-bytes (ij->offset x y) new-val))
  (vector->list new-bytes))

Suppose we want to rewrite that in Rust. Here’s what it might look like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
fn blur_rust(width: uint, height: uint, data: &[u8]) -> ~[u8] {

    let filter = [[0.011, 0.084, 0.011],
                  [0.084, 0.620, 0.084],
                  [0.011, 0.084, 0.011]];

    let mut newdata = ~[];

    for uint::range(0, height) |y| {
        for uint::range(0, width) |x| {
            let mut new_value = 0.0;
            for uint::range(0, filter.len()) |yy| {
                for uint::range(0, filter.len()) |xx| {
                    let x_sample = x - (filter.len() - 1) / 2 + xx;
                    let y_sample = y - (filter.len() - 1) / 2 + yy;
                    let sample_value = data[width * (y_sample % height) + (x_sample % width)];
                    let sample_value = sample_value as float;
                    let weight = filter[yy][xx];
                    new_value += sample_value * weight;
                }
            }
            newdata.push(new_value as u8);
        }
    }

    return newdata;
}

Pretty similar. Of course, it uses curly braces, so it runs about three times faster…

So: what kind of glue code is necessary to link the Rust code to the Racket code? Not a lot. On the Rust side, we need to create a pointer to the C data, then copy the result back into the source buffer when we’re done with the blur:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
#[no_mangle]
pub extern fn blur(width: c_uint, height: c_uint, data: *mut u8) {
    let width = width as uint;
    let height = height as uint;

    unsafe {
        do vec::raw::mut_buf_as_slice(data, width * height) |data| {
            let out_data = blur_rust(width, height, data);
            vec::raw::copy_memory(data, out_data, width * height);
        }
    }
}

On the Racket side, it’s just a question of making an ffi call, which is super-concise:

1
2
3
4
5
6
7
8
;; link to the rust library:
(define rust-lib (ffi-lib (build-path here "libblur-68a2c114141ca-0.0")))
(define rust-blur-fun (get-ffi-obj "blur" rust-lib (_fun _uint _uint _cvector -> _void)))

(define (rust-blur width height data)
  (define cvec (list->cvector data _byte))
  (rust-blur-fun width height cvec)
  (cvector->list cvec))

And away you go!

I’ve got this code running live at FIXME. What’s that you say? You can’t seem to find FIXME?

Random Code 9: Perl

:: CodeCritic, Programming Languages

By: John Clements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#
# Globals:
#

# Regex to match balanced [brackets]. See Friedl's
# "Mastering Regular Expressions", 2nd Ed., pp. 328-331.
my $g_nested_brackets;
$g_nested_brackets = qr{
	(?> 								# Atomic matching
	   [^\[\]]+							# Anything other than brackets
	 | 
	   \[
		 (??{ $g_nested_brackets })		# Recursive set of nested brackets
	   \]
	)*
}x;


# Table of hash values for escaped characters:
my %g_escape_table;
foreach my $char (split //, '\\`*_{}[]()>#+-.!') {
	$g_escape_table{$char} = md5_hex($char);
}


# Global hashes, used by various utility routines
my %g_urls;
my %g_titles;
my %g_html_blocks;

# Used to track when we're inside an ordered or unordered list
# (see _ProcessListItems() for details):
my $g_list_level = 0;


#### Blosxom plug-in interface ##########################################

# Set $g_blosxom_use_meta to 1 to use Blosxom's meta plug-in to determine
# which posts Markdown should process, using a "meta-markup: markdown"
# header. If it's set to 0 (the default), Markdown will process all
# entries.
my $g_blosxom_use_meta = 0;

sub start { 1; }
sub story {
	my($pkg, $path, $filename, $story_ref, $title_ref, $body_ref) = @_;

	if ( (! $g_blosxom_use_meta) or
	     (defined($meta::markup) and ($meta::markup =~ /^\s*markdown\s*$/i))
	     ){
			$$body_ref  = Markdown($$body_ref);
     }
     1;
}

“Project 2 Testing”

:: Programming Languages

By: John Clements

{% gist 2411008 %}

{% gist 2411005 %}

{% gist 2411004 %}

{% gist 2411002 %}

{% gist 2410993 %}

{% gist 2410918 %}

{% gist 2410916 %}

{% gist 2410910 %}

{% gist 2410909 %}

{% gist 2410908 %}

{% gist 2410905 %}

{% gist 2410904 %}

{% gist 2410899 %}

{% gist 2410889 %}

{% gist 2410886 %}

{% gist 2410882 %}

{% gist 2410860 %}

{% gist 2410856 %}

random code 8: urrgggh

:: CodeCritic, Programming Languages

By: John Clements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
<?php


namespace Symfony\Component\Form\Tests\Extension\Core\Type;

use Symfony\Component\Form\FormError;

class DateTimeTypeTest extends LocalizedTestCase
{
    public function testSubmit_dateTime()
    {
        $form = $this->factory->create('datetime', null, array(
            'data_timezone' => 'UTC',
            'user_timezone' => 'UTC',
            'date_widget' => 'choice',
            'time_widget' => 'choice',
            'input' => 'datetime',
        ));

        $form->bind(array(
            'date' => array(
                'day' => '2',
                'month' => '6',
                'year' => '2010',
            ),
            'time' => array(
                'hour' => '3',
                'minute' => '4',
            ),
        ));

        $dateTime = new \DateTime('2010-06-02 03:04:00 UTC');

        $this->assertDateTimeEquals($dateTime, $form->getData());
    }

    public function testSubmit_string()
    {
        $form = $this->factory->create('datetime', null, array(
            'data_timezone' => 'UTC',
            'user_timezone' => 'UTC',
            'input' => 'string',
            'date_widget' => 'choice',
            'time_widget' => 'choice',
        ));

        $form->bind(array(
            'date' => array(
                'day' => '2',
                'month' => '6',
                'year' => '2010',
            ),
            'time' => array(
                'hour' => '3',
                'minute' => '4',
            ),
        ));

        $this->assertEquals('2010-06-02 03:04:00', $form->getData());
    }
...
}

Random Code 7: sh

:: CodeCritic, Programming Languages

By: John Clements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
# Check for updates on initial load...
if [ "$DISABLE_AUTO_UPDATE" != "true" ]
then
  /usr/bin/env ZSH=$ZSH zsh $ZSH/tools/check_for_upgrade.sh
fi

# Initializes Oh My Zsh

# add a function path
fpath=($ZSH/functions $ZSH/completions $fpath)

# Load all of the config files in ~/oh-my-zsh that end in .zsh
# TIP: Add files you don't want in git to .gitignore
for config_file ($ZSH/lib/*.zsh) source $config_file

# Set ZSH_CUSTOM to the path where your custom config files
# and plugins exists, or else we will use the default custom/
if [[ -z "$ZSH_CUSTOM" ]]; then
    ZSH_CUSTOM="$ZSH/custom"
fi


is_plugin() {
  local base_dir=$1
  local name=$2
  test -f $base_dir/plugins/$name/$name.plugin.zsh \
    || test -f $base_dir/plugins/$name/_$name
}
# Add all defined plugins to fpath. This must be done
# before running compinit.
for plugin ($plugins); do
  if is_plugin $ZSH_CUSTOM $plugin; then
    fpath=($ZSH_CUSTOM/plugins/$plugin $fpath)
  elif is_plugin $ZSH $plugin; then
    fpath=($ZSH/plugins/$plugin $fpath)
  fi
done

# Load and run compinit
autoload -U compinit
compinit -i


# Load all of the plugins that were defined in ~/.zshrc
for plugin ($plugins); do
  if [ -f $ZSH_CUSTOM/plugins/$plugin/$plugin.plugin.zsh ]; then
    source $ZSH_CUSTOM/plugins/$plugin/$plugin.plugin.zsh
  elif [ -f $ZSH/plugins/$plugin/$plugin.plugin.zsh ]; then
    source $ZSH/plugins/$plugin/$plugin.plugin.zsh
  fi
done

# Load all of your custom configurations from custom/
for config_file ($ZSH_CUSTOM/*.zsh) source $config_file

# Load the theme
if [ "$ZSH_THEME" = "random" ]
then
  themes=($ZSH/themes/*zsh-theme)
  N=${#themes[@]}
  ((N=(RANDOM%N)+1))
  RANDOM_THEME=${themes[$N]}
  source "$RANDOM_THEME"
  echo "[oh-my-zsh] Random theme '$RANDOM_THEME' loaded..."
else
  if [ ! "$ZSH_THEME" = ""  ]
  then
    if [ -f "$ZSH/custom/$ZSH_THEME.zsh-theme" ]
    then
      source "$ZSH/custom/$ZSH_THEME.zsh-theme"
    else
      source "$ZSH/themes/$ZSH_THEME.zsh-theme"
    fi
  fi
fi

random code 6: Python

:: CodeCritic, Programming Languages

By: John Clements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
class SelectDateWidget(Widget):
    """
    A Widget that splits date input into three <select> boxes.

    This also serves as an example of a Widget that has more than one HTML
    element and hence implements value_from_datadict.
    """
    none_value = (0, '---')
    month_field = '%s_month'
    day_field = '%s_day'
    year_field = '%s_year'

    def __init__(self, attrs=None, years=None, required=True):
        # years is an optional list/tuple of years to use in the "year" select box.
        self.attrs = attrs or {}
        self.required = required
        if years:
            self.years = years
        else:
            this_year = datetime.date.today().year
            self.years = range(this_year, this_year+10)

    def render(self, name, value, attrs=None):
        try:
            year_val, month_val, day_val = value.year, value.month, value.day
        except AttributeError:
            year_val = month_val = day_val = None
            if isinstance(value, basestring):
                if settings.USE_L10N:
                    try:
                        input_format = get_format('DATE_INPUT_FORMATS')[0]
                        v = datetime.datetime.strptime(value, input_format)
                        year_val, month_val, day_val = v.year, v.month, v.day
                    except ValueError:
                        pass
                else:
                    match = RE_DATE.match(value)
                    if match:
                        year_val, month_val, day_val = [int(v) for v in match.groups()]
        choices = [(i, i) for i in self.years]
        year_html = self.create_select(name, self.year_field, value, year_val, choices)
        choices = MONTHS.items()
        month_html = self.create_select(name, self.month_field, value, month_val, choices)
        choices = [(i, i) for i in range(1, 32)]
        day_html = self.create_select(name, self.day_field, value, day_val,  choices)

        output = []
        for field in _parse_date_fmt():
            if field == 'year':
                output.append(year_html)
            elif field == 'month':
                output.append(month_html)
            elif field == 'day':
                output.append(day_html)
        return mark_safe(u'\n'.join(output))

Random Code 5: opoo

:: CodeCritic, Programming Languages

By: John Clements

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
  def link
    if f.linked_keg.directory? and f.linked_keg.realpath == f.prefix
      opoo "This keg was marked linked already, continuing anyway"
      # otherwise Keg.link will bail
      f.linked_keg.unlink
    end

    keg = Keg.new(f.prefix)
    keg.link
  rescue Exception => e
    onoe "The linking step did not complete successfully"
    puts "The formula built, but is not symlinked into #{HOMEBREW_PREFIX}"
    puts "You can try again using `brew link #{f.name}'"
    keg.unlink

    ohai e, e.backtrace if ARGV.debug?
    @show_summary_heading = true
  end

  def fix_install_names
    Keg.new(f.prefix).fix_install_names
  rescue Exception => e
    onoe "Failed to fix install names"
    puts "The formula built, but you may encounter issues using it or linking other"
    puts "formula against it."
    ohai e, e.backtrace if ARGV.debug?
    @show_summary_heading = true
  end

  def clean
    require 'cleaner'
    Cleaner.new f
  rescue Exception => e
    opoo "The cleaning step did not complete successfully"
    puts "Still, the installation was successful, so we will link it into your prefix"
    ohai e, e.backtrace if ARGV.debug?
    @show_summary_heading = true
  end

  def pour
    fetched, downloader = f.fetch
    f.verify_download_integrity fetched, f.bottle_sha1, "SHA1"
    HOMEBREW_CELLAR.cd do
      downloader.stage
    end
  end