Taking a break from my AU prep, I headed across to Zurich last night for an F# meetup focused on machine learning. I went primarily because I’m interested in machine learning as a field, but also because it seemed a good opportunity to dust off my F# skills.
It was interesting to be on the train when the news from the first day of Microsoft’s Connect(); event hit the airwaves. The main headline: .NET is going open source and cross-platform. Yes, folks, it’s actually happening: .NET 5 is going to be supported on Linux and OS X. .NET is already on GitHub, and the first pull request has already been approved. It’s not yet clear what impact this will have on desktop software for the Mac, as the initial target is .NET Core (the server-targeting subset of .NET), but I have to see this as a great thing for the .NET community, however it plays out.
So, back to last night’s F# session. The room was, of course, full of people already used to working with an open source language: F# went open source a couple of years ago. The session was run by Mathias Brandewinder, who is holding the same “coding dojo” at various locations around Europe. Mathias is originally from France but is now based in San Francisco. He’s clearly passionate about using F# for machine learning and about engaging with the F# community as a whole.
The dojo was based around a programming challenge from Kaggle.com: write a handwritten digit recognizer that takes a sequence of greyscale bitmaps (stored in a CSV file as a series of per-pixel integers) and attempts to classify them based on some training data (a subset of the data provided on Kaggle.com, I should add). We were sorted into groups, although the members of our group ended up deciding to attack the problem independently. Daniel, the person sitting to my left in the photo below, is a professional F# programmer: he was already porting his code to run in parallel on the GPU by the time I’d managed to finish the exercise.
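Each data row holds the digit label followed by one integer per pixel. Assuming the layout of Kaggle’s Digit Recognizer data (28×28 images stored row by row, greyscale values from 0 to 255), a hypothetical `render` helper like the one below can turn a row’s pixels back into rough ASCII art, which is handy for eyeballing what the classifier is looking at. The thresholds for `#` and `+` are arbitrary choices, and the 4×4 sample is made up.

```fsharp
// Hypothetical helper: render a flat, row-major array of greyscale
// pixel values (assumed 0 to 255) as ASCII art, `width` pixels per row.
// Bright pixels become '#', faint ones '+', and blank ones '.'.
let render (width: int) (pixels: int[]) =
    pixels
    |> Array.chunkBySize width
    |> Array.map (fun row ->
        row
        |> Array.map (fun p -> if p > 128 then "#" elif p > 0 then "+" else ".")
        |> String.concat "")
    |> String.concat "\n"

// A made-up 4x4 "image" (a vertical stroke); for real Kaggle rows
// you would call render 28 on the 784 pixel values.
let sample = [| 0; 200; 0; 0; 0; 255; 0; 0; 0; 90; 0; 0; 0; 255; 0; 0 |]

printfn "%s" (render 4 sample)
```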
Here’s the F# code I managed to come up with for the basic challenge (after a bit of tidying up).
// Namespaces needed for Int32.Parse and File.ReadAllLines
open System
open System.IO

// Functions to go from a list of comma-separated strings
// to a list of tuples of the classified digit and an array
// of integers representing pixels
let split a = List.map (fun (s:string) -> s.Split(',')) a
let ints = Array.map Int32.Parse
let tuple a = Seq.head a, Seq.skip 1 a |> Seq.toArray

// Classify using the nearest-neighbour algorithm, checking
// the Euclidean distance between the respective pixels in the
// images being compared. (The square root is skipped: the sum
// of squared differences picks out the same nearest image.)
let eucDist p1 p2 = p1-p2 |> fun x -> x*x
let dist a b = Array.map2 eucDist a b |> Array.sum
let classify a r = r |> List.minBy (fun t -> dist (snd t) a) |> fst

// Read in a CSV file from the specified path and create
// a list of tuples containing the classified digit and an
// array of integers for the pixels (List.tail drops the header row)
let read path =
    File.ReadAllLines(path) |> Array.toList |> split
    |> List.tail |> List.map ints |> List.map tuple

let tpath = "Z:\\Data\\FSharp dojo\\trainingsample.csv"
let vpath = "Z:\\Data\\FSharp dojo\\validationsample.csv"

// Read in the training data and store it for later use
let tdata = read tpath

// Read in the validation sample and print the percentage of
// cases that are correctly classified
read vpath
|> List.averageBy (fun x ->
    if (classify (snd x) tdata = fst x) then 1. else 0.)
|> (*) 100. |> printfn "%g%% of entries classified"
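To see the nearest-neighbour logic at work without the Kaggle files, here’s a small self-contained sketch that mirrors `dist` and `classify` above, using tuple patterns instead of `fst`/`snd`. The 2×2 “images” and their labels are invented purely for illustration (real rows have 784 pixels). Note that `dist` is the squared Euclidean distance: because the square root is monotonic, leaving it out doesn’t change which training example is nearest.

```fsharp
// Squared Euclidean distance between two pixel arrays.
let dist (a: int[]) (b: int[]) =
    Array.map2 (fun p q -> (p - q) * (p - q)) a b |> Array.sum

// 1-nearest-neighbour: return the label of the closest training example.
let classify (pixels: int[]) (training: (int * int[]) list) =
    training |> List.minBy (fun (_, t) -> dist t pixels) |> fst

// Made-up 2x2 "images": a bright left column is labelled 1,
// a bright top row is labelled 7.
let training =
    [ (1, [| 255; 0; 255; 0 |])
      (7, [| 255; 255; 0; 0 |]) ]

let validation =
    [ (1, [| 200; 10; 230; 0 |])
      (7, [| 240; 220; 20; 5 |]) ]

// Percentage of validation samples whose predicted label matches.
let accuracy =
    validation
    |> List.averageBy (fun (label, pixels) ->
        if classify pixels training = label then 1.0 else 0.0)
    |> (*) 100.0

printfn "%g%% correctly classified" accuracy
```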
As suggested by Mathias, the code implements the “nearest-neighbour” algorithm (i.e. the k-nearest neighbours algorithm with k = 1). This morning I went ahead and coded up a version of the classify function that lets the k nearest neighbours (here, five) vote on the result, too…
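// Sort the training data by distance, take the five nearest
// neighbours, count how often each (digit, pixels) tuple occurs,
// then return the digit of the first group
let classify2 a r =
    r |> List.sortBy (fun t -> dist (snd t) a)
    |> Seq.take 5 |> Seq.countBy id
    |> Seq.head |> fst |> fst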
… but it didn’t actually change anything: both versions returned the same result (which is the one expected, unless you start to implement more advanced techniques to recognise off-centre digits, etc.).
94.4% of entries classified
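One thing worth noting about `classify2`: `Seq.countBy id` groups the five neighbours by the whole `(digit, pixels)` tuple rather than by digit, so (barring duplicate training rows) every group has a count of one, and `Seq.head` simply picks the first group, which in practice is the nearest neighbour. It most likely behaves just like the 1-nearest-neighbour `classify`, which may be why both versions gave the same result. Below is a minimal sketch of an actual majority vote, counting votes per label with `List.countBy fst` and picking the winner with `List.maxBy snd`. The `classifyK` name and the two-pixel data are hypothetical, chosen so that the single nearest neighbour and the three-neighbour vote disagree.

```fsharp
// Squared Euclidean distance, as in the dist function above.
let dist (a: int[]) (b: int[]) =
    Array.map2 (fun p q -> (p - q) * (p - q)) a b |> Array.sum

// k-nearest neighbours with a majority vote on the digit label.
// Ties between labels aren't handled specially in this sketch.
let classifyK k (pixels: int[]) (training: (int * int[]) list) =
    training
    |> List.sortBy (fun (_, t) -> dist t pixels)
    |> List.truncate k      // the k nearest (digit, pixels) pairs
    |> List.countBy fst     // votes per digit label
    |> List.maxBy snd       // the label with the most votes
    |> fst

// Made-up two-pixel "images": one 7 sits right next to the query,
// but the next two nearest neighbours are both 1s.
let training =
    [ (7, [| 100; 100 |])
      (1, [| 110; 90 |])
      (1, [| 90; 115 |])
      (7, [| 255; 0 |]) ]

let query = [| 101; 101 |]

printfn "k=1: %d, k=3: %d" (classifyK 1 query training) (classifyK 3 query training)
// prints k=1: 7, k=3: 1
```

Using `List.truncate` rather than `Seq.take` means a `k` larger than the training set doesn’t throw.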
It was a really fun introduction to solving machine learning problems using F#. This field is clearly relevant to how users interact with complex systems, including design tools, so I’m happy to have taken at least a baby step towards a real understanding of the domain.
Photo copyright Mathias Brandewinder.