Elyanah: Data Tools for Elixir
Contributed by Tom Welsh. Tom is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from January 11th to April 1st, 2016. This post is based on his capstone project (due in the 12th week of the program).
Building a neural network in Elixir was fun, but it wasn't incredibly useful: my proof-of-concept wasn't flexible enough for real-world applications. I decided, instead, to start at the bottom, building the basic mathematical foundations for what I hope will eventually become a useful library for data analysis in Elixir. I've named my future library Elyanah, after my friend's new daughter, and its first module is Elyanah.Numeric.
I chose to begin with basic linear algebra operations on arrays and matrices. I felt that it fit better with my understanding of the Elixir approach (which is focused on transforming data) to work on standard Elixir Lists (for arrays) and Lists of Lists (for matrices), rather than defining new structs.
My first function was the array dot product:
    def dot(a, b, dims \\ :trunc) do
      zip(a, b, dims)
      |> Enum.reduce(0, fn {a, b}, acc -> acc + a * b end)
    end
The zip function called here is not the standard Enum.zip or Stream.zip. I wanted to support some form of NumPy-style data broadcasting, at least optionally, and my zip function provides it. When dims is set to :cycle, the shorter list is repeated to match the longer one; :strict raises an error on a length mismatch, and the default :trunc stops at the end of the shorter list:
    def zip(a, b, dims \\ :trunc)
    def zip(a, b, :trunc), do: Stream.zip(a, b)
    def zip(a, b, _) when length(a) == length(b), do: Stream.zip(a, b)
    def zip(a, b, :strict) when length(a) != length(b) do
      raise ArithmeticError, message: "Array dimensions do not match."
    end
    def zip(a, b, :cycle) when length(a) > length(b) do
      Stream.zip(a, Stream.cycle(b))
    end
    def zip(a, b, :cycle) when length(a) < length(b) do
      Stream.zip(Stream.cycle(a), b)
    end
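To make the three modes concrete, here is a minimal self-contained sketch. The module name ZipSketch is hypothetical and exists only for this illustration; the clauses are copied from the snippets above so they can be run outside the library.

```elixir
defmodule ZipSketch do
  # Hypothetical stand-alone module for illustration only; it is not the
  # library's actual module layout.
  def zip(a, b, dims \\ :trunc)
  # :trunc stops at the end of the shorter list.
  def zip(a, b, :trunc), do: Stream.zip(a, b)
  # Equal lengths behave the same in every mode.
  def zip(a, b, _) when length(a) == length(b), do: Stream.zip(a, b)

  # :strict refuses lists of different lengths.
  def zip(a, b, :strict) when length(a) != length(b) do
    raise ArithmeticError, message: "Array dimensions do not match."
  end

  # :cycle repeats the shorter list until the longer one is exhausted.
  def zip(a, b, :cycle) when length(a) > length(b), do: Stream.zip(a, Stream.cycle(b))
  def zip(a, b, :cycle) when length(a) < length(b), do: Stream.zip(Stream.cycle(a), b)

  def dot(a, b, dims \\ :trunc) do
    zip(a, b, dims)
    |> Enum.reduce(0, fn {x, y}, acc -> acc + x * y end)
  end
end
```

With these definitions, ZipSketch.dot([1, 2, 3, 4], [10, 20]) uses only the first two pairs and gives 50, while passing :cycle as the third argument pairs 3 and 4 with 10 and 20 again and gives 160. Passing :strict with the same inputs raises an ArithmeticError.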
I also implemented element-wise addition, subtraction, multiplication, and division on arrays. The documentation is available here.
Next, I moved on to matrix methods. Here, for example, is multiplication, which leverages the dot product method we saw earlier. It handles multiplication of matrices by other matrices, but also by arrays and scalars.
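    # Matrix times matrix. NOTE: part of this clause was lost in the published
    # text; the `bb <- transpose(b)` generator and the `Array.dot(aa, bb)` body
    # are a reconstruction, not the verbatim original.
    def multiply([[_|_]|_] = a, [[_|_]=h|_] = b) do
      for aa <- a, bb <- transpose(b) do
        Array.dot(aa, bb)
      end
      |> Enum.chunk(length(h))
    end
    # Array times matrix: treat the array as a single-row matrix.
    def multiply([_|_] = a, [[_|_]|_] = b), do: multiply([a], b)
    # Matrix times array: treat the array as a single-column matrix.
    def multiply([[_|_]|_] = a, [_|_] = b), do: multiply(a, transpose([b]))
    # Scalar times matrix: scale every row.
    def multiply(a, [[_|_]|_] = b) do
      Enum.map(b, &(Array.multiply(a, &1)))
    end
    # Matrix times scalar: swap the arguments and reuse the clause above.
    def multiply([[_|_]|_] = a, b), do: multiply(b, a)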
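The pattern-matching dispatch can be tried in isolation with a sketch like the following. It makes several assumptions: the module name MatMulSketch is hypothetical, transpose/1 and dot/2 are stand-ins written for this example (the post does not show the library's own versions), and Enum.chunk_every/2 is used in place of Enum.chunk/2, which newer Elixir releases deprecate.

```elixir
defmodule MatMulSketch do
  # Hypothetical helper: turns rows into columns.
  def transpose(rows), do: rows |> Enum.zip() |> Enum.map(&Tuple.to_list/1)

  # Hypothetical helper: plain dot product of two equal-length lists.
  def dot(a, b), do: Enum.zip(a, b) |> Enum.reduce(0, fn {x, y}, acc -> acc + x * y end)

  # Matrix times matrix: one dot product per (row of a, column of b) pair,
  # regrouped into rows as wide as b.
  def multiply([[_ | _] | _] = a, [[_ | _] = h | _] = b) do
    cols = transpose(b)
    products = for row <- a, col <- cols, do: dot(row, col)
    Enum.chunk_every(products, length(h))
  end

  # Array times matrix: treat the array as a single-row matrix.
  def multiply([_ | _] = a, [[_ | _] | _] = b), do: multiply([a], b)

  # Matrix times array: treat the array as a single-column matrix.
  def multiply([[_ | _] | _] = a, [_ | _] = b), do: multiply(a, transpose([b]))
end
```

Clause order matters here: the matrix-by-matrix clause has to come first, because a matrix also matches the looser `[_ | _]` array pattern used by the clauses after it.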
I implemented many more matrix methods. The full list is available in the documentation.
To make these really convenient to use, I implemented several infix operators. The implementations are designed to match on matrices and arrays and call the appropriate methods, while otherwise falling back to their default implementations. Since the . operator in Elixir is a special form whose behavior cannot be overridden, I had to use the <|> operator for the dot product. It is part of a set of infix operators in Elixir that have been left open for custom implementations.
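Defining one of these open operators looks like defining any other two-argument function. The sketch below is only an illustration: the module names DotOpSketch and DotOpDemo are hypothetical, and the body is a plain dot product rather than the library's broadcasting version.

```elixir
defmodule DotOpSketch do
  # `<|>` is parsed by Elixir but has no built-in definition, so a module
  # can define it with ordinary `def` syntax.
  def a <|> b do
    Enum.zip(a, b) |> Enum.reduce(0, fn {x, y}, acc -> acc + x * y end)
  end
end

defmodule DotOpDemo do
  # Importing the defining module makes the operator usable in infix form.
  import DotOpSketch

  def run, do: [1, 2, 3] <|> [4, 5, 6]
end
```

Here DotOpDemo.run() evaluates 1 * 4 + 2 * 5 + 3 * 6 and returns 32.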
The infix operators allow us to do things like this:
    iex> use Elyanah.Numeric
    nil
    iex> [[1,2],[3,4],[5,6]] * [[1,2,3],[4,5,6]]
    [[9, 12, 15], [19, 26, 33], [29, 40, 51]]
The implementation of the * operator is as follows:
    def ([[_|_]|_] = a) * b, do: Matrix.multiply(a, b)
    def a * ([[_|_]|_] = b), do: Matrix.multiply(a, b)
    def ([_|_] = a) * b, do: Array.multiply(a, b)
    def a * ([_|_] = b), do: Array.multiply(a, b)
    def a * b, do: Kernel.*(a, b)
When defining these operators, we must make sure the built-in versions are not imported as well. This is the macro that gets invoked when we use the Elyanah.Numeric module:
    defmacro __using__(_opts) do
      quote do
        import Kernel, except: [*: 2, +: 2, -: 2, /: 2]
        import Elyanah.Numeric
      end
    end
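The same pattern can be reproduced in miniature. In the sketch below the module names TimesSketch and TimesDemo are hypothetical, only `*` is overridden, and lists are multiplied element-wise with guards instead of the library's Matrix and Array modules. It is meant to show the import mechanics, not the library's behavior.

```elixir
defmodule TimesSketch do
  # The defining module must also drop Kernel's `*`, or the local
  # definition below would clash with the imported one.
  import Kernel, except: [*: 2]

  defmacro __using__(_opts) do
    quote do
      # Hide the built-in operator in the calling module, then bring in ours.
      import Kernel, except: [*: 2]
      import TimesSketch
    end
  end

  # Lists are multiplied element-wise (an assumption made for this sketch).
  def a * b when is_list(a) and is_list(b) do
    Enum.zip(a, b) |> Enum.map(fn {x, y} -> Kernel.*(x, y) end)
  end

  # Everything else falls back to the built-in operator.
  def a * b, do: Kernel.*(a, b)
end

defmodule TimesDemo do
  use TimesSketch

  def lists, do: [1, 2, 3] * [4, 5, 6]
  def numbers, do: 6 * 7
end
```

TimesDemo.lists() returns [4, 10, 18], and TimesDemo.numbers() still returns 42 because the final clause delegates to Kernel.*.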
The full list of infix operators implemented for matrices and arrays is available here.
While there are certainly many more useful linear algebra operations that could be defined, my next goal is to implement some sort of DataFrame module.
Links to documentation and source code can be found on the Elyanah homepage.