It has been about four months since Google and the Julia organization kindly accepted my first GSoC proposal, and this exciting program is coming to an end. My project focused on improving the training process of Flux.jl, but since that did not fill three months of full-time work, I also made some enhancements to Knet.jl in the last few weeks.
Flux is a Julia ML library under active development. When I started participating in this program, Flux compiled an arbitrary Julia function wrapped by the @net macro into a computation graph, which could then run on the MXNet or TensorFlow backend. However, a few weeks ago it switched to a custom automatic differentiation implementation running on CUDAnative. The code base changed so much that it looks like a totally different project, but the idea is the same: you write the forward pass of your model in plain Julia and get the ability to train it on GPUs for free.
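To make that concrete, here is roughly what this looked like at the time; the names below (@net, Affine, Chain, mxnet, tf) are recalled from the old Flux API and are meant as an illustration rather than a reference:

```julia
using Flux

# The forward pass is ordinary Julia code; @net turned it into a graph node.
@net f(W, b, x) = σ.(W * x .+ b)

# Layers and containers composed such definitions into a model...
model = Chain(Affine(784, 128), relu, Affine(128, 10), softmax)

# ...which could then be handed to a backend and trained on the GPU.
mxmodel = mxnet(model)    # or tf(model) for the TensorFlow backend
```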
My main contributions are:
optimizers: Flux defaulted to plain SGD for training, which does not perform well for some models. I implemented several popular optimizers, including AdaGrad, RMSProp and Adam. The code is organized into components, so one can easily compose decay or momentum with custom components to make a new optimizer.
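The component idea looks roughly like this (the names below are illustrative, not the exact API that was merged): each component rewrites the gradient, and an optimizer is just a chain of components applied in order.

```julia
mutable struct Momentum
    ρ::Float64
    v::Any                   # velocity state, initialized lazily
end
Momentum(ρ = 0.9) = Momentum(ρ, nothing)

function apply!(m::Momentum, Δ)
    m.v === nothing && (m.v = zero(Δ))
    m.v = m.ρ .* m.v .+ Δ
    return m.v
end

mutable struct InvDecay
    γ::Float64
    n::Int                   # number of updates seen so far
end
InvDecay(γ = 1e-3) = InvDecay(γ, 0)

function apply!(d::InvDecay, Δ)
    d.n += 1
    return Δ ./ (1 + d.γ * d.n)
end

# An optimizer is a chain of components; composing momentum and decay with a
# learning rate gives an SGD variant.
function update!(param, grad, components; η = 0.01)
    for c in components
        grad = apply!(c, grad)
    end
    param .-= η .* grad
end

opt = [Momentum(0.9), InvDecay(1e-3)]
# update!(W, ∇W, opt) applies momentum, then decay, then the SGD step.
```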
batch training: Flux was originally designed to handle single instances and batches seamlessly, but the implementation was incomplete. I added a Batched type to iterate over data in batches and fixed several places so that batched input works on both backends.
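A batch iterator of this kind can be sketched as follows (illustrative, not the exact type that was added):

```julia
struct Batched{T}
    data::T
    size::Int
end

# Iterate over the wrapped collection `size` items at a time.
function Base.iterate(b::Batched, start = 1)
    start > length(b.data) && return nothing
    stop = min(start + b.size - 1, length(b.data))
    return b.data[start:stop], stop + 1
end

Base.length(b::Batched) = cld(length(b.data), b.size)

# Usage: feed the training loop 32 items per batch.
for batch in Batched(collect(1:100), 32)
    # train!(model, batch) ...
end
```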
training Julia models: Flux is aimed at compiling and running models on backends, but it is also useful to be able to test the training process in pure Julia. I implemented the back! method for the primitive layers, so a Chain of primitive layers can now be trained in pure Julia.
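The idea behind back! for a primitive layer can be sketched like this (field and helper names are hypothetical): given the gradient flowing in from the layer above, compute the parameter gradients and return the gradient to pass to the layer below.

```julia
mutable struct Affine
    W::Matrix{Float64}
    b::Vector{Float64}
    x::Vector{Float64}       # input saved during the forward pass
    ΔW::Matrix{Float64}      # accumulated weight gradient
    Δb::Vector{Float64}      # accumulated bias gradient
end

Affine(in::Int, out::Int) = Affine(0.01 .* randn(out, in), zeros(out),
                                   zeros(in), zeros(out, in), zeros(out))

function forward!(a::Affine, x)
    a.x = x                  # remember the input for the backward pass
    return a.W * x .+ a.b
end

function back!(a::Affine, Δ)
    a.ΔW .+= Δ * a.x'        # gradient w.r.t. the weights
    a.Δb .+= Δ               # gradient w.r.t. the bias
    return a.W' * Δ          # gradient w.r.t. the input, for the previous layer
end

# A chain of layers just calls back! in reverse order.
function back!(layers::Vector, Δ)
    for l in reverse(layers)
        Δ = back!(l, Δ)
    end
    return Δ
end
```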
training on the TensorFlow backend: When I first read Flux, it used the native training function provided by TensorFlow.jl. Later it switched to a custom implementation to stay unified with the MXNet backend, but unfortunately that implementation was broken. I fixed and tested training on the TensorFlow backend.
callback with throttle: Previously, Flux used a macro to call the callback function during training. I implemented the throttle function from underscore.js so things can be less magical.
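The essence of throttle is small enough to sketch here: wrap a callback so that it fires at most once per timeout period, no matter how often the training loop invokes it (a simplified version; loss, model and data in the usage line stand for whatever the training loop provides).

```julia
function throttle(f, timeout)
    last = -Inf
    return function (args...)
        if time() - last >= timeout
            last = time()
            f(args...)
        end
    end
end

# Usage: report the loss at most once per second, however often it is called.
callback = throttle(() -> println("loss: ", loss(model, data)), 1)
```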
Most of these changes have already been merged. However, since Flux is switching to a new architecture, some of them are outdated and some are broken; they will need to be rebased once Flux becomes relatively stable again.
Knet is another ML library, built from scratch. It contains an AD implementation, separated out as AutoGrad.jl, a KnetArray type that runs on the GPU, and a collection of neural network functions. Knet records the computation graph dynamically, so the user can freely run a model that contains branches or loops, without unrolling anything.
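A typical Knet forward pass, in the spirit of the examples from its documentation, looks roughly like this; the loop and the branch run as ordinary Julia code while AutoGrad records the operations they execute:

```julia
using Knet   # exports grad (from AutoGrad) and KnetArray

# The forward pass is plain Julia and may contain loops and branches.
function loss(w, x, y)
    for i in 1:2:length(w)
        x = w[i] * x .+ w[i+1]
        i < length(w) - 1 && (x = max.(0, x))   # relu on hidden layers only
    end
    return sum(abs2, x .- y)
end

lossgrad = grad(loss)                # gradient w.r.t. the first argument, w
w = Any[0.1 * randn(8, 4), zeros(8), 0.1 * randn(2, 8), zeros(2)]
g = lossgrad(w, randn(4, 5), randn(2, 5))   # g has the same structure as w
```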
My main contributions are:
support for Julia v0.6: One of the biggest breaking changes in Julia v0.6 is the new semantics of "dot operations": a series of dot operations is fused into a single broadcast call by the Julia compiler, which conflicts with the existing AutoGrad implementation. I implemented a little trick that circumvents the fusion using the dispatch mechanism, so the tests now pass on Julia v0.6.
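I will not reproduce the exact implementation here, but the general shape of such a dispatch-based unfusing trick can be sketched as follows (Broadcasted and unfuse are illustrative names; a real AD would additionally record each primitive broadcast on its tape):

```julia
# A fused call like `max.(0, x .+ b)` lowers to broadcast(anon, 0, x, b), where
# anon is (a, xi, bi) -> max(a, xi + bi). Fusion is syntactic, so it cannot be
# switched off, but it can be undone: re-run the fused function with wrapped
# arguments so that every primitive inside it broadcasts itself separately.

struct Broadcasted
    value
end

unfuse(f, args...) = f(map(Broadcasted, args)...).value

# One method per primitive: broadcast just that operation over the wrapped values.
Base.:+(a::Broadcasted, b::Broadcasted) = Broadcasted(broadcast(+, a.value, b.value))
Base.max(a::Broadcasted, b::Broadcasted) = Broadcasted(broadcast(max, a.value, b.value))

x, b = randn(3), randn(3)
y = unfuse((a, xi, bi) -> max(a, xi + bi), 0, x, b)   # same result as max.(0, x .+ b)
```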
modular interface: The primary API of Knet is grad, which takes a function and returns a new function that calculates the gradient with respect to the first argument. This means one needs to put all trainable parameters together in a single data structure and pass it to the forward function. For models involving many layers, this is annoying and error prone. I implemented a modular interface in the style of Flux or PyTorch, where each layer keeps and updates its own parameters. I also provided some common layers, which can be used as building blocks for more complicated models.
This summer I learned and practiced a lot about Julia and neural networks, with a stipend from Google. I really want to thank my mentor Mike J Innes, who kept pointing me in the right direction and helped out whenever I got stuck.