This was the documentation for a project I completed as part of my compilers class. It’s a language for defining neural networks, which are then optimized and compiled into efficient, executable C++ code. The code of the compiler needs some refining and will be released shortly. Also, note that the entire project was completed in around 48 hours, so forgive any errors and lack of foresight.
Input to the compiler is a .nnc file. The file is divided into two sections: the model properties and the model definition.
Each property of the model is preceded by ~. Currently the following properties are available: NAME, DTYPE, BATCH_SIZE, LOSS, and OPTIMIZER.
Example of model properties:
~NAME: example_neural_network
~DTYPE: float32
~BATCH_SIZE: 64
~LOSS: crossentropy
~OPTIMIZER: SGD(0.1)
With the properties of the model finished, we now need to define the model.
Every tensor in the definition has @ before its name, and every tensor has BATCH_SIZE as the 0-th dimension. All other variables are considered scalars if they don’t have @ preceding them.
Only single-input and single-output networks are supported.
NNC has some primitive modules defined that can directly emit machine code: INPUT, DENSE, ACTIVATION, DROPOUT, and OUTPUT.
Example of model definition:
@A: INPUT({28*28})
@B: DENSE(@A, (100))
@C: ACTIVATION(@B, relu)
@R: DROPOUT(@C, d)
@D: DENSE(@R, (10))
@E: ACTIVATION(@D, softmax)
OUTPUT(@E)
All mathematical operations that need to be evaluated prior to compilation can be wrapped in {mathematical_op}. You can assign values to scalars in preprocessors.
Example:
{a = 12}
{b = a + 1}
@A: DENSE(@B, ({a * b}))
We can define a reusable module as a composition of primitive modules in a single block in the following way:
block BLOCK_NAME(@tensor1, @tensor2, param1, param2) [
...
spit @OUT
]
where spit is used to define the output of a block. A block can only spit a single tensor. A block can take any number of tensors and scalars as params. No preprocessor operations are allowed in a block.
! is used for single-line comments. No multiline comments are supported.
You can compile a network using the following command:
nncgo filename.nnc
This generates two files: filename.h and filename.so.
filename.h will contain a class with the same name as given in the ~NAME property.
Instantiating an object of the class creates the required weight matrices. If filename.h5 is in the same directory, the weights are loaded from that file.
The class has the following methods:
forward(): takes a tensor as input and returns a tensor as output.
minimize(): takes a tensor as input and returns the loss as output; performs one step of optimization.
finalize(): writes the weight matrices into filename.h5, which can be loaded up later.
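As a rough usage sketch (the Tensor type, the data-loading helpers, and the exact method signatures here are assumptions, not the real generated API), training and saving the example network from C++ could look like this:

#include "example_neural_network.h"  // generated header; the class name comes from ~NAME

int main() {
    // Constructing the object creates the weight matrices; if
    // example_neural_network.h5 is present, the weights are loaded from it.
    example_neural_network net;

    // "Tensor", get_images() and get_labels() are placeholders for whatever
    // tensor type the generated header exposes and for user-provided data.
    Tensor images = get_images();  // shape (BATCH_SIZE, 28*28)
    Tensor labels = get_labels();  // shape (BATCH_SIZE, 10)

    Tensor predictions = net.forward(images);   // forward pass, returns a tensor
    auto loss          = net.minimize(labels);  // one optimization step, returns the loss

    net.finalize();  // write the weights out to example_neural_network.h5
    return 0;
}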
The .nnc file goes through multiple phases:
First, inline Python is used to compute all the preprocessor operations.
Next, each block call is expanded into a pure module definition. A pure module definition uses only primitive modules, without any blocks or preprocessors. This is written to a .nncpure file.
We use flex and bison to convert the .nncpure file into a DAG, where each edge is an operation and each node is an intermediate tensor result.
We optimize the graph by fusing operations into faster equivalents, and we remove dead-end operations that aren’t used for the calculation of the output tensor. Examples: consecutive matmuls, consecutive transposes, matmul + transpose, etc.
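As an illustration of the kinds of algebraic rewrites involved (these identities are examples, not the compiler’s exact rule set):

(X W_1) W_2 = X (W_1 W_2)   (consecutive matmuls collapse into a single matmul when W_1 W_2 can be precomputed)
(X^T)^T = X                 (consecutive transposes cancel)
(A B)^T = B^T A^T           (a matmul followed by a transpose becomes a single matmul on transposed operands)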
We generate C++ code using the optimized graph as well as the model properties. We make calls to BLAS functions for the various primitive modules. These functions are generated along with functionality to load and store weights.
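To give a flavour of the emitted code, here is a hand-written approximation (not the compiler’s actual output) of what the lowered code for @B: DENSE(@A, (100)) might look like, assuming the DENSE module includes a bias term and calls a gemm-style helper such as the matmul sketched further below:

// Illustrative only: names, signatures, and layout are assumptions.
void dense_forward(const float* A,  // input,  BATCH_SIZE x 784
                   const float* W,  // weight, 784 x 100
                   const float* b,  // bias,   100
                   float* B,        // output, BATCH_SIZE x 100
                   int batch) {
    // B = A * W, via a gemm-style BLAS routine (see the matmul sketch below)
    matmul(A, W, B, batch, 784, 100);
    // add the bias to every output row
    for (int i = 0; i < batch; ++i)
        for (int j = 0; j < 100; ++j)
            B[i * 100 + j] += b[j];
}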
The generated C++ file is compiled using LLVM (clang) into the final .h and .so files.
The .nnc to .nncpure conversion is done using Python. The .nncpure to machine code compilation is done using C++ along with flex and bison.
Most of the BLAS operations are self-implemented, but they can easily be replaced with faster alternatives such as Intel MKL or OpenBLAS. We can extend this further to use CUDA/cuDNN or OpenCL.
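For reference, a minimal self-implemented matmul of the kind such a routine could be (a sketch, not the project’s actual code); swapping in OpenBLAS or MKL then mostly amounts to replacing the body with a single cblas_sgemm call:

// Naive row-major matmul: C (m x n) = A (m x k) * B (k x n).
void matmul(const float* A, const float* B, float* C, int m, int k, int n) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

// Equivalent call with OpenBLAS/MKL (CBLAS interface):
// cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
//             m, n, k, 1.0f, A, k, B, n, 0.0f, C, n);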
Written on April 5th, 2019 by Dheeraj R. Reddy