Proposed code generation system
===============================

Overview
++++++++

The proposed code generation system has the following stages, illustrated in
``/brian2/codegen/temp/example_state_update.py``:

1. Start with an 'abstract' code string which might be generated by a state
   updater, e.g. if the equations were::

        dV/dt = x : volt
        x = -V/tau : volt/second
        tau : second

   then the abstract code would be::
    
        _tmp_V = x
        V += _tmp_V*dt 

2. Translate into intermediate statements, which are language-independent, but
   already do some work::

        x := -V/tau (subexpression)
        _tmp_V := x (constant)
        V += _tmp_V*dt (in-place) 
        
3. Translate statements to a snippet in a specific language, e.g. Python::

        tau = _array_tau
        V = _array_V
        x = -V/tau
        _tmp_V = x
        V += _tmp_V*dt

   C++::
    
        const double tau = _array_tau[_neuron_idx];
        double V = _array_V[_neuron_idx];
        double x = -V/tau;
        const double _tmp_V = x;
        V += _tmp_V*dt;
        _array_V[_neuron_idx] = V;

4. Insert into a template for compilable code, e.g. C::

        for(int _neuron_idx=0; _neuron_idx<_num_neurons; _neuron_idx++)
        {
            const double tau = _array_tau[_neuron_idx];
            double V = _array_V[_neuron_idx];
            double x = -V/tau;
            const double _tmp_V = x;
            V += _tmp_V*dt;
            _array_V[_neuron_idx] = V;
        }
        
   CUDA::
   
        __global__ stateupdate(int _num_neurons, double t, double dt)
        {
            const int _neuron_idx = threadIdx.x+blockIdx.x*blockDim.x;
            if(_neuron_idx>=_num_neurons) return;
            const double tau = _array_tau[_neuron_idx];
            double V = _array_V[_neuron_idx];
            double x = -V/tau;
            const double _tmp_V = x;
            V += _tmp_V*dt;
            _array_V[_neuron_idx] = V;
        }
        
5. Create a :class:`CodeObject`, compile in a given namespace, run.

Note that the code generated by the example is slightly more complicated than
I've shown above, because it has a few more features (described below).

Details
+++++++

Abstract code
-------------

The abstract code is parsed as a sequence of statements of the form::

    variable operator expression
    
At the moment, this abstract code is just given as a multiline string, and
parsing is done by regexp, although this could be changed. The transformation
into the intermediate representation is handled by :func:`make_statements`,
described below.
    
Intermediate representation
---------------------------

The intermediate representation consists of a series of :class:`Statement`
objects, which encode the variable, operator and expression, as well as
additional information. In this representation we have an additional operator
``:=`` (generated automatically from the abstract code) which says that the
statement is defining a new variable (information that is important later in
generating code in particularly C++). We also have additional information,
including the dtype of the variable, whether it is being defined as constant,
whether the operator is in-place or not and whether it is the definition of a
subexpression. This information is used later.

Snippet representation
----------------------

The sequence of statements is converted into a code snippet by the
:class:`Langauge` object, specifically the method
:meth:`Language.translate_statement_sequence`. This
returns code that can be inserted into a template, which is either just a
simple string (in which case it is inserted into the default ``%CODE%`` slot
in the template), or a dict of ``(slot, code)`` pairs, in which case the
different parts of the code are inserted into different slots in the template.

The languages are defined in the ``brian2.codegen.languages`` package, and the
idea is that they should be self-contained so that others can write additional
languages without changing the Brian core.

Code representation
-------------------

This is a multiline string giving an executable section of code, but it has no
namespace attached yet.

:class:`CodeObject`
-------------------

This is an executable object that has the following behaviour:

* Initialise as ``codeobj = CodeObject(code_string)``
* Compile in namesapce with ``codeobj.compile(namespace)``
* Execute with additional variables via ``codeobj(name1=val1, name2=val2)``,
  these variables are added to the namespace (for variables which change during
  a run such as ``t``).

Abstract to intermediate
------------------------

In translation from the abstract to intermediate representations, in function
:func:`make_statements` in ``brian2.codegen.translation``,
we do some inference on the sequence to add some additional information and
possibly additional statements. To do this, we give extra information in the
form of a dict of :class:`Specifier` objects (defined in
``brian2.codegen.specifiers``). This is
a dict of ``(name, spec)`` pairs, where ``spec`` is a :class:`Specifier`.

The specifiers are defined in ``brian2.codegen.specifiers``:

``Value(dtype)``
    The variable is a single scalar value that will be inserted into the
    namespace at runtime

``ArrayVariable(array_name, index, dtype)``
    The variable is a value that comes from an array with the given array name
    and dtype, with the given index variable.

``Index(all=True)``
    An array index, the actual creation of and iterating over index values is
    done by the template. The ``all=True/False`` keyword is used for
    optimisation in vectorised languages, i.e. you do slightly different things
    depending on whether you're operating on the whole vector or a subvector.

``Subexpression(expr)``
    The variable is a common subexpression, these would probably come from the
    Equations objects and be specified explicitly by the user. They're a strong
    indication that something is going to be computed once and used multiple
    times, which could be important for optimisation.

``OutputVariable(dtype)``
    The variable will be created by the sequence of statements and is
    considered as part of the output of the function (e.g. ``_cond`` for
    thresholds).

``Function()``
    The variable is a function name (haven't quite decided how this should be
    handled yet). 

The :func:`make_statements` function works by going through, line by line, the
sequence of statements, and seeing what needs to be done to make it work, and
adding extra information that can be used when translating to a specific
language. More precisely, it does the following:

* For each ``var = expr`` statement, it works out if it is assigning a value to
  an existing variable or creating a new variable, in which case it replaces it
  by 'var := expr'.

* For each ``var := expr`` statement, it works out if the variable is constant
  or not (allowing C++ to add the const keyword), by checking if it is
  subsequently written to or not.

* It creates new variables for the subexpressions if and when they are needed,
  and it checks if they need to be recomputed or not (i.e. if we have an
  expression ``x=y*z`` and we do ``a+=x; y+=1; b+=x`` then we need to translate
  this to: ``x=y*z; a+=x; y+=1; x=y*z; b+=x`` because the ``y+=1`` step
  invalidates the previous ``x=y*z`` computation).

If you run ``translation.py`` you will see this in action in the console output. 

Intermediate to snippet
-----------------------

The function :func:`brian2.codegen.tranlation.translate` just calls
:func:`make_statements` and then calls
:meth:`Language.translate_statement_sequence` on the sequence of statements
produced. This method, defined separately for each language, can make use of
the additional methods :meth:`Language.translate_expression` and
:meth:`Language.translate_statement`. The idea of these two methods is that
they are primarily used for translating the syntax of expressions and
statements from Python format (the input format) to the chosen language. So,
for example for C++ we need to translate ``x**2`` to ``pow(x,2)``. Fortunately,
``sympy`` comes to the rescue and does this for us. Here is the definition
of :meth:`CLanguage.translate_expression` for example::

    def translate_expression(self, expr):
        expr = sympy.sympify(expr)
        return sympy.printing.ccode.CCodePrinter().doprint(expr)
        
At this stage, the extra information available in :class:`Statement` is used,
for example to add the C++ ``const`` specifier. To translate numpy dtypes into
C++ declarations, there is a function
:func:`brian2.codegen.languages.c.c_data_type`.

In addition to the syntax declarations, the
:meth:`Language.translate_statement_sequence` method also handles any
additional declarations of intermediate variables, reading and writing of values
to arrays, and so forth. For example, in C++ we read from an array like so::

    double V = _array_V[_neuron_idx];
    
and write back to like so::

    _array_V[_neuron_idx] = V;

The base class method :meth:`Language.array_read_write` returns a pair of sets
``(read, write)`` giving the names of the variables which need to be read and
those which need to be written to.

In fact, the C++ class also generates some additional pointers using the C++
``restrict`` keyword, so that the actual code generated for the example used
throughout is a pair, first of pointer declarations::

    double * __restrict__ _ptr_array_tau = _array_tau;
    double * __restrict__ _ptr_array_V = _array_V;
    
and then of the inner loop code::

    const double tau = _ptr_array_tau[_neuron_idx];
    double V = _ptr_array_V[_neuron_idx];
    double x = -V/tau;
    const double _tmp_V = x;
    V += _tmp_V*dt;
    _ptr_array_V[_neuron_idx] = V;
    
The restrict keyword tells the C++ compiler that any value accessed by that
pointer will not be accessed by any other pointer, which allows us it to make
some optimisations that are nice for CPU but are extremely important for GPU
where global memory accesses are slow.

Snippet to code
---------------

The final step is to select a template and insert the snippet into the template.
The templates are provided by the ``Language.template_*`` methods::

    def template_iterate_all(self, index, size):
        raise NotImplementedError

    def template_iterate_index_array(self, index, array, size):
        raise NotImplementedError

    def template_state_update(self):
        return self.template_iterate_all('_neuron_idx', '_num_neurons')

    def template_reset(self):
        return self.template_iterate_index_array('_neuron_idx', '_spikes', '_num_spikes')

    def template_threshold(self):
        raise NotImplementedError

    def template_synapses(self):
        raise NotImplementedError 

The generic templates ``iterate_all`` and ``iterate_index_array`` are there for
future flexibility, they provide a template that takes a given index and
iterates either over all values or over a subset of values given by an index
array. At the moment they are just used by ``state_update`` and ``reset``
though. The methods that raise ``NotImplementedError`` need to be defined
for each language.

The insertion into a template is handled by the :meth:`Language.apply_template`
method which in turn uses the :func:`apply_code_template` function defined
in ``brian2.codegen.templating``. Templates consist of a string with
multiple slots or placeholders of the form ``%CODE%`` (the default slot) which
are replaced by values returned by the
:meth:`Language.translate_statement_sequence` method (see above). The
templating functions simply handle these replacements, and correct indentation
of the code (important for Python and nice for readability).

Code to :class:`CodeObject`
---------------------------

The final code is passed to the :class:`CodeObject` which is language-specific
and defined in the file, e.g. ``brian2.codegen.languages.c``, etc.

Example usage
+++++++++++++

This example is a simplified version of
``brian2.codegen.temp.example_state_update``::

    # Not used, but would be the source of the rest
    eqs = '''
    dV/dt = x : volt
    x = -V/tau : volt/second
    tau : second
    '''
    
    language = PythonLanguage()
    
    # This would be the output of the numerical integration step maker, and
    # is the input to code generation
    abstract = '''
    _tmp_V = x
    V += _tmp_V*dt
    '''
    
    # These are the specifiers for the corresponding NeuronGroup
    specifiers = {
        'V':ArrayVariable('_array_V', '_neuron_idx', float64),
        'tau':ArrayVariable('_array_tau', '_neuron_idx', float64),
        'x':Subexpression('-V/tau'),
        'dt':Value(float64),
        '_neuron_idx':Index(all=True),
        }
    
    # Intermediate representation
    intermediate = make_statements(abstract, specifiers, float64)

    # Snippet representation
    snippet = translate(abstract, specifiers, float64, lang)
    
    # Code representation
    code = language.apply_template(snippet, language.template_state_update())
    
    # CodeObject representation
    codeobj = lang.code_object(code)
    
    # Creation of namespace
    V = pylab.rand(N)
    tau = pylab.ones(N)*30*0.001
    dt = 0.001
    namespace = {
        '_array_V': V,
        '_array_tau': tau,
        '_num_neurons': N,
        'dt': dt,
        }
    
    # Compilation in namespace
    codeobj.compile(namespace)
    
    # Run for 100 steps
    for t in pylab.arange(100)*dt:
        codeobj(t=t)

The output of the full example looks like this::

    EQUATIONS:
    
    dV/dt = x : volt
    x = -V/tau : volt/second
    tau : second
    
    ABSTRACT CODE:
    
    _tmp_V = x
    V += _tmp_V*dt
    
    INTERMEDIATE STATEMENTS:
    
    x := -V/tau (subexpression)
    _tmp_V := x (constant)
    V += _tmp_V*dt (in-place)
    
    PythonLanguage
    ==============
    
    tau = _array_tau
    V = _array_V
    x = -V/tau
    _tmp_V = x
    V += _tmp_V*dt
    
    NumexprPythonLanguage
    =====================
    
    tau = _array_tau
    V = _array_V
    x = _numexpr.evaluate("-V/tau")
    _tmp_V = x
    V += _numexpr.evaluate("_tmp_V*dt")
    
    CLanguage
    =========
    
    #define CSR_FLUSH_TO_ZERO         (1 << 15)
    unsigned csr = __builtin_ia32_stmxcsr();
    csr |= CSR_FLUSH_TO_ZERO;
    __builtin_ia32_ldmxcsr(csr);
        
    double * __restrict__ _ptr_array_tau = _array_tau;
    double * __restrict__ _ptr_array_V = _array_V;
    for(int _neuron_idx=0; _neuron_idx<_num_neurons; _neuron_idx++)
    {
        const double tau = _ptr_array_tau[_neuron_idx];
        double V = _ptr_array_V[_neuron_idx];
        double x = -V/tau;
        const double _tmp_V = x;
        V += _tmp_V*dt;
        _ptr_array_V[_neuron_idx] = V;
    }
    
    CUDALanguage
    ============
    
    __global__ stateupdate(int _num_neurons, double t, double dt)
    {
        const int _neuron_idx = threadIdx.x+blockIdx.x*blockDim.x;
        if(_neuron_idx>=_num_neurons) return;
        double * __restrict__ _ptr_array_tau = _array_tau;
        double * __restrict__ _ptr_array_V = _array_V;
        const double tau = _ptr_array_tau[_neuron_idx];
        double V = _ptr_array_V[_neuron_idx];
        double x = -V/tau;
        const double _tmp_V = x;
        V += _tmp_V*dt;
        _ptr_array_V[_neuron_idx] = V;
    }
    
    
    Timings
    =======
    
    Num neurons = 100000
    Duration = 1.0 s
    
    PythonLanguage: 1.15605783463
    NumexprPythonLanguage: 1.53245091438
    CLanguage: 0.414674043655
    CUDALanguage: not implemented
