The BetaML.Nn Module

BetaML.NnModule
BetaML.Nn module

Implement the functionality required to define an artificial neural network, train it with data, forecast data and assess its performance.

Common types of layers and optimisation algorithms are already provided, but you can define your own by subtyping respectively the AbstractLayer and OptimisationAlgorithm abstract types.

The module provides the following types and functions. Use ?[type or function] to access their full signature and detailed documentation:

Model definition:

  • DenseLayer: Classical feed-forward layer with user-defined activation function
  • DenseNoBiasLayer: Classical layer without the bias parameter
  • VectorFunctionLayer: Layer whose activation function runs over the ensemble of its nodes rather than on each one individually. No learnable weights on input, optional learnable weights as parameters of the activation function.
  • ScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.
  • ReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function
  • ReshaperLayer: Reshape the output of a layer (or the input data) to the shape needed for the next one
  • PoolingLayer: In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.
  • ConvLayer: A generic N+1 (channels) dimensional convolutional layer
  • GroupedLayer: To stack several layers into a single layer, e.g. for multi-branches networks
  • NeuralNetworkEstimator: Build the chained network and define a cost function

Each layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax, ...) or one provided by you. BetaML will try to recognise whether it is a "known" function, for which it sets the exact derivative; otherwise you can normally provide it to the layer yourself. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.

You can alternatively implement your own layer by defining a new type as a subtype of the abstract type AbstractLayer. Each user-implemented layer must define the following methods (see the sketch after this list):

  • A suitable constructor
  • forward(layer,x)
  • backward(layer,x,next_gradient)
  • get_params(layer)
  • get_gradient(layer,x,next_gradient)
  • set_params!(layer,w)
  • size(layer)
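
As a minimal illustration (not part of the API reference above), the following sketch defines a hypothetical elementwise-scaling layer with a single learnable vector w. The exact import paths and the assumption that the tuple wrapped by Learnable is accessible via a data field may differ between BetaML versions; check the Learnable and AbstractLayer documentation before relying on it.

using BetaML
import Base.size
import BetaML.Nn: AbstractLayer, Learnable, forward, backward, get_params, get_gradient, set_params!

mutable struct ScaleLayer <: AbstractLayer       # hypothetical layer: y = w .* x
    w::Vector{Float64}
end
ScaleLayer(n::Integer) = ScaleLayer(ones(n))                                      # a suitable constructor

forward(l::ScaleLayer,x)                    = l.w .* x                            # layer output
backward(l::ScaleLayer,x,next_gradient)     = l.w .* next_gradient                # dloss/dx
get_params(l::ScaleLayer)                   = Learnable((l.w,))                   # current parameters
get_gradient(l::ScaleLayer,x,next_gradient) = Learnable((x .* next_gradient,))    # dloss/dw
set_params!(l::ScaleLayer,w)                = (l.w .= w.data[1])                  # assumes Learnable has a `data` field
size(l::ScaleLayer)                         = ((length(l.w),),(length(l.w),))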

Model fitting:

  • fit!(nn,X,Y): fitting function
  • fitting_info(nn): Default callback function during fitting
  • SGD: The classical optimisation algorithm
  • ADAM: A faster moment-based optimisation algorithm

To define your own optimisation algorithm define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and optionally init_optalg!(⋅) specific to it, as in the sketch below.
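
A minimal sketch of a user-defined optimiser (plain gradient descent with a fixed step). The dispatch on a positional opt_alg argument and the named-tuple (θ, stop) return value mirror what the built-in SGD appears to do, but both are assumptions to verify against the single_update! documentation below and the SGD source of your BetaML version.

using BetaML
import BetaML.Nn: OptimisationAlgorithm, single_update!

struct FixedStepGD <: OptimisationAlgorithm      # gradient descent with a fixed step size
    η::Float64
end
FixedStepGD(;η=0.01) = FixedStepGD(η)

function single_update!(θ,▽,opt_alg::FixedStepGD;n_epoch,n_batch,n_batches,xbatch,ybatch)
    θ = θ - ▽ .* opt_alg.η                       # Learnable defines the needed arithmetic
    return (θ=θ, stop=false)                     # assumed return convention: updated params + stop flag
end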

Model predictions and assessment:

  • predict(nn) or predict(nn,X): Return the output given the data

While high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices), it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single datapoint) expect both the input and the output as a column vector.
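
For example (illustrative only):

X = [1.8 2.5; 0.5 20.5; 0.6 18.0]    # high-level format: 3 records × 2 dimensions, as used by fit!/predict
x = X[1,:]                           # low-level format: a single record as a column vector, as used by forward(layer,x)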

source

Module Index

Detailed API

BetaML.Nn.ADAMType
ADAM(;η, λ, β₁, β₂, ϵ)

The ADAM algorithm, an adaptive moment estimation optimiser.

Fields:

  • η: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. fixed)]
  • λ: Multiplicative constant to the learning rate [def: 1]
  • β₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]
  • β₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]
  • ϵ: Epsilon value to avoid division by zero [def: 10^-8]
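Example (a sketch; the keyword names follow the signature above):

opt = ADAM(η = t -> 0.001/(1+0.01*t), β₁ = 0.9, β₂ = 0.999)   # decaying learning-rate schedule
m   = NeuralNetworkEstimator(opt_alg = opt, epochs = 100)
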
source
BetaML.Nn.ConvLayerType
struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

A generic N+1 (channels) dimensional convolutional layer

EXPERIMENTAL: Still too slow for practical applications

This convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out), and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out). If the input is a vector, use a ReshaperLayer in front.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • weight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka "filter" or "kernel") with respect to the input from the previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)

  • usebias::Bool: Whether to use (and learn) a bias weight [def: true]

  • bias::Vector{WET} where WET<:Number: Bias (nchannels_out array)

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • x_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocessing, itself done at the beginning of training)

  • y_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocessing, itself done at the beginning of training)

  • w_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocessing, itself done at the beginning of training)

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of the x(s) contributing to the given y

  • y_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of the corresponding w(s) contributing to the given y

source
BetaML.Nn.ConvLayerMethod
ConvLayer(
    input_size,
    kernel_size,
    nchannels_in,
    nchannels_out;
    stride,
    rng,
    padding,
    kernel_eltype,
    kernel_init,
    usebias,
    bias_init,
    f,
    df
) -> ConvLayer{_A, _B, _C, typeof(identity), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}

Instantiate a new nD-dimensional, possibly multichannel, convolutional layer (ConvLayer)

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on input_size, kernel_size, padding and stride, but always has nchannels_out as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels, or an nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input
  • nchannels_out: Number of channels in output

Keyword arguments:

  • stride: "Steps" by which the convolution moves across the various tensor dimensions [def: ones]
  • padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]
  • f: Activation function [def: relu]
  • df: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]
  • kernel_eltype: Kernel eltype [def: Float64]
  • kernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by the nchannels_in and nchannels_out dimensions
  • bias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given, it should be a nchannels_out vector of scalars.
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))
  • to retrieve the output size of the layer, use size(ConvLayer)[2]. The output size on each dimension d (except the last one, which is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • with strides higher than 1, the automatic padding is set to keep output_size = input_size/stride
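
For instance, a sketch of a 2D convolution (the expected sizes below are an assumption derived from the formula above, not verified output):

layer = ConvLayer((14,14), (3,3), 1, 8, f=relu)   # 14×14 input, 1 channel in, 8 channels out
size(layer)                                       # expected: ((14,14,1), (14,14,8)) with default padding and stride 1
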
source
BetaML.Nn.ConvLayerMethod
ConvLayer(
    input_size_with_channel,
    kernel_size,
    nchannels_out;
    stride,
    rng,
    padding,
    kernel_eltype,
    kernel_init,
    usebias,
    bias_init,
    f,
    df
) -> ConvLayer{_A, _B, _C, typeof(identity), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}

Alternative constructor for a ConvLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that one can use size(previous_layer)[2] if desired.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.DenseLayerType
struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer in the network

Fields:

  • w: Weights matrix with respect to the input from the previous layer or data (n × n of the previous layer)
  • wb: Biases (n)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseLayerMethod
DenseLayer(
    nₗ,
    n;
    rng,
    w_eltype,
    w,
    wb,
    f,
    df
) -> DenseLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}

Instantiate a new DenseLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (n,nₗ)]
  • wb: Initial weights with respect to bias [default: Xavier initialisation, dims = (n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
  • Specify df=nothing to explicitly use AD
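
Example (sketch):

using Random
l = DenseLayer(2, 6, f=relu, rng=MersenneTwister(123))   # 2 inputs, 6 nodes, relu activation, reproducible init
size(l)                                                  # should be ((2,), (6,))
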
source
BetaML.Nn.DenseNoBiasLayerType
struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a layer without bias in the network

Fields:

  • w: Weights matrix with respect to the input from the previous layer or data (n × n of the previous layer)
  • f: Activation function
  • df: Derivative of the activation function
source
BetaML.Nn.DenseNoBiasLayerMethod
DenseNoBiasLayer(
    nₗ,
    n;
    rng,
    w_eltype,
    w,
    f,
    df
) -> DenseNoBiasLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}

Instantiate a new DenseNoBiasLayer

Positional arguments:

  • nₗ: Number of nodes of the previous layer
  • n: Number of nodes

Keyword arguments:

  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: identity]
  • df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
source
BetaML.Nn.GroupedLayerType
struct GroupedLayer <: AbstractLayer

Representation of a "group" of layers, each of which operates on different inputs (features) and acting as a single layer in the network.

Fields:

  • layers: The individual layers that compose this grouped layer
source
BetaML.Nn.GroupedLayerMethod
GroupedLayer(layers) -> GroupedLayer

Instantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions, but without connecting all the inputs to all the outputs as a single DenseLayer would do.

Positional arguments:

  • layers: The individual layers that compose this grouped layer

Notes:

  • can be used to create composable neural networks with multiple branches
  • tested only with 1-dimensional layers. For convolutional networks, use ReshaperLayers before and/or after (see the sketch below).
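
A sketch of a two-branch group, assuming (as described above) that each sub-layer consumes its own slice of the input and that the outputs are concatenated:

gl = GroupedLayer([DenseLayer(2,4), DenseLayer(3,4)])   # first 2 features → branch 1, next 3 features → branch 2
size(gl)                                                # expected: ((5,), (8,))
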
source
BetaML.Nn.LearnableType

Learnable(data)

Structure representing the learnable parameters of a layer or its gradient.

The learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2} where N2 can change (e.g. we can have a layer whose first parameter is a matrix and whose second one is a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without committing "type piracy" with respect to Base tuples.

source
BetaML.Nn.NeuralNetworkE_hpType

mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet

Hyperparameters for the Feedforward neural network model

Parameters:

  • layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers

  • loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n x d) matrices, using dropdims inside if needed.

  • dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]

  • epochs: Number of epochs, i.e. passages through the whole training sample [def: 200]

  • batch_size: Size of each individual batch [def: 16]

  • opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]

  • shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]

  • tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).

To know the available layers type subtypes(AbstractLayer) and then type ?LayerName for information on how to use each layer.

source
BetaML.Nn.NeuralNetworkE_optionsType

NeuralNetworkE_options

A struct defining the options used by the Feedforward neural network model

Parameters:

  • cache: Whether to cache the results of the fitting stage, so as to allow predict(mod) [default: true]. Set it to false to save memory for large data.

  • descr: An optional title and/or description for this model

  • verbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]

  • cb: A callback function to provide information during training [def: fitting_info]

  • autotune: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control auto-tuning through the option tunemethod (see the model hyper-parameters)

  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

source
BetaML.Nn.NeuralNetworkEstimatorType

NeuralNetworkEstimator

A "feedforward" (but also multi-branch) neural network (supervised).

For the parameters see NeuralNetworkE_hp and for the training options NeuralNetworkE_options (we have a few more options for this specific estimator).

Notes:

  • data must be numerical
  • the label can be an n-records vector or an n-records by n-dimensions matrix, but the result is always a matrix.
    • For one-dimensional regressions drop the unnecessary dimension with dropdims(ŷ,dims=2)
    • For classification tasks the columns should normally be interpreted as the probabilities for each category

Examples:

  • Classification...
julia> using BetaML

julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];

julia> y = ["a","b","b","b","b","a"];

julia> ohmod = OneHotEncoder()
A OneHotEncoder BetaMLModel (unfitted)

julia> y_oh  = fit!(ohmod,y)
6×2 Matrix{Bool}:
 1  0
 0  1
 0  1
 0  1
 0  1
 1  0

julia> layers = [DenseLayer(2,6),DenseLayer(6,2),VectorFunctionLayer(2,f=softmax)];

julia> m      = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=300,verbosity=LOW)
NeuralNetworkEstimator - A Feed-forward neural network (unfitted)

julia> ŷ_prob = fit!(m,X,y_oh)
***
*** Training  for 300 epochs with algorithm ADAM.
Training..       avg ϵ on (Epoch 1 Batch 1):     0.4116936481380642
Training of 300 epoch completed. Final epoch error: 0.44308719831108734.
6×2 Matrix{Float64}:
 0.853198    0.146802
 0.0513715   0.948629
 0.0894273   0.910573
 0.0367079   0.963292
 0.00548038  0.99452
 0.808334    0.191666

julia> ŷ      = inverse_predict(ohmod,ŷ_prob)
6-element Vector{String}:
 "a"
 "b"
 "b"
 "b"
 "b"
 "a"
  • Regression...
julia> using BetaML

julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];

julia> y = 2 .* X[:,1] .- X[:,2] .+ 3;

julia> layers = [DenseLayer(2,6),DenseLayer(6,6),DenseLayer(6,1)];

julia> m      = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=3000,verbosity=LOW)
NeuralNetworkEstimator - A Feed-forward neural network (unfitted)

julia> ŷ      = fit!(m,X,y);
***
*** Training  for 3000 epochs with algorithm ADAM.
Training..       avg ϵ on (Epoch 1 Batch 1):     33.30063874270561
Training of 3000 epoch completed. Final epoch error: 34.61265465430473.

julia> hcat(y,ŷ)
6×2 Matrix{Float64}:
   4.1    4.11015
 -16.5  -16.5329
 -13.8  -13.8381
 -18.4  -18.3876
 -27.2  -27.1667
   2.7    2.70542
source
BetaML.Nn.PoolingLayerType
struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a pooling layer in the network (weightless)

EXPERIMENTAL: Still too slow for practical applications

In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel.

Fields:

  • input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannel_in as last dimension)

  • output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannel_out as last dimension)

  • kernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernel_size augmented by the nchannels_in and nchannels_out dimensions

  • padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)

  • padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)

  • stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride

  • ndims::Int64: Number of dimensions (excluding input and output channels)

  • f::Function: Activation function

  • df::Union{Nothing, Function}: Derivative of the activation function

  • y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of the x(s) contributing to the given y

source
BetaML.Nn.PoolingLayerMethod
PoolingLayer(
    input_size,
    kernel_size,
    nchannels_in;
    stride,
    kernel_eltype,
    padding,
    f,
    df
) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}

Instantiate a new nD-dimensional, possibly multichannel PoolingLayer

The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension. The output size depends on input_size, kernel_size, padding and stride, but always has the same number of channels as the input as its last dimension.

Positional arguments:

  • input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
  • kernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels, or an nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
  • nchannels_in: Number of channels in input (and, since pooling works per channel, also in output)

Keyword arguments:

  • stride: "Steps" by which the pooling window moves across the various tensor dimensions [def: kernel_size, i.e. each x contributes to a single y]
  • padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep output_size = input_size/stride]
  • kernel_eltype: Kernel eltype [def: Float64]
  • f: Activation function. It should take a vector as input and produce a scalar as output [def: maximum]
  • df: Derivative (gradient) of the activation function for the various inputs [default: nothing (i.e. use AD)]

Notes:

  • to retrieve the output size of the layer, use size(PoolingLayer)[2]. The output size on each dimension d (except the last one, which is given by the number of channels) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
  • differently from a ConvLayer, pooling always applies at the single-channel level, so the output always has the same number of channels as the input. If you want to reduce the number of channels, either use a ConvLayer with the desired number of output channels or use a ReshaperLayer to add a further 1-element dimension that will be treated as "channel", choosing the desired stride for the last pooling dimension (the one that was originally the channel dimension)
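
For example (a sketch; the expected sizes follow from the formula above and are an assumption, not verified output):

pl = PoolingLayer((14,14), (2,2), 8)    # 2×2 max pooling (f defaults to maximum) on a 14×14, 8-channel input
size(pl)                                # expected: ((14,14,8), (7,7,8)) with the default stride == kernel_size
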
source
BetaML.Nn.PoolingLayerMethod
PoolingLayer(
    input_size_with_channel,
    kernel_size;
    stride,
    padding,
    f,
    kernel_eltype,
    df
) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}

Alternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that one can use size(previous_layer)[2] if desired.

For arguments and default values see the documentation of the main constructor.

source
BetaML.Nn.ReshaperLayerType
struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer

Representation of a "reshaper" (weigthless) layer in the network

Reshape the output of a layer (or the input data) to the shape needed for the next one.

Fields:

  • input_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN: Input size

  • output_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT: Output size

source
BetaML.Nn.ReshaperLayerType
ReshaperLayer(
    input_size
) -> ReshaperLayer{_A, _B} where {_A, _B}
ReshaperLayer(
    input_size,
    output_size
) -> ReshaperLayer{_A, _B} where {_A, _B}

Instantiate a new ReshaperLayer

Positional arguments:

  • input_size: Shape of the input layer (tuple).
  • output_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].
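
Example (sketch): flattening the 7×7×8 output of a convolution/pooling stack before a DenseLayer:

rl  = ReshaperLayer((7,7,8), (392,))   # explicit output shape
rl2 = ReshaperLayer((7,7,8))           # default: reshape to a vector of length prod((7,7,8)) == 392
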
source
BetaML.Nn.SGDType
SGD(;η=t -> 1/(1+t), λ=2)

Stochastic Gradient Descent algorithm (default)

Fields:

  • η: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]
  • λ: Multiplicative constant to the learning rate [def: 2]
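Example (sketch):

opt = SGD(η = t -> 1/(1+0.01*t), λ = 1)                   # slower decay, no multiplicative boost
m   = NeuralNetworkEstimator(opt_alg = opt, epochs = 500)
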
source
BetaML.Nn.ScalarFunctionLayerType
struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e., without passing through a weight matrix), but using an optional learnable parameter (an array) passed as second argument, similarly to VectorFunctionLayer. Differently from VectorFunctionLayer, the function is applied scalarwise to each node.

The number of nodes in input must be set to the same as in the previous layer.

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • n: Number of nodes in output (≡ number of nodes in input)
  • f: Activation function
  • dfx: Derivative of the activation function with respect to the layer inputs (x)
  • dfw: Derivative of the activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is the same as that of the previous layer.
source
BetaML.Nn.ScalarFunctionLayerMethod
ScalarFunctionLayer(
    nₗ;
    rng,
    wsize,
    w_eltype,
    w,
    f,
    dfx,
    dfw
) -> ScalarFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}

Instantiate a new ScalarFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
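
Example (sketch): applying tanh elementwise, with no learnable parameters, after a DenseLayer with identity activation:

layers = [DenseLayer(2,6,f=identity), ScalarFunctionLayer(6, f=tanh), DenseLayer(6,1)]
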
source
BetaML.Nn.VectorFunctionLayerType
struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer

Representation of a VectorFunction layer in the network. A vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as "normal" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a "pool" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only 1D) you can pass a learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and, if you are using this for classification, the previous layer must be set equal to the number of classes in the predictions).

Fields:

  • w: Weights (parameter) array passed as second argument to the activation function (if not empty)
  • nₗ: Number of nodes in input (i.e. length of the previous layer)
  • n: Number of nodes in output (automatically inferred in the constructor)
  • f: Activation function (vector)
  • dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
  • dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)

Notes:

  • The output size of this layer is given by the size of the output of the activation function, which is not necessarily the same as that of the previous layer.

source
BetaML.Nn.VectorFunctionLayerMethod
VectorFunctionLayer(
    nₗ;
    rng,
    wsize,
    w_eltype,
    w,
    f,
    dfx,
    dfw,
    dummyDataToTestOutputSize
) -> VectorFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}

Instantiate a new VectorFunctionLayer

Positional arguments:

  • nₗ: Number of nodes (must be same as in the previous layer)

Keyword arguments:

  • wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
  • w_eltype: Eltype of the weights [def: Float64]
  • w: Initial weights with respect to input [default: Xavier initialisation, dims = (nₗ,n)]
  • f: Activation function [def: softmax]
  • dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
  • dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
  • dummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
  • To avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match with the activation function you are setting
  • Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
source
Base.sizeMethod
size(layer)

Get the size of the layer in terms of (size in input, size in output) - both as tuples

Notes:

  • You need to use import Base.size before defining this function for your layer
source
Base.sizeMethod
size(layer::ConvLayer) -> Tuple{Tuple, Tuple}

Get the dimensions of the layer in terms of (dimensions in input, dimensions in output), including channels as the last dimension

source
Base.sizeMethod
size(
    layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}
) -> Tuple{Tuple, Tuple}

Get the dimensions of the layer in terms of (dimensions in input, dimensions in output), including channels as the last dimension

source
BetaML.Nn.ReplicatorLayerMethod
ReplicatorLayer(
    n
) -> ScalarFunctionLayer{_A, typeof(identity), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}

Create a weightless layer whose output is equal to the input.

Fields:

  • n: Number of nodes in output (≡ number of nodes in input )

Notes:

  • The output size of this layer is the same as that of the previous layer.
  • This is just an alias for a ScalarFunctionLayer with no weights and the identity function.
source
BetaML.Nn.backwardMethod
backward(layer,x,next_gradient)

Compute backpropagation for this layer with respect to its inputs

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer's inputs
source
BetaML.Nn.fitting_infoMethod

fitting_info(nn,xbatch,ybatch,x,y;n,n_batches,epochs,epochs_ran,verbosity,n_epoch,n_batch)

Default callback function to display information during training, depending on the verbosity level

Parameters:

  • nn: Worker network
  • xbatch: Batch input to the network (batch_size,d_in)
  • ybatch: Batch label input (batch_size,d_out)
  • x: Full input to the network (n_records,d_in)
  • y: Full label input (n_records,d_out)
  • n: Size of the full training set
  • n_batches: Number of batches per epoch
  • epochs: Number of epochs defined for the training
  • epochs_ran: Number of epochs already ran in previous training sessions
  • verbosity: Verbosity level defined for the training (NONE,LOW,STD,HIGH,FULL)
  • n_epoch: Counter of the current epoch
  • n_batch: Counter of the current batch

Notes:

  • Reporting the error (loss of the network) is expensive. Use verbosity=NONE for better performance
source
BetaML.Nn.forwardMethod
forward(layer,x)

Predict the output of the layer given the input

Parameters:

  • layer: Worker layer
  • x: Input to the layer

Return:

  • An Array{T,1} of the prediction (even for a scalar)
source
BetaML.Nn.forwardMethod
forward(
    layer::ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
    x
) -> Any

Compute forward pass of a ConvLayer

source
BetaML.Nn.forwardMethod
forward(
    layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
    x
) -> Any

Compute forward pass of a PoolingLayer

source
BetaML.Nn.get_gradientMethod
get_gradient(layer,x,next_gradient)

Compute backpropagation for this layer with respect to the layer weights

Parameters:

  • layer: Worker layer
  • x: Input to the layer
  • next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)

Return:

  • The evaluated gradient of the loss with respect to this layer's trainable parameters, as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
source
BetaML.Nn.get_gradientMethod

get_gradient(nn,x,y)

Low-level function that retrieves the current gradient of the weights (i.e. the derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9

Parameters:

  • nn: Worker network
  • x: Input to the network (d,1)
  • y: Label input (d,1)

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.get_paramsMethod
get_params(layer)

Get the layer's current value of its trainable parameters

Parameters:

  • layer: Worker layer

Return:

  • The current value of the layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
source
BetaML.Nn.get_paramsMethod

get_params(nn)

Retrieve the current weights

Parameters:

  • nn: Worker network

Notes:

  • The output is a vector of tuples of each layer's input weights and bias weights
source
BetaML.Nn.init_optalg!Method
init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)

Initialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds

source
BetaML.Nn.init_optalg!Method

init_optalg!(opt_alg;θ,batch_size,x,y)

Initialize the optimisation algorithm

Parameters:

  • opt_alg: The Optimisation algorithm to use
  • θ: Current parameters
  • batch_size: The size of the batch
  • x: The training (input) data
  • y: The training "labels" to match
  • rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]

Notes:

  • Only a few optimisers need this function and consequently override it. By default it does nothing, so if you write your own optimiser and don't need to initialise it, you don't have to override this method
source
BetaML.Nn.preprocess!Method
preprocess!(layer::AbstractLayer)

Preprocess the layer with information known at layer creation (i.e. no data info used)

This function is used for some layers to cache some computation that doesn't require the data and it is called at the beginning of fit!. For example, it is used in ConvLayer to store the ids of the convolution.

Notes:

  • as it doesn't depend on data, it is not reset by reset!
source
BetaML.Nn.set_params!Method
set_params!(layer,w)

Set the trainable parameters of the layer with the given values

Parameters:

  • layer: Worker layer
  • w: The new parameters to set (Learnable)

Notes:

  • The format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.
source
BetaML.Nn.set_params!Method

set_params!(nn,w)

Update the weights of the network

Parameters:

  • nn: Worker network
  • w: The new weights to set
source
BetaML.Nn.single_update!Method

single_update!(θ,▽;n_epoch,n_batch,n_batches,xbatch,ybatch,opt_alg)

Perform the parameters update based on the average batch gradient.

Parameters:

  • θ: Current parameters
  • ▽: Average gradient of the batch
  • n_epoch: Count of current epoch
  • n_batch: Count of current batch
  • n_batches: Number of batches per epoch
  • xbatch: Data associated to the current batch
  • ybatch: Labels associated to the current batch
  • opt_alg: The Optimisation algorithm to use for the update

Notes:

  • This function is overridden so that each optimisation algorithm implements its own version
  • Most parameters are not used by any optimisation algorithm. They are provided to support the largest possible class of optimisation algorithms
  • Some optimisation algorithms may change their internal structure in this function
source