The BetaML.Nn Module
BetaML.Nn
— Module
Implements the functionality required to define an artificial neural network, train it with data, forecast data and assess its performance.
Common types of layers and optimisation algorithms are already provided, but you can define your own by subtyping, respectively, the AbstractLayer and OptimisationAlgorithm abstract types.
The module provides the following types and functions. Use ?[type or function] to access their full signature and detailed documentation:
Model definition:
- DenseLayer: Classical feed-forward layer with user-defined activation function
- DenseNoBiasLayer: Classical layer without the bias parameter
- VectorFunctionLayer: Layer whose activation function runs over the ensemble of its nodes rather than on each one individually. No learnable weights on input, optional learnable weights as parameters of the activation function.
- ScalarFunctionLayer: Layer whose activation function runs over each node individually, like a classic DenseLayer, but with no learnable weights on input and optional learnable weights as parameters of the activation function.
- ReplicatorLayer: Alias for a ScalarFunctionLayer with no learnable parameters and identity as activation function
- ReshaperLayer: Reshapes the output of a layer (or the input data) to the shape needed for the next one
- PoolingLayer: In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel. Weightless.
- ConvLayer: A generic N+1 (channels) dimensional convolutional layer
- GroupedLayer: Stacks several layers into a single layer, e.g. for multi-branch networks
- NeuralNetworkEstimator: Builds the chained network and defines a cost function
Each layer can use a default activation function, one of the functions provided in the Utils module (relu, tanh, softmax, ...) or one provided by you. BetaML will try to recognise whether it is a "known" function for which it can set the exact derivative, otherwise you can normally provide it to the layer yourself. If the derivative of the activation function is not provided (either manually or automatically), AD will be used and training may be slower, although this difference tends to vanish with bigger datasets.
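For example, a custom activation function and its analytical derivative can be passed directly to a layer. This is a minimal sketch; leaky and dleaky are illustrative names, not part of BetaML:
using BetaML
leaky(x)  = max(0.1x, x)          # an illustrative "leaky relu"-like activation
dleaky(x) = x >= 0 ? 1.0 : 0.1    # its analytical derivative
l = DenseLayer(4, 3, f=leaky, df=dleaky)   # omit df (or pass df=nothing) to fall back on AD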
You can alternatively implement your own layer by defining a new type as a subtype of the abstract type AbstractLayer. Each user-implemented layer must define the following methods (a sketch follows the list):
- A suitable constructor
- forward(layer,x)
- backward(layer,x,next_gradient)
- get_params(layer)
- get_gradient(layer,x,next_gradient)
- set_params!(layer,w)
- size(layer)
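A minimal, untested sketch of such a user-defined layer follows: an element-wise scaling layer with one learnable weight per node. ScaleLayer and its internals are illustrative names (not part of BetaML), and reading the wrapped tuple of a Learnable through its data field is an assumption about its internal structure:
using BetaML
import Base: size
import BetaML.Nn: forward, backward, get_params, get_gradient, set_params!, Learnable

struct ScaleLayer <: BetaML.AbstractLayer
    w::Vector{Float64}                                   # one learnable scale per node
end
ScaleLayer(n::Int) = ScaleLayer(ones(n))                 # a suitable constructor

forward(l::ScaleLayer, x)                     = l.w .* x                  # output of the layer
backward(l::ScaleLayer, x, next_gradient)     = l.w .* next_gradient      # dloss/dx
get_params(l::ScaleLayer)                     = Learnable((l.w,))         # parameters wrapped in a Learnable
get_gradient(l::ScaleLayer, x, next_gradient) = Learnable((x .* next_gradient,))  # dloss/dw
set_params!(l::ScaleLayer, w)                 = (l.w .= w.data[1])        # assumes Learnable stores its tuple in `data`
size(l::ScaleLayer)                           = ((length(l.w),), (length(l.w),))  # (input size, output size)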
Model fitting:
- fit!(nn,X,Y): fitting function
- fitting_info(nn): Default callback function during fitting
- SGD: The classical optimisation algorithm
- ADAM: A faster moment-based optimisation algorithm
To define your own optimisation algorithm, define a subtype of OptimisationAlgorithm and implement the function single_update!(θ,▽;opt_alg) and, if needed, init_optalg!(⋅) specific to it (a sketch follows).
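As an illustration only, a fixed-step gradient-descent optimiser could be sketched as below. FixedStepGD is a hypothetical name, the exact signature to extend and the return convention should be checked against the SGD implementation in the package source, and the update relies on the arithmetic operations defined on Learnable:
using BetaML
import BetaML.Nn: OptimisationAlgorithm, single_update!

struct FixedStepGD <: OptimisationAlgorithm
    η::Float64                         # fixed learning rate
end

# Assumption: BetaML forwards the opt_alg keyword to a positional argument for dispatch,
# as the bundled SGD/ADAM implementations do.
function single_update!(θ, ▽, opt_alg::FixedStepGD; n_epoch, n_batch, n_batches, xbatch, ybatch)
    θ = θ - opt_alg.η * ▽              # relies on the overloaded arithmetic on Learnable
    return (θ=θ, stop=false)           # assumed return convention: updated parameters plus a stop flag
end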
Model predictions and assessment:
- predict(nn) or predict(nn,X): Return the output given the data
While high-level functions operating on the dataset expect it to be in the standard format (nrecords × ndimensions matrices), it is customary to represent the chain of a neural network as a flow of column vectors, so all low-level operations (operating on a single data point) expect both the input and the output as a column vector.
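For example (a sketch; low-level functions such as forward are accessed via their qualified name as they may be unexported):
using BetaML
l = DenseLayer(2, 3)
BetaML.Nn.forward(l, [1.0, 2.0])   # low level: a single data point as a column vector, returns a 3-element vector
# while a fitted NeuralNetworkEstimator m works on the whole dataset:
# predict(m, X)                    # X is a (nrecords × ndimensions) matrix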
Module Index
BetaML.Nn.ReplicatorLayer
BetaML.Nn.backward
BetaML.Nn.fitting_info
BetaML.Nn.forward
BetaML.Nn.forward
BetaML.Nn.forward
BetaML.Nn.get_gradient
BetaML.Nn.get_gradient
BetaML.Nn.get_params
BetaML.Nn.get_params
BetaML.Nn.init_optalg!
BetaML.Nn.init_optalg!
BetaML.Nn.preprocess!
BetaML.Nn.set_params!
BetaML.Nn.set_params!
BetaML.Nn.single_update!
BetaML.Nn.ADAM
BetaML.Nn.ConvLayer
BetaML.Nn.ConvLayer
BetaML.Nn.ConvLayer
BetaML.Nn.DenseLayer
BetaML.Nn.DenseLayer
BetaML.Nn.DenseNoBiasLayer
BetaML.Nn.DenseNoBiasLayer
BetaML.Nn.GroupedLayer
BetaML.Nn.GroupedLayer
BetaML.Nn.Learnable
BetaML.Nn.NeuralNetworkE_hp
BetaML.Nn.NeuralNetworkE_options
BetaML.Nn.NeuralNetworkEstimator
BetaML.Nn.PoolingLayer
BetaML.Nn.PoolingLayer
BetaML.Nn.PoolingLayer
BetaML.Nn.ReshaperLayer
BetaML.Nn.ReshaperLayer
BetaML.Nn.SGD
BetaML.Nn.ScalarFunctionLayer
BetaML.Nn.ScalarFunctionLayer
BetaML.Nn.VectorFunctionLayer
BetaML.Nn.VectorFunctionLayer
Detailed API
BetaML.Nn.ADAM
— Type
ADAM(;η, λ, β₁, β₂, ϵ)
The ADAM algorithm, an adaptive moment estimation optimiser (a usage sketch follows the field list).
Fields:
- η: Learning rate (stepsize, α in the paper), as a function of the current epoch [def: t -> 0.001 (i.e. fixed)]
- λ: Multiplicative constant to the learning rate [def: 1]
- β₁: Exponential decay rate for the first moment estimate [range: ∈ [0,1], def: 0.9]
- β₂: Exponential decay rate for the second moment estimate [range: ∈ [0,1], def: 0.999]
- ϵ: Epsilon value to avoid division by zero [def: 10^-8]
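A usage sketch, passing a customised ADAM instance to a NeuralNetworkEstimator (the specific values are illustrative):
using BetaML
opt = ADAM(η=t -> 0.0005, β₁=0.9, β₂=0.999)           # constant, smaller learning rate
m   = NeuralNetworkEstimator(opt_alg=opt, epochs=100)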
BetaML.Nn.ConvLayer
— Type
struct ConvLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
A generic N+1 (channels) dimensional convolutional layer
EXPERIMENTAL: Still too slow for practical applications
This convolutional layer has two constructors, one with the form ConvLayer(input_size,kernel_size,nchannels_in,nchannels_out)
, and an alternative one as ConvLayer(input_size_with_channel,kernel_size,nchannels_out)
. If the input is a vector, use a ReshaperLayer
in front.
Fields:
- input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannels_in as last dimension)
- output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannels_out as last dimension)
- weight::Array{WET, NDPLUS2} where {NDPLUS2, WET<:Number}: Weight tensor (aka "filter" or "kernel") with respect to the input from the previous layer or data (kernel_size array augmented by the nchannels_in and nchannels_out dimensions)
- usebias::Bool: Whether to use (and learn) a bias weight [def: true]
- bias::Vector{WET} where WET<:Number: Bias (nchannels_out array)
- padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)
- padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)
- stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride
- ndims::Int64: Number of dimensions (excluding input and output channels)
- f::Function: Activation function
- df::Union{Nothing, Function}: Derivative of the activation function
- x_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: x ids of the convolution (computed in preprocess!, itself called at the beginning of fit!)
- y_ids::Array{StaticArraysCore.SVector{NDPLUS1, Int64}, 1} where NDPLUS1: y ids of the convolution (computed in preprocess!, itself called at the beginning of fit!)
- w_ids::Array{StaticArraysCore.SVector{NDPLUS2, Int64}, 1} where NDPLUS2: w ids of the convolution (computed in preprocess!, itself called at the beginning of fit!)
- y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of the x(s) contributing to the given y
- y_to_w_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS2}}, 1}, NDPLUS1} where {NDPLUS1, NDPLUS2}: A y-dims array of vectors of the corresponding w(s) contributing to the given y
BetaML.Nn.ConvLayer
— Method
ConvLayer(
input_size,
kernel_size,
nchannels_in,
nchannels_out;
stride,
rng,
padding,
kernel_eltype,
kernel_init,
usebias,
bias_init,
f,
df
) -> ConvLayer{_A, _B, _C, typeof(identity), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}
Instantiate a new nD-dimensional, possibly multichannel ConvLayer
The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension; the output size depends on input_size, kernel_size, padding and stride, but always has nchannels_out as its last dimension.
Positional arguments:
- input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
- kernel_size: Size of the kernel (aka filter or learnable weights) (integer for 1D or hypercube kernels, or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
- nchannels_in: Number of channels in input
- nchannels_out: Number of channels in output
Keyword arguments:
- stride: "Steps" to move the convolution with across the various tensor dimensions [def: ones]
- padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep the same dimensions in output (with stride==1)]
- f: Activation function [def: relu]
- df: Derivative of the activation function [default: try to match a known function, AD otherwise. Use nothing to force AD]
- kernel_eltype: Kernel eltype [def: Float64]
- kernel_init: Initial weights with respect to the input [default: Xavier initialisation]. If explicitly provided, it should be a multidimensional array of kernel_size augmented by the nchannels_in and nchannels_out dimensions
- bias_init: Initial weights with respect to the bias [default: Xavier initialisation]. If given, it should be a nchannels_out vector of scalars.
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- Xavier initialization is sampled from a Uniform distribution between ± sqrt(6/(prod(input_size)*nchannels_in))
- To retrieve the output size of the layer, use size(layer)[2]. The output size on each dimension d (except the last one, which is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
- With strides higher than 1, the automatic padding is set to keep out_size = in_size/stride
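For example, a sketch of a 2D convolutional layer for 8×8 single-channel inputs with 4 output channels:
using BetaML
cl = ConvLayer((8,8), (3,3), 1, 4, f=relu)   # 8×8×1 input, 3×3 kernel, 4 output channels
size(cl)                                     # (input dims, output dims), channels included
# if the previous layer outputs a plain vector, put a ReshaperLayer in front,
# e.g. ReshaperLayer((64,1),(8,8,1))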
BetaML.Nn.ConvLayer
— Method
ConvLayer(
input_size_with_channel,
kernel_size,
nchannels_out;
stride,
rng,
padding,
kernel_eltype,
kernel_init,
usebias,
bias_init,
f,
df
) -> ConvLayer{_A, _B, _C, typeof(identity), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}
Alternative constructor for a ConvLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that one can use size(previous_layer)[2] if one wishes.
For arguments and default values see the documentation of the main constructor.
BetaML.Nn.DenseLayer
— Type
struct DenseLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
Representation of a layer in the network
Fields:
- w: Weights matrix with respect to the input from the previous layer or data (n × n of the previous layer)
- wb: Biases (n)
- f: Activation function
- df: Derivative of the activation function
BetaML.Nn.DenseLayer
— Method
DenseLayer(
nₗ,
n;
rng,
w_eltype,
w,
wb,
f,
df
) -> DenseLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}
Instantiate a new DenseLayer
Positional arguments:
- nₗ: Number of nodes of the previous layer
- n: Number of nodes
Keyword arguments:
- w_eltype: Eltype of the weights [def: Float64]
- w: Initial weights with respect to the input [default: Xavier initialisation, dims = (n,nₗ)]
- wb: Initial weights with respect to the bias [default: Xavier initialisation, dims = (n)]
- f: Activation function [def: identity]
- df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
- Specify df=nothing to explicitly use AD
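For example (a sketch):
using BetaML, Random
l = DenseLayer(10, 5, f=relu, rng=MersenneTwister(123))   # 10 inputs, 5 outputs, relu activation, reproducible initialisation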
BetaML.Nn.DenseNoBiasLayer
— Type
struct DenseNoBiasLayer{TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
Representation of a layer without bias in the network
Fields:
- w: Weights matrix with respect to the input from the previous layer or data (n × n of the previous layer)
- f: Activation function
- df: Derivative of the activation function
BetaML.Nn.DenseNoBiasLayer
— Method
DenseNoBiasLayer(
nₗ,
n;
rng,
w_eltype,
w,
f,
df
) -> DenseNoBiasLayer{typeof(identity), _A, Float64} where _A<:Union{Nothing, Function}
Instantiate a new DenseNoBiasLayer
Positional arguments:
- nₗ: Number of nodes of the previous layer
- n: Number of nodes
Keyword arguments:
- w_eltype: Eltype of the weights [def: Float64]
- w: Initial weights with respect to the input [default: Xavier initialisation, dims = (nₗ,n)]
- f: Activation function [def: identity]
- df: Derivative of the activation function [default: try to match with well-known derivatives, resort to AD if f is unknown]
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(nₗ+n),sqrt(6)/sqrt(nₗ+n)))
BetaML.Nn.GroupedLayer
— Type
struct GroupedLayer <: AbstractLayer
Representation of a "group" of layers, each of which operates on different inputs (features), acting together as a single layer in the network.
Fields:
- layers: The individual layers that compose this grouped layer
BetaML.Nn.GroupedLayer
— Method
GroupedLayer(layers) -> GroupedLayer
Instantiate a new GroupedLayer, a layer made up of several other layers stacked together in order to cover all the data dimensions, but without connecting all the inputs to all the outputs as a single DenseLayer would do.
Positional arguments:
- layers: The individual layers that compose this grouped layer
Notes:
- can be used to create composable neural networks with multiple branches (see the sketch below)
- tested only with 1-dimensional layers. For convolutional networks use ReshaperLayers before and/or after.
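A sketch of a two-branch network over a 5-feature input: the first branch receives the first 2 features, the second branch the remaining 3, and a final DenseLayer merges the two branches (all sizes are illustrative):
using BetaML
branches = GroupedLayer([DenseLayer(2,4), DenseLayer(3,4)])     # 2+3 inputs, 4+4 outputs
merger   = DenseLayer(8,1)                                      # acts on the concatenated branch outputs
m        = NeuralNetworkEstimator(layers=[branches, merger], epochs=100)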
BetaML.Nn.Learnable
— Type
Learnable(data)
Structure representing the learnable parameters of a layer or its gradient.
The learnable parameters of a layer are given in the form of an N-tuple of Array{Float64,N2} where N2 can change (e.g. we can have a layer whose first parameter is a matrix and whose second one is a scalar). We wrap the tuple in its own structure partly for some efficiency gain, but above all to define standard mathematical operations on the gradients without committing "type piracy" with respect to Base tuples.
BetaML.Nn.NeuralNetworkE_hp
— Type
mutable struct NeuralNetworkE_hp <: BetaMLHyperParametersSet
Hyperparameters for the Feedforward neural network model
Parameters:
- layers: Array of layer objects [def: nothing, i.e. basic network]. See subtypes(BetaML.AbstractLayer) for supported layers
- loss: Loss (cost) function [def: squared_cost]. It must always assume y and ŷ as (n × d) matrices, using dropdims inside if needed.
- dloss: Derivative of the loss function [def: dsquared_cost if loss==squared_cost, nothing otherwise, i.e. use the derivative of the squared cost or autodiff]
- epochs: Number of epochs, i.e. passes through the whole training sample [def: 200]
- batch_size: Size of each individual batch [def: 16]
- opt_alg: The optimisation algorithm to update the gradient at each batch [def: ADAM()]
- shuffle: Whether to randomly shuffle the data at each iteration (epoch) [def: true]
- tunemethod: The method - and its parameters - to employ for hyperparameters autotuning. See SuccessiveHalvingSearch for the default method. To implement automatic hyperparameter tuning during the (first) fit! call, simply set autotune=true and optionally change the default tunemethod options (including the parameter ranges, the resources to employ and the loss function to adopt).
To know the available layers, type subtypes(AbstractLayer) and then type ?LayerName for information on how to use each layer.
BetaML.Nn.NeuralNetworkE_options
— Type
NeuralNetworkE_options
A struct defining the options used by the Feedforward neural network model
Parameters:
- cache: Cache the results of the fitting stage, to allow predict(mod) [default: true]. Set it to false to save memory for large data.
- descr: An optional title and/or description for this model
- verbosity: The verbosity level to be used in training or prediction (see Verbosity) [default: STD]
- cb: A callback function to provide information during training [def: fitting_info]
- autotune: Option for hyper-parameters autotuning [def: false, i.e. no autotuning performed]. If activated, autotuning is performed on the first fit!() call. Control auto-tuning through the option tunemethod (see the model hyper-parameters)
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
BetaML.Nn.NeuralNetworkEstimator
— Type
NeuralNetworkEstimator
A "feedforward" (but also multi-branch) neural network (supervised).
For the parameters see NeuralNetworkE_hp
and for the training options NeuralNetworkE_options
(we have a few more options for this specific estimator).
Notes:
- data must be numerical
- the label can be a n-records vector or a n-records × n-dimensions matrix, but the result is always a matrix.
- For one-dimension regressions, drop the unnecessary dimension with dropdims(ŷ,dims=2)
- For classification tasks the columns should normally be interpreted as the probabilities for each category
Examples:
- Classification...
julia> using BetaML
julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
julia> y = ["a","b","b","b","b","a"];
julia> ohmod = OneHotEncoder()
A OneHotEncoder BetaMLModel (unfitted)
julia> y_oh = fit!(ohmod,y)
6×2 Matrix{Bool}:
1 0
0 1
0 1
0 1
0 1
1 0
julia> layers = [DenseLayer(2,6),DenseLayer(6,2),VectorFunctionLayer(2,f=softmax)];
julia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=300,verbosity=LOW)
NeuralNetworkEstimator - A Feed-forward neural network (unfitted)
julia> ŷ_prob = fit!(m,X,y_oh)
***
*** Training for 300 epochs with algorithm ADAM.
Training.. avg ϵ on (Epoch 1 Batch 1): 0.4116936481380642
Training of 300 epoch completed. Final epoch error: 0.44308719831108734.
6×2 Matrix{Float64}:
0.853198 0.146802
0.0513715 0.948629
0.0894273 0.910573
0.0367079 0.963292
0.00548038 0.99452
0.808334 0.191666
julia> ŷ = inverse_predict(ohmod,ŷ_prob)
6-element Vector{String}:
"a"
"b"
"b"
"b"
"b"
"a"
- Regression...
julia> using BetaML
julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
julia> y = 2 .* X[:,1] .- X[:,2] .+ 3;
julia> layers = [DenseLayer(2,6),DenseLayer(6,6),DenseLayer(6,1)];
julia> m = NeuralNetworkEstimator(layers=layers,opt_alg=ADAM(),epochs=3000,verbosity=LOW)
NeuralNetworkEstimator - A Feed-forward neural network (unfitted)
julia> ŷ = fit!(m,X,y);
***
*** Training for 3000 epochs with algorithm ADAM.
Training.. avg ϵ on (Epoch 1 Batch 1): 33.30063874270561
Training of 3000 epoch completed. Final epoch error: 34.61265465430473.
julia> hcat(y,ŷ)
6×2 Matrix{Float64}:
4.1 4.11015
-16.5 -16.5329
-13.8 -13.8381
-18.4 -18.3876
-27.2 -27.1667
2.7 2.70542
BetaML.Nn.PoolingLayer
— Type
struct PoolingLayer{ND, NDPLUS1, NDPLUS2, TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
Representation of a pooling layer in the network (weightless)
EXPERIMENTAL: Still too slow for practical applications
In the middle between VectorFunctionLayer and ScalarFunctionLayer, it applies a function to the set of nodes defined in a sliding kernel.
Fields:
- input_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Input size (including nchannels_in as last dimension)
- output_size::StaticArraysCore.SVector{NDPLUS1, Int64} where NDPLUS1: Output size (including nchannels_out as last dimension)
- kernel_size::StaticArraysCore.SVector{NDPLUS2, Int64} where NDPLUS2: kernel_size augmented by the nchannels_in and nchannels_out dimensions
- padding_start::StaticArraysCore.SVector{ND, Int64} where ND: Padding (initial)
- padding_end::StaticArraysCore.SVector{ND, Int64} where ND: Padding (ending)
- stride::StaticArraysCore.SVector{ND, Int64} where ND: Stride
- ndims::Int64: Number of dimensions (excluding input and output channels)
- f::Function: Activation function
- df::Union{Nothing, Function}: Derivative of the activation function
- y_to_x_ids::Array{Array{Tuple{Vararg{Int64, NDPLUS1}}, 1}, NDPLUS1} where NDPLUS1: A y-dims array of vectors of ids of the x(s) contributing to the given y
BetaML.Nn.PoolingLayer
— Method
PoolingLayer(
input_size,
kernel_size,
nchannels_in;
stride,
kernel_eltype,
padding,
f,
df
) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, Float64} where {_A, _B, _C, _D<:Union{Nothing, Function}}
Instantiate a new nD-dimensional, possibly multichannel PoolingLayer
The input data is either a column vector (in which case it is reshaped) or an array of input_size augmented by the n_channels dimension; the output size depends on input_size, kernel_size, padding and stride, but always has nchannels_out as its last dimension.
Positional arguments:
- input_size: Shape of the input layer (integer for 1D convolution, tuple otherwise). Do not consider the channels number here.
- kernel_eltype: Kernel eltype [def: Float64]
- kernel_size: Size of the kernel (aka filter) (integer for 1D or hypercube kernels, or nD-sized tuple for asymmetric kernels). Do not consider the channels number here.
- nchannels_in: Number of channels in input
- nchannels_out: Number of channels in output
Keyword arguments:
- stride: "Steps" to move the convolution with across the various tensor dimensions [def: kernel_size, i.e. each X contributes to a single y]
- padding: Integer or 2-element tuple of tuples of the starting and ending padding across the various dimensions [def: nothing, i.e. set the padding required to keep out_size = in_size / stride]
- f: Activation function. It should have a vector as input and produce a scalar as output [def: maximum]
- df: Derivative (gradient) of the activation function for the various inputs. [default: nothing (i.e. use AD)]
Notes:
- To retrieve the output size of the layer, use size(layer)[2]. The output size on each dimension d (except the last one, which is given by nchannels_out) is given by the following formula (ceiled): output_size[d] = 1 + (input_size[d]+2*padding[d]-kernel_size[d])/stride[d]
- Differently from a ConvLayer, the pooling is always applied at the single-channel level, so that the output always has the same number of channels as the input. If you want to reduce the number of channels, either use a ConvLayer with the desired number of output channels, or use a ReshaperLayer to add a 1-element further dimension that will be treated as "channel" and choose the desired stride for the last pooling dimension (the one that was originally the channel dimension)
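For example, a sketch of max pooling with a 2×2 kernel over 8×8 inputs with 4 channels (by default f=maximum and the stride equals the kernel size):
using BetaML
pl  = PoolingLayer((8,8), (2,2), 4)     # main constructor: channels given as a separate argument
pl2 = PoolingLayer((8,8,4), (2,2))      # alternative constructor: channels as last dimension of the input size
size(pl)                                # (input dims, output dims), channels included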
BetaML.Nn.PoolingLayer
— Method
PoolingLayer(
input_size_with_channel,
kernel_size;
stride,
padding,
f,
kernel_eltype,
df
) -> PoolingLayer{_A, _B, _C, typeof(maximum), _D, _E} where {_A, _B, _C, _D<:Union{Nothing, Function}, _E<:Number}
Alternative constructor for a PoolingLayer where the number of channels in input is specified as a further dimension in the input size instead of as a separate parameter, so that one can use size(previous_layer)[2] if one wishes.
For arguments and default values see the documentation of the main constructor.
BetaML.Nn.ReshaperLayer
— Type
struct ReshaperLayer{NDIN, NDOUT} <: AbstractLayer
Representation of a "reshaper" (weigthless) layer in the network
Reshape the output of a layer (or the input data) to the shape needed for the next one.
Fields:
input_size::StaticArraysCore.SVector{NDIN, Int64} where NDIN
: Input sizeoutput_size::StaticArraysCore.SVector{NDOUT, Int64} where NDOUT
: Output size
BetaML.Nn.ReshaperLayer
— Type
ReshaperLayer(
input_size
) -> ReshaperLayer{_A, _B} where {_A, _B}
ReshaperLayer(
input_size,
output_size
) -> ReshaperLayer{_A, _B} where {_A, _B}
Instantiate a new ReshaperLayer
Positional arguments:
- input_size: Shape of the input layer (tuple).
- output_size: Shape of the output layer (tuple) [def: prod([input_size...]), i.e. reshape to a vector of appropriate length].
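For example (a sketch), flattening the output of a convolutional block so that a DenseLayer can follow:
using BetaML
rl = ReshaperLayer((4,4,8))             # default output: a vector of length 4*4*8 = 128
# or give the output shape explicitly, e.g. to go from a 64-element column vector to an 8×8×1 tensor:
rl2 = ReshaperLayer((64,1), (8,8,1))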
BetaML.Nn.SGD
— Type
SGD(;η=t -> 1/(1+t), λ=2)
Stochastic Gradient Descent algorithm (default)
Fields:
- η: Learning rate, as a function of the current epoch [def: t -> 1/(1+t)]
- λ: Multiplicative constant to the learning rate [def: 2]
BetaML.Nn.ScalarFunctionLayer
— Type
struct ScalarFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
Representation of a ScalarFunction layer in the network. ScalarFunctionLayer applies the activation function directly to the output of the previous layer (i.e. without passing through a weight matrix), but using an optional learnable parameter (an array) passed as second argument, similarly to VectorFunctionLayer. Differently from VectorFunctionLayer, the function is applied scalar-wise to each node.
The number of nodes in input must be set to the same as in the previous layer
Fields:
- w: Weights (parameter) array passed as second argument to the activation function (if not empty)
- n: Number of nodes in output (≡ number of nodes in input)
- f: Activation function (vector)
- dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
- dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)
Notes:
- The output size of this layer is the same as that of the previous layer.
BetaML.Nn.ScalarFunctionLayer
— Method
ScalarFunctionLayer(
nₗ;
rng,
wsize,
w_eltype,
w,
f,
dfx,
dfw
) -> ScalarFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
Instantiate a new ScalarFunctionLayer
Positional arguments:
- nₗ: Number of nodes (must be the same as in the previous layer)
Keyword arguments:
- wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
- w_eltype: Eltype of the weights [def: Float64]
- w: Initial weights with respect to the input [default: Xavier initialisation, dims = (nₗ,n)]
- f: Activation function [def: softmax]
- dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
- dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
- Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
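For example (a sketch), applying relu node-wise with no weight matrix and no learnable parameters:
using BetaML
sl = ScalarFunctionLayer(5, f=relu)     # 5 nodes in, 5 nodes out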
BetaML.Nn.VectorFunctionLayer
— Type
struct VectorFunctionLayer{N, TF<:Function, TDFX<:Union{Nothing, Function}, TDFW<:Union{Nothing, Function}, WET<:Number} <: AbstractLayer
Representation of a VectorFunction layer in the network. A vector function layer expects a vector activation function, i.e. a function taking the whole output of the previous layer as input rather than working on a single node as "normal" activation functions would do. Useful for example with the SoftMax function in classification or with the pool1D function to implement a "pool" layer in 1 dimension. By default it is weightless, i.e. it doesn't apply any transformation to the output coming from the previous layer except the activation function. However, by passing the parameter wsize (a tuple or array - tested only in 1D) you can pass the learnable parameter to the activation function too. It is your responsibility to be sure the activation function accepts only X or also this learnable array (as second argument). The number of nodes in input must be set to the same as in the previous layer (and, if you are using this for classification, to the number of classes, i.e. the previous layer must be set equal to the number of classes in the predictions).
Fields:
- w: Weights (parameter) array passed as second argument to the activation function (if not empty)
- nₗ: Number of nodes in input (i.e. length of the previous layer)
- n: Number of nodes in output (automatically inferred in the constructor)
- f: Activation function (vector)
- dfx: Derivative of the (vector) activation function with respect to the layer inputs (x)
- dfw: Derivative of the (vector) activation function with respect to the optional learnable weights (w)
Notes:
- The output size of this layer is given by the size of the output function, which is not necessarily the same as that of the previous layer.
BetaML.Nn.VectorFunctionLayer
— Method
VectorFunctionLayer(
nₗ;
rng,
wsize,
w_eltype,
w,
f,
dfx,
dfw,
dummyDataToTestOutputSize
) -> VectorFunctionLayer{_A, typeof(softmax), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
Instantiate a new VectorFunctionLayer
Positional arguments:
- nₗ: Number of nodes (must be the same as in the previous layer)
Keyword arguments:
- wsize: A tuple or array specifying the size (number of elements) of the learnable parameter [def: empty array]
- w_eltype: Eltype of the weights [def: Float64]
- w: Initial weights with respect to the input [default: Xavier initialisation, dims = (nₗ,n)]
- f: Activation function [def: softmax]
- dfx: Derivative of the activation function with respect to the data [default: try to match with well-known derivatives, resort to AD if f is unknown]
- dfw: Derivative of the activation function with respect to the learnable parameter [default: nothing (i.e. use AD)]
- dummyDataToTestOutputSize: Dummy data to test the output size [def: ones(nₗ)]
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- If the derivative is provided, it should return the gradient as a (n,n) matrix (i.e. the Jacobian)
- To avoid recomputing the activation function just to determine its output size, we compute the output size once here in the layer constructor by calling the activation function with dummyDataToTestOutputSize. Feel free to change it if it doesn't match the activation function you are setting
- Xavier initialization = rand(Uniform(-sqrt(6)/sqrt(sum(wsize...)),sqrt(6)/sqrt(sum(wsize...))))
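For example (a sketch), a softmax output stage for a 3-class classifier, along the lines of the NeuralNetworkEstimator classification example above:
using BetaML
layers = [DenseLayer(4,6,f=relu), DenseLayer(6,3), VectorFunctionLayer(3, f=softmax)]
m = NeuralNetworkEstimator(layers=layers, epochs=100)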
Base.size
— Method
size(layer)
Get the size of the layer in terms of (size in input, size in output), both as tuples
Notes:
- You need to use import Base.size before defining this function for your layer
Base.size
— Method
size(layer::ConvLayer) -> Tuple{Tuple, Tuple}
Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension
Base.size
— Method
size(
layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET} where {TF<:Function, TDF<:Union{Nothing, Function}, WET<:Number}
) -> Tuple{Tuple, Tuple}
Get the dimensions of the layers in terms of (dimensions in input, dimensions in output) including channels as last dimension
BetaML.Nn.ReplicatorLayer
— Method
ReplicatorLayer(
n
) -> ScalarFunctionLayer{_A, typeof(identity), _B, Nothing, Float64} where {_A, _B<:Union{Nothing, Function}}
Create a weightless layer whose output is equal to the input.
Fields:
- n: Number of nodes in output (≡ number of nodes in input)
Notes:
- The output size of this layer is the same as that of the previous layer.
- This is just an alias for a ScalarFunctionLayer with no weights and the identity function.
BetaML.Nn.backward
— Method
backward(layer,x,next_gradient)
Compute backpropagation for this layer with respect to its inputs
Parameters:
- layer: Worker layer
- x: Input to the layer
- next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)
Return:
- The evaluated gradient of the loss with respect to this layer's inputs
BetaML.Nn.fitting_info
— Method
fitting_info(nn,xbatch,ybatch,x,y;n,n_batches,epochs,epochs_ran,verbosity,n_epoch,n_batch)
Default callback function to display information during training, depending on the verbosity level
Parameters:
- nn: Worker network
- xbatch: Batch input to the network (batch_size × d_in)
- ybatch: Batch label input (batch_size × d_out)
- x: Full input to the network (n_records × d_in)
- y: Full label input (n_records × d_out)
- n: Size of the full training set
- n_batches: Number of batches per epoch
- epochs: Number of epochs defined for the training
- epochs_ran: Number of epochs already run in previous training sessions
- verbosity: Verbosity level defined for the training (NONE, LOW, STD, HIGH, FULL)
- n_epoch: Counter of the current epoch
- n_batch: Counter of the current batch
Notes:
- Reporting of the error (loss of the network) is expensive. Use verbosity=NONE for better performance
BetaML.Nn.forward
— Method
forward(layer,x)
Predict the output of the layer given the input
Parameters:
- layer: Worker layer
- x: Input to the layer
Return:
- An Array{T,1} of the prediction (even for a scalar)
BetaML.Nn.forward
— Method
forward(
layer::ConvLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
x
) -> Any
Compute forward pass of a ConvLayer
BetaML.Nn.forward
— Method
forward(
layer::PoolingLayer{ND, NDPLUS1, NDPLUS2, TF, TDF, WET},
x
) -> Any
Compute forward pass of a PoolingLayer
BetaML.Nn.get_gradient
— Method
get_gradient(layer,x,next_gradient)
Compute backpropagation for this layer with respect to the layer weights
Parameters:
- layer: Worker layer
- x: Input to the layer
- next_gradient: Derivative of the overall loss with respect to the input of the next layer (output of this layer)
Return:
- The evaluated gradient of the loss with respect to this layer's trainable parameters, as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_params() and set_params() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
BetaML.Nn.get_gradient
— Method
get_gradient(nn,x,y)
Low-level function that retrieves the current gradient of the weights (i.e. the derivative of the cost with respect to the weights). Unexported in BetaML >= v0.9
Parameters:
- nn: Worker network
- x: Input to the network (d,1)
- y: Label input (d,1)
Notes:
- The output is a vector of tuples of each layer's input weights and bias weights
BetaML.Nn.get_params
— Method
get_params(layer)
Get the layer's current value of its trainable parameters
Parameters:
- layer: Worker layer
Return:
- The current value of the layer's trainable parameters as a tuple of matrices. It is up to you to decide how to organise this tuple, as long as you are consistent with the get_gradient() and set_params() functions. Note that starting from BetaML 0.2.2 this tuple needs to be wrapped in its Learnable type.
BetaML.Nn.get_params
— Method
get_params(nn)
Retrieve the current weights
Parameters:
- nn: Worker network
Notes:
- The output is a vector of tuples of each layer's input weights and bias weights
BetaML.Nn.init_optalg!
— Method
init_optalg!(opt_alg::ADAM;θ,batch_size,x,y,rng)
Initialize the ADAM algorithm with the parameters m and v as zeros and check parameter bounds
BetaML.Nn.init_optalg!
— Method
init_optalg!(opt_alg;θ,batch_size,x,y)
Initialize the optimisation algorithm
Parameters:
- opt_alg: The optimisation algorithm to use
- θ: Current parameters
- batch_size: The size of the batch
- x: The training (input) data
- y: The training "labels" to match
- rng: Random Number Generator (see FIXEDSEED) [default: Random.GLOBAL_RNG]
Notes:
- Only a few optimisers need this function and consequently override it. By default it does nothing, so if you write your own optimiser and don't need to initialise it, you don't have to override this method
BetaML.Nn.preprocess!
— Method
preprocess!(layer::AbstractLayer)
Preprocess the layer with information known at layer creation (i.e. no data info used)
This function is used by some layers to cache computations that don't require the data, and it is called at the beginning of fit!. For example, it is used in ConvLayer to store the ids of the convolution.
Notes:
- As it doesn't depend on data, it is not reset by reset!
BetaML.Nn.set_params!
— Method
set_params!(layer,w)
Set the trainable parameters of the layer with the given values
Parameters:
- layer: Worker layer
- w: The new parameters to set (Learnable)
Notes:
- The format of the tuple wrapped by Learnable must be consistent with those of the get_params() and get_gradient() functions.
BetaML.Nn.set_params!
— Method
set_params!(nn,w)
Update the weights of the network
Parameters:
- nn: Worker network
- w: The new weights to set
BetaML.Nn.single_update!
— Method
single_update!(θ,▽;n_epoch,n_batch,n_batches,xbatch,ybatch,opt_alg)
Perform the parameter update based on the average batch gradient.
Parameters:
- θ: Current parameters
- ▽: Average gradient of the batch
- n_epoch: Count of the current epoch
- n_batch: Count of the current batch
- n_batches: Number of batches per epoch
- xbatch: Data associated to the current batch
- ybatch: Labels associated to the current batch
- opt_alg: The optimisation algorithm to use for the update
Notes:
- This function is overridden so that each optimisation algorithm implements its own version
- Most parameters are not used by any optimisation algorithm. They are provided to support the largest possible class of optimisation algorithms
- Some optimisation algorithms may change their internal structure in this function