Videos related to this segment (click the title to watch)
01 JULIA1 - 6A: Metaprogramming and macros (23:46)
01 JULIA1 - 6B: Interoperability with other languages (23:06)
01 JULIA1 - 6C: Performances and errors: profiling, debugging, introspection and exceptions (27:33)
01 JULIA1 - 6D: Parallel computation: multithreading, multiprocessing (20:03)

0106 Further Topics

Some code to set up the environment:

julia> cd(@__DIR__)
julia> using Pkg
julia> Pkg.activate(".")
julia> # If using a Julia version different than 1.10 please uncomment and run the following lines
       # (the reproducibility guarantee will however be lost)
       # Pkg.resolve()
       # Pkg.instantiate() # run this if you didn't in Segment 01.01
julia> using Random
julia> Random.seed!(123)
julia> using InteractiveUtils # loaded automatically when working... interactively

Metaprogramming and macros

"Running" some code involves (roughly) the following steps:

  • parsing of the source text and its translation into hierarchical expressions forming the Abstract Syntax Tree (AST) (syntax errors are caught at this stage)
  • compilation ("just in time", on the first call of each method) of the AST expressions into object code (using the LLVM compiler)
  • execution of the compiled object code

"Macros" in many other languages (e.g. C or C++) refer to the possibility of "pre-processing" the textual representation of the code statements before they are parsed. In Julia, instead, they refer to the possibility of altering an expression once it has already been parsed into the AST, allowing greater expressivity, as we are no longer limited by the parsing syntax.

The AST is organised in a hierarchical tree of expressions where each element (including the operators) is a symbol. For variables, you can use symbols to refer to the actual identifier instead of the variable's value.

Expressions themselves are objects representing unevaluated computer expressions

Expressions and symbols

julia> expr1 = Meta.parse("a = b + 2") # what the parser does when reading the source code. `b` doesn't need to be actually defined: it is just a name here, without a binding to any object, not even `nothing`
:(a = b + 2)
julia> typeof(expr1) # expressions are first-class objects
Expr
julia> expr2 = :(a = b + 1)
:(a = b + 1)
julia> expr3 = quote a = b + 1 end
quote
    a = b + 1
end
julia> expr3
quote
    a = b + 1
end
julia> dump(expr1) # the AST! Note this is already a nested expression: an assignment of the result of a sub-expression (the `call` of `+` on the symbol `:b` and `2`) to the symbol `:a`
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol a
    2: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol +
        2: Symbol b
        3: Int64 2
julia> expr4 = Expr(:(=), :a, Expr(:call, :+, :b, 1)) # the AST built using the `Expr` constructor
:(a = b + 1)
julia> symbol1 = :(a)            # as for expressions
:a
julia> symbol2 = Meta.parse("a") # as for expressions
:a
julia> symbol3 = Symbol("a")     # specific to symbols only
:a
julia> othsymbol = Symbol("aaa", 10, "bbb")
:aaa10bbb
julia> typeof(symbol1)
Symbol

I can access any part of my expression before evaluating it (indeed, that's what macros do...)

julia> myASymbol = expr1.args[2].args[1]
:+
julia> expr1.args[2].args[1] = :(*)
:*
julia> b = 2
2
julia> # a  # error: `a` not yet defined
julia> eval(expr1)
4
julia> a # now `a` is defined and has an object associated... 4!
4
Danger

The capability to evaluate expressions is very powerful, but due to obvious security implications, never evaluate expressions whose provenance you are not sure of. For example, if you develop a Julia web app (e.g. using Genie.jl), never evaluate user-provided expressions.

Note that evaluation of expressions always happens at global scope, even when it is done inside a function:

julia> function foo()
           locVar = 1
           expr = :(locVar + 1)
           return eval(expr)
       end
foo (generic function with 1 method)
julia> # a = foo() # error: locVar not defined

To refer to the value of a variable rather than the identifier itself within an expression, interpolate the variable using the dollar sign:

julia> expr = :($a + b) # here the identifier `a` has been replaced with its numerical value, `4`
:(4 + b)
julia> dump(expr)
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 4
    3: Symbol b
julia> eval(expr)
6
julia> a = 10
10
julia> eval(expr) # no change: `a` was interpolated when the expression was built
6
julia> b = 100
100
julia> eval(expr) # here it changes, as it is at eval time that the identifier `b` is "replaced" with its value
104

Macros

One of the best uses of macros is that they allow package developers to provide a very flexible API to their package, suited to its specific needs, making life easier for the users. Compare for example the API of JuMP with that of Pyomo for defining model constraints!

Some examples of macros: MultiDimEquations.jl

  • from: @meq par1[d1 in DIM1, d2 in DIM2, dfix3] = par2[d1,d2]+par3[d1,d2]
  • to: [par1[d1,d2,dfix3] = par2[d1,d2]+par3[d1,d2] for d1 in DIM1, d2 in DIM2]

Pipe.jl:

  • from: @pipe 10 |> foo(_,a) |> foo2(b,_,c) |> foo3(_)
  • to: foo3(foo2(b,foo(10,a),c))

Broadcasting (Base):

  • from: @. a + b * D^2
  • to: a .+ b .* D.^2

Defining a macro...

Macros are defined like functions, but both their arguments and their returned output are expressions:

julia> macro customLoop(controlExpr,workExpr)
           return quote
             for i in $controlExpr
               $workExpr
             end
           end
       end
@customLoop (macro with 1 method)

Invoking a macro....

julia> a = 5
5
julia> @customLoop 1:4 println(i) # note that `i` is introduced by the macro
1
2
3
4
julia> @customLoop 1:a println(i)
1
2
3
4
5
julia> @customLoop 1:a if i > 3 println(i) end
4
5
julia> @customLoop ["apple", "orange", "banana"] println(i)
apple
orange
banana
julia> @customLoop ["apple", "orange", "banana"] begin print("i: "); println(i) end
i: apple
i: orange
i: banana
julia> @macroexpand @customLoop 1:4 println(i) # print what the macro does with the specific expressions provided
quote
    for var"#433#i" = 1:4
        Main.println(var"#433#i")
    end
end
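Note the renamed loop variable (var"#433#i") in the expansion above: macros are hygienic, i.e. variables introduced inside a macro are renamed to unique generated symbols so they don't clash with the caller's variables. When a macro must intentionally read or write one of the caller's variables, the expression has to be escaped with esc. A minimal sketch (the macro name @setfive is illustrative, not part of any library):

```julia
# Hygiene demo: without `esc`, the assignment would target a renamed
# (gensym'd) variable invisible to the caller; `esc` makes it affect the
# caller's own variable instead.
macro setfive(var)
    return :($(esc(var)) = 5)
end

x = 0
@setfive x
println(x)   # 5: the macro modified the caller's `x`
```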

String macros (aka "non-standard string literals") are invoked with the syntax xxx" ...text..." or xxx""" ...multi-line text...""", where xxx is the name of the macro, and the macro must be defined as macro xxx_str. They are used to perform textual modifications on the given text; for example, this prints the given text in columns of 8 characters:

julia> macro print8_str(mystr)                                 # input here is a string, not an expression
           limits = collect(1:8:length(mystr))
           for (i,j) in enumerate(limits)
             st = j
             en = i==length(limits) ? length(mystr) : j+7
             println(mystr[st:en])
           end
       end
@print8_str (macro with 1 method)
julia> print8"123456789012345678"
12345678
90123456
78
julia> print8"""This is a text that once printed in 8 columns with terminal will be several lines. Ok, no rammar rules relating to carriage returns are emploied here..."""
This is
a text t
hat once
 printed
 in 8 co
lumns wi
th termi
nal will
 be seve
ral line
s. Ok, n
o rammar
 rules r
elating
to carri
age retu
rns are
emploied
 here...

While normally used to modify text, string macros are "true" macros:

julia> macro customLoop_str(str)
           exprs = Meta.parse(str)
           controlExpr,workExpr = exprs.args[1],exprs.args[2]
           return quote
             for i in $controlExpr
               $workExpr
             end
           end
       end
@customLoop_str (macro with 1 method)
julia> customLoop"""1:4; println(i)"""
1
2
3
4

Interfacing with other languages

There are three ways to interface Julia with programs or libraries written in other languages. At the lowest level, Julia can directly call C or Fortran libraries; this means, besides using libraries written in C directly, being able to interface with any programming language that also offers a C interface (R, Python...). Using this low-level C interface, users have created packages to interface many languages with a simple, Julian-style syntax. We will see these interfaces for R and Python. Finally, at the highest level, many common packages of other languages have already been "interfaced", so that the user can use the Julia package without even knowing it is an interface to another package; for example SymPy.jl is a large interface to the Python package SymPy.

Using C libraries

Let's start by seeing how to use a C library. For this example to work you will need the GCC compiler installed on your machine. First, let's write the C header and source files and save them to disk:

julia> cheader = """
       extern int get5();
       extern double mySum(float x, float y);
       """;
julia> csource = """
       int get5(){
        return 5;
       }

       double mySum(float x, float y){
        return x+y;
       }
       """;
julia> open(f->write(f,cheader),"myclib.h","w") # We open a stream to the file in "w" ("write") mode and pass the stream to an anonymous function that actually writes to it. If this function spans many lines of code, consider rewriting the `open` statement using a `do` block
58
julia> open(f->write(f,csource),"myclib.c","w")
79

Now let's run the commands to compile the C code we saved into a shared library using gcc, a C compiler. The following example assumes that GCC is installed and available as gcc on the machine where it is run.

julia> compilationCommand1 = `gcc -o myclib.o -c myclib.c` # the actual compilation; note the backticks used to define a command
`gcc -o myclib.o -c myclib.c`
julia> compilationCommand2 = `gcc -shared -o libmyclib.so myclib.o -lm -fPIC` # the linking into a shared library
`gcc -shared -o libmyclib.so myclib.o -lm -fPIC`
julia> run(compilationCommand1)
Process(`gcc -o myclib.o -c myclib.c`, ProcessExited(0))
julia> run(compilationCommand2)
Process(`gcc -shared -o libmyclib.so myclib.o -lm -fPIC`, ProcessExited(0))

This should have created the C library libmyclib.so on disk. Let's use it:

julia> const myclib = joinpath(@__DIR__, "libmyclib.so") # we need the full path
"/home/runner/work/SPMLJ/SPMLJ/buildedDoc/01_-_JULIA1_-_Basic_Julia_programming/libmyclib.so"

ccall arguments:

  1. A tuple with the function name to call and the library path. If either is stored in a variable, the variable must be declared constant.
  2. The Julia type that maps to the C type returned by the function:
    • int → Int32 or Int64 (or the easy-to-remember Cint alias)
    • float → Float32 (or the Cfloat alias)
    • double → Float64 (or the Cdouble alias)
  3. A tuple with the Julia types of the parameters passed to the C function
  4. Any further arguments are the values of the parameters to pass
julia> a = ccall((:get5,myclib), Int32, ())
5
julia> b = ccall((:mySum,myclib), Float64, (Float32,Float32), 2.5, 1.5)
4.0

More details on calling C or Fortran code can be obtained in the official Julia documentation.
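As a quick aside, functions of the C standard library already loaded in the process can be called without compiling anything: passing just the symbol name to ccall (no library tuple) resolves it against the libraries already loaded in the running process. A small sketch:

```julia
# Call `strlen` and `toupper` from the C runtime: no shared-library path
# is needed, as libc symbols are already available in the process.
c_len = ccall(:strlen, Csize_t, (Cstring,), "hello")  # C string length
c_up  = ccall(:toupper, Cint, (Cint,), Int('a'))      # uppercase a character code
println(Int(c_len), " ", Char(c_up))                  # 5 A
```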

Using Python in Julia

The "default" way to use Python code in Julia is through the PyCall.jl package. It automatically takes care of converting between Python types (including numpy arrays) and Julia types (types that cannot be converted automatically are converted to the generic PyObject type).

julia> ENV["PYTHON"] = "" # force PyCall to download and use a "private to Julia" (conda-based) version of Python. Use "/path/to/python" if you want to reuse a version already installed on your system
julia> # using Pkg
       # Pkg.add("PyCall")
       # Pkg.build("PyCall")
julia> using PyCall

Embed short python snippets in Julia

julia> py"""
       def sumMyArgs (i, j):
         return i+j
       def getNthElement (vec,n):
         return vec[n]
       """
julia> a = py"sumMyArgs"(3,4)           # 7 - here we call the Python object (a function) with Julia parameters
7
julia> b = py"getNthElement"([1,2,3],1) # 2 - attention to the different convention for indexing arrays! Note the Julia Array has been converted automatically to a Python list
2
julia> d = py"getNthElement([1,$a,3],1)" # 7 - here we interpolate the Python call
7

Alternatively, use @pyinclude("pythonScript.py"):

julia> pythonCode = """
       def sumMyArgs (i, j, z):
         return i+j+z
       """;
julia> open(f->write(f,pythonCode),"pythonScript.py","w")
40
julia> @pyinclude("pythonScript.py")
julia> a = py"sumMyArgs"(3,4,5)
12
Tip

Note that the 3-argument definition of sumMyArgs has replaced the 2-argument one. py"sumMyArgs"(3,4) would now error.

Use Python libraries

Add a package to the local Python installation using Conda:

julia> pyimport_conda("ezodf", "ezodf", "conda-forge") # pyimport_conda(module, package, channel)
PyObject <module 'ezodf' from '/home/runner/.local/lib/python3.10/site-packages/ezodf/__init__.py'>
julia> const ez = pyimport("ezodf") # equivalent of Python `import ezodf as ez`
PyObject <module 'ezodf' from '/home/runner/.local/lib/python3.10/site-packages/ezodf/__init__.py'>
julia> destDoc = ez.newdoc(doctype="ods", filename="anOdsSheet.ods")
PyObject <ezodf.document.PackagedDocument object at 0x7fabd9958af0>

Both ez and destDoc are PyObjects, whose attributes and methods we can access with the usual obj.method() syntax, as we would do in Python:

julia> sheet = ez.Sheet("Sheet1", size=(10, 10))
PyObject <ezodf.table.Table object at 0x7fabd9958280>
julia> destDoc.sheets.append(sheet)
PyObject <ezodf.table.Table object at 0x7fabd9958280>
julia> # dcell1 = sheet[(2,3)] # This would error because the index is a tuple. Let's use the `get(obj,key)` function instead:
julia> dcell1 = get(sheet,(2,3)) # equivalent of Python `dcell1 = sheet[(2,3)]`. Attention again to Python indexing from zero: this is cell "D3", not "B3"!
PyObject <ezodf.cells.Cell object at 0x7fabd99581f0>
julia> dcell1.set_value("Hello")
julia> get(sheet,"A9").set_value(10.5) # equivalent of Python `sheet['A9'].set_value(10.5)`
julia> destDoc.backup = false
false
julia> destDoc.save()

Using Julia in Python

Installation of the Python package PyJulia

PyJulia can be installed using pip, taking note that its name on pip is julia, not PyJulia: $ python3 -m pip install --user julia

We can now open a Python terminal and initialise PyJulia to work with our Julia version:

>>> import julia
>>> julia.install() # Only once to set-up in julia the julia packages required by PyJulia

If we have multiple Julia versions, we can specify the one to use in Python passing julia="/path/to/julia/binary/executable" (e.g. julia = "/home/myUser/lib/julia-1.1.0/bin/julia") to the install() function.

Running Julia libraries and code in Python

On each Python session we need to run the following code:

from julia import Julia
Julia(compiled_modules=False)

This is a workaround for the common situation where the Python interpreter is statically linked to libpython, but it will slow down the interactive experience, as it disables the pre-compilation of Julia packages: every time we use a module for the first time, it will need to be compiled first. Other, more efficient but also more complicated, workarounds are given in the package documentation, under the Troubleshooting section.

We can now directly load a Julia module, including Main, the global namespace of Julia’s interpreter, with from julia import ModuleToLoad, and access the module objects either directly or using the Module.eval() interface.

Add a Julia package...
>>> from julia import Pkg
>>> Pkg.add("BetaML")

Of course we can add a package alternatively from within Julia

"Direct" calling of Julia functions...
>>> from julia import BetaML
>>> import numpy as np
>>> model = BetaML.buildForest([[1,10],[2,12],[12,1]],["a","a","b"])
>>> predictions = BetaML.predict(model,np.array([[2,9],[13,0]]))
>>> predictions
[{'b': 0.36666666666666664, 'a': 0.6333333333333333}, {'b': 0.7333333333333333, 'a': 0.26666666666666666}]
Access using the eval() interface...

If we are using the jl.eval() interface, the objects we use must be already known to julia. To pass objects from Python to Julia, we can import the julia Main module (the root module in julia) and assign the needed variables, e.g.

>>> X_python = [1,2,3,2,4]
>>> from julia import Main
>>> Main.X_julia = X_python
>>> Main.eval('BetaML.gini(X_julia)')
0.7199999999999999
>>> Main.eval("""
...   function makeProd(x,y)
...       return x*y
...   end
...   """
... )
>>> Main.eval("makeProd(2,3)") # or Main.makeProd(2,3)

For large scripts instead of using eval() we can equivalently use Main.include("aJuliaScript.jl")

Using R in Julia

To use R from within Julia we use the RCall package.

julia> ENV["R_HOME"] = "*" # force RCall to download and use a "private to Julia" (conda-based) version of R. Use "/path/to/R/directory" (e.g. `/usr/lib/R`) if you want to reuse a version already installed on your system
julia> # using Pkg
       # Pkg.add("RCall")
       # Pkg.build("RCall")
julia> using RCall
julia> R"""
       sumMyArgs <- function(i,j) i+j
       getNthElement <- function(vec,n) {
         return(vec[n])
       }
       """;
julia> a = rcopy(R"sumMyArgs"(3,4))           # 7 - here we call the R object (a function) with Julia parameters
7
julia> b = rcopy(R"getNthElement"([1,2,3],1)) # 1 - no difference in array indexing here
1
julia> d = rcopy(R"as.integer(getNthElement(c(1,$a,3),2))") # 7 - here we interpolate the R call
7
julia> d = convert(Int64, R"getNthElement(c(1,$a,3),2)")
7

While here we don't have the problem of different array-indexing conventions (both Julia and R start indexing arrays at 1), we have the "problem" that the output returned by R"..." is not yet an exploitable Julia object: it remains an RObject that we can convert with rcopy() or explicitly with convert(T,obj). Also, R numbers are all floats by default, so if we need an integer in Julia we need to convert it explicitly, either in R or in Julia.

If the R code is in a script, we don't have a sort of @Rinclude macro, so let's implement it ourselves, loading the file content as a string and evaluating it using the function reval provided by RCall:

julia> macro Rinclude(fname)
           quote
               rCodeString = read($fname,String)
               reval(rCodeString)
               nothing
           end
       end
@Rinclude (macro with 1 method)
julia> RCode = """
       sumMyArgs <- function(i, j, z) i+j+z
       """;
julia> open(f->write(f,RCode),"RScript.R","w")
37
julia> @Rinclude("RScript.R")
julia> a = rcopy(R"sumMyArgs"(3,4,5)) # 12
12
julia> # a = rcopy(R"sumMyArgs"(3,4)) # error! The 3-argument version of `sumMyArgs` has _replaced_ the 2-argument one

Using Julia in R

Installation of the R package JuliaCall

JuliaCall can be installed from CRAN:

> install.packages("JuliaCall")
> library(JuliaCall)
install_julia()

install_julia() will download and install a private copy of Julia. If you prefer to use an existing Julia version instead, and have R default to downloading a private copy only if it can't find one already installed, use julia_setup(installJulia = TRUE) instead of install_julia(), optionally passing the JULIA_HOME = "/path/to/julia/binary/executable/directory" (e.g. JULIA_HOME = "/home/myUser/lib/julia-1.7.0/bin") parameter to the julia_setup call.

JuliaCall depends for some things (like object conversion between Julia and R) on the Julia RCall package. If we don't already have it installed in Julia, it will try to install it automatically.

Running Julia libraries and code in R

On each R session we need to run the julia_setup function:

library(JuliaCall)
julia_setup() # If we have already downloaded a private version of Julia for R it will be retrieved automatically

We can now load a Julia module and access the module objects either directly or using the eval() interface.

Add a Julia package...
> julia_eval('using Pkg; Pkg.add("BetaML")')

Of course we can add a package alternatively from within Julia

Let's load some data from R and do some work with this data in Julia:

> library(datasets)
> X <- as.matrix(sapply(iris[,1:4], as.numeric))
> y <- sapply(iris[,5], as.integer)
Calling of Julia functions with julia_call...

With JuliaCall, differently from PyJulia, we can't call the Julia functions directly; we need to employ the R function julia_call("juliaFunction",args):

> julia_eval("using BetaML")
> yencoded <- julia_call("integerEncoder",y)
> ids      <- julia_call("shuffle",1:length(y))
> Xs       <- X[ids,]
> ys       <- yencoded[ids]
> cOut     <- julia_call("kmeans",Xs,3L)    # kmeans expects K to be an integer
> y_hat    <- sapply(cOut[1],as.integer)[,] # We need a vector, not a matrix
> acc      <- julia_call("accuracy",y_hat,ys)
> acc
[1] 0.8933333
Access using the eval() interface...

As alternative, we can embed Julia code directly in R using the julia_eval() function:

> kMeansR  <- julia_eval('
+      function accFromKmeans(x,k,y_true)
+        cOut = kmeans(x,Int(k))
+        acc = accuracy(cOut[1],y_true)
+        return acc
+      end
+ ')

We can then call the above function in R in one of the following three ways:

  1. kMeansR(Xs,3,ys)
  2. julia_assign("Xs_julia", Xs); julia_assign("ys_julia", ys); julia_eval("accFromKmeans(Xs_julia,3,ys_julia)")
  3. julia_call("accFromKmeans",Xs,3,ys).

While the package provides other "convenience" functions, using julia_call, or julia_assign followed by julia_eval, should suffice to accomplish most of the tasks we may need done in Julia.

Some performance tips

Type stability

"Type stable" functions guarantee to the compiler that, given a certain method (i.e. with arguments of given types), the object returned by the function is also of a fixed, predictable type. Type stability is fundamental to allow type inference to continue across the function call stack.

julia> function f1(x)    # Type unstable
           outVector = [1,2.0,"2"]
           if x < 0
               return outVector[1]
           elseif x == 0
               return outVector[2]
           else
               return outVector[3]
           end
       end
f1 (generic function with 1 method)
julia> function f2(x) # Type stable
           outVector = [1,convert(Int64,2.0),parse(Int64,"2")]
           if x < 0
               return outVector[1]
           elseif x == 0
               return outVector[2]
           else
               return outVector[3]
           end
       end
f2 (generic function with 1 method)
julia> a = f1(0)
2.0
julia> b = f1(1)
"2"
julia> typeof(a)
Float64
julia> typeof(b)
String
julia> c = f2(0)
2
julia> d = f2(1)
2
julia> typeof(c)
Int64
julia> typeof(d)
Int64
julia> using BenchmarkTools
julia> @btime f1(0) # 661 ns, 6 allocations on my pc
  75.106 ns (5 allocations: 192 bytes)
2.0
julia> @btime f2(0) # 55 ns, 1 allocation on my pc
  46.519 ns (1 allocation: 80 bytes)
2
julia> @code_warntype f1(0) # Body::Any - type unstable
MethodInstance for f1(::Int64)
  from f1(x) @ Main REPL[1]:1
Arguments
  #self#::Core.Const(f1)
  x::Int64
Locals
  outVector::Vector{Any}
Body::ANY
1 ─       (outVector = Base.vect(1, 2.0, "2"))
│   %2  = (x < 0)::Bool
└──       goto #3 if not %2
2 ─ %4  = Base.getindex(outVector, 1)::ANY
└──       return %4
3 ─ %6  = (x == 0)::Bool
└──       goto #5 if not %6
4 ─ %8  = Base.getindex(outVector, 2)::ANY
└──       return %8
5 ─ %10 = Base.getindex(outVector, 3)::ANY
└──       return %10
julia> @code_warntype f2(0) # Body::Int64 - type stable
MethodInstance for f2(::Int64)
  from f2(x) @ Main REPL[2]:1
Arguments
  #self#::Core.Const(f2)
  x::Int64
Locals
  outVector::Vector{Int64}
Body::Int64
1 ─ %1  = convert(Int64, 2.0)::Core.Const(2)
│   %2  = parse(Int64, "2")::Int64
│         (outVector = Base.vect(1, %1, %2))
│   %4  = (x < 0)::Bool
└──       goto #3 if not %4
2 ─ %6  = Base.getindex(outVector, 1)::Int64
└──       return %6
3 ─ %8  = (x == 0)::Bool
└──       goto #5 if not %8
4 ─ %10 = Base.getindex(outVector, 2)::Int64
└──       return %10
5 ─ %12 = Base.getindex(outVector, 3)::Int64
└──       return %12
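Type stability can also be checked programmatically: Test.@inferred throws when the inferred return type is wider than the actual one, and Base.return_types shows what inference deduced. A small sketch (the function names are illustrative):

```julia
using Test

f_stable(x)   = x < 0 ? -x : x           # always returns the type of x
f_unstable(x) = x < 0 ? -x : "positive"  # may return an Int64 or a String

@inferred f_stable(1)                    # passes silently: inferred Int64
# @inferred f_unstable(1)                # would throw: inferred Union{Int64, String}
println(Base.return_types(f_unstable, (Int64,)))  # inspect what inference deduced
```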

While in general it is NOT important to annotate function parameters for performance, it is important to annotate struct fields with concrete types

julia> abstract type Goo end
julia> struct Foo <: Goo
           x::Number
       end
julia> struct Boo <: Goo
           x::Int64
       end
julia> function f1(o::Goo)
           return o.x + 2
       end
f1 (generic function with 2 methods)
julia> fobj = Foo(1)
Foo(1)
julia> bobj = Boo(1)
Boo(1)
julia> @btime f1($fobj) # 17.1 ns, 0 allocations on my pc
  20.047 ns (0 allocations: 0 bytes)
3
julia> @btime f1($bobj) # 2.8 ns, 0 allocations on my pc
  2.785 ns (0 allocations: 0 bytes)
3

Here the same function is type stable for some argument types and type unstable for others:

julia> @code_warntype f1(fobj)
MethodInstance for f1(::Foo)
  from f1(o::Goo) @ Main REPL[4]:1
Arguments
  #self#::Core.Const(f1)
  o::Foo
Body::ANY
1 ─ %1 = Base.getproperty(o, :x)::NUMBER
│   %2 = (%1 + 2)::ANY
└──      return %2
julia> @code_warntype f1(bobj)
MethodInstance for f1(::Boo)
  from f1(o::Goo) @ Main REPL[4]:1
Arguments
  #self#::Core.Const(f1)
  o::Boo
Body::Int64
1 ─ %1 = Base.getproperty(o, :x)::Int64
│   %2 = (%1 + 2)::Int64
└──      return %2

Avoid (non-constant) global variables

julia> g        = 2
2
julia> const cg = 1 # we can't change the _type_ of the object bound to a constant variable
1
julia> cg = 2 # we can rebind to another object of the same type, but we get a warning
WARNING: redefinition of constant Main.cg. This may fail, cause incorrect answers, or produce other errors.
2
julia> # cg = 2.5 # this would error!
julia> f1(x,y) = x+y
f1 (generic function with 3 methods)
julia> f2(x) = x + g
f2 (generic function with 1 method)
julia> f3(x) = x + cg
f3 (generic function with 1 method)
julia> @btime f1(3,2)
  1.552 ns (0 allocations: 0 bytes)
5
julia> @btime f2(3) # 22 times slower!!!
  20.962 ns (0 allocations: 0 bytes)
5
julia> @btime f3(3) # as fast as f1
  1.552 ns (0 allocations: 0 bytes)
5
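When a global really must stay non-constant, the usual fix is to pass its value to the function as an argument, so the compiler can specialise on a concrete argument type (on Julia ≥ 1.8 a type annotation on the global is an alternative). A sketch, with illustrative names:

```julia
gl = 2                   # non-constant, untyped global

f_slow(x)    = x + gl    # must re-check the global's type on every call
f_fast(x, y) = x + y     # the global's value is passed in as an argument

println(f_slow(3))       # 5
println(f_fast(3, gl))   # 5, but compiled for concrete argument types
```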

Loop arrays with the inner loop by rows

Julia is column-major (differently from Python), so for arrays of bits types the elements of the different rows of the same column are contiguous in memory.

julia> a = rand(1000,1000);
julia> function f1(x) # inner loop over columns
           (R,C) = size(x)
           cum = 0.0
           for r in 1:R
               for c in 1:C
                   cum += x[r,c]
               end
           end
           return cum
       end
f1 (generic function with 3 methods)
julia> function f2(x) # inner loop over rows
           (R,C) = size(x)
           cum = 0.0
           for c in 1:C
               for r in 1:R
                   cum += x[r,c]
               end
           end
           return cum
       end
f2 (generic function with 1 method)
julia> @btime f1($a) # 2.3 ms, 0 allocations on my pc
  926.449 μs (0 allocations: 0 bytes)
500220.6882968228
julia> @btime f2($a) # 1.3 ms, 0 allocations on my pc
  925.564 μs (0 allocations: 0 bytes)
500220.6882968189
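Rather than remembering the right nesting order, we can iterate with eachindex, which visits the elements of any array in its native memory order. A minimal sketch:

```julia
# `eachindex` yields efficient, memory-ordered indices for any array type,
# so the loop is fast without spelling out the row/column nesting.
function arraysum(x)
    cum = 0.0
    for i in eachindex(x)
        cum += x[i]
    end
    return cum
end

println(arraysum([1.0 2.0; 3.0 4.0]))  # 10.0
```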

Use low-level optimisation when possible

julia> function f1(x)
           s = 0.0
           for i in 1:length(x)
               s += i * x[i]
           end
           return s
       end
f1 (generic function with 3 methods)
julia> function f2(x)
           s = 0.0
           for i in 1:length(x)
               @inbounds s += i * x[i] # remove bounds checks
           end
           return s
       end
f2 (generic function with 1 method)
julia> function f3(x)
           s = 0.0
           @simd for i in 1:length(x) # tell the compiler it is allowed to run the loop in whatever order, exploiting the in-thread parallelism of modern CPUs
               s += i * x[i]
           end
           return s
       end
f3 (generic function with 1 method)
julia> x = rand(10000);
julia> @btime f1($x)
  9.267 μs (0 allocations: 0 bytes)
2.5231610262164485e7
julia> @btime f2($x)
  9.257 μs (0 allocations: 0 bytes)
2.5231610262164485e7
julia> @btime f3($x)
  5.559 μs (0 allocations: 0 bytes)
2.5231610262164526e7
julia> X = rand(100,20);
julia> function f1(x)
           s = 0.0
           for i in 1:size(x,1)
               s += sum(x[i,:])
           end
           return s
       end
f1 (generic function with 3 methods)
julia> function f2(x)
           s = 0.0
           @views for i in 1:size(x,1)
               s += sum(x[i,:]) # the slice operator copies the data; the @views macro forces a view (reference) instead
           end
           return s
       end
f2 (generic function with 1 method)
julia> @btime f1($X)
  4.225 μs (100 allocations: 21.88 KiB)
1013.9993057955776
julia> @btime f2($X)
  1.314 μs (0 allocations: 0 bytes)
1013.9993057955776
Warning

Note that while the @views macro "saves time" by not copying the data, the resulting array has a strided, non-contiguous memory layout. If you need to use it for many subsequent operations, it may be more efficient to "pay" the copy cost once and then work with an array stored in a nicely contiguous block of memory.
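When a slice will be reused many times, one option is to take the view and then materialise it once with collect, obtaining a contiguous Vector. A small sketch:

```julia
M = [1.0 2.0; 3.0 4.0]
v = @view M[1, :]   # no copy: shares memory with M, but with a strided layout
w = collect(v)      # pay the copy once: a contiguous Vector{Float64}
println(v == w)     # true - same values, different memory layout
```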

julia> function f1(x,y)
           if x+y > 100
               return x + y + 2
           else
               return x + 1
           end
       end
f1 (generic function with 3 methods)
julia> @inline function f2(x,y) # the function is "inlined": its whole definition is copied at each call site rather than being called
           if x+y > 100
               return x + y + 2
           else
               return x + 1
           end
       end
f2 (generic function with 2 methods)
julia> function f3(y)
           s = 0.0
           for i in 2:y
               s += f1(i,i-1)
               s += f1(i,i)
           end
           return s
       end
f3 (generic function with 1 method)
julia> function f4(y)
           s = 0.0
           for i in 2:y
               s += f2(i,i-1)
               s += f2(i,i)
           end
           return s
       end
f4 (generic function with 1 method)
julia> x = 1000
1000
julia> @btime f3($x)
  1.851 μs (0 allocations: 0 bytes)
2.002396e6
julia> @btime f4($x)
  1.834 μs (0 allocations: 0 bytes)
2.002396e6

But attention! Not always a good idea:

julia> function f3(y)
           s = 0.0
           for i in 2:y
              s += sum(f1(a,a-1) for a in 2:i)
           end
           return s
        end
f3 (generic function with 1 method)
julia> function f4(y)
           s = 0.0
           for i in 2:y
               s += sum(f2(a,a-1) for a in 2:i)
           end
           return s
       end
f4 (generic function with 1 method)
julia> x = 1000
1000
julia> @btime f3($x)
  86.201 μs (0 allocations: 0 bytes)
3.3359915e8
julia> @btime f4($x)
  86.061 μs (0 allocations: 0 bytes)
3.3359915e8

Note that the Julia compiler already inlines small functions automatically when it thinks this will improve performance.

Profiling the code to discover bottlenecks

We have already seen @btime and @benchmark from the package BenchmarkTools.jl. Remember to quote the global variables used as parameters of your function with the dollar sign to get an accurate benchmark of the function execution. Julia also provides the macro @time, but we should look at a second call to a given function (with the same parameter types), or the output will include compilation time:

julia> function fb(x)
           out = Union{Int64,Float64}[1,2.0,3]
           push!(out,4)
           if x > 10
               if ( x > 100)
                   return [out[1],out[2]] |>  sum
               else
                   return [out[2],out[3]] |>  sum
               end
           else
               return [out[1],out[3]] |>  sum
           end
       end
fb (generic function with 1 method)
julia> @time fb(3)
  0.000006 seconds (3 allocations: 256 bytes)
4
julia> @time fb(3)
  0.000005 seconds (3 allocations: 256 bytes)
4

We can use @profile myfunction(x,y) to run sample-based profiling:

julia> using Profile # in the stdlib
julia> function foo(n)
           a = rand(n,n)
           b = a + a
           c = b * b
           return c
       end
foo (generic function with 2 methods)
julia> @profile (for i = 1:100; foo(1000); end) # too fast otherwise
julia> Profile.print() # on my pc: 243 samples in `rand`, 174 in the sum, 439 in the matrix product
[profiler tree omitted: a nested listing of stack frames with sample counts. In this run most samples fall within `foo`: ~107 in the `rand(n,n)` call (line 2), ~68 in the matrix sum (line 3, broadcasting machinery) and ~67 in the matrix product (line 4, ending in the BLAS `gemm!` call)]
julia> Profile.clear()

Introspection and debugging

To discover problems in the code more generally, we can use the several introspection functions that Julia provides us with (some of which we have already seen):

julia> # @less rand(3)  # Show the source code of the specific method invoked - use `q` to quit
       # @edit rand(3)  # Like @less but it opens the source code in an editor
       methods(foo)# 2 methods for generic function "foo" from Main.var"Main":
 [1] foo()
     @ REPL[1]:1
 [2] foo(n)
     @ REPL[2]:1
julia> @which foo(2) # which method am I using when I call foo with an integer?foo(n) @ Main.var"Main" REPL[2]:1
julia> typeof(a)Matrix{Float64} (alias for Array{Float64, 2})
julia> eltype(a)Float64
julia> isa(1.2, Number)true
julia> fieldnames(Foo)(:x,)
julia> dump(fobj)Main.var"Main".Foo x: Int64 1
julia> names(Main, all=false) # available (e.g. exported) identifiers of a given module13-element Vector{Symbol}: :Base :Core :LESSONS_ROOTDIR :LESSONS_ROOTDIR_TMP :LESSONS_SUBDIR :MAKE_PDF :Main :include_sandbox :link_example :literate_directory :makeList :preprocess :rdir
julia> sizeof(2) # bytes8
julia> typemin(Int64)-9223372036854775808
julia> typemax(Int64)9223372036854775807
julia> bitstring(2)"0000000000000000000000000000000000000000000000000000000000000010"
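Other introspection functions let us explore the type hierarchy. A quick sketch (the results shown in the comments assume a standard 64-bit Julia):

```julia
using InteractiveUtils   # provides `subtypes` (loaded automatically in the REPL)

supertype(Int64)         # Signed: the direct supertype
subtypes(Signed)         # the direct subtypes of Signed (Int8, Int16, ..., BigInt)
Int64 <: Integer         # true: subtype check
isbitstype(Int64)        # true: plain data, stored inline without pointer indirection
```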

Various low-level interpretation of an expression

julia> @code_native foo(3)	.text
	.file	"foo"
	.globl	julia_foo_38008                 # -- Begin function julia_foo_38008
	.p2align	4, 0x90
	.type	julia_foo_38008,@function
julia_foo_38008:                        # @julia_foo_38008
; ┌ @ REPL[2]:1 within `foo`
# %bb.0:                                # %top
	push	rbp
	mov	rbp, rsp
	push	r15
	push	r14
	push	r12
	push	rbx
	sub	rsp, 32
	vxorps	xmm0, xmm0, xmm0
	vmovaps	xmmword ptr [rbp - 64], xmm0
	mov	qword ptr [rbp - 48], 0
	#APP
	mov	rax, qword ptr fs:[0]
	#NO_APP
	mov	rdx, rdi
	lea	rcx, [rbp - 64]
	movabs	rdi, 140375528943696
; │ @ REPL[2]:2 within `foo`
; │┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:279 within `rand` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:291 @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:290
; ││┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479
	mov	rsi, rdx
	mov	r12, qword ptr [rax - 8]
	mov	qword ptr [rbp - 64], 4
	mov	rax, qword ptr [r12]
	mov	qword ptr [rbp - 56], rax
	movabs	rax, 140376665409696
	mov	qword ptr [r12], rcx
	call	rax
; ││└
; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293
; │││┌ @ essentials.jl:10 within `length`
	mov	rbx, qword ptr [rax + 8]
; │││└
; │││┌ @ abstractarray.jl:1237 within `pointer`
; ││││┌ @ pointer.jl:65 within `unsafe_convert`
	mov	r14, qword ptr [rax]
; ││└└└
; ││┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479
	mov	r15, rax
; ││└
; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293
; │││┌ @ int.jl:88 within `*`
	shl	rbx, 3
; │││└
; │││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:141 within `xoshiro_bulk`
; ││││┌ @ operators.jl:425 within `>=`
; │││││┌ @ int.jl:514 within `<=`
	cmp	rbx, 64
; ││││└└
	jl	.LBB0_2
# %bb.1:                                # %L11
; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:142 within `xoshiro_bulk`
	movabs	rax, offset j_xoshiro_bulk_simd_38010
	mov	qword ptr [rbp - 48], r15
	mov	rdi, r14
	mov	rsi, rbx
	call	rax
; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:143 within `xoshiro_bulk`
; ││││┌ @ int.jl:86 within `-`
	sub	rbx, rax
; ││││└
; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:144 within `xoshiro_bulk`
; ││││┌ @ pointer.jl:282 within `+`
	add	r14, rax
.LBB0_2:                                # %L17
; ││││└
; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:146 within `xoshiro_bulk`
; ││││┌ @ operators.jl:276 within `!=`
; │││││┌ @ promotion.jl:521 within `==`
	test	rbx, rbx
; ││││└└
	je	.LBB0_4
# %bb.3:                                # %L22
; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:147 within `xoshiro_bulk`
	movabs	rax, offset j_xoshiro_bulk_nosimd_38011
	mov	qword ptr [rbp - 48], r15
	mov	rdi, r14
	mov	rsi, rbx
	call	rax
.LBB0_4:                                # %L24
; │└└└
; │ @ REPL[2]:3 within `foo`
	movabs	rax, offset "j_+_38012"
	mov	qword ptr [rbp - 48], r15
	mov	rdi, r15
	mov	rsi, r15
	call	rax
; │ @ REPL[2]:4 within `foo`
	movabs	rcx, offset "j_*_38013"
	mov	qword ptr [rbp - 48], rax
	mov	rdi, rax
	mov	rsi, rax
	call	rcx
	mov	rcx, qword ptr [rbp - 56]
	mov	qword ptr [r12], rcx
; │ @ REPL[2]:5 within `foo`
	add	rsp, 32
	pop	rbx
	pop	r12
	pop	r14
	pop	r15
	pop	rbp
	ret
.Lfunc_end0:
	.size	julia_foo_38008, .Lfunc_end0-julia_foo_38008
; └
                                        # -- End function
	.section	".note.GNU-stack","",@progbits
julia> @code_llvm foo(3); @ REPL[2]:1 within `foo` define nonnull {}* @julia_foo_38056(i64 signext %0) #0 { top: %gcframe3 = alloca [3 x {}*], align 16 %gcframe3.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 0 %1 = bitcast [3 x {}*]* %gcframe3 to i8* call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 24, i1 true) %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #7 %tls_ppgcstack = getelementptr i8, i8* %thread_ptr, i64 -8 %2 = bitcast i8* %tls_ppgcstack to {}**** %tls_pgcstack = load {}***, {}**** %2, align 8 ; @ REPL[2]:2 within `foo` ; ┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:279 within `rand` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:291 @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:290 ; │┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479 %3 = bitcast [3 x {}*]* %gcframe3 to i64* store i64 4, i64* %3, align 16 %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 1 %5 = bitcast {}** %4 to {}*** %6 = load {}**, {}*** %tls_pgcstack, align 8 store {}** %6, {}*** %5, align 8 %7 = bitcast {}*** %tls_pgcstack to {}*** store {}** %gcframe3.sub, {}*** %7, align 8 %8 = call nonnull {}* inttoptr (i64 140376665409696 to {}* ({}*, i64, i64)*)({}* inttoptr (i64 140375528943696 to {}*), i64 %0, i64 %0) ; │└ ; │┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293 ; ││┌ @ abstractarray.jl:1237 within `pointer` ; │││┌ @ pointer.jl:65 within `unsafe_convert` %9 = bitcast {}* %8 to i8** %arrayptr = load i8*, i8** %9, align 8 %10 = ptrtoint i8* %arrayptr to i64 ; ││└└ ; ││┌ @ essentials.jl:10 within 
`length` %11 = bitcast {}* %8 to { i8*, i64, i16, i16, i32 }* %arraylen_ptr = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %11, i64 0, i32 1 %arraylen = load i64, i64* %arraylen_ptr, align 8 ; ││└ ; ││┌ @ int.jl:88 within `*` %12 = shl i64 %arraylen, 3 ; ││└ ; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:141 within `xoshiro_bulk` ; │││┌ @ operators.jl:425 within `>=` ; ││││┌ @ int.jl:514 within `<=` %13 = icmp slt i64 %12, 64 ; │││└└ br i1 %13, label %L17, label %L11 L11: ; preds = %top %14 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %14, align 16 ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:142 within `xoshiro_bulk` %15 = call i64 @j_xoshiro_bulk_simd_38058(i64 zeroext %10, i64 signext %12) ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:143 within `xoshiro_bulk` ; │││┌ @ int.jl:86 within `-` %16 = sub i64 %12, %15 ; │││└ ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:144 within `xoshiro_bulk` ; │││┌ @ pointer.jl:282 within `+` %17 = getelementptr i8, i8* %arrayptr, i64 %15 %18 = ptrtoint i8* %17 to i64 ; ││││┌ @ essentials.jl:522 within `oftype` ; │││││┌ @ pointer.jl:26 within `convert` ; ││││││┌ @ boot.jl:800 within `Ptr` br label %L17 L17: ; preds = %L11, %top %value_phi = phi i64 [ %18, %L11 ], [ %10, %top ] %value_phi1 = phi i64 [ %16, %L11 ], [ %12, %top ] ; │││└└└└ ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:146 within `xoshiro_bulk` ; │││┌ @ operators.jl:276 within `!=` ; ││││┌ @ promotion.jl:521 within `==` %.not = icmp eq i64 %value_phi1, 0 ; │││└└ br i1 %.not, label %L24, label 
%L22 L22: ; preds = %L17 %19 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %19, align 16 ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:147 within `xoshiro_bulk` call void @j_xoshiro_bulk_nosimd_38059(i64 zeroext %value_phi, i64 signext %value_phi1) br label %L24 L24: ; preds = %L22, %L17 %20 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %20, align 16 ; └└└ ; @ REPL[2]:3 within `foo` %21 = call nonnull {}* @"j_+_38060"({}* nonnull %8, {}* nonnull %8) store {}* %21, {}** %20, align 16 ; @ REPL[2]:4 within `foo` %22 = call nonnull {}* @"j_*_38061"({}* nonnull %21, {}* nonnull %21) %23 = load {}*, {}** %4, align 8 %24 = bitcast {}*** %tls_pgcstack to {}** store {}* %23, {}** %24, align 8 ; @ REPL[2]:5 within `foo` ret {}* %22 }
julia> @code_typed foo(3)CodeInfo( 1 ── %1 = $(Expr(:foreigncall, :(:jl_alloc_array_2d), Matrix{Float64}, svec(Any, Int64, Int64), 0, :(:ccall), Matrix{Float64}, Core.Argument(2), Core.Argument(2), Core.Argument(2), Core.Argument(2)))::Matrix{Float64} %2 = $(Expr(:gc_preserve_begin, :(%1))) %3 = $(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), 0, :(:ccall), :(%1)))::Ptr{Float64} %4 = Base.bitcast(Ptr{UInt8}, %3)::Ptr{UInt8} %5 = Base.arraylen(%1)::Int64 %6 = Base.mul_int(%5, 8)::Int64 %7 = Random.XoshiroSimd.Float64::Type{Float64} %8 = Random.XoshiroSimd._bits2float::typeof(Random.XoshiroSimd._bits2float) %9 = Base.sle_int(64, %6)::Bool └─── goto #3 if not %9 2 ── %11 = invoke Random.XoshiroSimd.xoshiro_bulk_simd($(QuoteNode(Random.TaskLocalRNG()))::Random.TaskLocalRNG, %4::Ptr{UInt8}, %6::Int64, %7::Type{Float64}, $(QuoteNode(Val{8}()))::Val{8}, %8::typeof(Random.XoshiroSimd._bits2float))::Int64 %12 = Base.sub_int(%6, %11)::Int64 %13 = Core.bitcast(Core.UInt, %4)::UInt64 %14 = Base.bitcast(UInt64, %11)::UInt64 %15 = Base.add_ptr(%13, %14)::UInt64 └─── %16 = Core.bitcast(Ptr{UInt8}, %15)::Ptr{UInt8} 3 ┄─ %17 = φ (#2 => %16, #1 => %4)::Ptr{UInt8} %18 = φ (#2 => %12, #1 => %6)::Int64 %19 = (%18 === 0)::Bool %20 = Base.not_int(%19)::Bool └─── goto #5 if not %20 4 ── invoke Random.XoshiroSimd.xoshiro_bulk_nosimd($(QuoteNode(Random.TaskLocalRNG()))::Random.TaskLocalRNG, %17::Ptr{UInt8}, %18::Int64, %7::Type{Float64}, %8::typeof(Random.XoshiroSimd._bits2float))::Any 5 ┄─ goto #6 6 ── $(Expr(:gc_preserve_end, :(%2))) └─── goto #7 7 ── goto #8 8 ── goto #9 9 ── goto #10 10 ─ goto #11 11 ─ %30 = invoke Main.var"Main".:+(%1::Matrix{Float64}, %1::Matrix{Float64})::Matrix{Float64} %31 = invoke Main.var"Main".:*(%30::Matrix{Float64}, %30::Matrix{Float64})::Matrix{Float64} └─── return %31 ) => Matrix{Float64}
julia> @code_lowered foo(3)CodeInfo( 1 ─ a = Main.var"Main".rand(n, n) b = a + a c = b * b └── return c )
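Another useful tool from the same family is `@code_warntype`, which prints the code with the inferred types annotated and highlights type instabilities (shown in red in the REPL). The `unstable` function below is just a made-up example:

```julia
using InteractiveUtils

# Type-unstable: `s` starts as an Int but becomes a Float64 inside the loop,
# so the compiler must handle both types (a common performance pitfall)
function unstable(n)
    s = 0
    for i in 1:n
        s += i/2
    end
    return s
end

@code_warntype unstable(10)   # look for `s::Union{Float64, Int64}` in the output
```

A type-stable version would simply initialise `s = 0.0`.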

We can use a debugger, like e.g. the one integrated in Juno or VSCode. Graphical debuggers allow us to put a breakpoint on a specific line of code, run the code in debug mode (yes, it will be slower), let the program reach the breakpoint and inspect the state of the system at that point of the code, including the local variables. In Julia we can also change the program interactively! Other typical operations are running a single line, stepping into a function, running until the current function returns, etc.

Runtime exceptions

Like many (all?) languages, when Julia encounters an error it raises an exception. If the exception is not caught at a higher level in the call stack (i.e. recognised and handled), it leads to an error message and a return to the prompt, or to the termination of the script (and, rarely, to the Julia process crashing altogether).

The idea is that we try some potentially dangerous code, and if an error is raised in this code we catch it and handle it.

julia> function customIndex(vect,idx;toReturn=0)
           try
               vect[idx]
           catch e
               if isa(e,BoundsError)
                   return toReturn
               end
               rethrow(e)
           end
       endcustomIndex (generic function with 1 method)
julia> a = [1,2,3] # a[4] # Error ("BoundsError" to be precise)3-element Vector{Int64}: 1 2 3
julia> customIndex(a,4)0
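Besides catching exceptions, we can raise our own with `throw` (or with `error`, which raises a generic `ErrorException`). A sketch with a hypothetical custom exception type:

```julia
struct NegativeInputError <: Exception  # custom exception: just a subtype of Exception
    x::Float64
end

function mysqrt(x)
    x < 0 && throw(NegativeInputError(x))  # raise the exception on invalid input
    return sqrt(x)
end

mysqrt(4.0)                      # 2.0
caught = try
    mysqrt(-1.0)
catch e
    isa(e, NegativeInputError)   # true: callers can dispatch on the exception type
end
```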

Note that handling exceptions is computationally expensive, so do not use exceptions in place of conditional statements.
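For example, the bounds check in `customIndex` above can be written with a plain conditional instead of `try`/`catch`; in tight loops this version is typically much cheaper (a sketch, keeping the same hypothetical signature):

```julia
function customIndexCond(vect, idx; toReturn=0)
    # cheap conditional test instead of raising and handling a BoundsError
    return firstindex(vect) <= idx <= lastindex(vect) ? vect[idx] : toReturn
end

customIndexCond([1,2,3], 4)  # 0 (out of bounds: return the default)
customIndexCond([1,2,3], 2)  # 2
```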

Parallel computation

Finally, one note on parallel computation. We see only some basic usage of multithreading and multiprocessing in this course, but with Julia it is relatively easy to parallelise code using either multiple threads or multiple processes. What's the difference?

  • multithread
    • advantages: computationally "cheap" to create (the memory is shared)
    • disadvantages: limited to the number of cores within a CPU, requires attention to avoid overwriting the same memory, or overwriting it in an unintended order ("data races"), and we can't add threads dynamically (within a script)
  • multiprocesses
    • advantages: unlimited in number, they can run on different CPUs of the same machine or on different nodes of a cluster (even over SSH on different networks), and we can add processes from within our code with addprocs(nToAdd)
    • disadvantages: the memory is copied (each process has its own memory), so they are computationally expensive to create (the gain must exceed the cost of setting up the new process), and they require attention in selecting which memory a given process will need to "bring with it" for its functionality

Note that if you are reading this document on the GitHub pages, this script was compiled using GitHub Actions, where a single thread and a single process are available, so you will not see the performance gains.

Multithreading

Warning

It is not possible to add threads dynamically: we either have to start Julia with the -t parameter on the command line (e.g. -t 8 or -t auto), or use the VSCode Julia extension setting Julia: Num Threads

julia> function inner(x)
           s = 0.0
           for i in 1:x
               for j in 1:i
                   if j%2 == 0
                       s += j
                   else
                       s -= j
                   end
               end
           end
           return s
       endinner (generic function with 1 method)
julia> function parentSingleThread(x,y)
           toTest = x .+ (1:y)
           out = zeros(length(toTest))
           for i in 1:length(toTest)
               out[i] = inner(toTest[i])
           end
           return out
       endparentSingleThread (generic function with 1 method)
julia> function parentThreaded(x,y)
           toTest = x .+ (1:y)
           out = zeros(length(toTest))
           Threads.@threads for i in 1:length(toTest)
               out[i] = inner(toTest[i])
           end
           return out
       endparentThreaded (generic function with 1 method)
julia> x = 100100
julia> y = 2020
julia> str = parentSingleThread(x,y)20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> mtr = parentThreaded(x,y)20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> str == mtr # truetrue
julia> Threads.nthreads() # 4 in my case4
julia> Threads.threadid()1
julia> @btime parentSingleThread(100,20) # 140 μs on my machine 114.285 μs (1 allocation: 224 bytes) 20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> @btime parentThreaded(100,20) # 47 μs 41.178 μs (22 allocations: 2.42 KiB) 20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
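In the threaded loop above each iteration writes to its own slot of `out`, so there is no data race. When threads update shared state instead, the update must be synchronised; a minimal sketch using `Threads.Atomic` (the function name is just for illustration):

```julia
function sumSquaresThreaded(n)
    total = Threads.Atomic{Int}(0)       # atomic accumulator shared by all threads
    Threads.@threads for i in 1:n
        Threads.atomic_add!(total, i^2)  # atomic update: no data race
    end
    return total[]                       # read the final value
end

sumSquaresThreaded(10)  # 385, whatever the number of threads
```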

Multiprocessing

NOTE The code on multiprocessing is commented out (not executed, so you don't see the output) as GitHub Actions (used to run this code and render the web pages you are reading) has problems running multi-process functions:

using Distributed     # from the Standard Library
addprocs(3)           # 2,3,4

The first process is considered a sort of "master" process; the others are the "workers". We can add processes on other machines by providing the SSH connection details directly in the addprocs() call (Julia must be installed on those machines as well). We can alternatively start Julia directly with n worker processes using the argument -p n on the command line.

println("Worker pids: ")
for pid in workers()  # returns a vector of the pids
    println(pid)      # 2,3,4
end
rmprocs(workers()[1])    #  remove process pid 2
println("Worker pids: ")
for pid in workers()
    println(pid) # 3,4 are left
end
@everywhere begin using Distributed end # this is needed only in GitHub action
@everywhere println(myid()) # 4,3
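Individual jobs can also be sent to a specific worker with `remotecall` (which returns a `Future` immediately) and `fetch` (which blocks until the result is ready). A sketch, assuming we can spawn one worker:

```julia
using Distributed
addprocs(1)                           # add one worker just for this sketch

w   = workers()[1]
fut = remotecall(x -> x^2, w, 4)      # run the anonymous function on worker `w`
res = fetch(fut)                      # 16: blocks until the result is available
# remotecall_fetch(x -> x^2, w, 4)    # the two steps in a single call
```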

Run heavy tasks in parallel

using Distributed, BenchmarkTools
a = rand(1:35,100)
@everywhere function fib(n)
    if n == 0 return 0 end
    if n == 1 return 1 end
    return fib(n-1) + fib(n-2)
end

The macro @everywhere makes the given function (or functions, with @everywhere begin [shared function definitions] end or @everywhere include("sharedCode.jl")) available to all the current workers.

result  = map(fib,a)

The pmap function ("parallel" map) automatically picks up the free processes, assigns them jobs from the "input" array and merges the results in the returned array. Note that the order is preserved:

result2 = pmap(fib,a)
result == result2
@btime map(fib,$a)  # serialised:   median time: 514 ms    1 allocations
@btime pmap(fib,$a) # parallelised: median time: 265 ms 4220 allocations # the memory of `a` needs to be copied to all processes

Divide and Conquer

Rather than having a "heavy" operation and being interested in the individual results, here we have a "light" operation and we want to aggregate the results of the various computations using some aggregation function. We can then use the @distributed (aggregationfunction) for [forConditions] macro:

using Distributed, BenchmarkTools
function f(n)   # our single-process benchmark
  s = 0.0
  for i = 1:n
    s += i/2
  end
    return s
end
function pf(n)
  s = @distributed (+) for i = 1:n # aggregate using sum on variable s
        i/2                        # the last expression of the for body is used by the aggregator
  end
  return s
end
@btime  f(10000000) # median time: 11.1 ms   0 allocations
@btime pf(10000000) # median time:  5.7 ms 145 allocations

Note that, also in this case, the improvement is less than proportional to the number of processes we add.

Details on parallel computation can be found in the official documentation, including information on running Julia natively on GPUs or TPUs.

View this file on Github.


This page was generated using Literate.jl.