01 JULIA1 - 6A: Metaprogramming and macros (23:46)
01 JULIA1 - 6B: Interoperability with other languages (23:6)
01 JULIA1 - 6C: Performances and errors: profiling, debugging, introspection and exceptions (27:33)
01 JULIA1 - 6D: Parallel computation: multithreading, multiprocessing (20:3)
0106 Further Topics
Some stuff to set-up the environment..
julia> cd(@__DIR__)
julia> using Pkg
julia> Pkg.activate(".") # If using a Julia version different than 1.10 please uncomment and run the following line (reproductibility guarantee will however be lost) # Pkg.resolve() # Pkg.instantiate() # run this if you didn't in Segment 01.01
Activating project at `~/work/SPMLJ/SPMLJ/buildedDoc/01_-_JULIA1_-_Basic_Julia_programming`
julia> using Random
julia> Random.seed!(123)
Random.TaskLocalRNG()
julia> using InteractiveUtils # loaded automatically when working... interactively
Metaprogramming and macros
"running" some code include the following passages (roughly):
- parsing of the text defining the code and its translation in hierarchical expressions to the Abstract syntax Tree (AST) (syntax errors are caugth at this time)
- on the first instance required ("just in time") compilation of the AST expressions into object code (using the LLVM compiler)
- execution of the compiled object code
"Macros" in many other language (e.g. C or C++) refer to the possibility to "pre-process" the textual representation of the code statements before it is parsed. In julia instead it refers to the possibility to alter the expression once has already being parsed in the AST, allowing a greater expressivity as we are no longer limited by the parsing syntax
The AST is organised in a hierarchical tree of expressions where each element (including the operators) is a symbol For variables, you can use symbols to refer to the actual identifiers instad to the variable's value
Expressions themselves are objects representing unevaluated computer expressions
Expressions and symbols
julia> expr1 = Meta.parse("a = b + 2") # What the parser do when reading the source code. b doesn't need to actually been defined, it's just a namebinding without the reference to any object, not even `nothing`
:(a = b + 2)
julia> typeof(expr1) # expressions are first class objects
Expr
julia> expr2 = :(a = b + 1)
:(a = b + 1)
julia> expr3 = quote a = b + 1 end
quote #= REPL[4]:1 =# a = b + 1 end
julia> expr3
quote #= REPL[4]:1 =# a = b + 1 end
julia> dump(expr1) # The AST ! Note this is already a nested statement, an assignment of the result of an expression (the sum call between the symbol `:b` and 1) to the symbol `a`
Expr head: Symbol = args: Array{Any}((2,)) 1: Symbol a 2: Expr head: Symbol call args: Array{Any}((3,)) 1: Symbol + 2: Symbol b 3: Int64 2
julia> expr4 = Expr(:(=),:a,Expr(:call,:+,:b,1)) # The AST using the "Expr" constructor
:(a = b + 1)
julia> symbol1 = :(a) # as for expressions
:a
julia> symbol2 = Meta.parse("a") # as for expressions
:a
julia> symbol3 = Symbol("a") # specific for symbols only
:a
julia> othsymbol = Symbol("aaa",10,"bbb")
:aaa10bbb
julia> typeof(symbol1)
Symbol
I can access any parts of my expression before evaluating it (indeed, that's what macro will do...)
julia> myASymbol = expr1.args[2].args[1]
:+
julia> expr1.args[2].args[1] = :(*)
:*
julia> b = 2 # a # error, a not defined
2
julia> eval(expr1)
4
julia> a # here now is defined and it has an object associated... 4!
4
The capability to evaluate expressions is very powerfull but due to obvious secutiry implications never evaluate expressions you aren't sure of their provenience. For example if you develop a Julia web app (e.g. using Genie.jl) never evaluate user provided expressions.
Note that evaluation of expressions happens always at global scope, even if it done inside a function:
julia> function foo() locVar = 1 expr = :(locVar + 1) return eval(expr) end # a = foo() # error locVar not defined
foo (generic function with 1 method)
To refer to the value of a variable rather than the identifier itself within an expression, interpolate the variable using the dollar sign:
julia> expr = :($a + b) # here the identifier 'a' has been replaced with its numerical value, `4`
:(4 + b)
julia> dump(expr)
Expr head: Symbol call args: Array{Any}((3,)) 1: Symbol + 2: Int64 4 3: Symbol b
julia> eval(expr)
6
julia> a = 10
10
julia> eval(expr) # no changes
6
julia> b = 100
100
julia> eval(expr) # here it change, as it is at eval time that the identifier `b` is "replaced" with its value
104
Macros
One of the best usage of macros is they allow package developers to provide a very flexible API to their package, suited for the specific needs of the package, making life easier for the users. Compare for example the API of JuMP with those of Pyomo to define model constraints !
Some examples of macros: MultiDimEquations.jl
- from:
@meq par1[d1 in DIM1, d2 in DIM2, dfix3] = par2[d1,d2]+par3[d1,d2]
- to:
[par1[d1,d2,dfix3] = par2[d1,d2]+par3[d1,d2] for d1 in DIM1, d2 in DIM2]
- from:
@pipe 10 |> foo(_,a) |> foo2(b,_,c) |> foo3(_)
- to:
foo3(foo2(b,foo(10,a),c))
Brodcasting (Base):
- from:
@. a + b * D^2
- to:
a .+ b .* D.^2
Defining a macro...
Like functions, but both the arguments and the returned output are expressions
julia> macro customLoop(controlExpr,workExpr) return quote for i in $controlExpr $workExpr end end end
@customLoop (macro with 1 method)
Invoking a macro....
julia> a = 5
5
julia> @customLoop 1:4 println(i) #note that "i" is in the macro
1 2 3 4
julia> @customLoop 1:a println(i)
1 2 3 4 5
julia> @customLoop 1:a if i > 3 println(i) end
4 5
julia> @customLoop ["apple", "orange", "banana"] println(i)
apple orange banana
julia> @customLoop ["apple", "orange", "banana"] begin print("i: "); println(i) end
i: apple i: orange i: banana
julia> @macroexpand @customLoop 1:4 println(i) # print what the macro does with the specific expressions provided
quote #= REPL[1]:3 =# for var"#433#i" = 1:4 #= REPL[1]:4 =# Main.var"Main".println(var"#433#i") #= REPL[1]:5 =# end end
String macros (aka "non-standard string literals") Invoked with the syntax xxx" ...text..."
or xxx""" ...multi-line text..."""
where xxx
is the name of the macro and the macro must be defined as macro xxx_str
. Used to perform textual modification o nthe given text, for example this print the given text on a 8 characters
julia> macro print8_str(mystr) # input here is a string, not an expression limits = collect(1:8:length(mystr)) for (i,j) in enumerate(limits) st = j en = i==length(limits) ? length(mystr) : j+7 println(mystr[st:en]) end end
@print8_str (macro with 1 method)
julia> print8"123456789012345678"
12345678 90123456 78
julia> print8"""This is a text that once printed in 8 columns with terminal will be several lines. Ok, no rammar rules relating to carriage returns are emploied here..."""
This is a text t hat once printed in 8 co lumns wi th termi nal will be seve ral line s. Ok, n o rammar rules r elating to carri age retu rns are emploied here...
While normally used to modify text, string macros are "true" macros:
julia> macro customLoop_str(str) exprs = Meta.parse(str) controlExpr,workExpr = exprs.args[1],exprs.args[2] return quote for i in $controlExpr $workExpr end end end
@customLoop_str (macro with 1 method)
julia> customLoop"""1:4; println(i)"""
1 2 3 4
Interfacing with other languages
There are 3 ways to interface Julia with programs or libraries wrote in other languages. At the lowest level, Julia allows to directly interface with C or Fortran libraries, and this means, aside using directly libraries written in C, to be able to interface with any programming language that offer also a C interface (R, Python...) Using this low level C Interface, users have created specific packages to interface many languages using a simple, Julian-way syntax. We will see these interfaces for R and Python. Finally, at the highest level, many common packages of other languages have been already "interfaced", so that the user can use the Julia Package without even knowing that this is an interface for an other package, for example SymPy.jl
is a large interface to the Python package SymPy
.
Using C libraries
Let's start by seing how to use a C library. For this example to work you will need to have the GCC compiler installed on your machine First let's write the header and source C files and write them to the disk:
julia> cheader = """ extern int get5(); extern double mySum(float x, float y); """
"extern int get5();\nextern double mySum(float x, float y);\n"
julia> csource = """ int get5(){ return 5; } double mySum(float x, float y){ return x+y; } """
"int get5(){\n return 5;\n}\n\ndouble mySum(float x, float y){\n return x+y;\n}\n"
julia> open(f->write(f,cheader),"myclib.h","w") # We open a stream to file with the "w" parameter as for "writing", and we pass the stream to the anonymous function to actually write to the stream. If this funcitons is many lines of code, consider rewriting the `open` statement using a `do` block
58
julia> open(f->write(f,csource),"myclib.c","w")
79
Now let's run the command to compile the C code we saved as shared library using gcc, a C compiler. The following example assume that GCC is installed in the machine where this example is run and available as gcc
.
julia> compilationCommand1 = `gcc -o myclib.o -c myclib.c` # the actual compilation, note the backticks used to define a command
`gcc -o myclib.o -c myclib.c`
julia> compilationCommand2 = `gcc -shared -o libmyclib.so myclib.o -lm -fPIC` # the linking into a shared library
`gcc -shared -o libmyclib.so myclib.o -lm -fPIC`
julia> run(compilationCommand1)
Process(`gcc -o myclib.o -c myclib.c`, ProcessExited(0))
julia> run(compilationCommand2)
Process(`gcc -shared -o libmyclib.so myclib.o -lm -fPIC`, ProcessExited(0))
This should have created the C library libmyclib.so
on disk. Let's gonna use it:
julia> const myclib = joinpath(@__DIR__, "libmyclib.so") # we need the full path
"/home/runner/work/SPMLJ/SPMLJ/buildedDoc/01_-_JULIA1_-_Basic_Julia_programming/libmyclib.so"
ccall arguments:
- A tuple with the funcion name to call and the library path. For both, if embedded in a variable, the variable must be set constant.
- The Julia type that map to the C type returned by the function.
int
→Int32
orInt64
(or the easy-to remmeberCint
alias)float
→Float32
(or theCfloat
alias)double
→Float64
(or theCdouble
alias)
- A tuple with the Julia types of the parameters passed to the C function
- Any other argument are the values of the parameter passed
julia> a = ccall((:get5,myclib), Int32, ())
5
julia> b = ccall((:mySum,myclib), Float64, (Float32,Float32), 2.5, 1.5)
4.0
More details on calling C or Fortran code can be obtained in the official Julia documentation.
Using Python in Julia
The "default" way to use Python code in Julia is trough the PyCall.jl package. It automatically take care of convert between Python types (including numpy arrays) and Julia types (types that can not be converted automatically are converted to the generic PyObject
type).
julia> ENV["PYTHON"] = "" # will force PyCall to download and use a "private to Julia" (conda based) version of Python. use "/path/to/python" if you want to reuse a version already installed on your system # using Pkg # Pkg.add("PyCall") # Pkg.build("PyCall")
""
julia> using PyCall
Embed short python snippets in Julia
julia> py""" def sumMyArgs (i, j): return i+j def getNthElement (vec,n): return vec[n] """
julia> a = py"sumMyArgs"(3,4) # 7 - here we call the Python object (a function) with Julia parameters
7
julia> b = py"getNthElement"([1,2,3],1) # 2 - attention to the diffferent convention for starting arrays!. Note the Julia Array ahas been converted automatically to a Python list
2
julia> d = py"getNthElement([1,$a,3],1)" # 7 - here we interpolate the Python call
7
Alternativly, use @pyinclude("pythonScript.py")
julia> pythonCode = """ def sumMyArgs (i, j, z): return i+j+z """
"def sumMyArgs (i, j, z):\n return i+j+z\n"
julia> open(f->write(f,pythonCode),"pythonScript.py","w")
40
julia> @pyinclude("pythonScript.py")
julia> a = py"sumMyArgs"(3,4,5)
12
Note thaat the 3 arguments definition of sumMyArgs
has replaced the 3-arguments one. This would now error py"sumMyArgs"(3,4)
Use Python libraries
Add a package to the local Python installation using Conda:
julia> pyimport_conda("ezodf", "ezodf", "conda-forge") # pyimport_conda(module, package, channel)
PyObject <module 'ezodf' from '/home/runner/.local/lib/python3.10/site-packages/ezodf/__init__.py'>
julia> const ez = pyimport("ezodf") # Equiv. of Python `import ezodf as ez`
PyObject <module 'ezodf' from '/home/runner/.local/lib/python3.10/site-packages/ezodf/__init__.py'>
julia> destDoc = ez.newdoc(doctype="ods", filename="anOdsSheet.ods")
PyObject <ezodf.document.PackagedDocument object at 0x7fabd9958af0>
Both ez
and destDoc
are PyObjects
for which we can access attributes and call the methods using the usual obj.method()
syntax as we would do in Python
julia> sheet = ez.Sheet("Sheet1", size=(10, 10))
PyObject <ezodf.table.Table object at 0x7fabd9958280>
julia> destDoc.sheets.append(sheet) # dcell1 = sheet[(2,3)] # This would error because the index is a tuple. Let's use directly the `get(obj,key)` function instead:
PyObject <ezodf.table.Table object at 0x7fabd9958280>
julia> dcell1 = get(sheet,(2,3)) # Equiv. of Python `dcell1 = sheet[(2,3)]`. Attention again to Python indexing from zero: this is cell "D3", not "B3" !
PyObject <ezodf.cells.Cell object at 0x7fabd99581f0>
julia> dcell1.set_value("Hello")
julia> get(sheet,"A9").set_value(10.5) # Equiv. of Python `sheet['A9'].set_value(10.5)`
julia> destDoc.backup = false
false
julia> destDoc.save()
Using Julia in Python
Installation of the Python package PyJulia
PyJulia can be installed using pip
, taking note that its name using pip
is julia
not PyJulia
: $ python3 -m pip install --user julia
We can now open a Python terminal and initialise PyJulia to work with our Julia version:
>>> import julia
>>> julia.install() # Only once to set-up in julia the julia packages required by PyJulia
If we have multiple Julia versions, we can specify the one to use in Python passing julia="/path/to/julia/binary/executable" (e.g. julia = "/home/myUser/lib/julia-1.1.0/bin/julia") to the install() function.
Running Julia libraries and code in Python
On each Python session we need to run the following code:
from julia import Julia
Julia(compiled_modules=False)
This is a workaround to the common situation when the Python interpreter is statically linked to libpython, but it will slow down the interactive experience, as it will disable Julia packages pre-compilation, and every time we will use a module for the first time, this will need to be compiled first. Other, more efficient but also more complicate, workarounds are given in the package documentation, under the Troubleshooting section.
We can now direcltly load a Julia module, including Main
, the global namespace of Julia’s interpreter, with from julia import ModuleToLoad
and access the module objects directly or using the Module.evel()
interface.
Add a Julia package...
>>> from julia import Pkg
>>> Pkg.add("BetaML")
Of course we can add a package alternatively from within Julia
"Direct" calling of Julia functions...
>>> from julia import BetaML
>>> import numpy as np
>>> model = BetaML.buildForest([[1,10],[2,12],[12,1]],["a","a","b"])
>>> predictions = BetaML.predict(model,np.array([[2,9],[13,0]]))
>>> predictions
[{'b': 0.36666666666666664, 'a': 0.6333333333333333}, {'b': 0.7333333333333333, 'a': 0.26666666666666666}]
Access using the eval()
interface...
If we are using the jl.eval() interface, the objects we use must be already known to julia. To pass objects from Python to Julia, we can import the julia Main module (the root module in julia) and assign the needed variables, e.g.
>>> X_python = [1,2,3,2,4]
>>> from julia import Main
>>> Main.X_julia = X_python
>>> Main.eval('BetaML.gini(X_julia)')
0.7199999999999999
>>> Main.eval("""
... function makeProd(x,y)
... return x*y
... end
... """
... )
>>> Main.eval("makeProd(2,3)") # or Main.makeProd(2,3)
For large scripts instead of using eval()
we can equivalently use Main.include("aJuliaScript.jl")
Using R in Julia
To use R from within Julia we use the RCall package.
julia> ENV["R_HOME"] = "*" # # will force RCall to download and use a "private to Julia" (conda based) version of R. use "/path/to/R/directory" (e.g. `/usr/lib/R`) if you want to reuse a version already installed on your system # using Pkg # Pkg.add("RCall") # Pkg.build("RCall")
"*"
julia> using RCall
julia> R""" sumMyArgs <- function(i,j) i+j getNthElement <- function(vec,n) { return(vec[n]) } """
RCall.RObject{RCall.ClosSxp} function (vec, n) { return(vec[n]) }
julia> a = rcopy(R"sumMyArgs"(3,4)) # 7 - here we call the R object (a function) with Julia parameters
7
julia> b = rcopy(R"getNthElement"([1,2,3],1)) # 1 - no differences in array indexing here
1
julia> d = rcopy(R"as.integer(getNthElement(c(1,$a,3),2))") # 7 - here we interpolate the R call
7
julia> d = convert(Int64,R"getNthElement(c(1,$a,3),2)")
7
While we don't have here the problem of different array indexing convention (both Julia and R start indexing arrays at 1), we have the "problem" that the output returned by using R"..."
is not yet an exploitable Julia object but it remains as an RObject
that we can convert with rcopy()
or explicitly with convert(T,obj)
. Also, R elements are all floats by default, so if we need an integer in Julia we need to explicitly convert it, either in R or in Julia.
If the R code is on a script, we don't have here a sort of @Rinclude macro, so let's implement it ourselves by loading the file content as a file and evaluating it using the function reval
provided by RCall
:
julia> macro Rinclude(fname) quote rCodeString = read($fname,String) reval(rCodeString) nothing end end
@Rinclude (macro with 1 method)
julia> RCode = """ sumMyArgs <- function(i, j, z) i+j+z """
"sumMyArgs <- function(i, j, z) i+j+z\n"
julia> open(f->write(f,RCode),"RScript.R","w")
37
julia> @Rinclude("RScript.R")
julia> a = rcopy(R"sumMyArgs"(3,4,5)) # 12 # a = rcopy(R"sumMyArgs"(3,4)) # error ! The 3-arguments version of `sumMyArgs` has _replaced_ the 2-arguments one
12
Using Julia in R
Installation of the R package JuliaCall
JuliaCall can be installed from CRAN:
> install.packages("JuliaCall")
> library(JuliaCall)
install_julia()
install_julia()
will force the download of R and install a private copy of julia. If you prefer to use instead an existing version of julia and having R default to download a private version only if it can't find a version already installed, use julia_setup(installJulia = TRUE)
instead of install_julia()
, eventually passing the JULIA_HOME = "/path/to/julia/binary/executable/directory"
(e.g. JULIA_HOME = "/home/myUser/lib/julia-1.7.0/bin"
) parameter to the julia_setup
call.
JuliaCall
depends for some things (like object conversion between Julia and R) from the Julia RCall
package. If we don't already have it installed in Julia, it will try to install it automatically.
Running Julia libraries and code in R
On each R session we need to run the julia_setup
function:
library(JuliaCall)
julia_setup() # If we have already downloaded a private version of Julia for R it will be retrieved automatically
We can now load a Julia module and access the module objects directly or using the Module.evel()
interface.
Add a Julia package...
> julia_eval('using Pkg; Pkg.add("BetaML")')
Of course we can add a package alternatively from within Julia
Let's load some data from R and do some work with this data in Julia:
> library(datasets)
> X <- as.matrix(sapply(iris[,1:4], as.numeric))
> y <- sapply(iris[,5], as.integer)
Calling of Julia functions with julia_call
...
With JuliaCall
, differently than PyJulia
, we can't call direclty the julia functions but we need to employ the R function julia_call("juliaFunction",args)
:
> julia_eval("using BetaML")
> yencoded <- julia_call("integerEncoder",y)
> ids <- julia_call("shuffle",1:length(y))
> Xs <- X[ids,]
> ys <- yencoded[ids]
> cOut <- julia_call("kmeans",Xs,3L) # kmeans expects K to be an integer
> y_hat <- sapply(cOut[1],as.integer)[,] # We need a vector, not a matrix
> acc <- julia_call("accuracy",y_hat,ys)
> acc
[1] 0.8933333
Access using the eval()
interface...
As alternative, we can embed Julia code directly in R using the julia_eval()
function:
> kMeansR <- julia_eval('
+ function accFromKmeans(x,k,y_true)
+ cOut = kmeans(x,Int(k))
+ acc = accuracy(cOut[1],y_true)
+ return acc
+ end
+ ')
We can then call the above function in R in one of the following three ways:
kMeansR(Xs,3,ys)
julia_assign("Xs_julia", Xs); julia_assign("ys_julia", ys); julia_eval("accFromKmeans(Xs_julia,3,ys_julia)")
julia_call("accFromKmeans",Xs,3,ys)
.
While other "convenience" functions are provided by the package, using julia_call
or julia_assign
followed by julia_eval
should suffix to accomplish most of the task we may need in Julia.
Some performance tips
Type stability
"Type stable" functions guarantee to the compiler that given a certain method (i.e. with the arguments being of a given type) the object returned by the function is also of a certain fixed type. Type stability is fundamental to allow type inference continue across the function call stack.
julia> function f1(x) # Type unstable outVector = [1,2.0,"2"] if x < 0 return outVector[1] elseif x == 0 return outVector[2] else return outVector[3] end end
f1 (generic function with 1 method)
julia> function f2(x) # Type stable outVector = [1,convert(Int64,2.0),parse(Int64,"2")] if x < 0 return outVector[1] elseif x == 0 return outVector[2] else return outVector[3] end end
f2 (generic function with 1 method)
julia> a = f1(0)
2.0
julia> b = f1(1)
"2"
julia> typeof(a)
Float64
julia> typeof(b)
String
julia> c = f2(0)
2
julia> d = f2(1)
2
julia> typeof(c)
Int64
julia> typeof(d)
Int64
julia> using BenchmarkTools
julia> @btime f1(0) # 661 ns 6 allocations
75.106 ns (5 allocations: 192 bytes) 2.0
julia> @btime f2(0) # 55 ns 1 allocations
46.519 ns (1 allocation: 80 bytes) 2
julia> @code_warntype f1(0) # Body::Any
MethodInstance for Main.var"Main".f1(::Int64) from f1(x) @ Main.var"Main" REPL[1]:1 Arguments #self#::Core.Const(Main.var"Main".f1) x::Int64 Locals outVector::Vector{Any} Body::ANY 1 ─ (outVector = Base.vect(1, 2.0, "2")) │ %2 = (x < 0)::Bool └── goto #3 if not %2 2 ─ %4 = Base.getindex(outVector, 1)::ANY └── return %4 3 ─ %6 = (x == 0)::Bool └── goto #5 if not %6 4 ─ %8 = Base.getindex(outVector, 2)::ANY └── return %8 5 ─ %10 = Base.getindex(outVector, 3)::ANY └── return %10
julia> @code_warntype f2(0) # Body::Int64
MethodInstance for Main.var"Main".f2(::Int64) from f2(x) @ Main.var"Main" REPL[2]:1 Arguments #self#::Core.Const(Main.var"Main".f2) x::Int64 Locals outVector::Vector{Int64} Body::Int64 1 ─ %1 = Main.var"Main".convert(Main.var"Main".Int64, 2.0)::Core.Const(2) │ %2 = Main.var"Main".parse(Main.var"Main".Int64, "2")::Int64 │ (outVector = Base.vect(1, %1, %2)) │ %4 = (x < 0)::Bool └── goto #3 if not %4 2 ─ %6 = Base.getindex(outVector, 1)::Int64 └── return %6 3 ─ %8 = (x == 0)::Bool └── goto #5 if not %8 4 ─ %10 = Base.getindex(outVector, 2)::Int64 └── return %10 5 ─ %12 = Base.getindex(outVector, 3)::Int64 └── return %12
While in general it is NOT important to annotate function parameters for performance, it is important to annotate struct fields with concrete types
julia> abstract type Goo end
julia> struct Foo <: Goo x::Number end
julia> struct Boo <: Goo x::Int64 end
julia> function f1(o::Goo) return o.x +2 end
f1 (generic function with 2 methods)
julia> fobj = Foo(1)
Main.var"Main".Foo(1)
julia> bobj = Boo(1)
Main.var"Main".Boo(1)
julia> @btime f1($fobj) # 17.1 ns 0 allocations
20.047 ns (0 allocations: 0 bytes) 3
julia> @btime f1($bobj) # 2.8 ns 0 allocations
2.785 ns (0 allocations: 0 bytes) 3
Here the same function under some argument types is type stable, under other argument types is not
julia> @code_warntype f1(fobj)
MethodInstance for Main.var"Main".f1(::Main.var"Main".Foo) from f1(o::Main.var"Main".Goo) @ Main.var"Main" REPL[4]:1 Arguments #self#::Core.Const(Main.var"Main".f1) o::Main.var"Main".Foo Body::ANY 1 ─ %1 = Base.getproperty(o, :x)::NUMBER │ %2 = (%1 + 2)::ANY └── return %2
julia> @code_warntype f1(bobj)
MethodInstance for Main.var"Main".f1(::Main.var"Main".Boo) from f1(o::Main.var"Main".Goo) @ Main.var"Main" REPL[4]:1 Arguments #self#::Core.Const(Main.var"Main".f1) o::Main.var"Main".Boo Body::Int64 1 ─ %1 = Base.getproperty(o, :x)::Int64 │ %2 = (%1 + 2)::Int64 └── return %2
Avoid (non-constant) global variables
julia> g = 2
2
julia> const cg = 1 # we can't change the _type_ of the object binded to a constant variable
1
julia> cg = 2 # we can rebind to an other object of the same type, but we get a warning # cg = 2.5 # this would error !
WARNING: redefinition of constant Main.cg. This may fail, cause incorrect answers, or produce other errors. 2
julia> f1(x,y) = x+y
f1 (generic function with 3 methods)
julia> f2(x) = x + g
f2 (generic function with 1 method)
julia> f3(x) = x + cg
f3 (generic function with 1 method)
julia> @btime f1(3,2)
1.552 ns (0 allocations: 0 bytes) 5
julia> @btime f2(3) # 22 times slower !!!
20.962 ns (0 allocations: 0 bytes) 5
julia> @btime f3(3) # as f1
1.552 ns (0 allocations: 0 bytes) 5
Loop arrays with the inner loop by rows
Julia is column mayor (differently than Python) so arrays of bits types are contiguous in memory across the different rows of the same column
julia> a = rand(1000,1000);
julia> function f1(x) (R,C) = size(x) cum = 0.0 for r in 1:R for c in 1:C cum += x[r,c] end end return cum end
f1 (generic function with 3 methods)
julia> function f2(x) (R,C) = size(x) cum = 0.0 for c in 1:C for r in 1:R cum += x[r,c] end end return cum end
f2 (generic function with 1 method)
julia> @btime f1($a) # 2.3 ms 0 allocations
926.449 μs (0 allocations: 0 bytes) 500220.6882968228
julia> @btime f2($a) # 1.3 ms 0 allocations
925.564 μs (0 allocations: 0 bytes) 500220.6882968189
Use low-level optimisation when possible
julia> function f1(x) s = 0.0 for i in 1:length(x) s += i * x[i] end return s end
f1 (generic function with 3 methods)
julia> function f2(x) s = 0.0 for i in 1:length(x) @inbounds s += i * x[i] # remove bound checks end return s end
f2 (generic function with 1 method)
julia> function f3(x) s = 0.0 @simd for i in 1:length(x) # tell compiler it is allowed to run the loop in whatever order, allowing in-thread paralllelism of modern CPUs s += i * x[i] end return s end
f3 (generic function with 1 method)
julia> x = rand(10000);
julia> @btime f1($x)
9.267 μs (0 allocations: 0 bytes) 2.5231610262164485e7
julia> @btime f2($x)
9.257 μs (0 allocations: 0 bytes) 2.5231610262164485e7
julia> @btime f3($x)
5.559 μs (0 allocations: 0 bytes) 2.5231610262164526e7
julia> X = rand(100,20);
julia> function f1(x) s = 0.0 for i in 1:size(x,1) s += sum(x[i,:]) end return s end
f1 (generic function with 3 methods)
julia> function f2(x) s = 0.0 @views for i in 1:size(x,1) s += sum(x[i,:]) # the slice operator copy the data.. the views macro force to have instead to have a view (reference) end return s end
f2 (generic function with 1 method)
julia> @btime f1($X)
4.225 μs (100 allocations: 21.88 KiB) 1013.9993057955776
julia> @btime f2($X)
1.314 μs (0 allocations: 0 bytes) 1013.9993057955776
Attention that while the @views
macro "save time" by not copying the data, the resulting array has a pretty messy layout. If you need to use it for many subsequent operations it may be more efficient to "pay" the copy cost once and then have an array with a nicelly continuous block of memory..
julia> function f1(x,y) if x+y > 100 return x + y + 2 else return x + 1 end end
f1 (generic function with 3 methods)
julia> @inline function f2(x,y) # the function is "inlined", its whole definition copied at each calling place rather than being called if x+y > 100 return x + y + 2 else return x + 1 end end
f2 (generic function with 2 methods)
julia> function f3(y) s = 0.0 for i in 2:y s += f1(i,i-1) s += f1(i,i) end return s end
f3 (generic function with 1 method)
julia> function f4(y) s = 0.0 for i in 2:y s += f2(i,i-1) s += f2(i,i) end return s end
f4 (generic function with 1 method)
julia> x = 1000
1000
julia> @btime f3($x)
1.851 μs (0 allocations: 0 bytes) 2.002396e6
julia> @btime f4($x)
1.834 μs (0 allocations: 0 bytes) 2.002396e6
But attention! Not always a good idea:
julia> function f3(y) s = 0.0 for i in 2:y s += sum(f1(a,a-1) for a in 2:i) end return s end
f3 (generic function with 1 method)
julia> function f4(y) s = 0.0 for i in 2:y s += sum(f2(a,a-1) for a in 2:i) end return s end
f4 (generic function with 1 method)
julia> x = 1000
1000
julia> @btime f3($x)
86.201 μs (0 allocations: 0 bytes) 3.3359915e8
julia> @btime f4($x)
86.061 μs (0 allocations: 0 bytes) 3.3359915e8
Note that the Julia compiles already inline small functions automatically when it thinks it will improve performances
Profiling the code to discover bootlenecks
We already see @btime
and @benchmark
from the package BenchmarkTools.jl Remember to quote the global variables used as parameter of your function with the dollar sign to have accurate benchmarking of the function execution. Julia provide the macro @time
but we should run on a second call to a given function (with a certain parameter types) or it will include compilation time in its output:
julia> function fb(x) out = Union{Int64,Float64}[1,2.0,3] push!(out,4) if x > 10 if ( x > 100) return [out[1],out[2]] |> sum else return [out[2],out[3]] |> sum end else return [out[1],out[3]] |> sum end end
fb (generic function with 1 method)
julia> @time fb(3)
0.000006 seconds (3 allocations: 256 bytes) 4
julia> @time fb(3)
0.000005 seconds (3 allocations: 256 bytes) 4
We can use @profile function(x,y)
to use a sample-based profiling
julia> using Profile # in the stdlib
julia> function foo(n) a = rand(n,n) b = a + a c = b * b return c end
foo (generic function with 2 methods)
julia> @profile (for i = 1:100; foo(1000); end) # too fast otherwise
julia> Profile.print() # on my pc: 243 rand, 174 the sum, 439 the matrix product
Overhead ╎ [+additional indent] Count File:Line; Function ========================================================= ╎242 @Base/client.jl:552; _start() ╎ 242 @Base/client.jl:318; exec_options(opts::Base.JLOptions) ╎ 242 @Base/Base.jl:495; include(mod::Module, _path::String) ╎ 242 @Base/loading.jl:2136; _include(mapexpr::Function, mod::Module, _… ╎ 242 @Base/loading.jl:2076; include_string(mapexpr::typeof(identity),… ╎ 242 @Base/boot.jl:385; eval ╎ ╎ 242 …ter/src/makedocs.jl:265; kwcall(::@NamedTuple{sitename::Strin… ╎ ╎ 242 …er/src/makedocs.jl:271; #makedocs#87 ╎ ╎ 242 @Base/file.jl:112; cd(f::Documenter.var"#88#90"{Documenter.D… ╎ ╎ 242 …r/src/makedocs.jl:271; #88 ╎ ╎ 242 @Base/env.jl:257; withenv(::Documenter.var"#89#91"{Documen… ╎ ╎ ╎ 242 …r/src/makedocs.jl:272; #89 ╎ ╎ ╎ 242 …ies/Selectors.jl:170; dispatch(::Type{Documenter.Builde… ╎ ╎ ╎ 242 …lder_pipeline.jl:222; runner(::Type{Documenter.Builder… ╎ ╎ ╎ 242 …nder_pipeline.jl:22; expand(doc::Documenter.Document) ╎ ╎ ╎ 242 …es/Selectors.jl:170; dispatch(::Type{Documenter.Expa… ╎ ╎ ╎ ╎ 242 …der_pipeline.jl:846; runner(::Type{Documenter.Expan… ╎ ╎ ╎ ╎ 242 …c/IOCapture.jl:100; kwcall(::@NamedTuple{rethrow::… ╎ ╎ ╎ ╎ 242 …c/IOCapture.jl:167; capture(f::Documenter.var"#63… ╎ ╎ ╎ ╎ 242 …se/logging.jl:627; with_logger ╎ ╎ ╎ ╎ 242 …se/logging.jl:515; with_logstate(f::Function, l… ╎ ╎ ╎ ╎ ╎ 242 …/IOCapture.jl:170; (::IOCapture.var"#5#9"{Data… ╎ ╎ ╎ ╎ ╎ 242 …_pipeline.jl:847; (::Documenter.var"#63#65"{D… ╎ ╎ ╎ ╎ ╎ 242 @Base/file.jl:112; cd(f::Documenter.var"#64#6… ╎ ╎ ╎ ╎ ╎ 242 …pipeline.jl:848; #64 ╎ ╎ ╎ ╎ ╎ 242 …ase/boot.jl:385; eval ╎ ╎ ╎ ╎ ╎ ╎ 242 REPL[3]:1; top-level scope ╎ ╎ ╎ ╎ ╎ ╎ 242 …/Profile.jl:44; macro expansion ╎ ╎ ╎ ╎ ╎ ╎ 242 REPL[3]:1; macro expansion ╎ ╎ ╎ ╎ ╎ ╎ 107 REPL[2]:2; foo(n::Int64) ╎ ╎ ╎ ╎ ╎ ╎ 107 …c/Random.jl:279; rand ╎ ╎ ╎ ╎ ╎ ╎ ╎ 107 …c/Random.jl:291; rand ╎ ╎ ╎ ╎ ╎ ╎ ╎ 107 …c/Random.jl:290; rand ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 …ase/boot.jl:494; Array ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 …ase/boot.jl:487; Array 3╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 …ase/boot.jl:479; Array ╎ ╎ ╎ ╎ ╎ ╎ ╎ 104 …c/Random.jl:269; rand! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 104 …hiroSimd.jl:293; rand! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 104 …hiroSimd.jl:142; xoshiro_bulk ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …hiroSimd.jl:249; xoshiro_bulk_s… 4╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …hiroSimd.jl:71; _plus ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:252; xoshiro_bulk_s… 1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:77; _xor ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 …hiroSimd.jl:253; xoshiro_bulk_s… 2╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 …hiroSimd.jl:77; _xor ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:254; xoshiro_bulk_s… 1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:77; _xor ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:255; xoshiro_bulk_s… 1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …hiroSimd.jl:77; _xor ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 95 …hiroSimd.jl:257; xoshiro_bulk_s… ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 29 …/pointer.jl:146; unsafe_store! 29╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 29 …/pointer.jl:146; unsafe_store! 66╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 66 …hiroSimd.jl:108; _bits2float ╎ ╎ ╎ ╎ ╎ ╎ 68 REPL[2]:3; foo(n::Int64) ╎ ╎ ╎ ╎ ╎ ╎ 68 …rraymath.jl:16; +(A::Matrix{Float64},… ╎ ╎ ╎ ╎ ╎ ╎ ╎ 68 …roadcast.jl:892; broadcast_preservin… ╎ ╎ ╎ ╎ ╎ ╎ ╎ 68 …roadcast.jl:903; materialize ╎ ╎ ╎ ╎ ╎ ╎ ╎ 68 …roadcast.jl:928; copy ╎ ╎ ╎ ╎ ╎ ╎ ╎ 64 …roadcast.jl:956; copyto! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 64 …roadcast.jl:1003; copyto! ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 63 …simdloop.jl:77; macro expansion ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 63 …roadcast.jl:1004; macro expans… ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 18 …roadcast.jl:636; getindex ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 15 …roadcast.jl:681; _broadcast_… ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 15 …roadcast.jl:705; _getindex ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ +1 15 …roadcast.jl:675; _broadcast… ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ +2 15 …ensional.jl:696; getindex 15╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ +3 15 …sentials.jl:14; getindex ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 …roadcast.jl:682; _broadcast_… ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 3 …roadcast.jl:709; _broadcast… 3╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ +1 3 …se/float.jl:409; + ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 45 …ensional.jl:698; setindex! 45╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 45 …se/array.jl:1024; setindex! 1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 …simdloop.jl:84; macro expansion ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …roadcast.jl:223; similar ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …roadcast.jl:224; similar ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …actarray.jl:876; similar ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …actarray.jl:877; similar ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …ase/boot.jl:494; Array ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …ase/boot.jl:487; Array 4╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 4 …ase/boot.jl:479; Array ╎ ╎ ╎ ╎ ╎ ╎ 67 REPL[2]:4; foo(n::Int64) ╎ ╎ ╎ ╎ ╎ ╎ 67 …c/matmul.jl:113; *(A::Matrix{Float64}… ╎ ╎ ╎ ╎ ╎ ╎ ╎ 11 …se/array.jl:420; similar ╎ ╎ ╎ ╎ ╎ ╎ ╎ 11 …ase/boot.jl:487; Array 11╎ ╎ ╎ ╎ ╎ ╎ ╎ 11 …ase/boot.jl:479; Array ╎ ╎ ╎ ╎ ╎ ╎ ╎ 56 …c/matmul.jl:237; mul! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 56 …c/matmul.jl:263; mul! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 56 …c/matmul.jl:352; generic_matmatmul! ╎ ╎ ╎ ╎ ╎ ╎ ╎ 56 …c/matmul.jl:605; gemm_wrapper!(C:… 56╎ ╎ ╎ ╎ ╎ ╎ ╎ 56 …src/blas.jl:1524; gemm!(transA::… ╎6114 @Base/task.jl:682; task_done_hook(t::Task) ╎ 6114 @Base/task.jl:1008; wait() 6114╎ 6114 @Base/task.jl:999; poptask(W::Base.IntrusiveLinkedListSynchronized… Total snapshots: 10190. Utilization: 40% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
julia> Profile.clear()
Introspection and debugging
To discover problems on the code more in general we can use several introspection functions that Julia provide us (some of which we have already saw):
julia> # @less rand(3) # Show the source code of the specific method invoked - use `q` to quit # @edit rand(3) # Like @loss but it opens the source code in an editor methods(foo)
# 2 methods for generic function "foo" from Main.var"Main": [1] foo() @ REPL[1]:1 [2] foo(n) @ REPL[2]:1
julia> @which foo(2) # which method am I using when I call foo with an integer?
foo(n) @ Main.var"Main" REPL[2]:1
julia> typeof(a)
Matrix{Float64} (alias for Array{Float64, 2})
julia> eltype(a)
Float64
julia> isa(1.2, Number)
true
julia> fieldnames(Foo)
(:x,)
julia> dump(fobj)
Main.var"Main".Foo x: Int64 1
julia> names(Main, all=false) # available (e.g. exported) identifiers of a given module
13-element Vector{Symbol}: :Base :Core :LESSONS_ROOTDIR :LESSONS_ROOTDIR_TMP :LESSONS_SUBDIR :MAKE_PDF :Main :include_sandbox :link_example :literate_directory :makeList :preprocess :rdir
julia> sizeof(2) # bytes
8
julia> typemin(Int64)
-9223372036854775808
julia> typemax(Int64)
9223372036854775807
julia> bitstring(2)
"0000000000000000000000000000000000000000000000000000000000000010"
Various low-level interpretation of an expression
julia> @code_native foo(3)
.text .file "foo" .globl julia_foo_38008 # -- Begin function julia_foo_38008 .p2align 4, 0x90 .type julia_foo_38008,@function julia_foo_38008: # @julia_foo_38008 ; ┌ @ REPL[2]:1 within `foo` # %bb.0: # %top push rbp mov rbp, rsp push r15 push r14 push r12 push rbx sub rsp, 32 vxorps xmm0, xmm0, xmm0 vmovaps xmmword ptr [rbp - 64], xmm0 mov qword ptr [rbp - 48], 0 #APP mov rax, qword ptr fs:[0] #NO_APP mov rdx, rdi lea rcx, [rbp - 64] movabs rdi, 140375528943696 ; │ @ REPL[2]:2 within `foo` ; │┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:279 within `rand` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:291 @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:290 ; ││┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479 mov rsi, rdx mov r12, qword ptr [rax - 8] mov qword ptr [rbp - 64], 4 mov rax, qword ptr [r12] mov qword ptr [rbp - 56], rax movabs rax, 140376665409696 mov qword ptr [r12], rcx call rax ; ││└ ; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293 ; │││┌ @ essentials.jl:10 within `length` mov rbx, qword ptr [rax + 8] ; │││└ ; │││┌ @ abstractarray.jl:1237 within `pointer` ; ││││┌ @ pointer.jl:65 within `unsafe_convert` mov r14, qword ptr [rax] ; ││└└└ ; ││┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479 mov r15, rax ; ││└ ; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293 ; │││┌ @ int.jl:88 within `*` shl rbx, 3 ; │││└ ; │││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:141 within `xoshiro_bulk` ; ││││┌ @ operators.jl:425 within `>=` ; │││││┌ @ int.jl:514 within `<=` cmp rbx, 64 ; ││││└└ jl .LBB0_2 # %bb.1: # %L11 ; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:142 within `xoshiro_bulk` movabs rax, offset j_xoshiro_bulk_simd_38010 mov qword ptr [rbp - 48], r15 mov rdi, r14 mov rsi, rbx call rax ; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:143 within `xoshiro_bulk` ; ││││┌ @ int.jl:86 within `-` sub rbx, rax ; ││││└ ; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:144 within `xoshiro_bulk` ; ││││┌ @ pointer.jl:282 within `+` add r14, rax .LBB0_2: # %L17 ; ││││└ ; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:146 within `xoshiro_bulk` ; ││││┌ @ operators.jl:276 within `!=` ; │││││┌ @ promotion.jl:521 within `==` test rbx, rbx ; ││││└└ je .LBB0_4 # %bb.3: # %L22 ; ││││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:147 within `xoshiro_bulk` movabs rax, offset j_xoshiro_bulk_nosimd_38011 mov qword ptr [rbp - 48], r15 mov rdi, r14 mov rsi, rbx call rax .LBB0_4: # %L24 ; │└└└ ; │ @ REPL[2]:3 within `foo` movabs rax, offset "j_+_38012" mov qword ptr [rbp - 48], r15 mov rdi, r15 mov rsi, r15 call rax ; │ @ REPL[2]:4 within `foo` movabs rcx, offset "j_*_38013" mov qword ptr [rbp - 48], rax mov rdi, rax mov rsi, rax call rcx mov rcx, qword ptr [rbp - 56] mov qword ptr [r12], rcx ; │ @ REPL[2]:5 within `foo` add rsp, 32 pop rbx pop r12 pop r14 pop r15 pop rbp ret .Lfunc_end0: .size julia_foo_38008, .Lfunc_end0-julia_foo_38008 ; └ # -- End function .section ".note.GNU-stack","",@progbits
julia> @code_llvm foo(3)
; @ REPL[2]:1 within `foo` define nonnull {}* @julia_foo_38056(i64 signext %0) #0 { top: %gcframe3 = alloca [3 x {}*], align 16 %gcframe3.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 0 %1 = bitcast [3 x {}*]* %gcframe3 to i8* call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 24, i1 true) %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #7 %tls_ppgcstack = getelementptr i8, i8* %thread_ptr, i64 -8 %2 = bitcast i8* %tls_ppgcstack to {}**** %tls_pgcstack = load {}***, {}**** %2, align 8 ; @ REPL[2]:2 within `foo` ; ┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:279 within `rand` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:291 @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:290 ; │┌ @ boot.jl:494 within `Array` @ boot.jl:487 @ boot.jl:479 %3 = bitcast [3 x {}*]* %gcframe3 to i64* store i64 4, i64* %3, align 16 %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 1 %5 = bitcast {}** %4 to {}*** %6 = load {}**, {}*** %tls_pgcstack, align 8 store {}** %6, {}*** %5, align 8 %7 = bitcast {}*** %tls_pgcstack to {}*** store {}** %gcframe3.sub, {}*** %7, align 8 %8 = call nonnull {}* inttoptr (i64 140376665409696 to {}* ({}*, i64, i64)*)({}* inttoptr (i64 140375528943696 to {}*), i64 %0, i64 %0) ; │└ ; │┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/Random.jl:269 within `rand!` @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:293 ; ││┌ @ abstractarray.jl:1237 within `pointer` ; │││┌ @ pointer.jl:65 within `unsafe_convert` %9 = bitcast {}* %8 to i8** %arrayptr = load i8*, i8** %9, align 8 %10 = ptrtoint i8* %arrayptr to i64 ; ││└└ ; ││┌ @ essentials.jl:10 within `length` %11 = bitcast {}* %8 to { i8*, i64, i16, i16, i32 }* %arraylen_ptr = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %11, i64 0, i32 1 %arraylen = load i64, i64* %arraylen_ptr, align 8 ; ││└ ; ││┌ @ int.jl:88 within `*` %12 = shl i64 %arraylen, 3 ; ││└ ; ││┌ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:141 within `xoshiro_bulk` ; │││┌ @ operators.jl:425 within `>=` ; ││││┌ @ int.jl:514 within `<=` %13 = icmp slt i64 %12, 64 ; │││└└ br i1 %13, label %L17, label %L11 L11: ; preds = %top %14 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %14, align 16 ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:142 within `xoshiro_bulk` %15 = call i64 @j_xoshiro_bulk_simd_38058(i64 zeroext %10, i64 signext %12) ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:143 within `xoshiro_bulk` ; │││┌ @ int.jl:86 within `-` %16 = sub i64 %12, %15 ; │││└ ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:144 within `xoshiro_bulk` ; │││┌ @ pointer.jl:282 within `+` %17 = getelementptr i8, i8* %arrayptr, i64 %15 %18 = ptrtoint i8* %17 to i64 ; ││││┌ @ essentials.jl:522 within `oftype` ; │││││┌ @ pointer.jl:26 within `convert` ; ││││││┌ @ boot.jl:800 within `Ptr` br label %L17 L17: ; preds = %L11, %top %value_phi = phi i64 [ %18, %L11 ], [ %10, %top ] %value_phi1 = phi i64 [ %16, %L11 ], [ %12, %top ] ; │││└└└└ ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:146 within `xoshiro_bulk` ; │││┌ @ operators.jl:276 within `!=` ; ││││┌ @ promotion.jl:521 within `==` %.not = icmp eq i64 %value_phi1, 0 ; │││└└ br i1 %.not, label %L24, label %L22 L22: ; preds = %L17 %19 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %19, align 16 ; │││ @ /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Random/src/XoshiroSimd.jl:147 within `xoshiro_bulk` call void @j_xoshiro_bulk_nosimd_38059(i64 zeroext %value_phi, i64 signext %value_phi1) br label %L24 L24: ; preds = %L22, %L17 %20 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe3, i64 0, i64 2 store {}* %8, {}** %20, align 16 ; └└└ ; @ REPL[2]:3 within `foo` %21 = call nonnull {}* @"j_+_38060"({}* nonnull %8, {}* nonnull %8) store {}* %21, {}** %20, align 16 ; @ REPL[2]:4 within `foo` %22 = call nonnull {}* @"j_*_38061"({}* nonnull %21, {}* nonnull %21) %23 = load {}*, {}** %4, align 8 %24 = bitcast {}*** %tls_pgcstack to {}** store {}* %23, {}** %24, align 8 ; @ REPL[2]:5 within `foo` ret {}* %22 }
julia> @code_typed foo(3)
CodeInfo( 1 ── %1 = $(Expr(:foreigncall, :(:jl_alloc_array_2d), Matrix{Float64}, svec(Any, Int64, Int64), 0, :(:ccall), Matrix{Float64}, Core.Argument(2), Core.Argument(2), Core.Argument(2), Core.Argument(2)))::Matrix{Float64} │ %2 = $(Expr(:gc_preserve_begin, :(%1))) │ %3 = $(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), 0, :(:ccall), :(%1)))::Ptr{Float64} │ %4 = Base.bitcast(Ptr{UInt8}, %3)::Ptr{UInt8} │ %5 = Base.arraylen(%1)::Int64 │ %6 = Base.mul_int(%5, 8)::Int64 │ %7 = Random.XoshiroSimd.Float64::Type{Float64} │ %8 = Random.XoshiroSimd._bits2float::typeof(Random.XoshiroSimd._bits2float) │ %9 = Base.sle_int(64, %6)::Bool └─── goto #3 if not %9 2 ── %11 = invoke Random.XoshiroSimd.xoshiro_bulk_simd($(QuoteNode(Random.TaskLocalRNG()))::Random.TaskLocalRNG, %4::Ptr{UInt8}, %6::Int64, %7::Type{Float64}, $(QuoteNode(Val{8}()))::Val{8}, %8::typeof(Random.XoshiroSimd._bits2float))::Int64 │ %12 = Base.sub_int(%6, %11)::Int64 │ %13 = Core.bitcast(Core.UInt, %4)::UInt64 │ %14 = Base.bitcast(UInt64, %11)::UInt64 │ %15 = Base.add_ptr(%13, %14)::UInt64 └─── %16 = Core.bitcast(Ptr{UInt8}, %15)::Ptr{UInt8} 3 ┄─ %17 = φ (#2 => %16, #1 => %4)::Ptr{UInt8} │ %18 = φ (#2 => %12, #1 => %6)::Int64 │ %19 = (%18 === 0)::Bool │ %20 = Base.not_int(%19)::Bool └─── goto #5 if not %20 4 ── invoke Random.XoshiroSimd.xoshiro_bulk_nosimd($(QuoteNode(Random.TaskLocalRNG()))::Random.TaskLocalRNG, %17::Ptr{UInt8}, %18::Int64, %7::Type{Float64}, %8::typeof(Random.XoshiroSimd._bits2float))::Any 5 ┄─ goto #6 6 ── $(Expr(:gc_preserve_end, :(%2))) └─── goto #7 7 ── goto #8 8 ── goto #9 9 ── goto #10 10 ─ goto #11 11 ─ %30 = invoke Main.var"Main".:+(%1::Matrix{Float64}, %1::Matrix{Float64})::Matrix{Float64} │ %31 = invoke Main.var"Main".:*(%30::Matrix{Float64}, %30::Matrix{Float64})::Matrix{Float64} └─── return %31 ) => Matrix{Float64}
julia> @code_lowered foo(3)
CodeInfo( 1 ─ a = Main.var"Main".rand(n, n) │ b = a + a │ c = b * b └── return c )
We can use a debugger, like e.g. the one integrated in Juno or VSCode. Graphical debuggers allow to put a breakpoint on some specific line of code, run the code in debug mode (yes, it will be slower), let the program arrive to the breakpoint and inspect the state of the system at that point of the code, including local variables. In Julia we can also change the program interactively ! Other typycal functions are running a single line, running inside a function, running until the current function return, ecc..
Runtime exceptions
As many (all?) languages, Julia when "finds" an error issues an exception, that if it is not caugth at higher level in the call stack (i.e. recognised and handled) lead to an error and return to the prompt or termination of the script (and rarely with the Julia process crashing altogether).
The idea is that we try some potentially dangerous code and if some error is raised in this code we catch it and handle it.
julia> function customIndex(vect,idx;toReturn=0) try vect[idx] catch e if isa(e,BoundsError) return toReturn end rethrow(e) end end
customIndex (generic function with 1 method)
julia> a = [1,2,3] # a[4] # Error ("BoundsError" to be precise)
3-element Vector{Int64}: 1 2 3
julia> customIndex(a,4)
0
Note that handling exceptions is computationally expensive, so do not use exceptions in place of conditional statements
Parallel computation
Finally one note on parallel computation. We see only some basic usage of multithreading and multiprocesses in this course, but with Julia it is relativelly easy to parallelise the code either using multiple threads or multiple processes. What's the difference ?
- multithread
- advantages: computationally "cheap" to create (the memory is shared)
- disadvantages: limited to the number of cores within a CPU, require attention in not overwriting the same memory or doing it at the intended order ("data race"), we can't add threads dynamically (within a script)
- multiprocesses
- advantages: unlimited number, can be run in different CPUs of the same machine or differnet nodes of a cluster, even using SSH on different networks, we can add processes from within our code with
addprocs(nToAdd)
- disadvantages: the memory being copied (each process will have its own memory) are computationally expensive (you need to have a gain higher than the cost on setting a new process) and require attention to select which memory a given process will need to "bring with it" for its functionality
- advantages: unlimited number, can be run in different CPUs of the same machine or differnet nodes of a cluster, even using SSH on different networks, we can add processes from within our code with
Note that if you are reading this document on the github pages, this script is compiled using GitHub actions where a single thread and process are available, so you will not see performance gains.
Multithreading
It is not possible to add threads dinamically, either we have to start Julia with the parameter -t
(e.g. -t 8
or -t auto
) in the command line or use the VSCode Julia externsion setting Julia: Num Threads
julia> function inner(x) s = 0.0 for i in 1:x for j in 1:i if j%2 == 0 s += j else s -= j end end end return s end
inner (generic function with 1 method)
julia> function parentSingleThread(x,y) toTest = x .+ (1:y) out = zeros(length(toTest)) for i in 1:length(toTest) out[i] = inner(toTest[i]) end return out end
parentSingleThread (generic function with 1 method)
julia> function parentThreaded(x,y) toTest = x .+ (1:y) out = zeros(length(toTest)) Threads.@threads for i in 1:length(toTest) out[i] = inner(toTest[i]) end return out end
parentThreaded (generic function with 1 method)
julia> x = 100
100
julia> y = 20
20
julia> str = parentSingleThread(x,y)
20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> mtr = parentThreaded(x,y)
20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> str == mtr # true
true
julia> Threads.nthreads() # 4 in my case
4
julia> Threads.threadid()
1
julia> @btime parentSingleThread(100,20) # 140 μs on my machine
114.285 μs (1 allocation: 224 bytes) 20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
julia> @btime parentThreaded(100,20) # 47 μs
41.178 μs (22 allocations: 2.42 KiB) 20-element Vector{Float64}: -51.0 0.0 -52.0 0.0 -53.0 0.0 -54.0 0.0 -55.0 0.0 -56.0 0.0 -57.0 0.0 -58.0 0.0 -59.0 0.0 -60.0 0.0
Multiprocessing
NOTE The code on multiprocessing is commented out (not executed, so you don't see the output) as GitHub actions (that are used to run this code and render the web pages you are reading) have problems running multi-process functions:
using Distributed # from the Standard Library
addprocs(3) # 2,3,4
The first process is considered a sort of "master" process, the other one are the "workers" We can add processes on other machines by providing the SSH connection details directly in the addprocs()
call (Julia must be installed on that machines as well) We can alternativly start Julia directly with n worker processes using the armument -p n
in the command line.
println("Worker pids: ")
for pid in workers() # return a vector to the pids
println(pid) # 2,3,4
end
rmprocs(workers()[1]) # remove process pid 2
println("Worker pids: ")
for pid in workers()
println(pid) # 3,4 are left
end
@everywhere begin using Distributed end # this is needed only in GitHub action
@everywhere println(myid()) # 4,3
Run heavy tasks in parallel
using Distributed, BenchmarkTools
a = rand(1:35,100)
@everywhere function fib(n)
if n == 0 return 0 end
if n == 1 return 1 end
return fib(n-1) + fib(n-2)
end
The macro @everywhere
make available the given function (or functions with @everywhere begin [shared function definitions] end
or @everywhere include("sharedCode.jl")
) to all the current workers.
result = map(fib,a)
The pmap function ("parallel" map) automatically pick up the free processes, assign them the job prom the "input" array and merge the results in the returned array. Note that the order is preserved:
result2 = pmap(fib,a)
result == result2
@btime map(fib,$a) # serialised: median time: 514 ms 1 allocations
@btime pmap(fib,$a) # parallelised: median time: 265 ms 4220 allocations # the memory of `a` need to be copied to all processes
Divide and Conquer
Rather than having a "heavy operation" and being interested in the individual results, here we have a "light" operation and we want to aggregate the results of the various computations using some aggreagation function. We can then use @distributed (aggregationfunction) for [forConditions]
macro:
using Distributed, BenchmarkTools
function f(n) # our single-process benchmark
s = 0.0
for i = 1:n
s += i/2
end
return s
end
function pf(n)
s = @distributed (+) for i = 1:n # aggregate using sum on variable s
i/2 # the last element of the for cycle is used by the aggregator
end
return s
end
@btime f(10000000) # median time: 11.1 ms 0 allocations
@btime pf(10000000) # median time: 5.7 ms 145 allocations
Note that also in this case the improvement is less than proportional with the number of processes we add
Details on parallel comutation can be found on the official documentation, including information to run nativelly Julia on GPUs or TPUs.
This page was generated using Literate.jl.