Quiz 1.13: Performance, parallel computation


Q1: Julia code optimisation

Which of the following strategies would likely improve the performance of Julia code?

RESOLUTION

Concerning the wrong answers: (1) constant global variables (i.e. those whose identifier cannot be rebound to an object of another type) can be used efficiently within functions; (2) since Julia is "column major", it is more efficient to have the innermost loop run along the rows of the same column, so as to read (or write) contiguous blocks of memory; (3) while modifying function arguments in-place should be limited stylistically (as it may lead to hard-to-find bugs), from a performance point of view it is absolutely fine, and if it avoids allocating memory for temporary objects it may even be faster; (4) the same holds for the type annotation of function arguments: it may help avoid bugs, or be required for multiple dispatch, but in general it doesn't improve performance (except in very rare cases where it may help the compiler's type inference).

The correct answers are:

  • Avoid using non-constant global variables within a function body
  • If the size of an output array is known, preallocate it and set each item one at a time, rather than starting from an empty array and adding elements one at a time (see the sketch after this list)
  • Write the function in such a way that, given the types of the arguments, the result is of a fixed type
  • Annotate all the fields of a type with their relevant concrete types
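
As a minimal sketch of the preallocation point (the function names grow and prealloc are made up for this illustration), both functions below compute the same vector, but the second allocates it only once:

function grow(n)
  out = Int64[]                   # start empty ...
  for i in 1:n
    push!(out, i^2)               # ... and grow: the backing memory may be reallocated several times
  end
  return out
end

function prealloc(n)
  out = Vector{Int64}(undef, n)   # size known upfront: allocate once ...
  for i in 1:n
    out[i] = i^2                  # ... and fill in place
  end
  return out
end

# e.g. with the BenchmarkTools package: @btime grow(10_000); @btime prealloc(10_000)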

Q2: Type stability

Given the following Julia snippet:

a = Any["a",1,2.5]
b = [2,3]
foo(x) = eltype(x) == Int64 ? 1       : nothing
boo(x) = eltype(x) == Int64 ? x[1]    : nothing
goo(x) = eltype(x) == Int64 ? nothing : 1
zoo(x) = eltype(x) == Int64 ? nothing : x[1]

Which of the following statements are correct?

RESOLUTION

The only "type instable" function call is zoo(a) because in such a case the compiler doesn't know "a priori" the type of the returned output but has to look up the actual type of the first element of a at runtime. Not only the function zoo will not specialise and be slow, but also if there are other functions that depend on the returned value of zoo, the compiler will need extra steps - at run time - to lookup the correct method of these other functions to apply, further slowing down the code. Even if in this example we are working with a single argument, you can "feel" how it is the multiple dispatch together with JIT that really makes the difference in Julia: thanks to multiple dispatch we can have multiple JIT compiled versions of the function that specialise to the given inputs. So, for example, when we call bool([2,3]), the JIT compiler will compile a version of boo specific for Vector{Int64}, and hence resolve the condition at compilation time and compile the function returning directly x[1] and hence knowing it will be an Int64 still at compilation time.

The correct answers are:

  • foo(a) is type stable
  • foo(b) is type stable
  • boo(a) is type stable
  • boo(b) is type stable
  • goo(a) is type stable
  • goo(b) is type stable
  • zoo(b) is type stable
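
You can verify these claims yourself with the standard macro @code_warntype, which prints the return type inferred by the compiler and flags non-concrete types. With a, b and zoo as defined in the snippet above:

using InteractiveUtils  # provides @code_warntype (already loaded in the REPL)
@code_warntype zoo(b)   # Body::Nothing -> the compiler knows the output type: stable
@code_warntype zoo(a)   # Body::Any     -> the output type is only known at runtime: unstable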

Q3: Parallel computation

Which of the following statements regarding parallel computation are correct?

RESOLUTION

Concerning the wrong answers: (1) multiprocessing does require (unfortunately!) that memory be allocated separately for each process; (2) unlike threads, processes can be added - or removed - dynamically, e.g. using addprocs().

The correct answers are:

  • In Julia multiple processes can be run across different machines on a network, across a machine with multiple CPUs, or on a single-CPU machine with multiple cores
  • Multithreading is limited to the number of cores available on the CPU(s) where the Julia process runs
  • Both multithreading and multiprocessing scale less than linearly with the resources employed (e.g. using 10 threads, or 10 processes, will take more than 1/10th of the time of a single thread or process)
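
As a small sketch of this dynamic behaviour, using only the standard Distributed library (the hypothetical function f is just for illustration):

using Distributed
addprocs(2)                 # spawn two extra worker processes at runtime
@everywhere f(x) = x^2      # define f on every worker (each one has its own memory)
println(pmap(f, 1:4))       # distribute the four calls among the workers
rmprocs(workers())          # workers can also be removed dynamically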

Q4: Stochasticity and multithreading

Given the following Julia code:

using Random
rng        = Random.GLOBAL_RNG
Random.seed!(rng,123)
nThreads   = Threads.nthreads() 
rngs       = [deepcopy(rng) for i in 1:nThreads]

function foo(rngs)
  results      = Array{Int64,1}(undef,30)     # preallocated output
  masterSeed   = rand(rng,100:9999999999999)  # a new master seed at each call, drawn from the global rng
  Threads.@threads for i in 1:30
    tsrng      = rngs[Threads.threadid()]     # rng reserved to the thread running this iteration
    Random.seed!(tsrng,masterSeed+i*10)       # reseed based on the loop index i, NOT on the thread id
    sample     = rand(tsrng, 1:100,100)       # 100 draws from 1:100
    results[i] = sum(sample)
  end
  return results' * collect(1:30) # inner product value * position
end

(out1, out2) = (foo(rngs),foo(rngs))

Which statements are correct?

RESOLUTION

This exercise serves as a reminder to consider exactly which kind of stochasticity we need from our program (must the results be the same at each call given the inputs? Or must the model as a whole return the same result given the inputs?) and how multiple threads complicate the issue. In this example we first set, in foo(), a master seed; we then use this seed, in combination with the loop index i, to reseed the rng of the thread that happens to run the ith iteration. Note that the reseeding is based on the loop index and NOT on the thread id. This guarantees that our stochastic experiment is reproducible independently of the number of threads used (a counterexample is sketched after the answers below). Also, not only are the elements in results the same, but so is their position (order). That is, the tuple (out1,out2) will always be the same across multiple runs of the whole code. However, as the code is written, each time we call foo() we get a different result, because the master seed is drawn from the global rng, whose state advances between calls. I leave it to you, as a further exercise, to think about how you could modify the code to obtain the same result at each call of the function.

The correct answers are:

  • out1 is different from out2 independently of the number of threads used
  • When the script is run multiple times (in independent Julia sessions) the couple (out1,out2) is the same across the various runs independently of the number of threads used in the various attempts
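
To make the pitfall concrete, here is a sketch (the function name fooThreadSeeded is made up) of what NOT to do: seeding on the thread id rather than on the loop index. Every iteration executed by the same thread would then restart from an identically-seeded rng, and since the mapping of iterations to threads depends on the scheduler and on the number of threads, the results would no longer be reproducible:

function fooThreadSeeded(rngs, masterSeed)
  results = Array{Int64,1}(undef,30)
  Threads.@threads for i in 1:30
    tsrng      = rngs[Threads.threadid()]
    Random.seed!(tsrng, masterSeed + Threads.threadid()*10) # WRONG: depends on which thread runs iteration i
    results[i] = sum(rand(tsrng, 1:100,100))
  end
  return results
end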