Quiz 1.13: Performance, parallel computation


Q1: Julia code optimisation

Which of the following strategies would likely improve the performance of Julia code?

RESOLUTION

Concerning the wrong answers: (1) constant global variables (i.e. those whose identifier cannot be rebound to an object of another type) can be used efficiently within functions; (2) since Julia is "column major", it is more efficient to have the innermost loop run along the rows of the same column, so as to read (or write) contiguous blocks of memory; (3) while modifying function arguments in-place should be limited stylistically (as it may lead to hard-to-find bugs), from a performance point of view it is absolutely fine, and if it avoids allocating memory for temporary objects it may even be faster; (4) the same holds for the type annotation of function arguments: it may help avoid bugs, or be required for multiple dispatch, but in general it doesn't improve performance (except in very rare cases where it may help the compiler's type inference).

The correct answers are:

  • Avoid using non-constant global variables within a function body
  • If the size of an output array is known, preallocate it and set each item one at a time, rather than starting from an empty array and adding elements one at a time (see the sketch after this list)
  • Write the function in such a way that, given the types of the arguments, the result is of a fixed type
  • Annotate all the fields of a type with their relevant concrete types
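
As a minimal sketch of the preallocation point (the function names grow and prealloc are made up for this illustration), both functions below compute the same vector, but the second allocates it only once:

function grow(n)
  out = Int64[]                   # start empty ...
  for i in 1:n
    push!(out, i^2)               # ... and grow: the backing memory may be reallocated several times
  end
  return out
end

function prealloc(n)
  out = Vector{Int64}(undef, n)   # size known upfront: allocate once ...
  for i in 1:n
    out[i] = i^2                  # ... and fill in place
  end
  return out
end

# e.g. with the BenchmarkTools package: @btime grow(10_000); @btime prealloc(10_000)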

Q2: Type stability

Given the following Julia snippet:

a = Any["a",1,2.5]
b = [2,3]
foo(x) = eltype(x) == Int64 ? 1       : nothing
boo(x) = eltype(x) == Int64 ? x[1]    : nothing
goo(x) = eltype(x) == Int64 ? nothing : 1
zoo(x) = eltype(x) == Int64 ? nothing : x[1]

Which of the following statements are correct?

RESOLUTION

The only "type instable" function call is zoo(a) because in such a case the compiler doesn't know "a priori" the type of the returned output but has to look up the actual type of the first element of a at runtime. Not only the function zoo will not specialise and be slow, but also if there are other functions that depend on the returned value of zoo, the compiler will need extra steps - at run time - to lookup the correct method of these other functions to apply, further slowing down the code. Even if in this example we are working with a single argument, you can "feel" how it is the multiple dispatch together with JIT that really makes the difference in Julia: thanks to multiple dispatch we can have multiple JIT compiled versions of the function that specialise to the given inputs. So, for example, when we call bool([2,3]), the JIT compiler will compile a version of boo specific for Vector{Int64}, and hence resolve the condition at compilation time and compile the function returning directly x[1] and hence knowing it will be an Int64 still at compilation time.

The correct answers are:

  • foo(a) is type stable
  • foo(b) is type stable
  • boo(a) is type stable
  • boo(b) is type stable
  • goo(a) is type stable
  • goo(b) is type stable
  • zoo(b) is type stable
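
You can verify these claims yourself with the standard macro @code_warntype, which prints the return type inferred by the compiler and flags non-concrete types. With a, b and zoo as defined in the snippet above:

using InteractiveUtils  # provides @code_warntype (already loaded in the REPL)
@code_warntype zoo(b)   # Body::Nothing -> the compiler knows the output type: stable
@code_warntype zoo(a)   # Body::Any     -> the output type is only known at runtime: unstable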

Q3: Parallel computation

Which of the following statements regarding parallel computation are correct?

RESOLUTION

Concerning the wrong answers: (1) multiprocessing does require (unfortunately!) that memory be allocated separately for each process; (2) unlike threads, processes can be added - or removed - dynamically, e.g. using addprocs().

The correct answers are:

  • In Julia multiple processes can be run across different machines on a network, across a machine with multiple CPUs, or on a single-CPU machine with multiple cores
  • Multithreading is limited to the number of cores available on the CPU(s) where the Julia process runs
  • Both multithreading and multiprocessing scale less than linearly with the resources employed (e.g. using 10 threads, or 10 processes, will take more than 1/10th of the time of a single thread or process)
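
As a small sketch of this dynamic behaviour, using only the standard Distributed library (the hypothetical function f is just for illustration):

using Distributed
addprocs(2)                 # spawn two extra worker processes at runtime
@everywhere f(x) = x^2      # define f on every worker (each one has its own memory)
println(pmap(f, 1:4))       # distribute the four calls among the workers
rmprocs(workers())          # workers can also be removed dynamically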

Q4: Stochasticity and multithreading

Given the following Julia code:

using Random
rng        = Random.GLOBAL_RNG
Random.seed!(rng,123)
nThreads   = Threads.nthreads() 
rngs       = [deepcopy(rng) for i in 1:nThreads]

function foo(rngs)
  results      = Array{Int64,1}(undef,30)     # preallocated output
  masterSeed   = rand(rng,100:9999999999999)  # a new master seed at each call, drawn from the global rng
  Threads.@threads for i in 1:30
    tsrng      = rngs[Threads.threadid()]     # rng reserved to the thread running this iteration
    Random.seed!(tsrng,masterSeed+i*10)       # reseed based on the loop index i, NOT on the thread id
    sample     = rand(tsrng, 1:100,100)       # 100 draws from 1:100
    results[i] = sum(sample)
  end
  return results' * collect(1:30) # inner product value * position
end

(out1, out2) = (foo(rngs),foo(rngs))

Which statements are correct?

RESOLUTION

This exercise serves as a reminder to consider exactly which kind of stochasticity we need from our program (must the results be the same at each call given the inputs? Or must the model as a whole return the same result given the inputs?) and how multiple threads complicate the issue. In this example we first set, in foo(), a master seed; we then use this seed, in combination with the loop index i, to reseed the rng of the thread that happens to run the ith iteration. Note that the reseeding is based on the loop index and NOT on the thread id. This guarantees that our stochastic experiment is reproducible independently of the number of threads used (a counterexample is sketched after the answers below). Also, not only are the elements in results the same, but so is their position (order). That is, the tuple (out1,out2) will always be the same across multiple runs of the whole code. However, as the code is written, each time we call foo() we get a different result, because the master seed is drawn from the global rng, whose state advances between calls. I leave it to you, as a further exercise, to think about how you could modify the code to obtain the same result at each call of the function.

The correct answers are:

  • out1 is different from out2 independently of the number of threads used
  • When the script is run multiple times (in independent Julia sessions) the couple (out1,out2) is the same across the various runs independently of the number of threads used in the various attempts
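
To make the pitfall concrete, here is a sketch (the function name fooThreadSeeded is made up) of what NOT to do: seeding on the thread id rather than on the loop index. Every iteration executed by the same thread would then restart from an identically-seeded rng, and since the mapping of iterations to threads depends on the scheduler and on the number of threads, the results would no longer be reproducible:

function fooThreadSeeded(rngs, masterSeed)
  results = Array{Int64,1}(undef,30)
  Threads.@threads for i in 1:30
    tsrng      = rngs[Threads.threadid()]
    Random.seed!(tsrng, masterSeed + Threads.threadid()*10) # WRONG: depends on which thread runs iteration i
    results[i] = sum(rand(tsrng, 1:100,100))
  end
  return results
end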