Introduction to Julia - RSE-Cambridge/julia-study-group GitHub Wiki

An introduction to Julia for the initiated

Intended for those already confident with a programming language. Assumes to some degree familiarity with Python.

Why I think Julia is worth learning

Julia feels like a 21st century programming language, that is to say, the syntax is unobtrusive, with much of the "programmer thinking" (types, memory management, algorithms) being taken care of by the compiler. The developer is still able to control these things when necessary, providing the low-level tinkering that is needed for blazing performance, but allows for high-level approaches in the rest of an application.

Other things that Julia does well:

Automatic Differentiation
Multiple-dispatch is an extremely powerful abstraction
- Two language problem, compiling for xPUs, quantum computers
Reproducible, rich and (currently very) actively maintained ecosystem
- Pkg.jl, GitHub organizations for topically similar packages
- Easy access to bleeding edge algorithms and packages in ML, DEs, statistics, etc.
Good tooling, encouraged by default
- Documenter.jl, Test.jl, GitHub actions
- Built-in introspection, reflection, and profiling

Other people seem to agree, e.g. Julia for HEP (Springer 2023).

What I had to learn to learn Julia

... coming from an ostensibly pure Python background.

Live and breathe in the REPL
- Workflow is unlike any other language I'd worked with (c.f. Lisp / Clojure)
- Emphasis on interactive development
There should be nine-- and preferably ten --obvious ways to do it -- there are multiple ways of expressing exactly the same thing
- This has upshots: no awkward expressions, keep things familiar
- And downshots: collaborators may have different habits, sometimes different code generated for seemingly identical expressions
Standard algorithms and for-loops are fast again
Every function is effectively a template until you call it

Setup recommendations

My recommendation is to use the VSCode Julia Extension. The extension is feature rich, and provides good utilities for working with the Julia REPL (such as Shift-Enter to run a code block, and the plotting pane). - Setup threads by passint -tauto in the extension settings

Use juliaup version manager

A note on "Virtual environments". In Julia, these aren't quite the same as in e.g. Python:

Downloaded packages are stored in ~/.julia
Environments are namespaces of compatible versions
Packages used in development (i.e. dev deps) are your global env
Work on multiple distinct projects in the same env
Check things out in development mode

In the REPL, press the ] key to enter the package manager mode. Activate your project environment with

julia>]
pkg> activate .

To add a dependency to a project

pkg> add PackageName

To check out a project for development (wrapper around git clone and hot-reloading via Revise.jl)

pkg> dev --local NameOfPackage

Recommendation: Use the same environment until you really need another one (e.g. incompatible deps, eye-watering compilation times)

Home directory

> ls ~/.julia
artifacts/       clones/          compiled/        dev/             juliaup/         logs/            packages/        pluto_notebooks/ registries/      scratchspaces/

Notable directories:

artifacts: downloaded compiled libraries, auxillary data, generated things, etc.
compiled: shared object files that have been precompiled for packages
dev: packages checked out for development

First things to know

Julia is optionally strongly typed (you'll see what I mean)
Julia is garbage collected
Julia compiles on first-call (related: world age in JIT, can occasionally be problematic)
Column major, arrays start at 1

Get help in the repl by prefixing a function or type with ?

julia> ?sort
search: sort sort! sortperm sortperm! sortslices insorted Cshort issorted QuickSort MergeSort Cushort partialsort partialsort! partialsortperm

  sort(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)

  Variant of sort! that returns a sorted copy of v leaving v itself unmodified.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> v = [3, 1, 2];
...

The Julia Compiler and Runtime

Image from Carsten Bauer

Examining the translations

"""
    quadratic_solve(a, b, c)

Solve for the two roots of the quadratic equation ``ax^2 + bx + c`` using the quadratic formula.
"""
function quadratic_solve(a, b, c)
    Δ = √(b^2 - 4a * c)
    root1 = (-b + Δ) / (2a)
    root2 = (-b - Δ) / (2a)
    return (root1, root2)
end

quadratic_solve(5.0, 3.0, -2.0)

Meta.parse("quadratic_solve(   5.0,          3.0, -2.0)")
Meta.@dump quadratic_solve(5.0, 3.0, -2.0)

@code_lowered quadratic_solve(5.0, 3.0, -2.0)
@code_warntype quadratic_solve(5.0, 3.0, -2.0)
@code_typed quadratic_solve(5.0, 3.0, -2.0)
@code_llvm debuginfo=:none quadratic_solve(5.0, 3.0, -2.0)
@code_native debuginfo=:none quadratic_solve(5.0, 3.0, -2.0)

Julia Types

Julia has a notion of Abstract and Concrete types. Abstract types are not instantiable (though constructor functions may exist, e.g. AbstractFloat), but provide a means of relating interfaces and behaviors (compare to C++20's concepts).

Concrete types may subtype an abstract type, but cannot themselves be used as a supertype for another concrete type.
Concrete types may only subtype a single abstract type

# abstract types by convention should have `Abstract` in their name
# this is not enforced
abstract type AbstractSubstance end
# abstract types can be subtypes of other abstract types
abstract type AbstractFluid <: AbstractSubstance end

@show methods(AbstractFluid)

function colour(fluid::AbstractFluid) 
    error("Not implemented for $(typeof(fluid))")
end

# structs are concrete types: these can be instantiated
struct Coal <: AbstractSubstance
    # a struct does not need to contain fields
    # can be used to control compile-time dispatch (more on this later)
end

struct Water <: AbstractFluid
end

colour(water::Water) = "blue"

coal = Coal()
water = Water()

struct Cat <: AbstractFluid
    colour::String
end

colour(cat::Cat) = cat.colour

cat = Cat("black")

@show colour(cat)
@show colour(water)

@show T = typeof(cat)
@show supertype(T)
@show supertypes(T)

All types are a subtype of Any

A whistle stop tour

println("Hello World")
@show "Hello World"
display("Hello World")
Base.show(stdout, "Hello World")
"Hello World"

Julia convention: function names are often just a lowercase name

function singleword(a)
    all(!isspace, strip(a))
end

function unless_the_function_has_a_long_name()
    @warn "Why did you call me?"
    throw("Not impemented")
end

singleword("Hello")
singleword("Hello World")

(Almost) everything is an expression

# last line of a function is returned
function foo()
    true
end

# last line of a block is returned
begin
    a = 1
    b = 2
    a + b
end

let a = 10.0
    a
end

# same applies to the blocks in control flow
x = if foo()
    1
else
    0
end

@show x

Julia is column major, indices start at 1 (optionally anything you like, e.g. StarWarsArrays.jl)

matrix = rand(10_000, 10_000)

# function name convention: 
# if it modifies arguments, include a bang (!)
# arguments that are modified should be first
# this is not enforced
function loop_columns!(mat)
    for i in 1:size(mat, 2)
        for j in 1:size(mat, 1)
            mat[j, i] = 42
        end
    end
    mat
end

function loop_rows!(mat)
    for i in 1:size(mat, 1)
        for j in 1:size(mat, 2)
            mat[i, j] = 42
        end
    end
    mat
end

# run once to precompile
@time loop_columns!(matrix)
@time loop_rows!(matrix)

# run again to time invoke
# semi colon supresses output of that line in th REPL
@time loop_columns!(matrix) ;
@time loop_rows!(matrix) ;

Better benchmarking with BenchmarkTools.jl:

using BenchmarkTools

@benchmark loop_columns!($matrix)
@benchmark loop_rows!($matrix)

Lots of ways of mapping functions onto arrays, including the broadcast mechanism:

f(x) = sin(x)^2

# using collect to explicitly allocate the range
# range is otherwise a lazy data structure
data = collect(range(0.0, 5.0, 100))

# can map to allocate
map(f, data)

# can use list comprehension
[f(i) for i in data]

# broadcast syntax
f.(data) 

# broadcast pipe
data .|> f

# function composition
g = asin ∘ f
g.(data)

# in-place broadcast
data .= f.(data)
# using a macro
@. data = f(data)

Lazy evaluation of element-based methods, and collection orientated programming in Transducers.jl

Julia is UTF-8, including for infix operations

a ⊗ b = kron(a, b)

γ = [1, 2, 3]
σ₂ = [4, 5, 6]

γ ⊗ σ₂'

Lots of linear algebra built in

A = [4 5; 2 1]
b = [-2, 1]

# solve A x = b for x
x = A \ b

From ?\:

(A, B)

Matrix division using a polyalgorithm. For input matrices A and B, the result X is such that A*X == B when A is square. The solver that is used depends upon the structure of A. If A is upper or lower triangular (or diagonal), no factorization of A is required and the system is solved with either forward or backward substitution. For non-triangular square matrices, an LU factorization is used.

For rectangular A the result is the minimum-norm least squares solution computed by a pivoted QR factorization of A and a rank estimate of A based on the R factor.

When A is sparse, a similar polyalgorithm is used. For indefinite matrices, the LDLt factorization does not use pivoting during the numerical factorization and therefore the procedure can fail even for invertible matrices.

Julia has useful macros

x = [1, 2, 7, 1, 7, 2, 3, 6, 1, 2, 3 ]

# time and some memory info (good for rough estimates)
@time x.^2

# view into a slice of memory (no copy)
@view x[4:8]

# macros read everything to the right as part of their input: scope with brackets
@view x[4:8] .= 5 # error
@view(x[4:8]) .= 5
@show x

# turn off bounds checks for speed
function fill_zeros!(x)
    for i in eachindex(x)
        x[i] = 0
    end
end
fill_zeros!(x)

# for inlining of functions
@inline function inline_me(a, b)
    a^2 + b
end

# make everything a broadcast
x = collect(range(1, 10, 20))
@. x = x^2 - 12

# also worth mentioning
Threads.@threads
@simd

Julia lets you work on the AST to define your own macros

"""
    @simple_muladd expr

Substitutes top level multiplications and additions with `muladd` explicitly.

## Example

    @simple_muladd 5 * 2 + 1
"""
macro simple_muladd(expr)
    # macros operate on the AST
    if (expr.args[1] == :+) && (expr.args[2].args[1] == :(*))
        x1 = expr.args[3]
        multiply = expr.args[2]
        x2 = multiply.args[2]
        x3 = multiply.args[3]
        quote
            muladd($(x2), $(x3), $(x1))
        end |> esc
    else
        expr
    end
end

a = 5
b = 7
c = 2

@simple_muladd a * b + c

function test1(a, b, c)
    @simple_muladd a * b + c
end

function test2(a, b, c)
    a * b + c
end

@code_native debuginfo=:none test1(a, b, c)
@code_native debuginfo=:none test2(a, b, c)

# annotate the type
A = Float64[ 
    1  3 ;
    4  5 
]
b = Float64[4, 3]
c = Float64[2, 1]

@code_native debuginfo=:none test1(A, b, c)
@code_native debuginfo=:none test2(A, b, c)

For a recursive version, see MuladdMacro.jl

Iterators, generators, comprehensions, standard algorithms etc.

for (i, k) in enumerate("hello")
    @show i,k
end

generator = (i for i in "hello")

# make it stateful: this allocates a mutable data structure that 
# tracks the state of the generator
itt = Iterators.Stateful(generator)

result = popfirst!(itt)
# do something with result
next_result = popfirst!(itt)

# algorithms given lazy structures allocate
filter(>(0), -10:10)

sort(rand(10))

# or inplace sort with the bang (!)
x = rand(10)
sort!(x)

Packages are capitalized camel case, and are easy to fetch

pkg> add Tullio, Symbolics

Project.toml: high-level, user-readable
Manifest.toml: essentially only for machine reading, lock file (usually not checked to version control)

Fetched packages are easy to use:

using Tullio, Symbolics

@variables x[1:3, 1:3], y[1:3]

X = [x[i,j] for i in 1:3, j in 1:3]

@tullio z[i] := X[k, i] * y[k]

using Latexify
latexify(z)

$$ \left[ \begin{array}{c} x_{1}ˏ_1 y_1 + x_{2}ˏ_1 y_2 + x_{3}ˏ_1 y_3 \ x_{1}ˏ_2 y_1 + x_{2}ˏ_2 y_2 + x_{3}ˏ_2 y_3 \ x_{1}ˏ_3 y_1 + x_{2}ˏ_3 y_2 + x_{3}ˏ_3 y_3 \ \end{array} \right] $$

Good IO:

filename = "./testfile" 
write(filename, "hello world, i am a testfile") # don't get the order wrong: no checks
write("hello world, i am a testfile", filename) # <- oh no

# vector of lines
lines = readlines("./testfile")

# open the file
handle = open("./testfile", "r")
close(handle)

function reader(io)
    "> " * read(io, String) * "\n - Reader"
end

# opens, uses the callback, then closes
data = open(reader, "./testfile", "r")

# compare `do` blocks to python `with` contexts
# the `do` specifies a function which is wrapped up and passed as the 
# first argument to the invoked function
data = open("./testfile", "r") do io
    read(io, String)
end

# aside: do blocks
tell_me_about(f) = @show methods(f)
tell_me_about() do k
    k^2
end

Custom types, immutable by default

mutable struct MyDataType2
    a::Float64
    b::Int64
end

data1 = MyDataType2(1.0, 2)
data1

Mutability does not propagate in either direction:

struct MyImmutableType
    a::Float64
end

# implement default constructor
MyImmutableType() = MyImmutableType(0.0)

t0 = MyImmutableType()

t1 = MyImmutableType(1.0)
t1.a = 2.0 # error

mutable struct MyMutableType
    a::MyImmutableType
    b::Float64
end

t2 = MyMutableType(t1, 2.0)
t2.b = 3.0
t2.a.a = 5.0 # error
t2.a = MyImmutableType(5.0)
t2

struct MyImmutableMutableType
    a::Vector{MyImmutableType}
    c::Float64
end

t3 = MyImmutableMutableType([t1], 2.0)
t3.a[1].a = 6.0 # error
t3.a[1] = MyImmutableType(6.0)
t3

t3.a = [MyImmutableType(7.0)] # error

NB: mutable types are always heap allocated and GC-tracked

Add dispatches to existing functions

struct DualNumber <: AbstractFloat
    x::Float64
    ε::Float64
end

DualNumber(x) = DualNumber(x, zero(x))

# cannot overload functions unless imported or qualified
import Base: +, -, *, /

# summation rule
a::DualNumber + b::DualNumber = DualNumber(a.x + b.x, a.ε + b.ε)
a::DualNumber - b::DualNumber = DualNumber(a.x - b.x, a.ε - b.ε)

# product rule
a::DualNumber * b::DualNumber = DualNumber(a.x * b.x, (a.x * b.ε) + (a.ε * b.x) )


# quotient rule
a::DualNumber / b::DualNumber = 
    DualNumber(a.x / b.x, ( (b.x * a.ε) - (a.x * b.ε) ) / (b.x^2))

# conversion
Base.convert(::Type{<:DualNumber}, x::Real) = DualNumber(x)
Base.convert(::Type{<:DualNumber}, x::DualNumber) = x

# always promote to Dual
Base.promote_rule(::Type{<:DualNumber}, ::Type{<:Number}) = DualNumber

a = DualNumber(5)
a + a

b = DualNumber(5.0, 1.0)
b^2 + 5b - 3

Julia's type system

As said, Julia is optionally strongly typed: to see what this means

function my_add(x, y)
    x + y
end

# add a dispatch based on the abstract type
function my_add(x::AbstractString, y::AbstractString)
    x * y
end

@code_warntype my_add(1, 2)
@code_warntype my_add(1.0, 2)
@code_typed my_add(1f0, 2)
@code_typed my_add(1f0, 1.0)
@code_typed my_add("Hello", "World")

@code_typed my_add([1, 2, 3], [4, 5, 6])
@code_typed my_add((1, 2, 3), (4, 5, 6))

Generics and type parameters


# can enforce x and y have the same type, and then use that type information
function my_subtract(x::T, y::T) where {T}
    # wouldn't recommend doing it this way, better to use dispatches
    if T <: Real
        return x - y
    elseif T <: AbstractString
        return join(i for i in x if i ∉ y)
    end
    error("not implemented for type $T")
end

my_subtract(1, 2)
my_subtract("i am so unhappy", "un")
my_subtract(1.0 + im, 2.0 + im) # error: not implemented

# can use a parametric type to do the same thing with structs
# note: no restrictions on T, and therefore T can be <: Any
struct GenericDualNumber{T} <: Number
    x::T
    ε::T
end

GenericDualNumber(1, 2)
GenericDualNumber(1f0, 2f0)
GenericDualNumber("1", "2")
GenericDualNumber(GenericDualNumber(1.0, 2.0), GenericDualNumber(5.0, -2.0))
GenericDualNumber(1.0, 1) # error: different types given

Functions and anonymous functions as types:

# anonymous function syntax
(arg1, arg2) -> arg1 + arg2
assigned_to_variable = (arg1, arg2) -> arg1 + arg2

function hello()
    return 2
end

typeof(hello)

@show typeof(() -> 2)
@show typeof(() -> 2) # world age ticked

GenericDualNumber(hello, hello)
GenericDualNumber(() -> 2,  () -> 2) # error

k = () -> 2
GenericDualNumber(k, k)

Note: mangled names always generated for anonymous functions. For debugging, is recommended to always pass named functions so that call stacks are properly annotated

MPI and parallelization

Quick threading example:

a = rand(500_000)

for i in eachindex(a)
    a[i] = 2 * a[i]
end

@show Threads.nthreads()
@show Threads.threadpoolsize()
@show Threads.threadid()

@sync begin
    Threads.@spawn println(Threads.threadid())
    Threads.@spawn println(Threads.threadid())
end

# can multi thread for-loops by annotation
Threads.@threads for i in eachindex(a)
    a[i] = 2 * a[i]
end

function busywait(seconds)
    tstart = time_ns()
    while (time_ns() - tstart) / 1e9 < seconds
    end
end

# thread-local storage paradigm
storage = [zeros(1000) for _ in 1:Threads.nthreads()]
Threads.@threads for i in eachindex(first(storage))
    arr = storage[Threads.threadid()]
    # ...
end

GPU programming

Kernel programming with KernelAbstractions.jl

# i'm on a M1 mac so
using Metal, KernelAbstractions

@kernel function mul2_kernel(A)
  I = @index(Global) # threadId.x
  A[I] = 2 * A[I]
end

A = MtlArray(ones(Float32, 1024, 1024))

backend = get_backend(A)
kernel = mul2_kernel(backend, 64) # work group size = 64
kernel(A, ndrange=size(A))
KernelAbstractions.synchronize(backend)
@show all(A .== 2.0f0)

Supports CUDA, ROCm, oneAPI, Metal
Compiles the LLVM IR that the Julia compiler generates into device specific code

Array-based GPU programming with e.g. GPUArrays.jl

Implementation heterogeneity/homogeneity via things like Adapt.jl

FFI, working with C/Fortran code, and BinaryBuilder.jl

Calling C and Fortran Code (Julia Docs)
CxxWrap.jl for Boost.Python style wrapping
BinaryBuilder.jl

Compile C/C++/Fortran in RootFS containers to a common denominator glibc / musl or libfortran, with the Cartesian product of e.g. C++ String ABIs, for different architectures in a single script

@rpath patching
Undefined symbol checks
Dependency checks

Adding the package then downloads compatible binaries and data automatically for use.

Profiling

Profiling (Julia docs)

TL;DR: there is a @profile macro for profiling function calls with a statistical profiler

For tracking allocations, start Julia with --track-allocation=user

For coverage, start Julia with --code-coverage=user (see more in Coverage.jl)

Tooling

Tools I use daily:

Plots.jl: high level plotting utilities for visualizing data, in particular with the Unicode backend

using Plots, UnicodePlots

plot(Plots.fakedata(50, 5), w = 3)

x = collect(range(0, 2π, 100))
y = @. sin(x)

UnicodePlots.lineplot(x, y, canvas = DotCanvas)

BenchmarkTools.jl: benchmarking suite, examples above
Cthulhu.jl: recursive @code_* tools
- step info functions, drill down to find type instabilities, etc.
JuliaFormatter.jl: code formatter
Revise.jl: automatically loaded by the VSCode extension
FlameGraphs.jl: for profiling
ForwardDiff.jl: forward mode automatic differentiation

Other useful packages:

StaticArrays.jl: like C++ std::valarray mixed with std::array, fixed size arrays that live on the stack, but implement the full AbstractArray interface
- small linear algebra routines are lightning fast
StructArrays.jl: SoA
BangBang.jl: uniform interface for mutable and immutable datatypes

Gotchas and things I don't like

Type stability is still a bit of a pain in big call stacks
Error printing and stack traces can be atrocious
Pre-compilation times are getting better, but are still wild and occasionally unpredictable
Runtime is memory hungry
GC can hamstring performance (especially with external dependencies), though 1.10 is already making this better with multi-threaded GC
Some edge-case & non-deterministic bugs that bit me

References

Re: reproducibility

Project.toml and Manifest.toml available on reasonable request to the authors :p