julia_segfault - HazyResearch/dimmwitted GitHub Wiki

Julia Support: I am getting a Segmentation Fault, or an ERROR saying my function "protentially is not thread-safe"! What should I do?

DimmWitted calls the Julia functions you write from multiple threads to improve performance. Because the current Julia engine is single-threaded, not every Julia function can be used safely with DimmWitted. To help with this, DimmWitted provides three things:

  1. A simple built-in sanity checker that makes a best-effort guess, at registration time, about whether your function can be used. You can override its decision, as shown in this tutorial, and use the function at your own risk.
  2. Best-effort guidelines, documented on this page, for designing Julia functions that can be used in DimmWitted.
  3. A debugging mode that calls your function from a single thread to help you diagnose the problem.

Prerequisites: this tutorial assumes that you are already familiar with the Julia walkthrough.

Built-in Sanity Checker

By default, every time you call a registration function such as register_row, DimmWitted runs a sanity checker that guesses whether the function can be used successfully in DimmWitted. For example, suppose you register a function like

function loss(row::Array{Cdouble,1}, model::Array{Cdouble,1})
        const label = row[length(row)]
        const nfeat = length(model)
        d = dot(row[1:nfeat], model)
        return (-label * d + log(exp(d) + 1.0))
end

DimmWitted will then report

ERROR: Your function contains LLVM LR `alloc` or `call` other julia 
functions. We cannot register this function because it protentially 
is not thread-safe. Use register_row(_dw,loss,true) to register this 
function AT YOUR OWN RISK!

This means that DimmWitted's sanity checker thinks the function you are trying to register might not execute successfully. You can override this decision by passing true as the third argument of register_row. DimmWitted will then stop complaining and use your function as it is. In our example, you can use

handle_loss = DimmWitted.register_row(dw, loss, true)

Unfortunately, in our example the sanity checker is correct, and you will see

signal (11): Segmentation fault
Segmentation fault (core dumped)

How to Write DimmWitted-friendly Julia Code?

Writing DimmWitted-friendly Julia code is not very hard. The core principle is to avoid memory allocation and to avoid calling other functions that are not thread-safe, e.g., dot.
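As a rough illustration of this principle, the sketch below puts the dot-based loss next to a hand-written loop and checks that both give the same value. It assumes a recent Julia version, where dot lives in the LinearAlgebra standard library. The names loss_with_dot and loss_with_loop and the sample data are made up for this example. It does not call DimmWitted, so it says nothing about what the sanity checker will decide.

```julia
using LinearAlgebra  # `dot` lives here on recent Julia versions (assumption: Julia 0.7 or later)

# Original style: the slice row[1:nfeat] allocates a temporary array,
# and dot is a call into another library function.
function loss_with_dot(row::Vector{Cdouble}, model::Vector{Cdouble})
    label = row[end]
    nfeat = length(model)
    d = dot(row[1:nfeat], model)
    return -label * d + log(exp(d) + 1.0)
end

# Loop style: same formula, but no temporary array and no call to dot.
function loss_with_loop(row::Vector{Cdouble}, model::Vector{Cdouble})
    label = row[end]
    nfeat = length(model)
    d = 0.0                      # accumulator starts out as a float
    for i in 1:nfeat
        d += row[i] * model[i]
    end
    return -label * d + log(exp(d) + 1.0)
end

# Hypothetical sample data: three features followed by the label.
row = [1.0, 2.0, 3.0, 1.0]
model = [0.1, 0.2, 0.3]

# Compare the two versions up to floating-point rounding.
println(loss_with_dot(row, model) ≈ loss_with_loop(row, model))
```

The loop version is the shape the rest of this page works towards.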

Writing Type-Stable Code

One can find a good tutorial here about how to write type-stable Julia code. Let's look at an example.

function loss(row::Array{Cdouble,1}, model::Array{Cdouble,1})
        const label = row[length(row)]
        const nfeat = length(model)
        d = 0
        for i = 1:nfeat
                d = d + row[i]*model[i]
        end
        return (-label * d + log(exp(d) + 1.0))
end

Can this function be used with DimmWitted? If you try to register it, our sanity checker will fail. If we inspect the LLVM IR using code_llvm, we can see the following line, which causes the problem:

%2 = alloca [5 x %jl_value_t*], align 8
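A minimal sketch of this inspection step, assuming a recent Julia version where code_llvm is provided by the InteractiveUtils standard library. The text scan for alloca and call is only a rough stand-in suggested by the wording of the error message. It is not DimmWitted's actual checker, and newer Julia compilers will generally emit different IR from the line shown above.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL (assumption: Julia 0.7 or later)

# The loop-based loss from this section, with the integer accumulator.
function loss(row::Vector{Cdouble}, model::Vector{Cdouble})
    label = row[end]
    nfeat = length(model)
    d = 0
    for i in 1:nfeat
        d = d + row[i] * model[i]
    end
    return -label * d + log(exp(d) + 1.0)
end

# Capture the LLVM IR for this argument signature as a string.
argtypes = (Vector{Cdouble}, Vector{Cdouble})
ir = sprint(code_llvm, loss, argtypes)

# Print only the lines that mention an allocation or a call
# (a crude heuristic, not the real sanity checker).
for line in split(ir, '\n')
    if occursin("alloca", line) || occursin("call", line)
        println(line)
    end
end
```

Reading the printed lines by hand is still necessary, since a call is not always a problem.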

How should we revise our code to avoid this problem? We observe that the problem is actually caused by Line 4:

d = 0

In this line, d is of type Int64, but at Line 6 its type changes to Cdouble. This type change causes a memory allocation. If we instead replace Line 4 with

d = 0.0

or, declaring the type of d so that the initial value is converted,

d::Cdouble = 0

we get a function that the sanity checker is happy with.

This trick is not mysterious; one can consult this blog page about how to write "type-stable code" in Julia.
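The type change described in this section can be observed directly in plain Julia. The following is a small sketch, independent of DimmWitted; the helper names accumulator_types and declared_accumulator_type are hypothetical and exist only for this illustration.

```julia
# Returns the accumulator's type before and after the first update.
function accumulator_types(row::Vector{Cdouble}, model::Vector{Cdouble})
    d = 0                       # integer literal, so d starts as an Int
    before = typeof(d)
    d = d + row[1] * model[1]   # Int + Float64 promotes to Float64
    after = typeof(d)
    return before, after
end

# A typed local declaration converts the initial value on assignment,
# so d is a Cdouble (Float64) from the start.
function declared_accumulator_type()
    d::Cdouble = 0
    return typeof(d)
end

println(accumulator_types([1.0, 1.0], [0.5]))
println(declared_accumulator_type())
```

The first function reports two different types for the same variable, which is the instability that initializing d as a float avoids.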

Debug Mode

Our discussion so far rests on the hypothesis that multi-threading is what causes the problem. To help you test this hypothesis, DimmWitted provides a debugging mode called DimmWitted.MR_SINGLETHREAD_DEBUG, which you can use as follows:

dw = DimmWitted.open(examples, model,
                DimmWitted.MR_SINGLETHREAD_DEBUG,
                DimmWitted.DR_SHARDING,
                DimmWitted.AC_ROW)

With this mode, the version that uses dot actually runs, even though the sanity checker still complains.