julia_scd - HazyResearch/dimmwitted GitHub Wiki
Julia Support: How to write other access methods in Julia for DimmWitted?
You can write all three access methods that DimmWitted supports in Julia: row-wise, column-wise, and column-to-row. In this tutorial, we show how to write a logistic regression model using SCD (stochastic coordinate descent) instead of SGD. SCD for logistic regression is a column-to-row access method. The code can be found here.
Pre-requisites: To understand this tutorial, we assume that you are already familiar with the Julia walkthrough and know how to write a logistic regression model with SGD.
Revising Gradient Function
To change access methods, you do not need to change the data, so you can use the same synthetic data set that we created for SGD. However, you need to change the gradient function, which now has a different signature:
function grad(col::Array{Cdouble,1},
              _colid::Cint,
              rows::Array{Array{Cdouble,1}},
              model::Array{Cdouble,1})
Unlike the row-wise gradient function, the column-to-row gradient function takes four parameters: col is the array holding one column, _colid is the index of this column (starting from 0), rows is an array of the rows that have a non-zero element in column _colid, and model is the model. Given this signature, we can write the gradient function as:
function grad(col, _colid, rows, model)
  colid = _colid + 1
  nfeat = length(model)
  nrows = length(rows)
  if colid > nfeat
    return 1.0
  end

  sum_term = 0.0
  pat_term = 0.0
  for ir = 1:length(rows)
    label = rows[ir][nfeat+1]
    d = 0.0
    for i = 1:nfeat
      d = d + rows[ir][i]*model[i]
    end
    sum_term = sum_term + label*rows[ir][colid]
    pat_term = pat_term + rows[ir][colid]*1.0/(1.0+exp(-d))
  end

  model[colid] = model[colid] - 0.00001 * (-sum_term + pat_term)
  return 1.0
end
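In equation form, one call to grad for column j appears to perform the coordinate step below. This is a reading of the code above, not a formula taken from the DimmWitted documentation, and the symbols are introduced here only for illustration: w is model, x_i is the i-th entry of rows without its trailing label, y_i is that label (assumed to be 0 or 1), R_j is the set of rows passed in for column j, and η is the hard-coded step size 0.00001.

```latex
w_j \leftarrow w_j - \eta \sum_{i \in R_j} x_{ij}\,\bigl(\sigma(w^\top x_i) - y_i\bigr),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}},
\qquad \eta = 10^{-5}
```

In the code, sum_term accumulates the y_i x_ij part and pat_term accumulates the x_ij σ(w·x_i) part, so (-sum_term + pat_term) is the sum in the update.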
This function contains multiple components:
- Line 2: Note that the variable _colid starts from 0, but Julia indices start from 1, so we create the variable colid that starts from 1.
- Lines 3-4: Get the number of features and the number of rows.
- Lines 5-7: If colid is the last column (i.e., the label column), we do nothing.
- Lines 9-19: Calculate the gradient of the colid'th element of the model.
- Line 21: Update the colid'th element of the model.
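To see the effect of a single call, the gradient function can be tried outside DimmWitted on a tiny hand-made input. The following is a minimal sketch under stated assumptions: the toy rows, their 0/1 labels, and the way col is built are hypothetical and only mimic what DimmWitted would pass in; the function body repeats the logic above so that the block runs on its own.

```julia
# Same logic as the tutorial's grad, repeated so this sketch is standalone.
function grad(col, _colid, rows, model)
    colid = _colid + 1                 # the column id passed in is 0-based
    nfeat = length(model)
    if colid > nfeat                   # label column: nothing to update
        return 1.0
    end
    sum_term = 0.0
    pat_term = 0.0
    for row in rows
        label = row[nfeat + 1]         # label is stored after the features
        d = 0.0
        for i = 1:nfeat
            d += row[i] * model[i]     # dot product of features and model
        end
        sum_term += label * row[colid]
        pat_term += row[colid] / (1.0 + exp(-d))
    end
    model[colid] -= 0.00001 * (pat_term - sum_term)
    return 1.0
end

# Hypothetical toy data: two features followed by a 0/1 label.
rows  = Vector{Float64}[[1.0, 2.0, 1.0], [3.0, 0.0, 0.0]]
model = [0.0, 0.0]
col   = [r[1] for r in rows]           # values of column 0 in those rows

grad(col, Cint(0), rows, model)        # one coordinate step on column 0
println(model)                         # model[1] moves to about -1.0e-5

grad(col, Cint(2), rows, model)        # label column: model is left as is
println(model)
```

Only the entry of model that corresponds to the given column changes, which is what distinguishes this coordinate-wise update from the row-wise SGD update that touches every feature of a row.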
Register a Column-to-row Function
The last change is in how we register the function. Instead of using register_row, we should use register_c2r:
handle_grad = DimmWitted.register_c2r(dw, grad)
Also, when creating the DimmWitted object, we should use DimmWitted.AC_C2R instead of DimmWitted.AC_ROW:
dw = DimmWitted.open(examples, model,
                     DimmWitted.MR_PERMACHINE,
                     DimmWitted.DR_SHARDING,
                     DimmWitted.AC_C2R)
Note that for column-to-row access, you can register either a row access function or a column-to-row function. For row (resp. column) access, you can only register a row (resp. column) function.