julia_sparse - HazyResearch/dimmwitted GitHub Wiki
Julia Support: Can I use a sparse input matrix?
Sure! Many real-world applications involve sparse data sets,
so DimmWitted supports the native Julia
type SparseMatrixCSC{T1,Int64}
directly. This tutorial
walks through how to use this feature.
The code can be found
here.
Prerequisites: This tutorial assumes that you are already familiar with the Julia walkthrough and know how to write a logistic regression model with dense data.
Prepare the Data Set
We reuse the synthetic data set created for the dense case and convert it to a sparse matrix with
sparse_example=sparse(examples)
Here the variable sparse_example
is of the type
SparseMatrixCSC{Cdouble,Int64}
.
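As a minimal sketch of this conversion, the snippet below builds a small dense matrix and converts it with sparse. The matrix contents are a hypothetical stand-in for the examples variable from the dense tutorial, and the using SparseArrays line is an assumption about running on Julia 1.x (older Julia versions provided sparse in Base).

```julia
using SparseArrays  # assumption: Julia 1.x; older versions had sparse() in Base

# Hypothetical stand-in for the dense `examples` matrix:
# each row is one example and the last column holds the label.
examples = [1.0 0.0 0.0 1.0;
            0.0 2.0 0.0 0.0;
            0.0 0.0 3.0 1.0]

sparse_example = sparse(examples)

# Element type is Float64 (the same type as Cdouble); the index type is
# Int64 on 64-bit systems.
println(typeof(sparse_example))

# Only the non-zero entries are stored.
println(nnz(sparse_example))
```

Note that a label of zero is not stored at all after the conversion, which is why the sparse loss function has to treat a missing label entry as 0.0.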
Change the Function
Here is the last change you need to make to use sparse data! The signatures of the loss and gradient functions need to change accordingly for sparse data:
function loss(row, model::Array{Cdouble,1})
...
end
Here, the row
object is no longer of the type Array{Cdouble,1}.
Instead, it is an Array of
immutable TMPTYPE
    idx::Clonglong
    data::Cdouble
end
where idx
is the element index (for row access, it is the column id
of a non-zero element), and data
is the value stored at that index.
It is not hard to see that row
is a sparse representation
of a vector.
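To make the row layout concrete, here is a rough sketch of how one row of a sparse matrix maps to an array of (idx, data) pairs. It is an illustration only and not DimmWitted's internal code: SparseEntry is a hypothetical stand-in for TMPTYPE (written with struct, the Julia 1.x spelling of immutable), and sparse_row is a hypothetical helper.

```julia
using SparseArrays  # assumption: Julia 1.x

# Hypothetical stand-in for the element type described above (TMPTYPE).
struct SparseEntry
    idx::Clonglong   # column id of a non-zero element
    data::Cdouble    # value stored at that column
end

# Hypothetical helper: collect the non-zero entries of row `i`,
# ordered by column id.
function sparse_row(A::SparseMatrixCSC{Cdouble,Int64}, i::Integer)
    # A[i, :] is a SparseVector; findnz returns (indices, values).
    cols, vals = findnz(A[i, :])
    return [SparseEntry(c, v) for (c, v) in zip(cols, vals)]
end

A = sparse([1.0 0.0 0.0 1.0;
            0.0 2.0 0.0 0.0])

row = sparse_row(A, 1)
for e in row
    println("column ", e.idx, " holds ", e.data)
end
```

The first row has non-zero values in columns 1 and 4, so its sparse representation has two entries, while the second row has a single entry for column 2.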
Given this difference, we can see that the following piece of code implements a sparse version of the loss function.
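function loss(row, model::Array{Cdouble,1})
    nfeat = length(model)
    lastcol = nfeat + 1        # the label lives in the column after the features
    nnz = length(row)          # number of non-zero entries in this row
    # A zero label is not stored in the sparse row, so default to 0.0.
    label = 0.0
    if nnz > 0 && row[nnz].idx == lastcol
        label = row[nnz].data
    end
    # Dot product of features and model, visiting non-zero entries only.
    d = 0.0
    for i = 1:nnz
        if row[i].idx != lastcol
            d = d + row[i].data*model[row[i].idx]
        end
    end
    return (-label * d + log(exp(d) + 1.0))
end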
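One way to sanity-check the sparse loss is to compare it against a dense computation on the same example. The sketch below does that under a few assumptions: SparseEntry is a hypothetical stand-in for TMPTYPE, dense_loss is a reference written for this comparison rather than copied from the dense tutorial, the model and rows are made-up values, and the loss is restated without const on local variables so that it runs on Julia 1.x.

```julia
# Hypothetical stand-in for the row element type (TMPTYPE).
struct SparseEntry
    idx::Clonglong
    data::Cdouble
end

# Sparse loss, restated from the tutorial.
function loss(row, model::Array{Cdouble,1})
    nfeat = length(model)
    lastcol = nfeat + 1
    nnz = length(row)
    label = 0.0
    if nnz > 0 && row[nnz].idx == lastcol
        label = row[nnz].data
    end
    d = 0.0
    for i = 1:nnz
        if row[i].idx != lastcol
            d = d + row[i].data * model[row[i].idx]
        end
    end
    return (-label * d + log(exp(d) + 1.0))
end

# Hypothetical dense reference: features first, label in the last slot.
function dense_loss(row::Array{Cdouble,1}, model::Array{Cdouble,1})
    nfeat = length(model)
    label = row[nfeat + 1]
    d = 0.0
    for i = 1:nfeat
        d += row[i] * model[i]
    end
    return -label * d + log(exp(d) + 1.0)
end

model = [0.5, -1.0, 2.0]

# Example with label 1: the label appears as the last sparse entry.
dense_pos  = [1.0, 0.0, 3.0, 1.0]
sparse_pos = [SparseEntry(1, 1.0), SparseEntry(3, 3.0), SparseEntry(4, 1.0)]

# Example with label 0: the label is simply absent from the sparse row.
dense_neg  = [0.0, 2.0, 0.0, 0.0]
sparse_neg = [SparseEntry(2, 2.0)]

println(loss(sparse_pos, model) ≈ dense_loss(dense_pos, model))
println(loss(sparse_neg, model) ≈ dense_loss(dense_neg, model))
```

Both comparisons should agree, since skipping zero features does not change the dot product and a missing label entry is read as 0.0.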
After this change, all other code is the same as the dense case.
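The gradient function needs the same kind of change as the loss. The following is only a sketch of what a sparse version might look like, not the code shipped with DimmWitted: it assumes the gradient function performs an in-place stochastic gradient step on model, and the step size, the return value, and the SparseEntry type (a stand-in for TMPTYPE) are all hypothetical. Check the dense walkthrough for the conventions DimmWitted actually expects.

```julia
# Hypothetical stand-in for the row element type (TMPTYPE).
struct SparseEntry
    idx::Clonglong
    data::Cdouble
end

function grad(row, model::Array{Cdouble,1})
    stepsize = 0.00001          # hypothetical step size
    nfeat = length(model)
    lastcol = nfeat + 1
    n = length(row)

    # A zero label is not stored, so a missing label entry means 0.0.
    label = (n > 0 && row[n].idx == lastcol) ? row[n].data : 0.0

    # Dot product over the non-zero features only.
    d = 0.0
    for e in row
        if e.idx != lastcol
            d += e.data * model[e.idx]
        end
    end

    # Derivative of -label*d + log(exp(d) + 1) with respect to d
    # is 1/(1 + exp(-d)) - label.
    z = stepsize * (1.0 / (1.0 + exp(-d)) - label)

    # Only coordinates with a non-zero feature are updated.
    for e in row
        if e.idx != lastcol
            model[e.idx] -= z * e.data
        end
    end
    return 1.0                  # placeholder return value (assumption)
end

model = [0.5, -1.0, 2.0]
row = [SparseEntry(1, 1.0), SparseEntry(3, 3.0), SparseEntry(4, 1.0)]
before = copy(model)
grad(row, model)
println(model .- before)
```

Because the update loop only visits stored entries, the cost of one step is proportional to the number of non-zero features in the row rather than to the model size.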
Can I Use a Sparse Model?
As of DimmWitted v0.01, we do not support sparse models yet. Let us know if you find this necessary for your application.