I've looked at the Matrix package and at their slides. I was trying to understand the intuition and the meaning behind the arguments in the dgCMatrix class. I understand that
@i gives the zero-based row indices of the non-zero entries in the matrix.
@j gives the zero-based column indices of the non-zero entries in the matrix.
@x gives the non-zero elements at the (i,j) positions.
However, I don't understand the meaning of the pointer @p. The documentation says
numeric (integer-valued) vector of pointers, one for each column (or row), to the initial (zero-based) index of elements in the column (or row).
This is not very informative. The "Details" section on the same page explains more:
If i or j is missing then p must be a non-decreasing integer vector whose first element is zero. It provides the compressed, or “pointer” representation of the row or column indices, whichever is missing. The expanded form of p, rep(seq_along(dp),dp) where dp <- diff(p), is used as the (1-based) row or column indices.
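To make the quoted formula concrete, here is a minimal sketch in base R that applies it to a small pointer vector. The vector p below is made up purely for illustration (it is not taken from the matrix in this question), and the names dp and j are just the ones used in the quote.

```r
# Hypothetical pointer vector for a matrix with 4 columns (illustration only):
# first element is zero, values never decrease, length = number of columns + 1
p <- c(0L, 2L, 2L, 3L, 6L)

# Differences between consecutive pointers: one count per column
dp <- diff(p)

# The "expanded form" from the documentation: repeat each (1-based)
# column number as many times as its count in dp
j <- rep(seq_along(dp), dp)

print(dp)
print(j)
```

Here dp comes out as 2 0 1 3 and j as 1 1 3 4 4 4, so the expansion turns a vector of length (number of columns + 1) into one index per stored entry.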
Which to me is definitely non-intuitive. Can someone provide a simple explanation of what p represents? I've created a Minimal Working Example but feel free to create a new one.
Minimal Working Example
library(Matrix)  # provides sparseMatrix() and the dgCMatrix class

# Define non-zero values and their (1-based) row/col indices
i_indeces <- c(1, 3, 4, 6, 8, 9)
j_indeces <- c(2, 9, 6, 3, 9, 10)
values <- c(60, 20, 10, 40, 30, 50)

# Create the 10 x 20 sparse matrix
A <- sparseMatrix(
  i = i_indeces,
  j = j_indeces,
  x = values,
  dims = c(10, 20)
)
Where
> str(A)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:6] 0 5 3 2 7 8
  ..@ p       : int [1:21] 0 0 1 2 2 2 3 3 3 5 ...
  ..@ Dim     : int [1:2] 10 20
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:6] 60 40 10 20 30 50
  ..@ factors : list()
and
> A
10 x 20 sparse Matrix of class "dgCMatrix"
[1,] . 60 . . . . . . . . . . . . . . . . . .
[2,] . . . . . . . . . . . . . . . . . . . .
[3,] . . . . . . . . 20 . . . . . . . . . . .
[4,] . . . . . 10 . . . . . . . . . . . . . .
[5,] . . . . . . . . . . . . . . . . . . . .
[6,] . . 40 . . . . . . . . . . . . . . . . .
[7,] . . . . . . . . . . . . . . . . . . . .
[8,] . . . . . . . . 30 . . . . . . . . . . .
[9,] . . . . . . . . . 50 . . . . . . . . . .
[10,] . . . . . . . . . . . . . . . . . . . .
Note
I understand that rep(seq_along(diff(A@p)), diff(A@p)) is a rearranged form of j_indeces but I still don't understand what it means.
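A quick sketch of the check behind this note, rebuilding the matrix from the example so it runs on its own. The names dp, j_expanded and triplets are only illustrative, and it assumes the Matrix package is installed.

```r
library(Matrix)

# Same inputs as in the Minimal Working Example
i_indeces <- c(1, 3, 4, 6, 8, 9)
j_indeces <- c(2, 9, 6, 3, 9, 10)
values    <- c(60, 20, 10, 40, 30, 50)
A <- sparseMatrix(i = i_indeces, j = j_indeces, x = values, dims = c(10, 20))

# Expand the pointer slot using the formula from the documentation
dp <- diff(A@p)
j_expanded <- rep(seq_along(dp), dp)

# Compare with the column indices that were passed in, sorted by column
print(j_expanded)
print(all(j_expanded == sort(j_indeces)))

# Put the expanded columns next to the stored rows (shifted to 1-based) and values
triplets <- data.frame(i = A@i + 1L, j = j_expanded, x = A@x)
print(triplets)
```

With these inputs j_expanded is 2 3 6 9 9 10, which is j_indeces sorted by column, and each row of triplets lines up with one non-zero entry of A.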