Right upfront: this is an issue I encountered when submitting an R package to CRAN. So I
- dont have control of the stack size (as the issue occured on one of CRANs platforms)
- I cant provide a reproducible example (as I dont know the exact configurations on CRAN)
Problem
When trying to submit the cSEM.DGP package to CRAN the automatic pretest (for Debian x86_64-pc-linux-gnu; not for Windows!) failed with the NOTE: C stack usage 7975520 is too close to the limit
.
I know this is caused by a function with three arguments whose body is about 800 rows long. The function body consists of additions and multiplications of these arguments. It is the function varzeta6()
which you find here (from row 647 onwards).
How can I adress this?
Things I cant do:
- provide a reproducible example (at least I would not know how)
- change the stack size
Things I am thinking of:
- try to break the function into smaller pieces. But I dont know how to best do that.
- somehow precompile? the function (to be honest, I am just guessing) so CRAN doesnt complain?
Let me know your ideas!
Details / Background
The reason why varzeta6()
(and varzeta4()
/ varzeta5()
and even more so varzeta7()
) are so long and R-inefficient is that they are essentially copy-pasted from mathematica (after simplifying the mathematica code as good as possible and adapting it to be valid R code). Hence, the code is by no means R-optimized (which @MauritsEvers righly pointed out).
Why do we need mathematica? Because what we need is the general form for the model-implied construct correlation matrix of a recursive strucutral equation model with up to 8 constructs as a function of the parameters of the model equations. In addition there are constraints. To get a feel for the problem, lets take a system of two equations that can be solved recursivly:
- Y2 = beta1*Y1 + zeta1
- Y3 = beta2*Y1 + beta3*Y2 + zeta2
What we are interested in is the covariances: E(Y1*Y2), E(Y1*Y3), and E(Y2*Y3) as a function of beta1, beta2, beta3 under the constraint that
- E(Y1) = E(Y2) = E(Y3) = 0,
- E(Y1^2) = E(Y2^2) = E(Y3^3) = 1
- E(Yi*zeta_j) = 0 (with i = 1, 2, 3 and j = 1, 2)
For such a simple model, this is rather trivial:
- E(Y1*Y2) = E(Y1*(beta1*Y1 + zeta1) = beta1*E(Y1^2) + E(Y1*zeta1) = beta1
- E(Y1*Y3) = E(Y1*(beta2*Y1 + beta3*(beta1*Y1 + zeta1) + zeta2) = beta2 + beta3*beta1
- E(Y2*Y3) = ...
But you see how quickly this gets messy when you add Y4, Y5, until Y8.
In general the model-implied construct correlation matrix can be written as (the expression actually looks more complicated because we also allow for up to 5 exgenous constructs as well. This is why varzeta1()
already looks complicated. But ignore this for now.):
- V(Y) = (I - B)^-1 V(zeta)(I - B)'^-1
where I is the identity matrix and B a lower triangular matrix of model parameters (the betas). V(zeta) is a diagonal matrix. The functions varzeta1()
, varzeta2()
, ..., varzeta7()
compute the main diagonal elements. Since we constrain Var(Yi) to always be 1, the variances of the zetas follow. Take for example the equation Var(Y2) = beta1^2*Var(Y1) + Var(zeta1) --> Var(zeta1) = 1 - beta1^2. This looks simple here, but is becomes extremly complicated when we take the variance of, say, the 6th equation in such a chain of recursive equations because Var(zeta6) depends on all previous covariances betwenn Y1, ..., Y5 which are themselves dependend on their respective previous covariances.
Ok I dont know if that makes things any clearer. Here are the main point:
- The code for
varzeta1()
, ...,varzeta7()
is copy pasted from mathematica and hence not R-optimized. - Mathematica is required because, as far as I know, R cannot handle symbolic calculations.
- I could R-optimze "by hand" (which is extremly tedious)
- I think the structure of the
varzetaX()
must be taken as given. The question therefore is: can I somehow use this function anyway?