Background
I'm writing my own function. It takes a data frame as input, with an arbitrary number of features. The features may also be of different types, i.e. numeric, factor and chr.
I want to maximise my likelihood function, which is defined on an extended data matrix containing each feature's log transformation and all terms up to quadratic order, e.g. columns intercept + log(feature1) + feature1 + feature1^2 + log(feature2) + ... + feature1*feature2 + ... + feature_{n-1}*feature_n
Take the built-in dataset iris as an example:
Code:
str(iris)
Out:
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
As we can see, the first 4 features, from Sepal.Length to Petal.Width, are numeric. I want to build a model on their log and up-to-quadratic-order terms. So I want to output a data matrix like the following:
Code:
colnames(model.matrix(
  ~ 1 + log(Sepal.Length) + poly(Sepal.Length, degree = 2) +
    log(Sepal.Width) + poly(Sepal.Width, degree = 2) +
    log(Petal.Length) + poly(Petal.Length, degree = 2) +
    log(Petal.Width) + poly(Petal.Width, degree = 2) +
    Sepal.Length*Sepal.Width + Sepal.Length*Petal.Length +
    Sepal.Length*Petal.Width + Sepal.Width*Petal.Length +
    Sepal.Width*Petal.Width + Petal.Length*Petal.Width,
  data = iris
))
Out:
 [1] "(Intercept)"                     "log(Sepal.Length)"
 [3] "poly(Sepal.Length, degree = 2)1" "poly(Sepal.Length, degree = 2)2"
 [5] "log(Sepal.Width)"                "poly(Sepal.Width, degree = 2)1"
 [7] "poly(Sepal.Width, degree = 2)2"  "log(Petal.Length)"
 [9] "poly(Petal.Length, degree = 2)1" "poly(Petal.Length, degree = 2)2"
[11] "log(Petal.Width)"                "poly(Petal.Width, degree = 2)1"
[13] "poly(Petal.Width, degree = 2)2"  "Sepal.Length"
[15] "Sepal.Width"                     "Petal.Length"
[17] "Petal.Width"                     "Sepal.Length:Sepal.Width"
[19] "Sepal.Length:Petal.Length"       "Sepal.Length:Petal.Width"
[21] "Sepal.Width:Petal.Length"        "Sepal.Width:Petal.Width"
[23] "Petal.Length:Petal.Width"
The problem
The problem is that typing out the formula by hand with poly is not sensible, especially when we have hundreds of features! My function should handle any data frame automatically.
I know model.matrix can extend my original dataset, like iris. model.matrix can even deal with factor and chr features automatically, converting them to dummy variables. And poly can extend a feature to higher orders, but it does not provide log transformations.
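To make the two behaviours above concrete, here is a small illustration on a made-up data frame (the name `toy` and its columns are only for demonstration, they are not part of my real data):

```r
# Toy data frame with one numeric, one factor and one chr column
toy <- data.frame(
  x = c(1, 2, 3, 4),
  f = factor(c("a", "b", "a", "c")),
  s = c("u", "v", "u", "v"),
  stringsAsFactors = FALSE
)

# model.matrix() expands factor and chr columns into dummy columns
mm <- model.matrix(~ ., data = toy)
colnames(mm)

# poly() gives the polynomial terms, but log() still has to be written by hand
pm <- model.matrix(~ log(x) + poly(x, degree = 2, raw = TRUE), data = toy)
colnames(pm)
```

So the dummy coding comes for free, but the log term has to be added per feature, which is exactly what does not scale.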
My question is how to get the log and up-to-quadratic-order transformations of any given data frame automatically. I want to share my new model with others, so I think my function should work on anyone else's data sets.
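To show the kind of thing I have in mind, here is a rough sketch that builds the formula from the column names instead of typing it. The helper name `extend_matrix` is hypothetical, and the sketch makes assumptions that may not hold in general: numeric columns are strictly positive (otherwise log fails or gives NaN), pairwise interactions are only taken between numeric columns, and non-numeric columns enter as plain main effects. It uses I(x^2) rather than poly, so the quadratic terms are raw powers and not orthogonal polynomials.

```r
# Hypothetical helper: extended design matrix for an arbitrary data frame.
# Assumes numeric columns are strictly positive so that log() is defined.
extend_matrix <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  # Backtick-quote names so non-syntactic column names still parse
  num   <- sprintf("`%s`", names(df)[is_num])
  other <- sprintf("`%s`", names(df)[!is_num])

  terms <- c(
    sprintf("log(%s)", num),   # log transformation of each numeric feature
    num,                       # linear terms
    sprintf("I(%s^2)", num),   # raw quadratic terms
    other                      # factor / chr features become dummies
  )

  # Pairwise interactions between numeric features only
  if (length(num) >= 2) {
    pairs <- combn(num, 2)
    terms <- c(terms, paste(pairs[1, ], pairs[2, ], sep = ":"))
  }

  model.matrix(reformulate(terms), data = df)
}

X <- extend_matrix(iris)
dim(X)
```

I am not sure whether this is the idiomatic way to do it, or whether there is something built in that already covers the log part.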