Using the code below I train two networks. The two networks are identical except that the second one (model2) has two output neurons. The target output for the first network (model1) is the sale price of a set of houses; the target for the second network (model2) is the same sale price, supplied twice, once for each output.
library(keras)
library(deepviz)
library(caret)

data(Sacramento)

# Predictors: beds, baths and square footage, standardized
x_train <- as.matrix(Sacramento[, c("beds", "baths", "sqft")])
x_train <- scale(x_train)

# Target: log of the sale price
y_train <- log(Sacramento$price)
# model1, one output neuron
input <- layer_input(shape = 3)
hidden <- layer_dense(input, units = 4, activation = "sigmoid", use_bias = TRUE)
output <- layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE)
model1 <- keras_model(inputs = input, outputs = output)

model1 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("mean_squared_error")
)

model1 %>% fit(
  x_train,
  y_train,
  epochs = 100,
  batch_size = 10,
  validation_split = 0.2
)
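As a sanity check, the final training error of model1 can also be read off directly from predictions (standard evaluate()/predict() calls, nothing specific to this setup):

# Cross-check model1's reported MSE against its predictions
model1 %>% evaluate(x_train, y_train, verbose = 0)
pred1 <- model1 %>% predict(x_train)
mean((pred1 - y_train)^2)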
# model2, two output neurons
input <- layer_input(shape = 3)
hidden <- layer_dense(input, units = 4, activation = "sigmoid", use_bias = FALSE)
outputs <- list(
  layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE),
  layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE)
)
model2 <- keras_model(inputs = input, outputs = outputs)

model2 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("mean_squared_error")
)

model2 %>% fit(
  x_train,
  list(y_train, y_train),  # the same target, supplied once per output
  epochs = 100,
  batch_size = 10,
  validation_split = 0.2
)
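If the summing of the two per-output losses mattered here, down-weighting each loss by 0.5 should compensate for it. A minimal variant of the compile step, using the loss_weights argument of compile() (this recompiles model2 in place and leaves its layers untouched):

# Control: halve each output's loss so the summed total is on the same
# scale as model1's single loss
model2 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  loss_weights = c(0.5, 0.5),
  metrics = c("mean_squared_error")
)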
The result of the first network (model1) is:
# loss: 0.1118 - mean_squared_error: 0.1118 - val_loss: 0.1124 - val_mean_squared_error: 0.1124
The result of the second network (model2) is:
# loss: 0.8307 - dense_84_loss: 0.4268 - dense_85_loss: 0.4115
# dense_84_mean_squared_error: 0.4223 - dense_85_mean_squared_error: 0.4084
# val_loss: 0.9867 - val_dense_84_loss: 0.4844 - val_dense_85_loss: 0.4878
# val_dense_84_mean_squared_error: 0.4918 - val_dense_85_mean_squared_error: 0.4950
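If I read this output correctly, the reported loss of model2 is the sum of the two per-output losses (0.4268 + 0.4115 ≈ 0.83), so the number comparable to model1's 0.11 is the per-output MSE of roughly 0.42, which is still much worse. The per-output errors can be checked directly from predictions; for a multi-output model, predict() returns one matrix per output:

# Per-output training MSE of model2, computed from its predictions
preds <- model2 %>% predict(x_train)
mean((preds[[1]] - y_train)^2)  # output 1
mean((preds[[2]] - y_train)^2)  # output 2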
Why is the performance of model2 substantially worse than that of model1? Shouldn't the performance of both models be approximately identical?
Here are the network weights of model2 (as returned by get_weights(model2)):
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0.320372 -0.1731332 0.5624840 -0.47232226
[2,] -2.519146 0.2757542 0.8330284 -0.01051062
[3,] -1.838133 0.4940852 0.7297845 0.37787062
[[2]]
[,1]
[1,] 5.708315
[2,] 6.818939
[3,] 5.347315
[4,] 6.232662
[[3]]
[,1]
[1,] 6.026654
[2,] 5.976376
[3,] 6.360726
[4,] 5.751414
Obviously, the connection weights into the two output neurons ([[2]] and [[3]] above) are very different. Why is that? Shouldn't they be approximately identical?
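To quantify that, the two output kernels can be compared directly (again only standard keras calls; model2 uses no biases, so get_weights() returns exactly the three kernels printed above):

# Compare the kernels feeding the two output neurons
w <- get_weights(model2)
w[[2]] - w[[3]]      # elementwise difference
cor(w[[2]], w[[3]])  # correlation between the two 4x1 kernels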