Using the code below, I train two networks. The networks are identical except that the second one (model2) has two output neurons instead of one. The target for the first network (model1) is the log-transformed sale price of a set of houses; the target for the second network (model2) is that same log sale price, supplied twice, once per output.
library(keras)
library(deepviz)
library(caret)

# Sacramento housing data: predict log sale price from beds, baths, sqft
data(Sacramento)
x_train <- as.matrix(Sacramento[, c("beds", "baths", "sqft")])
x_train <- scale(x_train)
y_train <- log(Sacramento$price)

# model1: one output neuron
input  <- layer_input(shape = 3)
hidden <- layer_dense(input, units = 4, activation = "sigmoid", use_bias = TRUE)
output <- layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE)
model1 <- keras_model(inputs = input, outputs = output)

model1 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("mean_squared_error")
)

model1 %>% fit(
  x_train,
  y_train,
  epochs = 100,
  batch_size = 10,
  validation_split = 0.2
)
# model2: two output neurons, each trained on the same target
input  <- layer_input(shape = 3)
hidden <- layer_dense(input, units = 4, activation = "sigmoid", use_bias = FALSE)
outputs <- list(
  layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE),
  layer_dense(hidden, units = 1, activation = "linear", use_bias = FALSE)
)
model2 <- keras_model(inputs = input, outputs = outputs)

model2 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("mean_squared_error")
)

model2 %>% fit(
  x_train,
  list(y_train, y_train),   # the same target for both outputs
  epochs = 100,
  batch_size = 10,
  validation_split = 0.2
)
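The numbers reported below are the final-epoch metrics from fit()'s progress output. If a single post-training figure is preferred, keras's evaluate() could be used instead; a minimal sketch, reusing the training matrices from above:

# Optional check (not part of the original run): recompute the MSE of each
# trained model on the full training set
model1 %>% evaluate(x_train, y_train)
model2 %>% evaluate(x_train, list(y_train, y_train))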
The result of the first network (model1) is:
# loss: 0.1118 - mean_squared_error: 0.1118 - val_loss: 0.1124 - val_mean_squared_error: 0.1124
The result of the second network (model2) is:
# loss: 0.8307 - dense_84_loss: 0.4268 - dense_85_loss: 0.4115 - dense_84_mean_squared_error: 0.4223 - dense_85_mean_squared_error: 0.4084 - val_loss: 0.9867 - val_dense_84_loss: 0.4844 - val_dense_85_loss: 0.4878 - val_dense_84_mean_squared_error: 0.4918 - val_dense_85_mean_squared_error: 0.4950
Why is the performance of model2 substantially worse than that of model1? Note that this is not just an artifact of model2's total loss being (approximately) the sum of its two per-output losses: each individual output only reaches an MSE of about 0.42, compared with 0.11 for model1. Shouldn't the two models perform approximately the same?
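A weight listing like the one below can be obtained with keras's get_weights(), which returns a model's parameters as a list of R arrays; a minimal sketch:

# Trained parameters of model2: [[1]] the 3x4 input-to-hidden kernel,
# [[2]] and [[3]] the two 4x1 hidden-to-output kernels
# (there are no bias vectors, since every layer uses use_bias = FALSE)
get_weights(model2)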
Here are the network weights of model2:
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0.320372 -0.1731332 0.5624840 -0.47232226
[2,] -2.519146 0.2757542 0.8330284 -0.01051062
[3,] -1.838133 0.4940852 0.7297845 0.37787062
[[2]]
[,1]
[1,] 5.708315
[2,] 6.818939
[3,] 5.347315
[4,] 6.232662
[[3]]
[,1]
[1,] 6.026654
[2,] 5.976376
[3,] 6.360726
[4,] 5.751414
Obviously, the connection weights to the two output neurons ([[2]] and [[3]] in the listing above) are very different. Why is that? Shouldn't they be approximately identical?
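To make "very different" concrete, the two output kernels can also be compared directly; a minimal sketch, again via get_weights():

w <- get_weights(model2)
# element-wise difference between the two 4x1 hidden-to-output kernels
w[[2]] - w[[3]]
# correlation across the four entries (indicative at best with so few values)
cor(as.vector(w[[2]]), as.vector(w[[3]]))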