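The puzzle above is supposedly one that preschoolers solve more easily than adults, and someone on Weibo even remarked that a strongly logical mind simply cannot crack it. That is not necessarily true. Let's try solving it with a linear model. The number on the right of each equation must be determined by the digits on the left, so assume each digit contributes to the right-hand side in some regular way, i.e. answer ~ 1st digit + 2nd digit + 3rd digit + 4th digit. Note, though, that each digit here is really a factor rather than a number, since it can only take one of 10 levels (0 to 9). If we also assume that a digit's contribution depends only on which digit it is and not on its position, the four factor columns collapse into ten counts, one per digit, and the answer can be regressed on those counts with no intercept.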
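Written out, the model treats each answer as a sum of per-digit contributions. The symbols $n_{i,d}$ (how many times digit $d$ appears in row $i$) and $\beta_d$ (the contribution of one occurrence of digit $d$) are notation introduced here for illustration, not taken from the original:

$$
\text{answer}_i = \sum_{d=0}^{9} \beta_d \, n_{i,d}, \qquad \sum_{d=0}^{9} n_{i,d} = 4
$$

For example, the row 8809 has $n_{i,8} = 2$, $n_{i,0} = 1$, $n_{i,9} = 1$ and all other counts zero, so it contributes the equation $2\beta_8 + \beta_0 + \beta_9 = 6$.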
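The training data file, train.csv, is shown below (columns one to four hold the four digits on the left of each equation; answer holds the number on the right):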
| one | two | three | four | answer |
|-----|-----|-------|------|--------|
| 8 | 8 | 0 | 9 | 6 |
| 7 | 1 | 1 | 1 | 0 |
| 2 | 1 | 7 | 2 | 0 |
| 6 | 6 | 6 | 6 | 4 |
| 1 | 1 | 1 | 1 | 0 |
| 3 | 2 | 1 | 3 | 0 |
| 7 | 6 | 6 | 2 | 2 |
| 9 | 3 | 1 | 3 | 1 |
| 0 | 0 | 0 | 0 | 4 |
| 2 | 2 | 2 | 2 | 0 |
| 3 | 3 | 3 | 3 | 0 |
| 5 | 5 | 5 | 5 | 0 |
| 8 | 1 | 9 | 3 | 3 |
| 8 | 0 | 9 | 6 | 5 |
| 7 | 7 | 7 | 7 | 0 |
| 9 | 9 | 9 | 9 | 4 |
| 7 | 7 | 5 | 6 | 1 |
| 6 | 8 | 5 | 5 | 3 |
| 9 | 8 | 8 | 1 | 5 |
| 5 | 5 | 3 | 1 | 0 |
Here is the linear-model code in R, with comments:
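    ## Read the training data
    train <- read.csv(file = "D:\\Workplace\\train.csv")

    ## Reshape the data for the linear model: for each row, count how many
    ## times each digit 0-9 occurs. Appending 0:9 makes every digit appear in
    ## table() (so all 10 columns exist); subtracting 1 removes that padding.
    freqTable <- as.data.frame(
      t(apply(train[, 1:4], 1, function(X) table(c(X, 0:9)) - 1))
    )
    names(freqTable) <- c("zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine")
    freqTable$y <- train[, 5]

    ## Fit a linear model without an intercept
    myModel <- lm(y ~ 0 + zero + one + two + three + four + five + six +
                    seven + eight + nine, data = freqTable)

    ## The test case corresponds to the digits 2, 8, 5, 1
    test <- data.frame(t(c(0, 1, 1, 0, 0, 1, 0, 0, 1, 0)))
    names(test) <- c("zero", "one", "two", "three", "four",
                     "five", "six", "seven", "eight", "nine")
    predict(myModel, newdata = test)
    # 1    <- this is just the row name
    # 2    <- this is the actual prediction
    # Warning message:    <- safe to ignore here
    # prediction from a rank-deficient fit may be misleading

    ## See why it is so accurate (four is NA because no 4 appears in the training data)
    round(coef(myModel))
    # zero one two three four five six seven eight nine
    #    1   0   0     0   NA    0   1     0     2    1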
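For readers without the D:\ file, here is a minimal self-contained sketch under a few assumptions: the 20 training rows are typed in as four-digit strings copied from the table above, the digit counts are built with base R's `tabulate()` instead of `table(c(X, 0:9)) - 1`, and the same no-intercept model is written with the shorthand formula `y ~ 0 + .`. The names `codes`, `answer`, `digitCounts`, `freq`, `fit` and `pred` are illustrative only, not from the original.

```r
## Training data from the table above, written inline (an assumption for
## self-containment; the original reads D:\Workplace\train.csv)
codes  <- c("8809", "7111", "2172", "6666", "1111", "3213", "7662", "9313", "0000", "2222",
            "3333", "5555", "8193", "8096", "7777", "9999", "7756", "6855", "9881", "5531")
answer <- c(6, 0, 0, 4, 0, 0, 2, 1, 4, 0, 0, 0, 3, 5, 0, 4, 1, 3, 5, 0)

digitNames <- c("zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine")

## Hypothetical helper: count how often each digit 0-9 occurs in a code.
## tabulate() puts digit k into bin k + 1, so bin 1 holds the count of 0.
digitCounts <- function(code) {
  d <- as.integer(strsplit(code, "")[[1]])
  tabulate(d + 1, nbins = 10)
}

## One row per training equation, one column per digit
freq <- as.data.frame(t(vapply(codes, digitCounts, integer(10))))
names(freq) <- digitNames
freq$y <- answer

## Same no-intercept linear model; "." means every column except y
fit <- lm(y ~ 0 + ., data = freq)
print(round(coef(fit)))  # four is NA: digit 4 never occurs in the training data

## Predict the test code 2851; the rank-deficiency warning (if any) is suppressed
newRow <- as.data.frame(t(digitCounts("2851")))
names(newRow) <- digitNames
pred <- suppressWarnings(predict(fit, newdata = newRow))
print(pred)
```

Using `tabulate()` avoids the pad-and-subtract trick, and `y ~ 0 + .` saves listing all ten columns; the fitted model should be the same as the one above.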
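So a plain linear model, and nothing more, has learned the secret of this puzzle: counting circles. The rounded coefficients are exactly the number of closed loops in each digit (one each for 0, 6 and 9, two for 8, none for 1, 2, 3, 5 and 7) :-)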
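As a cross-check, the rounded coefficients can be used directly as a lookup table of circles per digit. The sketch below (the names `circles` and `countCircles` are illustrative, not from the original) confirms that this table reproduces all 20 answers in train.csv and also gives 2 for 2851. Digit 4 is deliberately left as NA, since the training data cannot determine its value, so any code containing a 4 yields NA rather than a guess.

```r
## Circles per digit 0-9, read off the rounded coefficients.
## Assumption: digit 4 is unknown (NA) because it never occurs in the data.
circles <- c(zero = 1, one = 0, two = 0, three = 0, four = NA,
             five = 0, six = 1, seven = 0, eight = 2, nine = 1)

## Hypothetical helper: sum the circles of every digit in a code
countCircles <- function(code) {
  d <- as.integer(strsplit(code, "")[[1]])
  sum(circles[d + 1])  # digit k sits at position k + 1
}

## The 20 training equations from the table
codes  <- c("8809", "7111", "2172", "6666", "1111", "3213", "7662", "9313", "0000", "2222",
            "3333", "5555", "8193", "8096", "7777", "9999", "7756", "6855", "9881", "5531")
answer <- c(6, 0, 0, 4, 0, 0, 2, 1, 4, 0, 0, 0, 3, 5, 0, 4, 1, 3, 5, 0)

## Every training answer is reproduced exactly by the lookup table
stopifnot(all(vapply(codes, countCircles, numeric(1)) == answer))

print(countCircles("2851"))
```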