Application of improved random forest model in population spatialization
【Abstract】 Population spatialization methods based on the random forest model do not account for the imbalance in the spatial distribution of population. Bootstrap sampling aggravates this imbalance, so the training samples are unrepresentative and prediction accuracy is low. To address this problem, this paper takes Chengdu as a case study: the factors that influence population distribution are extracted through correlation analysis, the dataset is clustered with the K-means++ algorithm, and Bootstrap sampling then draws an equal amount of data from each cluster, with the draws merged into training subsets to build an improved random forest model, which is compared with the traditional random forest model. The improved model is then used to spatialize the 2020 population data of Chengdu, and the result is compared with the WorldPop dataset for accuracy. The results show that the overall accuracy of the population spatialization model based on the improved random forest reaches 80.5%, about 3.4% higher than before the improvement, so the modification effectively improves prediction accuracy. Compared with the WorldPop dataset, the spatialization result of the improved random forest model is better in both goodness of fit and accuracy.
【Key words】 population spatialization; random forest; K-means++ clustering; Chengdu city
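
The sampling step summarized in the abstract (an equal Bootstrap draw from every K-means++ cluster, merged into one training subset) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `balanced_bootstrap`, the parameter `per_cluster`, and the toy labels are hypothetical, and the cluster labels are assumed to have been produced beforehand by a K-means++ clustering step (for example scikit-learn's `KMeans(init="k-means++")`).

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, per_cluster, rng):
    """Return sample indices: `per_cluster` draws with replacement per cluster.

    `labels[i]` is the (assumed, precomputed) cluster label of sample i.
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    sample = []
    for label in sorted(members):
        # Bootstrap: sample with replacement inside a single cluster,
        # so small clusters contribute as many rows as large ones.
        sample.extend(rng.choices(members[label], k=per_cluster))
    return sample


# Hypothetical labels for 12 grid cells; cluster 0 dominates the data.
labels = [0] * 9 + [1] * 2 + [2] * 1
rng = random.Random(42)

subset = balanced_bootstrap(labels, per_cluster=4, rng=rng)
counts = {c: sum(1 for i in subset if labels[i] == c) for c in sorted(set(labels))}
print(counts)  # {0: 4, 1: 4, 2: 4}
```

Under this reading, each tree of the forest would be trained on one such subset instead of a plain Bootstrap draw from the whole dataset, so that sparsely represented population regimes are not crowded out by the dominant cluster. How the paper chooses the number of clusters and the per-cluster sample size is not stated in the abstract.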
- 【Source】 测绘通报 (Bulletin of Surveying and Mapping), 2023, Issue 06
- 【Classification No.】 P208; C924.2
- 【Downloads】 171