Ensemble learning model for predicting coal and gas outbursts based on high-dimensional small samples

    Abstract: Data for coal and gas outburst prediction are high-dimensional and available only in small samples, which makes predictive modeling difficult. To address this, the study built a database of 60 samples described by seven indicators, including gas pressure, gas content, and coal failure type. Permutation importance was used for feature reduction, retaining five key features (initial velocity of gas emission, coal seam thickness, gas content, gas pressure, and the firmness coefficient of coal) so that weakly correlated features have less influence on the model. A Stacking ensemble was then built with support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), logistic regression (LR), and extreme gradient boosting (XGBoost) as base learners and XGBoost as the meta-learner. Bayesian optimization (BO) was applied for global hyperparameter tuning, yielding a BO-Stacking ensemble model for coal and gas outburst prediction, and the Shapley additive explanations (SHAP) method was used to interpret the model's predictions. The results show that, after feature reduction, the BO-Stacking model achieved an accuracy of 92.4%, an F1 score of 0.956, a Kappa coefficient of 0.927, and an AUC of 0.969, outperforming every individual model. The features ranked by their influence on the prediction are: initial velocity of gas emission > gas content > gas pressure > firmness coefficient of coal > coal seam thickness. The BO-Stacking ensemble model shows good predictive performance and stability and offers a new method for coal and gas outburst prediction.
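The pipeline described in the abstract (permutation-importance feature selection, then a Stacking ensemble scored by accuracy, F1, Kappa, and AUC) can be illustrated with a minimal sketch. It is not the authors' implementation. It makes the following assumptions: the data are synthetic, generated by scikit-learn's `make_classification` to match only the 60 x 7 shape of the database; scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends on one library; the random forest used to rank features and the 70/30 split are arbitrary choices; and all hyperparameters stay at fixed values, so the Bayesian optimization and SHAP steps are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 60-sample, 7-indicator database (assumption:
# this is NOT the paper's data, only an array of the same shape).
X, y = make_classification(n_samples=60, n_features=7, n_informative=4,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: rank the seven features by permutation importance and keep five.
# The ranking model (a random forest) is an assumption of this sketch.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
importance = permutation_importance(ranker, X_train, y_train,
                                    n_repeats=20, random_state=0)
top5 = np.sort(np.argsort(importance.importances_mean)[::-1][:5])

# Step 2: Stacking ensemble. Distance- and margin-based learners are scaled.
# GradientBoostingClassifier replaces XGBoost here (stand-in, see lead-in).
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gb", GradientBoostingClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=3,  # out-of-fold base predictions train the meta-learner
)
stack.fit(X_train[:, top5], y_train)

# Step 3: the four metrics reported in the abstract, on the held-out split.
pred = stack.predict(X_test[:, top5])
proba = stack.predict_proba(X_test[:, top5])[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "kappa": cohen_kappa_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
print("selected feature indices:", top5.tolist())
print(metrics)
```

With only 18 held-out samples, the scores printed by this sketch vary strongly with the random split, which is the small-sample difficulty the abstract refers to. Repeated or stratified cross-validation would give more stable estimates than a single split.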

     
