Textual knowledge entity extraction of hidden dangers in coal mine accidents based on probabilistic fusion algorithm

    Abstract: Text data on hidden dangers in coal mine accidents are unstructured, and extracting the knowledge latent in them is crucial for constructing a knowledge graph of such hidden dangers. Based on a dataset of coal mine accident hidden-danger texts, this study analyzes the characteristics and implicit information of hidden-danger descriptions and, drawing on the propagation patterns of accident hidden dangers, designs knowledge entity annotation types suited to these descriptions. The texts were annotated with the Brat annotation tool to construct a dataset for the knowledge entity extraction model. We propose a BERT-IDCNN-CRF model based on dynamic weight fusion and introduce a probabilistic fusion algorithm based on Newton's law of cooling. The results indicate that, with the probabilistic fusion algorithm, the dynamically weighted BERT-IDCNN-CRF model performs best on knowledge entity extraction from hidden-danger texts: its precision, recall, and F1-score improve by 8.93%, 5.28%, and 7.51%, respectively, which significantly improves the model's prediction accuracy and stability, and the model shows good adaptability.
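The abstract does not spell out how the probabilistic fusion algorithm works, so the following is only a minimal sketch of one way a Newton's-law-of-cooling decay could be used to fuse label probabilities. It assumes that each prediction source is given a hypothetical "age" (for example, its rank), that its weight decays as exp(-k * age), and that the fused distribution is the weighted average. The function names, the decay constant `k`, the label set, and the probabilities are all illustrative assumptions, not the authors' method or data.

```python
import math

def cooling_weights(ages, k=0.5):
    """Assumed weighting: w_i proportional to exp(-k * age_i), as in
    Newton's law of cooling, normalised so the weights sum to 1."""
    raw = [math.exp(-k * a) for a in ages]
    total = sum(raw)
    return [r / total for r in raw]

def fuse(distributions, weights):
    """Weighted average of per-label probability distributions for one token."""
    labels = distributions[0].keys()
    return {
        lab: sum(w * d[lab] for w, d in zip(weights, distributions))
        for lab in labels
    }

# Hypothetical label probabilities for a single token from three sources.
dists = [
    {"B-HAZ": 0.6, "I-HAZ": 0.1, "O": 0.3},
    {"B-HAZ": 0.5, "I-HAZ": 0.2, "O": 0.3},
    {"B-HAZ": 0.2, "I-HAZ": 0.2, "O": 0.6},
]

weights = cooling_weights([0, 1, 2])   # "older" sources count for less
fused = fuse(dists, weights)
best = max(fused, key=fused.get)       # label with the highest fused probability
print(weights, fused, best)
```

Because the weights are normalised, the fused values still form a probability distribution. A larger `k` concentrates the weight on the first source, while `k = 0` reduces to a plain average.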

     
