Citation: Li Jing, Zhang Zhizhen, Du Xuan, et al. Statistical method of coal mine violations based on text classification technology[J]. Journal of Mining Science and Technology, 2022, 7(3): 344-353. DOI: 10.19606/j.cnki.jmst.2022.03.009

Statistical method of coal mine violations based on text classification technology

Abstract: Coal mining is a high-risk industry, and its enterprises accumulate large, complex records of violations. To retrieve and manage these records efficiently, accurately, and intelligently, and to reduce the occurrence of violations, this study takes a database of 13,935 violation records collected at one mine over the past three years as its sample. Violations are divided into 3 categories and 23 subcategories. A classifier for violation text is built with computer text classification techniques; its pipeline comprises text preprocessing with the Jieba word segmenter, construction of a vector space model, feature selection with the TF-IDF model, and similarity calculation. A visualization platform was then built in a Python environment and used to produce the classification statistics. The results show that illegal operation accounts for the largest share of all violations, at 64%, followed by illegal action and then illegal command. The violation subcategories were also grouped into high-, medium-, and low-frequency classes, providing important data support for accident prevention.
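The pipeline named in the abstract (segmentation, vector space model, TF-IDF weighting, similarity calculation, category statistics) can be illustrated with a minimal sketch. This is not the authors' implementation: the training records, the function names, and the nearest-neighbour decision rule are all assumptions made for illustration, and the tokens are assumed to be already segmented (for Chinese text, a segmenter such as Jieba's `jieba.lcut` would typically produce them).

```python
import math
from collections import Counter

# Hypothetical labelled records, already split into tokens.
# The paper segments Chinese text with Jieba; that step is assumed done here.
TRAIN = [
    ("illegal operation", ["operate", "machine", "without", "lockout"]),
    ("illegal operation", ["operate", "conveyor", "without", "guard"]),
    ("illegal action", ["enter", "restricted", "area", "without", "permit"]),
    ("illegal action", ["ride", "conveyor", "belt"]),
    ("illegal command", ["supervisor", "order", "work", "despite", "gas", "alarm"]),
]


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (dict) for one record; unknown terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, train=TRAIN):
    """Assign the label of the most similar training record (assumed rule)."""
    idf = idf_table([doc for _, doc in train])
    query = tfidf(tokens, idf)
    best = max(train, key=lambda item: cosine(query, tfidf(item[1], idf)))
    return best[0]


def category_shares(records, train=TRAIN):
    """Fraction of records assigned to each category."""
    counts = Counter(classify(r, train) for r in records)
    return {label: c / len(records) for label, c in counts.items()}
```

Calling `category_shares` on a list of tokenized records returns the share of each category, which is the kind of figure the abstract reports (for example, the 64% share of illegal operation). A real system would need far more training data per subcategory than this toy corpus.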

     
