Journal of Kunming Metallurgy College (昆明冶金高等专科学校学报) ›› 2023, Vol. 39 ›› Issue (6): 46-. DOI: 10.3969/j.issn.1009-0479.2023.06.008

• Electronic Information Technology •

Analysis of the Stylistic Characteristics of Different Schools of Tang Dynasty Poetry Based on NLP and Statistical Methods

LI Mengqiao (李梦巧), SHEN Fanqi (沈凡起), MA Ming (马明), ZHANG Chaoyuan (张朝元)

  1. (a. College of Mathematics and Computer; b. College of Teacher Education, Dali University, Dali 671003, Yunnan, China)
  • Received: 2023-10-09 Online: 2023-12-02 Published: 2024-03-12
  • About the author: LI Mengqiao (born 1987), female, from Nanyang, Henan; teaching assistant, M.S.; her research focuses on computational mathematics and mathematics education.
  • Funding:
    Eighth Education and Teaching Reform Research Project of Dali University, "Research and Practice on Teaching Reform of the 'Advanced Mathematics' Course in the Context of Emerging Engineering Education" (2022JGY08-99); 2022 Yunnan Province Graduate Supervisor Team Building Project, "Subject Teaching (Mathematics) Graduate Supervisor Team" (108).

Abstract: Tang poetry is a treasure of Chinese culture, large in volume and diverse in style and theme. To examine the characteristics of Tang poems from different schools, this study applies natural language processing together with statistical methods, namely k-means++ cluster analysis, repeated-measures analysis of variance, and the paired-sample t-test, to analyze the stylistic characteristics of and differences among five major schools: landscape-pastoral (山水田园), frontier (边塞), romantic (浪漫), realistic (现实), and epic (咏史) poetry. The results show that Tang poetry has strong commonalities across schools: keywords such as "千里" (a thousand li), "万里" (ten thousand li), and "何处" (where) rank high by TF-IDF value in every school. In terms of differences among schools, three pairs are relatively similar in their keyword usage: romantic and realistic poetry, frontier and epic poetry, and romantic and epic poetry. Paired-sample t-tests on the TF-IDF values of the keywords give p-values of 0.973, 0.383, and 0.052 for these pairs, respectively. The remaining pairs of schools differ more markedly.

Key words: NLP, k-means++ cluster analysis, analysis of variance, Tang Dynasty poetry, feature analysis

CLC number:
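
The abstract compares schools by running paired-sample t-tests on the TF-IDF values of shared keywords. The sketch below illustrates that idea on a toy scale and is not the authors' implementation. Everything in it is an assumption made for illustration: the token lists are invented and already word-segmented, each school is treated as a single document, the IDF uses a smoothed form (ln((1 + N) / (1 + df)) + 1) because the abstract does not state which TF-IDF variant was used, and the helper names `tf_idf` and `paired_t` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev

# Toy, pre-segmented token lists per school (hypothetical; NOT data from the paper).
schools = {
    "frontier": ["万里", "千里", "何处", "将军", "万里", "黄沙", "万里"],
    "epic":     ["千里", "何处", "万里", "江山", "何处", "故国"],
    "pastoral": ["何处", "千里", "青山", "白云", "青山", "万里"],
}


def tf_idf(corpora):
    """Return {school: {term: tf-idf}}, treating each school as one document.

    Assumed formula: tf = count / total tokens,
    idf = ln((1 + N) / (1 + df)) + 1 (smoothed, so shared terms stay nonzero).
    """
    n_docs = len(corpora)
    counts = {name: Counter(tokens) for name, tokens in corpora.items()}
    vocab = sorted({term for c in counts.values() for term in c})
    # Document frequency: number of schools in which each term occurs.
    df = {t: sum(1 for c in counts.values() if t in c) for t in vocab}
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            t: (c[t] / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t in vocab
        }
    return scores


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for equal-length lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


scores = tf_idf(schools)
keywords = sorted(scores["frontier"])  # same vocabulary for every school
t_stat, dof = paired_t(
    [scores["frontier"][k] for k in keywords],
    [scores["epic"][k] for k in keywords],
)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

Turning the t statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; `scipy.stats.ttest_rel` computes both in one call. A large p-value, such as the 0.973 reported for romantic versus realistic poetry, means the test found no evidence of a difference between the paired TF-IDF values, which the abstract reads as similarity in keyword usage.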