Modified method of searching keywords and key terms in text data

D.O. Bukhalenkov, T.M. Zabolotnia

Abstract


This article addresses the automated search for keywords and key terms in text data. To improve the efficiency of automated keyword search tools, as measured by absolute accuracy and the Jaccard index, a modification of an existing hybrid keyword search method, one of the most recent methods for this task, is proposed. The modification takes into account complex dependencies between pairs of words in the text to identify multi-word expressions, which, unlike the original method, allows it to find key terms consisting of several words. Tests of the modified hybrid method showed that it is effective for finding key terms in texts in comparison with existing analogues.
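As a rough illustration of the two ideas named in the abstract, the Python sketch below groups words joined by pairwise dependencies into multi-word candidates and scores a predicted keyword set against a reference set with the Jaccard index. It is not the authors' implementation. The function names, the triple format for dependencies, and the choice of the Universal Dependencies relations `compound`, `flat` and `fixed` as markers of multi-word expressions are all assumptions made for this example.

```python
# Assumption: these dependency relations signal a multi-word expression.
MWE_RELATIONS = {"compound", "flat", "fixed"}


def merge_multiword_terms(tokens, dependencies):
    """Group tokens linked by MWE relations into multi-word candidates.

    tokens: list of words in sentence order.
    dependencies: (head_index, dependent_index, relation) triples
                  (a hypothetical format chosen for this sketch).
    """
    parent = list(range(len(tokens)))  # union-find forest over token indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for head, dep, rel in dependencies:
        if rel in MWE_RELATIONS:
            parent[find(dep)] = find(head)

    groups = {}
    for i in range(len(tokens)):
        groups.setdefault(find(i), []).append(i)

    # Keep only groups of two or more words, joined in sentence order.
    return [
        " ".join(tokens[i] for i in idxs)
        for idxs in groups.values()
        if len(idxs) > 1
    ]


def jaccard_index(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


tokens = ["high", "speed", "wind", "tunnel", "control"]
deps = [(1, 0, "amod"), (3, 1, "compound"), (3, 2, "compound")]
terms = merge_multiword_terms(tokens, deps)
score = jaccard_index(terms + ["control"], ["speed wind tunnel", "reynolds number"])
```

The sketch does not check that the merged words are adjacent in the sentence, and it ignores part-of-speech information. A practical system would obtain the dependency triples from a parser rather than writing them by hand.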

Problems in programming 2024; 1: 12-22


Keywords


keywords; key terms; text data processing; Python; Stanford classification




DOI: https://doi.org/10.15407/pp2024.01.012
