Using the determining the similarity of words method to evaluate text vectorization algorithms
Abstract
Using the determining the similarity of words method to evaluate text vectorization algorithms
Incoming article date: 03.11.2024The article presents the existing methods of reducing the dimensionality of data for teaching machine models of natural language. The concepts of text vectorization and word-form embedding are introduced. The task of text classification is being formed. The stages of classifier training are being formed. A classifying neural network is being designed. A series of experiments is being conducted to determine the effect of reducing the dimension of word-form embeddings on the quality of text classification. The results of evaluating the work of trained classifiers are compared.
Keywords: natural language processing, vectorization, word-form embedding, text classification, data dimensionality reduction, classifier