The article briefly describes existing methods for vectorizing natural-language texts. It describes how the models are evaluated by the word-similarity method, presents a comparative analysis of several vectorization models, explains how the evaluation data were selected, and compares the models' performance results.
Keywords: natural language processing, vectorization, word-form embedding, semantic similarity, correlation
The article presents ways to improve the accuracy of classifying normative reference information using hierarchical clustering algorithms.
Keywords: machine learning, artificial neural network, convolutional neural network, normative reference information, hierarchical clustering, DIANA