A method for automated formation of a training data set for machine learning algorithms for classification of electronic documents
Abstract
Incoming article date: 24.08.2023
The article considers a method for the automated formation of a training data set for machine learning algorithms that classify electronic documents. Unlike known approaches, the method forms training data sets by combining clustering and data augmentation methods that rely on calculating the distance between objects in multidimensional space.
Keywords: supervised learning, clustering, pattern recognition, machine learning algorithm, electronic document, vectorization, formalized documents
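
The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name the vectorization scheme, the clustering algorithm, the distance metric, or the augmentation rule, so everything below is an assumption chosen for illustration: bag-of-words vectors, a small k-means loop, Euclidean distance, and augmentation by interpolating between two members of the same cluster. The function names (`vectorize`, `kmeans`, `augment`) are hypothetical and do not come from the article.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Bag-of-words counts over a shared vocabulary (assumed vectorization)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def euclidean(a, b):
    """Euclidean distance between two points in the feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(v, centers):
    """Index of the centroid closest to v."""
    return min(range(len(centers)), key=lambda j: euclidean(v, centers[j]))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); cluster ids act as class labels."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [nearest_center(v, centers) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def augment(vectors, labels, centers, per_cluster=2, seed=0, max_attempts=100):
    """Create synthetic samples by interpolating between cluster members.

    A candidate is kept only if its nearest centroid is the cluster it was
    generated from, so the distance check guards the assigned label.
    """
    rng = random.Random(seed)
    synthetic = []
    for c in range(len(centers)):
        members = [v for v, l in zip(vectors, labels) if l == c]
        made = attempts = 0
        while members and made < per_cluster and attempts < max_attempts:
            attempts += 1
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            candidate = [x + t * (y - x) for x, y in zip(a, b)]
            if nearest_center(candidate, centers) == c:
                synthetic.append((candidate, c))
                made += 1
    return synthetic


# Toy corpus of formalized documents (invented for illustration).
docs = [
    "invoice payment amount due",
    "invoice total payment",
    "contract party agreement term",
    "contract agreement signature",
]
vectors = vectorize(docs)
labels, centers = kmeans(vectors, k=2)
synthetic = augment(vectors, labels, centers)
training_set = list(zip(vectors, labels)) + synthetic
```

Here `training_set` holds the original vectors with their cluster labels plus the accepted synthetic samples. Rejecting candidates whose nearest centroid differs from their source cluster is one simple way to use distance to keep augmented samples consistent with their labels; the article's actual criterion may differ.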