Formation and efficiency analysis of a training dataset for teaching language models to recognize and analyze program source code

Kakutin D.Y., Dmitriev A.S.

Article received: 23.04.2022

This article describes the formation of a training set for language neural networks intended for tasks that involve analyzing source code and searching for matches or correspondences in meaning, specifically among functions and methods in the source code of a programming language. The key parameters that the sample must have for the neural network to be trained correctly are determined.

Keywords: source code, machine learning, natural language processing, neural network, data analysis