Formation and analysis of the efficiency of the dataset for teaching language models to recognize and analyze the source code of programs
Abstract
Incoming article date: 23.04.2022
This article describes the formation of a training set for language neural networks intended for tasks that involve analyzing source code and searching for matches or semantic correspondences, specifically among functions and methods written in a programming language. The key parameters that the sample must have for the neural network to be trained correctly are determined.
Keywords: source code, machine learning, natural language processing, neural network, data analysis