Search for patent analogues based on a comparison of key phrases
Abstract
Incoming article date: 01.10.2024
This study describes approaches to automating full-text keyword search over patent information. Automating search by keywords (n-grams) is considerably harder than searching by individual words, and it also requires morphological and syntactic analysis of the text. To this end, the following tasks were carried out: (a) the full-text search systems Apache Solr, Elasticsearch and ClickHouse were analyzed; (b) their architectures and core capabilities were compared; (c) search results from Apache Solr, Elasticsearch and ClickHouse were obtained on the same dataset. The following conclusions were drawn: (a) all of the systems considered support full-text keyword search; (b) Apache Solr has the highest performance of the three systems and also offers very convenient functions; (c) Elasticsearch has a fast and powerful architecture; (d) ClickHouse provides high data processing speed.
Keywords: search, keyphrases, patent, Apache Solr, Elasticsearch, ClickHouse
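The abstract does not specify how key phrases are compared, so the following Python sketch is only a minimal illustration of the general idea of ranking candidate patents by shared key phrases (word n-grams). It assumes simple regex tokenization, a small hypothetical stop-word list standing in for the morphological and syntactic analysis the study mentions, and Jaccard overlap as the similarity measure. None of these choices come from the study, and the names `keyphrases` and `rank_analogues` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would use proper
# morphological/syntactic analysis instead of this shortcut.
STOP_WORDS = {"a", "an", "the", "for", "in", "of", "with", "using", "to", "and"}


def tokenize(text: str) -> list[str]:
    # Lowercase, split into word tokens, and drop stop words (no stemming).
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All contiguous word n-grams of length n.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def keyphrases(text: str, n_max: int = 3) -> set[str]:
    # Candidate key phrases: word n-grams of length 2..n_max.
    tokens = tokenize(text)
    phrases: set[str] = set()
    for n in range(2, n_max + 1):
        phrases |= {" ".join(g) for g in ngrams(tokens, n)}
    return phrases


def jaccard(a: set[str], b: set[str]) -> float:
    # Share of key phrases common to both sets; 0.0 when both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_analogues(query: str, patents: dict[str, str]) -> list[tuple[str, float]]:
    # Score every patent against the query and sort by descending similarity.
    q = keyphrases(query)
    scores = [(pid, jaccard(q, keyphrases(text))) for pid, text in patents.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "P1": "A method for heat exchange in a solar collector using a heat pipe.",
        "P2": "Wireless charging device for a mobile phone.",
    }
    print(rank_analogues("Solar collector with heat pipe for heat exchange", docs))
```

Here the toy patent "P1" shares phrases such as "solar collector" and "heat pipe" with the query and ranks first, while "P2" shares none. In the systems the study compares, this kind of phrase matching would instead be handled by the engine's own analyzers and phrase queries.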