Approximation of the meaning for thematic subject headings by simple interpretable representations
Article
The article was published in a journal included in the RSCI Web of Science list.
Citation information for the article was obtained from Scopus.
The article was published in a journal included in the VAK list (Higher Attestation Commission of Russia).
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: October 2, 2024.
Abstract: The paper studies methods for approximating user-labeled topics by simple representations in a text classification problem. It is assumed that in real information systems the meaning of a thematic category can be approximated by a fairly simple interpretable expression. An algorithm is considered that represents a text topic as a Boolean formula, in effect a query to a full-text information system. The algorithm is based on an optimized selection of logical predicates over words and thesaurus terms. It is compared with modern machine learning methods on real collections with noisy expert markup. The method can be used for text classification, expert evaluation of a heading's content, assessment of how complex a topic is to describe, and correction of the markup.
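The abstract only outlines how a topic is turned into a Boolean query. To illustrate the general idea, here is a minimal Python sketch. It is not the authors' algorithm. It greedily builds a query in disjunctive normal form (an OR of AND-groups) that separates labeled documents. Several choices are assumptions made for illustration: documents are treated as sets of words, the only predicate is "term occurs in document" (no thesaurus terms or other predicate types), F1 on the labeled set is the objective, and plain greedy search stands in for the paper's optimized selection. The function names `greedy_formula`, `matches`, `f1` and `to_query` are hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a document is the set of its (normalized) words.
Doc = FrozenSet[str]
# A conjunction is a set of terms that must all occur in a document;
# a formula is a disjunction (OR) of conjunctions, i.e. a DNF query.
Conj = FrozenSet[str]
Formula = List[Conj]


def matches(formula: Formula, doc: Doc) -> bool:
    """True if at least one conjunction has all of its terms in the document."""
    return any(conj <= doc for conj in formula)


def f1(formula: Formula, docs: List[Doc], labels: List[bool]) -> float:
    """F1 score of the query, treating labels as the expert markup."""
    tp = fp = fn = 0
    for doc, y in zip(docs, labels):
        pred = matches(formula, doc)
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def greedy_formula(docs: List[Doc], labels: List[bool],
                   max_conjs: int = 3, max_terms: int = 2) -> Formula:
    """Greedily add AND-groups (each built term by term) while F1 improves."""
    vocab = sorted(set().union(*docs))  # sorted for deterministic tie-breaking
    formula: Formula = []
    best = 0.0
    for _ in range(max_conjs):
        conj: Conj = frozenset()
        conj_best = best
        for _ in range(max_terms):
            candidate = None
            for term in vocab:
                if term in conj:
                    continue
                trial = conj | {term}
                score = f1(formula + [trial], docs, labels)
                if score > conj_best:  # strict improvement only
                    conj_best, candidate = score, trial
            if candidate is None:
                break
            conj = candidate
        if not conj:  # no single term improved the query: stop
            break
        formula.append(conj)
        best = conj_best
    return formula


def to_query(formula: Formula) -> str:
    """Render the formula as a readable full-text query string."""
    parts = []
    for conj in formula:
        terms = sorted(conj)
        parts.append(terms[0] if len(terms) == 1 else "(" + " AND ".join(terms) + ")")
    return " OR ".join(parts)


if __name__ == "__main__":
    texts = ["football match goal", "hockey match puck", "election vote match",
             "stock market price", "goal team football"]
    docs = [frozenset(t.split()) for t in texts]
    labels = [True, True, False, False, True]  # toy "sports" heading
    print(to_query(greedy_formula(docs, labels)))  # football OR hockey
```

On this toy "sports" heading the sketch returns a two-term query that reproduces the labels exactly. The resulting formula is directly readable, which is the property the abstract relies on for expert evaluation of a heading. The number and size of its AND-groups also give a crude measure of how complex a topic is to describe.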