CRUST | Browsing by Author "Demidovich, Inna"

Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights
(CEUR-WS Team, Aachen, Germany, 2021) Shynkarenko, Viktor I.; Demidovich, Inna
ENG: This work aims to improve the attribution of texts and their fragments using minimum-distance classification in a Euclidean space of images, by selecting a weight for each image measure. A genetic algorithm was used to determine the weights. Images are formed using statistical analysis, modified recurrent analysis, and text complexity indicators, and the effectiveness of each is examined. The method was found to improve the efficiency of text attribution: the reliability of authorship determination for texts from the control sample reaches 80-91%.
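The minimum-distance classification with per-measure weights described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the three-measure images, and the weight values are hypothetical, and the weights are set by hand here rather than found by a genetic algorithm as in the paper.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two images (feature vectors)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def attribute(sample, author_images, weights):
    """Return the author whose image lies closest to the sample."""
    return min(
        author_images,
        key=lambda author: weighted_distance(sample, author_images[author], weights),
    )

# Hypothetical images: three measures per author (values are made up).
author_images = {
    "author_a": [0.10, 4.0, 0.30],
    "author_b": [0.12, 9.0, 0.80],
}
sample = [0.11, 8.0, 0.35]

# Equal weights let the large-scale second measure dominate the distance.
unweighted = attribute(sample, author_images, [1.0, 1.0, 1.0])
# Down-weighting that measure changes which author is nearest.
reweighted = attribute(sample, author_images, [1.0, 0.001, 1.0])
print(unweighted, reweighted)
```

The example shows why the choice of weights matters: the same sample is attributed to a different author once one measure is down-weighted.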
A Dual Approach to Establishing the Authority of Technical Natural Language Texts and Their Components
(Ukrainian State University of Science and Technologies, Dnipro, 2023) Shynkarenko, Viktor I.; Demidovich, Inna; Kuropiatnyk, Olena S.
ENG: Purpose. The study aims to test the hypothesis that plagiarism can be detected by methods of establishing the authorship of a text, without using a text bank or direct comparison with it. Methodology. Constructive and productive models of the processes of establishing the authorship of technical texts have been developed for two methods. The first method forms a text model as a set of formal substitution rules with probabilistic weights (as in stochastic formal grammars), which reflects the syntactic features and patterns of text formation by the author. The degree of similarity between the text under study and another text is determined by comparing their models. The second method is the classical approach to detecting borrowings (plagiarism): directly comparing the text under study with an existing text bank, highlighting repeated text fragments, and determining the degree of originality. Experiments were conducted to establish the correlation between the results of the two methods. The experimental base consisted of 509 text sections from theses of students majoring in "Software Engineering". Findings. The experiments established a high correlation between the results of the two methods. Correlation coefficients in the range 0.75 to 1.0, with an average value of 0.88, were obtained when borrowings are counted for text fragments at least five words long. Originality. For the first time, the authors have identified the possibilities and proposed methods for indirect plagiarism detection without using a large text bank. The essence of the model is to formalize the representation of the author's sentence syntax by a set of substitution rules with probabilistic weights. Practical value. The results expand the possibilities for detecting borrowings and increase the effectiveness of the corresponding methods. Recommendations on the parameters of classical borrowing-detection methods have been obtained; in particular, counting text fragments at least five words long is recommended as a rational parameter when using borrowing detection systems. The possibilities of text authorship detection methods, previously tested on fiction texts, are extended to technical texts.
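The classical approach mentioned in this abstract, comparing a text against a text bank and counting repeated fragments of at least five words, might look roughly like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, fragments are taken as overlapping five-word windows, and originality is defined here as the share of windows not found in the bank, which may differ from the measure used in the paper.

```python
def word_ngrams(text, n=5):
    """Set of overlapping n-word fragments of a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def originality(text, text_bank, n=5):
    """Share of the text's n-word fragments that do not occur in the text bank."""
    fragments = word_ngrams(text, n)
    if not fragments:
        return 1.0  # text shorter than n words: nothing to compare
    bank_fragments = set()
    for document in text_bank:
        bank_fragments |= word_ngrams(document, n)
    borrowed = fragments & bank_fragments
    return 1.0 - len(borrowed) / len(fragments)

# Hypothetical one-document text bank and a text that reuses its opening words.
text_bank = ["the quick brown fox jumps over the lazy dog"]
text = "the quick brown fox jumps over a sleeping cat"
score = originality(text, text_bank)
print(round(score, 2))
```

Raising `n` makes the check stricter about what counts as a borrowing; the abstract recommends fragments of at least five words.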
Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task
(IEEE, 2021) Demidovich, Inna; Shynkarenko, Viktor I.; Kuropiatnyk, Olena; Kirichenko, Oleksandr
ENG: The previously developed method establishes the authorship of natural language texts based on frequency analysis, supplemented by text complexity indicators and recurrent analysis. The authorship attribution problem is reduced to a problem of classical pattern recognition theory. To account for the differing information content of individual indicators, each is assigned a weight. The weights are determined with a genetic algorithm so as to maximize the number of correctly attributed texts from the training sample. The method is used to study the effectiveness of representing the author's style through different types of word processing: two types of word stems and 4-grams. The stems are obtained with an adapted Porter stemmer and with an original method for creating a dictionary of Ukrainian word stems, respectively. With the calculated indicator weights taken into account, the reliability of establishing text authorship in the control sample reached 85-91%.
