A Literature Survey on Writing Style Change Detection Based on Machine Learning: State-Of-The-Art - Review
© 2022 by IJCTT Journal
Volume-70 Issue-5
Year of Publication : 2022
Authors : Vivian Anyango Oloo, Calvins Otieno, Lilian Awuor Wanzare
DOI : 10.14445/22312803/IJCTT-V70I5P103
How to Cite?
Vivian Anyango Oloo, Calvins Otieno, Lilian Awuor Wanzare, "A Literature Survey on Writing Style Change Detection Based on Machine Learning: State-Of-The-Art - Review," International Journal of Computer Trends and Technology, vol. 70, no. 5, pp. 15-32, 2022. Crossref, https://doi.org/10.14445/22312803/IJCTT-V70I5P103
Abstract
The goal of the style change detection task is to detect stylistic changes in a document and use them to determine the number of authors. This study reviewed nineteen (19) state-of-the-art papers and articles on writing style change detection. The papers were identified and selected based on study area, year of publication, and the technique proposed for writing style change detection. The study focused on the features used, the techniques applied, and the results obtained in these studies. Three categories were defined, and each paper was placed in one of them according to the problem it addressed. The study found that lexical features were the most commonly used feature category, although combinations of feature categories yielded better results. In addition, simple distance measures were shown to outperform other state-of-the-art techniques in authorship clustering and style change detection. Ensembles of algorithms are recommended for style change detection tasks when the texts are short and the dataset is large.
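To make the idea of pairing lexical features with a simple distance measure concrete, the following is a minimal sketch and not the method of any reviewed paper. It assumes character trigram frequencies as the lexical feature and cosine distance as the measure; the function names and the 0.5 threshold are hypothetical choices made only for illustration.

```python
from collections import Counter
import math
import re


def char_ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed lexical feature)."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def cosine_distance(p, q):
    """Cosine distance between two sparse frequency profiles (0 = same direction)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def style_change_flags(paragraphs, threshold=0.5):
    """Flag each boundary between consecutive paragraphs whose distance exceeds
    the threshold. The threshold is an arbitrary illustrative value."""
    profiles = [char_ngram_profile(p) for p in paragraphs]
    return [cosine_distance(a, b) > threshold
            for a, b in zip(profiles, profiles[1:])]
```

Calling `style_change_flags` on a list of paragraphs returns one boolean per paragraph boundary. In practice the threshold would need tuning on labeled data, and a real system would likely combine several feature categories, as the abstract notes.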
Keywords
Authorship, Clustering algorithms, Multiple authorship, Stylometry, Style change detection.