WEDNESDAY, APRIL 22, 2020

Computational authorship studies are an increasingly popular topic for research among specialists in both computer science and the humanities

By Nguyễn Hoàng Phong

Updated: 17/06/2022, 03:15

Computational authorship studies are an increasingly popular topic for research among specialists in both computer science and the humanities

It can be considered a form of style-based document authentication (Echtheitskritik), which has valuable applications that extend well beyond the domain of literary analysis to, for instance, the domain of forensic sciences. According to Stamatatos’s 2009 survey of the field, ‘[t]he main idea behind statistically or computationally-supported authorship attribution is that by measuring some textual features we can distinguish between texts written by different authors.’ [n. 22: E. Stamatatos, ‘A survey’ (n. 14, above) 538.] This basic assumption implies that it should be possible to assess, for any new unseen document, whether or not it was written by other authors for whom we have texts available. Nowadays computational authorship studies are often considered a subfield of stylometry in the digital humanities, the broader computational study of the writing style of texts. [n. 23: D. Holmes, ‘The evolution of stylometry in humanities scholarship’, LLC 13 (1998) 111–17.]

While stylometry has a rich history, dating back to at least the nineteenth century, it is clear that it received its most important impetus only in the past two or three decades, stimulated by the rise of (personal) computing and the increased availability of large bodies of text in electronic form. Apart from the influential, yet more conventional, statistical analyses carried out by pioneers such as Mosteller and Wallace or John Burrows well before the 1990s, an influential approach in authorship studies has been to treat the attribution of anonymous texts as a ‘text categorization’ problem. [n. 24: Mosteller and Wallace, Inference and disputed authorship (n. 4, above) and J. Burrows, Computation into criticism: a study of Jane Austen’s novels (Oxford 1987).] Heavily influenced by parallel research in computer science, the idea was to optimize a statistical classifier on example texts by a number of available candidate authors, much like a spam filter nowadays is still trained on manually annotated emails to learn how to distinguish between ‘junk’ email and normal messages. [n. 25: F. Sebastiani, ‘Machine learning in automated text categorisation’, ACM Computing Surveys 34 (2002) 1–47.] After training such a classifier on this example data, the classifier could then be used to categorize or classify an anonymous text as belonging to one of the training authors’ oeuvres.

It resembles a police lineup, in which the correct author of an anonymous text has to be singled out from a series of available candidate authors for whom reference or ‘training’ material is available

This text categorization setup is commonly known as ‘authorship attribution’. [n. 26: The following paragraph heavily draws on M. Koppel and Y. Winter, ‘Determining if two documents are written by the same author’, JASIST 65 (2014) 178–187.] For a number of years, practitioners of stylometry have come to acknowledge the limitations of authorship attribution, because it necessarily assumes that the correct target author is indeed included in the set of candidates. In many real-world cases, this problematic assumption cannot possibly be made, because the set of relevant candidates is difficult or impossible to establish beforehand. Because of this, the setup of authorship verification has recently been introduced as a new framework: here, the task is to verify whether or not an anonymous document was written by one or several of a series of candidate authors. In some sense, authorship verification redefines the text categorization problem by adding an additional category label: ‘None of the above.’
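The contrast between the two setups can be illustrated with a minimal Python sketch. It is not the method used in the study discussed here: the function-word list, the cosine similarity measure, the 0.9 threshold, and the names `profile`, `attribute`, and `verify` are all assumptions chosen purely for illustration. Attribution always picks one of the candidates, whereas verification may answer ‘None of the above’.

```python
from collections import Counter
import math
import re

# Illustrative assumption: a tiny list of English function words as style features.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "a", "that", "it"]


def profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown, candidates):
    """Closed-set attribution: always return the most similar candidate author."""
    target = profile(unknown)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))


def verify(unknown, candidates, threshold=0.9):
    """Open-set verification: return the best candidate, or None ('None of the above')
    when even the best match falls below the (hypothetical) threshold."""
    best = attribute(unknown, candidates)
    score = cosine(profile(unknown), profile(candidates[best]))
    return best if score >= threshold else None


# Toy reference material for two hypothetical candidate authors.
candidates = {
    "A": "the cat and the dog and the bird",
    "B": "it is said that it was to be in a house",
}
print(attribute("the fox and the hen", candidates))
print(verify("of mice of men", candidates))
```

In this toy setting, `attribute` is forced to name a candidate even for a text that resembles neither reference sample, which is precisely the limitation described above; `verify` can decline instead. Real systems would use far richer features and a calibrated decision rule.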

In the present context, it should be emphasized that the problem posed by the HA is a ‘vanilla’ example of a problem in authorship verification: while the collection indeed contains a number of (auto-) attributions, the veracity of all of these has been questioned in previous scholarship

Verification is hence an increasingly common experimental setup in authorship studies, and is the topic of a dedicated track in the yearly PAN competition, an annual competition on finding computational solutions to issues in present-day textual forensics, mostly related to the detection of plagiarism, authorship, and social software misuse (such as grooming or Wikipedia vandalism). [n. 27: The competition’s website is pan.webis.de. The most recent survey of an authorship verification track is: E. Stamatatos et al., ‘Overview of the author identification task at PAN 2015’ in Working Notes Papers of the CLEF 2015 Evaluation Labs, ed. L. Cappellato et al. (2015).] Generally speaking, authorship verification is a more generic problem than authorship attribution (i.e. every attribution problem could, in principle, be cast as a verification problem), but it has also proven to be more challenging. In our experiments, we have therefore attempted to radically minimize any assumptions on our part as to the authorial provenance of the texts in the HA. For each piece of text analysed below, we propose to independently assess the probability that it was written by one of the (alleged) individual authors identified in the collection.
