Rhetorical Structure Theory
Rhetorical Structure Theory (RST) was originally formulated by William Mann and Sandra Thompson of the University of Southern California's Information Sciences Institute (ISI) in 1988.[1][2] The theory was developed as part of studies of computer-based text generation. Natural language researchers later began using RST in text summarization and other applications. RST addresses text organization by means of relations that hold between parts of a text. It explains coherence by postulating a hierarchical, connected structure of texts.
In 2000, Daniel Marcu, also of ISI, demonstrated that practical discourse parsing and text summarization could also be achieved using RST.[3] Marcu was named a fellow of the Association for Computational Linguistics in 2014 for his "significant contributions to discourse parsing, summarization, and machine translation and to kickstarting the statistical machine translation industry."[4]
Rhetorical relations
Rhetorical relations (also called coherence relations or discourse relations) are paratactic (coordinate) or hypotactic (subordinate) relations that hold across two or more text spans.[5] It is widely accepted that coherence in text arises through relations of this kind. RST uses these rhetorical relations to give an analyst a systematic way to analyse a text. An analysis is usually built by reading the text and constructing a tree from the relations that hold between its parts. The following example is the title and summary that appear at the top of an article in Scientific American (Ramachandran and Anstis, 1986). The original text, broken into numbered units, is:[6]
1. [Title:] The Perception of Apparent Motion
2. [Abstract:] When the motion of an intermittently seen object is ambiguous
3. the visual system resolves confusion
4. by applying some tricks that reflect a built-in knowledge of properties of the physical world
In the figure, the numbers 1, 2, 3 and 4 mark the corresponding units listed above. The third and fourth units are linked by a 'Means' relation. The third unit is the essential part of this relation, so it is called the nucleus of the relation, while the fourth unit, which states the method by which the confusion is resolved, is called the satellite. Similarly, the second unit forms a 'Condition' relation with the span made up of the third and fourth units, acting as the satellite of that span. Every unit is also a span, and a span may consist of more than one unit.
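The analysis just described can be written down as a small nested data structure. The Python sketch below is illustrative only: the `Unit` and `Relation` classes and the `render` helper are hypothetical names, not part of any existing RST toolkit. It follows the RST definition of Means, in which the satellite presents the method or instrument, so unit 4 is treated as the satellite and unit 3 as the nucleus. Unit 1 is left out because the relation attaching the title is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Unit:
    """An elementary text unit, identified by its number in the list above."""
    number: int
    text: str


@dataclass
class Relation:
    """Hypothetical mononuclear (hypotactic) relation: one nucleus, one satellite.

    Either side may be a single unit or a larger span built by another relation.
    """
    name: str
    nucleus: Node
    satellite: Node


Node = Union[Unit, Relation]


def render(node: Node, role: str = "root", depth: int = 0) -> list[str]:
    """Return an indented outline of the tree, nucleus (N) before satellite (S)."""
    pad = "  " * depth
    if isinstance(node, Unit):
        return [f"{pad}{role}: ({node.number}) {node.text}"]
    lines = [f"{pad}{role}: {node.name}"]
    lines += render(node.nucleus, "N", depth + 1)
    lines += render(node.satellite, "S", depth + 1)
    return lines


u2 = Unit(2, "When the motion of an intermittently seen object is ambiguous")
u3 = Unit(3, "the visual system resolves confusion")
u4 = Unit(4, "by applying some tricks that reflect a built-in knowledge "
             "of properties of the physical world")

# Means: unit 3 is the nucleus, unit 4 (the method) is the satellite.
means = Relation("Means", nucleus=u3, satellite=u4)
# Condition: unit 2 is the satellite of the span formed by units 3-4.
condition = Relation("Condition", nucleus=means, satellite=u2)

print("\n".join(render(condition)))
```

The outline lists each nucleus before its satellite regardless of their order in the text; a fuller representation would also record text order.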
Nuclearity in Discourse
RST distinguishes two types of units. Nuclei are considered the most important parts of a text, whereas satellites contribute to the nuclei and are secondary. A nucleus contains basic information, and a satellite contains additional information about its nucleus. A satellite is often incomprehensible without its nucleus, whereas a text from which the satellites have been deleted can still be understood to a certain extent.
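The observation that a text stripped of its satellites can still be understood to some extent can be illustrated by a traversal that keeps only the material reachable through nuclei. This is a minimal Python sketch under stated assumptions: `Leaf`, `Span` and `nuclear_core` are hypothetical names, a span may list several nuclei to model multinuclear (paratactic) relations, and the example reuses the apparent-motion analysis with unit 4 taken as the satellite of the Means relation and the title omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Leaf:
    """An elementary text unit."""
    text: str


@dataclass
class Span:
    """Hypothetical relation node over child spans.

    `nuclei` holds one child for a mononuclear (hypotactic) relation and
    several, in text order, for a multinuclear (paratactic) one.
    """
    relation: str
    nuclei: list[Leaf | Span]
    satellites: list[Leaf | Span] = field(default_factory=list)


def nuclear_core(node: Leaf | Span) -> list[str]:
    """Return the unit texts that remain when every satellite is deleted."""
    if isinstance(node, Leaf):
        return [node.text]
    kept: list[str] = []
    for child in node.nuclei:  # satellites are never visited
        kept.extend(nuclear_core(child))
    return kept


analysis = Span(
    "Condition",
    nuclei=[Span(
        "Means",
        nuclei=[Leaf("the visual system resolves confusion")],
        satellites=[Leaf("by applying some tricks that reflect a built-in "
                         "knowledge of properties of the physical world")],
    )],
    satellites=[Leaf("When the motion of an intermittently seen object is ambiguous")],
)

print(nuclear_core(analysis))  # ['the visual system resolves confusion']
```

Applied to the example, deleting the Condition and Means satellites leaves only the central claim of the abstract.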
Hierarchy in the Analysis
RST relations are applied recursively in a text until all units in that text are constituents of an RST relation. As a result, RST structures are typically represented as trees, with one top-level relation that encompasses other relations at lower levels.
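The recursive structure can be checked mechanically. The following Python sketch is hypothetical (the `Node` type and the `covered_units`, `check_tree` and `depth` functions are invented for illustration): it walks the tree recursively and confirms that each expected unit is a constituent exactly once. It also assumes that each span must cover a contiguous run of units (an adjacency constraint). Nucleus and satellite roles are ignored; only the hierarchy matters here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical tree node: a leaf sets `unit`; a relation sets
    `relation` and lists its `children` in text order."""
    unit: int | None = None
    relation: str | None = None
    children: tuple[Node, ...] = ()


def covered_units(node: Node) -> list[int]:
    """Collect unit numbers left to right by recursing down to the leaves."""
    if node.unit is not None:
        return [node.unit]
    units: list[int] = []
    for child in node.children:
        units.extend(covered_units(child))
    return units


def check_tree(root: Node, expected: range) -> None:
    """Raise ValueError unless every unit in `expected` is a constituent
    exactly once and every span covers a contiguous run of units."""
    if sorted(covered_units(root)) != list(expected):
        raise ValueError("a unit is missing or appears more than once")

    def visit(node: Node) -> None:
        units = covered_units(node)
        if not units:
            raise ValueError("a relation must have at least one child")
        if units != list(range(units[0], units[0] + len(units))):
            raise ValueError(f"span {units} is not contiguous in text order")
        for child in node.children:
            visit(child)

    visit(root)


def depth(node: Node) -> int:
    """Number of relation levels from this node down to its deepest leaf."""
    if node.unit is not None:
        return 0
    return 1 + max(depth(child) for child in node.children)


means = Node(relation="Means", children=(Node(unit=3), Node(unit=4)))
condition = Node(relation="Condition", children=(Node(unit=2), means))

check_tree(condition, range(2, 5))  # passes silently for units 2-4
print(depth(condition))             # 2
```

For the example analysis, the Condition relation is the single top-level relation, and the tree is two relations deep.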
Why RST?
- From a linguistic point of view, RST offers a view of text organization that differs from that of most linguistic theories.
- RST points to a close connection between rhetorical relations and coherence in text.
- From a computational point of view, RST provides a characterization of text relations that has been implemented in several systems and used for applications such as text generation[7] and summarization.[8]
References
1. http://www.cis.upenn.edu/~nenkova/Courses/cis700-2/rst.pdf
2. http://www.aclweb.org/anthology/J05-2001
3. http://www.isi.edu/about/history/timeline/
4. https://www.aclweb.org/aclwiki/index.php?title=ACL_Fellows
5. http://www.sfu.ca/~mtaboada/docs/Taboada_Implicit_Explicit.pdf
6. http://www.sfu.ca/~mtaboada/docs/Taboada_Mann_RST_Part1.pdf
7. http://ccl.pku.edu.cn/doubtfire/NLP/Text_Generation/RST/RST%20and%20Text%20Generation.htm
8. http://www.icmc.usp.br/pessoas/taspardo/ISDA2008-UzedaEtAl.pdf