Online supplement for
Neumann, Stella & Evert, Stefan (2021). A register variation perspective on varieties of English. In: E. Seoane & D. Biber (eds.), Corpus-based approaches to register variation, pp. 143–178. Amsterdam: Benjamins.
This chapter reports an exploration of dimensions of register variation across varieties of English. We analyse 2,844 texts from the Hong Kong, Jamaica and New Zealand components of the International Corpus of English, using its text categorization scheme as a frame of reference. We apply Geometric Multivariate Analysis (Evert & Neumann 2017), an interactive procedure for exploring latent structure in language variation, based on the frequencies of 41 lexico-grammatical features informed by systemic functional register theory. Visual inspection of the distribution of texts across the multidimensional space reveals continuities between groups of texts as well as dimensions of variation that can be related to theoretical register constructs. We also observe differences between the three ICE components (and their text categories) in register space.
Corpus data & pre-processing
In this section, all feature extraction and post-processing scripts will be provided to ensure full reproducibility of our analyses.
It is not possible to capture the full complexity of our data set and analysis with a few selected plots in a paper. Therefore, we have implemented two interactive Web apps using the R/Shiny framework. Readers can view the preset configurations referenced in the paper, but can also change all parameters interactively, e.g. to compare the three varieties of English.
- Scatterplot Viewer – visualize geometric shape of data set in LDA dimensions
- Weights & Feature Contributions – visualize contribution of features to LDA dimension scores
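As a rough illustration of what the second app visualizes, the sketch below assumes that a text's score on a dimension is a weighted sum of its standardized feature values (a linear projection), so that each feature's contribution is its weight multiplied by its value. It is written in Python rather than R, it is not the authors' code, and the function names, weights and feature values are all hypothetical.

```python
# Minimal sketch, NOT the authors' implementation.
# Assumption: a dimension score is a linear projection of standardized
# feature values onto a weight vector. All names and numbers are invented.

def feature_contributions(weights, features):
    """Per-feature contribution to a dimension score: weight * feature value."""
    return [w * x for w, x in zip(weights, features)]

def dimension_score(weights, features):
    """Dimension score of one text: the sum of all feature contributions."""
    return sum(feature_contributions(weights, features))

# Hypothetical weights of three features on one dimension
weights = [0.5, -0.25, 0.75]
# Hypothetical standardized feature values (z-scores) for one text
z_scores = [2.0, 4.0, -1.0]

contribs = feature_contributions(weights, z_scores)
score = dimension_score(weights, z_scores)

print(contribs)  # [1.0, -1.0, -0.75]
print(score)     # -0.75
```

Under this assumption the contributions add up exactly to the dimension score, which is why the contributions can be read as a decomposition of a text's position along a dimension.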
This section provides interactive 3D scatterplots (on supported Web browsers), which give a better impression of the multidimensional nature of register variation than 2D plots.
Note that each 3D view has a size of 2.7 MiB and may take a while to load and display.
- LDA dimensions 1, 2 and 3 | colour = text category
- LDA dimensions 1, 2 and 4 | colour = text category
- Unsupervised PCA | colour = text category – not discussed in paper
- Unsupervised PCA | colour = variety – not discussed in paper
- Structure of feature contributions to LDA dimension 1 | colour = text category – will be the topic of future work
- Structure of feature contributions to LDA dimension 4 | colour = text category – will be the topic of future work