Online supplement for
Neumann, Stella & Evert, Stefan (2021). A register variation perspective on varieties of English. In: E. Seoane & D. Biber (eds.), Corpus-based approaches to register variation, pp. 143–178. Amsterdam: Benjamins.
This chapter reports an exploration of dimensions of register variation across varieties of English. We analyse 2,844 texts from the Hong Kong, Jamaica and New Zealand components of the International Corpus of English, using its text categorization scheme as a frame of reference. We apply Geometric Multivariate Analysis (Evert & Neumann 2017), an interactive procedure for exploring latent structure in language variation, based on the frequencies of 41 lexico-grammatical features informed by systemic functional register theory. Visual inspection of the distribution of texts across the multidimensional space reveals continuities between groups of texts as well as dimensions of variation that can be related to theoretical register constructs. We also observe differences between the three ICE components (and their text categories) in register space.
Data sets & replication scripts
This section provides data sets and R code for replication of our GMA analysis. CQP queries and our scripting framework for feature extraction are also provided, but you will have to obtain suitable CWB-indexed versions of the required ICE components to re-run the extraction.
- Feature extraction package: feature_extraction.zip
- Summary of the data set & feature scaling: PDF – HTML
- Pre-processed data set: ice_preprocessed.rda (4.3 MiB)
- GMA analysis and Shiny viewers: analysis_scripts.zip
It is not possible to capture the full complexity of our data set and analysis with a few selected plots in a paper. Therefore, we have implemented two interactive web apps using the R/Shiny framework. Readers can view the preset configurations referenced in the paper, but can also change all parameters interactively, e.g. to compare the three varieties of English.
- Scatterplot Viewer – visualize geometric shape of data set in LDA dimensions
- Weights & Feature Contributions – visualize contribution of features to LDA dimension scores
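As a rough illustration of what a feature contribution to a dimension score can mean, the sketch below assumes that a text's score on a dimension is a weighted sum of its standardized feature values, so that each feature contributes its weight times its value. This is a minimal sketch under that assumption and not the code used in the Shiny apps; the feature names, weights and values are invented for the example and are not taken from the data set.

```python
# Minimal sketch: decompose a linear dimension score into per-feature contributions.
# Assumption: score = sum over features of (weight * standardized feature value).
# All names and numbers below are hypothetical, chosen only for illustration.

def dimension_score(weights, features):
    """Return (score, contributions) for one text on one dimension."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    return sum(contributions.values()), contributions

# Hypothetical weights of three features on one dimension
weights = {"feat_a": 0.5, "feat_b": -0.25, "feat_c": -0.5}

# Hypothetical standardized feature values (z-scores) for one text
text = {"feat_a": 2.0, "feat_b": -1.0, "feat_c": 1.0}

score, contributions = dimension_score(weights, text)

# Print features ordered by the absolute size of their contribution
for name, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {c:+.2f}")
print(f"score: {score:+.2f}")
```

Under this reading, a feature with a negative weight still pushes the score upwards when the text has a below-average value for that feature, which is why contributions are more informative than the weights alone.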
This section provides interactive 3D scatterplots (on supported Web browsers), which give a better impression of the multidimensional nature of register variation than 2D plots.
Note: each 3D view has a size of 2.7 MiB and may take a fairly long time to load and display.
- LDA dimensions 1, 2 and 3 | colour = text category
- LDA dimensions 1, 2 and 4 | colour = text category
- Unsupervised PCA | colour = text category – not discussed in paper
- Unsupervised PCA | colour = variety – not discussed in paper
- Structure of feature contributions to LDA dimension 1 | colour = text category – will be the topic of future work
- Structure of feature contributions to LDA dimension 4 | colour = text category – will be the topic of future work