Online Supplement for Neumann & Evert (2021)

Online supplement for

Neumann, Stella & Evert, Stefan (2021). A register variation perspective on varieties of English. In: E. Seoane & D. Biber (eds.), Corpus-based approaches to register variation, pp. 143–178. Amsterdam: Benjamins.

manuscript (PDF) – citation (.bib)

This chapter reports an exploration of dimensions of register variation across varieties of English. We analyse 2,844 texts from the Hong Kong, Jamaica and New Zealand components of the International Corpus of English, using its text categorization scheme as a frame of reference. We apply Geometric Multivariate Analysis (Evert & Neumann 2017), an interactive procedure for exploring latent structure in language variation, based on the frequencies of 41 lexico-grammatical features informed by systemic functional register theory. Visual inspection of the distribution of texts across the multidimensional space reveals continuities between groups of texts as well as dimensions of variation that can be related to theoretical register constructs. We also observe differences between the three ICE components (and their text categories) in register space.

Data sets & replication scripts

This section provides data sets and R code for replicating our GMA analysis. CQP queries and our scripting framework for feature extraction are also provided, but you will need to obtain suitable CWB-indexed versions of the required ICE components in order to re-run the extraction.

Feature extraction package: feature_extraction.zip
Summary of the data set & feature scaling: PDF – HTML
Pre-processed data set: ice_preprocessed.rda (4.3 MiB)
GMA analysis and Shiny viewers: analysis_scripts.zip
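
Before the multivariate analysis, the raw feature counts have to be normalized and scaled. The procedure actually used for the paper is documented in the summary of the data set & feature scaling. The Python sketch below is only a rough, non-authoritative illustration of the general idea. It converts raw counts into relative frequencies and standardizes each feature to z-scores. The helper name `scale_features`, the per-feature normalization units and the optional signed logarithmic compression of extreme values are all assumptions, and the scripts in analysis_scripts.zip may well do this differently.

```python
import numpy as np

def scale_features(counts, unit_counts, signed_log=False):
    """Standardize feature frequencies (hypothetical helper, not from the supplement).

    counts:      (n_texts, n_features) raw hit counts for each feature
    unit_counts: (n_texts, n_features) or (n_texts,) size of the unit each
                 feature is normalized by (e.g. tokens or sentences; an assumption)
    """
    counts = np.asarray(counts, dtype=float)
    units = np.asarray(unit_counts, dtype=float)
    if units.ndim == 1:
        units = units[:, None]  # use the same unit size for every feature
    rel = counts / units  # relative frequencies
    mean = rel.mean(axis=0)
    sd = rel.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant features end up at z = 0 instead of dividing by 0
    z = (rel - mean) / sd
    if signed_log:
        # assumption: compress extreme z-scores while keeping their sign
        z = np.sign(z) * np.log1p(np.abs(z))
    return z
```

Each column of the result has mean 0 and standard deviation 1 (before the optional log step), so features measured on very different frequency scales can be compared in the same space.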

Shiny Apps

It is impossible to capture the full complexity of our data set and analysis with the few selected plots that fit into a paper. We have therefore implemented two interactive Web apps using the R/Shiny framework. Readers can view the preset configurations referenced in the paper, but can also change all parameters interactively, e.g. to compare the three varieties of English.

Scatterplot Viewer – visualize geometric shape of data set in LDA dimensions
Weights & Feature Contributions – visualize contribution of features to LDA dimension scores
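
Both apps rest on a linear model. A linear discriminant analysis (LDA) finds weighted combinations of the scaled features that best separate predefined groups of texts, such as the ICE text categories. A text's score on an LDA dimension is the sum of its feature values multiplied by the feature weights, so the score can be split into per-feature contributions. The NumPy sketch below illustrates this with a textbook Fisher LDA. The function names, the small ridge term and the unit-length normalization of the weight vectors are assumptions for illustration. The replication scripts may compute the dimensions differently.

```python
import numpy as np

def fit_lda(Z, labels, n_dims=2, ridge=1e-8):
    """Fisher LDA weights for scaled features Z (n_texts x n_features).

    Returns an (n_features x n_dims) matrix whose columns are discriminant
    directions, normalized to unit length (a convention; scale is arbitrary).
    """
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    p = Z.shape[1]
    grand_mean = Z.mean(axis=0)
    Sw = np.zeros((p, p))  # within-group scatter
    Sb = np.zeros((p, p))  # between-group scatter
    for g in np.unique(labels):
        Zg = Z[labels == g]
        mg = Zg.mean(axis=0)
        D = Zg - mg
        Sw += D.T @ D
        d = (mg - grand_mean)[:, None]
        Sb += len(Zg) * (d @ d.T)
    Sw += ridge * np.eye(p)  # small ridge keeps Sw positive definite for Cholesky
    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor Sw = L L^T.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:n_dims]  # largest eigenvalues first
    W = Linv.T @ U[:, order]
    return W / np.linalg.norm(W, axis=0)

def dimension_scores(Z, W):
    """Project texts onto the discriminant dimensions (centred at the grand mean)."""
    Z = np.asarray(Z, dtype=float)
    return (Z - Z.mean(axis=0)) @ W

def feature_contributions(Z, w):
    """Split one dimension score into per-feature terms (z_ij - mean_j) * w_j."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)
    return Zc * np.asarray(w, dtype=float)  # broadcast the weight vector over rows
```

Summing a row of `feature_contributions(Z, W[:, 0])` gives exactly that text's score on the first dimension. This additivity is what makes a weights & contributions view interpretable: large positive or negative terms show which features push a text towards one end of the dimension.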

3D Visualization

This section provides interactive 3D scatterplots (in supported Web browsers), which convey the multidimensional nature of register variation better than 2D plots can.

Note: each 3D view is 2.7 MiB in size and may take a while to load and display.

LDA dimensions 1, 2 and 3 | colour = text category
LDA dimensions 1, 2 and 4 | colour = text category
Unsupervised PCA | colour = text category – not discussed in paper
Unsupervised PCA | colour = variety – not discussed in paper
Structure of feature contributions to LDA dimension 1 | colour = text category – will be the topic of future work
Structure of feature contributions to LDA dimension 4 | colour = text category – will be the topic of future work
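
The two PCA views are unsupervised. Unlike LDA, principal component analysis ignores the text categories and finds the directions of largest variance in the scaled feature space. A minimal NumPy sketch of such a projection via singular value decomposition is shown below. The helper `pca_scores` is hypothetical, and the 3D views were not necessarily produced this way.

```python
import numpy as np

def pca_scores(Z, n_dims=3):
    """Unsupervised PCA of scaled features via SVD (hypothetical helper).

    Returns (scores, loadings, explained_variance_ratio) for the first n_dims
    principal components.
    """
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)  # centre each feature
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U[:, :n_dims] * s[:n_dims]  # equals Zc @ Vt[:n_dims].T
    loadings = Vt[:n_dims].T  # feature weights of each component
    explained = s**2 / np.sum(s**2)  # share of total variance per component
    return scores, loadings, explained[:n_dims]
```

With `n_dims=3`, the three columns of `scores` would correspond to the axes of a 3D scatterplot; colouring the points by text category or by variety is then purely a plotting decision.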
