Mapping Cognitive Place Associations within the United Kingdom through Online Discussion on Reddit

open-access
figshare
github
peer-reviewed
journal
This published research article builds a custom transformer-based geoparsing pipeline using the Hugging Face transformers Python library to extract all place names from over 8 million Reddit comments. DVC pipelines are used to ensure full reproducibility of the model workflow.
Authors and affiliations

Cillian Berragan, University of Liverpool
Alex Singleton, University of Liverpool
Alessia Calafiore, University of Edinburgh
Jeremy Morley, Ordnance Survey

Published: January 8, 2024

Doi
Abstract

This paper explores cognitive place associations, conceptualised as a place-based mental model that derives subconscious links between geographic locations. Utilising a large corpus of online discussion data from the social media website Reddit, we experiment with the extraction of such geographic knowledge from unstructured text. First, we construct a system to identify place names found in Reddit comments, disambiguating each to a set of coordinates where possible. Following this, we build a collective picture of cognitive place associations in the United Kingdom, linking locations that co-occur in user comments and evaluating the effect of distance on the strength of these associations. Exploring these geographies nationally, we find that associations are typically weaker over greater distances. This distance decay is also highly regional: rural areas typically show greater levels of distance decay, particularly in Wales and Scotland. When comparing major cities across the UK, we observe distinct distance decay patterns, influenced primarily by proximity to other cities.

Keywords

cognitive place connectivity, distance decay, geoparsing, mental maps, natural language processing, United Kingdom

Introduction

The importance of relational thinking for understanding geographical phenomena has been widely acknowledged in human and computational geography (Glückler and Panitz 2021; Bergmann and O’Sullivan 2018; Lukermann 1961). Spatial networks have been explored from a variety of perspectives: to uncover the dynamics underpinning the spatial behaviours of individuals (González, Hidalgo, and Barabási 2008; Noulas et al. 2011), or to challenge conceptualisations of regions as bounded by administrative definitions (Calafiore et al. 2021; Alessandretti, Aslak, and Lehmann 2020).

Within computational geography, most research has explored direct connections between places by investigating the physical movements of individuals. This work has used population movement data from traditional sources such as the Census or surveys (Rae 2009; Titheridge et al. 2009), or from alternative forms of data such as transport records (Yang, Li, and Li 2019; Allard and Moura 2016; Gong et al. 2021; Farber and Li 2013), mobile phone data (Lin, Wu, and Li 2019; SafeGraph 2022; Rowe et al. 2022), and geotagged social media (Steiger et al. 2015; Arthur and Williams 2019; Ostermann et al. 2015; Z. Li et al. 2021). However, focussing only on connections built through population movement conceals associations that persist in the individual or collective subconscious, regardless of any physical movement.

Literature discussing the role of human cognition in constructing mental images of cities (Lynch 1964), and how they can be represented through mental maps (Gould and White 1986), reveals that the way humans conceive spatial structures and associations between places is substantially entrenched in individuals’ experiences and geographic knowledge, which only partially derive from movements. Places represent a complex network of socio-spatial relationships that emerge from linked individual experiences (Pierce, Martin, and Murphy 2011), enabling the definition of collectively recognised place associations. While movements in geographic space are limited by time and distance (Miller 2018; Patterson and Farber 2015), representational spaces expressed through mental maps are not necessarily bounded by spatio-temporal constraints (Merrifield 1993). Modern developments in transport and communications warp perceived distances between places (Massey 2008), and in turn their perceived level of connectivity (Fabrikant et al. 2002).

Alternatively, online sources of data offer novel opportunities to explore place associations, built directly from the passive contributions of individual users. Recent work has demonstrated how digital social friendships (Bailey et al. 2018), or embedded links in Wikipedia articles (Salvini and Fabrikant 2016), may be used to provide insight into social place connections. Other works have instead considered that text itself can be used to quantify relationships between geographic terms, described as ‘geo-semantic relatedness’ (Ballatore, Bertolotto, and Wilson 2014). Work building on this concept has applied it to city and region names identified in news articles, social media, and general web pages (Hu, Ye, and Shaw 2017; Ye, Gong, and Li 2021; Liu et al. 2014; Meijers and Peris 2019).

Distance is a key influence on observed levels of connectivity in the spatial interaction literature (Haynes and Fotheringham 1985). Tobler’s first law of geography states that near things are more related than distant things, so locations that are further apart are typically less well connected (Tobler 1970). This law has given rise to the term ‘distance decay’ (Taylor 1983), which has various mathematical representations. Given a legacy of empirical evidence, distance decay in its various forms can sensibly be assumed for place connections when both temporal and spatial constraints are considered in our physical environment. However, when links between locations are viewed as cognitive associations built through mental maps, such constraints are no longer as restrictive (Fabrikant et al. 2002). Quantifying the effect of distance on cognitive place associations may therefore reveal unexpected patterns that reflect the cognitive biases used to construct mental maps.
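Two functional forms often used to express distance decay are the power and exponential forms. In both, \(f(d)\) is the relative strength of interaction at distance \(d\), and \(\beta\) controls how quickly that strength declines:

\[ f(d) = d^{-\beta} \quad \text{(power)}, \qquad f(d) = e^{-\beta d} \quad \text{(exponential)}. \]

The gravity model used later in this paper takes the power form.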

The objective of this paper is to quantify cognitive place associations across the UK[1] to build mental maps. At the same time, we evaluate the effect of distance on the strength of these associations by measuring the level of distance decay through a gravity model. To generate association measures from a cognitive rather than a geographic perspective, we infer associations through co-occurring locations extracted from a large corpus of informal, unstructured and discursive text from the social media website Reddit. When locations are mentioned in informal comments, they are drawn from a cognitive process associated with mental maps of these locations. Such mentions subconsciously illustrate associations between places that are recalled from memory and grounded in experience.

Section 2 outlines existing literature relating to cognitive place associations and details methods that can be used for the automated extraction and grounding of place names[2] from a large corpus of unstructured text. Section 3 describes our data sources, our methodology for geoparsing place names, and the computation of a gravity model to examine the effects of distance decay on the strength of associations. In Section 4 we present the results of our gravity model and demonstrate variations in distance decay with respect to six locations. In Section 5 we summarise our findings and outline the scope for future work.

Cognitive Place Associations

The term ‘mental map’ typically refers to the cognitive visualisation of a geographic environment. Mental maps represent collective, experiential geographic knowledge relating both to places and to the relationships between them (Kaplan 1976). They exhibit a variety of biases. For example, they are often more detailed for locations that we are familiar with, while other locations may be less detailed or even absent entirely (Gould and White 1986). The scale of, and distance between, features in mental maps can also be warped (Peake and Moore 2004; Marston, Jones III, and Woodward 2005):

- prominent roads may appear larger than in reality;
- skyline features in a city may be perceived as less prominent because they are irrelevant to an individual at street level (Gould and White 1986; Lynch 1964);
- good transport connections shorten the time it takes to reach connected locations, which in turn reduces the perceived distance between them (Massey 2008; Merrifield 1993).

Intermediate features along common routes also have varying levels of importance to individuals. Unimportant features may appear less prominent than in reality or be absent altogether (Carr and Schissler 1969; Kaplan 1976).

The characterisation of mental maps has been well studied from a qualitative perspective, often asking individual participants to produce hand-drawn sketches (Lynch 1964; Pocock 1976; Daniel R. Montello et al. 2003; Haney and Knowles 1978; Canter 1977; Murray and Spencer 1979; Lee 1973; Gould and White 1986; Goodey 1974). Such approaches typically consider more localised areas that are familiar to selected participants (Pocock 1976; Haney and Knowles 1978; Canter 1977), focussing on mapping landmarks and regions within cities. Others have considered the broad characterisation of larger regions such as entire countries (Gould and White 1968; Goodey 1974). At this scale, mental maps are less detailed and instead contribute generalised information about areas that the participant deems important. Inherently, these techniques capture subjective information from individuals, which may not necessarily conform to a generalised collective knowledge of these geographic locations.

Quantifying Associations

Within mobility research, connections between locations are broadly quantified by mapping the flow of populations, goods, services or other entities between origin and destination locations (Shaw and Hesse 2010). This relates to the concept of spatial interaction: a mathematical or statistical representation of physical movements over space, typically observed in the context of commuting, migration, or information and commodity flows (Haynes and Fotheringham 1985; Shaw and Hesse 2010; Dennett and Wilson 2013; Rowe, Lovelace, and Dennett 2022; Singleton, Wilson, and O’Brien 2012; Rowe et al. 2022). In the spatial interaction literature, the strength of connectivity between locations is generally quantified through gravity models, which incorporate the effect of relative distance on the strength of interaction between origin and destination locations (Erlander 1980; Haynes and Fotheringham 1985). Typically, an increase in distance leads to a decrease in spatial interaction, known as ‘distance decay’ (Taylor and Openshaw 1975). Conceptualising geographic connections in this manner builds a picture that is constrained by spatio-temporal movement. Such constraints, however, are not necessarily required for a more generalised understanding of associations as a geographic concept (Merriman 2012).

Unlike physical connections, which are described by movements across Euclidean space, cognitive associations are decoupled from the restriction of physical movements. Neither the distance between locations nor the time taken to travel between them directly influences the strength of cognitive associations. Instead, these associations capture the persistent perceived links that reflect the experiential geographic knowledge individuals use to generate ‘mental maps’ (Gould and White 1986). Such associations capture non-specific subconscious links between locations, influenced by personal experiences. These influences include cultural similarities (Greenberg Raanan and Shoval 2014), distortion through commuting methods and navigation technologies (Peake and Moore 2004), online communication (Zook 2006), and other sources of cognitive bias. For example, transport access and telecommunication warp a general sense of perceived geographic distance between certain locations (Massey 2008). The cognitive understanding of ‘nearness’ also does not necessarily correlate with Euclidean distance (Fabrikant et al. 2002; Worboys 2001; Daniel R. Montello 1993). These implicit associations between places are generated through a complex network of socio-spatial relationships, built through linked individual experiences, and allow shared experiences of places to be captured (Pierce, Martin, and Murphy 2011).

Traditional approaches to exploring differences between cognitive and real-world associations would have relied on large-scale studies and active individual participation to derive these associations. Examples include volunteered geographic information (VGI) (Goodchild 2007) and participatory mapping (Chambers 2006; Pánek 2016). Additionally, while mental maps may be generated through hand-drawn sketches, such methods do not scale well to a population-level understanding.

By contrast, alternative forms of online data present an opportunity to infer associations between places by capturing persistent links that are not temporally or spatially bounded. Facebook, for example, has been used to generate social connections between locations, geographically grounded on the home locations of pairs of friends (Bailey et al. 2018). Geographic networks may also be generated from crowdsourced databases like Wikipedia (Salvini and Fabrikant 2016), which reveal connections between cities through hyperlinks embedded in general-knowledge articles. Unlike these structured data sources, unstructured online text provides embedded geographic information in the form of place names. These names may be extracted through computational techniques such as natural language processing (Purves et al. 2018; Berragan et al. 2022).

Relationships between locational mentions in text are typically examined through co-occurrences, where locations that are frequently mentioned in a shared context are assumed to have a real-world relationship (Hu, Ye, and Shaw 2017; Ye, Gong, and Li 2021; Liu et al. 2014; Meijers and Peris 2019; Ballatore, Bertolotto, and Wilson 2014). Current research, however, has concentrated primarily on the relationships between city names in news articles (Hu, Ye, and Shaw 2017) or general web pages (Liu et al. 2014). In these sources, locational mentions do not necessarily capture a collective, generalised view derived from the mental perceptions of geography that exist within populations.

Alternative sources include online social media, which contribute a large volume of natural language text. This text is submitted by many unique users, covers a range of informal topics, and typically involves shared user interactions. Place names discussed on social media more frequently include fine-grained locations (Han et al. 2018; C. Li and Sun 2014). Because interactions are often more informal, the information captured likely exhibits users’ cognitive biases related to their own mental maps (Jang and Kim 2019). Past work has built place name co-occurrences from single news articles or documents. Co-occurrences may instead be built from a user-facing perspective, using all comments associated with each user in a large corpus of social media data. We argue that this approach more appropriately captures the cognitive information each user draws on to associate two locations. These associations can then be generalised by combining them across all users in the corpus.
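The following minimal sketch contrasts the two approaches on hypothetical (user, comment, place) records. Document-level co-occurrence pairs places mentioned in the same comment. The user-level approach argued for here pairs all places mentioned by the same user.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (user_id, comment_id, place) for each place mention.
MENTIONS = [
    ("u1", "c1", "Leeds"),
    ("u1", "c2", "York"),
    ("u2", "c3", "Leeds"),
    ("u2", "c3", "Bradford"),
    ("u2", "c4", "York"),
]


def cooccurrence_counts(records, group_of):
    """Count, for each unordered place pair, the groups that mention both places.

    group_of maps (user_id, comment_id) to a grouping key: the comment ID gives
    document-level co-occurrence, the user ID gives user-level co-occurrence.
    """
    places_by_group = defaultdict(set)
    for user, comment, place in records:
        places_by_group[group_of(user, comment)].add(place)
    counts = defaultdict(int)
    for places in places_by_group.values():
        # Sorting makes (a, b) and (b, a) the same key.
        for pair in combinations(sorted(places), 2):
            counts[pair] += 1
    return dict(counts)


comment_level = cooccurrence_counts(MENTIONS, lambda user, comment: comment)
user_level = cooccurrence_counts(MENTIONS, lambda user, comment: user)
```

Leeds and York never share a comment, so the document-level counts miss this pair entirely. At the user level, however, both users mention Leeds and York, giving the pair a count of 2.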

There are, however, concerns with using passively contributed, user-generated data for place-focussed geographic research. The main concerns are the representativeness of populations and bias in contributions (Gardner et al. 2020; Graham, Straumann, and Hogan 2015). For example:

- although Twitter has over 300 million users, they typically post from high-density urban areas rather than from where they live (Ballatore and De Sabbata 2020);
- demographic groups have variable propensity to contribute (Hecht and Stephens 2014; Ballatore and De Sabbata 2018; Gardner et al. 2020);
- contributions to gazetteers and digital maps are higher in more densely populated, urban locations (Graham, Straumann, and Hogan 2015; Laurier, Brown, and McGregor 2016; Smith et al. 2020).

User-generated data also tend to have a few users contributing the greatest proportion of activity (Haklay 2016). Despite a large volume of unique users, the data may therefore be biased towards the contributions of certain individuals. In other research methods like participatory mapping, biases often reflect the social and cultural background of the communities contributing their understanding of geographies (Corbett and Rambaldi 2009; Pánek 2016). In our work, such bias equates to the experiential knowledge used to construct the mental maps that inform our cognitive place associations.

Extracting Locations from Text

Past works that considered links between locational mentions in text identified locations in one of two ways. Some queried articles for city names (Hu, Ye, and Shaw 2017). Others simply used a word list of city names to parse articles for their occurrences (Meijers and Peris 2019). Such approaches are prone to error. Meijers and Peris (2019), for example, found that 2.8% of their target place names could refer to multiple locations, while 1% of names were words that appear in the English vocabulary. In total, around 15% of their place names displayed some level of ambiguity, and a quantitative assessment showed that this ambiguity negatively affected the quality of the associations identified. To avoid such issues, we construct a structured machine learning process for extracting place names from our text, instead of a simple rule-based approach. This process implements geoparsing: extracting place names from unstructured text and matching them to the correct geographic coordinates (Purves et al. 2018). The task can be divided into two stages:

1. identifying place names in text;
2. associating each place name with a unique identifier in a knowledge base (typically a gazetteer), a process called geocoding or toponym disambiguation.

Modern geoparsing processes use Named Entity Recognition (NER) to identify place names in natural language text (Purves et al. 2018; Karimzadeh et al. 2019; Halterman 2017). Unlike simpler knowledge- or rule-based methods (Leidner and Lieberman 2011), NER uses supervised machine learning to identify place names. Machine learning can identify place names that do not already appear in formal gazetteers, which is particularly useful in research considering colloquial names (Hollenstein 2008). Word context may also improve accuracy, because words may appear in a gazetteer without being used in a geographic context. For example, ‘Reading’ may refer to the town in the UK or to the ordinary English word (Purves et al. 2018). Context is particularly important for informal text, where capitalisation may not always indicate proper nouns, misspellings may be frequent, and names that rarely appear in gazetteers are common.
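The following sketch makes two failure modes of word-list matching concrete. It applies naive, context-free matching against a hypothetical two-entry gazetteer; the coordinates are approximate and for illustration only.

```python
import re

# Hypothetical miniature gazetteer: name -> list of (lat, lon) entries.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Reading": [(51.454, -0.973)],
    "Newport": [(51.588, -2.998), (50.701, -1.292)],  # South Wales; Isle of Wight
}


def word_list_match(text):
    """Naive rule-based matching: case-sensitive whole-word lookup, no context.

    Returns (name, character offset, number of candidate coordinates) per hit,
    ordered by position in the text.
    """
    hits = []
    for name, entries in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            hits.append((name, match.start(), len(entries)))
    return sorted(hits, key=lambda hit: hit[1])


hits = word_list_match("Reading the reviews, I decided to move to Newport.")
```

The matcher flags ‘Reading’ even though it is used as a verb here. It also returns two candidate coordinates for ‘Newport’ with no basis for choosing between them. A context-aware NER model addresses the first problem, and a separate disambiguation step addresses the second.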

Recent work, however, has noted that geoparsing systems using existing NER models do not necessarily perform well for place name extraction (Berragan et al. 2022). Such pre-built models do not always consider geographically specific issues like the use of metonyms (Gritta, Pilehvar, and Collier 2020). They are also typically trained on news articles, which limits their performance on other forms of text, like social media (Won, Murrieta-Flores, and Martins 2018; Berragan et al. 2022). For toponym disambiguation, these geoparsing systems typically use the global GeoNames[3] database as their gazetteer. GeoNames has limited data for fine-grained locations in the United Kingdom (Stock et al. 2013; Moncla et al. 2014), and its inclusion of place names outside the UK increases potential noise. As such, existing geoparsers were considered unsuitable for our task: geoparsing UK place names, including fine-grained locations, within Reddit comments.

Methodology

We first developed a task-specific geoparsing process to identify all place names contained within the Reddit comment corpus and resolve them to geographic coordinates within the United Kingdom. Cognitive associations were then generated between the identified locations using co-occurrence: when locations are mentioned by the same author, they form an association. These associations are therefore built from the mental maps of individual Reddit users, free of the typical space-time constraints of traditional spatial relationships derived from movements. We then investigate the strength of these associations by aggregating the associations derived from each user into geographic representations. We also determine the role of distance in shaping their strength through a gravity model. Finally, we select four urban and four rural regions to map the strength of associations geographically, demonstrating variations in distance decay patterns.

Geoparsing Reddit Comments

Reddit[4] is a public discussion and news aggregation social network, and is among the 20 most visited websites in the United Kingdom. As of 2020, Reddit had around 430 million monthly active users, comparable to the number of Twitter users (Murphy 2019; Statista 2022). Reddit is divided into independent subreddits, each covering a specific topic of discussion. Within a subreddit, users may submit posts, each with a dedicated nested conversation thread in which users add and respond to comments. Subreddits cover a wide range of topics and, of interest to geography, also act as forums for the discussion of local places. The United Kingdom subreddit[5] acts as a general hub for related topics, notably including a list of smaller, more geographically specific subreddits. This list has a ‘Places’ section, a collection of local British subreddits at several scales:

- country level (/r/England);
- regional (/r/thenorth, /r/Teeside);
- cities (/r/Manchester);
- small towns (/r/Alnwick).

In total, 213 subreddits relate to ‘places’ within the United Kingdom[6]. For each subreddit, every historic comment was retrieved using the Pushshift[7] Reddit archive (Baumgartner et al. 2020). In total, 8,070,827 comments were extracted, submitted by 490,534 unique users between 2011-01-01 and 2022-04-17. This represents a very large corpus of text comprising 262 million words.

We then implemented our own geoparsing methodology to extract and geolocate every place name mention within each comment. We first identified all place name mentions using a custom-built NER model[8]. This model was built on the large language model BERT (Devlin et al. 2019). Because BERT is pre-trained on a large corpus of general text, it typically outperforms simpler models on tasks like NER. Our NER model was then trained to identify all place names within this corpus. Coordinates were attributed to all identified place names using OS Open Names[9] and ‘natural’ locations from the Gazetteer of British Place Names[10]. Because place names typically appear multiple times in gazetteers, a disambiguation method was required. We therefore disambiguated place names by finding their minimum distance to a collection of contextual locations. These contextual locations were all gazetteer entries matching the place names that appear in sentences alongside the target place name within the same subreddit. This approach assumes that each unique place name in a single subreddit likely refers to the same location, and that locations mentioned in surrounding text are likely geographically close together (Kamalloo and Rafiei 2018). When associating locations with coordinates, we excluded any location larger than a city, such as countries or regions.
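The disambiguation rule can be sketched as follows. The text does not specify how the minimum distance is aggregated over contextual locations. This sketch therefore assumes that each candidate is scored by its great-circle (haversine) distance to the nearest contextual gazetteer entry. The function names are hypothetical, and the coordinates are approximate and illustrative.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disambiguate(candidates, context_entries):
    """Choose the gazetteer candidate nearest to the contextual locations.

    candidates: (lat, lon) entries sharing the target place name.
    context_entries: (lat, lon) entries for every gazetteer match of the place
        names that co-occur with the target in sentences of the same subreddit.
    Assumption: a candidate's score is its distance to the nearest contextual
    entry; an ambiguous name with no context is left unresolved (None).
    """
    if len(candidates) == 1:
        return candidates[0]
    if not context_entries:
        return None
    return min(candidates,
               key=lambda c: min(haversine_km(c, e) for e in context_entries))


NEWPORTS = [(51.588, -2.998), (50.701, -1.292)]  # South Wales; Isle of Wight
```

With a Cardiff-like context point, the Welsh Newport is chosen. With a point near Ryde, the Isle of Wight entry is chosen instead.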

Our final dataset therefore consisted of a collection of place names, each with its geographic coordinates, its corpus location, and an anonymised ID for the user whose comment it was taken from. In total, 213,764 unique users mentioned at least one place name in our corpus, 39,050 mentioned more than 10 place names, and 3,158 mentioned more than 100. The top 1% of these users (2,137 users) contributed 32% of all place names. As is common in user-generated content, our data are skewed, with proportionally few users mentioning a large proportion of our total place names. However, the large volume of unique users contributing low volumes of comments means that we likely still achieve a broad representation. This is particularly true compared with past work that generated mental maps for a limited number of individuals (Goodchild and Li 2012). As our comments span a period of over 10 years, we also examined the temporality of user contributions. The mean time between a user’s first and final comment is 318.1 days, with a maximum of 4,112 days. This distribution is also highly skewed: the majority of users (55%) made their first and final comments at most one day apart.
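The two summary statistics above can be computed in a few lines. This sketch is not the authors' code, and its inputs are hypothetical: a mapping from user ID to mention count, and a list of comment dates for one user.

```python
from datetime import date


def top_share(mentions_per_user, fraction=0.01):
    """Share of all place name mentions made by the most active `fraction` of users.

    The number of top users is floored, consistent with the text
    (1% of 213,764 users gives the top 2,137 users).
    """
    counts = sorted(mentions_per_user.values(), reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


def active_span_days(comment_dates):
    """Days between a user's first and final comment (0 for a single comment)."""
    return (max(comment_dates) - min(comment_dates)).days
```

Applying `active_span_days` to every user and counting the spans of at most one day gives the kind of proportion reported above. The 55% figure itself comes from the real data, not from this sketch.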

Place Associations through Co-occurrence

‘Cognitive association strength’ is defined in our paper as the normalised proportion of co-occurrences between two locations in our corpus, where co-occurrences represent the total collection of locations mentioned by a single user. The following section first outlines the construction of distance decay measures using a gravity model that incorporates cognitive association strength alongside distance, then details how we generate a scaled measure of this cognitive association strength. The first measure enables us to quantify how distance impacts our association strength, determining whether there is an observable distance decay effect when considering locational co-occurrences in user comments. The second enables the direct strength of association between locations to be examined, without the incorporation of distance in the calculation.

To measure the effect of distance decay, we employ the same gravity model used by both Liu et al. (2014) and Hu, Ye, and Shaw (2017), shown in Equation 1:

\[ \mathbf{S}_{i j} \propto \frac{\mathbf{S}_{i} \mathbf{S}_{j}}{d_{i j}^{\beta}}, \tag{1}\]

where \(\mathbf{S}_{ij}\) is the total number of users that mention both places \(i\) and \(j\), and \(\mathbf{S}_{i}\mathbf{S}_{j}\) is the total number of users that mention place \(i\), multiplied by the total number of users that mention place \(j\). \(d_{ij}\) is the distance between the two locations \(i\) and \(j\), and \(\beta\) is the friction factor. Larger values for \(\beta\) indicate a stronger distance decay effect. Estimating the value of \(\beta\) generates a quantifiable measure of the distance decay effect (Hu, Ye, and Shaw 2017).
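The counts in Equation 1 could be computed as in the following minimal sketch. It assumes each user's mentions have already been reduced to a set of distinct places; the input format and function name are hypothetical. Only pairs that at least one user mentions together appear in the pair counts. This matters because the regression form introduced next takes logarithms, which are undefined for zero counts.

```python
from collections import Counter
from itertools import combinations


def gravity_counts(places_by_user):
    """Compute S_i and S_ij for Equation 1 from each user's set of mentioned places.

    S_i counts the users mentioning place i; S_ij counts the users mentioning
    both i and j, keyed by the sorted pair so that (i, j) and (j, i) coincide.
    """
    s_i = Counter()
    s_ij = Counter()
    for places in places_by_user.values():
        s_i.update(places)  # each user counts once per distinct place
        s_ij.update(combinations(sorted(places), 2))
    return s_i, s_ij
```

For a pair such as Leeds and York, the product \(\mathbf{S}_{i}\mathbf{S}_{j}\) is then `s_i["Leeds"] * s_i["York"]`.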

We can decompose Equation 1 into the following multiple linear regression model (Taaffe 1996):

\[ \log(\mathbf{S}_{ij}) = b_{0} + b_{1}\log(\mathbf{S}_{i}\mathbf{S}_{j}) + b_{2}\log(d_{ij}), \tag{2}\]

where \(b_2 = -\beta b_1\), meaning that the \(\beta\) coefficient can be calculated as \(\beta = -b_2 / b_1\) (Hu, Ye, and Shaw 2017; C. Li and Sun 2014).[11]
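Taking logarithms of Equation 1 gives \(\log(\mathbf{S}_{ij}) = c + \log(\mathbf{S}_{i}\mathbf{S}_{j}) - \beta\log(d_{ij})\) for some constant \(c\). This is Equation 2 with \(b_1 = 1\) and \(b_2 = -\beta\). Estimating \(b_1\) freely and reading \(\beta\) as \(-b_2/b_1\) generalises this. The source does not state which software was used. The following minimal sketch performs the estimation with NumPy's least-squares solver. It is checked on synthetic data with a known friction factor, not on the Reddit data.

```python
import numpy as np


def estimate_beta(s_ij, s_i_s_j, d_ij):
    """Fit Equation 2 by ordinary least squares and return beta = -b2 / b1.

    s_ij, s_i_s_j, d_ij: 1-D arrays over location pairs; all values must be
    positive so that their logarithms are defined.
    """
    y = np.log(s_ij)
    X = np.column_stack([np.ones_like(y), np.log(s_i_s_j), np.log(d_ij)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b1, b2 = coef[1], coef[2]
    return float(-b2 / b1)


# Synthetic check: generate data with b1 = 0.8 and beta = 1.5, so b2 = -1.2.
rng = np.random.default_rng(0)
s_i_s_j = rng.uniform(1, 1e4, 500)
d_ij = rng.uniform(1, 500, 500)
log_s_ij = 0.1 + 0.8 * np.log(s_i_s_j) - 1.2 * np.log(d_ij) + rng.normal(0, 0.01, 500)
beta_hat = estimate_beta(np.exp(log_s_ij), s_i_s_j, d_ij)  # close to 1.5
```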

This approach yields a single global \(\beta\) to measure distance decay. A spatial regression model, by contrast, allows local values of \(\beta\) to be calculated, quantifying the distance decay effect for individual locations (Rey, Arribas-Bel, and Wolf 2023). We therefore also implement a spatial regression model that includes a spatial fixed effect for each H3 hexagon. This model yields a \(\beta\) coefficient for each location in our study, allowing spatial heterogeneity to be explored.
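The exact specification of the spatial model is not given here, so the following NumPy sketch rests on an assumption. It reads ‘a spatial fixed effect for each hexagon’ as a model in which the intercept and both slopes of Equation 2 vary by hexagon. A local \(\beta\) then follows from each hexagon's own \(b_1\) and \(b_2\). The function name and input layout are hypothetical.

```python
import numpy as np


def local_betas(groups, s_ij, s_i_s_j, d_ij):
    """Estimate a separate distance decay beta for each group (e.g. H3 hexagon).

    The design matrix gives every group its own intercept (a fixed effect) and its
    own slopes on log(S_i*S_j) and log(d_ij). This fully interacted model yields
    the same coefficients as fitting Equation 2 separately within each group.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    y = np.log(s_ij)
    x1, x2 = np.log(s_i_s_j), np.log(d_ij)
    columns = []
    for g in labels:
        in_group = (groups == g).astype(float)  # 1.0 for rows in group g
        columns += [in_group, in_group * x1, in_group * x2]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    b1, b2 = coef[1::3], coef[2::3]  # each group's slopes, in label order
    return dict(zip(labels.tolist(), (-b2 / b1).tolist()))
```

Each group needs enough location pairs with varying distances for its coefficients to be identified. With too few, the least-squares solver silently returns a minimum-norm solution rather than raising an error.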

Finally, we generate a normalised cognitive association measure to assess the strength of association between two locations. Unlike the previous gravity models, this measure does not incorporate distance alongside co-occurrences. This mirrors similar work that considered the strength of social connections between Facebook users (Bailey et al. 2018) and Twitter users (Z. Li et al. 2021), enabling the direct strength of associations to be generated:

\[ \frac{\mathbf{S}_{i j}}{\sqrt{\mathbf{S}_{i} \mathbf{S}_{j}}} \tag{3}\]

In this equation, dividing by \(\sqrt{\mathbf{S}_{i} \mathbf{S}_{j}}\) normalises our values, given locations with higher populations are expected to be mentioned by a larger number of users. Values therefore range from 0 indicating no association, to 1, showing a complete overlap in user mentions.
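Equation 3 is the cosine similarity between the two places' binary user-mention vectors, also known as the Ochiai coefficient. A short sketch follows; the function name is illustrative.

```python
import math


def association_strength(s_ij, s_i, s_j):
    """Normalised cognitive association from Equation 3: S_ij / sqrt(S_i * S_j).

    Since S_ij <= min(S_i, S_j) <= sqrt(S_i * S_j), the value lies in [0, 1].
    """
    if s_i == 0 or s_j == 0:
        return 0.0  # a place nobody mentions has no associations
    return s_ij / math.sqrt(s_i * s_j)
```

Suppose 100 users mention one place, 900 mention another, and 30 mention both. The strength is then \(30 / \sqrt{90{,}000} = 0.1\). A value of 1 requires exactly the same users to mention both places.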

To present the results of our analysis, we aggregate our user location mentions into H3 hexagons[12]. H3 is a hierarchical spatial indexing system that partitions the Earth's surface into a hexagonal grid, available at multiple resolutions. We select an H3 resolution of 5, which equates to an average hexagon area of 252 km² and an edge length of 9.8 km. All associations between locations contained within a shared H3 hexagon are then combined, forming association measures between hexagons rather than between unique point locations. Each hexagon is named after its most frequently occurring location.
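The following sketch shows this aggregation step, assuming mentions are available as hypothetical (user, place, latitude, longitude) tuples. The grid function is passed in, so the logic does not depend on a particular H3 binding. With the h3-py library (version 4), a resolution-5 grid could be supplied as `lambda lat, lon: h3.latlng_to_cell(lat, lon, 5)`, although that call is not exercised here.

```python
from collections import Counter, defaultdict


def aggregate_to_cells(mentions, to_cell):
    """Re-key user place mentions from point locations to grid cells.

    mentions: iterable of (user_id, place_name, lat, lon) tuples.
    to_cell: function mapping (lat, lon) to a cell ID.
    Returns each user's set of cells, and a name for each cell chosen as its
    most frequently mentioned place.
    """
    cells_by_user = defaultdict(set)
    names_in_cell = defaultdict(Counter)
    for user, place, lat, lon in mentions:
        cell = to_cell(lat, lon)
        cells_by_user[user].add(cell)
        names_in_cell[cell][place] += 1
    cell_names = {cell: names.most_common(1)[0][0]
                  for cell, names in names_in_cell.items()}
    return dict(cells_by_user), cell_names
```

A park and its surrounding city will usually fall in the same cell. A user who mentions both therefore contributes a single cell rather than two nearby points.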

Using fixed-size hexagons to aggregate data in our analysis is beneficial for several reasons:

1. Hexagons yield statistically more robust results, especially when analysing distance decay between locations, because each hexagon has a constant number of neighbours, all at an equal distance (Birch, Oom, and Beecham 2007).
2. Hexagon grids help to minimise misrepresentation in spatial visualisation (Langton and Solymosi 2021) and allow us to capture inter-region heterogeneity.
3. Aggregation is essential given how locations are represented in gazetteers. Despite many locations having large footprints, each is represented as a single coordinate pair (Goodchild and Hill 2008). Without aggregation, a user mentioning a park and the city containing it would produce two distinct points separated by a geographic distance.

Alternatively, OpenStreetMap could provide more accurate place footprints, but at the cost of a very large data volume when considering the entirety of the UK (Haklay and Weber 2008).

Additionally, we classify each H3 hexagon as either rural or urban using the England and Wales Rural Urban Classification[13] and the Scotland Rural Urban Classification[14]. For Scotland, classes 1 and 2 were considered urban.

Results

In the following section, we first examine the performance of our geoparsing methodology, identifying any potential noise and how this was mitigated. We then examine our cognitive place associations, exploring how distance impacts the strength of association by generating gravity models to calculate \(\beta\) coefficients both globally and locally. Finally, we examine the patterns in association strength across a selection of targeted geographic locations.

Extracting Names and Locations: Assessing Geoparsing Performance

In total, 26.8% of all comments within the Reddit corpus contained at least one place name. Overall, 5,001,261 place names were identified, of which 2,848,310 (57.0%) could be attributed to a set of coordinates[15]. These comprised 42,333 unique locations, of which 21,014 were mentioned only once. London was the most frequently mentioned location, with 283,521 mentions. The most ambiguous place name was ‘High Street’, with 47 unique coordinate locations. As expected, many of the most ambiguous place names were street names, including ‘Church Street’ (36 locations), ‘Bridge Street’ (34 locations), and ‘London Road’ (34 locations).

Figure 1: Locations of three place names in the UK gazetteer that are difficult to disambiguate correctly. The size of the green points indicates mention frequency. Black points are user locations, each determined as the mean location of that user's mentions. Values indicate the proportional contribution of each disambiguated location to its respective polygon (top four percentages shown).

In Figure 1 we consider three examples where place names may have been incorrectly geoparsed. Figure 1 (a) shows the geographic distribution of all 47 ‘High Street’ locations. The percentage values indicate the proportion of mentions within a given H3 polygon that are attributed to ‘High Street’, relative to mentions of all locations in that polygon. Aggregation appears to mitigate noise in most cases, since most ‘High Street’ locations contribute less than 1% of their polygon’s associations. Figure 1 (b) shows a similar case, where ‘City Centre’ mentions account for only 3.2% of the Manchester hexagon. Figure 1 (c) instead shows a location that our model cannot geoparse correctly, likely because most mentions of ‘California’ refer to the US state, which our UK gazetteer does not contain. Despite there being 13 unique locations in the UK called ‘California’, this issue is only prominent in one hexagon. This hexagon may generate noise in our analysis, given that ‘California’ contributes 85.1% of its mentions. However, because users who mention ‘California’ are spread across the country, it is unlikely to have a large impact on our associations.
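The polygon percentages in Figure 1 can be read as each place name’s share of all resolved mentions falling within a hexagon. A minimal sketch of that calculation, assuming mentions have already been assigned to cells; `name_shares` and the toy counts are hypothetical and do not reproduce the figure’s values.

```python
from collections import Counter, defaultdict


def name_shares(cell_mentions, top=4):
    """Percentage of each cell's resolved mentions contributed by each place name.

    `cell_mentions` holds one (cell_id, place_name) pair per resolved mention.
    Returns {cell_id: [(place_name, percent), ...]} for the `top` largest shares.
    """
    counts = defaultdict(Counter)
    for cell, name in cell_mentions:
        counts[cell][name] += 1
    shares = {}
    for cell, per_name in counts.items():
        total = sum(per_name.values())
        shares[cell] = [(name, round(100 * n / total, 1)) for name, n in per_name.most_common(top)]
    return shares


if __name__ == "__main__":
    toy = [("hexM", "Manchester")] * 90 + [("hexM", "City Centre")] * 10
    print(name_shares(toy))  # {'hexM': [('Manchester', 90.0), ('City Centre', 10.0)]}
```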

Notably, although ‘High Street’ and ‘City Centre’ are also generic, non-specific geographic terms, the model is still able to distinguish between these uses from context. For example, ‘city centre’ appears 23,961 times in our corpus but is tagged by our NER model only 3,008 times (12.6%), while ‘high street’ appears 7,773 times and is tagged only 768 times (9.9%). These results suggest that the model can often recognise that identical phrases may or may not refer to place names, depending on their semantic context.
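The comparison between raw phrase counts and NER tags can be sketched as follows. This assumes the entity strings predicted by the NER model are already available (for example, as output from a Hugging Face token-classification pipeline) and uses case-insensitive whole-phrase matching for the raw count. `tag_rate` and the example comments are hypothetical.

```python
import re


def tag_rate(texts, tagged_spans, phrase):
    """Compare raw occurrences of `phrase` with how often NER tagged it.

    `texts` are comment strings; `tagged_spans` are the entity strings the NER
    model labelled as place names. Matching ignores case and requires word
    boundaries. Returns (raw_count, tagged_count, tagged_share).
    """
    pattern = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
    raw = sum(len(pattern.findall(text)) for text in texts)
    tagged = sum(1 for span in tagged_spans if span.lower() == phrase.lower())
    return raw, tagged, (tagged / raw if raw else 0.0)


if __name__ == "__main__":
    comments = [
        "Meet me in the city centre.",
        "Manchester City Centre is busy today",
        "city centre rents are mad",
    ]
    # Hypothetical NER output: only one use was tagged as a place name.
    print(tag_rate(comments, ["City Centre", "Manchester"], "city centre"))
```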

Measuring Distance Decay of Cognitive Association Strength

In the following section we present the levels of distance decay observed when evaluating place association strength through the gravity model specified in Equation 1, quantifying distance decay with a \(\beta\) coefficient. Higher \(\beta\) values indicate a stronger distance decay effect, meaning that co-occurrences between locations become less frequent as the geographic distance between them increases. A \(\beta\) value of zero would indicate that distance has no effect on the frequency of co-occurrences between locations.
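Equation 1 is not reproduced in this section, so the sketch below assumes the common power-law gravity form \(T_{ij} = k\,(m_i m_j)^{\alpha} / d_{ij}^{\beta}\), linearised by taking logarithms and fitted with ordinary least squares. Here \(T_{ij}\) is a co-occurrence measure, \(m_i\) and \(m_j\) are location masses (for example, mention counts), and \(d_{ij}\) is distance. These definitions, the function name and the synthetic data are assumptions. \(R^2\) is computed as the squared Pearson correlation between observed and fitted log values.

```python
import numpy as np


def fit_gravity(t, m_i, m_j, d):
    """Fit log T = log k + alpha * log(m_i * m_j) - beta * log d by OLS.

    ASSUMED functional form (Equation 1 is not reproduced here). All inputs
    must be strictly positive: drop pairs with zero co-occurrence first.
    Returns (k, alpha, beta, r2), with r2 the squared Pearson correlation
    between observed and fitted log values.
    """
    t, m_i, m_j, d = (np.asarray(v, dtype=float) for v in (t, m_i, m_j, d))
    y = np.log(t)
    X = np.column_stack([np.ones_like(y), np.log(m_i * m_j), np.log(d)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = np.corrcoef(y, X @ coef)[0, 1] ** 2
    # The distance coefficient is negated so that larger beta means stronger decay.
    return float(np.exp(coef[0])), float(coef[1]), float(-coef[2]), float(r2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m_i, m_j = rng.uniform(10, 1_000, 200), rng.uniform(10, 1_000, 200)
    d = rng.uniform(5, 800, 200)  # km
    # Synthetic co-occurrences with beta = 1 and multiplicative noise.
    t = 0.5 * (m_i * m_j) ** 0.8 / d * rng.lognormal(0, 0.3, 200)
    print(fit_gravity(t, m_i, m_j, d))
```

Pairs that never co-occur must be dropped, or handled with a count model such as Poisson regression, because the logarithm of zero is undefined.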

Our gravity model gives a \(\beta\) coefficient of 1.00 (Pearson’s \(R^2\): 0.772), indicating that distance decay in co-occurrences within Reddit comments is stronger than that observed in studies of news articles (0.23) (Hu, Ye, and Shaw 2017) or general web pages (0.2) (Liu et al. 2014). The existence of a general distance decay effect for Reddit-derived places shows that locations further apart typically co-occur less often, a finding shared with past work that examines decay from the perspective of actual population movements (Gong et al. 2021; Yang, Li, and Li 2019), and the social relationships of regions examined through social media (Bailey et al. 2018; Z. Li et al. 2021). Our place associations generated from Reddit users therefore appear to reflect geography more strongly than city mentions in news articles or general web pages. However, while this gravity model indicates the global level of distance decay in our corpus, the level of distance decay likely varies by location. In the following analysis we therefore consider locations where the gravity model does not achieve a good approximation.
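To put these coefficients in perspective, assume that expected co-occurrence falls as a power law in distance, proportional to \(d^{-\beta}\) with all other terms held fixed (an assumption, since the model equation is not reproduced in this section). Doubling the distance between two places then multiplies the expected co-occurrence by \(2^{-\beta}\):

```latex
\frac{T(2d)}{T(d)} = 2^{-\beta}: \quad
2^{-1.00} = 0.50 \;(\text{Reddit}), \quad
2^{-0.23} \approx 0.85 \;(\text{news articles}), \quad
2^{-0.2} \approx 0.87 \;(\text{web pages})
```

Under this reading, doubling the distance halves the expected co-occurrence in the Reddit corpus, but reduces it by only around 13 to 15% in the news and web page corpora.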

Figure 2: H3 polygons showing (a) Top 20 associations by residual values in green (>0), and (b) bottom 20 associations by residual values (<0) in red. (c) Residuals taken from Equation 2 against co-occurrence strength (10,000 samples). (*) Indicates a location that is incorrectly geoparsed.

Figure 2 (a) and (b) plot the 20 most positive and 20 most negative residuals from our gravity model, showing associations that are stronger or weaker than expected given the distance between two regions. Many of the top residuals involve associations with London and other major UK cities, alongside some associations between urban areas in Scotland. The most positive residual is the association between London and Edinburgh (3.89), followed by Glasgow and London (3.84), and Glasgow and Edinburgh (3.81). As expected, these residuals reflect strong associations between regions over larger distances (mean 293 km), highlighting associations where distance decay is weaker than the model predicts. Notably, one of these is an incorrect association between London and a natural feature named ‘London Bridge’, which arises because our gazetteer lacks urban landmarks such as the bridge in London itself. The bottom residuals are more sporadic, typically showing unusually weak associations between lesser-known locations over shorter distances (mean 156 km). For example, the figure highlights the association between Swansea Bay and Southampton (-2.55). Figure 2 (c) plots the model residuals against cognitive association strength, showing that for locations with a greater proportion of co-occurrences the model tends to under-predict association strength, and therefore over-states distance decay; the inverse holds for locations with a lower proportion of co-occurrences.
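Equation 2 is not reproduced here, so the sketch below assumes residuals are the difference between observed and predicted values on the log scale, with positive values marking pairs that are more strongly associated than distance alone predicts. It ranks pairs and reports the mean distance of each extreme group, mirroring the summary above. The function name and toy values are hypothetical.

```python
import numpy as np


def extreme_residuals(pairs, observed, predicted, distance_km, k=20):
    """Return the k most positive and k most negative log residuals.

    ASSUMPTION: residual = log(observed) - log(predicted), so positive values
    mark pairs more strongly associated than distance alone predicts.
    Returns (top, bottom, mean_top_km, mean_bottom_km).
    """
    resid = np.log(np.asarray(observed, dtype=float)) - np.log(np.asarray(predicted, dtype=float))
    dist = np.asarray(distance_km, dtype=float)
    order = np.argsort(resid)  # ascending residuals
    top, bottom = order[::-1][:k], order[:k]
    return (
        [(pairs[i], float(resid[i])) for i in top],
        [(pairs[i], float(resid[i])) for i in bottom],
        float(dist[top].mean()),
        float(dist[bottom].mean()),
    )


if __name__ == "__main__":
    pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    print(extreme_residuals(pairs, [10, 1, 5, 5], [5, 2, 5, 1], [300, 50, 100, 200], k=2))
```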

Regional Difference in Distance Decay

Figure 2 demonstrates clear regional variation in the observed level of distance decay, which does not conform to the general decay effect estimated by our gravity model. To examine regional effects on distance decay, we implement a spatial regression model (a mixed linear model with spatial fixed effects), allowing \(\beta\) values to vary with the location of each polygon.
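Allowing \(\beta\) to vary by location can be sketched by interacting log distance with group indicators. Each group (for example, each origin polygon, or the urban and rural classes) then receives its own intercept and distance slope, while the mass term is shared. This is a simplified ordinary least squares stand-in, not the mixed linear model used in this study. The functional form, the assignment of each pair to a single group and the function name are assumptions.

```python
import numpy as np


def fit_local_betas(t, mass, d, groups):
    """Fit log T = a_g + alpha * log(mass) - beta_g * log d by OLS.

    `groups` gives the group (e.g. origin polygon) of each pair; `mass` is the
    combined mass term of each pair. ASSUMED form, not the study's estimator.
    Returns ({group: beta_g}, alpha).
    """
    t, mass, d = (np.asarray(v, dtype=float) for v in (t, mass, d))
    labels = sorted(set(groups))
    pos = {g: i for i, g in enumerate(labels)}
    idx = np.array([pos[g] for g in groups])
    n, k = len(t), len(labels)
    rows = np.arange(n)
    X = np.zeros((n, 2 * k + 1))
    X[rows, idx] = 1.0            # group-specific intercepts
    X[rows, k + idx] = np.log(d)  # group-specific log-distance slopes
    X[:, -1] = np.log(mass)       # shared mass elasticity
    coef, *_ = np.linalg.lstsq(X, np.log(t), rcond=None)
    # Negate slopes so that larger beta means stronger distance decay.
    betas = {g: float(-coef[k + i]) for g, i in pos.items()}
    return betas, float(coef[-1])
```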

Figure 3: H3 polygons showing (a) Distribution of geoparsed locations. (b) Urban rural classification index for England, Wales, and Scotland, reclassified into binary ‘Urban’ or ‘Rural’. (c) Calculated \(\beta\) coefficients for the spatial regression model; higher \(\beta\) values indicate a greater distance decay strength.

Incorporating this spatial information gives a better fit than the global gravity model, with an improved Pearson’s \(R^2\) of 0.946, suggesting that distance decay is not uniform across the regions in our study. Figure 3 (a) shows the distribution of locations mentioned in our study, which broadly conforms to the binary urban–rural classification shown in Figure 3 (b). Figure 3 (c) maps the spatial \(\beta\) coefficients obtained from our spatial regression model, with high distance decay across Scotland, Wales, and areas in the South West and North East of England. Users who mention locations in these regions typically do not mention other locations that are geographically distant, highlighting areas that are either more isolated from the rest of the UK or have stronger associations with nearby locations.

In the North East, this perceived isolation from the rest of the UK mirrors lexical research showing that Tweets from the North East differ from those of other regions (Arthur and Williams 2019). This region in particular has suffered economically following the historic decline of local industries (Middleton and Freestone 2008); a lack of job opportunities has resulted in low inward migration and among the lowest population growth in the country (Office for National Statistics 2022). Alternatively, this observation may also be attributable to a general sense of identity associated with these regions. Both the South West and North East of England are known to exhibit a strong sense of localised identity (Deacon 2007; Middleton and Freestone 2008), comparable to the national identities that generate strong associations within Scotland and Wales that are not shared with England (Haesly 2005).

To quantify the difference in distance decay between urban and rural areas explicitly, we calculate separate \(\beta\) coefficients for areas classified as urban and as rural. Urban areas have a \(\beta\) coefficient of 0.68, while rural areas have a \(\beta\) coefficient of 1.14, indicating that urban areas have a lower overall level of distance decay than rural regions. This is consistent with traditional mobility studies, where more populated areas tend to exhibit lower distance decay (Thomas 1981), largely owing to better access to external locations through public transport, the road network, and job opportunities (Moseley 2023; Findlay, Short, and Stockdale 2000), and to the greater cultural significance generally associated with urban locations (lynch1964?; Borer 2006).
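Assuming that expected association falls as a power law in distance, proportional to \(d^{-\beta}\) with other terms held fixed (an assumption, since the model equation is not reproduced in this section), the two coefficients imply very different responses to a tenfold increase in distance:

```latex
\frac{T(10d)}{T(d)} = 10^{-\beta}: \quad
10^{-0.68} \approx 0.21 \;(\text{urban}), \quad
10^{-1.14} \approx 0.072 \;(\text{rural})
```

A tenfold increase in distance therefore leaves roughly a fifth of the expected association for urban areas, but only around 7% for rural areas.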

We have shown that distance decay not only varies between rural and urban locations, but also varies within each of these classes. In the following section, we therefore map the strength of cognitive associations directly for a selection of both rural and urban regions in our study, to understand the variation in distance decay patterns.

Mapping Cognitive Place Associations

Figure 4: Cognitive association strength of each H3 polygon with respect to eight selected locations (the polygon containing each named location is highlighted in green), with \(\beta\) values calculated from each data subset. Distance decay plots below the maps show association strength against distance for each selected location. Lines show a rolling mean over 250 samples in black and over smaller windows in grey.
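The decay curves described in this caption can be approximated by sorting polygons by distance from the selected location and averaging association strength over a sliding window of consecutive samples. A minimal sketch, assuming each window is plotted at the mean distance of its samples; `decay_curve` is hypothetical, and the grey curves would correspond to calls with a smaller `window`.

```python
import numpy as np


def decay_curve(distance_km, strength, window=250):
    """Rolling mean of association strength over polygons sorted by distance.

    Returns (mean_distance, mean_strength) for every run of `window`
    consecutive polygons; x positions are window-mean distances (an assumption).
    """
    d = np.asarray(distance_km, dtype=float)
    s = np.asarray(strength, dtype=float)
    if len(d) < window:
        raise ValueError("need at least `window` observations")
    order = np.argsort(d)
    kernel = np.ones(window) / window
    # 'valid' mode keeps only windows fully populated with observations.
    return (np.convolve(d[order], kernel, mode="valid"),
            np.convolve(s[order], kernel, mode="valid"))
```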

In Figure 4 we map the cognitive association strength of each H3 polygon in our study with respect to four major cities and four rural locations in the UK, also indicating the associated \(\beta\) coefficients. Mapped cognitive association values are given by Equation 3, and indicate the proportion of users whose comments mention locations both within the target polygon (e.g. London) and within each other polygon. Distance decay curves for each selected location are shown below the maps. London has the lowest \(\beta\) coefficient of all cities, indicating that association with London weakens more slowly with distance than association with the other cities. This is reflected in London’s shallow overall decay curve, which rises at distances corresponding to the main urban conurbations in England and Scotland, as observable on the map in Figure 4 (a). Such trends are perhaps unsurprising given London’s prominence as the capital city. Manchester (Figure 4 (b)) shows a different decay pattern: associations drop sharply at first, but the decline slows and reverses at the distances of cities such as London and Edinburgh. Newcastle (d) has a greater overall \(\beta\) coefficient than Manchester, with association strength dropping more quickly and persisting less across England, only increasing for urban locations in Scotland and, slightly, for London. While both are major cities in England, Manchester is better connected physically to the rest of the country through existing rail routes (Miyoshi and Givoni 2013) and is a larger economic centre than Newcastle. These factors likely contribute to the perceived strength of associations with these cities, which our analysis captures.
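Equation 3 is not reproduced in this section, so the sketch below follows the verbal description: for each polygon, it counts the users whose comments mention both the target polygon and that polygon. Dividing by the total number of users is an assumption, as are the input layout (a mapping from each user to the set of cells they mention) and the function name.

```python
from collections import Counter


def association_strength(user_cells, target):
    """Share of users whose comments mention both `target` and each other cell.

    `user_cells` maps each user to the set of H3 cells their comments mention.
    ASSUMPTION: the denominator is all users; Equation 3 may normalise differently.
    """
    both = Counter()
    for cells in user_cells.values():
        if target in cells:
            both.update(cell for cell in cells if cell != target)
    n_users = len(user_cells)
    return {cell: n / n_users for cell, n in both.items()}


if __name__ == "__main__":
    users = {
        "u1": {"london", "leeds"},
        "u2": {"london", "leeds", "york"},
        "u3": {"york"},
        "u4": {"london"},
    }
    print(association_strength(users, "london"))
```

Pairing these values with each polygon’s distance from the target polygon yields the points summarised by the decay curves.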

Edinburgh (c) differs from the other cities, with a steeper initial decay curve than London, largely driven by stronger associations with nearby Scottish locations. As is common for many cities in the UK, Edinburgh also shares a strong association with London, regardless of distance. This stronger association with locations within Scotland gives Edinburgh the highest \(\beta\) coefficient, an effect that captures the strong sense of shared identity across Scotland (Haesly 2005).

Figure 4 (e–g) gives examples of variable distance decay curves for rural locations across the UK. Both ‘(e) Milford Haven’ in Wales and ‘(f) Cowal’ in Scotland show broad associations across their respective countries, which drop off past the border into England. This similarly captures the sense of national identity associated with both Wales and Scotland, and conforms to results from analyses of both physical and online networks, where strong ‘boundary effects’ produce high connectivity within regions that weakens across borders (Z. Li et al. 2021; Bailey et al. 2018; Arthur and Williams 2019; Yin et al. 2017). In England, where national identity is less prominent, the town ‘(g) St Austell’ has a steep distance decay curve, with low association strength with any location more than 50 km away, a noticeably different curve from those of the rural locations analysed in Scotland and Wales.

We also examine the incorrectly disambiguated ‘(h) California’ polygon, and confirm that its distance decay curve shows no geographically cohesive pattern, with no noticeable gradient. The positive \(\beta\) coefficient appears to relate to an unexplained increase in association with Scotland, although values remain low.

Conclusions, Implications and Future Work

In our work we present an alternative method for determining subconscious associations between locations, generating quantifiable measures of association strength solely from user-generated social media text. Unlike physical or online social interactions, our cognitive associations are intended as a persistent measure of strength between locations across the UK, built from the naive, place-based geographic knowledge of individuals. Because of our geoparsing process, no explicitly geographic information such as geotags is required from our data source, allowing the inclusion of fine-grained and informally defined locations; associations may be examined between any two locations identified in our corpus.

By using a distance-based gravity model, we demonstrate that our associations broadly conform to established real-world geographic constraints, exhibiting a distance decay effect, but with notable deviations. Unlike past work that only considered co-occurring city names, we expand our analysis to incorporate place name mentions at any scale, which enables the exploration of both rural and suburban decay. We are therefore able to demonstrate that distance decay is greater in rural areas than in urban areas, and that cities across the UK have varying patterns of distance decay. Associations between major cities such as Manchester and London are demonstrably stronger than those with less prominent intermediate locations, an effect that challenges the notion of Euclidean distance in a mental understanding of geography (Carr and Schissler 1969; Kaplan 1976), and supports the suggestion that ‘nearness’ is not a uniform concept (Massey 2008; Fabrikant et al. 2002; Worboys 2001; Daniel R. Montello 1993). Other patterns are also apparent, particularly for locations in Scotland, where distance appears to have little impact on association strength until locations across the border in England are reached. This observation is consistent with the concept of a ‘boundary effect’, which has been captured in both physical and online networks (Z. Li et al. 2021; Bailey et al. 2018; Arthur and Williams 2019; Yin et al. 2017). This example is particularly interesting because it replicates past research that generated mental maps from individuals through participatory mapping, which captured a strong desirability of areas within Scotland among its residents that did not persist across the border into England (Gould and White 1968).

The distinct patterns in distance decay that we observe reveal differences in associations between cities that capture real-world perceptions of these locations. For example, regions of low association across the UK may reflect a lack of desire to connect more broadly with the rest of the country (Roos Breines, Raghuram, and Gunter 2019). This is particularly noticeable in Scotland, Wales, and the North East of England, regions whose populations often exhibit a strong sense of independence from the UK, driven by regional or national pride (Haesly 2005; Middleton and Freestone 2008; Nayak 2016). Strong associations with London and Manchester also indicate their national importance, while cities like Newcastle are perceived as less important, resulting in weaker associations. Imbalance in rural locations may also be driven by an increasing dependence on digital maps (Farman 2020); rural areas have far fewer named landmarks than cities, with their named streets and buildings (Laurier, Brown, and McGregor 2016), which limits our ability to conceptualise rural environments (Smith et al. 2020).

Finally, Reddit differs from other social media sources: activity centres on discussing specific topics or themes within communities, unlike more general social networks such as Twitter or Facebook (Sylla et al. 2022; Medvedev, Lambiotte, and Delvenne 2019). Communities on Reddit therefore offer the opportunity to generate collective, but geographically disaggregated, representations of spatial knowledge. Locations identified within these communities likely represent urban areas of interest, which may be derived from their frequency of mentions (Chen, Arribas-Bel, and Singleton 2019), or semantic regions that reflect mental perceptions of places (Gao et al. 2017). These unstructured comments also provide contextual lexicons relating to place names, offering the opportunity to explore associations between these communities through their associated typology (Gao et al. 2017; Arthur and Williams 2019). While count-based lexical approaches have traditionally been used to explore geographic variation in text, large language models are now able to capture deep contextual semantic information (Devlin et al. 2019), allowing a deeper connection between language and geography to be explored.

References

Alessandretti, Laura, Ulf Aslak, and Sune Lehmann. 2020. “The Scales of Human Mobility.” Nature 587 (7834): 402–7. https://doi.org/10.1038/s41586-020-2909-1.
Allard, Ryan F., and Filipe Moura. 2016. “The Incorporation of Passenger Connectivity and Intermodal Considerations in Intercity Transport Planning.” Transport Reviews 36 (2): 251–77. https://doi.org/10.1080/01441647.2015.1059379.
Arthur, Rudy, and Hywel T. P. Williams. 2019. “The Human Geography of Twitter: Quantifying Regional Identity and Inter-Region Communication in England and Wales.” Edited by Emilio Ferrara. PLOS ONE 14 (4): e0214466. https://doi.org/10.1371/journal.pone.0214466.
Bailey, Michael, Rachel Cao, Theresa Kuchler, Johannes Stroebel, and Arlene Wong. 2018. “Social Connectedness: Measurement, Determinants, and Effects.” Journal of Economic Perspectives 32 (3): 259–80. https://doi.org/10.1257/jep.32.3.259.
Ballatore, Andrea, Michela Bertolotto, and David C. Wilson. 2014. “An Evaluative Baseline for Geo-Semantic Relatedness and Similarity.” GeoInformatica 18 (4): 747–67. https://doi.org/10.1007/s10707-013-0197-8.
Ballatore, Andrea, and Stefano De Sabbata. 2018. “Charting the Geographies of Crowdsourced Information in Greater London.” In Geospatial Technologies for All, edited by Ali Mansourian, Petter Pilesjö, Lars Harrie, and Ron van Lammeren, 149–68. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-78208-9_8.
———. 2020. “Los Angeles as a Digital Place: The Geographies of User-Generated Content.” Transactions in GIS 24 (4): 880–902. https://doi.org/10.1111/tgis.12600.
Baumgartner, Jason, Savvas Zannettou, Brian Keegan, Megan Squire, and Jeremy Blackburn. 2020. “The Pushshift Reddit Dataset.” arXiv. https://arxiv.org/abs/2001.08435.
Bergmann, Luke, and David O’Sullivan. 2018. “Reimagining GIScience for Relational Spaces.” The Canadian Geographer / Le Géographe Canadien 62 (1): 7–14. https://doi.org/10.1111/cag.12405.
Berragan, Cillian, Alex Singleton, Alessia Calafiore, and Jeremy Morley. 2022. “Geoparsing Comments from Reddit to Extract Mental Place Connectivity Within the United Kingdom.” In Spatial Data Science Symposium 2022 Short Paper Proceedings. https://doi.org/10.25436/E28C7R.
Birch, Colin P. D., Sander P. Oom, and Jonathan A. Beecham. 2007. “Rectangular and Hexagonal Grids Used for Observation, Experiment and Simulation in Ecology.” Ecological Modelling 206 (3): 347–59. https://doi.org/10.1016/j.ecolmodel.2007.03.041.
Borer, Michael Ian. 2006. “The Location of Culture: The Urban Culturalist Perspective.” City & Community 5 (2): 173–97. https://doi.org/10.1111/j.1540-6040.2006.00168.x.
Calafiore, Alessia, Gregory Palmer, Sam Comber, Daniel Arribas-Bel, and Alex Singleton. 2021. “A Geographic Data Science Framework for the Functional and Contextual Analysis of Human Dynamics Within Global Cities.” Computers, Environment and Urban Systems 85 (January): 101539. https://doi.org/10.1016/j.compenvurbsys.2020.101539.
Canter, David. 1977. The Psychology of Place.
Carr, Stephen, and Dale Schissler. 1969. “The City as a Trip: Perceptual Selection and Memory in the View from the Road.” Environment and Behavior 1 (1): 7.
Chambers, Robert. 2006. “Participatory Mapping and Geographic Information Systems: Whose Map? Who Is Empowered and Who Disempowered? Who Gains and Who Loses?” The Electronic Journal of Information Systems in Developing Countries 25 (1): 1–11. https://doi.org/10.1002/j.1681-4835.2006.tb00163.x.
Chen, Meixu, Dani Arribas-Bel, and Alex Singleton. 2019. “Understanding the Dynamics of Urban Areas of Interest Through Volunteered Geographic Information.” Journal of Geographical Systems 21: 89–109.
Corbett, Jon, and Giacomo Rambaldi. 2009. “Geographic Information Technologies, Local Knowledge, and Change.” Qualitative GIS: A Mixed Methods Approach, 75–92.
Deacon, Bernard. 2007. “County, Nation, Ethnic Group? The Shaping of the Cornish Identity.” The International Journal of Regional and Local Studies 3 (1): 5–29. https://doi.org/10.1179/jrl.2007.3.1.5.
Dennett, Adam, and Alan Wilson. 2013. “A Multilevel Spatial Interaction Modelling Framework for Estimating Interregional Migration in Europe.” Environment and Planning A: Economy and Space 45 (6): 1491–1507. https://doi.org/10.1068/a45398.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv:1810.04805 [Cs], May. https://arxiv.org/abs/1810.04805.
Erlander, Sven. 1980. Optimal Spatial Interaction and the Gravity Model. Edited by M. Beckmann and H. P. Künzi. Vol. 173. Lecture Notes in Economics and Mathematical Systems. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-45515-5.
Fabrikant, Sara Irina, Marco Ruocco, Richard Middleton, Daniel R. Montello, and Corinne Jörgensen. 2002. “The First Law of Cognitive Geography: Distance and Similarity in Semantic Space.” Proceedings of GIScience 2002, 31–33.
Farber, Steven, and Xiao Li. 2013. “Urban Sprawl and Social Interaction Potential: An Empirical Analysis of Large Metropolitan Regions in the United States.” Journal of Transport Geography 31 (July): 267–77. https://doi.org/10.1016/j.jtrangeo.2013.03.002.
Farman, Jason. 2020. Mobile Interface Theory: Embodied Space and Locative Media. Routledge.
Findlay, Allan M., David Short, and Aileen Stockdale. 2000. “The Labour-Market Impact of Migration to Rural Areas.” Applied Geography 20 (4): 333–48. https://doi.org/10.1016/S0143-6228(00)00012-6.
Gao, Song, Krzysztof Janowicz, Daniel R. Montello, Yingjie Hu, Jiue-An Yang, Grant McKenzie, Yiting Ju, Li Gong, Benjamin Adams, and Bo Yan. 2017. “A Data-Synthesis-Driven Method for Detecting and Extracting Vague Cognitive Regions.” International Journal of Geographical Information Science 31 (6): 1245–71. https://doi.org/10.1080/13658816.2016.1273357.
Gardner, Z., P. Mooney, S. De Sabbata, and L. Dowthwaite. 2020. “Quantifying Gendered Participation in OpenStreetMap: Responding to Theories of Female (Under) Representation in Crowdsourced Mapping.” GeoJournal 85 (6): 1603–20. https://doi.org/10.1007/s10708-019-10035-z.
Glückler, Johannes, and Robert Panitz. 2021. “Unleashing the Potential of Relational Research: A Meta-Analysis of Network Studies in Human Geography.” Progress in Human Geography 45 (6): 1531–57. https://doi.org/10.1177/03091325211002916.
Gong, Junfang, Shengwen Li, Xinyue Ye, Qiong Peng, and Sonali Kudva. 2021. “Modelling Impacts of High-Speed Rail on Urban Interaction with Social Media in China’s Mainland.” Geo-Spatial Information Science 24 (4): 638–53. https://doi.org/10.1080/10095020.2021.1972771.
González, Marta C., César A. Hidalgo, and Albert-László Barabási. 2008. “Understanding Individual Human Mobility Patterns.” Nature 453 (7196): 779–82. https://doi.org/10.1038/nature06958.
Goodchild, Michael F. 2007. “Citizens as Sensors: The World of Volunteered Geography.” GeoJournal 69 (4): 211–21. https://doi.org/10.1007/s10708-007-9111-y.
Goodchild, Michael F., and L. L. Hill. 2008. “Introduction to Digital Gazetteer Research.” International Journal of Geographical Information Science 22 (10): 1039–44. https://doi.org/10.1080/13658810701850497.
Goodchild, Michael F., and Linna Li. 2012. “Assuring the Quality of Volunteered Geographic Information.” Spatial Statistics 1 (May): 110–20. https://doi.org/10.1016/j.spasta.2012.03.002.
Goodey, Brian. 1974. Images of Place: Essays on Environmental Perception, Communications and Education.
Gould, Peter R., and R. R. White. 1968. “The Mental Maps of British School Leavers.” Regional Studies 2 (2): 161–82. https://doi.org/10.1080/09595236800185171.
Gould, Peter R., and Rodney White. 1986. Mental Maps. Hoboken: Taylor and Francis.
Graham, Mark, Ralph K. Straumann, and Bernie Hogan. 2015. “Digital Divisions of Labor and Informational Magnetism: Mapping Participation in Wikipedia.” Annals of the Association of American Geographers 105 (6): 1158–78. https://doi.org/10.1080/00045608.2015.1072791.
Greenberg Raanan, Malka, and Noam Shoval. 2014. “Mental Maps Compared to Actual Spatial Behavior Using GPS Data: A New Method for Investigating Segregation in Cities.” Cities 36 (February): 28–40. https://doi.org/10.1016/j.cities.2013.09.003.
Gritta, Milan, Mohammad Taher Pilehvar, and Nigel Collier. 2020. “A Pragmatic Guide to Geoparsing Evaluation: Toponyms, Named Entity Recognition and Pragmatics.” Language Resources and Evaluation 54 (3): 683–712. https://doi.org/10.1007/s10579-019-09475-3.
Haesly, Richard. 2005. “Identifying Scotland and Wales: Types of Scottish and Welsh National Identities.” Nations and Nationalism 11 (2): 243–63. https://doi.org/10.1111/j.1354-5078.2005.00202.x.
Haklay, Mordechai E. 2016. “Why Is Participation Inequality Important?” In European Handbook of Crowdsourced Geographic Information, edited by Cristina Capineri, Muki Haklay, Haosheng Huang, Vyron Antoniou, Juhani Kettunen, Frank Ostermann, and Ross Purves. Ubiquity Press.
Haklay, Mordechai E., and Patrick Weber. 2008. “Openstreetmap: User-generated Street Maps.” IEEE Pervasive Computing 7 (4): 12–18. https://doi.org/10.1109/MPRV.2008.80.
Halterman, Andrew. 2017. “Mordecai: Full Text Geoparsing and Event Geocoding.” The Journal of Open Source Software 2 (9). https://doi.org/10.21105/joss.00091.
Han, Jialong, Aixin Sun, Gao Cong, Wayne Xin Zhao, Zongcheng Ji, and Minh C. Phan. 2018. “Linking Fine-Grained Locations in User Comments.” IEEE Transactions on Knowledge and Data Engineering 30 (1): 59–72. https://doi.org/10.1109/TKDE.2017.2758780.
Haney, Wava G., and Eric S. Knowles. 1978. “Perception of Neighborhoods by City and Suburban Residents.” Human Ecology 6 (2): 201–14. https://doi.org/10.1007/BF00889095.
Haynes, Kingsley E, and A Stewart Fotheringham. 1985. “Gravity and Spatial Interaction Models.” WVU Research Repository, 72.
Hecht, Brent, and Monica Stephens. 2014. “A Tale of Cities: Urban Biases in Volunteered Geographic Information.” In Proceedings of the International AAAI Conference on Web and Social Media, 8:197–205.
Hollenstein, Livia. 2008. “Capturing Vernacular Geography from Georeferenced Tags.”
Hu, Yingjie, Xinyue Ye, and Shih-Lung Shaw. 2017. “Extracting and Analyzing Semantic Relatedness Between Cities Using News Articles.” International Journal of Geographical Information Science 31 (12): 2427–51. https://doi.org/10.1080/13658816.2017.1367797.
Jang, Kee Moon, and Youngchul Kim. 2019. “Crowd-Sourced Cognitive Mapping: A New Way of Displaying People’s Cognitive Perception of Urban Space.” PLOS ONE 14 (6): e0218590. https://doi.org/10.1371/journal.pone.0218590.
Kamalloo, Ehsan, and Davood Rafiei. 2018. “A Coherent Unsupervised Model for Toponym Resolution.” In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18, 1287–96. Lyon, France: ACM Press. https://doi.org/10.1145/3178876.3186027.
Kaplan, Stephen. 1976. “Adaptation, Structure and Knowledge.” In Environmental Knowing: Theories, Perspectives and Methods, edited by G. T. Moore and R. G. Golledge, 32–45. Stroudsburg, PA: Dowden, Hutchinson and Ross.
Karimzadeh, Morteza, Scott Pezanowski, Alan M. MacEachren, and Jan O. Wallgrün. 2019. “GeoTxt: A Scalable Geoparsing System for Unstructured Text Geolocation.” Transactions in GIS 23 (1): 118–36. https://doi.org/10.1111/tgis.12510.
Langton, Samuel H, and Reka Solymosi. 2021. “Cartograms, Hexograms and Regular Grids: Minimising Misrepresentation in Spatial Data Visualisations.” Environment and Planning B: Urban Analytics and City Science 48 (2): 348–57. https://doi.org/10.1177/2399808319873923.
Laurier, Eric, Barry Brown, and Moira McGregor. 2016. “Mediated Pedestrian Mobility: Walking and the Map App.” Mobilities 11 (1): 117–34. https://doi.org/10.1080/17450101.2015.1099900.
Lee, Terence R. 1973. “Psychology and Living Space.” Image and Environment, 87–108.
Leidner, Jochen L, and Michael D Lieberman. 2011. “Detecting Geographical References in the Form of Place Names and Associated Spatial Natural Language.” SIGSPATIAL Special 3 (2): 5–11. https://doi.org/10.1145/2047296.2047298.
Li, Chenliang, and Aixin Sun. 2014. “Fine-Grained Location Extraction from Tweets with Temporal Awareness.” In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR ’14, 43–52. Gold Coast, Queensland, Australia: ACM Press. https://doi.org/10.1145/2600428.2609582.
Li, Zhenlong, Xiao Huang, Xinyue Ye, Yuqin Jiang, Yago Martin, Huan Ning, Michael E. Hodgson, and Xiaoming Li. 2021. “Measuring Global Multi-Scale Place Connectivity Using Geotagged Social Media Data.” Scientific Reports 11 (1): 14694. https://doi.org/10.1038/s41598-021-94300-7.
Lin, Jinyao, Zhifeng Wu, and Xia Li. 2019. “Measuring Inter-City Connectivity in an Urban Agglomeration Based on Multi-Source Data.” International Journal of Geographical Information Science 33 (5): 1062–81. https://doi.org/10.1080/13658816.2018.1563302.
Liu, Yu, Fahui Wang, Chaogui Kang, Yong Gao, and Yongmei Lu. 2014. “Analyzing Relatedness by Toponym Co-Occurrences on Web Pages.” Transactions in GIS 18 (1): 89–107. https://doi.org/10.1111/tgis.12023.
Lukermann, F. 1961. “The Concept of Location in Classical Geography.” Annals of the Association of American Geographers 51 (2): 194–210. https://doi.org/10.1111/j.1467-8306.1961.tb00373.x.
Marston, Sallie A, John Paul Jones III, and Keith Woodward. 2005. “Human Geography Without Scale.” Transactions of the Institute of British Geographers 30 (4): 416–32. https://doi.org/10.1111/j.1475-5661.2005.00180.x.
Massey, Doreen. 2008. “A Global Sense of Place.” In The Cultural Geography Reader, 269–75. Routledge.
Medvedev, Alexey N., Renaud Lambiotte, and Jean-Charles Delvenne. 2019. “The Anatomy of Reddit: An Overview of Academic Research.” In Dynamics On and Of Complex Networks III, edited by Fakhteh Ghanbarnejad, Rishiraj Saha Roy, Fariba Karimi, Jean-Charles Delvenne, and Bivas Mitra, 183–204. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-14683-2_9.
Meijers, Evert, and Antoine Peris. 2019. “Using Toponym Co-Occurrences to Measure Relationships Between Places: Review, Application and Evaluation.” International Journal of Urban Sciences 23 (2): 246–68. https://doi.org/10.1080/12265934.2018.1497526.
Merrifield, Andrew. 1993. “Place and Space: A Lefebvrian Reconciliation.” Transactions of the Institute of British Geographers 18 (4): 516–31. https://doi.org/10.2307/622564.
Merriman, Peter. 2012. “Human Geography Without Time-Space.” Transactions of the Institute of British Geographers 37 (1): 13–27. https://doi.org/10.1111/j.1475-5661.2011.00455.x.
Middleton, Christopher, and Phillip Freestone. 2008. “The Impact of Culture-Led Regeneration on Regional Identity in North East England.” Proc. RSAI.
Miller, Harvey J. 2018. “Time Geography.” Handbook of Behavioral and Cognitive Geography, 74–94. https://doi.org/10.4337/9781784717544.00011.
Miyoshi, Chikage, and Moshe Givoni. 2013. “The Environmental Case for the High-Speed Train in the UK: Examining the London–Manchester Route.” International Journal of Sustainable Transportation 8 (2): 107–26. https://doi.org/10.1080/15568318.2011.645124.
Moncla, Ludovic, Walter Renteria-Agualimpia, Javier Nogueras-Iso, and Mauro Gaio. 2014. “Geocoding for Texts with Fine-Grain Toponyms: An Experiment on a Geoparsed Hiking Descriptions Corpus.” In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 183–92. Dallas, Texas: ACM. https://doi.org/10.1145/2666310.2666386.
Montello, Daniel R. 1993. “Scale and Multiple Psychologies of Space.” In Spatial Information Theory: A Theoretical Basis for GIS, edited by Gerhard Goos, Juris Hartmanis, Andrew U. Frank, and Irene Campari, 716:312–21. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-57207-4_21.
Montello, Daniel R, Michael F Goodchild, Jonathon Gottsegen, and Peter Fohl. 2003. “Where’s Downtown?: Behavioral Methods for Determining Referents of Vague Spatial Queries.” In Spatial Vagueness, Uncertainty, Granularity, 185–204. Psychology Press.
Moseley, Malcolm J. 2023. Accessibility: The Rural Challenge. Taylor & Francis.
Murphy, Nicole. 2019. “Reddit’s 2019 Year in Review - Upvoted.” https://www.redditinc.com/blog/reddits-2019-year-in-review/#content.
Murray, Debra, and Christopher Spencer. 1979. “Individual Differences in the Drawing of Cognitive Maps: The Effects of Geographical Mobility, Strength of Mental Imagery and Basic Graphic Ability.” Transactions of the Institute of British Geographers 4 (3): 385. https://doi.org/10.2307/622058.
Nayak, Anoop. 2016. Race, Place and Globalization: Youth Cultures in a Changing World. Bloomsbury Publishing.
Noulas, Anastasios, Salvatore Scellato, Cecilia Mascolo, and Massimiliano Pontil. 2011. “An Empirical Study of Geographic User Activity Patterns in Foursquare.” Proceedings of the International AAAI Conference on Web and Social Media 5 (1): 570–73. https://doi.org/10.1609/icwsm.v5i1.14175.
Office for National Statistics. 2022. “How the Population Changed in Newcastle Upon Tyne, Census 2021 - ONS.” https://www.ons.gov.uk/visualisations/censuspopulationchange/E08000021/.
Ostermann, F. O., H. Huang, G. Andrienko, N. Andrienko, C. Capineri, K. Farkas, and R. S. Purves. 2015. “Extracting and Comparing Places Using Geo-Social Media.” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences II-3/W5 (August): 311–16. https://doi.org/10.5194/isprsannals-II-3-W5-311-2015.
Pánek, Jiří. 2016. “From Mental Maps to GeoParticipation.” The Cartographic Journal 53 (4): 300–307. https://doi.org/10.1080/00087041.2016.1243862.
Patterson, Zachary, and Steven Farber. 2015. “Potential Path Areas and Activity Spaces in Application: A Review.” Transport Reviews 35 (6): 679–700. https://doi.org/10.1080/01441647.2015.1042944.
Peake, Simon, and Tony Moore. 2004. “Analysis of Distortions in a Mental Map Using GPS and GIS.” In SIRC 2004 – The 16th Annual Colloquium of the Spatial Information Research Centre, 10.
Pierce, Joseph, Deborah G Martin, and James T Murphy. 2011. “Relational Place-Making: The Networked Politics of Place.” Transactions of the Institute of British Geographers 36 (1): 54–70. https://doi.org/10.1111/j.1475-5661.2010.00411.x.
Pocock, D. C. D. 1976. “Some Characteristics of Mental Maps: An Empirical Study.” Transactions of the Institute of British Geographers 1 (4): 493. https://doi.org/10.2307/621905.
Purves, Ross S., Paul Clough, Christopher B. Jones, Mark H. Hall, and Vanessa Murdock. 2018. Geographic Information Retrieval: Progress and Challenges in Spatial Search of Text. now. https://doi.org/10.1561/1500000034.
Rae, Alasdair. 2009. “From Spatial Interaction Data to Spatial Interaction Information? Geovisualisation and Spatial Structures of Migration from the 2001 UK Census.” Computers, Environment and Urban Systems 33 (3): 161–78. https://doi.org/10.1016/j.compenvurbsys.2009.01.007.
Rey, Sergio, Dani Arribas-Bel, and Levi John Wolf. 2023. Geographic Data Science with Python. CRC Press.
Roos Breines, Markus, Parvati Raghuram, and Ashley Gunter. 2019. “Infrastructures of Immobility: Enabling International Distance Education Students in Africa to Not Move.” Mobilities 14 (4): 484–99. https://doi.org/10.1080/17450101.2019.1618565.
Rowe, Francisco, Alessia Calafiore, Daniel Arribas-Bel, Krasen Samardzhiev, and Martin Fleischmann. 2022. “Urban Exodus? Understanding Human Mobility in Britain During the COVID-19 Pandemic Using Meta-Facebook Data.” Population, Space and Place n/a (n/a): e37. https://doi.org/10.1002/psp.2637.
Rowe, Francisco, Robin Lovelace, and Adam Dennett. 2022. “Spatial Interaction Modelling: A Manifesto.” OSF Preprints. https://doi.org/10.31219/osf.io/xcdms.
SafeGraph. 2022. “Places Data Curated for Accurate Geospatial Analytics | SafeGraph.” https://www.safegraph.com.
Salvini, Marco M, and Sara I Fabrikant. 2016. “Spatialization of User-Generated Content to Uncover the Multirelational World City Network.” Environment and Planning B: Planning and Design 43 (1): 228–48.
Shaw, Jon, and Markus Hesse. 2010. “Transport, Geography and the ‘New’ Mobilities.” Transactions of the Institute of British Geographers 35 (3): 305–12. https://doi.org/10.1111/j.1475-5661.2010.00382.x.
Singleton, A. D., A. G. Wilson, and O. O’Brien. 2012. “Geodemographics and Spatial Interaction: An Integrated Model for Higher Education.” Journal of Geographical Systems 14 (2): 223–41. https://doi.org/10.1007/s10109-010-0141-5.
Smith, Thomas Aneurin, Eric Laurier, Stuart Reeves, and Ria Ann Dunkley. 2020. “‘Off the Beaten Map’: Navigating with Digital Maps on Moorland.” Transactions of the Institute of British Geographers 45 (1): 223–40. https://doi.org/10.1111/tran.12336.
Statista. 2022. “Most Popular Social Networks Worldwide as of January 2022, Ranked by Number of Monthly Active Users.” Statista. https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
Steiger, Enrico, René Westerholt, Bernd Resch, and Alexander Zipf. 2015. “Twitter as an Indicator for Whereabouts of People? Correlating Twitter with UK Census Data.” Computers, Environment and Urban Systems 54 (November): 255–65. https://doi.org/10.1016/j.compenvurbsys.2015.09.007.
Stock, Kristin, Robert C. Pasley, Zoe Gardner, Paul Brindley, Jeremy Morley, and Claudia Cialone. 2013. “Creating a Corpus of Geospatial Natural Language.” In Spatial Information Theory, edited by David Hutchison, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, et al., 8116:279–98. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-01790-7_16.
Sylla, Aline, Felix Glawe, Dirk Braun, Mihail Padev, Sina Schäfer, Albina Ahmetaj, Lilian Kojan, and André Calero Valdez. 2022. “Discourses of Climate Delay in American Reddit Discussions.” In Disinformation in Open Online Media, edited by Francesca Spezzano, Adriana Amaral, Davide Ceolin, Lisa Fazio, and Edoardo Serra, 13545:123–37. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-18253-2_9.
Taaffe, Edward James, Howard L. Gauthier, and Morton E. O’Kelly. 1996. Geography of Transportation. Upper Saddle River, NJ: Prentice Hall.
Taylor, Peter J. 1983. “Distance Decay in Spatial Interactions.” Concepts and Techniques in Modern Geography, no. 2.
Taylor, Peter J, and S Openshaw. 1975. “Distance Decay in Spatial Interactions.” In Concepts and Techniques in Modern Geography.
Thomas, R. W. 1981. Information Statistics in Geography. Concepts and Techniques in Modern Geography, no. 31. Norwich: Geo Abstracts.
Titheridge, Helena, Kamalasudhan Achuthan, Roger L. Mackett, and Juliet Solomon. 2009. “Assessing the Extent of Transport Social Exclusion Among the Elderly.” Journal of Transport and Land Use 2 (2). https://doi.org/10.5198/jtlu.v2i2.44.
Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (June): 234–40. https://doi.org/10.2307/143141.
Won, Miguel, Patricia Murrieta-Flores, and Bruno Martins. 2018. “Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora.” Frontiers in Digital Humanities 5 (March): 2. https://doi.org/10.3389/fdigh.2018.00002.
Worboys, Michael F. 2001. “Nearness Relations in Environmental Space.” International Journal of Geographical Information Science 15 (7): 633–51. https://doi.org/10.1080/13658810110061162.
Yang, Yang, Dong Li, and Xiang (Robert) Li. 2019. “Public Transport Connectivity and Intercity Tourist Flows.” Journal of Travel Research 58 (1): 25–41. https://doi.org/10.1177/0047287517741997.
Ye, Xinyue, Junfang Gong, and Shengwen Li. 2021. “Analyzing Asymmetric City Connectivity by Toponym on Social Media in China.” Chinese Geographical Science 31 (1): 14–26. https://doi.org/10.1007/s11769-020-1172-6.
Yin, Wenpeng, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. “Comparative Study of CNN and RNN for Natural Language Processing.” arXiv:1702.01923 [Cs], February. https://arxiv.org/abs/1702.01923.
Zook, Matthew. 2006. “The Geographies of the Internet.” Annual Review of Information Science and Technology 40 (1): 53–78. https://doi.org/10.1002/aris.1440400109.

Footnotes

  1. Ordnance Survey UK does not include data for Northern Ireland.

  2. In our work we consider ‘place names’ to be ambiguous noun phrases found in text, without associated coordinates, while ‘locations’ are place names that have attributed coordinates. We use ‘place associations’ to capture the vagueness involved in cognitive associations.

  3. https://www.geonames.org

  4. https://reddit.com

  5. https://reddit.com/r/unitedkingdom

  6. https://www.reddit.com/r/unitedkingdom/wiki/british_subreddits

  7. https://pushshift.io/

  8. URL provided following submission

  9. https://www.ordnancesurvey.co.uk/business-government/products/open-map-names

  10. https://gazetteer.org.uk

  11. For a more detailed mathematical explanation, see Hu, Ye, and Shaw (2017).

  12. https://www.uber.com/en-GB/blog/h3/

  13. https://www.gov.uk/government/collections/rural-urban-classification

  14. https://www.gov.scot/publications/scottish-government-urban-rural-classification-2020

  15. Many of the names absent from our gazetteer refer to locations outside the UK.

Citation

BibTeX citation:
@article{berragan2024,
  author = {Berragan, Cillian and Singleton, Alex and Calafiore, Alessia
    and Morley, Jeremy},
  title = {Mapping {Cognitive} {Place} {Associations} Within the
    {United} {Kingdom} Through {Online} {Discussion} on {Reddit}},
  journal = {Transactions of the Institute of British Geographers},
  date = {2024-01-08},
  url = {https://rgs-ibg.onlinelibrary.wiley.com/doi/10.1111/tran.12669},
  doi = {10.1111/tran.12669},
  langid = {en}
}
For attribution, please cite this work as:
Berragan, Cillian, Alex Singleton, Alessia Calafiore, and Jeremy Morley. 2024. “Mapping Cognitive Place Associations Within the United Kingdom Through Online Discussion on Reddit.” Transactions of the Institute of British Geographers, January. https://doi.org/10.1111/tran.12669.