Processing spatial cue conflict in navigation: Distance estimation

Spatial navigation, a fundamental ability crucial for both human and animal survival, depends on combining spatial cues (e.g., landmarks in the environment, proprioceptive cues from self-motion) to estimate location. Navigators must estimate their own locations in the environment, as well as the locations of goals. According to traditional models of spatial learning (e.g., Gallistel, 1990, O’Keefe and Nadel, 1978, Siegel and White, 1975), navigators can develop a mental representation of an environment through repeated explorations. This representation, commonly referred to as a “cognitive map”, includes straight-line distances and directions between locations (Tolman, 1948). It has been argued that cognitive maps enable efficient and flexible navigation in familiar environments (for alternative views, see the cognitive graph theory in Ericson & Warren, 2020 and the cognitive collages hypothesis in Tversky, 1993).

A key challenge in spatial navigation is resolving conflicts between different spatial cues. This problem has been extensively studied across disciplines. Much work has examined conflicts between landmarks and self-motion cues. Navigation with self-motion cues, such as proprioceptive inputs, vestibular signals, and optic flow, requires continuous integration of self-movement to determine one’s location, a process known as path integration (Etienne & Jeffery, 2004; Mittelstaedt & Mittelstaedt, 1980). In contrast, landmarks are prominent environmental features that provide direct spatial information. These two navigation modes recruit distinct and independent cognitive-neural mechanisms (Chen et al., 2017, Chen et al., 2019, Chen et al., 2024, Shettleworth and Sutton, 2005), raising an interesting question of how these two cue types interact during navigation. Behavioral studies in humans have reported mixed findings: some suggest a predominance of landmarks over self-motion cues when conflicts are large (Zhao & Warren, 2015a), others suggest the opposite (Sjolund et al., 2018), and some show no change in cue weighting until the conflicts become extreme (Zhao & Warren, 2015b). Neuroscience studies in non-human animals typically reveal that spatially-modulated neurons respond to both cue types (Campbell et al., 2018, Chen et al., 2013, Gothard et al., 1996), but preferences vary by brain region: the retrosplenial cortex favors landmarks, whereas the entorhinal cortex favors self-motion cues (Campbell et al., 2021).

Beyond landmark vs. self-motion conflicts, research has also examined how geometric and featural cues interact across multiple disciplines (see review papers, Cheng, 2008, Cheng et al., 2013, Cheng and Newcombe, 2005, Lew, 2011, Newcombe, 2023). Geometric cues refer to environmental features related to shape, layout, and spatial structure, such as the shape of a room. Featural cues refer to distinct, identifiable aspects of an environment, such as an isolated landmark at one of the room corners. Using the reorientation paradigm, Cheng demonstrated that rats predominantly relied on geometric cues rather than featural cues for reorientation, supporting a geometric module hypothesis (Cheng, 1986). However, later studies have shown that navigators make use of both geometric and featural cues, with cue reliance varying based on factors such as cue salience, navigation history, and language use (for a recent review, see Newcombe, 2023). These findings have led to the adaptive cue combination hypothesis, which posits that spatial cue utilization is flexible and depends on contextual demands (Xu et al., 2017).

Additional studies have examined conflicts between other spatial cue types, such as an individual landmark vs. multiple landmarks in an array (Jetzschke et al., 2017, Roy et al., 2023) and distal vs. proximal landmarks (Knierim, 2002, Qi and Mou, 2024, Shapiro et al., 1997, Tanila et al., 1997, Yoganarasimha et al., 2006).

Across these studies, cue conflict mainly serves as an experimental tool to assess navigators’ relative reliance on different spatial cues. Researchers evaluate cue reliance by analyzing response distributions. When responses are continuous, cue weighting is inferred from the relative proximities of the response centroid to the target locations defined by conflicting cues. The closer the response centroid to the location defined by a particular cue, the greater the reliance on that cue (see Chen et al., 2017, for a review). When responses are discrete, such as in the reorientation paradigm, cue weighting is assessed based on the proportion of trials in which participants choose the location defined by a given cue (Ratliff & Newcombe, 2008). While this approach has provided valuable insights, it does not fully reveal the cognitive processes that navigators use. Specifically, it remains unclear (a) how navigators decide whether conflicting sensory-perceptual information is informative about the world (i.e., there are different sources or causes) or should be ignored (i.e., the conflicts are caused by sensory-perceptual error), and (b) how they select a goal location when they have determined that discrepant spatial cues should not be integrated.

Several models have been proposed to explain navigation behavior in cue-conflicting situations. While these models offer valuable insights, they have limitations, such as lacking mechanistic explanations of cue detection and resolution, being constrained to specific spatial cue types, and failing to generalize across tasks.

Jetzschke and colleagues proposed a probabilistic model to explain continuous spatial localization in a 2D environment, with a landmark conflicting with other landmarks in an array (Jetzschke et al., 2017). In the standard maximum-likelihood-estimation (MLE) model of cue integration, the likelihood distributions of individual cues are assumed to be Gaussian. Cue integration involves multiplying individual likelihood distributions, which results in a more precise joint likelihood distribution (Bromiley, 2013, McNamara and Chen, 2022, Appendix A). Unlike the MLE model, Jetzschke's model assumes that each individual likelihood distribution is a mixture of two Gaussians, one of which has very heavy tails. Multiplying these mixture distributions does not result in a more precise joint likelihood distribution, eliminating the typical gain from cue integration. However, this model remains primarily descriptive and does not explain how cue conflict is detected and resolved.
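The contrast with the standard MLE model can be made concrete. The sketch below (illustrative only, not the authors' implementation; the cue values are hypothetical) implements the standard MLE rule, under which the integrated estimate is a reliability-weighted average of the single-cue estimates and is always more precise than either cue alone:

```python
# A minimal sketch of standard MLE cue integration: the product of two
# Gaussian likelihoods is itself Gaussian, with precision (inverse variance)
# equal to the sum of the individual precisions.

def mle_integrate(mu1, sigma1, mu2, sigma2):
    """Combine two Gaussian cue estimates by maximum-likelihood estimation.

    Each cue contributes a weight proportional to its reliability 1/sigma^2;
    the integrated estimate is more precise than either single cue.
    """
    w1 = 1.0 / sigma1**2                      # reliability of cue 1
    w2 = 1.0 / sigma2**2                      # reliability of cue 2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)    # reliability-weighted mean
    sigma = (1.0 / (w1 + w2)) ** 0.5          # reduced standard deviation
    return mu, sigma

# Hypothetical example: a landmark cue (sigma = 2 m) estimating the target at
# 10 m, conflicting with a self-motion cue (sigma = 4 m) estimating 14 m.
mu, sigma = mle_integrate(10.0, 2.0, 14.0, 4.0)
# The integrated estimate lies closer to the more reliable landmark cue, and
# its standard deviation is smaller than that of either cue alone.
```

In Jetzschke's model, by contrast, each likelihood is a two-Gaussian mixture with heavy tails, so multiplying the likelihoods no longer yields this precision gain.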

Harootonian and colleagues tested models for head direction estimation, considering body-based self-motion cues and visual feedback (Harootonian et al., 2022). Their findings support a hybrid model: cues are integrated when cue consistency is assumed, but only body-based cues are used when cue inconsistency is assumed. However, this model does not incorporate a mechanism for detecting cue conflict; instead, it treats the proportion of trials for cue integration as an independent, freely varying parameter.

The adaptive cue combination model aims to explain conflicts between geometric cues and featural cues in the reorientation paradigm (Xu et al., 2017, Case study 3). This model follows the principles of the standard MLE model of cue integration (Rohde et al., 2016). The MLE model, however, was originally developed to explain cue combination behavior in scenarios with minimal or no cue conflicts, making it theoretically unsuitable for situations with substantial cue conflicts (French and DeAngelis, 2020, Newman et al., 2023). Consequently, this model fails to distinguish between conditions that favor cue integration and conditions that do not, where different navigation strategies should be adopted (Sjolund et al., 2018, Zhao and Warren, 2015b).

Similarly, Wang and colleagues proposed a model for a reorientation task (Wang et al., 2018), which contrasted two intersecting streets (geometric cue) and trees placed at the intersection (landmark cue). This model is also based on the standard MLE model of cue integration, but logit-transformed behavioral accuracy was used as a proxy for cue reliability, given the discrete nature of target locations and responses. Consequently, this model has the same limitations as the adaptive cue combination model (Xu et al., 2017). Furthermore, the experimental design differs from typical reorientation tasks, as participants were restricted to choosing from two of the four street ends. As a result, this model's generalizability to standard reorientation tasks remains uncertain.

Beyond these models, the view-matching model (Cheung et al., 2008) and the associative learning model (Miller & Shettleworth, 2007) offer mechanistic explanations for detecting and resolving conflicts between geometric and featural cues in the reorientation paradigm. However, the view-matching model is inapplicable to tasks involving self-motion cues, as spatial locations defined by such cues do not correspond to specific views. The associative learning model relies on feedback to adjust the association strength of the cues with the reward, but this type of feedback is absent in many navigation tasks.

In summary, several cognitive models have been proposed to explain navigation behaviors under conditions with cue conflicts, but they face significant limitations, including a lack of mechanistic explanations and restricted applicability across tasks. To address these issues, a more comprehensive model is needed – one that incorporates a mechanism for detecting and resolving cue conflicts, accounts for situations that either support or hinder cue integration, and applies broadly to navigation tasks. The Bayesian causal inference (BCI) model represents such a model.

The Bayesian causal inference (BCI) model provides a framework for understanding how the brain combines and processes information from multiple sensory sources to create a coherent and accurate perception of the external world (Körding et al., 2007). In the context of spatial navigation, the BCI model posits that the perceived location is inherently corrupted by intrinsic sensory noise, meaning that the perceived location often does not correspond to the true location defined by the spatial cues. For example, an observer would perceive different self-locations at different times even while occupying the same location. Therefore, the navigator cannot directly determine their true location. Instead, the navigator infers the true self-location from the perceived one, with a degree of uncertainty proportional to the amount of sensory noise inherent in the spatial input. This uncertainty in perceiving self-location makes causal structure inference a non-trivial problem, that is, determining whether different spatial inputs stem from the same location or from different locations. For example, the perceived locations from different cues can be widely discrepant even when the cues are congruent in physical space; conversely, the perceived locations from different cues can be close together even when the cues are in substantial conflict in physical space. Hence, causal structure inference is not definitive and is itself subject to uncertainty.

Fig. 1 illustrates the conceptual structure of the BCI model. The BCI model addresses two key questions. First, how do navigators judge whether spatial cues are congruent (i.e., sensory inputs originate from the same cause) or incongruent with each other (i.e., sensory inputs originate from different causes)? Second, what strategies do they employ to handle spatial cues based on this cue-congruence judgment?

Regarding the first question of cue-congruency judgment, the BCI model posits that an ideal observer combines the following three information sources: prior belief in a common source, prior knowledge about possible target locations, and sensory inputs stemming from different cues. The combination of sensory inputs and prior knowledge about target location generates the likelihood of a common cause, which is then combined with the prior belief in a common cause to generate the posterior belief in a common cause. All else equal, the probability of making a same-cause judgment increases with higher prior belief in a common source, a more widespread prior location distribution, and more similar sensory inputs.
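This computation can be sketched for the Gaussian case analyzed by Körding et al. (2007). In the illustration below, the function returns the posterior probability of a common cause given two noisy measurements, a Gaussian prior over target location, and a prior belief in a common source; all parameter values in the example are hypothetical:

```python
import math

def posterior_common_cause(x1, x2, sigma1, sigma2, mu_p, sigma_p, p_common):
    """Posterior probability that two noisy cue measurements share one cause.

    Uses the Gaussian causal-inference likelihoods of Kording et al. (2007):
    the common-cause likelihood marginalizes both measurements over a single
    source drawn from the location prior N(mu_p, sigma_p); the separate-cause
    likelihood assumes two independent sources drawn from the same prior.
    """
    v1, v2, vp = sigma1**2, sigma2**2, sigma_p**2

    # p(x1, x2 | C = 1): one hidden source, integrated out analytically
    denom = v1 * v2 + v1 * vp + v2 * vp
    like_c1 = (math.exp(-0.5 * ((x1 - x2)**2 * vp
                                + (x1 - mu_p)**2 * v2
                                + (x2 - mu_p)**2 * v1) / denom)
               / (2 * math.pi * math.sqrt(denom)))

    # p(x1, x2 | C = 2): two independent sources, each drawn from the prior
    def marginal(x, v):
        return (math.exp(-0.5 * (x - mu_p)**2 / (v + vp))
                / math.sqrt(2 * math.pi * (v + vp)))
    like_c2 = marginal(x1, v1) * marginal(x2, v2)

    # Bayes' rule: combine the likelihoods with the prior belief in a common cause
    return like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

# Hypothetical illustration: identical measurements yield a much higher
# posterior probability of a common cause than widely discrepant ones.
p_same = posterior_common_cause(0.0, 0.0, 1.0, 1.0, 0.0, 10.0, 0.5)
p_diff = posterior_common_cause(0.0, 5.0, 1.0, 1.0, 0.0, 10.0, 0.5)
```

Consistent with the text, the returned posterior also rises with a larger `p_common` and a wider prior (`sigma_p`), all else equal.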

Regarding the second question of cue-handling strategies, first, the BCI model conceives different sub-models corresponding to different cue-congruency judgments: cue integration for the common-cause judgment and cue segregation for the different-cause judgment (Körding et al., 2007, Wozny et al., 2010). In the integration sub-model, different spatial inputs are judged to be congruent and thus are integrated, following the MLE principles. This cue integration yields the joint likelihood distribution, which is then integrated with the prior knowledge about the target location distribution to generate the posterior distribution, embodying Bayes' theorem of combining likelihood and prior information. In the cue segregation sub-model, spatial inputs from the two cues are judged to be in conflict and thus are not integrated. Typically, the task-relevant cue type is selected, and its likelihood distribution is then integrated with the prior knowledge about the target location distribution to generate the posterior distribution. Next, the two posterior distributions derived from the cue integration sub-model and the cue segregation sub-model are combined using different decision rules, depending on the goal prioritized by the observer (Wozny et al., 2010).
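The decision rules discussed by Wozny et al. (2010) can be sketched as follows. This is an illustration only: the estimate values in the usage example are hypothetical, and the defaults are assumptions rather than part of the original studies:

```python
import random

def final_estimate(est_c1, est_c2, p_c1, strategy="averaging", rng=None):
    """Combine the integration (est_c1) and segregation (est_c2) estimates.

    Three decision rules from Wozny et al. (2010):
      - model averaging: weight each estimate by the posterior over causes;
      - model selection: commit to the more probable causal structure;
      - probability matching: sample a structure with probability p_c1.
    """
    if strategy == "averaging":
        return p_c1 * est_c1 + (1 - p_c1) * est_c2
    if strategy == "selection":
        return est_c1 if p_c1 >= 0.5 else est_c2
    if strategy == "matching":
        rng = rng or random.Random()
        return est_c1 if rng.random() < p_c1 else est_c2
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical example: integration estimate 10 m, segregation estimate 14 m,
# posterior probability of a common cause 0.75.
avg = final_estimate(10.0, 14.0, 0.75, "averaging")   # between 10 and 14
sel = final_estimate(10.0, 14.0, 0.75, "selection")   # commits to 10
```

Model averaging produces responses between the two sub-model estimates, whereas model selection and probability matching produce responses at one estimate or the other; this difference is one way the rules can be distinguished empirically.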

To illustrate the benefits of applying the BCI model to understand navigation in cue-conflicting situations, consider the conflicts that are created when a landmark is relocated in space to be in conflict with other stable cues (commonly referred to as landmark instability). A frequent finding is that navigators rely less on unstable landmarks (Auger et al., 2015, Chen et al., 2017, Roy et al., 2023, Sjolund et al., 2018; but see Zhao & Warren 2015a). However, at the process level, it remains unclear why unstable landmarks reduce navigators’ reliance on them and what cognitive processes landmark instability affects. From the perspective of the BCI model, one possible explanation is that navigators gradually acquire knowledge about landmark instability statistics through experience, leading to a decreased prior belief in a common cause for unstable landmarks (Roy et al., 2023). A decreased prior belief in a common cause leads to a decreased posterior probability of a common cause, which entails more frequent different-cause judgments. If navigators assign a lower weight to landmarks in the different-cause judgment than in the same-cause judgment, the observed landmark reliance would be reduced.

However, other possibilities exist. For example, reduced landmark reliance observed in behavior could be caused by a lower weight assigned to landmarks in the different-cause judgment, while the prior belief in a common cause remains unchanged. Additionally, landmark instability may increase the sensory noise of landmarks, which is typically reflected in poorer performance associated with landmarks (Auger et al., 2015, Biegler and Morris, 1993, Chen et al., 2017). The increased sensory noise lowers the weight assigned to landmarks in the common-cause judgment, as stipulated by the MLE rule of weighting cues by their relative reliabilities (Rohde et al., 2016). Reduced weight for landmarks in the common-cause judgment translates to reduced reliance on landmarks observed in behavior. In both cases, reduced reliance on landmarks emerges without changes in the prior belief in a common cause or the frequency of making this judgment.

In summary, in this concrete example, spatial cue conflict potentially influences multiple cognitive processes to cause a decrease in navigators’ dependence on unstable landmarks. The BCI model comprises parameters that reflect distinct cognitive processes, allowing us to pin down the specific processes that are affected. Furthermore, because the BCI model conceptualizes information as probability distributions that can be either continuous or discrete (see Section 4 “Cognitive Modeling” for details), it generalizes well across various navigation scenarios.

The BCI model not only accounts for navigation behavior in cue-conflict situations but also provides insights into cue combination suboptimality observed even when cue conflicts are minimal. Cue combination suboptimality refers to the situation in which the observed response precision is lower than the prediction of the MLE model (Rohde et al., 2016). While some navigation studies suggest that people can integrate visual spatial cues (featural landmark cues or geometric cues) and body-based self-motion cues in an optimal or near-optimal manner (Chen et al., 2017, Nardini et al., 2008, Sjolund et al., 2018), others report suboptimal cue combination effects. For example, cue combination suboptimality occurs between different types of self-motion cues (visual optic flow vs. proprioceptive cues) (Chrastil et al., 2019) and between different visual landmarks (Newman & McNamara, 2022). Suboptimal cue combination behavior is also commonly observed in other perceptual domains (refer to Section 2.7.2 in Rahnev & Denison, 2018 for a summary).
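Given single-cue response variabilities, the MLE benchmark against which suboptimality is judged can be computed directly. The sketch below is a minimal illustration; the numerical values are hypothetical and are not taken from any of the cited studies:

```python
def mle_predicted_sd(sigma1, sigma2):
    """Standard deviation the MLE model predicts for the combined-cue
    condition, given the single-cue standard deviations."""
    return (1.0 / (1.0 / sigma1**2 + 1.0 / sigma2**2)) ** 0.5

def is_suboptimal(observed_sd, sigma1, sigma2):
    """Cue combination is suboptimal when observed response variability in
    the combined-cue condition exceeds the MLE prediction derived from the
    single-cue conditions."""
    return observed_sd > mle_predicted_sd(sigma1, sigma2)

# Hypothetical single-cue standard deviations of 2 m and 4 m predict a
# combined-cue standard deviation of about 1.79 m; an observed value of
# 2 m would therefore count as suboptimal.
predicted = mle_predicted_sd(2.0, 4.0)
```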

One potential factor contributing to cue combination suboptimality is prior knowledge of stimulus distribution. Because this prior knowledge is shared across cue conditions, it causes correlated errors, reducing cue integration benefits (Oruç et al., 2003). The more precise the prior distribution of the target location, the stronger its influence, and the lower the gain in response precision from cue integration. When the stimulus distribution spans a relatively wide range and continuous responses are required, utilizing prior knowledge of stimulus distribution leads to a well-documented phenomenon known as the central tendency effect, wherein observers’ responses are biased towards the mean of the stimulus distribution (Hollingworth, 1910, Petzschner et al., 2015, Petzschner and Glasauer, 2011). Aston et al. quantified prior knowledge’s influence based on this effect and then excluded it from responses, uncovering the sensory cue integration process. However, this approach is constrained by the detectability of central tendency, which diminishes when the target range is narrow. Furthermore, with discrete distributions, such as those obtained in the reorientation paradigm (Cheng, 1986), the central tendency effect is challenging to quantify. In this case, the distribution mode should represent the central tendency, which can be complicated by multimodal distributions. Prior knowledge also shapes behavior in categorical tasks (Ratliff & Newcombe, 2008). In contrast, the BCI modeling approach offers broader applicability by accommodating prior knowledge beyond conditions that elicit the central tendency effect.
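As an illustration of the central tendency effect under a Gaussian prior, consider the posterior-mean estimator sketched below (all numerical values are hypothetical):

```python
def bayes_estimate(x, sigma_sensory, mu_prior, sigma_prior):
    """Posterior-mean estimate of a target given one noisy measurement.

    The estimate is a reliability-weighted average of the measurement and
    the prior mean, so responses regress toward the center of the stimulus
    distribution -- the central tendency effect.
    """
    w = sigma_prior**2 / (sigma_prior**2 + sigma_sensory**2)  # weight on data
    return w * x + (1 - w) * mu_prior

# Targets far from the prior mean are pulled inward more strongly (in
# absolute terms) than targets near it:
near = bayes_estimate(11.0, sigma_sensory=2.0, mu_prior=10.0, sigma_prior=2.0)
far = bayes_estimate(16.0, sigma_sensory=2.0, mu_prior=10.0, sigma_prior=2.0)
```

The narrower the target range (smaller `sigma_prior`), the stronger the pull toward the mean, yet the smaller the bias in absolute terms, which is why the central tendency effect becomes hard to detect with narrow target ranges.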

Besides prior knowledge of target distribution, some studies sought to explain suboptimal cue combination by attributing what cannot be explained by the MLE model to other forms of prior knowledge or prior preference (Byrne and Crawford, 2010, Kersten and Yuille, 2003, Qi and Mou, 2024). However, these studies often lack independent data to verify the use of the claimed prior knowledge or preference. The BCI model overcomes these problems by providing a unified framework that incorporates the dynamic interplay among multiple factors, including those contributing to suboptimal cue combination, with prior knowledge and prior belief as critical contributing factors. In doing so, the BCI model enhances our understanding of the broader question of spatial cue interaction, a central focus of navigation research.

Although the BCI model offers a valuable framework for understanding spatial navigation in both cue-conflicting and cue-congruent situations, its validity and robustness require evaluation through comparisons with alternative models. Previous studies have compared the BCI model to subsets of this model (e.g., full segregation or full integration) (de Winkel et al., 2017, de Winkel et al., 2018, Körding et al., 2007), to other Bayesian models (Körding et al., 2007), or to alternative variants of the BCI model (Badde et al., 2020, Wozny et al., 2010). However, none of these studies has compared the BCI model to a non-Bayesian model. A crucial element of Bayesian models is the use of prior information, which corresponds to the prior belief about causal structure and prior knowledge of the target distribution in the BCI model. In contrast, a non-Bayesian model should exclude prior information.

To address this gap, we propose the sensory disparity model, which employs a non-Bayesian causal inference mechanism (for details, see the Methods section). The primary distinction between this model and the BCI model lies in how causal structure judgments are made. The BCI model incorporates both sensory inputs and prior information (i.e., prior belief about causal structure and prior knowledge of the target distribution) in making causal structure judgments. In contrast, the sensory disparity model relies only on sensory inputs for making such judgments: the greater the absolute distance between sensory measurements from the two cue types, the lower the likelihood of a common-cause judgment. Furthermore, the sensory disparity model retains one key feature of the BCI model – the incorporation of sensory noise. Therefore, comparing the BCI model with the sensory disparity model provides a targeted test of the primary tenet of the BCI model, namely the use of prior information.
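For illustration only, one possible form of such a disparity-based rule is a sigmoid of the absolute distance between the two sensory measurements; the functional form and parameter values below are assumptions for illustration, not the exact specification given in the Methods section:

```python
import math

def disparity_common_cause(x1, x2, threshold, slope=1.0):
    """Hypothetical sensory-disparity rule (illustrative form only).

    The probability of a common-cause judgment falls as the absolute
    disparity between the two sensory measurements grows past `threshold`.
    Unlike the BCI model, neither prior belief about causal structure nor
    prior knowledge of the target distribution plays any role.
    """
    disparity = abs(x1 - x2)
    return 1.0 / (1.0 + math.exp(slope * (disparity - threshold)))

# At the threshold the rule is indifferent (probability 0.5); identical
# measurements yield a high common-cause probability, discrepant ones a low one.
p_at_threshold = disparity_common_cause(0.0, 3.0, threshold=3.0)
```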

The overarching objective of the current study is to investigate spatial cue conflicts in navigation. We were especially interested in how navigators decide whether discrepancies between spatial inputs arise from sensory-perceptual error or indicate distinct environmental causes, and how they select goals accordingly. To accomplish this objective, we applied the BCI model to a spatial navigation task and compared it with a non-Bayesian sensory disparity model.

To test these models, we developed a novel cue combination paradigm along a linear track, building on paradigms established in our previous work (Chen et al., 2019, Chen et al., 2024, Kuehn et al., 2018). The task required participants to localize target locations by using either a visual landmark or visual self-motion cues (i.e., optic flow). When using the visual landmark, they needed to estimate their distance to the landmark; when relying on visual self-motion information, they needed to estimate their distance from the starting position of self-movement.

This task is limited compared to real-life spatial navigation, as it probes only one aspect of spatial navigation – distance estimation. Terrestrial spatial navigation is typically carried out in a two-dimensional space, involving angular estimation, distance estimation, and vector computations. Even so, distance estimation is an essential element of spatial navigation. For example, straight-line distances between locations are an essential component of survey knowledge, or a cognitive map. The importance of distance estimation extends beyond navigation. For example, time estimation is closely intertwined with spatial distance estimation (Riemer et al., 2022, Umbach et al., 2020), as it is essentially distance estimation in the temporal domain. Hence, investigating one-dimensional spatial distance estimation can help elucidate basic processes of spatial navigation and other related topics such as time perception.

Additionally, distance estimation is ubiquitous in real-life navigation, where cue conflicts often occur. Imagine navigating an unfamiliar city to find a café. You first follow a specific route, judging the distance you've traveled in a fixed direction to determine when to make a turn. Along the way, you also use a landmark, such as a vendor's booth in an outdoor market, to confirm you are nearing your destination by estimating your distance to the landmark. Confusion arises when the vendor moves to a different location. At this point, you must decide whether to rely on the distance you believe you've traveled along the route or adjust your judgment based on the perceived distance to the landmark. This scenario illustrates the challenges of reconciling conflicting spatial cues in distance estimation during navigation.

Moreover, our recent fMRI studies have demonstrated that linear navigation tasks engage key brain areas for spatial navigation, including the retrosplenial cortex, hippocampus, and entorhinal cortex (Chen et al., 2019, Chen et al., 2024, Chen et al., 2025, Chen et al., 2022). Linear navigation tasks are also widely used in electrophysiological studies on spatial navigation (Fischer et al., 2020, Mao et al., 2017, Saleem et al., 2018, to name a few). Hence, investigating spatial distance estimation in humans can facilitate inter-species comparisons and enhance our understanding of the cognitive-neural mechanisms underlying spatial navigation.

Finally, the use of the linear track navigation task allowed us to collect a substantial amount of data, which is essential for distinguishing complex cognitive models of a task (Lerche et al., 2017). This strength of the paradigm aligns with one of our objectives: to rigorously evaluate competing models for spatial cue conflicts.

In summary, this study examined how individuals resolve spatial cue conflicts during navigation in a linear track navigation task using the cognitive modeling approach. Our aim was to provide insights into the mechanisms underlying spatial cue interactions.
