The problem of extracting a complete system of invariant features of a signal arises in signal processing (including image processing) as well as in automatic pattern recognition, classification, and diagnostics. The information about the characteristics of the signal must be separated from the information about the transformations that the signal has undergone. These transformations (e.g. shift, image rotation, scale change) cannot be controlled: a visual system must identify an image independently of its location, and the transformations should not affect the performance of the system. Therefore, images that pass into each other under such transformations must be classified as equivalent.
It should be noted that the human or mammalian visual system is to some extent capable of extracting the invariant features of a signal. It took some effort to understand the functions necessary for visual form perception. Based on the knowledge about receptive field (RF) responses and about the visual system in Glezer (1995); Kulikowski and Bishop (1981); Lindeberg (2013); Shapley and Lennie (1985); Schnitzler (1976), a theory was developed according to which the visual system performs spatial-frequency image filtering. To adapt this effect to general signal processing we have to understand how this filtering works. Let us consider in more detail the visual pathway from the retina to the cerebral cortex of primates. Visual information enters the retina and is transmitted to the brain through about a million nerve fibers that are united in the optic nerve (Jeffries et al., 2014; Hubel & Wiesel, 1977). Most of the optic nerve fibers reach without interruption two cell nuclei located deep in the brain, called the lateral geniculate nuclei (LGN). In turn, neurons of the LGN send their axons directly to the primary visual cortex (Jeffries et al., 2014; Ghodrati et al., 2017).
The RF of a retinal ganglion cell refers to the synaptic network of photoreceptor, bipolar, horizontal, and amacrine cells that converge onto this one ganglion cell (Hubel & Wiesel, 1977). A concentric RF has either a central zone where stimulation of the receptors evokes a response and a peripheral inhibitory ring (off-on), or conversely an inhibitory central zone and a peripheral ring that evokes a response (on-off). Concentric fields are used to describe the image pointwise (Hubel & Wiesel, 1977). Mathematically, the spatiotemporal RF is defined by an impulse response function (weight function) describing the firing-rate response to a tiny spot of light switched on for a very short time (Mobarhan et al., 2018). Several methods for modelling these weight functions have been developed: differences of excitatory and inhibitory Gaussians (Mobarhan et al., 2018; Eiber et al., 2018; Einevoll & Heggelund, 2000; Lindeberg, 2021; Jacobsen et al., 2016; Bertalmío, 2019), Gabor elements (Paik & Ringach, 2011; Lee et al., 2016; Cope et al., 2009), and others (Li et al., year; Bertalmío, 2019).
Retinal ganglion cells project sensory information to the lateral geniculate neurons of the thalamus. LGN neurons replicate the center-surround structure of their presynaptic partners (Hubel & Wiesel, 1962). Yet, this does not mean that the thalamic fields are direct copies of those in the retina. A single LGN neuron may receive inputs from multiple ganglion cells, hence the spatial information sent from the retina is remixed (Alonso et al., 2006). The RF of an LGN neuron overlaps the ON (or OFF) sub-regions of the RFs of retinal cells through feedforward or feedback excitation (Usrey et al., 1999; Lian et al., 2019). Hence RFs of the LGN describe spatial patterns of light and dark regions around an average illuminance level in the visual field (Hubel & Wiesel, 1977; Lian et al., 2019; Liu et al., 2021). The neural networks of the visual cortex do not present a contour picture of the incoming image as it exists on the retina (Glezer, 1995). For effective processing of incoming signals it is therefore necessary to reduce the information redundancy, which, based on known studies, can be assumed to be realized in the LGN (Zabbah et al., 2014). Let us look at this process in more detail.
Step 1. The first transformation of information in the visual system is already threefold: information is transmitted from the photoreceptors through horizontal cells to the bipolar cells (Barnes et al., 2020). The optical system of the eye projects the image onto the retinal layer of photoreceptors, while a horizontal cell summarizes the arousal of a large number of photoreceptors. Then the average of the summarized signals is subtracted (via the inhibitory feedback signal) from the signals coming from the photoreceptors to the bipolar cells (Yeonan-Kim & Bertalmío, 2016). This is possible because the horizontal cells connect the photoreceptors and bipolar cells with sufficiently long connections running parallel to the layers of the retina (Glezer, 1995; Hubel & Wiesel, 1977). The human eye can operate over a wide range of illumination: about 11 orders of magnitude, whereas individual neurons in the retina, as well as cells at higher levels, can change their activity within only 2 orders. This small range is one of the components of the adaptation process to the level of illumination (Zapp et al., 2022). After the adaptation a new zero level is created (Barnes et al., 2020). The distribution of the illumination can now be described in terms of areas which are brighter or darker relative to the average level (Glezer, 1995; Yeonan-Kim & Bertalmío, 2016; Barnes et al., 2020). Hence, in the first step we characterize the field of vision by M input signal points \(y_\text{in}(i)\), \(i=1,\ldots,M\). We model the distribution function y(i) of the illumination transmitted to the receptive fields of the retinal ganglion cells (the first transformation of the signal) by
$$\begin{aligned} y(i)=y_\text{in}(i)-\overline{y_\text{in}},\quad i=\overline{1,M}, \end{aligned}$$
where \(\overline{y_\text{in}}=\sum_{i=1}^{M} y_\text{in}(i)/M\) is the average illumination over the visual field. Contacts of bipolar cells with ganglion cells are mediated by amacrine cells that play the role of interneurons, and in the transition from bipolar to ganglion cells the arousal is transformed from an analog impulse into a digital one, which justifies the use of discrete functions to simulate the processes in the visual system. Applying this transformation to the incoming signal, we obtain invariance under illumination variations (Lindeberg, 2013).
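As a minimal sketch (our own illustration, not code from the paper), the Step-1 transformation is just a subtraction of the mean; the function name `zero_level` is our own choice. Subtracting the average makes the result invariant under a uniform change of illumination:

```python
import numpy as np

def zero_level(y_in):
    """Step-1 model: subtract the average illumination over the visual field."""
    y_in = np.asarray(y_in, dtype=float)
    return y_in - y_in.mean()

# Adding a constant illumination level c leaves the output unchanged.
y = np.array([3.0, 5.0, 4.0, 8.0])
shifted = zero_level(y + 10.0)   # uniform brightening by c = 10
```

By construction the output always sums to zero, i.e. the new zero level is the average illumination.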
Step 2. The concentric RF responds with a certain number of impulses per unit time (Hubel & Wiesel, 1977), i.e., with discrete values. Consequently, an adequate simulation should use functions of discrete arguments. The LGN is increasingly regarded as a “smart-gating” operator for processing visual information in which relatively elaborate features are computed early in the visual pathway (Tang et al., 2016). It was shown that at high contrasts neurons of the LGN are tuned to higher frequencies, which enhances the use of high frequencies in the analysis of images (Glezer, 1995). As contrast increases, the RFs overlap and the response function becomes alternating with respect to the average illuminance level (Glezer, 1995). For efficient coding in digital image processing linear filters should be orthogonal (Cho & Choi, 2014). Therefore, in Cho and Choi (2014) it was proposed to describe the RF of a retinal ganglion cell in terms of an orthogonal wavelet basis. Since the structure of the LGN inherits the structure of the retinal RF, we can assume the efficiency of using orthogonal basis functions for modeling the LGN RF.
Using experimental data on the visual system (Glezer, 1995; Hubel & Wiesel, 1977), it was suggested to describe the weight functions of the RF of the LGN by Krawtchouk functions (Vainerman & Filimonova, 1994; Vainerman, 1992). Polynomials which are orthogonal on the set of points \(\{1,2,\ldots,N\}\) with respect to the binomial distribution were introduced by Mykhailo Krawtchouk, hence the name Krawtchouk functions. They are also called normalized Krawtchouk polynomials (Vainerman, 1992; Al-Utaibi et al., 2021; Jahid et al., 2017; Karmouni et al., 2017), in analogy with Hermite functions, referring to the fact that Krawtchouk polynomials are the discrete analogues of Hermite polynomials (Krawtchouk, 1929; Area et al., 2014); weighted Krawtchouk polynomials (Venkataramana and Raj, 2011; Idan et al., 2020; Yap et al., 2003); weighted and normalized Krawtchouk polynomials (Zhang et al., 2010; Abdulhussain et al., 2018; Asli & Flusser, 2014); or Krawtchouk functions (den Brinker, 2021).
To give a precise description of these functions we recall the definition of the binomial distribution \(\rho(i)\in\mathbb{R}\):
$$ \rho(i) = \frac{N!}{i!(N-i)!}\, p^{i}(1-p)^{N-i}, \quad i=1,2,\ldots,N, $$
where \(N\in\mathbb{N}\) and \(p\in\mathbb{R}\) with \(0< p <1\) are given beforehand. We introduce the following notation for the Krawtchouk polynomials: \(K_n^{(p)}(i,N)\), where n is the order of the polynomial and p plays the role of a coefficient of asymmetry. To compute the Krawtchouk polynomials we can use the following recurrence relation (Vainerman, 1992):
$$\begin{aligned} K_0^{(p)}(i,N)&=1,\qquad K_1^{(p)}(i,N)=i-pN,\\ K_{n+1}^{(p)}(i,N)&=(i-n-p(N-2n))\frac{K_n^{(p)}(i,N)}{n+1} - p(1-p)(N-n+1)\frac{K_{n-1}^{(p)}(i,N)}{n+1}, \end{aligned}$$
with \(n,i=1,\ldots,N\).
The orthogonality relation for the Krawtchouk polynomials reads
$$\begin{aligned} \sum_{i=1}^{N} K_n^{(p)}(i,N)K_m^{(p)}(i,N)\rho(i) = \delta_{nm}\, N!\,\frac{p^{n}(1-p)^{n}}{n!(N-n)!}. \end{aligned}$$
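The recurrence and the orthogonality relation above can be checked numerically. The sketch below is our own illustration (function names are our own, not from the paper); it uses the standard binomial support \(i=0,1,\ldots,N\), on which the weighted orthogonality holds exactly:

```python
import numpy as np
from math import comb

def krawtchouk(n, N, p, i):
    """Krawtchouk polynomial K_n^{(p)}(i, N) via the three-term recurrence
    K_0 = 1, K_1 = i - pN (vectorized over the evaluation points i)."""
    i = np.asarray(i, dtype=float)
    K_prev = np.ones_like(i)                 # K_0
    if n == 0:
        return K_prev
    K = i - p * N                            # K_1
    for m in range(1, n):                    # build K_{m+1} from K_m, K_{m-1}
        K, K_prev = ((i - m - p * (N - 2 * m)) * K
                     - p * (1 - p) * (N - m + 1) * K_prev) / (m + 1), K
    return K

N, p = 8, 0.3
i = np.arange(N + 1)
rho = np.array([comb(N, k) * p**k * (1 - p)**(N - k) for k in i])  # binomial weights
norm = lambda n: comb(N, n) * (p * (1 - p))**n   # N! p^n (1-p)^n / (n!(N-n)!)

def inner(n, m):
    """Weighted inner product <K_n, K_m>_rho from the orthogonality relation."""
    return float(np.sum(krawtchouk(n, N, p, i) * krawtchouk(m, N, p, i) * rho))
```

For distinct orders the weighted inner product vanishes, and for equal orders it reproduces the normalization constant on the right-hand side of the relation.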
For applications the property of symmetry is important:
$$\begin{aligned} K_n^{(p)}(i,N) = (-1)^{n} K_n^{(1-p)}(N-i,N). \end{aligned}$$
(1)
By analogy with the Hermite functions, the corresponding Krawtchouk functions are determined as
$$\begin{aligned} F_n^{(p)}(i)=K_n^{(p)}(i)\sqrt{\rho(i)}\Big/\sqrt{N!\,p^{n}(1-p)^{n}/\bigl(n!(N-n)!\bigr)}, \end{aligned}$$
(2)
where \(N\in\mathbb{N}\) and \(p\in\mathbb{R}\), \(0< p <1\), are set in advance (Vainerman, 1992) (Fig. 1). Krawtchouk polynomials are orthogonal but not normalized; for the Krawtchouk functions, however, we have
$$\begin{aligned} \sum_{i=1}^{N} F_n^{(p)}(i,N)F_m^{(p)}(i,N)=\delta_{nm}. \end{aligned}$$
(3)
We also obtain a recurrence relation for the orthonormalized Krawtchouk functions \(F_n\):
$$\begin{aligned} F_0^{(p)} =&\ \sqrt{N!\,p^{i}(1-p)^{N-i}/\bigl(i!(N-i)!\bigr)},\\ F_1^{(p)} =&\ (i-pN)\sqrt{p^{i-1}(1-p)^{N-i-1}(N-1)!/\bigl(i!(N-i)!\bigr)},\\ F_{n+1}^{(p)} =&\ (i-n-p(N-2n))\frac{F_n^{(p)}}{\sqrt{p(1-p)(n+1)(N-n)}}\\ &- \sqrt{\frac{n(N-n+1)}{(n+1)(N-n)}}\,F_{n-1}^{(p)}(i,N), \end{aligned}$$
where \(i=\overline{1,N}\), \(n=\overline{1,N-1}\), \(0< p <1\). In Al-Utaibi et al. (2021); Jahid et al. (2017); Karmouni et al. (2017); Idan et al. (2020); Yap et al. (2003); Zhang et al. (2010); Abdulhussain et al. (2018); Asli and Flusser (2014); Venkataramana and Raj (2011) recurrence algorithms for fast computation of the Krawtchouk functions are presented.
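The recurrence for \(F_n\) can be validated by generating the full set of functions and checking the orthonormality relation (3). The sketch below is our own illustration (names are our own); it uses the support \(i=0,\ldots,N\), on which \(F_0=\sqrt{\rho}\) and the Gram matrix of the basis is the identity:

```python
import numpy as np
from math import comb

def krawtchouk_functions(N, p, n_max):
    """Rows F[n] = F_n^{(p)}(i, N), i = 0..N, built by the normalized recurrence."""
    i = np.arange(N + 1, dtype=float)
    rho = np.array([comb(N, int(k)) * p**k * (1 - p)**(N - k) for k in i])
    F = np.zeros((n_max + 1, N + 1))
    F[0] = np.sqrt(rho)                                  # F_0 = sqrt(rho)
    if n_max >= 1:
        F[1] = (i - p * N) * F[0] / np.sqrt(N * p * (1 - p))
    for n in range(1, n_max):
        F[n + 1] = ((i - n - p * (N - 2 * n)) * F[n]
                    / np.sqrt(p * (1 - p) * (n + 1) * (N - n))
                    - np.sqrt(n * (N - n + 1) / ((n + 1) * (N - n))) * F[n - 1])
    return F

N, p = 12, 0.4
F = krawtchouk_functions(N, p, N)      # full basis: n = 0..N
G = F @ F.T                            # Gram matrix; should be the identity
```

Because the recurrence produces already-normalized functions, no explicit division by the norm constant is needed at the end.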
The symmetry relation Eq. 1 for the polynomials turns into an analogous formula for the Krawtchouk functions
$$\begin{aligned} F_n^{(p)}(i,N) = (-1)^{n} F_n^{(1-p)}(N-i,N). \end{aligned}$$
(4)
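The symmetry relation (4) is easy to confirm numerically. In this self-contained sketch (our own illustration, on the support \(i=0,\ldots,N\)) the functions are obtained by normalizing the polynomials directly, as in Eq. (2), rather than via the function recurrence:

```python
import numpy as np
from math import comb

def F(n, N, p):
    """Krawtchouk function F_n^{(p)}(i, N) on i = 0..N, built from Eq. (2)."""
    i = np.arange(N + 1, dtype=float)
    # K_n by the three-term recurrence K_0 = 1, K_1 = i - pN
    K_prev, K = np.ones(N + 1), i - p * N
    for m in range(1, n):
        K, K_prev = ((i - m - p * (N - 2 * m)) * K
                     - p * (1 - p) * (N - m + 1) * K_prev) / (m + 1), K
    if n == 0:
        K = K_prev
    rho = np.array([comb(N, int(k)) * p**k * (1 - p)**(N - k) for k in i])
    return K * np.sqrt(rho) / np.sqrt(comb(N, n) * (p * (1 - p))**n)

N, p, n = 10, 0.3, 3
lhs = F(n, N, p)                       # F_n^{(p)}(i, N)
rhs = (-1)**n * F(n, N, 1 - p)[::-1]   # (-1)^n F_n^{(1-p)}(N - i, N)
```

Reversing the array evaluates the function at \(N-i\), so `lhs` and `rhs` coincide elementwise.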
In Fig. 1 the first four Krawtchouk functions \(F_n^{(p)}\), \(n=1,2,3,4\), are shown for different coefficients of asymmetry \(p=0.1, 0.5, 0.9\).
Fig. 1 The first four Krawtchouk functions for values a) \(p = 0.1\), b) \(p = 0.5\), c) \(p = 0.9\); \(i = \overline{1,N}\)
Fig. 2 The inclusion of saccades in the model of the visual system
Step 3. Mathematically, the model of the RF is defined by impulse response weight functions. Assuming linearity, the response of the RF to any input signal y(i) can be found by convolving it with these weight functions (Mobarhan et al., 2018; Lindeberg, 2021; Liu et al., 2021; Cho & Choi, 2014). As mentioned in the previous step, we model the weight functions with Krawtchouk functions. The main specific feature of our model is the introduction of a shift operator which we use to parameterize the basis functions. The idea is the following: the system receives a signal y(i) with an impulse component of unknown location \(a_0\), a hidden variable which will be determined later by our algorithm. We calculate the generalized spectral coefficients of the signal \(y(i,a_0)\) with respect to a set of basis functions \(\{F_n^{(p)}(i-a,N),\ n=\overline{1,N}\}\) for all possible values \(a=\overline{1,N}\) of the shift. Applying a standard procedure from the theory of image recognition, we glue the ends of the visual field. The novelty of our approach lies in the resulting set of linear transformations given by the functions \(F_n^{(p)}((i-a)\ \text{mod}\ N,\, N)\), \(a=\overline{1,N}\), \(0< p < 1\). In this case, we get the maximum response exactly when the value of the shift a coincides with the hidden value \(a_0\) or, more precisely, with the temporary location of \(y(i,a_0)\) (Fig. 2).
In addition, the Krawtchouk functions contain the parameter p, which determines their degree of asymmetry and allows us to find the basis that best matches the shape of the signal \(y(i,a_0)\). In the visual system, the shift of the concentric RF (and, accordingly, of the basis functions) across the visual field can be related to eye saccades. Eye saccades are fast eye movements that provide an oriented shift of the gaze toward the location of an object. By positioning the image of an object on the retinal region of highest acuity, the fovea, the eye saccades increase the number of neurons activated by the presence of the object; a recruitment which likely contributes to improving its identification and localization (Goffart, 2017; Sarrabezolles et al., 2017) (Fig. 2). The eye saccades depend on the relative properties of the visual targets (e.g. size and energy): their landing position adjusts to the larger or the brighter target in the visual field (Heeman et al., 2014). Thus, among all possible responses of saccadic eye movements, one can choose the position of the RF oriented to the larger or brighter target, i.e. the one with the maximum energy. We include saccadic eye movements in our model of the visual system because they make it possible to separate information about the location of an object from information about its shape; in other words, they ensure the invariance of object recognition with respect to location (Fig. 2). The site of saccadic suppression is likely to be the LGN (Ghodrati et al., 2017), which agrees completely with our model. We assume that saccades are not so much suppressed in the LGN as used to determine the position of an object, which exhausts their role in the processing of visual information.
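The shift-matching idea of Step 3 can be sketched as follows (our own illustration, not the paper's implementation): cyclically shift a small set of basis functions over the field, compute the spectral coefficients at each shift, and pick the shift with the maximum energy. The basis is built by the normalized recurrence on the support \(i=0,\ldots,N\), and the test signal is itself a shifted copy of \(F_0\), so the energy peaks exactly at the hidden location \(a_0\):

```python
import numpy as np
from math import comb

def krawtchouk_functions(N, p, n_max):
    """Rows F[n] = F_n^{(p)}(i, N), i = 0..N, via the normalized recurrence."""
    i = np.arange(N + 1, dtype=float)
    rho = np.array([comb(N, int(k)) * p**k * (1 - p)**(N - k) for k in i])
    F = np.zeros((n_max + 1, N + 1))
    F[0] = np.sqrt(rho)
    if n_max >= 1:
        F[1] = (i - p * N) * F[0] / np.sqrt(N * p * (1 - p))
    for n in range(1, n_max):
        F[n + 1] = ((i - n - p * (N - 2 * n)) * F[n]
                    / np.sqrt(p * (1 - p) * (n + 1) * (N - n))
                    - np.sqrt(n * (N - n + 1) / ((n + 1) * (N - n))) * F[n - 1])
    return F

N, p, a0 = 16, 0.5, 5
basis = krawtchouk_functions(N, p, 3)          # only the first few functions
y = np.roll(basis[0], a0)                      # signal with hidden location a0

def energy(a):
    """Sum of squared spectral coefficients for the basis cyclically shifted by a."""
    shifted = np.roll(basis, a, axis=1)        # glue the ends of the visual field
    return float(np.sum((shifted @ y) ** 2))

E = np.array([energy(a) for a in range(N + 1)])
a_hat = int(np.argmax(E))                      # recovered location of the impulse
```

Note that the full basis would make the energy shift-invariant by Parseval's identity; it is precisely the truncation to a few low-order functions that makes the energy peak at the matching shift.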
Hence we end up with generalized spectral coefficients \(c_n^{(p)}(a,a_0)\) of the following form