New tools for the investigation of muscle fiber-type spatial distributions across histological sections

Motivating examples and data processing

To facilitate the ensuing discussion, we consider three example data sets (Fig. 1). These specimens represent cross-sections of the soleus muscles of three different mice, sections having been histochemically stained to identify fast fibers. The mouse soleus comprises primarily a mix of type I and type IIA fibers, so treatment of the data as a set of binary indicators is appropriate in this case. A casual visual inspection of the leftmost muscle suggests an area in the top left of the section with mainly slow fibers, and the bottom left has mainly fast fibers, but the rest of the section has no obvious patterns or clustering. The center example appears to have a higher proportion of slow fibers near the center of the section, with the fast fibers mostly grouped together around the edges. The rightmost specimen has an even distribution of both types across the whole section, with very little visually obvious grouping together of fibers of the same type.

Fig. 1figure 1

Digitized, geometrically treated versions of the three example sections, shown as tessellations (top row) or triangulations (bottom row)

In experimental settings, stained muscle sections are typically photographed through a microscope and digitally processed. The data required for statistical analysis are two-dimensional coordinates—\(\left(x,y\right)\)—that provide the centroid of each fiber, along with the classified fiber type. Using this information, standard geometric methods are used to derive a polygon approximating the overall shape of the muscle and to identify spatial neighbors of each fiber—yielding a “neighbor network”. Figure 1 shows two different ways of visualizing the result using our three example sections: a tessellation and a triangulation. The latter is preferable if explicit visualization of the neighbor network is desired. Additional details on the geometric techniques mentioned here appear in the supplementary materials.

Summary- and test-based methods

Here, we briefly review three existing methods developed as simple summaries and hypothesis tests for binary (i.e., fast/slow) muscle fiber data. While fast and convenient, they are focused on answering “global” questions related to type-specific randomness (or lack thereof) across the entire sample.

Mean cluster size

One of the first targeted statistical tests involved the analysis of the size of clusters of like-type fibers. Howel and Brunsdon [11] considered a null hypothesis of random scattering of a chosen fiber type across the muscle section. They implemented an iterative search algorithm to establish the mean cluster size (where “cluster” is simply defined as a contiguous group of like-type fibers) of the less prevalent fiber type. This statistic is compared against a 95% permutation envelope, empirically derived assuming randomness. If the mean cluster size for the fiber type of interest is abnormally large or small, this suggests that fibers tend to “attract” or “repel,” respectively, others of the same type. Exceedance by the test statistic of the permutation envelope on either side suggests statistically significant evidence against randomness. This approach is assumed reliable when the less prevalent fiber type makes up less than 30% of the fibers in the sample, which may not always be the case in practice.

Unlike neighbor pairs

A hypothesis test of a similar nature was developed by Venema [13]. He suggested using the number of unlike neighbor pairs as a test statistic. The rationale is that a section where the arrangement of the two fiber types is random should have a mix of pairs of neighbors of both slow, both fast, and one of each type. Strong like-type clustering in the section would result in fibers tending to appear beside other fibers of the same type, so there would be more pairs of neighboring fibers of the same type and hence fewer unlike neighbor pairs.

Venema [13] demonstrated the calculation of the mean and variance of the number of unlike neighbor pairs under a null hypothesis of randomness. Using the observed number of unlike neighbor pairs to this distribution, a test statistic can be found that is negative when the sample specimen exhibits like-type clustering and positive for like-type segregation. A p value can then be calculated to determine the statistical significance of any departure from randomness.

Abnormally grouped fibers

A more recent approach [18] involves calculating the proportion of “abnormally grouped” slow fibers as a descriptive statistic. Under the assumption of a random distribution, for each slow fiber, we calculate the mean and standard deviation for the number of other slow-type neighbors. If there are at least two neighboring fibers in a cluster where the number of other slow fibers neighboring each of these exceeds one standard deviation above the mean, all fibers in that cluster are subjectively classified as abnormally grouped.

Unlike the other methods described above, these proportions do not represent a formal statistical test and hence cannot be usefully interpreted in terms of whether the distribution is random, but are instead designed to be compared between different categories of sections with similar overall type-specific proportions in order to summarize differences in like-type behavior between groups. The authors of this approach observed heightened slow fiber grouping in older human muscles (60–75 years) compared to younger muscles (20–35 years). Unfortunately, the statistic is highly sensitive to the overall proportion of slow fibers since a higher proportion of slow fibers naturally leads to larger clusters. As such, only muscles with similar proportions of slow fibers should be compared using this statistic. See the supplementary material for further elucidation.

Model- and smoothing-based methods

Useful though they are, test-based methods like those discussed above have limited inferential scope. For example, hypothesis tests are not able to provide readily interpretable measures of strength or magnitude of any like-type behavior (merely answering a “yes/no” question of spatial randomness), nor are they able to offer insight into specific spatial structure (i.e., in terms of where in a given section like-type fibers might be more likely to cluster). For these types of questions, we need more sophisticated methods. Fortunately, there are statistical tools that lend themselves well to such analyses and we will review them here in the context of binary muscle fiber data.

Binary Markov random field

Venema [15, 16] was the first to suggest a model-based approach to the analysis of binary-valued muscle fiber data. His main proposal was to consider the two types of fibers as arising from the simplest form of a binary Markov random field (BMRF). Suppose the \(n\) fibers in a particular muscle section are defined as \(\mathbf=\_,\dots ,_\}\), where \(_\) is \(-1\) if the \(i\) th fiber is slow, or \(+1\) if it is fast. These are treated as a realization of a corresponding random variable \(\mathbf\). The BMRF is used to model the probabilities of different combinations of \(\mathbf\); expressed as a function of the fiber values with respect to the neighbor network:

$$}\left(\mathbf=\mathbf\right)\propto \mathrm\left\_+\beta \sum___\right\}$$

(1)

Here, \(\alpha\) and \(\beta\) are fixed, and the notation \(i\sim j\) indicates the second sum is taken only over each neighbor pair.

The key to understanding Eq. (1) is the interpretation of the parameters \(\alpha\) and \(\beta\). The former simply describes the balance between the numbers of slow and fast fibers—a value close to zero implies the counts are roughly equal; a negative value implies there are more slow than fast; a positive value yields more fast than slow. The interest lies typically in the value of the other parameter, \(\beta\), which controls the behavior between neighbors. When \(\beta\) is close to zero, this is equivalent to a “null” scenario where fast and slow types are randomly scattered throughout the sample. A negative value of \(\beta\) implies fibers of the same type are less likely to be neighbors—that they are negatively correlated spatially—like-type repulsion. A positive value of \(\beta\) dictates positive correlation—like-type attraction. Repulsion leads to a chequerboard effect, and attraction yields clusters of fibers of the same type. For details on the methods used to gain the parameter estimates of \(\alpha\) and \(\beta\) in practice, see Venema [16] or Davies et al. [17]. The estimated value of \(\beta\) may be compared to a numerically derived permutation envelope, which essentially acts as a benchmark for establishing whether it is significantly different from zero.

The BMRF approach thus has several advantages over the simpler tests. Not only may we ascertain the presence or absence of non-randomness in the spatial arrangement of the two fibers, but the strength and direction of any present like-type neighborly behavior may be formally quantified via the estimated value of \(\beta\) (and this interpretation is valid for any overall proportion of slow versus fast fibers).

Generalized additive model

One thing the simple BMRF described above cannot do is inform on explicit spatial patterning in subregions of a given muscle cross-section; i.e., localize precisely where in a specimen we might observe features of interest. Rather than simply deducing whether or not a given section possesses like-type clustering, it may be of interest to model the probability of a chosen fiber type occurring and look at how this probability varies spatially across the section, especially in muscles where the typical distribution of fibers is non-uniform [3].

The probability \(_\) that a given fiber \(i\) at coordinate \(\left(_,_\right)\) is fast can be modeled to be a smooth function of its position in the muscle section through the use of a generalized additive model (GAM) [21]. To our knowledge, GAMs have not previously been used to model spatial patterning in muscle fiber position data. A GAM is an extension of a generalized linear model (GLM) that allows the relationship between the predictors and the response to be smooth but non-linear. In our case, we model the log odds of a given fiber being fast via a logistic GAM, expressed as follows:

$$\mathrm\left(\frac_}_}\right)=_+__+__+\sum_^__\left(_,_\right)$$

(2)

This equation is very similar to the form of a logistic regression model but with terms added to allow for greater flexibility in the resulting trend surface. The functions \(_,\dots _\) are non-linear and are chosen to allow the resulting function to closely match the patterns in the data. A common choice is a thin plate spline regression basis [22]. Estimation of the \(\beta\) parameters (note that these are completely different to the \(\beta\) used in the BMRF method) relies on a “penalty” parameter we call \(\lambda\), which controls the “smoothness” of the overall result. There are data-driven techniques to choose an optimal \(\lambda\). For more details, see the supplement and the work by Wood [21, 22].

Once we have fitted a logistic GAM to a muscle section data set, we can visualize the trend in fiber type probabilities across the section, revealing potential features of interest. Unlike the previous methods outlined above, fitting a GAM does not require explicit knowledge of the neighborhood structure. However, the natural effect of smoothing means the probability for a given fiber will be influenced mainly by nearby fibers.

Multinomial GAM

A weakness of the methods described is that they are designed for data that are binary. It is in general not straightforward to extend the methodology to cases where there are more than two classes, for example, if we were to further classify type II fibers into subtypes IIa and IIb. The exception is the GAM which can easily be extended to allow for three (or more) fiber types. A multinomial GAM [21] has a categorical response variable with a probability associated with each category. These probabilities are described in a similar way to Eq. (2) and use the same functions \(_,\dots _\) from the thin plate regression spline basis. For three fiber types (I, IIa, and IIb), we construct two surfaces each with its own set of \(\beta\) parameters and smoothing parameter \(\lambda\). Because the probabilities are required to sum to 1 for each fiber, these two surfaces are used to determine probabilities for each of the three types.

Implementation

All of the above methods can be implemented in standard statistical software such as the R language [23]. We have developed a browser-based application [24] to showcase these techniques and produce relevant plots and conclusions for a supplied dataset. The user needs to supply a CSV file with columns for the x coordinate, y coordinate, and fiber type for each fiber. The app will pre-process the data and produce plots such as those in Fig. 1. Available options in the app include the mean cluster size and number of unlike neighbor pairs tests, as well as the “proportion abnormally grouped” statistic. It can also estimate and interpret the parameters for the BMRF model. For the logistic GAM model, the app calculates and plots the probabilities of each fiber being fast contracting/type II. If desired, the user can manually vary the value of the smoothing parameter \(\lambda\) from a standard default and visualize the resulting probability surface. The multinomial GAM is also available if the supplied data file has three fiber types, and the app calculates and plots the probability surfaces for each type. The application is accessible at https://www.stats.otago.ac.nz/software/muscles/ (uploaded data is not visible to any party beyond the user themselves and is permanently erased upon browser close or refresh).

留言 (0)

沒有登入
gif