Approximate simulation of cortical microtubule models using dynamical graph grammars

1.1. Overview

Dynamic graphs are graphs that change over time and are capable of encoding the changing state of complex systems. Graphs with local dynamics provide a mathematical framework for understanding changing relationships between objects. We can formulate and enable the use of dynamic graphs by providing a high-level language for their dynamics. Dynamical graph grammars (DGGs) [1] provide an expressive and powerful way to declare a set of local rules to model a complex, dynamic system with graphs. The DGG formalism is flexible and allows a wide array of models in the natural sciences to be created.

DGGs have well-defined meaning. They map graph dynamics into a master equation, a set of first order linear differential equations governing the time evolution of joint probability distributions of state variables of a dynamic system. Using operator algebra [2], DGGs can be simulated using an exact algorithm that subsumes Gillespie's stochastic simulation algorithm (SSA) [3], which is closely related to Kinetic Monte Carlo algorithms of statistical physics [4]. As does the SSA, the exact algorithm becomes slow for large systems. Using operator splitting, a faster approximate algorithm for spatially embedded graphs can be derived. In section 2 we discuss the preliminary work done on the approximate algorithm, and areas for improvement.

In this work we present a paradigm for model creation and demonstrate the utility provided by DGGs. In particular, we focus on one specific example in biology, plant cell division, and one particular system in that process, the cortical microtubule array (CMA). We further restrict our graphs, and require them to be spatially embedded in Euclidean space.

1.2. Biological motivation

Eukaryotic organisms comprise complex cells with many subsystems that are well suited to be modeled with dynamic graphs. Over time cells can divide allowing for a plant to grow, among other processes. Understanding the exact biomechanical mechanisms taking place during cell division in plants is a long-standing question [5], but it is known that there is a connection between division plane orientation in plants and a change in the orientation of cortical microtubules (MTs) associated with the plasma membrane [6], as they form the pre-prophase band (PPB). For example, one hypothesis for the PPB orientation process is 'survival of the aligned' [7], and another is alignment through selective katanin mediated severing [8]. The ensemble of MTs associated with the plasma membrane of the cell is the CMA. The question of how MTs contribute to cell shape and other processes during cell division motivated us to develop a simplified model for the dynamics found in the CMA, with potential to extend this work to larger systems with more complicated dynamics and interacting networks at different spatial scales.

1.3. Stochastic chemical kinetics

The SSA is an example of stochastic modeling, as opposed to the deterministic modeling approach [9]. In a deterministic approach, the time-continuous processes are wholly predictable and can be governed by the reaction-rate equations [10], a set of coupled ordinary differential equations (ODEs). The stochastic approach is a type of random-walk process that is completely encoded in the master equation. The master equation is itself a high dimensional linear differential equation, governing the rate at which probability flows through different states in the system. However, systems can become very large due to an exponential state space explosion with respect to the number of biological variables, and the systems may have infinite dimensional state spaces, making the analytical solution to the master equation computationally intractable or impossible.

The work of Gillespie [11] uses the Monte Carlo method and kinetic theory to rigorously derive the exact SSA for chemical kinetics. The derivation rests on several assumptions about the system, the most important being that the system contains a large number of well-mixed molecules at thermal equilibrium. After making these key assumptions, it is necessary to set reaction rates, which can be difficult to determine. Three routes for determining rates are lab measurements; giant ab initio quantum mechanical calculations or machine-learning generalizations therefrom; and parameter optimization in the context of system-level observations, together with the use of other known reaction rates that are more easily measurable. Finally, an event is sampled from a conditional density function. The Monte Carlo procedure does not give the analytic solution to the master equation, but it does yield an unbiased sample trajectory of the system. It effectively provides a realization by means of numerical simulation.
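To make the sampling procedure concrete, the following is a minimal sketch of the direct-method SSA for a toy bimolecular system A + B → C. The rate constant, state representation, and function names are illustrative assumptions, not part of the original derivation.

```python
import random

def ssa_direct(x, stoich, propensity, t_max, seed=0):
    """Minimal direct-method SSA sketch. x: molecule counts; stoich[r]:
    state-change vector of reaction r; propensity(r, x): current rate of r."""
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, tuple(x))]
    while t < t_max:
        props = [propensity(r, x) for r in range(len(stoich))]
        total = sum(props)
        if total == 0.0:
            break                        # no reaction can fire any more
        t += rng.expovariate(total)      # waiting time ~ Exp(total rate)
        u = rng.random() * total         # choose r with probability props[r]/total
        r = 0
        while u > props[r]:
            u -= props[r]
            r += 1
        x = [xi + dx for xi, dx in zip(x, stoich[r])]
        trajectory.append((t, tuple(x)))
    return trajectory

# Toy system: A + B -> C with mass-action propensity k*A*B (k assumed).
k = 0.01
traj = ssa_direct(x=[100, 100, 0], stoich=[(-1, -1, +1)],
                  propensity=lambda r, x: k * x[0] * x[1], t_max=10.0)
```

Each loop iteration draws an exponentially distributed waiting time from the total propensity, then selects one reaction in proportion to its individual propensity, exactly as in Gillespie's derivation.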

As powerful as the exact SSA is, it is prohibitively slow, since each reaction event must be computed in order. Numerous methods have been proposed to speed up the exact SSA. τ-Leap [12] fires all reactions in a window of τ before updating propensity functions, saving computation at the cost of errors. Later, it was made even more efficient [13]. R-Leaping [14] lets a preselected number of reactions fire in a simulation step, again at some cost in accuracy. The Exact R-Leap, 'ER-Leap' [15] modifies the R-Leaping algorithm to both be exact and provide a substantial speed up over SSA. ER-leap was later improved upon and parallelized in HiER-Leap [16]. More recently, S-Leap [17] was introduced as an adaptive, accelerated method that bridges the methods of τ-Leaping and R-Leaping. There are many other works on speeding up the original SSA as well.

Our work builds on this rich history and complements it. We are not just interested in solving the master equation for stochastic chemical kinetics. Instead, we'd like to solve a broader class of problems in biology and beyond, by representing the dynamics of spatially extended objects using graphs. The foundational work for the mathematical theory will be briefly discussed in the DGG formalism section and the curious reader may refer to [1, 2, 18] for more detailed information.

1.4. Graph theory

We will use the following graph theory and notation. An (undirected) graph $G = (V, E)$ is a set V of vertices and a set of edges $E \subseteq \{\{u, v\} \mid u, v \in V\}$, each edge an unordered pair of elements $u, v \in V$. V(G) is the set of vertices of graph G and E(G) is the set of edges of graph G. Now, let $G = (V, E)$ and $G^{\prime} = (V^{\prime}, E^{\prime})$. The graph G is homomorphic to $G^{\prime}$ if there exists a mapping $f: V \longrightarrow V^{\prime}$ that preserves the adjacency of vertices, i.e. $(v, v^{\prime}) \in E \implies (f(v), f(v^{\prime})) \in E^{\prime}$. We call this a homomorphism from G to $G^{\prime}$. If the function f is bijective and its inverse is a homomorphism, then f is an isomorphism [19].

A labeled graph G is defined as $G = (V, E, \alpha)$, where (V, E) is a graph and $\alpha : V \longrightarrow L$ is a function assigning labels to vertices. To each labeled graph there corresponds an (unlabeled) graph (V, E) without the labels. We define a label-preserving homomorphism of labeled graphs to be a graph homomorphism that preserves the labels exactly, without remapping them. We define a match to be an injective label-preserving graph homomorphism $G \hookrightarrow G^{\prime}$. Informally, a match locates a 'copy' of G as a subgraph inside of $G^{\prime}$ for which vertices, edges, and labels are all preserved.
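The definition of a match can be checked directly by brute-force search over injective vertex assignments. The representation below (edge sets of frozensets, a label dictionary) is an illustrative encoding, not a prescribed one.

```python
from itertools import permutations

def find_matches(G, H):
    """Brute-force search for matches: injective, label-preserving maps f
    from V(G) into V(H) with (u, v) in E(G) => (f(u), f(v)) in E(H).
    A graph is a triple (vertices, edge set, label dict); undirected
    edges are stored as frozensets."""
    VG, EG, aG = G
    VH, EH, aH = H
    for image in permutations(VH, len(VG)):   # injective by construction
        f = dict(zip(VG, image))
        if all(aG[v] == aH[f[v]] for v in VG) and \
           all(frozenset((f[u], f[v])) in EH for (u, v) in map(tuple, EG)):
            yield f

# G: a single labeled edge a-b; H: a triangle with distinct labels.
G = (['a', 'b'], {frozenset(('a', 'b'))}, {'a': 'red', 'b': 'blue'})
H = ([1, 2, 3],
     {frozenset((1, 2)), frozenset((2, 3)), frozenset((1, 3))},
     {1: 'red', 2: 'blue', 3: 'green'})
matches = list(find_matches(G, H))   # locates the one copy of G inside H
```

Because labels must be preserved exactly, only the assignment a → 1, b → 2 survives, even though the unlabeled edge would match the triangle in six ways.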

A labeled graph can be seen in figure 1. Here, the nodes are uniquely labeled using positive integers and the edges remain unlabeled. The discrete vertex labels have been mapped to a color set and visualized with those colors. In this case, the graph has no spatial embedding, so we could visualize the graph in many different ways.

Figure 1. A visual example of a graph labeled by number and color.


A dynamic graph $G(t)$ is a graph that changes over time. The change can be in the form of vertex/edge creation or destruction, or a change of label parameters. Mathematically, we write $G(t) = (V(t), E(t), \alpha_t)$, where $\alpha_t: V(t) \longrightarrow L$.
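A minimal data structure capturing this definition might look as follows; the class name and label scheme are illustrative assumptions.

```python
class DynamicGraph:
    """Sketch of a labeled graph G(t) = (V(t), E(t), alpha_t): vertices
    and edges may be created or destroyed, and labels may change."""
    def __init__(self):
        self.V = set()
        self.E = set()       # undirected edges stored as frozensets
        self.alpha = {}      # vertex -> label (e.g. a position parameter)

    def add_vertex(self, v, label):
        self.V.add(v)
        self.alpha[v] = label

    def add_edge(self, u, v):
        self.E.add(frozenset((u, v)))

    def remove_vertex(self, v):
        self.V.discard(v)
        self.alpha.pop(v, None)
        self.E = {e for e in self.E if v not in e}  # drop incident edges

g = DynamicGraph()
g.add_vertex(0, {'pos': (0.0, 0.0)})
g.add_vertex(1, {'pos': (1.0, 0.0)})
g.add_edge(0, 1)
g.remove_vertex(1)   # the incident edge disappears with the vertex
```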

1.5. Extended objects and the expanded cell complex (ECC)

Declarative modeling of complex biological systems requires a way to describe non point-like extended objects [1]. Examples of extended objects are polymer networks in the cytoskeleton, and multi-cellular tissues. In this section, we use a mix of standard and non-standard definitions used in construction of extended objects and we introduce the 'ECC' used in the approximate simulation algorithm.

Graphs augmented with labels are expressive mathematical objects capable of a high level of abstraction, and we use these for the representations of all of our extended objects. As defined in [1], numbered graphs are special cases of labeled graphs that have unique consecutive non-negative integer labels for vertices. If the graph in figure 1 did not have colors assigned to nodes it would be a numbered graph. A graded graph, on the other hand, is a graph whose vertices are labeled non-uniquely with a level number, associated with spatial resolution, which can differ by at most $\pm 1$ between neighbors. A stratified graph labels vertices by a non-negative integer 'dimension' of the stratum to which they belong. Graded stratified graphs have both dimension and level number vertex labels, with suitable constraints.

A special case of stratified graphs is the abstract cell complex. The abstract cell complex is a graph that is used to represent the topology of a space in the manner of a CW cell complex [20]. It has further constraints on the dimension labels. A graded abstract cell complex can represent topological properties of the space with the addition of level numbers associated with spatial resolution.

'Expanding' is a process of consistently mapping each lower dimensional cell in a cell complex to a cell of the highest dimension in a new 'expanded' cell complex. By expanding as in [21, 22] all of the abstract cell complex cells with dimension numbers less than the maximum, we get an ECC. We only apply expansion to the interior of lower dimensional cells. In our case we apply expansion to a two-level graded abstract cell complex with a coarse scale 2D grid that is refined to a finer scale 2D grid. Figure 2 exhibits a side by side visualization of a pre-expansion (figure 2(a)) cell complex and its post-expansion (figure 2(b)) cell complex. In figure 2(b), lower dimension interior cells are expanded such that they always have a 'collar' width less than the cell of the dimension above, so that cells of the same dimension are always separated by at least a dimension-specific minimum distance. This key criterion of the ECC is well-separatedness, as will be explained further in the methods section. The 'collar' we refer to is related to the idea of a tubular neighborhood in differential topology [23]. Further discussion of cell complex theory can be found in [20, 24–27].

Figure 2. Expanding the cell complex of a $4 \times 4$ Cartesian grid into well-separated lower dimensional cells. For this example, only the interior is expanded.
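As a rough illustration of the expanded complex's geometry (not the authors' construction), a point of a unit-spacing 2D grid can be assigned to a vertex-, edge-, or face-geocell by its distance to the nearest grid lines. The collar half-width `eps` is an assumed parameter controlling well-separatedness.

```python
def geocell(x, y, eps=0.1):
    """Classify a point of a unit-spacing 2D grid into a geocell of an
    (interior) expanded cell complex: dimension 0 if within the collar
    of a grid vertex, 1 if within the collar of a grid line, else 2.
    eps is the assumed collar half-width."""
    near_x = abs(x - round(x)) <= eps   # inside the collar of a vertical line
    near_y = abs(y - round(y)) <= eps   # inside the collar of a horizontal line
    dim = 2 - near_x - near_y           # booleans count as 0/1
    # Key the geocell by the indices of the nearest grid entity.
    if dim == 0:
        key = (round(x), round(y))      # nearest vertex
    elif dim == 1:
        key = (round(x), int(y)) if near_x else (int(x), round(y))
    else:
        key = (int(x), int(y))          # enclosing face
    return dim, key

dim, key = geocell(0.5, 0.5)   # deep in a face interior: dimension 2
```

Points near a vertex map to a 0-cell, points along a grid line (but away from vertices) map to a 1-cell, and everything else maps to a face, mirroring how the collars in figure 2(b) separate same-dimension cells.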

1.6. Related work

Work has already been done to simulate plant MT dynamics [6, 28]. However, to our knowledge no other work does so with dynamical graph grammars. The tool Plenum [29] implements DGGs as an embedded symbolic meta-programming language in Mathematica. However, it is not scalable because it uses the exact algorithm, which is only practical for small systems. Additionally, Plenum supports graphs by using unique object identifiers (OIDs), but does not directly support graphs as a native data structure. Improvements to the algorithm used in Plenum and a scalable implementation are necessary to modernize the current work and allow for faster and more realistic results.

2.1. DGGs formalism

DGGs are a further refinement of Dynamical Grammars (DGs) [18], which generalized Stochastic Parameterized Grammars (SPGs) [18] by the inclusion of differential equation rules. SPGs serve to unify the formalisms of generative grammars, stochastic processes, and dynamical systems. While SPGs can be applied to graphs, DGGs include all the related formalisms of SPGs and DGs, along with an additional, expressive modeling language framework for graphs. The semantics of the DGG formalism starts with DGG models $M_{\mathrm{DGG}}$ in language $L_{\mathrm{DGG}}$ and, using a compositional map $\Psi_{\mathrm{DGG}}$, maps the declarative grammar rules in the model to a valid dynamical system expressed by a master equation.

The master equation represents the time evolution of a continuous time Markov process. It can be written in the form:

Equation (1):
$$\frac{\mathrm{d}}{\mathrm{d}t}\,\Pr(X, t) = W \cdot \Pr(X, t)$$

with the equation having the formal (but usually not practical) solution:

Equation (2):
$$\Pr(X, t) = \exp(tW) \cdot \Pr(X, 0)$$

W is called the model system's 'time-evolution operator', since it entirely specifies (in a probabilistic way, which can specialize to deterministic dynamics if need be) how the model evolves in time.

Let $\Psi(M) = W(M)$ be a semantic map over DGG models comprising rules indexed by r, and let $\hat{W}_r \equiv \hat{W}_{\mathrm{lhs}_r \rightarrow \mathrm{rhs}_r}$ be an operator that specifies the non-negative flow of probability between states under each rule r. Then Ψ is 'compositional' if it sums the operators $W_r$ over rules thusly:

Equation (3a):
$$W = \sum_r W_r$$
Equation (3b):
$$W_r = \hat{W}_r - D_r$$
Equation (3c):
$$D_r = \mathrm{diag}\left(\mathbf{1} \cdot \hat{W}_r\right)$$

where equation (3a) states that rule operators sum up, equation (3b) states that rules conserve probability, and equation (3c) represents the summed conditional probability outflow per state. The operators for rules indexed by $r \in M$ map to the operator sum, and the dynamics are thereby defined under the master equation.
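The structure of equations (3a)-(3c) can be verified numerically on a small state space. Here the $\hat{W}_r$ are arbitrary non-negative matrices (assumed, for illustration only), and the composed W conserves probability, i.e. its columns sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "rules" on a 4-state space: hat_W[r][x, x'] >= 0 is the probability
# flow rate from state x' to state x under rule r (values are arbitrary).
hat_W = [rng.random((4, 4)) for _ in range(2)]
for h in hat_W:
    np.fill_diagonal(h, 0.0)   # off-diagonal (state-changing) flow only

# Equations (3b)/(3c): subtract the diagonal outflow per state,
# D_r = diag(1 . hat_W_r), i.e. the column sums of hat_W_r.
W_r = [h - np.diag(h.sum(axis=0)) for h in hat_W]

# Equation (3a): rule operators sum to the full time-evolution operator.
W = sum(W_r)
col_sums = W.sum(axis=0)   # probability conservation: all zeros
```

Because each $W_r$ already has zero column sums by construction, any sum of rule operators again conserves probability, which is what makes the compositional map well defined.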

Parameterized grammar rules extend the pure reaction rules to an additional parameterized space and allow for a more expressive form of modeling. This gives rise to the SPG. The probability space for the SPG was defined in [29]. A form of a stochastic parameterized rule is:

Equation (4):
$$\left\{ \tau_p[x_p] \mid p \in L_r \right\} \longrightarrow \left\{ \tau^{\prime}_q[\,y_q] \mid q \in R_r \right\} \ \ \textbf{with}\ \ \rho_r([x_p], [\,y_q])$$

where $\tau_p[x_p]$ and $\tau^{\prime}_q[\,y_q]$ are the object types parameterized by parameters $x_p$ and $y_q$. Note that $x_p$ and $y_q$ may be vectors. Again, r is the rule index, and $L_r$, $R_r$ are the left and right hand side argument list index sets. So, p and q represent the positions of $\tau_p[x_p]$ and $\tau^{\prime}_q[\,y_q]$ in their respective argument lists. Finally, $\rho_r([x_p], [\,y_q])$ is the reaction rate function of both the incoming and outgoing parameters; $\rho_r([x_p], [\,y_q]) \in \mathbb{R}^{\geqslant 0}$ is a non-negative propensity rate function. If $\rho_r([x_p], [\,y_q])$ is integrable over output parameters it can be decomposed into a rate function over input parameters and a conditional probability over the output parameters:

Equation (5):
$$\rho_r([x_p], [\,y_q]) = \rho_r([x_p]) \, P([\,y_q] \mid [x_p])$$

For clarity, grammar rules will be written decomposed in this manner.

Parameterized rules are encoded into the master equation and we can derive a simulation algorithm [2]. We can elevate these parameterized rules to include graphs by adding unique discrete object IDs [30] as parameters.

Using operator algebra [2] we can derive an exact time warping simulation algorithm [2] (algorithm 1), and add in differential equation rules:

Equation (6):
$$\left\{ \tau_p[x_p] \mid p \in L_r \right\} \longrightarrow \left\{ \tau_p[x^{\prime}_p] \mid p \in L_r \right\} \ \ \textbf{solving}\ \ \frac{\mathrm{d}[x_p]}{\mathrm{d}t} = f_r([x_p])$$

Here, everything remains the same with regard to notation, except that the left hand side (LHS) and right hand side do not change in number or object type; instead, the parameters evolve by solving a differential equation. When we combine these differential rules with the parameterized rules, we get DGs [18].

Algorithm 1. Exact hybrid SSA/ODE algorithm [2].
  factor $\rho_r([x_p], [\,y_q]) = \rho_r([x_p]) \cdot P([\,y_q] \mid [x_p])$;
  while $t \leqslant t_{\mathrm{max}}$ do
    initialize SSA propensities as $\rho_r([x_p])$;
    initialize $\rho^{\mathrm{tot}} := \sum_r \rho_r([x_p])$;
    initialize $\tau := 0$;
    draw effective waiting time $\tau_{\mathrm{eff}}$ from $\mathrm{Exp}(1)$;
    while $\tau \lt \tau_{\mathrm{eff}}$ do
      solve ODE system, plus an extra ODE updating τ:
        $\frac{\mathrm{d}\tau}{\mathrm{d}t} = \rho^{\mathrm{tot}}(t)$;
    draw reaction r from distribution $\rho_r([x_p])/\rho^{\mathrm{tot}}$;
    draw $[\,y_q]$ from $P([\,y_q] \mid [x_p])$ and execute reaction r;

The exact algorithm simulates a single trajectory of a continuous time stochastic process. Propensity functions are factored into a product of the rate function and a distribution of output parameters conditioned on input parameters, as in equation (5). While the simulation time is less than the maximum, we compute the time until the next reaction and modify the system when it occurs. The process is very similar to the standard SSA, but with the inclusion of the time warping equation. The warp equation is an ODE to keep track of the time until the next event and must be solved as part of the system of ODEs governing the time evolution of the parameters. When a reaction does occur, the state of the system is modified according to the rule instance selected and the parameters are sampled from the conditional distribution of the factored propensity function.
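A schematic sketch of the time-warping loop follows, using forward-Euler integration, a single rule, and assumed toy dynamics; it is not the authors' implementation, only an illustration of how the warp ODE interleaves with parameter evolution.

```python
import random

def hybrid_ssa_ode(x, rho, flow, t_max, dt=1e-3, seed=0):
    """Sketch of the hybrid SSA/ODE loop: integrate the parameter ODE
    dx/dt = flow(x) together with the warp equation d(tau)/dt = rho(x),
    firing an event whenever the warped time tau reaches a fresh Exp(1)
    draw. The single toy rule simply resets x when it fires."""
    rng = random.Random(seed)
    t, events = 0.0, 0
    while t < t_max:
        tau = 0.0
        tau_eff = rng.expovariate(1.0)   # unit-exponential waiting target
        while tau < tau_eff and t < t_max:
            x = x + dt * flow(x)         # forward-Euler step for parameters
            tau += dt * rho(x)           # warp equation: d(tau)/dt = rho
            t += dt
        if tau >= tau_eff:
            x = 0.0                      # execute the (toy) rule: reset x
            events += 1
    return x, events

# Toy run: x grows at unit speed, and fires with propensity rho(x) = x.
x_final, n_events = hybrid_ssa_ode(x=0.0, rho=lambda x: x,
                                   flow=lambda x: 1.0, t_max=50.0)
```

The key point mirrored from the algorithm is that the waiting-time draw is made once in warped time, while the propensity is allowed to change continuously as the parameters evolve.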

DGs [30] are able to handle graph representations using unique OID parameters. However, using OIDs to represent graphs decreases readability and natural expressiveness. Alternatively, using a formal graph notation [1], DGGs can be represented using a different form:

Equation (7):
$$G\langle\!\langle\lambda\rangle\!\rangle \longrightarrow G^{\prime}\langle\!\langle\lambda^{\prime}\rangle\!\rangle \ \ \textbf{with}\ \ \rho_r(\lambda, \lambda^{\prime})$$

Here $G\langle\!\langle\lambda\rangle\!\rangle$ is the LHS labeled graph with label vector λ and $G^{\prime}\langle\!\langle\lambda^{\prime}\rangle\!\rangle$ is the right hand side labeled graph with label vector $\lambda^{\prime}$. G and $G^{\prime}$ without their label vectors λ and $\lambda^{\prime}$ are numbered graphs, so that the assignment of label component $\lambda_i$ to graph node member i is unambiguously specified. We have the usual 'solving' and 'with' clauses. For examples of such graph grammar rules, see the supplementary material.

2.2. Approximating the exact simulation algorithm

The foregoing exact algorithm is powerful and works for multiple rules of different forms [2]; however, it is prohibitively slow for large systems. A single run of the exact algorithm yields only one trajectory; in practice, thousands of runs or more may be needed to compute meaningful statistics or to recover outcome density functions. The type of rules we use are expressed as graphs [1] and extend previous work [30], providing a more efficient and readable representation of DGG rules than the OIDs mentioned in the previous section.

We make two key assumptions in our approximation of the exact algorithm: spatial locality of the rules and well-separatedness of the cell complex used to decompose biological space into domains. Consider the spatial locality constraint for graphs. The system state comprises extended objects taking the form of labeled graphs. Each of the nodes in a graph is labeled with a vector-valued position parameter. Additional parameters are allowed and have no spatial constraint. Our graph grammar rules are made spatially local by virtue of their propensity functions. We use spatial locality to define local neighborhoods of rule firings. Any rule instantiations outside this neighborhood have zero or near zero propensity that decreases rapidly, for example exponentially, with distance. Hence, any two objects in the system that are too far apart have a very small chance of reacting, and their potential interactions are ignored.
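For instance, a pairwise rule propensity might decay exponentially with separation and be truncated beyond a cutoff radius of several fall-off lengths; the functional form and constants below are assumptions for illustration.

```python
import math

def pair_propensity(p, q, k=1.0, fall_off=0.5, cutoff=3.0):
    """Spatially local propensity for a rule acting on two nodes with
    position labels p and q: exponential decay with distance, truncated
    to zero beyond `cutoff` (several fall-off lengths), so that rule
    instances spanning distant geocells can be ignored."""
    d = math.dist(p, q)
    if d > cutoff:
        return 0.0                      # outside the local neighborhood
    return k * math.exp(-d / fall_off)

near = pair_propensity((0.0, 0.0), (0.1, 0.0))   # nearly the full rate k
far = pair_propensity((0.0, 0.0), (10.0, 0.0))   # truncated to zero
```

Truncation introduces only a small error (the tail mass of the exponential), while guaranteeing that every rule instance fits inside a bounded neighborhood, which is what the geocell decomposition exploits.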

Spatial locality also allows us to decompose the domain of the simulation space into smaller, well-separated geometric cells. In the context of the simulation algorithm, a 'cell' refers to a computational spatial domain, which differs from the biological notion of a cell. Such a geometric cell (geocell) is a cell of an ECC, labeled by the dimension of the corresponding cell in the unexpanded cell complex. Lower dimensional cells of the cell complex are expanded to be large enough to keep rule instances from spanning multiple same-dimensional geocells. An example can be seen in figure 2. By setting these geocells to be large enough (at least several factors larger than the exponential 'fall off distance'), we are able to logically map rule instances to well-separated geocells.

The operator W in equation (1) assumes a state space and specifies the probability flow on that space for all of the extended objects in our system. Considering equation (3a), $W = \sum_r W_r$, the method we propose to approximate $e^{tW}$ is an operator splitting algorithm that imposes a domain decomposition by means of an ECC, corresponding to summing operators $W = \sum_{d} W_{d} = \sum_{d} \sum_{c} W_{d,c}$ over pre-expansion dimensions d, and cells c of each dimension:

Equation (8a):
$$e^{tW} \simeq \left[ \prod_{d\,\downarrow} \exp\!\left(\frac{t}{n}\, W_d\right) \right]^{n}$$
Equation (8b):
$$\exp\!\left(\frac{t}{n}\, W_d\right) = \prod_{c} \exp\!\left(\frac{t}{n}\, W_{d,c}\right)$$
Equation (8c)

Equation (8a) is a first-order operator splitting, by solution phases of fixed cell dimension, where $d \downarrow$ means we multiply from right to left in order of highest dimension to lowest. It incurs an approximation error of $\mathcal{O}((t/n)^2)$. Equation (8b) uses the fact that the resulting cells c of fixed dimension d are all well-separated geometrically with enough margin (due to the 'collar' of dimension
