Friday, September 4, 2015

A mean field model of the cell

In multicellular organisms, the same genetic material is shared by many distinct cell types. During embryonic development, tissue repair & growth, etc., cells can transition from one type to another. At least in "complex" animals (read "vertebrates") under normal conditions, cell type transitions across different species broadly share the following characteristics:

  1. Transitions are targeted (a cell can't just transit from any type to any other type) and directional (transitions are one way, i.e., bone marrow $\to$ blood, but not vice versa).
  2. Transitions are directed by a small set of molecular signals (morphogens). However, different cell types can respond to the same signal in distinct ways.
I don't know how widely true these characteristics are in very "simple" animals such as sponges or the hydra, or in plants (I know nothing about plant biology).

It also seems approximately true that organisms with smaller gene-regulatory networks (GRNs) have fewer cell types. There are ~250 DNA-binding transcription factors (TFs) and $\mathcal{O}(1)$ cell type in S. cerevisiae (yeast), ~1000 TFs and $\mathcal{O}(10)$ cell types in C. elegans (nematode), ~2500 TFs and $\mathcal{O}(100)$ cell types in humans. However, the scaling of cell type number with GRN size is actually kind of odd, since generically the number of states of a $N$-component network grows as $e^{N}$. The problem is not that we humans have more cell types than yeast, but that we only have ~100 times more.

I think these facts are probably related on a deep, physical level, by which I mean they are probably independent of the detailed interaction between specific genes and specific proteins in specific organisms. I suspect any sufficiently complex GRN will exhibit these characteristics.

Let's consider a minimalist $N$-component GRN governed by \begin{equation}\label{model} \frac{ds_{i}}{dt} = -\frac{s_i}{\tau_i} + r_{i}(\vec{s}), \quad i=1, \dots, N. \end{equation} Here $s_i$ is the expression level of gene $i$. I make no distinction between mRNA and protein (more on this in a later post), and by "genes" I mean genes encoding TFs. The term $-s_i/\tau_i$ describes the degradation of gene products over time (for simplicity I will set $\tau_i=1$ for all $i$), and $r_i(\vec{s})$ is the rate of transcription/translation from gene $i$. The rate is $\vec{s}$-dependent due to possible gene-regulatory activity by $j, k, \dots$, as well as $i$ itself.

Now I assume the net gene-regulatory activity on $i$ from $i, j, k, \dots$ is described by a mean field $\phi_i(\vec{s})$, and that $r_i(\vec{s})=r[\phi_i(\vec{s})]$ is an universal function of $\phi_i$ only. In particular, I take $r(\phi_i) = r(\phi_i - \phi_0)$ to be a S-shaped function centered on a threshold $\phi_0 > 0$ and bounded by $0 \leq r(\phi_i) \leq 1$. The strength scale of TF-gene interactions is given by $\phi_0$: gene-regulatory interaction where $\vert J_{ij} \vert < \phi_0$ can be regarded as weak, while interaction where $\vert J_{ij} \vert > \phi_0$ can be regarded as strong. We will set $\phi_0 = 1$ without loss of generality (see update below).

Figure 1. The assumed functional form of $r(\phi_i)$.

I don't know what $\phi_i(\vec{s})$ looks like, but I can expand it as a series, \begin{equation}\label{phi_expansion} \phi_i(\vec{s}) = \sum_j J_{ij} s_{j} + \frac{1}{2}\sum_{j, k} K_{ijk}s_{j}s_{k} + \dots. \end{equation} The terms of the series can be interpreted as binding of $i$'s regulatory elements by one, two, or more TFs. I will truncate the expansion to retain only one-body TF-gene interactions parametrized by $J_{ij}$. The sign ($\pm$) of $J_{ij}$ describes whether $j$ promotes or inhibits the expression of $i$.

As defined, (\ref{model}) describes a large, non-linear dynamical system, and in general we expect such a system will be multi-stable. I will assume experimentally observable cell types correspond to stable fixed points of the model. The state $\vec{s}=0$, i.e. death, where nothing is being expressed, is a stable fixed point of the model. GRN states that are not fixed points will evolve on a time scale set by the lifetime of gene products (mRNA and protein), which is ~1 hour. Cells corresponding to a non-stationary state of the GRN will not be around long enough for us to capture and study it.

Does this model do anything interesting? Of course it does; otherwise I wouldn't be writing about it. This is an on-going research project, and these posts are essentially a set of notes-to-self to collect my thoughts in a semi-organized fashion.

Updated September 6, 2015. If $J_{ij}$s are i.i.d. with zero mean and variance $\sigma_J^2$, then $\phi_i = \sum_j J_{ij}s_j$ will be i.i.d. with zero mean and variance $\sigma_\phi^2 = N\sigma_J^2/2$. The typical size of $\vert\phi_i\vert \approx \sigma_\phi$ will grow as $\sqrt{N}$; this is dynamically equivalent letting $\phi_0 \to 0$ as $1/\sqrt{N}$. Maintaining dynamical similarity between GRNs of different sizes requires either rescaling $\phi_0 \to \sqrt{N/2}\phi_0$, or rescaling $\sigma_J \to \sigma_J/\sqrt{N/2}$.