Biology Simulation |LINK|

Newsgroups comp.sys.dec
Date 2024-01-24 04:35 -0800
Message-ID <da6f88f0-3ea2-45b6-838e-e05e849b971bn@googlegroups.com> (permalink)
Subject Biology Simulation |LINK|
From Dorotha Grant <grantdorotha809@gmail.com>


PySB is a framework for building mathematical models of biochemical systems as Python programs. PySB abstracts the complex process of creating equations describing interactions among multiple proteins or other biomolecules into a simple and intuitive domain-specific programming language, which is internally translated into BioNetGen or Kappa rules and from there into systems of equations. PySB makes it straightforward to divide models into modules and to call libraries of reusable elements (macros) that encode standard biochemical actions. These features promote model transparency, reuse and accuracy. PySB also interoperates with standard scientific Python libraries such as NumPy, SciPy and SymPy, enabling model simulation and analysis.
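To make the last translation step concrete, here is a minimal sketch in plain Python of the kind of equation system a single reversible binding rule, L + R <-> LR, reduces to under mass-action kinetics. It does not use PySB, BioNetGen or Kappa; the function name `simulate_binding`, the species, the rate constants and the forward-Euler integrator are all assumptions chosen for illustration.

```python
def simulate_binding(l0, r0, kf, kr, t_end, dt=0.001):
    """Integrate d[LR]/dt = kf*[L]*[R] - kr*[LR] with forward Euler.

    Hypothetical helper for illustration; not part of PySB.
    Returns the final concentrations (L, R, LR).
    """
    l, r, lr = l0, r0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Net rate of complex formation under mass-action kinetics.
        flux = kf * l * r - kr * lr
        # Each binding event consumes one L and one R and produces one LR.
        l -= flux * dt
        r -= flux * dt
        lr += flux * dt
    return l, r, lr
```

For example, `simulate_binding(1.0, 1.0, kf=1.0, kr=1.0, t_end=20.0)` runs the system close to its equilibrium, where the forward and reverse fluxes balance. A real model with many rules would be integrated with a proper ODE solver rather than a fixed-step loop.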


Oglethorpe University biology majors Kerin Mejia, Roxine Rattray and Camila Triana recently had the opportunity to participate in a shadowing experience at the nearby Emory University Nursing Learning Center (ENLC).









ENLC is a state-of-the-art simulation and skills lab that provides students who are pursuing careers in healthcare with simulated real-life scenarios via interactive learning technology and immersive experiential learning environments. The program utilizes a large collection of simulators and structured practice sessions to help students attain core skill sets ranging from invasive clinical skills to communication and team building skills.


An option for the Internal Assessment Individual Project is to use data from a computer simulation or model rather than from a traditional "hands-on" experiment in which there's actual physical interaction with equipment, materials, and specimens. This page has links to websites with opportunities for simulations and modeling in all areas of the life sciences.


Simulation is a technique for studying and analyzing the behavior of a real-world system or process by mimicking it in a computer application. A simulation rests on a mathematical model that describes the structure of the system. The user changes one or more variables of the model and observes the resulting changes in the other variables, which makes it possible to predict the behavior of the real-world system being studied. An example is a Johnson Labs simulation of mitosis (a link from Online Labs in Biology) that explores how a person's cells decide if and when they will divide. In the simulation, the user investigates how different cells make the decision to divide by first selecting a particular tissue and then altering the conditions to which that tissue is exposed. The user then monitors the intracellular chain of events that leads to cell division.


As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.


In this paper we present Splatter, an R Bioconductor package for reproducible and accurate simulation of single-cell RNA sequencing data. Splatter is a framework designed to provide a consistent interface to multiple published simulations, enabling researchers to quickly simulate scRNA-seq count data in a reproducible fashion and make comparisons between simulations and real data. Along with the framework we have developed our own simulation model, Splat, and show how it compares to previously published simulations based on real datasets. We also provide a short example of how simulations can be used for assessing analysis methods.


Currently, Splatter implements six different simulation models, each with its own assumptions but accessed through a consistent, easy-to-use interface. These simulations are described in more detail in the following sections and in the documentation for each simulation in Splatter, which also describes the required input parameters.


The Splatter simulation process consists of two steps. The first step estimates the parameters required for the simulation from a real dataset. The result of the first step is a parameters object unique to each simulation model. These objects have been designed to hold the information required for the specific simulation and display details such as which parameters can be estimated and which have been changed from the default value. It is important that each simulation has its own object for storing parameters, as different simulations can vary greatly in the information they require. For example, some simulations only need parameters for well-known statistical distributions, while others require large vectors or matrices of data sampled from real datasets. The second step then uses that parameters object to generate the simulated dataset.
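The estimate-then-simulate pattern can be sketched in a few lines of Python. Splatter itself is an R package, so nothing here is its API: the `GammaParams` class, the two function names and the choice of a method-of-moments fit to a gamma distribution are all assumptions made for illustration, and the sketch assumes the second step is the simulation itself.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class GammaParams:
    """Hypothetical parameters object for a toy gamma model of gene means."""
    shape: float = 1.0
    rate: float = 1.0


def estimate_params(gene_means):
    """Step 1: method-of-moments estimates from observed gene means."""
    mean = statistics.mean(gene_means)
    var = statistics.pvariance(gene_means)
    # For a gamma distribution: mean = shape / rate, variance = shape / rate**2.
    return GammaParams(shape=mean * mean / var, rate=mean / var)


def simulate_means(params, n_genes, seed=0):
    """Step 2: draw new gene means using the estimated parameters."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return [rng.gammavariate(params.shape, 1.0 / params.rate)
            for _ in range(n_genes)]
```

Keeping the estimated values in a dedicated object means the simulation step never touches the real data directly, and a user can override a single field before simulating.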


Splatter is also able to compare SCESet objects. These may contain simulations with different models or different parameters, or real datasets from which parameters have been estimated. The comparison function takes one or more SCESet objects, combines them (keeping any cell or gene-level information that is present in all of them) and produces a series of diagnostic plots comparing aspects of scRNA-seq data. The combined datasets are also returned, making it easy to produce additional comparison plots or statistics. Alternatively, one SCESet can be designated as a reference, such as the real data used to estimate parameters, and the difference between the reference and the other datasets can be assessed. This approach is particularly useful for comparing how well simulations recapitulate real datasets. Examples of these comparison plots are shown in the following sections.






Splatter provides implementations of our own simulation model, Splat, as well as several previously published simulations. The previous simulations have either been published as R code associated with a paper or as functions in existing packages. By including them in Splatter, we have made them available in a single place in a more accessible way. If only a script was originally published, such as the Lun [18] and Lun 2 [19] simulations, the simulations have been re-implemented in Splatter. If the simulation is available in an existing R package, for example, scDD [20] and BASiCS [21], we have simply written wrappers that provide consistent input and output but use the package implementation. We have endeavored to keep the simulations and estimation procedures as close as possible to what was originally published while providing a consistent interface within Splatter. The six different simulations currently available in Splatter are described below.


The scDD package aims to test not only for differential expression between two groups of cells but also for more complex changes such as differential distributions or differential proportions [20]. This is reflected in the scDD simulation, which can contain a mixture of genes simulated to have different distributions, or differing proportions where the expression of the gene is multi-modal. This simulation also samples information from a real dataset. As the scDD simulation is designed to reproduce a high-quality, filtered dataset, it only samples from genes with fewer than 75% zeros. As a result, it only simulates relatively highly expressed genes. The Splatter package simply provides wrapper functions to the simulation function in the scDD package, while capturing the necessary inputs and outputs needed to compare to other simulations. The full details of the scDD simulation are described in the scDD package vignette [24].
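The 75% zeros filter is easy to picture with a small sketch. This is not code from scDD or Splatter; the function name `genes_below_zero_fraction` and the layout of `counts` (one row per gene, one column per cell) are assumptions for illustration.

```python
def genes_below_zero_fraction(counts, max_zero_fraction=0.75):
    """Return indices of genes (rows) whose share of zero counts is below the cutoff."""
    kept = []
    for i, row in enumerate(counts):
        zero_fraction = sum(1 for c in row if c == 0) / len(row)
        # Strictly "less than": a gene with exactly 75% zeros is dropped.
        if zero_fraction < max_zero_fraction:
            kept.append(i)
    return kept
```

Because lowly expressed genes are mostly zeros, a filter of this kind leaves only the relatively highly expressed genes, which matches the behaviour described above.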


The BASiCS package introduced a model for separating variation in scRNA-seq data into biological and technical components based on the expression of external spike-in controls [21]. This model also enables cell-specific normalization and was extended to detect differential expression between groups of cells [25]. Similar to the scDD simulation, Splatter provides a wrapper for the BASiCS simulation function, which is able to produce datasets with both endogenous and spike-in genes as well as multiple batches of cells. As the BASiCS simulation contains both biological and technical variation, it can be used to test the ability of methods to distinguish between the two.


We have developed the Splat simulation to capture many features observed in real scRNA-seq data, including high-expression outlier genes, differing sequencing depths (library sizes) between cells, trended gene-wise dispersion, and zero-inflation. Our model uses parametric distributions with hyper-parameters estimated from real data (Fig. 1). The core of the Splat simulation is a gamma-Poisson hierarchical model in which the mean expression level for each gene \( i \), \( i=1,\dots, N \), is simulated from a gamma distribution and the count for each cell \( j \), \( j=1,\dots, M \), is subsequently sampled from a Poisson distribution, with modifications to include expression outliers and to enforce a mean-variance trend.
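Written out, the core hierarchy looks roughly as follows. This is a sketch of the unmodified gamma-Poisson core only: it leaves out the outlier, library-size, mean-variance and zero-inflation adjustments, and the symbols \( \lambda_i \) (gene mean) and \( Y_{ij} \) (count for gene \( i \) in cell \( j \)) are notation assumed here rather than taken from the text.

\[
\lambda_i \sim \mathrm{Gamma}(\alpha, \beta), \qquad
Y_{ij} \mid \lambda_i \sim \mathrm{Poisson}(\lambda_i), \qquad
i = 1,\dots,N,\quad j = 1,\dots,M,
\]

where \( \alpha \) and \( \beta \) are the shape and rate of the gamma distribution, so that \( \mathrm{E}[\lambda_i] = \alpha / \beta \).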


More specifically, the Splat simulation initially samples gene means from a gamma distribution with shape \( \alpha \) and rate \( \beta \). While the gamma distribution is a good fit for gene means, it does not always capture extreme expression levels. To counter this, a probability (\( \pi^O \)) that a gene is a high-expression outlier can be specified. We then add these outliers to the simulation by replacing the previously simulated mean with the median of the simulated gene means multiplied by an inflation factor. The inflation factor is sampled from a log-normal distribution with location \( \mu^O \) and scale \( \sigma^O \).
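The sampling and outlier steps just described can be sketched with the Python standard library. This is an illustration under stated assumptions, not Splatter's implementation (which is in R): the function name and the argument names `out_prob`, `out_loc` and `out_scale`, standing in for \( \pi^O \), \( \mu^O \) and \( \sigma^O \), are hypothetical.

```python
import random
import statistics


def simulate_gene_means(n_genes, shape, rate, out_prob, out_loc, out_scale, seed=0):
    """Sample gene means from a gamma distribution, then inflate a random subset.

    Returns (base_means, means, is_outlier).
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so convert rate to scale = 1 / rate.
    base_means = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_genes)]
    median_mean = statistics.median(base_means)

    means = []
    is_outlier = []
    for mean in base_means:
        # Each gene is an outlier independently with probability out_prob.
        outlier = rng.random() < out_prob
        if outlier:
            # Replace the mean with the median times a log-normal inflation factor.
            factor = rng.lognormvariate(out_loc, out_scale)
            mean = median_mean * factor
        means.append(mean)
        is_outlier.append(outlier)
    return base_means, means, is_outlier
```

Anchoring the outliers to the median rather than to each gene's own mean keeps their scale tied to the typical expression level of the dataset, so the inflation factor alone controls how extreme they are.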


