API

Simulations

spatialcorr_sim.simulate_gene_pair_within_region_varying_correlation(adata, gene_1, gene_2, clust_to_fisher_corr_mean=None, fisher_corr_mean=None, clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate pairwise expression in which the correlation between the two genes varies within each region.

Parameters
adata : AnnData

The AnnData object storing the spatial expression data that is to be used to seed the simulation.

gene_1 : string

The name of a gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.

gene_2 : string

The name of a gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.

clust_to_fisher_corr_mean : dictionary, optional (default: None)

Maps each cluster to the mean Fisher-transformed correlation between the two genes within that cluster. The correlation varies within each cluster, with the Fisher-transformed correlations varying around this mean.

fisher_corr_mean : float, optional (default: None)

The mean Fisher-transformed correlation between the two genes within every cluster. If clust_to_fisher_corr_mean is not provided, this value will be used for all clusters/regions. Otherwise, this argument will be overridden by the values in clust_to_fisher_corr_mean.

clust_to_bandwidth : dictionary, optional (default: None)

Map each cluster/region to the bandwidth parameter used in the Gaussian kernel used to sample correlations within that cluster. Larger bandwidth parameters will produce coarser patterns of correlation.

bandwidth : float, optional (default: None)

The bandwidth parameter to use for all clusters. If clust_to_bandwidth is not provided, this value will be used for all clusters/regions. Otherwise, this argument will be overridden by the values in clust_to_bandwidth.

clust_to_cov_strength : dictionary, optional (default: None)

Maps each cluster/region to the size of the “correlation strength” (i.e., the scalar that scales the covariance matrix used in the Gaussian process to generate a spatial pattern of correlation). Higher values lead to larger magnitudes for the correlation of each gene.

cov_strength : float, optional (default: None)

The size of the “correlation strength” (i.e., the scalar that scales the covariance matrix used in the Gaussian process to generate a spatial pattern of correlation). Higher values lead to larger magnitudes for the correlation of each gene. If clust_to_cov_strength is not provided, this value will be used for all clusters/regions. Otherwise, this argument will be overridden by the values in clust_to_cov_strength.

clust_key : string, optional (default: 'cluster')

The name of the column in adata.obs that stores the cluster/region ID of each spot.

row_key : string (default: 'row')

The name of the column in adata.obs that stores the row coordinates of each spot.

col_key : string (default: 'col')

The name of the column in adata.obs that stores the column coordinates of each spot.

poisson : boolean, optional (default: False)

If False, sample expression values from a multivariate lognormal distribution at each spot. If True, these expression values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.

size_factors : ndarray, optional (default: None)

An N-length array, where N is the number of spots, containing the size factor (i.e., library size) for each spot. This argument must be provided if poisson is set to True.

gene_to_clust_to_mean : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression mean. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.

gene_to_clust_to_var : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression variance. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.
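
The fisher_corr_mean and clust_to_fisher_corr_mean arguments are expressed on the Fisher-transformed scale rather than as raw correlations. As a minimal sketch, assuming the standard Fisher z-transformation (z = arctanh(r)), the following shows how a value on that scale maps back to a correlation in (-1, 1). The helper names and cluster IDs are hypothetical and are not part of spatialcorr_sim.

```python
import math

def fisher_z(r):
    """Fisher transform: map a correlation in (-1, 1) to the real line."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Inverse Fisher transform: map any real value back to (-1, 1)."""
    return math.tanh(z)

# Hypothetical per-cluster means on the Fisher scale, laid out as a
# cluster ID -> mean dictionary like clust_to_fisher_corr_mean.
clust_to_fisher_corr_mean = {"0": 0.0, "1": 0.8, "2": -0.8}

# The correlation that each Fisher-scale mean corresponds to.
clust_to_corr_center = {
    clust: inverse_fisher_z(z) for clust, z in clust_to_fisher_corr_mean.items()
}

for clust, r in clust_to_corr_center.items():
    z = clust_to_fisher_corr_mean[clust]
    print(f"cluster {clust}: Fisher mean {z:+.2f} -> correlation {r:+.3f}")
```

Working on the Fisher scale means any real-valued mean is valid, since the inverse transform always lands strictly between -1 and 1.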

Returns
corrs : ndarray

An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.

covs : ndarray

An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.

adata_sim : AnnData

An Nx2 simulated dataset for N spots and two genes.
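
The corrs and covs arrays are related by the usual identity cov = corr * sd_1 * sd_2. A minimal sketch of that relationship for a handful of spots follows. The values are made up, and the per-spot variances var_1 and var_2 are hypothetical inputs used only for illustration; this page does not say how the function stores them.

```python
import math

# Made-up latent correlations for four spots (illustration only).
corrs = [0.9, 0.5, 0.0, -0.5]

# Hypothetical latent variances of the two genes at each spot.
var_1 = [1.0, 1.0, 4.0, 4.0]
var_2 = [0.25, 0.25, 1.0, 1.0]

# Covariance = correlation * sd_1 * sd_2, computed spot by spot.
covs = [
    r * math.sqrt(v1) * math.sqrt(v2)
    for r, v1, v2 in zip(corrs, var_1, var_2)
]

print(covs)
```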

spatialcorr_sim.simulate_gene_set_within_region_varying_correlation(adata, genes, row_key='array_row', col_key='array_col', clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate spatial gene expression for a set of genes for which the correlation matrix varies smoothly within each region.

Parameters
adata : AnnData

The AnnData object storing the spatial expression data that is to be used to seed the simulation.

genes : list

A G-length list of the names of the genes on which to base the simulated expression values. Within each region, the simulated genes will have means and variances similar to those of these genes.

clust_to_bandwidth : dictionary, optional (default: None)

Map each cluster/region to the bandwidth parameter used in the Gaussian kernel used to sample correlations within that cluster. Larger bandwidth parameters will produce coarser patterns of correlation.

bandwidth : float, optional (default: None)

The bandwidth parameter to use for all clusters. If clust_to_bandwidth is not provided, this value will be used for all clusters/regions. Otherwise, this argument will be overridden by the values in clust_to_bandwidth.

clust_to_cov_strength : dictionary, optional (default: None)

Maps each cluster/region to the size of the “correlation strength” (i.e., the scalar that scales the covariance matrix used in the Gaussian process to generate a spatial pattern of correlation). Higher values lead to larger magnitudes for the correlation of each gene.

cov_strength : float, optional (default: None)

The size of the “correlation strength” (i.e., the scalar that scales the covariance matrix used in the Gaussian process to generate a spatial pattern of correlation). Higher values lead to larger magnitudes for the correlation of each gene. If clust_to_cov_strength is not provided, this value will be used for all clusters/regions. Otherwise, this argument will be overridden by the values in clust_to_cov_strength.

clust_key : string

The name of the column in adata.obs that stores the cluster/region ID of each spot.

row_key : string (default: 'array_row')

The name of the column in adata.obs that stores the row coordinates of each spot.

col_key : string (default: 'array_col')

The name of the column in adata.obs that stores the column coordinates of each spot.

poisson : boolean, optional (default: False)

If False, sample expression values from a multivariate lognormal distribution at each spot. If True, these expression values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.

size_factors : ndarray, optional (default: None)

An N-length array, where N is the number of spots, containing the size factor (i.e., library size) for each spot. This argument must be provided if poisson is set to True.

gene_to_clust_to_mean : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression mean. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.

gene_to_clust_to_var : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression variance. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.
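
To see how bandwidth and cov_strength shape the latent pattern, here is a minimal sketch assuming a squared-exponential (Gaussian) kernel of the form cov_strength * exp(-d^2 / (2 * bandwidth^2)), where d is the distance between two spots. The exact parameterization used by spatialcorr_sim is not given on this page, so the formula and the function name gaussian_kernel should be read as assumptions.

```python
import math

def gaussian_kernel(spot_a, spot_b, bandwidth, cov_strength=1.0):
    """Covariance between two (row, col) spots under an assumed Gaussian kernel."""
    d_row = spot_a[0] - spot_b[0]
    d_col = spot_a[1] - spot_b[1]
    sq_dist = d_row ** 2 + d_col ** 2
    return cov_strength * math.exp(-sq_dist / (2.0 * bandwidth ** 2))

origin = (0, 0)
neighbor = (0, 3)  # three columns away

# A small bandwidth decorrelates nearby spots quickly (fine patterns);
# a large bandwidth keeps them similar (coarse patterns).
for bandwidth in (1.0, 5.0):
    k = gaussian_kernel(origin, neighbor, bandwidth)
    print(f"bandwidth={bandwidth}: kernel value {k:.3f}")

# cov_strength scales every entry, including the variance at distance zero.
print(gaussian_kernel(origin, origin, bandwidth=5.0, cov_strength=2.0))
```

Under this assumed form, bandwidth controls how far similarity extends across the tissue, while cov_strength controls how far the sampled values can stray from their mean.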

Returns
corrs : ndarray

An NxGxG sized array, where N is the number of spots, and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.

covs : ndarray

An NxGxG sized array, where N is the number of spots, and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.

adata_sim : AnnData

An NxG simulated dataset for N spots and G genes.

spatialcorr_sim.simulate_gene_pair_region_specific(adata, clust_to_corr, gene_1, gene_2, clust_key, row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate pairwise expression with non-varying correlation within each region, but differing correlation between regions.

Parameters
adata : AnnData

The AnnData object storing the spatial expression data that is to be used to seed the simulation.

clust_to_corr : dictionary

Maps each cluster to the correlation between the two genes within that cluster.

gene_1 : string

The name of a gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.

gene_2 : string

The name of a gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.

clust_key : string

The name of the column in adata.obs that stores the cluster/region ID of each spot.

row_key : string (default: 'row')

The name of the column in adata.obs that stores the row coordinates of each spot.

col_key : string (default: 'col')

The name of the column in adata.obs that stores the column coordinates of each spot.

poisson : boolean, optional (default: False)

If False, sample expression values from a multivariate lognormal distribution at each spot. If True, these expression values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.

size_factors : ndarray, optional (default: None)

An N-length array, where N is the number of spots, containing the size factor (i.e., library size) for each spot. This argument must be provided if poisson is set to True.

gene_to_clust_to_mean : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression mean. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.

gene_to_clust_to_var : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression variance. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.
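
When poisson is True, the lognormal expression values act as rates from which counts are drawn, and size_factors supplies each spot's library size. A rough sketch of that sampling step for one gene follows, using only the standard library. How spatialcorr_sim combines the size factor with the latent value is an assumption here (a simple product), and sample_poisson is a hypothetical helper based on Knuth's algorithm that is suitable only for small means.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's algorithm (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(0)

# Hypothetical latent log-expression mean and standard deviation for one gene.
latent_mean, latent_sd = 0.5, 0.3

# Hypothetical size factors (library sizes), one per spot.
size_factors = [2.0, 5.0, 10.0]

counts = []
for size_factor in size_factors:
    # Lognormal expression value: exponentiate a normal draw.
    expression = math.exp(rng.gauss(latent_mean, latent_sd))
    # Assumed mean count: size factor times the lognormal expression value.
    counts.append(sample_poisson(size_factor * expression, rng))

print(counts)
```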

Returns
corrs : ndarray

An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.

covs : ndarray

An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.

adata_sim : AnnData

An Nx2 simulated dataset for N spots and two genes.
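
Because the correlation is constant within each region, the per-spot corrs array can be thought of as a lookup of each spot's cluster in clust_to_corr. A minimal sketch of that lookup follows, with made-up cluster labels standing in for the adata.obs column named by clust_key. Whether the function builds the array exactly this way is an assumption.

```python
# Made-up cluster labels for six spots, standing in for adata.obs[clust_key].
spot_clusters = ["A", "A", "B", "B", "B", "C"]

# Correlation between the two genes within each cluster.
clust_to_corr = {"A": 0.8, "B": 0.0, "C": -0.6}

# Every spot inherits the correlation of the cluster it belongs to.
corrs = [clust_to_corr[clust] for clust in spot_clusters]

print(corrs)
```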

spatialcorr_sim.simulate_gene_set_region_specific(adata, all_genes, spot_covs, clust_to_corr_mat, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate gene expression for a set of genes with non-varying correlation within each region, but differing correlation matrices between regions.

Parameters
adata : AnnData

The AnnData object storing the spatial expression data that is to be used to seed the simulation.

all_genes : list

A G-length list of the names of the genes on which to base the simulated expression values. Within each region, the simulated genes will have means and variances similar to those of these genes.

clust_key : string

The name of the column in adata.obs that stores the cluster/region ID of each spot.

row_key : string (default: 'row')

The name of the column in adata.obs that stores the row coordinates of each spot.

col_key : string (default: 'col')

The name of the column in adata.obs that stores the column coordinates of each spot.

poisson : boolean, optional (default: False)

If False, sample expression values from a multivariate lognormal distribution at each spot. If True, these expression values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.

size_factors : ndarray, optional (default: None)

An N-length array, where N is the number of spots, containing the size factor (i.e., library size) for each spot. This argument must be provided if poisson is set to True.

gene_to_clust_to_mean : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression mean. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.

gene_to_clust_to_var : dictionary, optional (default: None)

A dictionary mapping each simulated gene to a dictionary that maps each cluster to the latent expression variance. If not provided, they will be estimated from the expression data in adata via a hierarchical Bayesian model.
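
The gene_to_clust_to_mean and gene_to_clust_to_var arguments are nested dictionaries keyed first by gene and then by cluster. A small sketch of that layout follows, with hypothetical gene names, cluster IDs, and values, plus a check that both dictionaries cover every gene and cluster. The covers_all helper is illustrative and not part of spatialcorr_sim.

```python
# Hypothetical genes and cluster IDs; values are illustrative only.
genes = ["GeneA", "GeneB"]
clusters = ["0", "1"]

# gene -> cluster -> latent expression mean
gene_to_clust_to_mean = {
    "GeneA": {"0": 1.2, "1": 0.4},
    "GeneB": {"0": 0.7, "1": 2.1},
}

# gene -> cluster -> latent expression variance
gene_to_clust_to_var = {
    "GeneA": {"0": 0.30, "1": 0.10},
    "GeneB": {"0": 0.25, "1": 0.50},
}

def covers_all(nested, genes, clusters):
    """Return True if every gene has an entry for every cluster."""
    return all(
        gene in nested and all(clust in nested[gene] for clust in clusters)
        for gene in genes
    )

print(covers_all(gene_to_clust_to_mean, genes, clusters))
print(covers_all(gene_to_clust_to_var, genes, clusters))
```

Supplying both dictionaries skips the estimation step described above, which can be useful when the same latent parameters should be reused across several simulations.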

Returns
corrs : ndarray

An NxGxG sized array, where N is the number of spots, and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.

covs : ndarray

An NxGxG sized array, where N is the number of spots, and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.

adata_sim : AnnData

An NxG simulated dataset for N spots and G genes.