API

Simulations

spatialcorr_sim.simulate_gene_pair_within_region_varying_correlation(adata, gene_1, gene_2, clust_to_fisher_corr_mean=None, fisher_corr_mean=None, clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate expression for a pair of genes whose correlation varies smoothly across spots within each region, with the spatial pattern of correlation parameterized separately for each region.

Parameters

adata (AnnData): The AnnData object storing the spatial expression data used to seed the simulation.
gene_1 (string): The name of the gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.
gene_2 (string): The name of the gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.
clust_to_fisher_corr_mean (dictionary, optional, default: None): Maps each cluster/region to the mean Fisher-transformed correlation between the two genes within that cluster. The correlation varies across spots within each cluster, but the Fisher-transformed correlations vary around this mean.
fisher_corr_mean (float, optional, default: None): The mean Fisher-transformed correlation between the two genes, used for every cluster/region when clust_to_fisher_corr_mean is not provided. If clust_to_fisher_corr_mean is provided, its values override this argument.
clust_to_bandwidth (dictionary, optional, default: None): Maps each cluster/region to the bandwidth of the Gaussian kernel used to sample correlations within that cluster. Larger bandwidths produce coarser spatial patterns of correlation.
bandwidth (float, optional, default: None): The kernel bandwidth used for all clusters/regions when clust_to_bandwidth is not provided. If clust_to_bandwidth is provided, its values override this argument.
clust_to_cov_strength (dictionary, optional, default: None): Maps each cluster/region to its “correlation strength”, i.e., the scalar that scales the covariance matrix of the Gaussian process that generates the spatial pattern of correlation. Higher values produce correlations of larger magnitude.
cov_strength (float, optional, default: None): The “correlation strength”, i.e., the scalar that scales the covariance matrix of the Gaussian process that generates the spatial pattern of correlation, used for all clusters/regions when clust_to_cov_strength is not provided. Higher values produce correlations of larger magnitude. If clust_to_cov_strength is provided, its values override this argument.
clust_key (string, optional, default: 'cluster'): The name of the column in adata.obs that stores the cluster/region ID of each spot.
row_key (string, optional, default: 'row'): The name of the column in adata.obs that stores the row coordinate of each spot.
col_key (string, optional, default: 'col'): The name of the column in adata.obs that stores the column coordinate of each spot.
poisson (boolean, optional, default: False): If False, expression values are sampled from a multivariate lognormal distribution at each spot. If True, these lognormal values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.
size_factors (ndarray, optional, default: None): An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
gene_to_clust_to_mean (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, the means are estimated from the expression data in adata via a hierarchical Bayesian model.
gene_to_clust_to_var (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, the variances are estimated from the expression data in adata via a hierarchical Bayesian model.

Returns

corrs (ndarray): An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.
covs (ndarray): An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.
adata_sim (AnnData): An Nx2 simulated dataset of N spots and two genes.
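To make the roles of fisher_corr_mean, bandwidth and cov_strength concrete, here is a minimal NumPy sketch of one way to sample a smooth correlation field over the spots of a single region: draw Fisher-transformed correlations from a Gaussian process with an RBF kernel (scaled by cov_strength, with length scale bandwidth) centered on fisher_corr_mean, then apply the inverse Fisher transform (tanh). This is an illustration under stated assumptions, not spatialcorr_sim’s actual implementation; the helper names resolve_param and sample_region_corrs, the exact kernel form, and the diagonal jitter are all hypothetical choices.

```python
import numpy as np


def resolve_param(clust, clust_to_value, value):
    """Hypothetical helper: a per-cluster value, when given, overrides the global value."""
    if clust_to_value is not None:
        return clust_to_value[clust]
    return value


def sample_region_corrs(coords, fisher_corr_mean, bandwidth, cov_strength, rng):
    """Sample a smooth field of correlations over the spots of one region.

    coords: (n, 2) array of (row, col) spot coordinates.
    Returns an n-length array of correlations in (-1, 1).
    """
    n = len(coords)
    # Squared Euclidean distance between every pair of spots.
    diffs = coords[:, None, :] - coords[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Assumed RBF kernel scaled by cov_strength; larger bandwidth -> coarser pattern.
    kernel = cov_strength * np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Small diagonal jitter for numerical stability (an assumption of this sketch).
    kernel += 1e-6 * np.eye(n)
    # Draw Fisher-transformed correlations from a Gaussian process around the mean...
    fisher_corrs = rng.multivariate_normal(np.full(n, fisher_corr_mean), kernel)
    # ...and map them back to correlations with the inverse Fisher transform.
    return np.tanh(fisher_corrs)


# Usage: a 10 x 10 grid of spots that all belong to a single cluster "A".
rng = np.random.default_rng(0)
rows, cols = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
bw = resolve_param("A", {"A": 5.0}, 2.0)  # the per-cluster value wins
corrs = sample_region_corrs(coords, fisher_corr_mean=0.5, bandwidth=bw,
                            cov_strength=0.1, rng=rng)
```

Working in Fisher space means the Gaussian process can produce any real value while tanh keeps every resulting correlation strictly inside (-1, 1).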

spatialcorr_sim.simulate_gene_set_within_region_varying_correlation(adata, genes, row_key='array_row', col_key='array_col', clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate spatial gene expression for a set of genes for which the correlation matrix varies smoothly within each region.

Parameters

adata (AnnData): The AnnData object storing the spatial expression data used to seed the simulation.
genes (list): A G-length list of gene names on which to base the simulated expression values. Within each region, the simulated data’s genes will have means and variances similar to those of these genes.
clust_to_bandwidth (dictionary, optional, default: None): Maps each cluster/region to the bandwidth of the Gaussian kernel used to sample correlations within that cluster. Larger bandwidths produce coarser spatial patterns of correlation.
bandwidth (float, optional, default: None): The kernel bandwidth used for all clusters/regions when clust_to_bandwidth is not provided. If clust_to_bandwidth is provided, its values override this argument.
clust_to_cov_strength (dictionary, optional, default: None): Maps each cluster/region to its “correlation strength”, i.e., the scalar that scales the covariance matrix of the Gaussian process that generates the spatial pattern of correlation. Higher values produce correlations of larger magnitude.
cov_strength (float, optional, default: None): The “correlation strength”, i.e., the scalar that scales the covariance matrix of the Gaussian process that generates the spatial pattern of correlation, used for all clusters/regions when clust_to_cov_strength is not provided. Higher values produce correlations of larger magnitude. If clust_to_cov_strength is provided, its values override this argument.
clust_key (string, optional, default: 'cluster'): The name of the column in adata.obs that stores the cluster/region ID of each spot.
row_key (string, optional, default: 'array_row'): The name of the column in adata.obs that stores the row coordinate of each spot.
col_key (string, optional, default: 'array_col'): The name of the column in adata.obs that stores the column coordinate of each spot.
poisson (boolean, optional, default: False): If False, expression values are sampled from a multivariate lognormal distribution at each spot. If True, these lognormal values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.
size_factors (ndarray, optional, default: None): An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
gene_to_clust_to_mean (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, the means are estimated from the expression data in adata via a hierarchical Bayesian model.
gene_to_clust_to_var (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, the variances are estimated from the expression data in adata via a hierarchical Bayesian model.

Returns

corrs (ndarray): An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.
covs (ndarray): An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.
adata_sim (AnnData): An NxG simulated dataset of N spots and G genes.
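The poisson and size_factors parameters describe a two-stage sampler: a multivariate lognormal draw at each spot, optionally followed by Poisson sampling of counts. The following NumPy sketch shows one plausible reading of that description. It assumes that the latent per-spot covariance is obtained by scaling a correlation matrix by per-gene standard deviations, and that the Poisson rate is the spot’s size factor times the lognormal value. Both assumptions, and the helper names corr_to_cov and sample_spots, are hypothetical and not taken from spatialcorr_sim.

```python
import numpy as np


def corr_to_cov(corr, variances):
    """Scale a G x G correlation matrix by per-gene variances into a covariance matrix."""
    sd = np.sqrt(variances)
    return corr * np.outer(sd, sd)


def sample_spots(means, covs, rng, poisson=False, size_factors=None):
    """Sample expression for N spots and G genes.

    means: (N, G) latent means; covs: (N, G, G) latent covariance matrices.
    """
    n_spots, n_genes = means.shape
    latent = np.empty((n_spots, n_genes))
    for i in range(n_spots):
        latent[i] = rng.multivariate_normal(means[i], covs[i])
    expr = np.exp(latent)  # multivariate lognormal draw at each spot
    if not poisson:
        return expr
    if size_factors is None:
        raise ValueError("size_factors must be provided when poisson=True")
    # Assumed: the spot's size factor times its lognormal value is the Poisson rate.
    return rng.poisson(size_factors[:, None] * expr)


# Usage: 50 spots, 2 genes, the same latent covariance at every spot.
rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.6], [0.6, 1.0]])
cov = corr_to_cov(corr, np.array([0.5, 2.0]))
covs = np.repeat(cov[None, :, :], 50, axis=0)
means = np.zeros((50, 2))
expr = sample_spots(means, covs, rng)  # lognormal values
counts = sample_spots(means, covs, rng, poisson=True, size_factors=np.full(50, 10.0))
```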

spatialcorr_sim.simulate_gene_pair_region_specific(adata, clust_to_corr, gene_1, gene_2, clust_key, row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate pairwise expression with non-varying correlation within each region, but differing correlation between regions.

Parameters

adata (AnnData): The AnnData object storing the spatial expression data used to seed the simulation.
clust_to_corr (dictionary): Maps each cluster/region to the correlation between the two genes within that cluster.
gene_1 (string): The name of the gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.
gene_2 (string): The name of the gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.
clust_key (string): The name of the column in adata.obs that stores the cluster/region ID of each spot.
row_key (string, optional, default: 'row'): The name of the column in adata.obs that stores the row coordinate of each spot.
col_key (string, optional, default: 'col'): The name of the column in adata.obs that stores the column coordinate of each spot.
poisson (boolean, optional, default: False): If False, expression values are sampled from a multivariate lognormal distribution at each spot. If True, these lognormal values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.
size_factors (ndarray, optional, default: None): An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
gene_to_clust_to_mean (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, the means are estimated from the expression data in adata via a hierarchical Bayesian model.
gene_to_clust_to_var (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, the variances are estimated from the expression data in adata via a hierarchical Bayesian model.

Returns

corrs (ndarray): An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.
covs (ndarray): An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.
adata_sim (AnnData): An Nx2 simulated dataset of N spots and two genes.
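Because the correlation is constant within each region here, the per-spot corrs and covs outputs can be read as per-cluster values broadcast to spots. The sketch below shows that relationship under two assumptions: each spot’s covariance is its correlation times the product of the two genes’ standard deviations, and the nested dictionaries have the layout documented for clust_to_corr and gene_to_clust_to_var. The helper region_specific_pair_params is hypothetical and not part of spatialcorr_sim.

```python
import numpy as np


def region_specific_pair_params(clusters, clust_to_corr, gene_to_clust_to_var,
                                gene_1, gene_2):
    """Hypothetical helper: expand per-cluster correlations into per-spot arrays."""
    corrs = np.array([clust_to_corr[c] for c in clusters], dtype=float)
    var_1 = np.array([gene_to_clust_to_var[gene_1][c] for c in clusters], dtype=float)
    var_2 = np.array([gene_to_clust_to_var[gene_2][c] for c in clusters], dtype=float)
    # Covariance = correlation * sd_1 * sd_2, computed spot by spot.
    covs = corrs * np.sqrt(var_1 * var_2)
    return corrs, covs


# Usage: three spots, two in cluster "A" and one in cluster "B".
clusters = ["A", "A", "B"]
clust_to_corr = {"A": 0.8, "B": -0.3}
gene_to_clust_to_var = {"g1": {"A": 1.0, "B": 4.0}, "g2": {"A": 1.0, "B": 1.0}}
corrs, covs = region_specific_pair_params(clusters, clust_to_corr,
                                          gene_to_clust_to_var, "g1", "g2")
```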

spatialcorr_sim.simulate_gene_set_region_specific(adata, all_genes, spot_covs, clust_to_corr_mat, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)

Simulate gene expression for a set of genes with non-varying correlation within each region, but differing correlation matrices between regions.

Parameters

adata (AnnData): The AnnData object storing the spatial expression data used to seed the simulation.
all_genes (list): A G-length list of gene names on which to base the simulated expression values. Within each region, the simulated data’s genes will have means and variances similar to those of these genes.
clust_key (string, optional, default: 'cluster'): The name of the column in adata.obs that stores the cluster/region ID of each spot.
row_key (string, optional, default: 'row'): The name of the column in adata.obs that stores the row coordinate of each spot.
col_key (string, optional, default: 'col'): The name of the column in adata.obs that stores the column coordinate of each spot.
poisson (boolean, optional, default: False): If False, expression values are sampled from a multivariate lognormal distribution at each spot. If True, these lognormal values are used to construct the mean counts, and counts are sampled from a Poisson-lognormal distribution.
size_factors (ndarray, optional, default: None): An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
gene_to_clust_to_mean (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, the means are estimated from the expression data in adata via a hierarchical Bayesian model.
gene_to_clust_to_var (dictionary, optional, default: None): A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, the variances are estimated from the expression data in adata via a hierarchical Bayesian model.

Returns

corrs (ndarray): An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.
covs (ndarray): An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.
adata_sim (AnnData): An NxG simulated dataset of N spots and G genes.
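The NxGxG corrs and covs outputs can be pictured as one GxG matrix per cluster, copied to every spot in that cluster. The sketch below builds such arrays and checks that each per-cluster matrix is a valid correlation matrix (symmetric, unit diagonal, positive semi-definite), which any multivariate sampler requires. It assumes clust_to_corr_mat maps each cluster to a GxG correlation matrix ordered like all_genes; that layout is not stated above and is an assumption, as are the hypothetical helper names expand_corr_mats and is_valid_corr_mat.

```python
import numpy as np


def expand_corr_mats(clusters, clust_to_corr_mat, genes, gene_to_clust_to_var):
    """Hypothetical helper: build (N, G, G) per-spot correlation and covariance arrays."""
    corrs = np.stack([np.asarray(clust_to_corr_mat[c], dtype=float) for c in clusters])
    sds = np.array([[np.sqrt(gene_to_clust_to_var[g][c]) for g in genes]
                    for c in clusters])
    # cov[i, j, k] = corr[i, j, k] * sd[i, j] * sd[i, k]
    covs = corrs * sds[:, :, None] * sds[:, None, :]
    return corrs, covs


def is_valid_corr_mat(mat, tol=1e-10):
    """Check symmetry, unit diagonal, and positive semi-definiteness."""
    mat = np.asarray(mat, dtype=float)
    if not np.allclose(mat, mat.T) or not np.allclose(np.diag(mat), 1.0):
        return False
    return bool(np.linalg.eigvalsh(mat).min() >= -tol)


# Usage: three genes, two clusters, three spots.
genes = ["g1", "g2", "g3"]
clust_to_corr_mat = {
    "A": np.eye(3),
    "B": np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]),
}
gene_to_clust_to_var = {g: {"A": 1.0, "B": 4.0} for g in genes}
corrs, covs = expand_corr_mats(["A", "B", "B"], clust_to_corr_mat, genes,
                               gene_to_clust_to_var)
# Pairwise correlations that no joint distribution can realize.
bad = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])
```

The last matrix shows why the check matters: each entry is a legal correlation on its own, yet the matrix has a negative determinant, so no set of three genes can have these correlations simultaneously.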