API
Simulations
- spatialcorr_sim.simulate_gene_pair_within_region_varying_correlation(adata, gene_1, gene_2, clust_to_fisher_corr_mean=None, fisher_corr_mean=None, clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)
Simulate pairwise expression for which the correlation between the two genes varies spatially within each region, with a separate mean Fisher-transformed correlation for each region.
- Parameters
- adata : AnnData
The AnnData object storing the spatial expression data that is to be used to seed the simulation.
- gene_1 : string
The name of a gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.
- gene_2 : string
The name of a gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.
- clust_to_fisher_corr_mean : dictionary, optional (default: None)
Map each cluster to the mean Fisher-transformed correlation between the two genes within that cluster. The correlation varies spatially within each cluster, and its Fisher-transformed values vary around this mean.
- fisher_corr_mean : float, optional (default: None)
The mean Fisher-transformed correlation between the two genes, shared by every cluster. If clust_to_fisher_corr_mean is not provided, this value is used for all clusters/regions; otherwise, it is overridden by the values in clust_to_fisher_corr_mean.
- clust_to_bandwidth : dictionary, optional (default: None)
Map each cluster/region to the bandwidth parameter of the Gaussian kernel used to sample correlations within that cluster. Larger bandwidths produce coarser patterns of correlation.
- bandwidth : float, optional (default: None)
The bandwidth parameter to use for all clusters. If clust_to_bandwidth is not provided, this value is used for all clusters/regions; otherwise, it is overridden by the values in clust_to_bandwidth.
- clust_to_cov_strength : dictionary, optional (default: None)
Map each cluster/region to the “correlation strength” (i.e., the scalar that scales the covariance matrix of the Gaussian process used to generate a spatial pattern of correlation). Higher values lead to correlations of larger magnitude.
- cov_strength : float, optional (default: None)
The “correlation strength” to use for all clusters (i.e., the scalar that scales the covariance matrix of the Gaussian process used to generate a spatial pattern of correlation). Higher values lead to correlations of larger magnitude. If clust_to_cov_strength is not provided, this value is used for all clusters/regions; otherwise, it is overridden by the values in clust_to_cov_strength.
- clust_key : string, optional (default: 'cluster')
The name of the column in adata.obs that stores the cluster/region ID of each spot.
- row_key : string, optional (default: 'row')
The name of the column in adata.obs that stores the row coordinate of each spot.
- col_key : string, optional (default: 'col')
The name of the column in adata.obs that stores the column coordinate of each spot.
- poisson : boolean, optional (default: False)
If False, sample expression values from a multivariate lognormal distribution at each spot. If True, use these lognormal values to construct the mean counts and sample counts from a Poisson-lognormal distribution.
- size_factors : ndarray, optional (default: None)
An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
- gene_to_clust_to_mean : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, these means are estimated from the expression data in adata via a hierarchical Bayesian model.
- gene_to_clust_to_var : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, these variances are estimated from the expression data in adata via a hierarchical Bayesian model.
- Returns
- corrs : ndarray
An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.
- covs : ndarray
An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.
- adata_sim : AnnData
An Nx2 simulated dataset for N spots and two genes.
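As a rough illustration of how fisher_corr_mean, bandwidth, and cov_strength could interact within a single region, the sketch below draws Fisher-transformed correlations from a Gaussian process with a Gaussian (squared-exponential) kernel and maps them back into (-1, 1) with tanh. The function sample_spot_correlations is hypothetical, and the exact kernel form is an assumption based on the parameter descriptions above, not taken from spatialcorr_sim's source. It uses NumPy.

```python
import numpy as np


def sample_spot_correlations(coords, fisher_corr_mean, bandwidth, cov_strength, seed=0):
    """Hypothetical sketch: sample a smooth field of correlations for one region.

    Assumes the Fisher-transformed correlation at each spot follows a Gaussian
    process with constant mean `fisher_corr_mean` and a Gaussian kernel scaled
    by `cov_strength`. This is an illustration, not spatialcorr_sim's code.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise squared Euclidean distances between spot coordinates.
    diffs = coords[:, None, :] - coords[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Gaussian kernel: a larger bandwidth yields smoother, coarser patterns.
    kernel = cov_strength * np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # A small diagonal jitter keeps the kernel numerically positive definite.
    kernel += 1e-6 * np.eye(len(coords))
    rng = np.random.default_rng(seed)
    mean = np.full(len(coords), fisher_corr_mean)
    fisher_corrs = rng.multivariate_normal(mean, kernel)
    # The inverse Fisher transform (tanh) maps values back into (-1, 1).
    return np.tanh(fisher_corrs)


# Example: a 5x5 grid of spots (row and column coordinates as in adata.obs).
rows, cols = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()])
corrs = sample_spot_correlations(coords, fisher_corr_mean=0.5, bandwidth=2.0, cov_strength=0.3)
print(corrs.shape)  # (25,)
```

A full simulation would presumably repeat this per cluster, using the per-cluster values from clust_to_fisher_corr_mean, clust_to_bandwidth, and clust_to_cov_strength when they are given.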
- spatialcorr_sim.simulate_gene_set_within_region_varying_correlation(adata, genes, row_key='array_row', col_key='array_col', clust_to_bandwidth=None, bandwidth=None, clust_to_cov_strength=None, cov_strength=None, clust_key='cluster', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)
Simulate spatial gene expression for a set of genes for which the correlation matrix varies smoothly within each region.
- Parameters
- adata : AnnData
The AnnData object storing the spatial expression data that is to be used to seed the simulation.
- genes : list
A G-length list of the gene names on which to base the simulated expression values. Within each region, the simulated data’s genes will have means and variances similar to those of these genes.
- clust_to_bandwidth : dictionary, optional (default: None)
Map each cluster/region to the bandwidth parameter of the Gaussian kernel used to sample correlations within that cluster. Larger bandwidths produce coarser patterns of correlation.
- bandwidth : float, optional (default: None)
The bandwidth parameter to use for all clusters. If clust_to_bandwidth is not provided, this value is used for all clusters/regions; otherwise, it is overridden by the values in clust_to_bandwidth.
- clust_to_cov_strength : dictionary, optional (default: None)
Map each cluster/region to the “correlation strength” (i.e., the scalar that scales the covariance matrix of the Gaussian process used to generate a spatial pattern of correlation). Higher values lead to correlations of larger magnitude.
- cov_strength : float, optional (default: None)
The “correlation strength” to use for all clusters (i.e., the scalar that scales the covariance matrix of the Gaussian process used to generate a spatial pattern of correlation). Higher values lead to correlations of larger magnitude. If clust_to_cov_strength is not provided, this value is used for all clusters/regions; otherwise, it is overridden by the values in clust_to_cov_strength.
- clust_key : string, optional (default: 'cluster')
The name of the column in adata.obs that stores the cluster/region ID of each spot.
- row_key : string, optional (default: 'array_row')
The name of the column in adata.obs that stores the row coordinate of each spot.
- col_key : string, optional (default: 'array_col')
The name of the column in adata.obs that stores the column coordinate of each spot.
- poisson : boolean, optional (default: False)
If False, sample expression values from a multivariate lognormal distribution at each spot. If True, use these lognormal values to construct the mean counts and sample counts from a Poisson-lognormal distribution.
- size_factors : ndarray, optional (default: None)
An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
- gene_to_clust_to_mean : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, these means are estimated from the expression data in adata via a hierarchical Bayesian model.
- gene_to_clust_to_var : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, these variances are estimated from the expression data in adata via a hierarchical Bayesian model.
- Returns
- corrs : ndarray
An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.
- covs : ndarray
An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.
- adata_sim : AnnData
An NxG simulated dataset for N spots and G genes.
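The returned corrs and covs arrays are linked through each gene's latent variance: a covariance matrix is a correlation matrix R rescaled by the standard deviations, Σ = D R D with D = diag(σ₁, …, σ_G). The sketch below builds one GxG covariance from a correlation matrix and draws multivariate lognormal expression at each spot, i.e., the poisson=False case described above. The helpers corr_to_cov and sample_lognormal_expression are hypothetical and not part of spatialcorr_sim; the code uses NumPy.

```python
import numpy as np


def corr_to_cov(corr, variances):
    """Rescale a GxG correlation matrix R into a covariance matrix D R D."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    return np.asarray(corr, dtype=float) * np.outer(sd, sd)


def sample_lognormal_expression(means, covs, seed=0):
    """Draw one G-vector of lognormal expression per spot (hypothetical helper).

    means: (N, G) latent log-scale means; covs: (N, G, G) latent covariances.
    """
    rng = np.random.default_rng(seed)
    latent = np.array([rng.multivariate_normal(m, c) for m, c in zip(means, covs)])
    return np.exp(latent)


corr = np.array([[1.0, 0.6, -0.2],
                 [0.6, 1.0, 0.1],
                 [-0.2, 0.1, 1.0]])
cov = corr_to_cov(corr, variances=[0.5, 1.0, 2.0])
n_spots = 4
# For brevity every spot shares one covariance; a spatially varying
# simulation would give each spot its own GxG matrix.
covs = np.repeat(cov[None, :, :], n_spots, axis=0)  # shape (N, G, G)
means = np.zeros((n_spots, 3))
expr = sample_lognormal_expression(means, covs)
print(expr.shape)  # (4, 3)
```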
- spatialcorr_sim.simulate_gene_pair_region_specific(adata, clust_to_corr, gene_1, gene_2, clust_key, row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)
Simulate pairwise expression with non-varying correlation within each region, but differing correlation between regions.
- Parameters
- adata : AnnData
The AnnData object storing the spatial expression data that is to be used to seed the simulation.
- clust_to_corr : dictionary
Map each cluster to the correlation between the two genes within that cluster.
- gene_1 : string
The name of a gene on which to base the simulated expression values for the first gene. The simulated data’s first gene will have the same mean and variance as this gene.
- gene_2 : string
The name of a gene on which to base the simulated expression values for the second gene. The simulated data’s second gene will have the same mean and variance as this gene.
- clust_key : string
The name of the column in adata.obs that stores the cluster/region ID of each spot.
- row_key : string, optional (default: 'row')
The name of the column in adata.obs that stores the row coordinate of each spot.
- col_key : string, optional (default: 'col')
The name of the column in adata.obs that stores the column coordinate of each spot.
- poisson : boolean, optional (default: False)
If False, sample expression values from a multivariate lognormal distribution at each spot. If True, use these lognormal values to construct the mean counts and sample counts from a Poisson-lognormal distribution.
- size_factors : ndarray, optional (default: None)
An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
- gene_to_clust_to_mean : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, these means are estimated from the expression data in adata via a hierarchical Bayesian model.
- gene_to_clust_to_var : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, these variances are estimated from the expression data in adata via a hierarchical Bayesian model.
- Returns
- corrs : ndarray
An N-length array, where N is the number of spots, storing the latent correlation used to generate expression at each spot.
- covs : ndarray
An N-length array, where N is the number of spots, storing the latent covariance used to generate expression at each spot.
- adata_sim : AnnData
An Nx2 simulated dataset for N spots and two genes.
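With poisson=True, counts are drawn from a Poisson-lognormal distribution using per-spot size factors. The sketch below illustrates that idea for the region-specific pair case: each spot's correlation is looked up from a clust_to_corr-style dictionary by its cluster label, the latent pair is drawn from a bivariate normal, and counts are Poisson with rate size_factor × exp(latent). The helper simulate_pair_counts is hypothetical, and treating the size factor as a multiplier on the lognormal rate is an assumption, not taken from the library.

```python
import numpy as np


def simulate_pair_counts(clusters, clust_to_corr, means, variances, size_factors, seed=0):
    """Hypothetical sketch of region-specific Poisson-lognormal counts for two genes.

    Assumes counts ~ Poisson(size_factor * exp(latent)), where the latent pair is
    bivariate normal with a correlation that is fixed within each cluster.
    """
    rng = np.random.default_rng(seed)
    var_1, var_2 = variances
    # Constant correlation within each region: look it up by cluster label.
    corrs = np.array([clust_to_corr[c] for c in clusters], dtype=float)
    # Latent covariance between the two genes: corr * sd_1 * sd_2.
    covs = corrs * np.sqrt(var_1 * var_2)
    counts = np.empty((len(clusters), 2), dtype=np.int64)
    for i, cov in enumerate(covs):
        sigma = np.array([[var_1, cov], [cov, var_2]])
        latent = rng.multivariate_normal(means, sigma)
        # The lognormal value scaled by the spot's size factor is the Poisson rate.
        counts[i] = rng.poisson(size_factors[i] * np.exp(latent))
    return corrs, covs, counts


clusters = ["A", "A", "B", "B", "B"]
corrs, covs, counts = simulate_pair_counts(
    clusters,
    clust_to_corr={"A": 0.8, "B": -0.5},
    means=[1.0, 0.5],
    variances=[0.25, 1.0],
    size_factors=np.array([1.0, 0.5, 2.0, 1.0, 1.5]),
)
print(corrs)
print(counts.shape)  # (5, 2)
```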
- spatialcorr_sim.simulate_gene_set_region_specific(adata, all_genes, spot_covs, clust_to_corr_mat, clust_key='cluster', row_key='row', col_key='col', poisson=False, size_factors=None, gene_to_clust_to_mean=None, gene_to_clust_to_var=None)
Simulate gene expression for a set of genes with non-varying correlation within each region, but differing correlation matrices between regions.
- Parameters
- adata : AnnData
The AnnData object storing the spatial expression data that is to be used to seed the simulation.
- all_genes : list
A G-length list of the gene names on which to base the simulated expression values. Within each region, the simulated data’s genes will have means and variances similar to those of these genes.
- clust_key : string, optional (default: 'cluster')
The name of the column in adata.obs that stores the cluster/region ID of each spot.
- row_key : string, optional (default: 'row')
The name of the column in adata.obs that stores the row coordinate of each spot.
- col_key : string, optional (default: 'col')
The name of the column in adata.obs that stores the column coordinate of each spot.
- poisson : boolean, optional (default: False)
If False, sample expression values from a multivariate lognormal distribution at each spot. If True, use these lognormal values to construct the mean counts and sample counts from a Poisson-lognormal distribution.
- size_factors : ndarray, optional (default: None)
An N-length array, where N is the number of spots, containing the size factor (i.e., library size) of each spot. Must be provided if poisson is True.
- gene_to_clust_to_mean : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression mean. If not provided, these means are estimated from the expression data in adata via a hierarchical Bayesian model.
- gene_to_clust_to_var : dictionary, optional (default: None)
A dictionary mapping each simulated gene to a dictionary that maps each cluster to the gene’s latent expression variance. If not provided, these variances are estimated from the expression data in adata via a hierarchical Bayesian model.
- Returns
- corrs : ndarray
An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent correlation matrix used to generate expression at each spot.
- covs : ndarray
An NxGxG array, where N is the number of spots and G is the number of genes, storing the latent covariance matrix used to generate expression at each spot.
- adata_sim : AnnData
An NxG simulated dataset for N spots and G genes.
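The signature above takes clust_to_corr_mat, which the parameter list does not describe; judging from its name and the function description, it presumably maps each cluster to a GxG correlation matrix (an assumption). Because an invalid matrix cannot serve as a multivariate normal correlation, a quick pre-flight check may be useful. The helper check_corr_mats below is hypothetical and not part of spatialcorr_sim; the cluster names in the example are placeholders.

```python
import numpy as np


def check_corr_mats(clust_to_corr_mat, n_genes, tol=1e-8):
    """Hypothetical helper: verify each cluster maps to a valid GxG correlation matrix."""
    for clust, mat in clust_to_corr_mat.items():
        mat = np.asarray(mat, dtype=float)
        if mat.shape != (n_genes, n_genes):
            raise ValueError(f"cluster {clust!r}: expected shape {(n_genes, n_genes)}, got {mat.shape}")
        if not np.allclose(mat, mat.T, atol=tol):
            raise ValueError(f"cluster {clust!r}: matrix is not symmetric")
        if not np.allclose(np.diag(mat), 1.0, atol=tol):
            raise ValueError(f"cluster {clust!r}: diagonal entries must equal 1")
        # eigvalsh assumes a symmetric matrix, which was checked above.
        if np.linalg.eigvalsh(mat).min() < -tol:
            raise ValueError(f"cluster {clust!r}: matrix is not positive semidefinite")
    return True


# Placeholder cluster names; use the IDs stored in adata.obs[clust_key].
clust_to_corr_mat = {
    "region_1": np.eye(3),
    "region_2": np.array([[1.0, 0.7, 0.0],
                          [0.7, 1.0, 0.3],
                          [0.0, 0.3, 1.0]]),
}
print(check_corr_mats(clust_to_corr_mat, n_genes=3))  # True
```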