Data Integration (Batch correction)#

Batch effects are changes in gene expression due to batches arise by different handling conditions such as , library depth, machines, Days, Stress management during extraction, even samples etc.

But selecting batch and label key is important . according to requirement of keeping batch

general, one can say that Harmony and Seurat consistently perform well for simple batch correction tasks, and scVI, scGen, scANVI, and Scanorama perform well for more complex data integration tasks.

Note

Previous Feature selection used Normalized , scaled data . But for Batch correction it is important to use RawData and find variable genes based on batch (Not on whole data)

It is important to use Rawdata

  • RawData as adata2

  • We also use Filted DM data as adata

Check Batch correction Needed ?#

#@title Load DM reduced filtered data:

adata_QCNFSDM =  sc.read_h5ad("/content/drive/MyDrive/scRNA_using_Python/Objects/sc_QCNFSDM_covid.h5ad")

adata_QCNFSDM
AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance', 'highly_variable'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'doublet_colors', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'tsne', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_tsne', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'
#@title Load Raw Data :

adata_raw =  sc.read_h5ad("/content/drive/MyDrive/scRNA_using_Python/Objects/adata_raw_covid.h5ad")

adata_raw.layers["counts"] = adata_raw.X
adata_raw
AnnData object with n_obs × n_vars = 9000 × 33538
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome'
    layers: 'counts'
adata_raw.obs
type sample batch
AGGGTCCCATGACCCG-1-0 Covid covid_1 0
TACCCACAGCGGGTTA-1-0 Covid covid_1 0
CCCAACTTCATATGGC-1-0 Covid covid_1 0
TCAAGTGTCCGAACGC-1-0 Covid covid_1 0
ATTCCTAGTGACTGTT-1-0 Covid covid_1 0
... ... ... ...
CGCATAATCTTACGGA-14-5 Ctrl ctrl_14 5
GAGGCCTTCTCCTGCA-14-5 Ctrl ctrl_14 5
CCCTAACAGTTTCTTC-14-5 Ctrl ctrl_14 5
GGGATGATCAAGCTTG-14-5 Ctrl ctrl_14 5
CAATGACCACTGCATA-14-5 Ctrl ctrl_14 5

9000 rows × 3 columns

#@title Assign key for batch and label 
label_key = "type"
batch_key = "sample"
adata_raw.obs[batch_key].value_counts()
covid_1     1500
covid_15    1500
covid_17    1500
ctrl_5      1500
ctrl_13     1500
ctrl_14     1500
Name: sample, dtype: int64
adata_raw.var["feature_types"].value_counts()
Gene Expression    33538
Name: feature_types, dtype: int64
#@title filtering to make sure we have no features with zero counts

sc.pp.filter_genes(adata_raw, min_cells=1)
adata_raw
AnnData object with n_obs × n_vars = 9000 × 21830
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    layers: 'counts'

we also need to re-normalise the data. Here we just normalise using global scaling by the total counts per cell.

#@title simple normalize log1p

adata_raw.X = adata_raw.layers["counts"].copy()
sc.pp.normalize_total(adata_raw)
sc.pp.log1p(adata_raw)
adata_raw.layers["logcounts"] = adata_raw.X.copy()

adata_raw
AnnData object with n_obs × n_vars = 9000 × 21830
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'log1p'
    layers: 'counts', 'logcounts'

sc.pp.highly_variable_genes(adata_raw)
sc.tl.pca(adata_raw)
sc.pp.neighbors(adata_raw)
sc.tl.umap(adata_raw)
adata_raw
AnnData object with n_obs × n_vars = 9000 × 21830
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
#@title Batch correction needed ?

adata_raw.uns[batch_key + "_colors"] = [
    "#1b9e77",
    "#d95f02",
    "#7570b3",
]  # Set custom colours for batches
sc.pl.umap(adata_raw, color=[label_key, batch_key], wspace=1)
../../_images/4922e68bcece773e1e29b5c5ea4159f4ccec6ee7cd3f533bd35e86776668b9ee.png

We need batch correction according to sample batch

Batch Correction#

#@title Feature selection wrt Batch :

sc.pp.highly_variable_genes(
    adata_raw, n_top_genes=2000, flavor="cell_ranger", batch_key=batch_key
)
adata_raw
adata_raw.var
gene_ids feature_types genome n_cells highly_variable means dispersions dispersions_norm highly_variable_nbatches highly_variable_intersection
AL627309.1 ENSG00000238009 Gene Expression GRCh38 25 False 0.001361 0.555600 -0.151289 0 False
AL627309.3 ENSG00000239945 Gene Expression GRCh38 1 False 0.000041 0.061465 -0.002369 0 False
AL669831.5 ENSG00000237491 Gene Expression GRCh38 461 False 0.026669 0.737217 -0.089099 0 False
FAM87B ENSG00000177757 Gene Expression GRCh38 7 False 0.000366 0.368892 0.569010 1 False
LINC00115 ENSG00000225880 Gene Expression GRCh38 208 False 0.011747 0.654689 -0.411236 0 False
... ... ... ... ... ... ... ... ... ... ...
AC007325.4 ENSG00000278817 Gene Expression GRCh38 42 False 0.002123 0.618409 -0.435802 0 False
AL354822.1 ENSG00000278384 Gene Expression GRCh38 43 False 0.003140 0.822280 1.033842 2 False
AC004556.1 ENSG00000276345 Gene Expression GRCh38 400 False 0.022188 0.670790 0.646585 1 False
AC233755.1 ENSG00000275063 Gene Expression GRCh38 4 False 0.000313 0.385842 0.627800 0 False
AC240274.1 ENSG00000271254 Gene Expression GRCh38 61 False 0.003973 0.839225 0.555512 1 False

21830 rows × 10 columns

  • highly_variable_nbatches - The number of batches where each gene was found to be highly variable

  • highly_variable_intersection - Whether each gene was highly variable in every batch

  • highly_variable - Whether each gene was selected as highly variable after combining the results from each batch

#@title No of Batches eaach gene

n_batches = adata_raw.var["highly_variable_nbatches"].value_counts()
ax = n_batches.plot(kind="bar")
n_batches
0    14794
1     4371
2     1440
3      618
4      286
5      175
6      146
Name: highly_variable_nbatches, dtype: int64
../../_images/cef0151bd6761e2b124f0ee8150dec2e2fb2153cc73f0dc31aaff25ae4e4eb9b.png

most genes are not highly variable.

#@title create object to use for Integration

adata_hvg = adata_raw[:, adata_raw.var["highly_variable"]].copy()
adata_hvg
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'

scvi Data integration (Batch correction)#

adata_scvi = adata_hvg.copy()

adata_scvi
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'

prepare our AnnData object. This step stores some information required by scVI such as which expression matrix to use and what the batch key is.


scvi.model.SCVI.setup_anndata(adata_scvi, layer="counts", batch_key=batch_key)
adata_scvi
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch', '_scvi_batch', '_scvi_labels'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors', '_scvi_uuid', '_scvi_manager_uuid'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
model_scvi = scvi.model.SCVI(adata_scvi)
model_scvi
SCVI Model with the following params: 
n_hidden: 128, n_latent: 10, n_layers: 1, dropout_rate: 0.1, dispersion: gene, gene_likelihood: zinb, 
latent_distribution: normal
Training status: Not Trained
Model's adata is minified?: False

model_scvi.view_anndata_setup()
Anndata setup with scvi-tools version 0.20.3.

Setup via `SCVI.setup_anndata` with arguments:
{
'layer': 'counts',
'batch_key': 'sample',
'labels_key': None,
'size_factor_key': None,
'categorical_covariate_keys': None,
'continuous_covariate_keys': None
}

         Summary Statistics         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃     Summary Stat Key      Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│         n_batch             6   │
│         n_cells           9000  │
│ n_extra_categorical_covs    0   │
│ n_extra_continuous_covs     0   │
│         n_labels            1   │
│          n_vars           2000  │
└──────────────────────────┴───────┘
               Data Registry                
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Registry Key     scvi-tools Location    ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      X         adata.layers['counts']   │
│    batch      adata.obs['_scvi_batch']  │
│    labels     adata.obs['_scvi_labels'] │
└──────────────┴───────────────────────────┘
                   batch State Registry                   
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃   Source Location    Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['sample']   covid_1             0          │
│                       covid_15            1          │
│                       covid_17            2          │
│                        ctrl_5             3          │
│                       ctrl_13             4          │
│                       ctrl_14             5          │
└─────────────────────┴────────────┴─────────────────────┘
                     labels State Registry                      
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃      Source Location       Categories  scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['_scvi_labels']      0                0          │
└───────────────────────────┴────────────┴─────────────────────┘
max_epochs_scvi = np.min([round((20000 / adata_scvi.n_obs) * 400), 400])
max_epochs_scvi
400
model_scvi.train()
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 400/400: 100%|██████████| 400/400 [04:31<00:00,  1.47it/s, loss=429, v_num=1]
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=400` reached.
Epoch 400/400: 100%|██████████| 400/400 [04:31<00:00,  1.48it/s, loss=429, v_num=1]
adata_scvi.obsm["X_scVI"] = model_scvi.get_latent_representation()

adata_scvi
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch', '_scvi_batch', '_scvi_labels'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors', '_scvi_uuid', '_scvi_manager_uuid'
    obsm: 'X_pca', 'X_umap', 'X_scVI'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
sc.pp.neighbors(adata_scvi, use_rep="X_scVI")
sc.tl.umap(adata_scvi)
adata_scvi
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch', '_scvi_batch', '_scvi_labels'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors', '_scvi_uuid', '_scvi_manager_uuid'
    obsm: 'X_pca', 'X_umap', 'X_scVI'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
sc.pl.umap(adata_scvi, color=[label_key, batch_key], wspace=1)
../../_images/3af4b180b936ad995bfb355a47923e31a45e67fe31fc26d4fac7e5ad3ae2c896.png

Much Better . But we didn’t get corrected metrix

BBKNN Integration#

An important parameter for BBKNN is the number of neighbors per batch. A suggested heuristic for this is to use 25 if there are more than 100,000 cells or the default of 3 if there are fewer than 100,000.

neighbors_within_batch = 25 if adata_hvg.n_obs > 100000 else 3
neighbors_within_batch
3
adata_bbknn = adata_hvg.copy()
adata_bbknn.X = adata_bbknn.layers["logcounts"].copy()
sc.pp.pca(adata_bbknn)
bbknn.bbknn(
    adata_bbknn, batch_key=batch_key, neighbors_within_batch=neighbors_within_batch
)
adata_bbknn
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
sc.tl.umap(adata_bbknn)
sc.pl.umap(adata_bbknn, color=[label_key, batch_key], wspace=1)
../../_images/4a7a6fb8c66818ac3c0785e7f8ddfdb8e9dc408d72caf5937821c6238a552740.png

Much Better

scanoama#

To run Scanorama, you need to install python-annoy (already included in conda environment) and scanorama with pip. We can run scanorama to get a corrected matrix with the correct function, or to just get the data projected onto a new common dimension with the function integrate. Or both with the correct_scanpy and setting return_dimred=True. For now, run with just integration.

First we need to create individual AnnData objects from each of the datasets.

adata_scanorama = adata_hvg.copy()

adata_scanorama
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
# split per batch into new objects.
batches = adata_scanorama.obs['sample'].cat.categories.tolist()
alldata = {}
for batch in batches:
    alldata[batch] = adata_scanorama[adata_scanorama.obs['sample'] == batch,]

alldata 
{'covid_1': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities',
 'covid_15': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities',
 'covid_17': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities',
 'ctrl_5': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities',
 'ctrl_13': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities',
 'ctrl_14': View of AnnData object with n_obs × n_vars = 1500 × 2000
     obs: 'type', 'sample', 'batch'
     var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
     uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
     obsm: 'X_pca', 'X_umap'
     varm: 'PCs'
     layers: 'counts', 'logcounts'
     obsp: 'distances', 'connectivities'}
adata_scanorama.var['highly_variable_nbatches']

var_select = adata_scanorama.var['highly_variable_nbatches'] > 2
var_genes = var_select.index[var_select]
len(var_genes)
1225
#subset the individual dataset to the variable genes we defined at the beginning
alldata2 = dict()
for ds in alldata.keys():
    print(ds)
    alldata2[ds] = alldata[ds][:,var_genes]

#convert to list of AnnData objects
adatas = list(alldata2.values())

# run scanorama.integrate
scanorama.integrate_scanpy(adatas, dimred = 50)
covid_1
covid_15
covid_17
ctrl_5
ctrl_13
ctrl_14
Found 1225 genes among all datasets
[[0.         0.606      0.34933333 0.53266667 0.38866667 0.25733333]
 [0.         0.         0.63866667 0.478      0.26133333 0.19266667]
 [0.         0.         0.         0.52266667 0.20933333 0.20733333]
 [0.         0.         0.         0.         0.84133333 0.76466667]
 [0.         0.         0.         0.         0.         0.84133333]
 [0.         0.         0.         0.         0.         0.        ]]
Processing datasets (4, 5)
Processing datasets (3, 4)
Processing datasets (3, 5)
Processing datasets (1, 2)
Processing datasets (0, 1)
Processing datasets (0, 3)
Processing datasets (2, 3)
Processing datasets (1, 3)
Processing datasets (0, 4)
Processing datasets (0, 2)
Processing datasets (1, 4)
Processing datasets (0, 5)
Processing datasets (2, 4)
Processing datasets (2, 5)
Processing datasets (1, 5)
#scanorama adds the corrected matrix to adata.obsm in each of the datasets in adatas.

adatas[0].obsm['X_scanorama'].shape
(1500, 50)
# Get all the integrated matrices.
scanorama_int = [ad.obsm['X_scanorama'] for ad in adatas]

# make into one matrix.
all_s = np.concatenate(scanorama_int)
print(all_s.shape)

# add to the AnnData object, create a new object first
adata_sc = adata_scanorama.copy()
adata_sc.obsm["Scanorama"] = all_s
(9000, 50)
adata_sc
AnnData object with n_obs × n_vars = 9000 × 2000
    obs: 'type', 'sample', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'sample_colors', 'type_colors'
    obsm: 'X_pca', 'X_umap', 'Scanorama'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'distances', 'connectivities'
# tsne and umap
sc.pp.neighbors(adata_sc, n_pcs =30, use_rep = "Scanorama")
sc.tl.umap(adata_sc)
sc.tl.tsne(adata_sc, n_pcs = 30, use_rep = "Scanorama")
fig, axs = plt.subplots(2, 2, figsize=(10,8),constrained_layout=True)

sc.pl.umap(adata_sc, color="sample", title="Scanorama tsne", ax=axs[0,0], show=False)
<Axes: title={'center': 'Scanorama tsne'}, xlabel='UMAP1', ylabel='UMAP2'>
../../_images/0abbea4e2240780144bb28021363ec5a90b5c6fe40c385a64136bed21e2bde13.png
#@title save scanroma integrated data to file
save_file = 'Objects/sc_QCNFSDM_scanorama_corrected_covid.h5ad'
adata_sc.write_h5ad(save_file)