single-cell


Heatmaps and Hierarchical Clustering

21 May 2021 in single-cell

Tang Ming published an improved scRNA-seq heatmap implementation built on ComplexHeatmap.

Do not show too many cells on a single screen.

When you have too many cells (> 10,000), the use_raster option really helps. Also consider downsampling the Seurat object to a smaller number of cells before plotting the heatmap. With 300,000 cells (columns), the heatmap would need 300,000 pixels across, which no screen provides.

https://divingintogeneticsandgenomics.rbind.io/post/enhancement-of-scrnaseq-heatmap-using-complexheatmap/
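A minimal sketch of the downsampling idea, in base R only. The matrix, the group labels, and the helper name downsample_cells are all made up for illustration; they stand in for a real expression matrix and a metadata column. On a real Seurat object, subset(obj, downsample = 200) should give a similar per-identity result.

```r
# Toy feature-by-cell matrix standing in for a Seurat expression matrix (hypothetical data).
set.seed(1)
n_cells <- 5000
toy_mat <- matrix(rpois(3 * n_cells, lambda = 2), nrow = 3,
                  dimnames = list(c("geneA", "geneB", "geneC"),
                                  paste0("cell", seq_len(n_cells))))
toy_group <- sample(c("g1", "g2", "g3"), n_cells, replace = TRUE)

# Hypothetical helper: keep at most `max_per_group` randomly chosen cells per group.
downsample_cells <- function(cell_names, groups, max_per_group) {
  picked <- lapply(split(cell_names, groups), function(cells) {
    if (length(cells) > max_per_group) sample(cells, max_per_group) else cells
  })
  unlist(picked, use.names = FALSE)
}

keep <- downsample_cells(colnames(toy_mat), toy_group, max_per_group = 200)
small_mat <- toy_mat[, keep]
dim(small_mat)
```

Sampling within each group, rather than across all cells at once, keeps rare groups visible in the heatmap.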

For hierarchical clustering, consult this hclust tutorial:

https://uc-r.github.io/hc_clustering#algorithms
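As a quick illustration of the dist / hclust / cutree sequence on something small enough to inspect by eye, here is a sketch on simulated data. The toy matrix and its three groups are invented; only the base R functions are real.

```r
set.seed(42)
# Toy feature-by-sample matrix: 2 genes x 30 cells in three well-separated groups (hypothetical data).
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
toy <- t(do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(10, centers[i, 1], 0.3), rnorm(10, centers[i, 2], 0.3))
})))
dimnames(toy) <- list(c("geneA", "geneB"), paste0("cell", 1:30))

# dist() measures distances between rows, so transpose to sample-by-feature first.
d  <- dist(t(toy), method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Cut the tree by number of clusters (k); a height (h) can be given instead.
grp_k <- cutree(hc, k = 3)
table(grp_k)

# hc$order is the left-to-right leaf order of the dendrogram.
head(colnames(toy)[hc$order])
```

cutree() returns a vector named by sample, in the original sample order rather than dendrogram order, which makes it easy to attach to a metadata table.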

Creating a clustered heatmap from Seurat data

0. Load the required packages

suppressPackageStartupMessages({
    library(Seurat)          # provides GetAssayData() used below
    library(ComplexHeatmap)
    library(circlize)
})

1. Prepare the data

# 1.1 Extract the expression matrix
mat  = GetAssayData(t.combined, slot = "data", assay = "RNA")

# 1.2 Put the cells in the desired order
ordered.cells <- rownames(t.combined@meta.data)

# 1.3 Select the genes of interest to plot
features.sel <-  unique(c("CD3E","CD3D","CD4","CD8A","CD8B"))

# 1.4 Subset and reorder the matrix by rows (genes) and columns (cells)
mat = as.matrix( mat[features.sel, ordered.cells] )

2. Hierarchically cluster the data (optional)

# 2.1 Hierarchical clustering
## Note: mat comes from Seurat as a feature-by-sample matrix, so transpose it to sample-by-feature;
## here one set of genes is used for clustering, and a different set can be used for plotting
distobj   <- dist(t(mat[c("CD3D","CD3E","CD4","CD8A","CD8B"),]), method = "euclidean")
hclustobj <- hclust(distobj, method = "ward.D2" )

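# 2.2 Cut the tree into clusters
sub_grp <- cutree(hclustobj, k = 10) # k sets the number of clusters directly; h can be used instead to cut at a given height
# Print the number of cells in each cluster
table(sub_grp)
# Plot the dendrogram
options(repr.plot.width=15, repr.plot.height=5)
plot(hclustobj, cex = 0.1, labels = FALSE) # labels = FALSE hides the sample names, which would otherwise be too dense to read
rect.hclust(hclustobj, k = 10) # draw a box around each cluster
# Plot a heatmap with the columns in dendrogram order
library(pheatmap)
options(repr.plot.width=15, repr.plot.height=10)
# use_raster is a ComplexHeatmap option, not a pheatmap::pheatmap argument, so it is dropped here
pheatmap(mat[,hclustobj$order], cluster_rows = F, cluster_cols = F, 
         border_color=NA,
         show_colnames = F)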

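# If the result looks reasonable, store the cluster labels in the metadata
# (indexing sub_grp by cell name keeps labels and cells aligned)
t.combined$subgroup <- as.character(sub_grp[colnames(t.combined)])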

3. Build the metadata data frame and the annotations

meta = t.combined@meta.data[ordered.cells,]

# configure colors for the subgroup annotation
library(paletteer)
cl_levels = unique(meta$subgroup)
# as.vector() instead of %>%, so that magrittr does not need to be attached
blockcol = as.vector(paletteer_d("ggsci::default_igv")[seq_along(cl_levels)])
names(blockcol) <- cl_levels


colann <- HeatmapAnnotation(
    cluster = meta$cluster,
    organ   = meta$organ,
    subgroup= meta$subgroup,
    col = list( subgroup = blockcol ),
    annotation_legend_param=list(
        cluster = list(nrow=5),
        organ = list(nrow=3),
        subgroup=list(nrow=1)
    )
)
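The col argument above expects a named vector with one colour per annotation level. A small base R sketch of that mapping follows; the labels are invented and hcl.colors() is only a stand-in for the paletteer palette used above.

```r
# Hypothetical cluster labels, standing in for meta$subgroup.
labels <- as.character(c(3, 1, 2, 3, 1, 10, 2))
lv <- unique(labels)

# One colour per level; hcl.colors() is base R (>= 3.6), used here as a stand-in palette.
pal <- hcl.colors(length(lv), palette = "Dark 3")
names(pal) <- lv

# Every label needs a colour; a palette shorter than the number of levels would leave NAs.
stopifnot(all(labels %in% names(pal)), !anyNA(pal))

# Looking up by name gives the colour of each cell.
pal[labels]
```

A fixed-size palette indexed past its length returns NA, so a check like the stopifnot() above is worth keeping when the number of clusters can grow.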

4. Draw the heatmap

hm<-Heatmap(mat, name = "Normalized expression", 
        cluster_rows = T, 
        cluster_columns = hclustobj, show_column_names=FALSE,
        column_dend_height = unit(4, "cm"),
        
        #column_split=meta$organ, cluster_column_slices=T,
        col= colorRamp2(c(0,1.5,3), c("#486E9E", "white", "#D84B59")),
        column_title_rot=90, column_gap=unit(2, "mm"),
        top_annotation=colann, heatmap_legend_param = list(direction = "horizontal"),
        use_raster = TRUE)

options(warn=-1)
options(repr.plot.width=20, repr.plot.height=20)
#cairo_pdf("XXXX.hm.pdf",width=15, height = 18)
draw(hm,
     padding = unit(c(10, 10, 30, 3), "mm"), # bottom, left, top, right
     merge_legend = TRUE,
     heatmap_legend_side = "bottom", 
     annotation_legend_side = "bottom")
#dev.off()

scRNA-seq pipeline 2021.1

19 January 2021 in computational protocols, single-cell

pip install jupyterlab
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
pip install jupyter_nbextensions_configurator
jupyter nbextensions_configurator enable --user

pip install anndata
pip install scanpy
pip install scvelo
pip install episcanpy
conda install -c conda-forge r-base=4.0.3
conda install -c conda-forge libgit2   # or, on macOS: brew install libgit2
conda install -c conda-forge r-cairo
conda install -c conda-forge r-hdf5r
install.packages(c("devtools","tidyverse"))

devtools::install_git("https://gitee.com/chansigit/seurat.git")
devtools::install_git("https://gitee.com/chansigit/uwot.git")
devtools::install_git("https://gitee.com/chansigit/liger.git")
devtools::install_git("https://gitee.com/chansigit/seurat-wrappers.git")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.12")
BiocManager::install(c(
"BiocGenerics", "DelayedArray", "DelayedMatrixStats",
"limma", "S4Vectors", "SingleCellExperiment", "multtest",
"GenomicRanges", "scRNAseq",
"Rhdf5lib", "pcaMethods", "DropletUtils", "scater",
"SingleR", "geneplotter", "AUCell", "GSVA", "ComplexHeatmap",
"AnnotationHub", "SummarizedExperiment", "batchelor", "Matrix.utils",
"clusterProfiler", "ChIPseeker", "ChIPpeakAnno"))
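After a long install like this, it can help to check which packages actually made it in. A minimal sketch, assuming the short list in wanted is replaced with whatever you installed:

```r
# Packages to check; edit to match the install commands above.
wanted <- c("Seurat", "ComplexHeatmap", "SingleCellExperiment", "scater")

# requireNamespace() returns FALSE, rather than raising an error, when a package is missing.
installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(installed)[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```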

Single-cell RNA-seq Environments (version scrna0309)

24 March 2020 in computational protocols, single-cell


conda create -n scrna0309 python=3.8
conda activate scrna0309
conda install -c conda-forge r-base=3.6.2
pip install scanpy==1.4.6
conda install -c conda-forge r-rcpp
conda install -c conda-forge r-devtools
conda install -c conda-forge leidenalg
pip install loompy
conda install -c conda-forge jupyterlab
conda install -c conda-forge r-hdf5r
pip install -U scvelo
conda install -c conda-forge xorg-libxt
pip install rpy2
git clone https://github.com/aertslab/pySCENIC.git
cd pySCENIC/
pip install .

install.packages('codetools')
install.packages('IRkernel')
IRkernel::installspec(user = T)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.10")
BiocManager::install('multtest')
BiocManager::install("SingleCellExperiment")
BiocManager::install("GenomicRanges")
BiocManager::install("scRNAseq")
BiocManager::install("Rhdf5lib")
BiocManager::install("pcaMethods")
BiocManager::install("DropletUtils") 
BiocManager::install("scater")
install.packages("Matrix")
install.packages("leiden")
install.packages('Seurat')
install.packages("pheatmap")
BiocManager::install("ComplexHeatmap")
BiocManager::install("clusterProfiler")
devtools::install_github('satijalab/seurat-wrappers')
devtools::install_github("constantAmateur/SoupX")
devtools::install_github("velocyto-team/velocyto.R")
BiocManager::install("SingleR") 
BiocManager::install("AUCell")
BiocManager::install("geneplotter")
BiocManager::install("GSVA")
devtools::install_github('MacoskoLab/liger')
BiocManager::install("M3Drop")

wget https://cran.r-project.org/src/contrib/Archive/modes/modes_0.7.0.tar.gz
R CMD INSTALL modes_0.7.0.tar.gz
install.packages("KernSmooth")
install.packages("ROCR")
install.packages("fields")
devtools::install_github('chris-mcginnis-ucsf/DoubletFinder')

conda env create metaseq/scrna0309

pip install scanpy==1.5.1
pip install -U loompy
pip install git+https://github.com/theislab/scvelo
conda install -c conda-forge r-rcpp r-devtools leidenalg r-hdf5r xorg-libxt
BiocManager::install(c('multtest',"SingleCellExperiment","GenomicRanges",
                       "scRNAseq","Rhdf5lib","pcaMethods","DropletUtils",
                       "scater","SingleR","geneplotter","AUCell","GSVA",
                       "M3Drop","ComplexHeatmap","AnnotationHub",
                       "clusterProfiler","ChIPseeker","ChIPpeakAnno"))
devtools::install_github('satijalab/seurat-wrappers') 
devtools::install_github("constantAmateur/SoupX") 
devtools::install_github("velocyto-team/velocyto.R")
devtools::install_github('MacoskoLab/liger')
install.packages(c("Matrix","leiden","pheatmap"))

wget https://cran.r-project.org/src/contrib/Archive/modes/modes_0.7.0.tar.gz 
R CMD INSTALL modes_0.7.0.tar.gz 
install.packages("KernSmooth") 
install.packages("ROCR") 
install.packages("fields") 
devtools::install_github('chris-mcginnis-ucsf/DoubletFinder')

Setting up single-cell analysis environment

12 February 2020 in computational protocols, single-cell

# set up env

conda create -n r362 
conda activate r362;

# install R

conda install -c conda-forge r-base

# install analysis-associated packages

install.packages('Rcpp')

if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.10")
BiocManager::install('multtest')

install.packages('devtools');
conda install -c conda-forge r-devtools

conda install r-rjava;
conda install -c conda-forge umap-learn
install.packages('Seurat')
devtools::install_github("constantAmateur/SoupX")

conda install -c bioconda scanpy;
conda install -c conda-forge leidenalg;

# Seurat Ecosystem

BiocManager::install("pcaMethods")
sudo yum install hdf5-devel
conda install -c conda-forge r-hdf5r
devtools::install_github("velocyto-team/velocyto.R")
devtools::install_github('MacoskoLab/liger')
devtools::install_github('satijalab/seurat-wrappers')

install.packages("pheatmap")

# Scanpy Ecosystem

pip install loompy
pip install -U scvelo

# Misc

BiocManager::install("GenomicRanges")
BiocManager::install("SingleCellExperiment")
BiocManager::install("scater")
BiocManager::install("DropletUtils") 
BiocManager::install("SingleR") 
BiocManager::install("scRNAseq")

# install jupyter

conda install -c conda-forge jupyterlab;

install.packages('IRkernel')
IRkernel::installspec(user = FALSE)

# remove env if necessary

conda remove --name r362 --all

Integrating single-cell transcriptome data

31 January 2020 in integration, single-cell

Original author: Satija Lab
Translator: 陈斯杰
Original article: https://satijalab.org/seurat/v3.1/integration.html

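Seurat v3 provides a new set of methods for integrating multiple single-cell datasets. These methods aim to identify cell states shared across datasets. They place few demands on experimental design: even when the datasets come from different individuals, different experimental conditions, different sequencing technologies, or even different species, Seurat v3's integration methods can handle them.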

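During integration, the algorithm first identifies anchors between the datasets. Anchors represent correspondences between cells in two datasets that share the same cell state. With these anchors we can integrate two datasets and transfer information from one dataset to the other. Below we walk through examples of the integration methods Seurat provides, including some new features not described in the 2019 manuscript: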

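Standard Workflow
An example of the standard Seurat v3 integration workflow. It integrates human pancreatic islet single-cell datasets produced with several sequencing technologies, and shows how Seurat v3 uses classification to transfer existing cluster labels onto a newly collected dataset.
SCTransform
An SCTransform-based version of the standard integration workflow (SCTransform is the Satija Lab's newer normalization method). In this example it is used to integrate not only the human islet datasets but also PBMC datasets produced with eight technologies, which serve as a sequencing-technology benchmark for the HCA.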

Standard Workflow

Dataset preprocessing

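Load the example dataset. In its metadata, the tech column records the sequencing technology used for each cell, and the celltype column records the cell type annotation. We split the panc8 dataset by sequencing technology into separate Seurat objects, store them in an R list, and name each list element after its technology.
Before integrating, we follow the usual Seurat practice and normalize the data in each Seurat object in the list, then identify its variable features.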

library(Seurat)
library(SeuratData)
InstallData("panc8")
data("panc8")

pancreas.list <- SplitObject(panc8, split.by = "tech")
pancreas.list <- pancreas.list[c("celseq", "celseq2", "fluidigmc1", "smartseq2")]


for (i in 1:length(pancreas.list)) {
    pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
    pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst", 
        nfeatures = 2000, verbose = FALSE)
}

Integrating datasets from three sequencing technologies

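FindIntegrationAnchors takes a list of Seurat objects as the input to integrate and returns the anchors it finds. The parameter most often tuned is dims, usually set somewhere between 1:30 and 1:50. In this example the input list contains datasets from three sequencing technologies; we want to remove the batch effects between them while preserving the biological states.
IntegrateData returns a Seurat object with a new assay named "integrated", which holds the integrated (batch-corrected) expression matrix. The original, uncorrected expression values are not discarded; they remain in the "RNA" assay.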

reference.list <- pancreas.list[c("celseq", "celseq2", "smartseq2")]
pancreas.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:30)

pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims = 1:30)

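After integration we can continue with ScaleData (per-gene, i.e. row-wise, standardization), PCA dimensionality reduction, UMAP visualization, and so on.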

library(ggplot2)
library(cowplot)
# switch to integrated assay. The variable features of this assay are automatically
# set during IntegrateData
DefaultAssay(pancreas.integrated) <- "integrated"

# Run the standard workflow for visualization and clustering
pancreas.integrated <- ScaleData(pancreas.integrated, verbose = FALSE)
pancreas.integrated <- RunPCA(pancreas.integrated, npcs = 30, verbose = FALSE)
pancreas.integrated <- RunUMAP(pancreas.integrated, reduction = "pca", dims = 1:30)
p1 <- DimPlot(pancreas.integrated, reduction = "umap", group.by = "tech")
p2 <- DimPlot(pancreas.integrated, reduction = "umap", group.by = "celltype", label = TRUE, 
    repel = TRUE) + NoLegend()
plot_grid(p1, p2)
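The row-wise standardization that ScaleData applies in the workflow above can be illustrated on a toy matrix. This is a base R sketch of the idea only, not Seurat's implementation; ScaleData additionally caps extreme values (its scale.max argument), which is omitted here, and the matrix is random.

```r
set.seed(7)
# Toy gene-by-cell matrix (hypothetical values).
expr <- matrix(rexp(4 * 6), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))

# scale() standardizes columns, so transpose, scale, and transpose back
# to centre each gene (row) at 0 with unit standard deviation.
scaled <- t(scale(t(expr)))

rowMeans(scaled)      # numerically zero for every gene
apply(scaled, 1, sd)  # 1 for every gene
```

Standardizing per gene stops highly expressed genes from dominating the PCA that follows.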

Cell type transfer using a reference dataset

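Besides integration, Seurat v3 also supports transfer: projecting data from a reference dataset onto a query dataset. Transfer differs from integration in the following ways:
- Transfer does not correct the gene expression values of the query dataset.
- By default, transfer projects the PCA structure of the reference onto the query, rather than using CCA to learn a joint structure for query and reference.
In transfer, we first find anchors and then use the TransferData function to classify the cells in the query dataset. TransferData returns the predicted class and the prediction scores for each cell.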

pancreas.query <- pancreas.list[["fluidigmc1"]]
pancreas.anchors <- FindTransferAnchors(reference = pancreas.integrated, query = pancreas.query, dims = 1:30)
predictions <- TransferData(anchorset = pancreas.anchors, refdata = pancreas.integrated$celltype, dims = 1:30)
pancreas.query <- AddMetaData(pancreas.query, metadata = predictions)
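Since the query cells here already carry a celltype annotation, the predictions can be checked against it. The sketch below uses a hand-made data frame instead of real TransferData output, so it runs without Seurat. The column names predicted.id and prediction.score.max follow what TransferData returns; the values and cell names are invented.

```r
# Toy stand-in for TransferData output plus known labels (hypothetical values).
predictions <- data.frame(
  predicted.id = c("alpha", "beta", "alpha", "delta", "beta"),
  prediction.score.max = c(0.98, 0.91, 0.55, 0.87, 0.99),
  row.names = paste0("cell", 1:5)
)
truth <- c(cell1 = "alpha", cell2 = "beta", cell3 = "gamma",
           cell4 = "delta", cell5 = "beta")

# Align by cell name, then compare predicted and annotated labels.
match_flag <- predictions$predicted.id == truth[rownames(predictions)]
table(match_flag)
mean(match_flag)  # fraction of cells predicted correctly

# Confusion table, and the rows (with scores) of the misclassified cells.
table(truth = truth[rownames(predictions)], predicted = predictions$predicted.id)
predictions[!match_flag, ]
```

With the real objects, the same comparison would use pancreas.query$predicted.id and pancreas.query$celltype after the AddMetaData call above.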


Sijie’s Blog