<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          哈佛大學(xué)單細(xì)胞課程|筆記匯總 (六)

          共 2306字,需瀏覽 5分鐘

           ·

          2020-09-24 06:06


          生物信息學(xué)習(xí)的正確姿勢(shì)

          NGS系列文章包括NGS基礎(chǔ)、在線繪圖、轉(zhuǎn)錄組分析?Nature重磅綜述|關(guān)于RNA-seq你想知道的全在這、ChIP-seq分析?ChIP-seq基本分析流程、單細(xì)胞測(cè)序分析?(重磅綜述:三萬字長文讀懂單細(xì)胞RNA測(cè)序分析的最佳實(shí)踐教程)、DNA甲基化分析、重測(cè)序分析、GEO數(shù)據(jù)挖掘典型醫(yī)學(xué)設(shè)計(jì)實(shí)驗(yàn)GEO數(shù)據(jù)分析 (step-by-step)批次效應(yīng)處理等內(nèi)容。


          哈佛大學(xué)單細(xì)胞課程|筆記匯總 (五)

          哈佛大學(xué)單細(xì)胞課程|筆記匯總 (四)

          (六)Single-cell RNA-seq clustering analysis: aligning cells across conditions

          我們的數(shù)據(jù)集包含來自兩個(gè)不同條件(對(duì)照和刺激)的兩個(gè)樣本,因此將這些樣本整合在一起以更好地進(jìn)行比較。

          聚類工作流程

          聚類分析的目的是在待定義細(xì)胞類型的數(shù)據(jù)集中保留主要的變異來源,同時(shí)盡量屏蔽由于無用的變異來源(測(cè)序深度,細(xì)胞周期差異,線粒體表達(dá),批次效應(yīng)等)而產(chǎn)生的變異。

          在我們的工作流程中需要應(yīng)用到以下兩個(gè)資源:

          • Satija Lab: Seurat v3 Guided Integration Tutorial(https://satijalab.org/seurat/v3.0/immune_alignment.html)

          • Paul Hoffman: Cell-Cycle Scoring and Regression(http://satijalab.org/seurat/cell_cycle_vignette.html)

          為了識(shí)別細(xì)胞亞群,我們將進(jìn)行以下步驟:

          (1)Normalization, variance stabilization, and regression of unwanted variation (e.g. mitochondrial transcript abundance, cell cycle phase, etc.) for each sample

          (2)Integrationof the samples using shared highly variable genes (optional, but recommended to align cells from different samples/conditions if cell types are separating by sample/condition)

          (3)Clustering cells based on top PCs (metagenes)

          (4)Exploration of quality control metrics: determine whether clusters are unbalanced wrt UMIs, genes, cell cycle, mitochondrial content, samples, etc.

          (5)Searching for expected cell types using known cell type-specific gene markers

          以下流程為前兩步。

          加載R包

          # Single-cell RNA-seq analysis - clustering analysis

          # Load libraries
          library(Seurat)
          library(tidyverse)
          library(RCurl)
          library(cowplot)

          Normalization, variance stabilization, and regression of unwanted variation for each sample

          分析的第一步是將count矩陣標(biāo)準(zhǔn)化,以解決每個(gè)樣品每個(gè)細(xì)胞的測(cè)序深度差異。Seurat最近推出了一種叫sctransform的新方法,用于對(duì)scRNA-seq數(shù)據(jù)進(jìn)行歸一化和方差穩(wěn)定變化。

          sctransform方法使用正則化負(fù)二項(xiàng)式模型對(duì)UMI counts進(jìn)行建模,以消除由于測(cè)序深度(每個(gè)細(xì)胞的總nUMI)所引起的變異,同時(shí)基于具有相似豐度的基因之間的合并信息來調(diào)整方差。

          Image credit: Hafemeister C and Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression, bioRxiv 2019 (https://doi.org/10.1101/576827)

          模型結(jié)果殘差(residuals)是每個(gè)轉(zhuǎn)錄本標(biāo)準(zhǔn)化后的表達(dá)水平。

          sctransform能通過回歸分析校對(duì)測(cè)序深度(nUMI)。但對(duì)于某些數(shù)據(jù)集,細(xì)胞周期可能是基因表達(dá)變化的顯著來源,對(duì)于其他數(shù)據(jù)集則不一定,我們需要檢查細(xì)胞周期是否是數(shù)據(jù)變化的主要來源并在需要時(shí)校對(duì)細(xì)胞周期的影響。

          Cell cycle scoring

          建議在執(zhí)行sctransform方法之前檢查細(xì)胞周期階段。檢查之前先對(duì)數(shù)據(jù)做個(gè)標(biāo)準(zhǔn)化使得不同測(cè)序深度的細(xì)胞可比。

          # Normalize the counts
          seurat_phase <- NormalizeData(filtered_seurat)

          作者提供了人體細(xì)胞周期相關(guān)基因可以下載 (https://www.dropbox.com/s/hus4mrkueh1tfpr/cycle.rda?dl=1)和其他生物的細(xì)胞周期相關(guān)基因(https://github.com/hbctraining/scRNA-seq/blob/master/lessons/cell_cycle_scoring.md)。

          把細(xì)胞周期相關(guān)基因讀入data目錄

          # Load cell cycle markers
          load("data/cycle.rda")

          # Score cells for cell cycle
          seurat_phase <- CellCycleScoring(seurat_phase,
          g2m.features = g2m_genes,
          s.features = s_genes)

          # View cell cycle scores and phases assigned to cells
          View([email protected])

          在對(duì)細(xì)胞進(jìn)行細(xì)胞周期評(píng)分后,我們想使用PCA確定細(xì)胞周期是否是我們數(shù)據(jù)集中變異的主要來源。在此之前,我們首先需要對(duì)數(shù)據(jù)進(jìn)行標(biāo)準(zhǔn)化scale),并使用Seurat ScaleData()函數(shù)。

          • 調(diào)整每個(gè)基因的表達(dá)以使整個(gè)細(xì)胞的平均表達(dá)為0

          • 縮放每個(gè)基因的表達(dá)以使細(xì)胞之間的表達(dá)值轉(zhuǎn)換標(biāo)準(zhǔn)差的倍數(shù)

          # Identify the most variable genes
          seurat_phase <- FindVariableFeatures(seurat_phase,
          selection.method = "vst",
          nfeatures = 2000,
          verbose = FALSE)

          # Scale the counts
          seurat_phase <- ScaleData(seurat_phase)

          NOTE: For the selection.method and nfeatures arguments the values specified are the default settings. Therefore, you do not necessarily need to include these in your code. We have included it here for transparency and inform you what you are using.

          # Perform PCA
          seurat_phase <- RunPCA(seurat_phase)

          # Plot the PCA colored by cell cycle phase
          DimPlot(seurat_phase,
          reduction = "pca",
          group.by= "Phase",
          split.by = "Phase")

          我們沒有看到在細(xì)胞周期方面具有很大的差異,也基于此圖,我們不會(huì)考慮細(xì)胞周期引起的變化。

          SCTransform

          我們此時(shí)可以放心的進(jìn)行SCTransform,我們打算使用for循環(huán)對(duì)每個(gè)樣本進(jìn)行 NormalizeData(), CellCycleScoring()SCTransform() ,并使用 vars.to.regress 去除線粒體相關(guān)基因。

          # Split seurat object by condition to perform cell cycle scoring and SCT on all samples
          split_seurat <- SplitObject(filtered_seurat, split.by = "sample")

          split_seurat <- split_seurat[c("ctrl", "stim")]

          for (i in 1:length(split_seurat)) {
          split_seurat[[i]] <- NormalizeData(split_seurat[[i]], verbose = TRUE)
          split_seurat[[i]] <- CellCycleScoring(split_seurat[[i]], g2m.features=g2m_genes, s.features=s_genes)
          split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("mitoRatio"))
          }

          NOTE: By default, after normalizing, adjusting the variance, and regressing out uninteresting sources of variation, SCTransform will rank the genes by residual variance and output the 3000 most variant genes. If the dataset has larger cell numbers, then it may be beneficial to adjust this parameter higher using the variable.features.n argument.

          除了原始RNA計(jì)數(shù)外,assays slot中現(xiàn)在還有一個(gè)SCT組件。細(xì)胞之間表達(dá)變化最大的特征基因存儲(chǔ)于SCT assay中。

          通常,在決定是否需要進(jìn)行數(shù)據(jù)整合之前,我們總是先查看細(xì)胞在空間的分布。如果我們?cè)赟eurat對(duì)象中同時(shí)對(duì)兩個(gè)條件進(jìn)行了歸一化并可視化,我們將看到特定于不同條件的聚類:

          不同條件下細(xì)胞的聚類表明我們需要跨條件整合細(xì)胞。

          NOTE: Seurat has a vignette for how to run through the workflow without integration(https://satijalab.org/seurat/v3.1/sctransform_vignette.html). The workflow is fairly similar to this workflow, but the samples would not necessarily be split in the beginning and integration would not be performed.

          Integrate samples using shared highly variable genes

          為了進(jìn)行整合(“harmony”整合不同平臺(tái)的單細(xì)胞數(shù)據(jù)之旅),我們將使用不同條件的細(xì)胞共有的高可變基因,然后,我們將“整合”或“協(xié)調(diào)”條件,以鑒定組之間相似或具有“共同的生物學(xué)特征”的細(xì)胞。

          如果不確定要期望的簇或期望條件之間的某些不同細(xì)胞類型(例如腫瘤和對(duì)照樣品),可以先單獨(dú)處理某個(gè)條件,然后一起運(yùn)行以查看兩種條件下是否存在針對(duì)特定細(xì)胞類型的特定簇。通常在不同條件下進(jìn)行聚類分析時(shí)會(huì)出現(xiàn)基于條件的聚類,而數(shù)據(jù)整合可以幫助相同類型的細(xì)胞進(jìn)行聚類。為了整合,我們首先需要先通過SCTransform篩選高變基因,然行“整合”或者“協(xié)調(diào)”("integrate" or "harmonize")條件,以便在不同條件下同種細(xì)胞具有相似的表達(dá)。這些條件包括:

          (1)不同條件(control VS stimuli

          (2)不同數(shù)據(jù)集

          (3)不同類型 (e.g. scRNA-seq and scATAC-seq)

          整合最重要的目的就是使得不同條件和數(shù)據(jù)集之間細(xì)胞類型相同的數(shù)據(jù)表達(dá)譜一致。這也意味著部分細(xì)胞亞群的生物學(xué)狀態(tài)是相同的,下圖為整合步驟:

          Image credit: Stuart T and Butler A, et al. Comprehensive integration of single cell data, bioRxiv 2018 (https://doi.org/10.1101/460147)

          具體細(xì)節(jié)是:

          1. 先執(zhí)行典型相關(guān)性分析(canonical correlation analysis (CCA)):

            CCA是PCA的一種形式,能識(shí)別數(shù)據(jù)中最大的差異,不過前提是組/條件之間共有這個(gè)最大變異(使用每個(gè)樣本中變化最大的3000個(gè)基因)。

          NOTE: The shared highly variable genes are used because they are the most likely to represent those genes distinguishing the different cell types present.

          1. 在數(shù)據(jù)集中識(shí)別錨點(diǎn)(anchors)或相互最近的鄰居(mutual nearest neighbors ,MNN):MMN可以被認(rèn)為是“最佳搭檔”('best buddies')。對(duì)于每種條件的細(xì)胞來說:

            “The difference in expression values between cells in an MNN pair provides an estimate of the batch effect, which is made more precise by averaging across many such pairs. A correction vector is obtained and applied to the expression values to perform batch correction.” ——Stuart and Bulter et al. (2018)(https://www.biorxiv.org/content/early/2018/11/02/460147).

            • 根據(jù)基因表達(dá)值確定細(xì)胞在其他情況下最接近的鄰居-它是'best buddies'

            • 進(jìn)行二次分析,如果兩個(gè)細(xì)胞在兩個(gè)方向上都是'best buddies',則這些細(xì)胞將被標(biāo)記為將兩個(gè)數(shù)據(jù)集“錨定”在一起的錨點(diǎn)。

          2. 篩選錨點(diǎn):通過其本地鄰域中的重疊來評(píng)估錨點(diǎn)對(duì)之間的相似性(不正確的錨點(diǎn)得分會(huì)較低)——相鄰細(xì)胞是否具有彼此相鄰的'best buddies'?

          3. 整合數(shù)據(jù)/條件:使用錨點(diǎn)和相應(yīng)分?jǐn)?shù)來轉(zhuǎn)換細(xì)胞表達(dá)值,從而可以整合條件/數(shù)據(jù)集(不同的樣本、條件、數(shù)據(jù)集、模態(tài))

          NOTE: Transformation of each cell uses a weighted average of the two cells of each anchor across anchors of the datasets. Weights determined by cell similarity score (distance between cell and k nearest anchors) and anchor scores, so cells in the same neighborhood should have similar correction values.

          現(xiàn)在,使用我們的SCTransform對(duì)象作為輸入,進(jìn)行數(shù)據(jù)整合:

          首先,我們需要指定使用由SCTransform識(shí)別的3000個(gè)高變基因進(jìn)行整合。默認(rèn)情況下,僅選擇前2000個(gè)基因。

          # Select the most variable features to use for integration
          integ_features <- SelectIntegrationFeatures(object.list = split_seurat,
          nfeatures = 3000)

          然后,我們需要準(zhǔn)備SCTransform對(duì)象以進(jìn)行整合。

          # Prepare the SCT list object for integration
          split_seurat <- PrepSCTIntegration(object.list = split_seurat,
          anchor.features = integ_features)

          現(xiàn)在,我們將執(zhí)行canonical correlation analysisCCA),找到最佳錨點(diǎn)并過濾不正確的錨點(diǎn)(Cell 深度 一套普遍適用于各類單細(xì)胞測(cè)序數(shù)據(jù)集的錨定整合方案)。

          # Find best buddies - can take a while to run
          integ_anchors <- FindIntegrationAnchors(object.list = split_seurat,
          normalization.method = "SCT",
          anchor.features = integ_features)

          最后,整合不同條件并保存。

          # Integrate across conditions
          seurat_integrated <- IntegrateData(anchorset = integ_anchors,
          normalization.method = "SCT")

          # Save integrated seurat object
          saveRDS(seurat_integrated, "results/integrated_seurat.rds")

          UMAP visualization

          我們利用降維技術(shù)可視化整合后的數(shù)據(jù),例如PCAUniform Manifold Approximation and Projection(UMAP)

          盡管PCA將能獲得所有PC,但繪圖時(shí)一般只使用兩個(gè)或3個(gè)。相比之下,UMAP則進(jìn)一步整合多個(gè)PCs獲取更多信息,并在二維中進(jìn)行展示。這樣,細(xì)胞之間的距離代表表達(dá)上的相似性。

          為了生成這些可視化,我們需要先運(yùn)行PCA和UMAP方法。先從PCA開始:

          # Run PCA
          seurat_integrated <- RunPCA(object = seurat_integrated)

          # Plot PCA
          PCAPlot(seurat_integrated,
          split.by = "sample")

          我們看到在PCA映射中具有很好的重合。然后進(jìn)行UMAP:

          # Run UMAP
          seurat_integrated <- RunUMAP(seurat_integrated,
          dims = 1:40,
          reduction = "pca")

          # Plot UMAP
          DimPlot(seurat_integrated)

          整合數(shù)據(jù)后

          整合數(shù)據(jù)前

          從圖中我們可以看到很好的整合,比起未經(jīng)過CCA的數(shù)據(jù)要好太多了。

          往期精品(點(diǎn)擊圖片直達(dá)文字對(duì)應(yīng)教程)


          后臺(tái)回復(fù)“生信寶典福利第一波”或點(diǎn)擊閱讀原文獲取教程合集

          瀏覽 92
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          評(píng)論
          圖片
          表情
          推薦
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  欧美精品另类 | 欧美在线大香蕉 | 秘 韩国免费网站18禁 | 音影先锋一区二区 | 国产精品99久久久精品无码 |