Full-workflow update: end-to-end analysis of Spatial HD data (data analysis + image recognition)

追风少年ii

Last modified 2024-10-24 09:31:43

First published 2024-10-21 19:19:55

Article link: https://blog.csdn.net/weixin_53637133/article/details/143125230

Author: Evil Genius

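My younger sister got married. As for me, right in the middle of the ceremony my blind date sent me a WeChat message: "I don't think we're a good fit~~~". A blind date I had met exactly once.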

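There are already many Visium HD projects, and I have said before that HD data needs to be analyzed together with image segmentation. However, the analysis method on the 10X website still needs improvement, and it also requires a high-resolution btf file from a microscope scan. Few companies can provide this at the moment, and we need an analysis method that is simpler, more practical, and more economical.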

Key point: combine cell segmentation with Visium HD transcriptomics data

Current challenges in HD data analysis

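First, some cells are smaller than the aggregated spots (bins), which leads to contamination between cells. More problematic, a spot rarely sits entirely on a single cell. In many cases each spot overlaps 2 or more cells and, conversely, each cell overlaps more than 1 spot (often more than 2, given the 3D positioning), especially when the cell diameter is below 8 µm or when cells overlap or are tightly packed.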
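To make the overlap problem concrete, here is a minimal sketch in plain Python. It assumes cells can be approximated by circles and bins by axis-aligned 8 µm squares; the cell positions and radii are made-up values for illustration, not measurements from the colon dataset.

```python
def circle_overlaps_square(cx, cy, r, x0, y0, size):
    """True if a circle (cell) intersects an axis-aligned square (bin)."""
    # Closest point of the square to the circle centre.
    nx = min(max(cx, x0), x0 + size)
    ny = min(max(cy, y0), y0 + size)
    return (cx - nx) ** 2 + (cy - ny) ** 2 < r ** 2


def cells_per_bin(cells, bin_size, grid_w, grid_h):
    """Map each bin (col, row) to the indices of the cells it touches."""
    hits = {}
    for col in range(grid_w):
        for row in range(grid_h):
            x0, y0 = col * bin_size, row * bin_size
            touching = [
                i for i, (cx, cy, r) in enumerate(cells)
                if circle_overlaps_square(cx, cy, r, x0, y0, bin_size)
            ]
            if touching:
                hits[(col, row)] = touching
    return hits


# Hypothetical cells: (centre_x, centre_y, radius) in micrometres.
cells = [(6.0, 6.0, 3.0), (10.5, 6.0, 3.0), (6.0, 11.0, 3.5)]
hits = cells_per_bin(cells, bin_size=8.0, grid_w=2, grid_h=2)
mixed = {b: c for b, c in hits.items() if len(c) > 1}
print(f"{len(mixed)} of {len(hits)} occupied 8 um bins touch more than one cell")
```

With three tightly packed cells of 6 to 7 µm diameter, every occupied 8 µm bin in this toy grid mixes signal from at least two cells, which is the contamination described above.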

Solution

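Earlier work such as Bin2cell proposed combining morphology with gene expression to obtain accurate per-cell transcript counts, instead of running downstream analysis on 8 µm x 8 µm bins. Bin2cell uses StarDist to obtain cell outlines. This combined imaging-plus-sequencing approach gives good results in some cases, but it can perform poorly when cells are tightly packed, because many spots then overlap several cells.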

Best solution: a deep-learning cell segmentation model + spatial expression data

Method steps

(1) Image segmentation
(2) Bin-to-Cell Assignment
(3) Cell Type Annotation
(4) Customized downstream analysis (co-localization and so on)
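Step (2) is the least intuitive of the four, so here is a minimal sketch of a naive version of it. It assumes cell outlines are available as polygons (for example from a segmentation model) and assigns each 2 µm bin to the cell that contains the bin's centre. The function names and toy data are hypothetical, and this is a simplification for illustration, not ENACT's implementation.

```python
from collections import Counter, defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray going right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_bins_to_cells(bins, cells, bin_size=2.0):
    """Sum the counts of every bin whose centre lies inside a cell outline."""
    cell_counts = defaultdict(Counter)
    for (x0, y0), counts in bins.items():
        cx, cy = x0 + bin_size / 2, y0 + bin_size / 2
        for cell_id, outline in cells.items():
            if point_in_polygon(cx, cy, outline):
                cell_counts[cell_id].update(counts)
                break  # naive rule: a bin goes to at most one cell
    return cell_counts


# Hypothetical toy data: two square "cells" and four 2 um bins
# keyed by their lower-left corner, each holding gene counts.
cells = {
    "cell_A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "cell_B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
bins = {
    (0, 0): {"EPCAM": 3, "CDH1": 1},
    (2, 2): {"EPCAM": 2},
    (4, 0): {"PTPRC": 4},
    (8, 8): {"VWF": 5},  # outside both cells, so it stays unassigned
}
cell_counts = assign_bins_to_cells(bins, cells)
for cell_id, counts in sorted(cell_counts.items()):
    print(cell_id, dict(counts))
```

A centre-based rule ignores bins that straddle two outlines. Handling those bins well is exactly where the choice of assignment method matters.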

Implementation, using 10X data as an example

###Download the data (whole-slide tissue image)
curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_tissue_image.btf
####Visium HD output file
curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_binned_outputs.tar.gz
tar -xvzf Visium_HD_Human_Colon_Cancer_binned_outputs.tar.gz

.
└── binned_outputs/
    └── square_002um/
        ├── filtered_feature_bc_matrix.h5   <---- Transcript counts file (2um resolution)
        └── spatial/
            └── tissue_positions.parquet    <---- Bin locations relative to the full resolution image
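Before configuring the pipeline it can help to confirm that the downloads landed where expected. The sketch below uses only the standard library and the file names from the download commands and directory listing above. The helper name `missing_inputs` is hypothetical, and it assumes everything was downloaded and extracted into one folder.

```python
from pathlib import Path

# Relative paths taken from the download commands and directory listing.
EXPECTED_FILES = [
    "Visium_HD_Human_Colon_Cancer_tissue_image.btf",
    "binned_outputs/square_002um/filtered_feature_bc_matrix.h5",
    "binned_outputs/square_002um/spatial/tissue_positions.parquet",
]


def missing_inputs(data_dir):
    """Return the expected input files that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# "." assumes the files were downloaded into the current directory.
missing = missing_inputs(".")
print("missing inputs:", missing if missing else "none")
```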

Installation and config file

###Install
pip install enact-SO
####Config file
analysis_name: "colon-demo"
run_synthetic: False # True if you want to run bin to cell assignment on synthetic dataset, False otherwise.
cache_dir: "cache/ENACT_outputs"                                                                          # Change according to your desired output location
paths:  
  wsi_path: "<path_to_data>/Visium_HD_Human_Colon_Cancer_tissue_image.btf"                                # whole slide image path
  visiumhd_h5_path: "<path_to_data>/binned_outputs/square_002um/filtered_feature_bc_matrix.h5"            # location of the 2um x 2um gene by bin file (filtered_feature_bc_matrix.h5) from 10X Genomics.   
  tissue_positions_path: "<path_to_data>/binned_outputs/square_002um/spatial/tissue_positions.parquet"    # location of the tissue_positions.parquet file from 10X Genomics
steps:
  segmentation: True # True if you want to run segmentation
  bin_to_geodataframes: True # True to convert bin to geodataframes
  bin_to_cell_assignment: True # True to assign cells to bins
  cell_type_annotation: True # True to run cell type annotation
params:
  seg_method: "stardist" # Stardist is the only option for now
  patch_size: 4000 # Defines the patch size. The whole resolution image will be broken into patches of this size
  bin_representation: "polygon"  # or point TODO: Remove support for anything else
  bin_to_cell_method: "weighted_by_cluster" # or naive
  cell_annotation_method: "celltypist"
  cell_typist_model: "Human_Colorectal_Cancer.pkl"
  use_hvg: True # Only run analysis on highly variable genes + cell markers specified
  n_hvg: 1000 # Number of highly variable genes to use
  n_clusters: 4 
  chunks_to_run: []
cell_markers:
  # Human Colon
  Epithelial: ["CDH1","EPCAM","CLDN1","CD2"]
  Enterocytes: ["CD55", "ELF3", "PLIN2", "GSTM3", "KLF5", "CBR1", "APOA1", "CA1", "PDHA1", "EHF"]
  Goblet cells: ["MANF", "KRT7", "AQP3", "AGR2", "BACE2", "TFF3", "PHGR1", "MUC4", "MUC13", "GUCA2A"]
  Enteroendocrine cells: ["NUCB2", "FABP5", "CPE", "ALCAM", "GCG", "SST", "CHGB", "IAPP", "CHGA", "ENPP2"]
  Crypt cells: ["HOPX", "SLC12A2", "MSI1", "SMOC2", "OLFM4", "ASCL2", "PROM1", "BMI1", "EPHB2", "LRIG1"]
  Endothelial: ["PECAM1","CD34","KDR","CDH5","PROM1","PDPN","TEK","FLT1","VCAM1","PTPRC","VWF","ENG","MCAM","ICAM1","FLT4"]     
  Fibroblast: ["COL1A1","COL3A1","COL5A2","PDGFRA","ACTA2","TCF21","FN"]
  Smooth muscle cell: ["BGN","MYL9","MYLK","FHL2","ITGA1","ACTA2","EHD2","OGN","SNCG","FABP4"]
  B cells: ["CD74", "HMGA1", "CD52", "PTPRC", "HLA-DRA", "CD24", "CXCR4", "SPCS3", "LTB", "IGKC"]
  T cells: ["JUNB", "S100A4", "CD52", "PFN1P1", "CD81", "EEF1B2P3", "CXCR4", "CREM", "IL32", "TGIF1"]
  NK cells: ["S100A4", "IL32", "CXCR4", "FHL2", "IL2RG", "CD69", "CD7", "NKG7", "CD2", "HOPX"]
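Some genes in `cell_markers` appear under more than one cell type (for example `PROM1` and `ACTA2`). Whether that matters depends on how the markers are used downstream, but it is worth knowing before interpreting annotations. Below is a minimal sketch that lists such shared genes. The helper `shared_markers` is hypothetical, and only a subset of the config is copied into a plain dict to keep the example self-contained.

```python
from collections import defaultdict


def shared_markers(cell_markers):
    """Return {gene: [cell types]} for genes listed under more than one type."""
    owners = defaultdict(list)
    for cell_type, genes in cell_markers.items():
        for gene in dict.fromkeys(genes):  # drop duplicates within one list
            owners[gene].append(cell_type)
    return {g: types for g, types in owners.items() if len(types) > 1}


# Subset of the cell_markers section of the config, as a plain dict.
cell_markers = {
    "Crypt cells": ["HOPX", "SLC12A2", "MSI1", "SMOC2", "OLFM4",
                    "ASCL2", "PROM1", "BMI1", "EPHB2", "LRIG1"],
    "Endothelial": ["PECAM1", "CD34", "KDR", "CDH5", "PROM1", "PDPN", "TEK",
                    "FLT1", "VCAM1", "PTPRC", "VWF", "ENG", "MCAM", "ICAM1",
                    "FLT4"],
    "Fibroblast": ["COL1A1", "COL3A1", "COL5A2", "PDGFRA", "ACTA2",
                   "TCF21", "FN"],
    "Smooth muscle cell": ["BGN", "MYL9", "MYLK", "FHL2", "ITGA1", "ACTA2",
                           "EHD2", "OGN", "SNCG", "FABP4"],
}
shared = shared_markers(cell_markers)
for gene, types in sorted(shared.items()):
    print(gene, "->", ", ".join(types))
```

With the full config, the same function could be given the `cell_markers` mapping obtained after loading the YAML file.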

Run

from enact.pipeline import ENACT
import yaml

configs_path = "config/configs.yaml" # Change this to the location of the configs.yaml file that you just edited
with open(configs_path, "r") as stream:
    configs = yaml.safe_load(stream)

so_hd = ENACT(configs)
so_hd.run_enact()

Output directory

.
└── cache/
    └── <analysis_name>/
        ├── chunks/
        │   ├── bins_gdf/
        │   │   └── patch_<patch_id>.csv
        │   ├── cells_gdf/
        │   │   └── patch_<patch_id>.csv
        │   └── <bin_to_cell_method>/
        │       ├── bin_to_cell_assign/
        │       │   └── patch_<patch_id>.csv
        │       ├── cell_ix_lookup/
        │       │   └── patch_<patch_id>.csv
        │       └── <cell_annotation_method>_results/
        │           ├── cells_adata.csv
        │           └── merged_results.csv
        └── cells_df.csv
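The placeholders in the tree map onto keys of the config file, so the location of the annotation results can be derived from the config. The sketch below builds that path with the standard library. It assumes the top-level `cache/` in the tree stands for the configured `cache_dir`, which the listing does not state explicitly, and the helper name `results_dir` is hypothetical.

```python
from pathlib import Path


def results_dir(configs):
    """Build the annotation results folder implied by the output tree."""
    params = configs["params"]
    return (
        Path(configs["cache_dir"])  # assumption: tree root equals cache_dir
        / configs["analysis_name"]
        / "chunks"
        / params["bin_to_cell_method"]
        / f'{params["cell_annotation_method"]}_results'
    )


# Only the keys needed here, with the values from the config file above.
configs = {
    "analysis_name": "colon-demo",
    "cache_dir": "cache/ENACT_outputs",
    "params": {
        "bin_to_cell_method": "weighted_by_cluster",
        "cell_annotation_method": "celltypist",
    },
}
merged = results_dir(configs) / "merged_results.csv"
print(merged.as_posix())
```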