单细胞的这种命名方式可取吗？-369IT编程

admin管理员组
文章数量:1037775

单细胞的这种命名方式可取吗？

今天又来学习一篇单细胞数据文章，标题为《Integration of Single-Cell and Bulk Transcriptomes to Identify a Poor Prognostic Tumor Subgroup to Predict the Prognosis of Patients with Early-stage Lung Adenocarcinoma》，这篇文献于2025年1月21号发表在J Cancer杂志上。

这是曾老板发在群里的一个数据集：

【写作任务1】这个文献不应该是出现一个， SLC45A3+ cells were identified as a group of cells lacking significant features of known cell markers. 降维聚类分群就好奇怪， eight groups: T, NK, B, epithelial, myeloid, stromal, and SLC45A3+ cells 大家试试看，做一下，然后写一个笔记

文献中的注释结果

文章中的单细胞数据分析结果如下图：先不说作者很诡异的 tSNE 和 UMAP都是分群只放一个就行他偏偏要两个都展示一下感觉有凑图的嫌疑吧。

在众多分群中有一个群：SLC45A3+ 命名的群出现感觉有点突兀，而且下面的基因对应的是：SLC5A3。文章中的描述为：

SLC45A3+ cells were identified as a group of cells lacking significant features of known cell markers

那这个群为什么不可能出现在数据中呢？这就来看一下！

这个单细胞对应的数据集为：.cgi?acc=GSE117570，为非小细胞性肺癌样本，总共8个样本：

代码语言：javascript代码运行次数：0运行复制

GSM3304007 P1_Tumor
GSM3304008 P1_Normal
GSM3304009 P2_Tumor
GSM3304010 P2_Normal
GSM3304011 P3_Tumor
GSM3304012 P3_Normal
GSM3304013 P4_Tumor
GSM3304014 P4_Normal

数据预处理

去GEO中把 GSE117570_RAW.tar 下载下来，解压，读取：

代码语言：javascript代码运行次数：0运行复制

###
### Create: Jianming Zeng
### Date:  2023-12-31  
### Email: jmzeng1314@163
### Blog: /
### Forum:  .html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31   First version 
### Update Log: 2024-12-09   by juan zhang (492482942@qq)
### 

rm(list=ls())
options(stringsAsFactors = F)
library(ggsci)
library(dplyr) 
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)
getwd()

# 创建目录
getwd()
gse <- "GSE117570"
dir.create(gse)

# 方式一：标准文件夹
###### step1: 导入数据 ######   
samples <- list.files("GSE117570/",pattern = "processed_data.txt.gz")
samples
scRNAlist <- lapply(samples, function(pro){
#pro <- samples[1]
print(pro)

  ct <- data.table::fread(paste0("GSE117570/",pro),data.table = F)
  ct[1:5, 1:5]
  dim(ct)
  rownames(ct) <- ct[,1]
  ct <- ct[,-1]
  ct[1:5, 1:5]

  sce <- CreateSeuratObject(ct, project=gsub("_processed_data.txt.gz","", pro), min.cells = 3)
return(sce)
})
names(scRNAlist) <-  gsub("_processed_data.txt.gz","", samples)
scRNAlist

# merge
sce.all <- merge(scRNAlist[[1]], y=scRNAlist[-1], add.cell.ids=gsub("_processed_data.txt.gz","", samples))
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all


# 查看特征
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)
temp <- str_split(sce.all$orig.ident, pattern = "_", simplify = T)
head(temp)
sce.all$orig.ident <- temp[,1]
sce.all$patient <- temp[,2]
sce.all$group <- temp[,3]
table(sce.all$orig.ident)
table(sce.all$patient)
table(sce.all$group)

library(qs)
qsave(sce.all, file="GSE117570/sce.all.qs")

经过简单质控，并降维聚类分群，选择res 0.5的结果，共得到16个亚群，10627个细胞：

使用已知marker进行注释

简单绘制一张已知基因的气泡图如下：

简单注释结果如下：

代码语言：javascript代码运行次数：0运行复制

################## 注释，读取注释文件
sce.all.int
head(sce.all.int@meta.data)
Idents(sce.all.int) <- "RNA_snn_res.0.5"
Idents(sce.all.int)

temp <- read.table("3-check-by-0.5/anno.txt",sep = "\t")
temp

new.cluster.ids <- temp[,2]
names(new.cluster.ids) <- temp[,1]
new.cluster.ids

sce.all.int <- RenameIdents(sce.all.int, new.cluster.ids)
table(Idents(sce.all.int))

##  新增一列注释
anno <- as.data.frame(sce.all.int@active.ident)
sce.all.int <- AddMetaData(sce.all.int, metadata = anno, col.name = "celltype")
head(sce.all.int@meta.data)
table(sce.all.int$celltype)

# 美化版
p <- CellDimPlot(sce.all.int, group.by = "celltype", reduction = "UMAP", label = T,label.size = 3, label_repel = T, label_insitu = T,
                 label_point_size = 1, label_point_color =NA ,label_segment_color = NA)
p
ggsave(plot=p, filename="3-check-by-0.3/Dimplot_celltype.pdf",width = 8, height = 8)

phe <- sce.all.int@meta.data
head(phe)
saveRDS(phe, file = "3-check-by-0.5/phe.RData")

没有什么大问题，再来看看文章中描述的那一群，当我想看看 SLC45A3 这个基因在哪一群中高表达时：

代码语言：javascript代码运行次数：0运行复制

VlnPlot(sce.all.int, features = c("SLC45A3"))

结果却告诉我，额有点懵：

代码语言：javascript代码运行次数：0运行复制

> VlnPlot(sce.all.int, features = c("SLC45A3"))
Error in `FetchData()`:
! None of the requested variables were found: SLC45A3
Run `rlang::last_trace()` to see where the error occurred.

我还担心是不是出问题了，使用了模糊搜索：

代码语言：javascript代码运行次数：0运行复制

grep("SLC45A3",rownames(sce.all.int),value = T)
# character(0)

grep("SLC4",rownames(sce.all.int),value = T)
# [1] "SLC4A1AP" "SLC4A7"   "SLC41A3"  "SLC44A1"  "SLC43A3"  "SLC43A2"  "SLC44A2"  "SLC41A1"  "SLC40A1"  "SLC44A4"  "SLC4A2"   "SLC48A1"  "SLC46A3"  "SLC44A3"

这就是不可能出现的原因吗？这个数据根本就没有 SLC45A3 这个基因！

再来看看这个SLC5A3基因呢：

代码语言：javascript代码运行次数：0运行复制

VlnPlot(sce.all.int, features = c("SLC5A3"),group.by = select_idet)
DotPlot(sce.all.int, features = c("SLC5A3"), assay='RNA',group.by = select_idet,cols = c("grey", "red") )

现在的文章灌水不知道为什么到了这么猖狂的地步~

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。原始发表：2025-03-16，如有侵权请联系 cloudcommunity@tencent 删除数据分析datainttable数据

单细胞的这种命名方式可取吗？

这是曾老板发在群里的一个数据集：

【写作任务1】这个文献不应该是出现一个， SLC45A3+ cells were identified as a group of cells lacking significant features of known cell markers. 降维聚类分群就好奇怪， eight groups: T, NK, B, epithelial, myeloid, stromal, and SLC45A3+ cells 大家试试看，做一下，然后写一个笔记

文献中的注释结果

文章中的单细胞数据分析结果如下图：先不说作者很诡异的 tSNE 和 UMAP都是分群只放一个就行他偏偏要两个都展示一下感觉有凑图的嫌疑吧。

在众多分群中有一个群：SLC45A3+ 命名的群出现感觉有点突兀，而且下面的基因对应的是：SLC5A3。文章中的描述为：

SLC45A3+ cells were identified as a group of cells lacking significant features of known cell markers

那这个群为什么不可能出现在数据中呢？这就来看一下！

这个单细胞对应的数据集为：.cgi?acc=GSE117570，为非小细胞性肺癌样本，总共8个样本：

代码语言：javascript代码运行次数：0运行复制

GSM3304007 P1_Tumor
GSM3304008 P1_Normal
GSM3304009 P2_Tumor
GSM3304010 P2_Normal
GSM3304011 P3_Tumor
GSM3304012 P3_Normal
GSM3304013 P4_Tumor
GSM3304014 P4_Normal

数据预处理

去GEO中把 GSE117570_RAW.tar 下载下来，解压，读取：

代码语言：javascript代码运行次数：0运行复制

###
### Create: Jianming Zeng
### Date:  2023-12-31  
### Email: jmzeng1314@163
### Blog: /
### Forum:  .html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31   First version 
### Update Log: 2024-12-09   by juan zhang (492482942@qq)
### 

rm(list=ls())
options(stringsAsFactors = F)
library(ggsci)
library(dplyr) 
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)
getwd()

# 创建目录
getwd()
gse <- "GSE117570"
dir.create(gse)

# 方式一：标准文件夹
###### step1: 导入数据 ######   
samples <- list.files("GSE117570/",pattern = "processed_data.txt.gz")
samples
scRNAlist <- lapply(samples, function(pro){
#pro <- samples[1]
print(pro)

  ct <- data.table::fread(paste0("GSE117570/",pro),data.table = F)
  ct[1:5, 1:5]
  dim(ct)
  rownames(ct) <- ct[,1]
  ct <- ct[,-1]
  ct[1:5, 1:5]

  sce <- CreateSeuratObject(ct, project=gsub("_processed_data.txt.gz","", pro), min.cells = 3)
return(sce)
})
names(scRNAlist) <-  gsub("_processed_data.txt.gz","", samples)
scRNAlist

# merge
sce.all <- merge(scRNAlist[[1]], y=scRNAlist[-1], add.cell.ids=gsub("_processed_data.txt.gz","", samples))
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all


# 查看特征
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)
temp <- str_split(sce.all$orig.ident, pattern = "_", simplify = T)
head(temp)
sce.all$orig.ident <- temp[,1]
sce.all$patient <- temp[,2]
sce.all$group <- temp[,3]
table(sce.all$orig.ident)
table(sce.all$patient)
table(sce.all$group)

library(qs)
qsave(sce.all, file="GSE117570/sce.all.qs")

经过简单质控，并降维聚类分群，选择res 0.5的结果，共得到16个亚群，10627个细胞：

使用已知marker进行注释

简单绘制一张已知基因的气泡图如下：

简单注释结果如下：

代码语言：javascript代码运行次数：0运行复制

################## 注释，读取注释文件
sce.all.int
head(sce.all.int@meta.data)
Idents(sce.all.int) <- "RNA_snn_res.0.5"
Idents(sce.all.int)

temp <- read.table("3-check-by-0.5/anno.txt",sep = "\t")
temp

new.cluster.ids <- temp[,2]
names(new.cluster.ids) <- temp[,1]
new.cluster.ids

sce.all.int <- RenameIdents(sce.all.int, new.cluster.ids)
table(Idents(sce.all.int))

##  新增一列注释
anno <- as.data.frame(sce.all.int@active.ident)
sce.all.int <- AddMetaData(sce.all.int, metadata = anno, col.name = "celltype")
head(sce.all.int@meta.data)
table(sce.all.int$celltype)

# 美化版
p <- CellDimPlot(sce.all.int, group.by = "celltype", reduction = "UMAP", label = T,label.size = 3, label_repel = T, label_insitu = T,
                 label_point_size = 1, label_point_color =NA ,label_segment_color = NA)
p
ggsave(plot=p, filename="3-check-by-0.3/Dimplot_celltype.pdf",width = 8, height = 8)

phe <- sce.all.int@meta.data
head(phe)
saveRDS(phe, file = "3-check-by-0.5/phe.RData")

没有什么大问题，再来看看文章中描述的那一群，当我想看看 SLC45A3 这个基因在哪一群中高表达时：

代码语言：javascript代码运行次数：0运行复制

VlnPlot(sce.all.int, features = c("SLC45A3"))

结果却告诉我，额有点懵：

代码语言：javascript代码运行次数：0运行复制

> VlnPlot(sce.all.int, features = c("SLC45A3"))
Error in `FetchData()`:
! None of the requested variables were found: SLC45A3
Run `rlang::last_trace()` to see where the error occurred.

我还担心是不是出问题了，使用了模糊搜索：

代码语言：javascript代码运行次数：0运行复制

grep("SLC45A3",rownames(sce.all.int),value = T)
# character(0)

grep("SLC4",rownames(sce.all.int),value = T)
# [1] "SLC4A1AP" "SLC4A7"   "SLC41A3"  "SLC44A1"  "SLC43A3"  "SLC43A2"  "SLC44A2"  "SLC41A1"  "SLC40A1"  "SLC44A4"  "SLC4A2"   "SLC48A1"  "SLC46A3"  "SLC44A3"

这就是不可能出现的原因吗？这个数据根本就没有 SLC45A3 这个基因！

再来看看这个SLC5A3基因呢：

代码语言：javascript代码运行次数：0运行复制

VlnPlot(sce.all.int, features = c("SLC5A3"),group.by = select_idet)
DotPlot(sce.all.int, features = c("SLC5A3"), assay='RNA',group.by = select_idet,cols = c("grey", "red") )

现在的文章灌水不知道为什么到了这么猖狂的地步~

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。原始发表：2025-03-16，如有侵权请联系 cloudcommunity@tencent 删除数据分析datainttable数据

本文标签：单细胞的这种命名方式可取吗

版权声明：本文标题：单细胞的这种命名方式可取吗？内容由热心网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://it.en369.cn/jiaocheng/1748248162a2275039.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

369IT编程

单细胞的这种命名方式可取吗？

单细胞的这种命名方式可取吗？

文献中的注释结果

数据预处理

使用已知marker进行注释

这就是不可能出现的原因吗？这个数据根本就没有 SLC45A3 这个基因！

再来看看这个SLC5A3基因呢：

现在的文章灌水不知道为什么到了这么猖狂的地步~

单细胞的这种命名方式可取吗？

文献中的注释结果

数据预处理

使用已知marker进行注释

这就是不可能出现的原因吗？这个数据根本就没有 SLC45A3 这个基因！

再来看看这个SLC5A3基因呢：

现在的文章灌水不知道为什么到了这么猖狂的地步~

更多相关文章

单细胞的这种命名方式可取吗？

发表评论

推荐文章

如何处理selenium无法打开浏览器或打开时间过长的问题

DeepSeek 模型的成本效益深度解析：低成本、高性能的AI新选择

评估零样本时间序列预测中系统 1 与系统 2 推理方法：基准与见解

五、MyBatis的增删改查模板（参数形式包括：String、对象、集合、数组、Map）

2025年全面的 .NET 图表构建解决方案

热门文章

【愚公系列】《高效使用DeepSeek》028

单倍型的显著性分析

鸿蒙开发：了解Canvas绘制

Agent+RPA：Python实现自动化办公流程的智能升级

linux系统如何安装使用MTR收集路径测试信息？

[oeasy]python074

PHP 日志系统的最佳搭档：一个 Go 写的远程日志收集服务

69.9K star！这个API调试神器让你告别Postman，开源免费真香！

Rust粉丝破大防！TypeScript之父选Go语言重写编译器，性能飙升10倍引战：Rust不香了？

布谷直播系统源码开发实战:从架构设计到性能优化

最新文章

LLM学习笔记：如何理解LLM中的Transformer架构

o3 deep research: 智能体的应用和演进

从开发者视角洞见未来，找到自己的破局之道：Deepseek和Manus如何助力破局？

嵌入式Linux：阻塞式IO与非阻塞式IO

STM32如何精准控制步进电机？

程序员刚毕业，先去大厂镀金还是先去小厂攒经验？

万象2008清空boss账户密码

【Tools】GitBook简明教程

oracle exadata celldisk 闪存盘受损导致性能下降

SDUT 2138 图结构练习——BFSDFS——判断可达性

How to vary WooCommerce product prices based on order dates

For a menu custom link (without a link) a &quot;#&quot; is appended. Can this just be blank, i.e. NO link?

multisite - Hook into &#39;admin_url&#39; but only on Mysites admin bar menu

Global login to password protected pages

having a page in multi sub-menu and match current-menu-item

For a menu custom link (without a link) a "#" is appended. Can this just be blank, i.e. NO link?

multisite - Hook into 'admin_url' but only on Mysites admin bar menu