Can such obviously wrong single-cell analysis results really get published?

The paper discussed today was published on October 25, 2023 in Scientific Reports under the title "Construction and validation of a novel prognostic model of neutrophil‑related genes signature of lung adenocarcinoma". Its single-cell analysis uses a public dataset, .cgi?acc=GSE131907, from which 11 primary LUAD samples were extracted for analysis.

This is the third dataset Boss Zeng posted in the group, so let's take a look:

[Writing task 3] Find the errors in the following. We selected 11 primary LUAD samples from the single-cell dataset GSE131907 for subsequent analysis. To identify marker genes for neutrophils, we used the “SingleR” package the CellMarker database .1038/s41598-023-45289-8

Background of the GSE131907 dataset

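This dataset is very well known. It comes from the 2020 Nature Communications paper "Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma". It covers 44 patients and 58 samples, roughly 200,000 cells in total, spanning a wide range of tissue and sample types: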

Note: primary sites (tLung and tL/B), pleural fluids (PE), lymph node (mLN), and brain metastases (mBrain), as well as normal tissues from lungs (nLung) and lymph nodes (nLN)

Download from GEO: .cgi?acc=GSE131907

GSE131907_Lung_Cancer_cell_annotation.txt.gz
GSE131907_Lung_Cancer_raw_UMI_matrix.txt.gz

A quick read-in:

###
### Create: Jianming Zeng
### Date:  2023-12-31
### Email: jmzeng1314@163
### Blog: /
### Forum:  .html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31   First version
### Update Log: 2024-12-09   by juan zhang (492482942@qq)
###
rm(list = ls())
options(stringsAsFactors = FALSE)
library(ggsci)
library(dplyr)
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)
getwd()

# Create a directory for this dataset (no warning if it already exists)
gse <- "GSE131907"
dir.create(gse, showWarnings = FALSE)

# Method 3:
if (TRUE) {
  ###### step1: import the data ######
  # Raw UMI count matrix: first column holds gene names, remaining columns are cells
  ct <- data.table::fread("GSE131907/GSE131907_Lung_Cancer_raw_UMI_matrix.txt.gz", data.table = FALSE)
  ct[1:5, 1:5]
  dim(ct)
  rownames(ct) <- ct[, 1]
  ct <- ct[, -1]
  ct[1:5, 1:5]

  # Per-cell annotation: first column holds the cell barcode
  phe <- data.table::fread("GSE131907/GSE131907_Lung_Cancer_cell_annotation.txt.gz", data.table = FALSE)
  head(phe)
  table(phe$Sample)
  rownames(phe) <- phe[, 1]
  phe <- phe[, -1]
  # Check that the annotation rows line up with the matrix columns
  identical(rownames(phe), colnames(ct))

  # Create the Seurat object, keeping genes detected in at least 3 cells
  sce.all <- CreateSeuratObject(counts = ct, meta.data = phe, min.cells = 3)
  sce.all
}

# Inspect the object
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)
table(sce.all$Sample_Origin)
# Use the sample ID as the identity, then cross-tabulate samples by origin
sce.all$orig.ident <- sce.all$Sample
table(sce.all$Sample, sce.all$Sample_Origin)

# Save the full object before subsetting
qsave(sce.all, file = "GSE131907/sce.all.qs")

# Keep only tLung: the 11 primary LUAD samples
sce.all <- subset(sce.all, Sample_Origin == "tLung")
sce.all

From here the standard downstream analysis applies, and the myeloid cells are easy to separate out:
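As a rough sketch of what that standard analysis might look like, the function below chains the usual Seurat steps (normalization, variable features, scaling, PCA, neighbors, clustering, UMAP). The function name `run_standard_workflow`, the parameter values, and the `myeloid_markers` panel are assumptions made for illustration; they are not taken from the original post or from the paper. Batch correction across the 11 samples is not shown.

```r
# Hypothetical helper: a minimal Seurat workflow. All parameter values are
# illustrative defaults, not the settings used in the original analysis.
# The Seurat package is only needed when the function is called.
run_standard_workflow <- function(obj, n_pcs = 30, resolution = 0.5) {
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj, nfeatures = 2000)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj, npcs = n_pcs)
  obj <- Seurat::FindNeighbors(obj, dims = 1:n_pcs)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = 1:n_pcs)
  obj
}

# Illustrative myeloid marker panel (an assumption, not from the source).
myeloid_markers <- c("LYZ", "CD68", "CD163", "CD14", "CSF1R")

# Usage, assuming `sce.all` is the tLung subset created above:
# sce.all <- run_standard_workflow(sce.all)
# Seurat::DotPlot(sce.all, features = myeloid_markers)
# The myeloid clusters are then picked by inspecting the dot plot and
# extracted with subset(sce.all, idents = ...).
```

The dot plot is only a visual aid here: which clusters count as myeloid still has to be decided by looking at the marker expression.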

Myeloid annotation and clustering

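In the paper "Construction and validation of a novel prognostic model of neutrophil‑related genes signature of lung adenocarcinoma", these 11 samples were first split into immune and non-immune cells. The immune cells were then annotated with the SingleR package into 6 major classes, and the monocytes, macrophages, and DCs among them were extracted as the myeloid compartment for further subclustering.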
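For readers unfamiliar with this kind of reference-based annotation, here is a minimal sketch of how SingleR is commonly applied to a Seurat object. The helper name `annotate_with_singler` is hypothetical, and the choice of the Human Primary Cell Atlas reference from `celldex` is an assumption: the source does not say which reference the paper used. The sketch also assumes Seurat v5 (the `layer` argument) and that the object has already been log-normalized.

```r
# Hypothetical helper: label cells with SingleR against a reference atlas.
# Seurat (v5), SingleR and celldex are only needed when the function is called.
# The default reference is an assumption; the paper's reference is not stated.
annotate_with_singler <- function(obj,
                                  ref = celldex::HumanPrimaryCellAtlasData()) {
  # Log-normalized expression matrix (genes x cells) from the RNA assay
  expr <- Seurat::GetAssayData(obj, assay = "RNA", layer = "data")
  pred <- SingleR::SingleR(test = expr, ref = ref, labels = ref$label.main)
  # Store per-cell labels in the metadata; pruned labels are NA where
  # SingleR considers the assignment low quality
  obj$singler_label <- pred$labels
  obj$singler_pruned <- pred$pruned.labels
  obj
}

# Usage, assuming `sce.all` has been normalized:
# sce.all <- annotate_with_singler(sce.all)
# table(sce.all$singler_label)
```

Labels produced this way depend entirely on the reference, which is why they should be cross-checked against known marker genes before being trusted.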

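Now for the key point: the myeloid annotation results look very strange!

Anyone who has memorized the marker genes of the different cell subpopulations should see it at a glance: the genes marking the cluster labeled DC2 are all mast cell signature genes: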

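TPSB2: a protein-coding gene. It encodes tryptase beta-2, also known as tryptase II, a proteolytic enzyme.
TPSAB1 (Tryptase Alpha/Beta 1): encodes tryptase alpha/beta 1, a member of the trypsin-like serine protease family.
CPA3 (Carboxypeptidase A3): an enzyme encoded by the CPA3 gene, a zinc metalloprotease of the carboxypeptidase A family. It is a mast cell-specific protease and can be expressed in choroid plexus mast cells.
CTSG: can mark mast cells and granulocytes.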
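The check described above can be sketched in a few lines of base R: compare a cluster's top genes against a few canonical marker sets and see which set they overlap most. The marker panels and the names `canonical_markers` and `score_cluster` are assumptions for illustration, a small hand-picked list rather than an authoritative reference; the input genes are the four listed above for the cluster labeled DC2.

```r
# Small illustrative marker panels (assumed, not exhaustive).
canonical_markers <- list(
  Mast       = c("TPSB2", "TPSAB1", "CPA3", "KIT", "MS4A2"),
  cDC2       = c("CD1C", "FCER1A", "CLEC10A", "HLA-DQA1"),
  Macrophage = c("CD68", "CD163", "C1QA", "C1QB", "MARCO"),
  Monocyte   = c("CD14", "FCN1", "S100A8", "S100A9", "VCAN")
)

# Fraction of each canonical set that appears among a cluster's top genes.
score_cluster <- function(top_genes, marker_sets = canonical_markers) {
  vapply(marker_sets, function(set) mean(set %in% top_genes), numeric(1))
}

# Genes listed above for the cluster the paper labels "DC2".
dc2_top_genes <- c("TPSB2", "TPSAB1", "CPA3", "CTSG")

scores <- score_cluster(dc2_top_genes)
best_match <- names(scores)[which.max(scores)]
print(scores)
print(best_match)
```

With these panels the "DC2" gene list overlaps only the mast cell set, which is the mismatch the post points out. A simple overlap like this is a sanity check, not a replacement for proper annotation.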

Next, the monocyte and macrophage annotation is also problematic:

The classic review published in May 2023 in Nature Reviews Immunology (2023 IF = 100+), "Tissue-specific macrophages: how they develop and choreograph tissue biology", summarizes the origin, development, functions, and diversity of macrophages.

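Monocytes generally circulate in the blood and then enter tissues, where they differentiate into macrophages. Once settled in a tissue, these cells take on tissue-specific properties and are given tissue-specific names. In tissue samples, therefore, such cells are more appropriately annotated as the various kinds of macrophages, yet even very typical markers such as CD68/CD163 were not labeled here.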

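Let's memorize the marker genes of the various cell types together; see the post "What to do if you can't memorize the key genes of single-cell subpopulations".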

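This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat official account. Originally published: 2025-03-14. In case of infringement, contact cloudcommunity@tencent for removal.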

