An Information Retrieval Approach to Short Text Conversation

Introduction

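A paper by Hang Li et al., posted on arXiv in 2014. Original: https://arxiv.org/pdf/1408.6988.pdf
The paper tackles single-turn question answering over short texts, using a dataset built from Sina Weibo. As the title indicates, the approach is retrieval-based; the authors call the task short text conversation (STC). The rough pipeline is: a retrieval step first recalls candidate pairs, the candidates are then manually labeled, features are extracted for each candidate pair, and finally a learning-to-rank (LTR) model orders the candidates.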
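
The retrieve-then-rank flow described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' system: the toy `PAIRS` repository, the token-overlap recall, the two cosine features, and the hand-set `WEIGHTS` (standing in for a ranker that would be learned from the labeled data) are all hypothetical.

```python
import math
from collections import Counter

# Toy (post, response) repository; hypothetical stand-in for the Weibo pairs.
PAIRS = [
    ("how do I start learning machine learning", "begin with a small project"),
    ("what is a good book on information retrieval", "start with a standard textbook"),
    ("the weather is great today", "perfect day for a walk"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; Chinese needs word segmentation)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bags of tokens."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, pairs, k=2):
    """Stage 1: recall up to k pairs whose post shares a token with the query."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(p))), (p, r)) for p, r in pairs]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pair for _, pair in scored[:k]]


def features(query, pair):
    """Stage 2: features for one candidate (query-post and query-response similarity)."""
    post, response = pair
    q = tokenize(query)
    return [cosine(q, tokenize(post)), cosine(q, tokenize(response))]


# Hypothetical weights; a learning-to-rank model would fit these from labels.
WEIGHTS = [0.7, 0.3]


def rank(query, candidates):
    """Stage 3: order candidates by a linear score over their features."""
    def score(pair):
        return sum(w * f for w, f in zip(WEIGHTS, features(query, pair)))
    return sorted(candidates, key=score, reverse=True)


def respond(query, pairs=PAIRS):
    """Return the top-ranked response, or None if nothing was recalled."""
    ranked = rank(query, retrieve(query, pairs))
    return ranked[0][1] if ranked else None
```

Calling `respond("any good book on information retrieval")` recalls only the second pair and returns its response; a query with no overlapping tokens returns `None`.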

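Dataset

The dataset consists of (p, r) pairs built from posts crawled from Sina Weibo and the comments under them. The selected accounts belong to Chinese academics working on NLP, so the quality of the posts is relatively high.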

Sampling Strategy

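The authors picked 10 NLP experts who are active on Sina Weibo and crawled their followees, which yielded more than 3,200 NLP/ML practitioners to use as seeds.
Starting from these seeds, the crawl ran for two months and collected the seeds' posts together with the associated comments. According to the resulting statistics, the main topics in the data are Research, General Arts and Science, IT Technology, and Life, among others.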
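
The seed-expansion step can be pictured as a one-hop walk over the follow graph. The sketch below is an illustration only: `followees` is a hypothetical in-memory mapping standing in for crawled data, and treating the seed set as exactly the union of the initial users' followees is an assumption.

```python
def expand_seeds(initial_users, followees):
    """Collect the distinct followees of the initial users as the seed set.

    `followees` maps a user id to the ids that user follows (hypothetical data).
    """
    seeds = set()
    for user in initial_users:
        seeds.update(followees.get(user, ()))
    return seeds
```

For example, `expand_seeds(["a", "b"], {"a": ["x", "y"], "b": ["y", "z"]})` gives the set `{"x", "y", "z"}`; duplicates across users are merged.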

Processing, Filtering, and Data Cleaning

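The data is then cleaned, mainly with the following strategies:
1. Remove posts shorter than 10 characters and responses shorter than 5 characters, as well as generic catch-all responses such as "Wow" or "Nice".
2. For each post, keep only the first 100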
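
The first cleaning rule maps directly onto a small filter. This is a sketch under assumptions, not the authors' code: the 10- and 5-character thresholds and the "Wow"/"Nice" examples come from the notes above, while the `GENERIC_RESPONSES` set, the punctuation stripping, and the function names are hypothetical. It covers only the length and generic-response rule.

```python
MIN_POST_CHARS = 10      # posts shorter than this are dropped
MIN_RESPONSE_CHARS = 5   # responses shorter than this are dropped

# Only "wow" and "nice" come from the notes; a real list would be longer.
GENERIC_RESPONSES = {"wow", "nice"}


def keep_pair(post, response):
    """Return True if a (post, response) pair survives the first cleaning rule."""
    post, response = post.strip(), response.strip()
    if len(post) < MIN_POST_CHARS:
        return False
    if len(response) < MIN_RESPONSE_CHARS:
        return False
    # Strip trailing punctuation so "Nice!!!" is still treated as generic.
    if response.lower().strip("!.~ ") in GENERIC_RESPONSES:
        return False
    return True


def clean(pairs):
    """Filter a list of (post, response) pairs."""
    return [(p, r) for p, r in pairs if keep_pair(p, r)]
```

A padded generic reply such as "Nice!!!" passes the length check but is still removed by the generic-response check, which is why the two conditions are kept separate.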
