
[arXiv Papers] Computation and Language 2020-08-24

Contents

1. Howl: A Deployed, Open-Source Wake Word Detection System [PDF] Abstract
2. Keywords lie far from the mean of all words in local vector space [PDF] Abstract
3. Top2Vec: Distributed Representations of Topics [PDF] Abstract
4. Neural Machine Translation without Embeddings [PDF] Abstract
5. A Variational Approach to Unsupervised Sentiment Analysis [PDF] Abstract
6. EmoGraph: Capturing Emotion Correlations using Graph Networks [PDF] Abstract
7. It's better to say "I can't answer" than answering incorrectly: Towards Safety critical NLP systems [PDF] Abstract
8. MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark [PDF] Abstract
9. Tweet to News Conversion: An Investigation into Unsupervised Controllable Text Generation [PDF] Abstract
10. To Paraphrase or Not To Paraphrase: User-Controllable Selective Paraphrase Generation [PDF] Abstract
11. Adapting Event Extractors to Medical Data: Bridging the Covariate Shift [PDF] Abstract
12. Document-level Event-based Extraction Using Generative Template-filling Transformers [PDF] Abstract
13. Spatial Language Representation with Multi-Level Geocoding [PDF] Abstract
14. Multi-modal Cooking Workflow Construction for Food Recipes [PDF] Abstract
15. VisualSem: a high-quality knowledge graph for vision and language [PDF] Abstract
16. PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data [PDF] Abstract
17. Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining [PDF] Abstract
18. Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning [PDF] Abstract
19. COOKIE: A Dataset for Conversational Recommendation over Knowledge Graphs in E-commerce [PDF] Abstract
20. Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset [PDF] Abstract
21. ImagiFilter: A resource to enable the semi-automatic mining of images at scale [PDF] Abstract

Abstracts

1. Howl: A Deployed, Open-Source Wake Word Detection System [PDF] Back to Contents
  Raphael Tang, Jaejun Lee, Afsaneh Razi, Julia Cambre, Ian Bicking, Jofish Kaye, Jimmy Lin
Abstract: We describe Howl, an open-source wake word detection toolkit with native support for open speech datasets, like Mozilla Common Voice and Google Speech Commands. We report benchmark results on Speech Commands and our own freely available wake word detection dataset, built from MCV. We operationalize our system for Firefox Voice, a plugin enabling speech interactivity for the Firefox web browser. Howl represents, to the best of our knowledge, the first fully productionized yet open-source wake word detection toolkit with a web browser deployment target. Our codebase is at this https URL.

2. Keywords lie far from the mean of all words in local vector space [PDF] Back to Contents
  Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos
Abstract: Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of graph-based methods that build a graph-of-words and use various centrality measures to score the nodes (candidate keywords). In this work, we follow a different path to detect the keywords from a text document by modeling the main distribution of the document's words using local word vector representations. Then, we rank the candidates based on their position in the text and the distance between the corresponding local vectors and the main distribution's center. We confirm the high performance of our approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods through an extended experimental study investigating the properties of the local representations.
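The scoring step the abstract describes is compact enough to sketch. A minimal Python version, assuming the candidate terms and their local (per-document) word vectors are already computed and omitting the positional component of the ranking; all names are hypothetical:

```python
import numpy as np

def rank_keywords(candidates, local_vectors):
    """Rank candidate terms by the Euclidean distance between each term's
    local vector and the mean vector of all words in the document;
    terms lying far from the mean rank higher.
    local_vectors: dict mapping each document word to an np.ndarray."""
    mean_vec = np.mean(list(local_vectors.values()), axis=0)
    score = {w: float(np.linalg.norm(local_vectors[w] - mean_vec)) for w in candidates}
    return sorted(candidates, key=score.get, reverse=True)
```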

3. Top2Vec: Distributed Representations of Topics [PDF] Back to Contents
  Dimo Angelov
Abstract: Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis. Despite their popularity, they have several weaknesses. In order to achieve optimal results they often require the number of topics to be known, custom stop-word lists, stemming, and lemmatization. Additionally, these methods rely on bag-of-words representations of documents, which ignore the ordering and semantics of words. Distributed representations of documents and words have gained popularity due to their ability to capture the semantics of words and documents. We present top2vec, which leverages joint document and word semantic embedding to find topic vectors. This model does not require stop-word lists, stemming, or lemmatization, and it automatically finds the number of topics. The resulting topic vectors are jointly embedded with the document and word vectors, with the distance between them representing semantic similarity. Our experiments demonstrate that top2vec finds topics which are significantly more informative and representative of the corpus trained on than probabilistic generative models.
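A conceptual sketch of that pipeline: jointly embedded document and word vectors, clustered document vectors, and cluster centroids serving as topic vectors. This simplified version assumes k-means with a fixed number of topics, whereas the actual model finds the number of topics automatically (e.g. via density-based clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

def topic_vectors(doc_vecs, word_vecs, vocab, n_topics, top_n=10):
    """doc_vecs (n_docs, d) and word_vecs (n_words, d) come from a joint
    document/word embedding; returns topic vectors and their nearest words."""
    centroids = KMeans(n_clusters=n_topics, n_init=10).fit(doc_vecs).cluster_centers_
    unit_words = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    topics = []
    for c in centroids:
        sims = unit_words @ (c / np.linalg.norm(c))  # cosine similarity to the topic vector
        topics.append([vocab[i] for i in np.argsort(-sims)[:top_n]])
    return centroids, topics
```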

4. Neural Machine Translation without Embeddings [PDF] Back to Contents
  Uri Shaham, Omer Levy
Abstract: Many NLP models follow the embed-contextualize-predict paradigm, in which each sequence token is represented as a dense vector via an embedding matrix, and fed into a contextualization component that aggregates the information from the entire sequence in order to make a prediction. Could NLP models work without the embedding component? To that end, we omit the input and output embeddings from a standard machine translation model, and represent text as a sequence of bytes via UTF-8 encoding, using a constant 256-dimension one-hot representation for each byte. Experiments on 10 language pairs show that removing the embedding matrix consistently improves the performance of byte-to-byte models, often outperforms character-to-character models, and sometimes even produces better translations than standard subword models.
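The input side of this setup is straightforward to reproduce: UTF-8 bytes mapped to a fixed, non-learned one-hot representation in place of an embedding lookup. A PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def byte_one_hot(text: str) -> torch.Tensor:
    """Encode text as a sequence of UTF-8 bytes, each represented by a
    constant 256-dimension one-hot vector instead of a learned embedding."""
    byte_ids = torch.tensor(list(text.encode("utf-8")), dtype=torch.long)
    return F.one_hot(byte_ids, num_classes=256).float()

x = byte_one_hot("Hello, 世界")  # shape (num_bytes, 256), fed directly to the encoder
```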

5. A Variational Approach to Unsupervised Sentiment Analysis [PDF] Back to Contents
  Ziqian Zeng, Wenxuan Zhou, Xin Liu, Zizheng Lin, Yangqin Song, Michael David Kuo, Wan Hang Keith Chiu
Abstract: In this paper, we propose a variational approach to unsupervised sentiment analysis. Instead of using ground truth provided by domain experts, we use target-opinion word pairs as a supervision signal. For example, in the document snippet "the room is big," (room, big) is a target-opinion word pair. These word pairs can be extracted by using dependency parsers and simple rules. Our objective function is to predict an opinion word given a target word, while our ultimate goal is to learn a sentiment classifier. By introducing a latent variable, i.e., the sentiment polarity, into the objective function, we can inject the sentiment classifier into the objective function via the evidence lower bound. We can then learn the sentiment classifier by optimizing the lower bound. We also impose sophisticated constraints on opinion words as regularization, encouraging the sentiment classifiers to produce similar (different) probability distributions for documents with similar (dissimilar) opinion words. We apply our method to sentiment analysis on customer reviews and clinical narratives. The experimental results show that our method outperforms unsupervised baselines on both domains, obtains results comparable to a supervised method trained with hundreds of labels per aspect in the customer reviews domain, and obtains results comparable to supervised methods in the clinical narratives domain.
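The bound itself is a standard evidence lower bound with a discrete latent polarity. A sketch of how the sentiment classifier q(s | doc) enters it, assuming a uniform prior over polarities (whose constant term is dropped); shapes and names are assumptions:

```python
import torch

def elbo(log_p_opinion_given_s, q_logits):
    """E_q[log p(opinion | target, s)] + H(q), where s is the latent
    sentiment polarity and q(s | doc) is the sentiment classifier.
    Both inputs have shape (batch, n_polarities)."""
    q = torch.softmax(q_logits, dim=-1)
    log_q = torch.log_softmax(q_logits, dim=-1)
    return (q * (log_p_opinion_given_s - log_q)).sum(dim=-1).mean()
```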

6. EmoGraph: Capturing Emotion Correlations using Graph Networks [PDF] Back to Contents
  Peng Xu, Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Pascale Fung
Abstract: Most emotion recognition methods tackle the emotion understanding task by considering each emotion independently, ignoring their fuzzy nature and the interconnections among them. In this paper, we explore how emotion correlations can be captured and how they help different classification tasks. We propose EmoGraph, which captures the dependencies among different emotions through graph networks. These graphs are constructed by leveraging the co-occurrence statistics among different emotion categories. Empirical results on two multi-label classification datasets demonstrate that EmoGraph outperforms strong baselines, especially on macro-F1. An additional experiment illustrates that the captured emotion correlations can also benefit a single-label classification task.
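The graph construction from co-occurrence statistics can be sketched directly; the row normalization below is one common choice and an assumption, not necessarily the paper's exact formulation:

```python
import numpy as np

def emotion_adjacency(labels):
    """Build an emotion graph from multi-label annotations.
    labels: (n_examples, n_emotions) binary matrix. The weight of edge
    i -> j approximates P(emotion_j | emotion_i)."""
    co = labels.T @ labels               # pairwise co-occurrence counts
    counts = np.maximum(np.diag(co), 1)  # per-emotion frequencies
    return co / counts[:, None]          # row-normalized adjacency
```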

7. It's better to say "I can't answer" than answering incorrectly: Towards Safety critical NLP systems [PDF] Back to Contents
  Neeraj Varshney, Swaroop Mishra, Chitta Baral
Abstract: In order to make AI systems more reliable and their adoption in safety-critical applications possible, it is essential to impart the capability to abstain from answering when their prediction is likely to be incorrect and to seek human intervention. Recently proposed "selective answering" techniques model calibration as a binary classification task. We argue that not all incorrectly answered questions are incorrect to the same extent, and the same is true for correctly answered questions. Hence, treating all correct predictions equally and all incorrect predictions equally constrains calibration. In this work, we propose a methodology that incorporates the degree of correctness, shifting away from classification labels as it directly tries to predict the probability of the model's prediction being correct. We show the efficacy of the proposed method on existing Natural Language Inference (NLI) datasets by training on SNLI and evaluating on the MNLI mismatched and matched datasets. Our approach improves the area under the curve (AUC) of the risk-coverage plot by 10.22% and 8.06% over maxProb, with respect to the maximum possible improvement, on the MNLI mismatched and matched sets respectively. In order to evaluate our method on Out of Distribution (OOD) datasets, we propose a novel setup involving questions with a variety of reasoning skills. Our setup includes a test set for each of five reasoning skills: numerical, logical, qualitative, abductive, and commonsense. We select a confidence threshold for each of the approaches where the in-domain accuracy (SNLI) is 99%. Our results show that the proposed method outperforms existing approaches by abstaining on 2.6% more OOD questions at the respective confidence thresholds.
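At deployment time this reduces to a thresholded selective predictor. A toy sketch, where the threshold is assumed to be tuned on held-out data so that answered in-domain questions reach the target accuracy (99% on SNLI in the paper):

```python
def answer_or_abstain(prediction, correctness_prob, threshold):
    """Selective answering: a calibrator estimates the probability that the
    model's prediction is correct; abstain whenever it falls below the
    tuned threshold, deferring to human intervention instead."""
    return prediction if correctness_prob >= threshold else "I can't answer"
```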

8. MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark [PDF] Back to Contents
  Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, Yashar Mehdad
Abstract: Scaling semantic parsing models for task-oriented dialog systems to new languages is often expensive and time-consuming due to the lack of available datasets. Even the few datasets that are available suffer from many shortcomings: a) they contain few languages and small amounts of labeled data for the other languages; b) they are based on the simple intent and slot detection paradigm for non-compositional queries. In this paper, we present a new multilingual dataset, called MTOP, comprising 100k annotated utterances in 6 languages across 11 domains. We use this dataset and other publicly available datasets to conduct a comprehensive benchmarking study on using various state-of-the-art multilingual pre-trained models for task-oriented semantic parsing. We achieve an average improvement of +6.3% on Slot F1 for the two existing multilingual datasets over the best results reported in their experiments. Furthermore, we also demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.

9. Tweet to News Conversion: An Investigation into Unsupervised Controllable Text Generation [PDF] Back to Contents
  Zishan Ahmad, Mukuntha N S, Asif Ekbal, Pushpak Bhattacharyya
Abstract: Text generation systems have become extremely popular with the advent of recent deep learning models such as the encoder-decoder. Controlling the information and style of the generated output without supervision is an important and challenging Natural Language Processing (NLP) task. In this paper, we define the task of constructing a coherent paragraph from a set of disaster-domain tweets, without any parallel data. We tackle the problem by building two systems in a pipeline. The first system focuses on unsupervised style transfer and converts the individual tweets into news sentences. The second system stitches together the outputs from the first system to form a coherent news paragraph. We also propose a novel training mechanism, splitting the sentences into propositions and training the second system to merge the sentences. We create a validation and test set consisting of tweet-sets and their equivalent news paragraphs to perform empirical evaluation. In a completely unsupervised setting, our model was able to achieve a BLEU score of 19.32, while successfully transferring styles and joining tweets to form a meaningful news paragraph.

10. To Paraphrase or Not To Paraphrase: User-Controllable Selective Paraphrase Generation [PDF] Back to Contents
  Mohan Zhang, Luchen Tan, Zhengkai Tu, Zihang Fu, Kun Xiong, Ming Li, Jimmy Lin
Abstract: In this article, we propose a paraphrase generation technique that keeps the key phrases in source sentences during paraphrasing. We also develop a model called TAGPA with this technique, which has multiple pre-configured or trainable key phrase detectors and a paraphrase generator. The paraphrase generator aims to keep the key phrases while increasing the diversity of the paraphrased sentences. The key phrases can be entities provided by the user, like company names, people's names, domain-specific terminologies, etc., or can be learned from a given dataset.
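In the spirit of this pipeline, a toy acceptance check that keeps only paraphrases in which every detected key phrase survives verbatim (the detector itself, whether pre-configured or trainable, is abstracted away):

```python
def keeps_key_phrases(paraphrase, key_phrases):
    """Accept a paraphrase only if all key phrases (user-provided entities
    such as company names, or phrases learned from a dataset) are preserved."""
    text = paraphrase.lower()
    return all(kp.lower() in text for kp in key_phrases)

keeps_key_phrases("ACME Corp posted record earnings.", ["ACME Corp"])  # True
```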

11. Adapting Event Extractors to Medical Data: Bridging the Covariate Shift [PDF] Back to Contents
  Aakanksha Naik, Jill Lehman, Carolyn Rose
Abstract: We tackle the task of adapting event extractors to new domains without labeled data, by aligning the marginal distributions of source and target domains. As a testbed, we create two new event extraction datasets using English texts from two medical domains: (i) clinical notes, and (ii) doctor-patient conversations. We test the efficacy of three marginal alignment techniques: (i) adversarial domain adaptation (ADA), (ii) domain adaptive fine-tuning (DAFT), and (iii) a novel instance weighting technique based on language model likelihood scores (LIW). LIW and DAFT improve over a no-transfer BERT baseline on both domains, but ADA only improves on clinical notes. Deeper analysis of performance under different types of shifts (e.g., lexical shift, semantic shift) reveals interesting variations among models. Our best-performing models reach F1 scores of 70.0 and 72.9 on notes and conversations respectively, using no labeled data from target domains.
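The LIW idea, upweighting source examples that look target-like under language models, can be sketched as a density-ratio weight; the exact functional form below (softmax-normalized log-ratio) is an assumption:

```python
import torch

def liw_weights(src_nll, tgt_nll):
    """Instance weights from language-model likelihood scores.
    src_nll / tgt_nll: per-example negative log-likelihoods under LMs fit on
    the source and target domains. Examples the target-domain LM likes better
    than the source-domain LM get upweighted."""
    log_ratio = src_nll - tgt_nll  # = log p_tgt(x) - log p_src(x)
    weights = torch.softmax(log_ratio, dim=0)
    return weights * len(weights)  # rescale so the mean weight is 1
```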

12. Document-level Event-based Extraction Using Generative Template-filling Transformers [PDF] Back to Contents
  Xinya Du, Alexander Rush, Claire Cardie
Abstract: We revisit the classic information extraction problem of document-level template filling. We argue that sentence-level approaches are ill-suited to the task and introduce a generative transformer-based encoder-decoder framework that is designed to model context at the document level: it can make extraction decisions across sentence boundaries; is implicitly aware of noun phrase coreference structure; and has the capacity to respect cross-role dependencies in the template structure. We evaluate our approach on the MUC-4 dataset, and show that our model performs substantially better than prior work. We also show that our modeling choices contribute to model performance, e.g., by implicitly capturing linguistic knowledge such as recognizing coreferent entity mentions. Our code for the evaluation script and models will be open-sourced at this https URL for reproduction purposes.

13. Spatial Language Representation with Multi-Level Geocoding [PDF] Back to Contents
  Sayali Kulkarni, Shailee Jain, Mohammad Javad Hosseini, Jason Baldridge, Eugene Ie, Li Zhang
Abstract: We present a multi-level geocoding model (MLG) that learns to associate texts with geographic locations. The Earth's surface is represented using space-filling curves that decompose the sphere into a hierarchy of similarly sized, non-overlapping cells. MLG balances generalization and accuracy by combining losses across multiple levels and predicting cells at each level simultaneously. Without using any dataset-specific tuning, we show that MLG obtains state-of-the-art results for toponym resolution on three English datasets. Furthermore, it obtains large gains without any knowledge base metadata, demonstrating that it can effectively learn the connection between text spans and coordinates, and thus can be extended to toponyms not present in knowledge bases.
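The multi-level objective can be sketched as a sum of per-level cell-classification losses; summed cross-entropies with optional level weights are an assumption about the exact combination:

```python
import torch.nn.functional as F

def multi_level_loss(level_logits, level_targets, level_weights=None):
    """Predict a cell at every level of the spatial hierarchy simultaneously
    and combine the per-level cross-entropy losses.
    level_logits[i]: (batch, n_cells_at_level_i); level_targets[i]: (batch,)."""
    level_weights = level_weights or [1.0] * len(level_logits)
    return sum(w * F.cross_entropy(logits, targets)
               for w, logits, targets in zip(level_weights, level_logits, level_targets))
```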

14. Multi-modal Cooking Workflow Construction for Food Recipes [PDF] Back to Contents
  Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua
Abstract: Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing the temporal workflow of the recipe. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, which achieves over 20% performance gain over existing hand-crafted baselines.

15. VisualSem: a high-quality knowledge graph for vision and language [PDF] Back to Contents
  Houda Alberts, Teresa Huang, Yash Deshpande, Yibo Liu, Kyunghyun Cho, Clara Vania, Iacer Calixto
Abstract: We argue that the next frontier in natural language understanding (NLU) and generation (NLG) will include models that can efficiently access external structured knowledge repositories. In order to support the development of such models, we release the VisualSem knowledge graph (KG), which includes nodes with multilingual glosses, multiple illustrative images, and visually relevant relations. We also release a neural multi-modal retrieval model that can use images or sentences as inputs and retrieve entities in the KG. This multi-modal retrieval model can be integrated into any (neural network) model pipeline, and we encourage the research community to use VisualSem for data augmentation and/or as a source of grounding, among other possible uses. VisualSem as well as the multi-modal retrieval model are publicly available and can be downloaded at: this https URL.

16. PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data [PDF] Back to Contents
  Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
Abstract: In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in the state-of-the-art research is in other languages. In this paper, we pretrain a T5 model on the BrWac corpus, an extensive collection of web pages in Portuguese, and evaluate its performance against other Portuguese pretrained models and multilingual models on the sentence similarity and sentence entailment tasks. We show that our Portuguese pretrained models have significantly better performance over the original T5 models. Moreover, we showcase the positive impact of using a Portuguese vocabulary.

17. Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining [PDF] Back to Contents
  Artem Polyvyanyy, Hanan Alkhammash, Claudio Di Ciccio, Luciano García-Bañuelos, Anna Kalenkova, Sander J. J. Leemans, Jan Mendling, Alistair Moffat, Matthias Weidlich
Abstract: This paper presents a command-line tool, called Entropia, that implements a family of conformance checking measures for process mining founded on the notion of entropy from information theory. The measures allow quantifying classical non-deterministic and stochastic precision and recall quality criteria for process models automatically discovered from traces executed by IT-systems and recorded in their event logs. A process model has "good" precision with respect to the log it was discovered from if it does not encode many traces that are not part of the log, and has "good" recall if it encodes most of the traces from the log. By definition, the measures possess useful properties and can often be computed fast.
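To make the entropy framing concrete, here is a toy Shannon entropy over an empirical trace distribution. This illustrates only the intuition; Entropia's actual measures are defined over the languages of the model and the log:

```python
import math
from collections import Counter

def trace_entropy(traces):
    """Shannon entropy (in bits) of an empirical distribution over traces.
    Comparing such entropies between the behavior a model allows and the
    behavior a log records is the intuition behind entropy-based
    precision and recall."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

trace_entropy([["a", "b"], ["a", "b"], ["a", "c"]])  # ~0.918 bits
```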

18. Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning [PDF] Back to Contents
  Noé Tits, Kevin El Haddad, Thierry Dutoit
Abstract: Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions is an under-explored area. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to learn to generate both speech and laughs from annotations. We evaluate our model with a listening test, comparing its performance to that of an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system that would be able to synthesize speech with control over the amusement level through laughter integration.

19. COOKIE: A Dataset for Conversational Recommendation over Knowledge Graphs in E-commerce [PDF] Back to Contents
  Zuohui Fu, Yikun Xian, Yaxin Zhu, Yongfeng Zhang, Gerard de Melo
Abstract: In this work, we present COOKIE, a new dataset for conversational recommendation over knowledge graphs in e-commerce platforms. The dataset is constructed from an Amazon review corpus by integrating both user-agent dialogue and custom knowledge graphs for recommendation. Specifically, we first construct a unified knowledge graph and extract key entities between user-product pairs, which serve as the skeleton of a conversation. Then we simulate conversations mirroring the human coarse-to-fine process of choosing preferred items. The proposed baselines and experiments demonstrate that our dataset is able to provide innovative opportunities for conversational recommendation.

20. Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset [PDF] Back to Contents
  Huili Chen, Yue Zhang, Felix Weninger, Rosalind Picard, Cynthia Breazeal, Hae Won Park
Abstract: Automatic speech-based affect recognition of individuals in dyadic conversation is a challenging task, in part because of its heavy reliance on manual pre-processing. Traditional approaches frequently require hand-crafted speech features and segmentation of speaker turns. In this work, we design end-to-end deep learning methods to recognize each person's affective expression in an audio stream with two speakers, automatically discovering features and time regions relevant to the target speaker's affect. We integrate a local attention mechanism into the end-to-end architecture and compare the performance of three attention implementations: one mean pooling and two weighted pooling methods. Our results show that the proposed weighted-pooling attention solutions are able to learn to focus on the regions containing the target speaker's affective information and successfully extract the individual's valence and arousal intensity. Here we introduce and use a "dyadic affect in multimodal interaction - parent to child" (DAMI-P2C) dataset collected in a study of 34 families, where a parent and a child (3-7 years old) engage in reading storybooks together. In contrast to existing public datasets for affect recognition, each instance for both speakers in the DAMI-P2C dataset is annotated for perceived affect by three labelers. To encourage more research on the challenging task of multi-speaker affect sensing, we make the annotated DAMI-P2C dataset publicly available, including acoustic features of the dyads' raw audio, affect annotations, and a diverse set of developmental, social, and demographic profiles for each dyad.
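The pooling variants under comparison are easy to sketch. Here is the weighted-pooling form with a learned per-frame scorer (shapes and the scorer module are assumptions); mean pooling is the special case of uniform weights:

```python
import torch

def weighted_pooling(frames, scorer):
    """Attention-style weighted pooling over an audio frame sequence.
    frames: (batch, time, dim); scorer: a module mapping dim -> 1 per frame,
    e.g. torch.nn.Linear(dim, 1)."""
    weights = torch.softmax(scorer(frames), dim=1)  # (batch, time, 1)
    return (weights * frames).sum(dim=1)            # (batch, dim)
```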

21. ImagiFilter: A resource to enable the semi-automatic mining of images at scale [PDF] Back to Contents
  Houda Alberts, Iacer Calixto
Abstract: Datasets (semi-)automatically collected from the web can easily scale to millions of entries, but a dataset's usefulness is directly related to how clean and high-quality its examples are. In this paper, we describe and publicly release an image dataset along with pretrained models designed to (semi-)automatically filter out undesirable images from very large image collections, possibly obtained from the web. Our dataset focuses on photographic and/or natural images, a very common use-case in computer vision research. We provide annotations for coarse prediction, i.e. photographic vs. non-photographic, and smaller fine-grained prediction tasks where we further break down the non-photographic class into five classes: maps, drawings, graphs, icons, and sketches. Results on held-out validation data show that a model architecture with reduced memory footprint achieves over 96% accuracy on coarse prediction. Our best model achieves 88% accuracy on the hardest fine-grained classification task available. Dataset and pretrained models are available at: this https URL.
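A hypothetical usage sketch of such a coarse filter over a web-mined collection; the model object and its predict method are illustrative assumptions, not the released interface:

```python
def filter_photographic(images, model):
    """Keep only the images a pretrained coarse classifier labels
    'photographic'; model.predict is a hypothetical stand-in for the
    released checkpoint's API."""
    return [img for img in images if model.predict(img) == "photographic"]
```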
