Contents
3. Vartani Spellcheck -- Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance [PDF] Abstract
4. The Complexity of Comparative Text Analysis -- "The Gardener is always the Murderer" says the Fourth Machine [PDF] Abstract
8. Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos [PDF] Abstract
10. Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings [PDF] Abstract
11. Do Mass Media Shape Public Opinion toward China? Quantitative Evidence on New York Times with Deep Learning [PDF] Abstract
12. Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [PDF] Abstract
13. A Practical Approach towards Causality Mining in Clinical Text using Active Transfer Learning [PDF] Abstract
14. Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL) [PDF] Abstract
16. Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks [PDF] Abstract
23. Procode: the Swiss Multilingual Solution for Automatic Coding and Recoding of Occupations and Economic Activities [PDF] Abstract
24. Fake News Detection in Social Media using Graph Neural Networks and NLP Techniques: A COVID-19 Use-case [PDF] Abstract
26. Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming [PDF] Abstract
28. Exploiting BERT to improve aspect-based sentiment analysis performance on Persian language [PDF] Abstract
32. The Style-Content Duality of Attractiveness: Learning to Write Eye-Catching Headlines via Disentanglement [PDF] Abstract
35. A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings [PDF] Abstract
36. Generating Math Word Problems from Equations with Topic Controlling and Commonsense Enforcement [PDF] Abstract
37. LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding [PDF] Abstract
38. Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling [PDF] Abstract
39. Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [PDF] Abstract
46. Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [PDF] Abstract
47. Discriminative Pre-training for Low Resource Title Compression in Conversational Grocery [PDF] Abstract
59. Agglomerative Clustering of Handwritten Numerals to Determine Similarity of Different Languages [PDF] Abstract
61. Classification of ALS patients based on acoustic analysis of sustained vowel phonations [PDF] Abstract
62. Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval [PDF] Abstract
64. KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning [PDF] Abstract
65. Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers [PDF] Abstract
Abstracts
1. Unsupervised Opinion Summarization with Content Planning [PDF] Back to Contents
Reinald Kim Amplayo, Stefanos Angelidis, Mirella Lapata
Abstract: The recent success of deep learning techniques for abstractive summarization is predicated on the availability of large-scale datasets. When summarizing reviews (e.g., for products or movies), such training data is neither available nor can be easily sourced, motivating the development of methods which rely on synthetic datasets for supervised training. We show that explicitly incorporating content planning in a summarization model not only yields output of higher quality, but also allows the creation of synthetic datasets which are more natural, resembling real world document-summary pairs. Our content plans take the form of aspect and sentiment distributions which we induce from data without access to expensive annotations. Synthetic datasets are created by sampling pseudo-reviews from a Dirichlet distribution parametrized by our content planner, while our model generates summaries based on input reviews and induced content plans. Experimental results on three domains show that our approach outperforms competitive models in generating informative, coherent, and fluent summaries that capture opinion consensus.
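To make the sampling step concrete, here is a minimal sketch (not the authors' code; the aspect names and sentence pools are hypothetical stand-ins) of drawing a Dirichlet content plan and assembling a pseudo-review to match it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical aspect vocabulary and per-aspect sentence pools (toy stand-ins
# for review sentences bucketed by an induced aspect/sentiment model).
aspects = ["food", "service", "price"]
sentence_pool = {
    "food": ["The pasta was excellent.", "Portions were tiny."],
    "service": ["Staff were friendly.", "We waited forever."],
    "price": ["Great value for money.", "Overpriced for what you get."],
}

def sample_pseudo_review(concentration, n_sentences=4):
    """Draw an aspect distribution from a Dirichlet and sample sentences to match it."""
    plan = rng.dirichlet(concentration)               # content plan: aspect proportions
    chosen_aspects = rng.choice(aspects, size=n_sentences, p=plan)
    return plan, [rng.choice(sentence_pool[a]) for a in chosen_aspects]

plan, review = sample_pseudo_review(concentration=[2.0, 1.0, 0.5])
print(np.round(plan, 2), review)
```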
2. Simple or Complex? Learning to Predict Readability of Bengali Texts [PDF] Back to Contents
Susmoy Chakraborty, Mir Tafseer Nayeem, Wasi Uddin Ahmad
Abstract: Determining the readability of a text is the first step to its simplification. In this paper, we present a readability analysis tool capable of analyzing text written in the Bengali language to provide in-depth information on its readability and complexity. Despite being the 7th most spoken language in the world with 230 million native speakers, Bengali suffers from a lack of fundamental resources for natural language processing. Readability-related research of the Bengali language so far can be considered to be narrow and sometimes faulty due to the lack of resources. Therefore, we correctly adopt document-level readability formulas traditionally used for U.S. based education system to the Bengali language with a proper age-to-age comparison. Due to the unavailability of large-scale human-annotated corpora, we further divide the document-level task into sentence-level and experiment with neural architectures, which will serve as a baseline for the future works of Bengali readability prediction. During the process, we present several human-annotated corpora and dictionaries such as a document-level dataset comprising 618 documents with 12 different grade levels, a large-scale sentence-level dataset comprising more than 96K sentences with simple and complex labels, a consonant conjunct count algorithm and a corpus of 341 words to validate the effectiveness of the algorithm, a list of 3,396 easy words, and an updated pronunciation dictionary with more than 67K words. These resources can be useful for several other tasks of this low-resource language. We make our Code & Dataset publicly available at this https URL for reproducibility.
3. Vartani Spellcheck -- Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance [PDF] Back to Contents
Aditya Pal, Abhijit Mustafi
Abstract: Traditional Optical Character Recognition (OCR) systems that generate text of highly inflectional Indic languages like Hindi tend to suffer from poor accuracy due to a wide alphabet set, compound characters and difficulty in segmenting characters in a word. Automatic spelling error detection and context-sensitive error correction can be used to improve accuracy by post-processing the text generated by these OCR systems. A majority of previously developed language models for error correction of Hindi spelling have been context-free. In this paper, we present Vartani Spellcheck - a context-sensitive approach for spelling correction of Hindi text using a state-of-the-art transformer - BERT in conjunction with the Levenshtein distance algorithm, popularly known as Edit Distance. We use a lookup dictionary and context-based named entity recognition (NER) for detection of possible spelling errors in the text. Our proposed technique has been tested on a large corpus of text generated by the widely used Tesseract OCR on the Hindi epic Ramayana. With an accuracy of 81%, the results show a significant improvement over some of the previously established context-sensitive error correction mechanisms for Hindi. We also explain how Vartani Spellcheck may be used for on-the-fly autocorrect suggestion during continuous typing in a text editor environment.
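The core re-ranking idea, reduced to a sketch: candidate replacements for a flagged word (in the paper these come from BERT's masked-token predictions) are scored by Levenshtein distance to the OCR output, and the closest candidate wins. The candidate list below is a hypothetical placeholder, not the paper's pipeline:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(ocr_token: str, candidates: list[str]) -> str:
    """Pick the context-plausible candidate with the smallest edit distance."""
    return min(candidates, key=lambda c: levenshtein(ocr_token, c))

# Hypothetical example: in practice the candidates would be BERT's fill-mask predictions.
print(correct("beleive", ["believe", "receive", "relieve"]))  # -> "believe"
```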
4. The Complexity of Comparative Text Analysis -- "The Gardener is always the Murderer" says the Fourth Machine [PDF] Back to Contents
Marcus Weber, Konstantin Fackeldey
Abstract: There is a heated debate about how far computers can map the complexity of text analysis compared to the abilities of the whole team of human researchers. A "deep" analysis of a given text is still beyond the possibilities of modern computers. At the heart of the existing computational text analysis algorithms there are operations with real numbers, such as additions and multiplications according to the rules of algebraic fields. However, the process of "comparing" has a very precise mathematical structure, which is different from the structure of an algebraic field. The mathematical structure of "comparing" can be expressed by using Boolean rings. We build on this structure and define the corresponding algebraic equations, lifting the algorithms of comparative text analysis onto the "correct" algebraic basis. From this point of view, we can investigate the question of computational complexity of comparative text analysis.
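For reference, the standard Boolean-ring identities behind this claim (textbook definitions, not equations taken from the paper): multiplication is idempotent, addition has characteristic 2, and "comparing" two elements reduces to testing whether their sum vanishes:

\[
x \cdot x = x, \qquad x + x = 0 \qquad \text{for all } x \in B,
\]
\[
x = y \;\Longleftrightarrow\; x + y = 0, \qquad x \vee y = x + y + xy, \qquad \neg x = 1 + x .
\]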
5. What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization [PDF] Back to Contents
Maartje ter Hoeve, Julia Kiseleva, Maarten de Rijke
Abstract: Automatic text summarization has enjoyed great progress over the last years. Now is the time to re-assess its focus and objectives. Does the current focus fully adhere to users' desires or should we expand or change our focus? We investigate this question empirically by conducting a survey amongst heavy users of pre-made summaries. We find that the current focus of the field does not fully align with participants' wishes. In response, we identify three groups of implications. First, we argue that it is important to adopt a broader perspective on automatic summarization. Based on our findings, we illustrate how we can expand our view when it comes to the types of input material that is to be summarized, the purpose of the summaries and their potential formats. Second, we define requirements for datasets that can facilitate these research directions. Third, usefulness is an important aspect of summarization that should be included in our evaluation methodology; we propose a methodology to evaluate the usefulness of a summary. With this work we unlock important research directions for future work on automatic summarization and we hope to initiate the development of methods in these directions.
6. Time to Transfer: Predicting and Evaluating Machine-Human Chatting Handoff [PDF] Back to Contents
Jiawei Liu, Zhe Gao, Yangyang Kang, Zhuoren Jiang, Guoxiu He, Changlong Sun, Xiaozhong Liu, Wei Lu
Abstract: Is chatbot able to completely replace the human agent? The short answer could be - "it depends...". For some challenging cases, e.g., dialogue's topical spectrum spreads beyond the training corpus coverage, the chatbot may malfunction and return unsatisfied utterances. This problem can be addressed by introducing the Machine-Human Chatting Handoff (MHCH), which enables human-algorithm collaboration. To detect the normal/transferable utterances, we propose a Difficulty-Assisted Matching Inference (DAMI) network, utilizing difficulty-assisted encoding to enhance the representations of utterances. Moreover, a matching inference mechanism is introduced to capture the contextual matching features. A new evaluation metric, Golden Transfer within Tolerance (GT-T), is proposed to assess the performance by considering the tolerance property of the MHCH. To provide insights into the task and validate the proposed model, we collect two new datasets. Extensive experimental results are presented and contrasted against a series of baseline models to demonstrate the efficacy of our model on MHCH.
7. Clickbait in Hindi News Media: A Preliminary Study [PDF] Back to Contents
Vivek Kaushal, Kavita Vemuri
Abstract: A corpus of Hindi news headlines shared on Twitter was created by collecting tweets of 5 mainstream Hindi news sources for a period of 4 months. 7 independent annotators were recruited to mark the 20 most retweeted news posts by each of the 5 news sources on its clickbait nature. The clickbait score hence generated was assessed for its correlation with interactions on the platform (retweets, favorites, reader replies), tweet word count, and normalized POS (part-of-speech) tag counts in tweets. A positive correlation was observed between readers' interactions with tweets and tweets' clickbait score. Significant correlations were also observed for POS tag counts and clickbait score. The prevalence of clickbait in mainstream Hindi news media was found to be similar to its prevalence in English news media. We hope that our observations would provide a platform for discussions on clickbait in mainstream Hindi news media.
8. Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos [PDF] Back to Contents
Ananda Das, Partha Pratim Das
Abstract: Topical segmentation plays a great role in reducing the search space of the topics taught in a lecture video, especially when the video metadata lacks topic-wise segmentation information. This segmentation information eases user efforts of searching, locating and browsing a topic inside a lecture video. In this work we propose an algorithm that combines a state-of-the-art language model and a domain knowledge graph for automatically detecting different coherent topics present inside a long lecture video. We use the language model on speech-to-text transcription to capture the implicit meaning of the whole video, while the knowledge graph provides us the domain-specific dependencies between different concepts of that subject. Also, leveraging the domain knowledge we can capture the way an instructor binds and connects different concepts while teaching, which helps us in achieving better segmentation accuracy. We tested our approach on NPTEL lecture videos and holistic evaluation shows that it outperforms the other methods described in the literature.
9. Detecting Insincere Questions from Text: A Transfer Learning Approach [PDF] Back to Contents
Ashwin Rachha, Gaurav Vanmane
Abstract: The internet today has become an unrivalled source of information where people converse on content-based websites such as Quora, Reddit, StackOverflow and Twitter, asking doubts and sharing knowledge with the world. A major problem arising with such websites is the proliferation of toxic comments or instances of insincerity, wherein users, instead of maintaining a sincere motive, indulge in spreading toxic and divisive content. The straightforward course of action in confronting this situation is detecting such content beforehand and preventing it from subsisting online. In recent times, Transfer Learning in Natural Language Processing has seen an unprecedented growth. Today, with the existence of transformers and various state-of-the-art innovations, tremendous growth has been made in various NLP domains. The introduction of BERT has caused quite a stir in the NLP community. As mentioned, when published, BERT dominated performance benchmarks and thereby inspired many other authors to experiment with it and publish similar models. This led to the development of a whole BERT-family, each member being specialized on a different task. In this paper we solve the Insincere Questions Classification problem by fine-tuning four cutting-edge models, viz. BERT, RoBERTa, DistilBERT and ALBERT.
10. Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings [PDF] Back to Contents
Na Li, Zied Bouraoui, Jose Camacho Collados, Luis Espinosa-Anke, Qing Gu, Steven Schockaert
Abstract: While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, static word vectors continue to play an important role in tasks where word meaning needs to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the resulting vectors focus less on idiosyncratic properties and more on general semantic properties. Inspired by this view, we propose a filtering strategy which is aimed at removing the most idiosyncratic mention vectors, allowing us to obtain further performance gains in property induction.
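A condensed sketch of the masking-and-averaging strategy, written against the Hugging Face transformers API; the example mentions are invented and this is not the authors' released code:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def masked_mention_vector(sentence: str, target: str) -> torch.Tensor:
    """Replace the target noun with [MASK] and return BERT's vector at that position."""
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]                    # (seq_len, dim)
    pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    return hidden[pos]

# Hypothetical mentions of the noun "banana"; averaging the masked vectors
# gives a static-style vector that encodes its general properties.
mentions = ["I ate a banana for breakfast.", "A banana is yellow and sweet."]
vector = torch.stack([masked_mention_vector(s, "banana") for s in mentions]).mean(dim=0)
print(vector.shape)  # torch.Size([768])
```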
11. Do Mass Media Shape Public Opinion toward China? Quantitative Evidence on New York Times with Deep Learning [PDF] Back to Contents
Junming Huang, Gavin Cook, Yu Xie
Abstract: Do mass media influence people's opinion of other countries? Using BERT, a deep neural network-based natural language processing model, we analyze a large corpus of 267,907 China-related articles published by The New York Times since 1970. We then compare our output from The New York Times to a longitudinal data set constructed from 101 cross-sectional surveys of the American public's views on China. We find that the reporting of The New York Times on China in one year explains 54% of the variance in American public opinion on China in the next. Our result confirms hypothesized links between media and public opinion and helps shed light on how mass media can influence public opinion of foreign countries.
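The "explains 54% of the variance" figure corresponds to an R-squared from regressing next-year opinion on current-year coverage; a toy version of that computation (with placeholder numbers, purely to show the mechanics, not the paper's data) could look like:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Placeholder yearly series (NOT the paper's data): media sentiment toward China
# in year t, and public-opinion sentiment in year t+1.
media_t = np.array([[0.2], [0.1], [-0.3], [-0.5], [0.0], [0.4], [-0.2]])
opinion_t_plus_1 = np.array([0.25, 0.05, -0.2, -0.4, 0.1, 0.3, -0.1])

reg = LinearRegression().fit(media_t, opinion_t_plus_1)
print("R^2 =", r2_score(opinion_t_plus_1, reg.predict(media_t)))
```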
12. Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [PDF] Back to Contents
Xiaoxiao Li, Rabah Al-Zaidy, Amy Zhang, Stefan Baral, Le Bao, C. Lee Giles
Abstract: Objective: Systematic reviews of scholarly documents often provide complete and exhaustive summaries of literature relevant to a research question. However, well-done systematic reviews are expensive, time-demanding, and labor-intensive. Here, we propose an automatic document classification approach to significantly reduce the effort in reviewing documents. Methods: We first describe a manual document classification procedure that is used to curate a pertinent training dataset and then propose three classifiers: a keyword-guided method, a cluster analysis-based refined method, and a random forest approach that utilizes a large set of feature tokens. As an example, this approach is used to identify documents studying female sex workers that are assumed to contain content relevant to either HIV or violence. We compare the performance of the three classifiers by cross-validation and conduct a sensitivity analysis on the portion of data utilized in training the model. Results: The random forest approach provides the highest area under the curve (AUC) for both receiver operating characteristic (ROC) and precision/recall (PR). Analyses of precision and recall suggest that random forest could facilitate manually reviewing 20% of the articles while containing 80% of the relevant cases. Finally, we found a good classifier could be obtained by using a relatively small training sample size. Conclusions: In sum, the automated procedure of document classification presented here could improve both the precision and efficiency of systematic reviews, as well as facilitating live reviews, where reviews are updated regularly.
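A compact stand-in for the third classifier in the Methods (a random forest over token features, evaluated with cross-validated ROC AUC and precision/recall); the bag-of-words features and the tiny example corpus are illustrative, not the authors' feature-token pipeline:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical labelled abstracts: 1 = relevant to the review, 0 = not relevant.
docs = ["hiv prevention among female sex workers", "violence and hiv risk study",
        "crop yield forecasting with satellites", "traffic flow prediction model"]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(docs)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
probs = cross_val_predict(clf, X, labels, cv=2, method="predict_proba")[:, 1]
print("ROC AUC:", roc_auc_score(labels, probs))
print("Avg precision (PR):", average_precision_score(labels, probs))
```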
13. A Practical Approach towards Causality Mining in Clinical Text using Active Transfer Learning [PDF] Back to Contents
Musarrat Hussain, Fahad Ahmed Satti, Jamil Hussain, Taqdir Ali, Syed Imran Ali, Hafiz Syed Muhammad Bilal, Gwang Hoon Park, Sungyoung Lee
Abstract: Objective: Causality mining is an active research area, which requires the application of state-of-the-art natural language processing techniques. In the healthcare domain, medical experts create clinical text to overcome the limitation of well-defined and schema driven information systems. The objective of this research work is to create a framework, which can convert clinical text into causal knowledge. Methods: A practical approach based on term expansion, phrase generation, BERT based phrase embedding and semantic matching, semantic enrichment, expert verification, and model evolution has been used to construct a comprehensive causality mining framework. This active transfer learning based framework along with its supplementary services, is able to extract and enrich, causal relationships and their corresponding entities from clinical text. Results: The multi-model transfer learning technique when applied over multiple iterations, gains performance improvements in terms of its accuracy and recall while keeping the precision constant. We also present a comparative analysis of the presented techniques with their common alternatives, which demonstrate the correctness of our approach and its ability to capture most causal relationships. Conclusion: The presented framework has provided cutting-edge results in the healthcare domain. However, the framework can be tweaked to provide causality detection in other domains, as well. Significance: The presented framework is generic enough to be utilized in any domain, healthcare services can gain massive benefits due to the voluminous and various nature of its data. This causal knowledge extraction framework can be used to summarize clinical text, create personas, discover medical knowledge, and provide evidence to clinical decision making.
14. Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL) [PDF] Back to Contents
Trung-Hieu Tran, Long Phan, Truong-Son Nguyen
Abstract: This paper proposes several transformer-based approaches for Reliable Intelligence Identification on Vietnamese social network sites at the VLSP 2020 evaluation campaign. We exploit both monolingual and multilingual pre-trained models. Besides, we utilize the ensemble method to improve the robustness of different approaches. Our team achieved a score of 0.9378 on the ROC-AUC metric on the private test set, which is competitive with other participants.
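Read most simply, the ensemble step averages the per-model probabilities before thresholding; a schematic soft-voting sketch follows (the model outputs are placeholder arrays, and soft voting itself is an assumption, since the abstract does not spell out the combination rule):

```python
import numpy as np

# Placeholder reliability probabilities from three fine-tuned transformers
# (e.g. a monolingual and two multilingual checkpoints) on the same items.
model_probs = np.array([
    [0.91, 0.12, 0.65],   # model A
    [0.88, 0.20, 0.55],   # model B
    [0.95, 0.05, 0.70],   # model C
])

ensemble = model_probs.mean(axis=0)          # soft-voting average
predictions = (ensemble >= 0.5).astype(int)  # final reliable / unreliable call
print(ensemble, predictions)
```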
15. An End-to-End Solution for Named Entity Recognition in eCommerce Search [PDF] Back to Contents
Xiang Cheng, Mitchell Bowden, Bhushan Ramesh Bhange, Priyanka Goyal, Thomas Packer, Faizan Javed
Abstract: Named entity recognition (NER) is a critical step in modern search query understanding. In the domain of eCommerce, identifying the key entities, such as brand and product type, can help a search engine retrieve relevant products and therefore offer an engaging shopping experience. Recent research shows promising results on shared benchmark NER tasks using deep learning methods, but there are still unique challenges in the industry regarding domain knowledge, training data, and model production. This paper demonstrates an end-to-end solution to address these challenges. The core of our solution is a novel model training framework "TripleLearn" which iteratively learns from three separate training datasets, instead of one training set as is traditionally done. Using this approach, the best model lifts the F1 score from 69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn improved the model performance compared to traditional training approaches which use a single set of training data. Moreover, in the online A/B test, we see significant improvements in user engagement and revenue conversion. The model has been live on this http URL for more than 9 months, boosting search conversions and revenue. Beyond our application, this TripleLearn framework, as well as the end-to-end process, is model-independent and problem-independent, so it can be generalized to more industrial applications, especially to the eCommerce industry which has similar data foundations and problems.
16. Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks [PDF] Back to Contents
Herman Kamper, Benjamin van Niekerk
Abstract: We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units. Two segmentation methods are considered. In the first, features are greedily merged until a prespecified number of segments are reached. The second uses dynamic programming to optimize a squared error with a penalty term to encourage fewer but longer segments. We show that these VQ segmentation methods can be used without alteration across a wide range of tasks: unsupervised phone segmentation, ABX phone discrimination, same-different word discrimination, and as inputs to a symbolic word segmentation algorithm. The penalized method generally performs best. While results are only comparable to the state-of-the-art in some cases, in all tasks a reasonable competing approach is outperformed at a substantially lower bitrate.
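A small illustration of the second (dynamic-programming) method as stated: each candidate segment costs its squared error around the segment mean plus a constant penalty, and the DP returns the minimum-cost segmentation. This follows the stated objective only and is not the released implementation:

```python
import numpy as np

def segment(features: np.ndarray, penalty: float):
    """Minimise the sum of per-segment squared errors plus penalty * (number of segments)."""
    T = len(features)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(t):
            seg = features[s:t]
            cost = ((seg - seg.mean(axis=0)) ** 2).sum() + penalty
            if best[s] + cost < best[t]:
                best[t], back[t] = best[s] + cost, s
    # Recover segment boundaries by walking the backpointers.
    bounds, t = [], T
    while t > 0:
        bounds.append((back[t], t))
        t = back[t]
    return bounds[::-1]

# Toy 1-D "feature" sequence with two flat regions; expect a boundary at index 4.
feats = np.array([[0.0], [0.1], [0.0], [0.1], [5.0], [5.1], [4.9]])
print(segment(feats, penalty=0.5))   # -> [(0, 4), (4, 7)]
```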
17. Sentiment analysis in Bengali via transfer learning using multi-lingual BERT [PDF] Back to Contents
Khondoker Ittehadul Islam, Md. Saiful Islam, Md Ruhul Amin
Abstract: Sentiment analysis (SA) in Bengali is challenging due to this Indo-Aryan language's highly inflected properties, with more than 160 different inflected forms for verbs, 36 different forms for nouns and 24 different forms for pronouns. The lack of standard labeled datasets in the Bengali domain makes the task of SA even harder. In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali. We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via the approach of transfer learning over those novel datasets to improve the state-of-the-art performance in sentiment classification tasks. This deep learning model achieves an accuracy of 71% for 2-class sentiment classification compared to the current state-of-the-art accuracy of 68%. We also present the very first Bengali SA classifier for the 3-class manually tagged dataset, and our proposed model achieves an accuracy of 60%. We further use this model to analyze the sentiment of public comments in the online daily newspaper. Our analysis shows that people post negative comments for political or sports news more often, while the religious article comments represent positive sentiment. The dataset and code are publicly available at this https URL_Sentiment.
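A minimal fine-tuning recipe of the kind described (multilingual BERT with a classification head on the labelled Bengali sentences); the training pairs are stubbed placeholders and the hyperparameters are illustrative only:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)    # 2-class SA; use 3 for the 3-class set

# Placeholder training pairs; the real data would be the manually tagged Bengali sentences.
texts = ["(placeholder positive Bengali sentence)", "(placeholder negative Bengali sentence)"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                          # a few illustrative gradient steps
    out = model(**batch, labels=labels)     # cross-entropy loss is computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(float(out.loss))
```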
摘要:孟加拉语的情感分析(SA)颇具挑战性,因为这种印度-雅利安语言具有高度屈折的特性,动词有160多种不同的屈折形式,名词有36种不同的形式,代词有24种不同的形式。孟加拉语领域缺乏标准的标注数据集,使SA任务更加困难。在本文中,我们提出了孟加拉语人工标注的2类和3类SA数据集。我们还证明,可以通过迁移学习的方法在这些新数据集上训练带有相关扩展的多语言BERT模型,以提升情感分类任务中的最新性能。与目前最先进的68%准确率相比,该深度学习模型在2类情感分类上达到了71%的准确率。我们还展示了针对3类人工标注数据集的第一个孟加拉语SA分类器,我们提出的模型达到了60%的准确率。我们进一步使用该模型分析在线日报公众评论的情绪。我们的分析表明,人们更常对政治或体育新闻发表负面评论,而宗教类文章的评论则呈现积极情绪。数据集和代码可从此https URL_Sentiment公开获取。
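A hedged sketch of the transfer-learning recipe the abstract describes, fine-tuning multilingual BERT for 3-class sentiment with the Hugging Face transformers API; the two Bengali examples, the label ids, and the hyperparameters are placeholders, not the authors' dataset or setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder Bengali examples and hypothetical label ids (0=negative, 1=neutral, 2=positive).
texts = ["খুব ভালো লেগেছে", "একদম ভালো লাগেনি"]
labels = torch.tensor([2, 0])

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)   # new 3-class head on top of mBERT

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                         # a few passes over the toy batch
    out = model(**batch, labels=labels)    # cross-entropy loss is computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```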
18. Movie Summarization via Sparse Graph Construction [PDF] 返回目录
Pinelopi Papalampidi, Frank Keller, Mirella Lapata
Abstract: We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms. The induced graphs are interpretable, displaying different topology for different movie genres.
摘要:我们通过创建包含最具信息量场景的较短视频来对全长电影进行摘要。我们探索这样一个假设:可以通过组合作为转折点(TP)的场景来创建摘要,转折点即电影中描述其故事主线的关键事件。我们提出了一个模型,该模型通过构建表示场景间关系、利用多模态信息构造的稀疏电影图来识别TP场景。根据人工评判者的判断,与基于序列的模型和通用摘要算法的输出相比,我们的方法生成的摘要信息更丰富、更完整,并获得了更高的评分。所得到的图具有可解释性,针对不同的电影类型呈现出不同的拓扑结构。
19. Ensemble Distillation Approaches for Grammatical Error Correction [PDF] 返回目录
Yassir Fathullah, Mark Gales, Andrey Malinin
Abstract: Ensemble approaches are commonly used techniques to improving a system by combining multiple model predictions. Additionally these schemes allow the uncertainty, as well as the source of the uncertainty, to be derived for the prediction. Unfortunately these benefits come at a computational and memory cost. To address this problem ensemble distillation (EnD) and more recently ensemble distribution distillation (EnDD) have been proposed that compress the ensemble into a single model, representing either the ensemble average prediction or prediction distribution respectively. This paper examines the application of both these distillation approaches to a sequence prediction task, grammatical error correction (GEC). This is an important application area for language learning tasks as it can yield highly useful feedback to the learner. It is, however, more challenging than the standard tasks investigated for distillation as the prediction of any grammatical correction to a word will be highly dependent on both the input sequence and the generated output history for the word. The performance of both EnD and EnDD are evaluated on both publicly available GEC tasks as well as a spoken language task.
摘要:集成方法是通过组合多个模型的预测来改进系统的常用技术。此外,这类方案还允许为预测推导出不确定性及其来源。遗憾的是,这些好处以计算和内存开销为代价。为解决这一问题,人们提出了集成蒸馏(EnD)以及最近的集成分布蒸馏(EnDD),它们将整个集成压缩为单个模型,分别表示集成的平均预测或预测分布。本文考察了这两种蒸馏方法在序列预测任务,即语法错误纠正(GEC)中的应用。这是语言学习任务的一个重要应用领域,因为它可以为学习者提供非常有用的反馈。然而,它比已有蒸馏研究中的标准任务更具挑战性,因为对单词的任何语法纠正的预测都高度依赖于输入序列和该单词已生成的输出历史。EnD和EnDD的性能在公开可用的GEC任务以及一项口语任务上均进行了评估。
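A small sketch of the ensemble distillation (EnD) objective the abstract describes: the student is trained towards the ensemble-average predictive distribution with a KL loss. The shapes, the temperature, and the toy tensors are illustrative assumptions; the paper's GEC sequence models are not reproduced here.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, ensemble_logits, T=1.0):
    """EnD-style loss: KL between the ensemble-average distribution and the student.

    student_logits:  (batch, n_classes)
    ensemble_logits: (n_models, batch, n_classes)
    """
    with torch.no_grad():
        teacher = F.softmax(ensemble_logits / T, dim=-1).mean(dim=0)  # average member distributions
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, teacher, reduction="batchmean") * (T ** 2)

# Toy shapes: 5 ensemble members, batch of 4, 10 output classes.
student = torch.randn(4, 10, requires_grad=True)
ensemble = torch.randn(5, 4, 10)
loss = ensemble_distillation_loss(student, ensemble)
loss.backward()
```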
20. Effect of Word Embedding Models on Hate and Offensive Speech Detection [PDF] 返回目录
Safa Alsafari, Samira Sadaoui, Malek Mouhoub
Abstract: Deep neural networks have been adopted successfully in hate speech detection problems. Nevertheless, the effect of the word embedding models on the neural network's performance has not been appropriately examined in the literature. In our study, through different detection tasks, 2-class, 3-class, and 6-class classification, we investigate the impact of both word embedding models and neural network architectures on the predictive accuracy. Our focus is on the Arabic language. We first train several word embedding models on a large-scale unlabelled Arabic text corpus. Next, based on a dataset of Arabic hate and offensive speech, for each detection task, we train several neural network classifiers using the pre-trained word embedding models. This task yields a large number of various learned models, which allows conducting an exhaustive comparison. The empirical analysis demonstrates, on the one hand, the superiority of the skip-gram models and, on the other hand, the superiority of the CNN network across the three detection tasks.
摘要:深度神经网络已成功应用于仇恨言论检测问题。然而,词嵌入模型对神经网络性能的影响尚未在文献中得到充分考察。在我们的研究中,通过2类、3类和6类分类等不同检测任务,我们考察了词嵌入模型和神经网络架构对预测准确率的影响。我们的重点是阿拉伯语。我们首先在大规模未标注的阿拉伯语文本语料库上训练若干词嵌入模型。接下来,基于一个阿拉伯语仇恨与攻击性言论数据集,针对每个检测任务,我们使用预训练的词嵌入模型训练若干神经网络分类器。这一过程产生了大量不同的学习模型,从而可以进行详尽的比较。实证分析一方面证明了skip-gram模型的优越性,另一方面证明了CNN网络在三个检测任务中的优越性。
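A minimal sketch of training the skip-gram embeddings the study compares, using gensim's Word2Vec with sg=1; the tiny Arabic toy corpus and the hyperparameters are placeholders for the large unlabelled corpus used in the paper (`vector_size` is the gensim>=4 argument name).

```python
from gensim.models import Word2Vec

# A tiny toy corpus stands in for the large unlabelled Arabic corpus used in the paper.
sentences = [["مثال", "جملة", "أولى"], ["مثال", "جملة", "ثانية"]]

# sg=1 selects the skip-gram architecture the study found strongest;
# `vector_size` is the gensim>=4 name (older releases call it `size`).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=10)

vec = model.wv["مثال"]   # word vectors later fed to CNN / other classifiers
print(vec.shape)          # (100,)
```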
21. Disentangling Homophemes in Lip Reading using Perplexity Analysis [PDF] 返回目录
Souheil Fenghour, Daqing Chen, Kun Guo, Perry Xiao
Abstract: The performance of automated lip reading using visemes as a classification schema has achieved less success compared with the use of ASCII characters and words largely due to the problem of different words sharing identical visemes. The Generative Pre-Training transformer is an effective autoregressive language model used for many tasks in Natural Language Processing, including sentence prediction and text classification. This paper proposes a new application for this model and applies it in the context of lip reading, where it serves as a language model to convert visual speech in the form of visemes, to language in the form of words and sentences. The network uses the search for optimal perplexity to perform the viseme-to-word mapping and is thus a solution to the one-to-many mapping problem that exists whereby various words that sound different when spoken look identical. This paper proposes a method to tackle the one-to-many mapping problem when performing automated lip reading using solely visual cues in two separate scenarios: the first scenario is where the word boundary, that is, the beginning and the ending of a word, is unknown; and the second scenario is where the boundary is known. Sentences from the benchmark BBC dataset "Lip Reading Sentences in the Wild"(LRS2), are classified with a character error rate of 10.7% and a word error rate of 18.0%. The main contribution of this paper is to propose a method of predicting words through the use of perplexity analysis when only visual cues are present, using an autoregressive language model.
摘要:与使用ASCII字符和单词相比,以视位作为分类方案的自动唇读性能较差,这主要是由于不同单词共享相同视位的问题。生成式预训练Transformer是一种有效的自回归语言模型,可用于自然语言处理中的许多任务,包括句子预测和文本分类。本文为该模型提出了一种新的应用,并将其用于唇读场景:它作为一种语言模型,将以视位形式表示的视觉语音转换为以单词和句子形式表示的语言。该网络通过搜索最优困惑度来执行视位到单词的映射,从而解决了发音不同的单词在口型上看起来完全相同所带来的一对多映射问题。本文提出了一种在仅使用视觉线索进行自动唇读时处理一对多映射问题的方法,并考虑两种不同情况:第一种情况是单词边界(即单词的开头和结尾)未知;第二种情况是边界已知。来自基准BBC数据集"野外唇读句子"(LRS2)的句子被分类,字符错误率为10.7%,单词错误率为18.0%。本文的主要贡献是提出了一种在仅有视觉线索时,使用自回归语言模型通过困惑度分析来预测单词的方法。
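A hedged sketch of the perplexity-based selection idea: candidate word sequences (e.g. homopheme expansions of a viseme sequence) are scored with an autoregressive language model and the lowest-perplexity candidate is kept. GPT-2 stands in for the paper's model, and the candidate strings are hypothetical.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean token-level cross-entropy
    return torch.exp(loss).item()

# Hypothetical homopheme candidates produced by expanding a viseme sequence.
candidates = ["place the ball on the mat", "place the pall on the mat"]
print(min(candidates, key=perplexity))       # keep the lowest-perplexity word sequence
```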
22. Regularizing Recurrent Neural Networks via Sequence Mixup [PDF] 返回目录
Armin Karamzade, Amir Najafi, Seyed Abolfazl Motahari
Abstract: In this paper, we extend a class of celebrated regularization techniques originally proposed for feed-forward neural networks, namely Input Mixup (Zhang et al., 2017) and Manifold Mixup (Verma et al., 2018), to the realm of Recurrent Neural Networks (RNN). Our proposed methods are easy to implement and have a low computational complexity, while leverage the performance of simple neural architectures in a variety of tasks. We have validated our claims through several experiments on real-world datasets, and also provide an asymptotic theoretical analysis to further investigate the properties and potential impacts of our proposed techniques. Applying sequence mixup to BiLSTM-CRF model (Huang et al., 2015) to Named Entity Recognition task on CoNLL-2003 data (Sang and De Meulder, 2003) has improved the F-1 score on the test stage and reduced the loss, considerably.
摘要:在本文中,我们将最初为前馈神经网络提出的一类著名正则化技术,即Input Mixup(Zhang等人,2017)和Manifold Mixup(Verma等人,2018),扩展到循环神经网络(RNN)领域。我们提出的方法易于实现、计算复杂度低,同时能在多种任务中提升简单神经网络结构的性能。我们通过在真实数据集上的多组实验验证了这些论断,并提供了渐近理论分析,以进一步研究所提出技术的性质和潜在影响。将序列混合应用于BiLSTM-CRF模型(Huang等,2015)在CoNLL-2003数据(Sang和De Meulder,2003)上的命名实体识别任务,显著提高了测试阶段的F-1得分并降低了损失。
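A sketch of Input Mixup applied to RNN inputs as described above: embedded sequences and one-hot labels from two examples are convexly mixed with a Beta-sampled coefficient. The equal-length toy batches and the value of alpha are assumptions; the paper's Manifold Mixup variant (mixing hidden states) is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sequence_input_mixup(emb_a, emb_b, y_a, y_b, alpha=0.2):
    """Convexly mix two embedded sequences (batch, seq_len, dim) and their
    one-hot labels with a Beta(alpha, alpha) coefficient."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * emb_a + (1.0 - lam) * emb_b, lam * y_a + (1.0 - lam) * y_b

embed = nn.Embedding(1000, 64)
lstm = nn.LSTM(64, 128, batch_first=True)

# Two equal-length, already-padded toy batches with 5-class one-hot labels.
tokens_a = torch.randint(0, 1000, (8, 20))
tokens_b = torch.randint(0, 1000, (8, 20))
y_a = F.one_hot(torch.randint(0, 5, (8,)), 5).float()
y_b = F.one_hot(torch.randint(0, 5, (8,)), 5).float()

x, y = sequence_input_mixup(embed(tokens_a), embed(tokens_b), y_a, y_b)
out, _ = lstm(x)   # the RNN is then trained on the mixed inputs with the soft labels y
```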
23. Procode: the Swiss Multilingual Solution for Automatic Coding and Recoding of Occupations and Economic Activities [PDF] 返回目录
Nenad Savic, Nicolas Bovio, Fabian Gilbert, Irina Guseva Canu
Abstract: Objective. Epidemiological studies require data that are in alignment with the classifications established for occupations or economic activities. The classifications usually include hundreds of codes and titles. Manual coding of raw data may result in misclassification and be time consuming. The goal was to develop and test a web-tool, named Procode, for coding of free-texts against classifications and recoding between different classifications. Methods. Three text classifiers, i.e. Complement Naive Bayes (CNB), Support Vector Machine (SVM) and Random Forest Classifier (RFC), were investigated using a k-fold cross-validation. 30 000 free-texts with manually assigned classification codes of French classification of occupations (PCS) and French classification of activities (NAF) were available. For recoding, Procode integrated a workflow that converts codes of one classification to another according to existing crosswalks. Since this is a straightforward operation, only the recoding time was measured. Results. Among the three investigated text classifiers, CNB resulted in the best performance, where the classifier predicted accurately 57-81% and 63-83% classification codes for PCS and NAF, respectively. SVM lead to somewhat lower results (by 1-2%), while RFC coded accurately up to 30% of the data. The coding operation required one minute per 10 000 records, while the recoding was faster, i.e. 5-10 seconds. Conclusion. The algorithm integrated in Procode showed satisfactory performance, since the tool had to assign the right code by choosing between 500-700 different choices. Based on the results, the authors decided to implement CNB in Procode. In future, if another classifier shows a superior performance, an update will include the required modifications.
摘要:目的。流行病学研究需要与职业或经济活动既定分类相一致的数据。这些分类通常包含数百个代码和名称。人工对原始数据进行编码可能导致错误分类且非常耗时。本研究的目标是开发并测试一个名为Procode的网络工具,用于将自由文本编码到分类体系,并在不同分类之间进行重新编码。方法。使用k折交叉验证研究了三种文本分类器,即补集朴素贝叶斯(CNB)、支持向量机(SVM)和随机森林分类器(RFC)。可用数据为3万条人工标注了法国职业分类(PCS)和法国活动分类(NAF)代码的自由文本。对于重新编码,Procode集成了一个工作流,可根据现有的对照表将一种分类的代码转换为另一种分类的代码。由于这是一项简单的操作,因此仅测量了重新编码所需的时间。结果。在三种被考察的文本分类器中,CNB表现最佳,分别对57-81%的PCS代码和63-83%的NAF代码做出了准确预测。SVM的结果略低(低1-2%),而RFC仅能准确编码最多30%的数据。编码操作每1万条记录约需1分钟,而重新编码更快,约5-10秒。结论。Procode中集成的算法表现令人满意,因为该工具需要在500-700个不同选项中选择正确的代码。基于这些结果,作者决定在Procode中采用CNB。未来,如果其他分类器表现更优,更新时将加入所需的修改。
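A minimal sketch of the best-performing configuration reported above, a Complement Naive Bayes text classifier evaluated with k-fold cross-validation via scikit-learn; the TF-IDF features, the toy French job descriptions, and the made-up codes are illustrative assumptions, not the Procode training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical free-text job descriptions and made-up occupation codes;
# the real tool is trained on 30 000 PCS/NAF-coded French free-texts.
texts = ["infirmière en milieu hospitalier", "développeur logiciel",
         "conducteur de poids lourd", "enseignant au primaire"] * 10
codes = ["code_A", "code_B", "code_C", "code_D"] * 10

clf = make_pipeline(TfidfVectorizer(), ComplementNB())
scores = cross_val_score(clf, texts, codes, cv=5)   # k-fold cross-validation as in the study
print(scores.mean())
```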
24. Fake News Detection in Social Media using Graph Neural Networks and NLP Techniques: A COVID-19 Use-case [PDF] 返回目录
Abdullah Hamid, Nasrullah Shiekh, Naina Said, Kashif Ahmad, Asma Gul, Laiq Hassan, Ala Al-Fuqaha
Abstract: The paper presents our solutions for the MediaEval 2020 task namely FakeNews: Corona Virus and 5G Conspiracy Multimedia Twitter-Data-Based Analysis. The task aims to analyze tweets related to COVID-19 and 5G conspiracy theories to detect misinformation spreaders. The task is composed of two sub-tasks namely (i) text-based, and (ii) structure-based fake news detection. For the first task, we propose six different solutions relying on Bag of Words (BoW) and BERT embedding. Three of the methods aim at binary classification task by differentiating in 5G conspiracy and the rest of the COVID-19 related tweets while the rest of them treat the task as ternary classification problem. In the ternary classification task, our BoW and BERT based methods obtained an F1-score of .606% and .566% on the development set, respectively. On the binary classification, the BoW and BERT based solutions obtained an average F1-score of .666% and .693%, respectively. On the other hand, for structure-based fake news detection, we rely on Graph Neural Networks (GNNs) achieving an average ROC of .95% on the development set.
摘要:本文介绍了我们针对MediaEval 2020任务FakeNews(新冠病毒与5G阴谋论的基于Twitter数据的多媒体分析)的解决方案。该任务旨在分析与COVID-19和5G阴谋论相关的推文,以检测错误信息传播者。该任务由两个子任务组成,即(i)基于文本的和(ii)基于结构的假新闻检测。针对第一个子任务,我们提出了六种基于词袋(BoW)和BERT嵌入的解决方案。其中三种方法针对二分类任务,区分5G阴谋论推文与其余COVID-19相关推文;其余方法则将该任务视为三元分类问题。在三元分类任务中,我们基于BoW和BERT的方法在开发集上的F1得分分别为.606%和.566%。在二分类任务中,基于BoW和BERT的解决方案获得的平均F1得分分别为.666%和.693%。另一方面,对于基于结构的假新闻检测,我们依靠图神经网络(GNN)在开发集上取得了.95%的平均ROC。
25. Meta learning to classify intent and slot labels with noisy few shot examples [PDF] 返回目录
Shang-Wen Li, Jason Krone, Shuyan Dong, Yi Zhang, Yaser Al-onaizan
Abstract: Recently deep learning has dominated many machine learning areas, including spoken language understanding (SLU). However, deep learning models are notorious for being data-hungry, and the heavily optimized models are usually sensitive to the quality of the training examples provided and the consistency between training and inference conditions. To improve the performance of SLU models on tasks with noisy and low training resources, we propose a new SLU benchmarking task: few-shot robust SLU, where SLU comprises two core problems, intent classification (IC) and slot labeling (SL). We establish the task by defining few-shot splits on three public IC/SL datasets, ATIS, SNIPS, and TOP, and adding two types of natural noises (adaptation example missing/replacing and modality mismatch) to the splits. We further propose a novel noise-robust few-shot SLU model based on prototypical networks. We show the model consistently outperforms the conventional fine-tuning baseline and another popular meta-learning method, Model-Agnostic Meta-Learning (MAML), in terms of achieving better IC accuracy and SL F1, and yielding smaller performance variation when noises are present.
摘要:近来,深度学习已主导了许多机器学习领域,其中包括口语理解(SLU)。然而,深度学习模型以对数据的大量需求而著称,并且经过高度优化的模型通常对所提供训练样本的质量以及训练与推理条件之间的一致性非常敏感。为了提高SLU模型在含噪且训练资源稀少的任务上的性能,我们提出了一项新的SLU基准任务:少样本鲁棒SLU,其中SLU包含两个核心问题:意图分类(IC)和槽位标注(SL)。我们通过在三个公开IC/SL数据集(ATIS、SNIPS和TOP)上定义少样本划分,并向划分中加入两类自然噪声(适应样本缺失/替换和模态不匹配)来建立该任务。我们进一步提出了一种基于原型网络的新型抗噪少样本SLU模型。我们表明,该模型在获得更高的IC准确率和SL F1、以及在存在噪声时产生更小的性能波动方面,始终优于传统的微调基线和另一种流行的元学习方法,即与模型无关的元学习(MAML)。
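A brief sketch of the prototypical-network scoring that underlies the proposed few-shot model: class prototypes are the means of support-set embeddings and queries are classified by distance to the prototypes. The encoder is omitted and the random embeddings are placeholders.

```python
import torch
import torch.nn.functional as F

def prototypical_logits(support_emb, support_labels, query_emb, n_classes):
    """Score queries by negative squared distance to class prototypes,
    where each prototype is the mean of that class's support embeddings."""
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                                 # (n_classes, dim)
    return -torch.cdist(query_emb, prototypes) ** 2    # (n_query, n_classes)

# Toy 2-way, 5-shot episode with placeholder "encoder outputs".
support = torch.randn(10, 32)
support_labels = torch.tensor([0] * 5 + [1] * 5)
query = torch.randn(4, 32)

logits = prototypical_logits(support, support_labels, query, n_classes=2)
loss = F.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))   # episode training loss
```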
26. Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming [PDF] 返回目录
J Liu, R Bai, Z Lu, P Ge, D Liu, Uwe Aickelin
Abstract: In medical fields, text classification is one of the most important tasks that can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for human to understand or manually fine-tune the classification results for better precision and recall, due to the black box nature of learning. This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions that can classify a given medical text inquiry with satisfactory precision and recall while allow human to read the classifier and fine-tune accordingly if necessary. Given a seed population of regular expressions (can be randomly initialized or manually constructed by experts), our method evolves a population of regular expressions according to chosen fitness function, using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked and updated by medical doctors, which are fundamentally crucial for medical related practices.
摘要:在医学领域,文本分类是最重要的任务之一,它可以通过结构化信息数字化和智能决策支持显著减少人工工作量。尽管基于学习的文本分类技术很流行,但由于学习的黑箱性质,人们难以理解或手动微调分类结果以获得更好的精确率和召回率。本研究提出了一种新颖的基于正则表达式的文本分类方法,利用遗传编程(GP)方法演化正则表达式,从而以令人满意的精确率和召回率对给定的医学文本咨询进行分类,同时允许人工阅读分类器并在必要时进行相应的微调。给定一个正则表达式的种子种群(可以随机初始化,也可以由专家手动构建),我们的方法使用一种新颖的正则表达式语法和一系列精心选择的繁殖算子,根据所选适应度函数来演化正则表达式种群。我们的方法在来自一家在线医疗服务提供商的真实医疗文本咨询上进行了评估,并显示出令人鼓舞的性能。更重要的是,我们的方法生成的分类器可以被医生完全理解、检查和更新,这对医疗相关实践至关重要。
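A toy stand-in for the GP fitness evaluation described above: a candidate regular expression is scored by its F1 over labelled inquiries, with malformed individuals scoring zero. The evolutionary loop (selection, crossover, mutation over regex syntax trees) and the clinical data are not reproduced; the example inquiries are hypothetical.

```python
import re

def regex_fitness(pattern, positives, negatives):
    """Fitness of one candidate regex: F1 over labelled medical inquiries."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0                       # malformed individuals score zero
    tp = sum(bool(rx.search(t)) for t in positives)
    fp = sum(bool(rx.search(t)) for t in negatives)
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical inquiries about chest pain vs. everything else.
pos = ["severe chest pain since morning", "chest pain and shortness of breath"]
neg = ["routine prescription renewal", "mild headache after work"]
print(regex_fitness(r"chest\s+pain", pos, neg))   # 1.0 on this toy data
```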
27. Linguistic Classification using Instance-Based Learning [PDF] 返回目录
Priya S. Nayak, Rhythm S. Girdhar, Shreekanth M. Prabhu
Abstract: Traditionally linguists have organized languages of the world as language families modelled as trees. In this work we take a contrarian approach and question the tree-based model that is rather restrictive. For example, the affinity that Sanskrit independently has with languages across Indo-European languages is better illustrated using a network model. We can say the same about inter-relationship between languages in India, where the inter-relationships are better discovered than assumed. To enable such a discovery, in this paper we have made use of instance-based learning techniques to assign language labels to words. We vocalize each word and then classify it by making use of our custom linguistic distance metric of the word relative to training sets containing language labels. We construct the training sets by making use of word clusters and assigning a language and category label to that cluster. Further, we make use of clustering coefficients as a quality metric for our research. We believe our work has the potential to usher in a new era in linguistics. We have limited this work for important languages in India. This work can be further strengthened by applying Adaboost for classification coupled with structural equivalence concepts of social network analysis.
摘要:传统上,语言学家将世界语言组织为以树状结构建模的语系。在这项工作中,我们采取相反的思路,对限制性较强的树状模型提出质疑。例如,梵语与各印欧语言之间各自独立的亲缘关系,用网络模型能够得到更好的说明。印度境内各语言之间的相互关系也是如此,这些关系更适合通过发现而非预先假设来确定。为了实现这种发现,本文利用基于实例的学习技术为单词分配语言标签。我们先将每个单词转为语音形式,然后利用该单词相对于带有语言标签的训练集的自定义语言距离度量对其进行分类。我们通过使用词簇并为每个词簇分配语言和类别标签来构造训练集。此外,我们将聚类系数用作研究的质量指标。我们相信这项工作有可能开启语言学的新时代。我们将这项工作限定在印度的重要语言上。通过将Adaboost用于分类并结合社会网络分析中的结构对等概念,这项工作还可以得到进一步加强。
28. Exploiting BERT to improve aspect-based sentiment analysis performance on Persian language [PDF] 返回目录
H. Jafarian, A. H. Taghavi, A. Javaheri, R. Rawassizadeh
Abstract: Aspect-based sentiment analysis (ABSA) is a more detailed task in sentiment analysis, by identifying opinion polarity toward a certain aspect in a text. This method is attracting more attention from the community, due to the fact that it provides more thorough and useful information. However, there are few language-specific researches on Persian language. The present research aims to improve the ABSA on the Persian Pars-ABSA dataset. This research shows the potential of using pre-trained BERT model and taking advantage of using sentence-pair input on an ABSA task. The results indicate that employing Pars-BERT pre-trained model along with natural language inference auxiliary sentence (NLI-M) could boost the ABSA task accuracy up to 91% which is 5.5% (absolute) higher than state-of-the-art studies on Pars-ABSA dataset.
摘要:基于方面的情感分析(ABSA)是情感分析中一项更细粒度的任务,其通过识别文本中针对某一特定方面的观点极性来实现。由于该方法能提供更全面和有用的信息,因此受到越来越多的关注。然而,针对波斯语的语言相关研究很少。本研究旨在改进波斯语Pars-ABSA数据集上的ABSA。这项研究展示了使用预训练BERT模型并在ABSA任务上利用句子对输入的潜力。结果表明,结合使用Pars-BERT预训练模型和自然语言推理辅助句(NLI-M),可以将ABSA任务的准确率提升至91%,比在Pars-ABSA数据集上最先进研究的结果高出5.5%(绝对值)。
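A hedged sketch of the sentence-pair (NLI-M style) input construction the abstract relies on: the review and an auxiliary sentence naming the aspect are packed into one BERT input and classified into three polarities. The multilingual checkpoint and the Persian example sentences are placeholders, not the paper's Pars-BERT setup.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder multilingual checkpoint; the paper uses a Persian-capable BERT ("Pars-BERT").
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)    # e.g. negative / neutral / positive

review = "کیفیت غذا عالی بود ولی سرویس کند بود"       # hypothetical Persian review
aux = "نظر در مورد سرویس چیست؟"                        # auxiliary sentence asking about the aspect "service"

# Sentence-pair encoding: [CLS] review [SEP] aux [SEP], as in NLI-M style ABSA.
inputs = tokenizer(review, aux, return_tensors="pt")
polarity = model(**inputs).logits.argmax(dim=-1)
```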
29. A learning perspective on the emergence of abstractions: the curious case of phonemes [PDF] 返回目录
Petar Milin, Benjamin V. Tucker, Dagmar Divjak
Abstract: In the present paper we use a range of modeling techniques to investigate whether an abstract phone could emerge from exposure to speech sounds. We test two opposing principles regarding the development of language knowledge in linguistically untrained language users: Memory-Based Learning (MBL) and Error-Correction Learning (ECL). A process of generalization underlies the abstractions linguists operate with, and we probed whether MBL and ECL could give rise to a type of language knowledge that resembles linguistic abstractions. Each model was presented with a significant amount of pre-processed speech produced by one speaker. We assessed the consistency or stability of what the models have learned and their ability to give rise to abstract categories. Both types of models fare differently with regard to these tests. We show that ECL learning models can learn abstractions and that at least part of the phone inventory can be reliably identified from the input.
摘要:在本文中,我们使用了一系列建模技术来研究抽象的音素是否可能从接触语音信号中涌现出来。我们检验了关于未经语言学训练的语言使用者语言知识发展的两个相互对立的原则:基于记忆的学习(MBL)和误差校正学习(ECL)。语言学家所使用的抽象概念以泛化过程为基础,我们探讨了MBL和ECL是否能产生一种类似于语言学抽象的语言知识。每个模型都接收了由一位说话者产生的大量经过预处理的语音。我们评估了模型所学内容的一致性或稳定性,以及它们产生抽象类别的能力。两类模型在这些测试上的表现有所不同。我们表明,ECL学习模型能够学习抽象,并且至少部分音素清单可以从输入中可靠地识别出来。
30. Machine Learning to study the impact of gender-based violence in the news media [PDF] 返回目录
Hugo J. Bello, Nora Palomar, Elisa Gallego, Lourdes Jiménez Navascués, Celia Lozano
Abstract: While it remains a taboo topic, gender-based violence (GBV) undermines the health, dignity, security and autonomy of its victims. Many factors have been studied to generate or maintain this kind of violence, however, the influence of the media is still uncertain. Here, we use Machine Learning tools to extrapolate the effect of the news in GBV. By feeding neural networks with news, the topic information associated with each article can be recovered. Our findings show a relationship between GBV news and public awareness, the effect of mediatic GBV cases, and the intrinsic thematic relationship of GBV news. Because the used neural model can be easily adjusted, this also allows us to extend our approach to other media sources or topics
摘要:基于性别的暴力(GBV)虽然仍是一个禁忌话题,却损害着受害者的健康、尊严、安全和自主权。人们研究过许多促成或维持这类暴力的因素,然而媒体的影响仍不明确。在这里,我们使用机器学习工具来推断新闻报道对GBV的影响。通过将新闻输入神经网络,可以还原与每篇文章相关的主题信息。我们的发现揭示了GBV新闻与公众意识之间的关系、媒体化GBV案件的影响,以及GBV新闻内在的主题关联。由于所用神经模型可以方便地调整,这也使我们能够将该方法扩展到其他媒体来源或主题。
31. Parameter-Efficient Transfer Learning with Diff Pruning [PDF] 返回目录
Demi Guo, Alexander M. Rush, Yoon Kim
Abstract: While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task, memory-constrained settings. We propose diff pruning as a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework. This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks. The diff vector is adaptively pruned during training with a differentiable approximation to the L0-norm penalty to encourage sparsity. Diff pruning becomes parameter-efficient as the number of tasks increases, as it requires storing only the nonzero positions and weights of the diff vector for each task, while the cost of storing the shared pretrained model remains constant. It further does not require access to all tasks during training, which makes it attractive in settings where tasks arrive in stream or the set of tasks is unknown. We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark while only modifying 0.5% of the pretrained model's parameters per task.
摘要:尽管针对特定任务微调预训练网络已在NLP领域带来了显著的经验进展,但网络的庞大规模使得微调难以部署在多任务、内存受限的环境中。我们提出diff修剪(diff pruning),作为在预训练-微调框架内实现参数高效迁移学习的一种简单方法。该方法将微调视为学习一个特定于任务的diff向量,该向量叠加在保持固定、在不同任务间共享的预训练参数向量之上。diff向量在训练期间通过对L0范数惩罚的可微近似进行自适应修剪,以鼓励稀疏性。随着任务数量的增加,diff修剪在参数上变得高效,因为它只需为每个任务存储diff向量的非零位置和权重,而存储共享预训练模型的成本保持不变。此外,它在训练期间不需要访问所有任务,这使其在任务以流式到达或任务集合未知的场景中很有吸引力。我们发现,使用diff修剪微调的模型可以达到GLUE基准上完全微调基线的性能,而每个任务仅修改预训练模型0.5%的参数。
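A simplified sketch of the diff-pruning parametrization for a single layer: the pretrained weight stays frozen and shared, and only a task-specific diff is learned. For brevity an L1 penalty stands in for the paper's differentiable L0 (hard-concrete) relaxation, so this is an approximation of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffPrunedLinear(nn.Module):
    """One layer under diff pruning: the pretrained weight is frozen and shared
    across tasks; only a (hopefully sparse) task-specific diff is trained.
    An L1 penalty is used here instead of the paper's differentiable L0 relaxation."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = pretrained.weight.detach()                   # frozen, shared
        self.bias = pretrained.bias.detach()
        self.diff = nn.Parameter(torch.zeros_like(self.weight))    # task-specific diff vector

    def forward(self, x):
        return F.linear(x, self.weight + self.diff, self.bias)

    def sparsity_penalty(self):
        return self.diff.abs().sum()

layer = DiffPrunedLinear(nn.Linear(768, 768))
x = torch.randn(2, 768)
loss = layer(x).pow(2).mean() + 1e-4 * layer.sparsity_penalty()    # toy task loss + sparsity term
loss.backward()                                                     # gradients flow only into diff
```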
32. The Style-Content Duality of Attractiveness: Learning to Write Eye-Catching Headlines via Disentanglement [PDF] 返回目录
Mingzhe Li, Xiuying Chen, Min Yang, Shen Gao, Dongyan Zhao, Rui Yan
Abstract: Eye-catching headlines function as the first device to trigger more clicks, bringing reciprocal effect between producers and viewers. Producers can obtain more traffic and profits, and readers can have access to outstanding articles. When generating attractive headlines, it is important to not only capture the attractive content but also follow an eye-catching written style. In this paper, we propose a Disentanglement-based Attractive Headline Generator (DAHG) that generates headline which captures the attractive content following the attractive style. Concretely, we first devise a disentanglement module to divide the style and content of an attractive prototype headline into latent spaces, with two auxiliary constraints to ensure the two spaces are indeed disentangled. The latent content information is then used to further polish the document representation and help capture the salient part. Finally, the generator takes the polished document as input to generate headline under the guidance of the attractive style. Extensive experiments on the public Kuaibao dataset show that DAHG achieves state-of-the-art performance. Human evaluation also demonstrates that DAHG triggers 22% more clicks than existing models.
摘要:吸引眼球的标题是触发更多点击的第一道关口,在内容生产者和读者之间带来互惠效应。生产者可以获得更多的流量和收益,而读者可以接触到优秀的文章。在生成有吸引力的标题时,不仅要捕捉有吸引力的内容,还要遵循引人注目的写作风格。在本文中,我们提出了一种基于解耦的吸引力标题生成器(DAHG),它能按照有吸引力的风格生成捕捉有吸引力内容的标题。具体而言,我们首先设计了一个解耦模块,将有吸引力的原型标题的风格和内容划分到不同的潜在空间,并加入两个辅助约束以确保这两个空间确实被解耦。然后利用潜在的内容信息进一步完善文档表示,帮助捕捉显著部分。最后,生成器以润色后的文档表示为输入,在吸引力风格的指导下生成标题。在公开的Kuaibao数据集上进行的大量实验表明,DAHG达到了最先进的性能。人工评估还表明,DAHG触发的点击次数比现有模型多22%。
33. Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [PDF] 返回目录
Xiuying Chen, Zhi Cui, Jiayi Zhang, Chen Wei, Jianwei Cui, Bin Wang, Dongyan Zhao, Rui Yan
Abstract: In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question, where the question is focused on the omitted information in the dialog. Enlightened by the multi-task learning scheme, we propose a joint framework that unifies these two tasks, sharing the same encoder to extract the common and task-invariant features with different decoders to learn task-specific features. To better fusing information from the question and the dialog history in the encoding part, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the history dialog information so as to support downstream tasks. For the experiment, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help better response generation and vice versa. We release our large-scale dataset for further research.
摘要:在多轮对话中,话语并不总是以完整句子的形式出现\cite{Carbonell1983DiscoursePA},这自然使得理解对话上下文更加困难。然而,要生成合理的回复,必须充分把握对话上下文。因此,在本文中,我们提出通过考察模型回答阅读理解问题的能力来提升回复生成性能,其中该问题聚焦于对话中被省略的信息。受多任务学习方案的启发,我们提出了一个统一这两项任务的联合框架:共享同一个编码器以提取共同的、与任务无关的特征,并使用不同的解码器学习各自任务特有的特征。为了在编码部分更好地融合来自问题和对话历史的信息,我们提出用一个记忆更新器来增强Transformer架构,该更新器旨在有选择地存储和更新历史对话信息,以支持下游任务。在实验方面,我们雇用人工标注者编写并审核了一个大规模对话阅读理解数据集。我们在该数据集上进行了广泛的实验,结果表明,所提出的模型在两项任务上都相对于若干强基线带来了显著改进。通过这种方式,我们证明了推理确实有助于生成更好的回复,反之亦然。我们公开发布了这个大规模数据集以供进一步研究。
34. Towards localisation of keywords in speech using weak supervision [PDF] 返回目录
Kayode Olaleye, Benjamin van Niekerk, Herman Kamper
Abstract: Developments in weakly supervised and self-supervised models could enable speech technology in low-resource settings where full transcriptions are not available. We consider whether keyword localisation is possible using two forms of weak supervision where location information is not provided explicitly. In the first, only the presence or absence of a word is indicated, i.e. a bag-of-words (BoW) labelling. In the second, visual context is provided in the form of an image paired with an unlabelled utterance; a model then needs to be trained in a self-supervised fashion using the paired data. For keyword localisation, we adapt a saliency-based method typically used in the vision domain. We compare this to an existing technique that performs localisation as a part of the network architecture. While the saliency-based method is more flexible (it can be applied without architectural restrictions), we identify a critical limitation when using it for keyword localisation. Of the two forms of supervision, the visually trained model performs worse than the BoW-trained model. We show qualitatively that the visually trained model sometimes locate semantically related words, but this is not consistent. While our results show that there is some signal allowing for localisation, it also calls for other localisation methods better matched to these forms of weak supervision.
摘要:弱监督和自监督模型的发展,有望使语音技术应用于没有完整转写文本可用的低资源环境。我们探讨在两种不显式提供位置信息的弱监督形式下,关键词定位是否可行。第一种形式只指示某个词是否出现,即词袋(BoW)标注。第二种形式以图像与未标注话语配对的方式提供视觉上下文;然后需要利用这些配对数据以自监督的方式训练模型。对于关键词定位,我们改造了一种通常用于视觉领域的基于显著性的方法,并将其与一种把定位作为网络结构一部分来完成的现有技术进行比较。虽然基于显著性的方法更灵活(可以在没有结构限制的情况下应用),但我们发现将其用于关键词定位时存在一个关键局限。在两种监督形式中,基于视觉训练的模型表现不如基于BoW训练的模型。我们定性地展示了视觉训练的模型有时能定位语义相关的词,但这并不稳定。虽然我们的结果表明存在一定的可用于定位的信号,但也表明需要其他更适合这些弱监督形式的定位方法。
35. A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings [PDF] 返回目录
Lisa van Staden, Herman Kamper
Abstract: Many speech processing tasks involve measuring the acoustic similarity between speech segments. Acoustic word embeddings (AWE) allow for efficient comparisons by mapping speech segments of arbitrary duration to fixed-dimensional vectors. For zero-resource speech processing, where unlabelled speech is the only available resource, some of the best AWE approaches rely on weak top-down constraints in the form of automatically discovered word-like segments. Rather than learning embeddings at the segment level, another line of zero-resource research has looked at representation learning at the short-time frame level. Recent approaches include self-supervised predictive coding and correspondence autoencoder (CAE) models. In this paper we consider whether these frame-level features are beneficial when used as inputs for training to an unsupervised AWE model. We compare frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding and a CAE to conventional MFCCs. These are used as inputs to a recurrent CAE-based AWE model. In a word discrimination task on English and Xitsonga data, all three representation learning approaches outperform MFCCs, with CPC consistently showing the biggest improvement. In cross-lingual experiments we find that CPC features trained on English can also be transferred to Xitsonga.
摘要:许多语音处理任务都涉及度量语音片段之间的声学相似度。声学词嵌入(AWE)通过将任意时长的语音片段映射到固定维向量来实现高效比较。对于只有未标注语音可用的零资源语音处理,一些最好的AWE方法依赖于以自动发现的类词片段形式给出的弱自上而下约束。与在片段级别学习嵌入不同,另一条零资源研究路线关注短时帧级别的表示学习,最近的方法包括自监督预测编码和对应自动编码器(CAE)模型。在本文中,我们考察这些帧级特征作为无监督AWE模型的训练输入是否有益。我们将对比预测编码(CPC)、自回归预测编码和CAE得到的帧级特征与常规MFCC进行比较,并将它们用作基于循环CAE的AWE模型的输入。在英语和Xitsonga数据上的单词判别任务中,三种表示学习方法均优于MFCC,其中CPC始终带来最大的改进。在跨语言实验中,我们发现在英语上训练的CPC特征也可以迁移到Xitsonga。
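A minimal sketch of the acoustic word embedding setup, under the assumption of a single-layer GRU encoder (the paper's model is a correspondence-autoencoder-based recurrent network): variable-length frame sequences are mapped to fixed-dimensional unit-norm vectors and compared by cosine similarity for word discrimination.

    # Minimal sketch (not the paper's exact CAE-RNN): an RNN encoder that maps
    # variable-length frame sequences to fixed-dimensional acoustic word
    # embeddings, compared with cosine similarity for word discrimination.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AcousticWordEncoder(nn.Module):
        def __init__(self, feat_dim=13, embed_dim=128):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

        def forward(self, frames):                  # frames: (1, T, feat_dim)
            _, h = self.rnn(frames)                 # final hidden state
            return F.normalize(h[-1], dim=-1)       # (1, embed_dim), unit norm

    encoder = AcousticWordEncoder()
    seg_a = torch.randn(1, 80, 13)                  # two segments of different length
    seg_b = torch.randn(1, 65, 13)
    emb_a, emb_b = encoder(seg_a), encoder(seg_b)
    similarity = (emb_a * emb_b).sum().item()       # cosine, since embeddings are unit norm
    print(f"same-word score: {similarity:.3f}")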
36. Generating Math Word Problems from Equations with Topic Controlling and Commonsense Enforcement [PDF] 返回目录
Tianyang Cao, Shuang Zeng, Songge Zhao, Mairgup Mansur, Baobao Chang
Abstract: Recent years have seen significant advancement in text generation tasks with the help of neural language models. However, there exists a challenging task: generating math problem text based on mathematical equations, which has made little progress so far. In this paper, we present a novel equation-to-problem text generation model. In our model, 1) we propose a flexible scheme to effectively encode math equations, we then enhance the equation encoder by a Varitional Autoen-coder (VAE) 2) given a math equation, we perform topic selection, followed by which a dynamic topic memory mechanism is introduced to restrict the topic distribution of the generator 3) to avoid commonsense violation in traditional generation model, we pretrain word embedding with background knowledge graph (KG), and we link decoded words to related words in KG, targeted at injecting background knowledge into our model. We evaluate our model through both automatic metrices and human evaluation, experiments demonstrate our model outperforms baseline and previous models in both accuracy and richness of generated problem text.
摘要:近年来,在神经语言模型的帮助下,文本生成任务取得了长足进步。但是,仍存在一项具有挑战性的任务:根据数学方程生成数学应用题文本,到目前为止进展甚微。在本文中,我们提出了一种新颖的方程到问题文本生成模型。在我们的模型中:1)我们提出了一种灵活的方案来有效编码数学方程,并通过变分自动编码器(VAE)增强方程编码器;2)给定一个数学方程,我们先进行主题选择,随后引入动态主题记忆机制来约束生成器的主题分布;3)为避免传统生成模型中违背常识的问题,我们利用背景知识图(KG)预训练词嵌入,并将解码出的词链接到KG中的相关词,旨在将背景知识注入我们的模型。我们通过自动指标和人工评估来评估我们的模型,实验证明我们的模型在生成问题文本的准确性和丰富性方面都优于基线和以前的模型。
37. LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding [PDF] 返回目录
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
Abstract: The pre-training models such as BERT have achieved great results in various natural language processing problems. However, a large number of parameters need significant amounts of memory and the consumption of inference time, which makes it difficult to deploy them on edge devices. In this work, we propose a knowledge distillation method LRC-BERT based on contrastive learning to fit the output of the intermediate layer from the angular distance aspect, which is not considered by the existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first attempt in knowledge distillation. Additionally, in order to better capture the distribution characteristics of the intermediate layer, we design a two-stage training method for the total distillation loss. Finally, by verifying 8 datasets on the General Language Understanding Evaluation (GLUE) benchmark, the performance of the proposed LRC-BERT exceeds the existing state-of-the-art methods, which proves the effectiveness of our method.
摘要:像BERT这样的预训练模型在各种自然语言处理问题上都取得了不错的成绩。但是,大量参数需要大量的内存和推理时间的消耗,这使得很难将其部署在边缘设备上。在这项工作中,我们提出了一种基于对比学习的知识蒸馏方法LRC-BERT,以从角度距离方面拟合中间层的输出,这是现有蒸馏方法未考虑的。此外,我们在训练阶段引入了基于梯度摄动的训练架构,以提高LRC-BERT的鲁棒性,这是知识蒸馏的首次尝试。此外,为了更好地捕获中间层的分布特征,我们针对总蒸馏损失设计了一个两阶段训练方法。最后,通过在通用语言理解评估(GLUE)基准上验证8个数据集,所提出的LRC-BERT的性能超过了现有的最新方法,这证明了我们方法的有效性。
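The contrastive intermediate-layer objective can be sketched roughly as follows (an assumption-laden illustration, not the released LRC-BERT code): the student's intermediate representation is pulled toward the teacher's along the angular (cosine) direction, with the other examples in the batch acting as negatives.

    # Illustrative sketch of angular-distance contrastive distillation between
    # student and teacher intermediate-layer outputs.
    import torch
    import torch.nn.functional as F

    def angular_contrastive_loss(student_h, teacher_h, temperature=0.1):
        """student_h, teacher_h: (batch, hidden) intermediate-layer outputs."""
        s = F.normalize(student_h, dim=-1)
        t = F.normalize(teacher_h, dim=-1)
        logits = s @ t.t() / temperature          # cosine similarities, all pairs
        targets = torch.arange(s.size(0))         # positive pair is the diagonal
        return F.cross_entropy(logits, targets)

    student_h = torch.randn(8, 768, requires_grad=True)
    teacher_h = torch.randn(8, 768)
    loss = angular_contrastive_loss(student_h, teacher_h)
    loss.backward()
    print(loss.item())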
38. Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling [PDF] 返回目录
Yicheng Zou, Lujun Zhao, Yangyang Kang, Jun Lin, Minlong Peng, Zhuoren Jiang, Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu
Abstract: In a customer service system, dialogue summarization can boost service efficiency by automatically creating summaries for long spoken dialogues in which customers and agents try to address issues about specific topics. In this work, we focus on topic-oriented dialogue summarization, which generates highly abstractive summaries that preserve the main ideas from dialogues. In spoken dialogues, abundant dialogue noise and common semantics could obscure the underlying informative content, making the general topic modeling approaches difficult to apply. In addition, for customer service, role-specific information matters and is an indispensable part of a summary. To effectively perform topic modeling on dialogues and capture multi-role information, in this work we propose a novel topic-augmented two-stage dialogue summarizer (TDS) jointly with a saliency-aware neural topic model (SATM) for topic-oriented summarization of customer service dialogues. Comprehensive studies on a real-world Chinese customer service dataset demonstrated the superiority of our method against several strong baselines.
摘要:在客户服务系统中,对话摘要可以通过为冗长的口语对话自动生成摘要来提高服务效率,在这些对话中,客户和客服尝试解决特定主题的问题。在这项工作中,我们专注于面向主题的对话摘要,即生成高度抽象、保留对话主要观点的摘要。在口语对话中,大量的对话噪音和通用语义可能会掩盖潜在的信息性内容,使得通用主题建模方法难以应用。此外,对于客户服务而言,特定于角色的信息很重要,是摘要中必不可少的一部分。为了有效地对对话进行主题建模并捕获多角色信息,我们在这项工作中提出了一种新颖的主题增强型两阶段对话摘要器(TDS),并结合显著性感知神经主题模型(SATM),用于面向主题的客户服务对话摘要。在真实中文客户服务数据集上的综合研究表明,我们的方法优于多个强基线。
39. Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [PDF] 返回目录
Yicheng Zou, Jun Lin, Lujun Zhao, Yangyang Kang, Zhuoren Jiang, Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu
Abstract: Automatic chat summarization can help people quickly grasp important information from numerous chat messages. Unlike conventional documents, chat logs usually have fragmented and evolving topics. In addition, these logs contain a quantity of elliptical and interrogative sentences, which make the chat summarization highly context dependent. In this work, we propose a novel unsupervised framework called RankAE to perform chat summarization without employing manually labeled data. RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously, as well as a denoising auto-encoder that is carefully designed to generate succinct but context-informative summaries based on the selected utterances. To evaluate the proposed method, we collect a large-scale dataset of chat logs from a customer service environment and build an annotated set only for model evaluation. Experimental results show that RankAE significantly outperforms other unsupervised methods and is able to generate high-quality summaries in terms of relevance and topic coverage.
摘要:自动聊天摘要可以帮助人们从众多聊天消息中快速掌握重要信息。与常规文档不同,聊天日志通常具有分散且不断发展的主题。此外,这些日志包含大量的省略和疑问句,这使得聊天摘要在很大程度上取决于上下文。在这项工作中,我们提出了一种名为RankAE的新型无监督框架,该框架无需使用手动标记的数据即可执行聊天摘要。 RankAE包括一个面向主题的排名策略,该策略根据中心性和多样性同时选择主题话语,以及经过精心设计以根据所选话语生成简洁但具有上下文信息摘要的降噪自动编码器。为了评估所提出的方法,我们从客户服务环境中收集了大规模的聊天记录数据集,并构建了一个仅用于模型评估的带注释的集。实验结果表明,RankAE明显优于其他无监督方法,并且能够在相关性和主题覆盖率方面生成高质量的摘要。
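The centrality-plus-diversity selection can be illustrated with an MMR-style greedy loop over precomputed utterance embeddings; this is a simplification of RankAE's actual ranking strategy, and the embedding source is assumed.

    # A rough illustration (not RankAE's exact scoring) of selecting topic
    # utterances by balancing centrality against diversity, MMR-style.
    import numpy as np

    def select_topic_utterances(embeddings, k=3, trade_off=0.7):
        """embeddings: (n, d) unit-normalised utterance vectors."""
        sim = embeddings @ embeddings.T            # pairwise cosine similarities
        centrality = sim.mean(axis=1)              # how representative each utterance is
        selected = []
        while len(selected) < k:
            best, best_score = None, -np.inf
            for i in range(len(embeddings)):
                if i in selected:
                    continue
                redundancy = max((sim[i, j] for j in selected), default=0.0)
                score = trade_off * centrality[i] - (1 - trade_off) * redundancy
                if score > best_score:
                    best, best_score = i, score
            selected.append(best)
        return selected

    vecs = np.random.randn(10, 32)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    print(select_topic_utterances(vecs))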
40. Contrastive Learning with Adversarial Perturbations for Conditional Text Generation [PDF] 返回目录
Seanie Lee, Dong Bok Lee, Sung Ju Hwang
Abstract: Recently, sequence-to-sequence (seq2seq) models with the Transformer architecture have achieved remarkable performance on various conditional text generation tasks, such as machine translation. However, most of them are trained with teacher forcing with the ground truth label given at each time step, without being exposed to incorrectly generated tokens during training, which hurts its generalization to unseen inputs, that is known as the ``exposure bias" problem. In this work, we propose to mitigate the conditional text generation problem by contrasting positive pairs with negative pairs, such that the model is exposed to various valid or incorrect perturbations of the inputs, for improved generalization. However, training the model with naive contrastive learning framework using random non-target sequences as negative examples is suboptimal, since they are easily distinguishable from the correct output, especially so with models pretrained with large text corpora. Also, generating positive examples requires domain-specific augmentation heuristics which may not generalize over diverse domains. To tackle this problem, we propose a principled method to generate positive and negative samples for contrastive learning of seq2seq models. Specifically, we generate negative examples by adding small perturbations to the input sequence to minimize its conditional likelihood, and positive examples by adding large perturbations while enforcing it to have a high conditional likelihood. Such ``hard'' positive and negative pairs generated using our method guides the model to better distinguish correct outputs from incorrect ones. We empirically show that our proposed method significantly improves the generalization of the seq2seq on three text generation tasks - machine translation, text summarization, and question generation.
摘要:最近,采用Transformer架构的序列到序列(seq2seq)模型在机器翻译等各种条件文本生成任务上取得了卓越的性能。但是,它们大多采用教师强制(teacher forcing)训练,即在每个时间步都给定真实标签,训练期间从未接触过错误生成的词元,这损害了其对未见输入的泛化能力,即所谓的“曝光偏差”问题。在这项工作中,我们提出通过对比正样本对与负样本对来缓解条件文本生成问题,使模型接触到输入的各种有效或错误的扰动,从而提高泛化能力。然而,在朴素的对比学习框架中使用随机的非目标序列作为负样本并不是最优的,因为它们很容易与正确输出区分开,对于用大规模文本语料预训练的模型尤其如此;此外,生成正样本需要特定领域的数据增强启发式方法,而这些方法未必能推广到不同领域。为了解决这个问题,我们提出了一种有原则的方法来为seq2seq模型的对比学习生成正负样本。具体来说,我们通过向输入序列添加较小的扰动以最小化其条件似然来生成负样本,并通过添加较大的扰动同时强制其具有较高的条件似然来生成正样本。用我们的方法生成的这种“困难”正负样本对可以引导模型更好地区分正确输出和错误输出。实验表明,所提出的方法在机器翻译、文本摘要和问题生成这三个文本生成任务上显著提高了seq2seq模型的泛化能力。
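A hedged sketch of the perturbation construction (with a toy stand-in model, not the authors' seq2seq implementation): a small gradient step on the input that lowers the conditional likelihood yields a hard negative, while a larger step in the opposite direction (the paper additionally enforces a high conditional likelihood on it) yields a hard positive.

    # Toy illustration of "small perturbation -> hard negative, large
    # perturbation -> hard positive" on input representations.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(16, 16)                       # stand-in for a seq2seq model
    src = torch.randn(1, 16)
    tgt = torch.randn(1, 16)

    def neg_log_likelihood(inp):
        return ((model(inp) - tgt) ** 2).mean()     # proxy for -log p(tgt | inp)

    src_ = src.clone().requires_grad_(True)
    neg_log_likelihood(src_).backward()
    grad = src_.grad

    small_eps, large_eps = 0.01, 0.5
    negative = src + small_eps * grad.sign()        # small step, likelihood decreases
    positive = src - large_eps * grad.sign()        # large step toward higher likelihood
    print(neg_log_likelihood(src).item(),
          neg_log_likelihood(negative).item(),
          neg_log_likelihood(positive).item())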
41. Mask-Align: Self-Supervised Neural Word Alignment [PDF] 返回目录
Chi Chen, Maosong Sun, Yang Liu
Abstract: Neural word alignment methods have received increasing attention recently. These methods usually extract word alignment from a machine translation model. However, there is a gap between translation and alignment tasks, since the target future context is available in the latter. In this paper, we propose Mask-Align, a self-supervised model specifically designed for the word alignment task. Our model parallelly masks and predicts each target token, and extracts high-quality alignments without any supervised loss. In addition, we introduce leaky attention to alleviate the problem of unexpected high attention weights on special tokens. Experiments on four language pairs show that our model significantly outperforms all existing unsupervised neural baselines and obtains new state-of-the-art results.
摘要:神经词对齐方法近来受到越来越多的关注。 这些方法通常从机器翻译模型中提取单词对齐方式。 但是,翻译和对齐任务之间存在差距,因为在后者中可以使用目标未来上下文。 在本文中,我们提出了Mask-Align,这是一种专为单词对齐任务设计的自我监督模型。 我们的模型并行掩盖和预测每个目标标记,并提取高质量的比对,而没有任何监督性损失。 另外,我们引入了泄漏注意,以减轻特殊令牌上意外的高关注权重的问题。 在四种语言对上的实验表明,我们的模型明显优于所有现有的无监督神经基线,并获得了最新的最新结果。
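The parallel masking trick can be illustrated with hypothetical tensors (not the Mask-Align implementation): one batch row is built per target position with that position replaced by a mask id, so every target token can be predicted in a single forward pass, and the attention at the masked position suggests its alignment.

    # Illustrative sketch of masking each target token in parallel.
    import torch

    MASK_ID = 0
    target = torch.tensor([11, 27, 5, 42])          # toy target token ids
    T = target.size(0)

    batch = target.unsqueeze(0).repeat(T, 1)        # (T, T): one copy per position
    batch[torch.arange(T), torch.arange(T)] = MASK_ID
    print(batch)
    # Each row i masks position i; a decoder would predict token i from the
    # source and the remaining target tokens, and the attention weights at the
    # masked position give the alignment candidates.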
42. SPARTA: Speaker Profiling for ARabic TAlk [PDF] 返回目录
Wael Farhan, Muhy Eddin Za'ter, Qusai Abu Obaidah, Hisham al Bataineh, Zyad Sober, Hussein T. Al-Natsheh
Abstract: This paper proposes a novel approach to an automatic estimation of three speaker traits from Arabic speech: gender, emotion, and dialect. After showing promising results on different text classification tasks, the multi-task learning (MTL) approach is used in this paper for Arabic speech classification tasks. The dataset was assembled from six publicly available datasets. First, The datasets were edited and thoroughly divided into train, development, and test sets (open to the public), and a benchmark was set for each task and dataset throughout the paper. Then, three different networks were explored: Long Short Term Memory (LSTM), Convolutional Neural Network (CNN), and Fully-Connected Neural Network (FCNN) on five different types of features: two raw features (MFCC and MEL) and three pre-trained vectors (i-vectors, d-vectors, and x-vectors). LSTM and CNN networks were implemented using raw features: MFCC and MEL, where FCNN was explored on the pre-trained vectors while varying the hyper-parameters of these networks to obtain the best results for each dataset and task. MTL was evaluated against the single task learning (STL) approach for the three tasks and six datasets, in which the MTL and pre-trained vectors almost constantly outperformed STL. All the data and pre-trained models used in this paper are available and can be acquired by the public.
摘要:本文提出了一种从阿拉伯语语音中自动估计三种说话人特征(性别、情感和方言)的新方法。多任务学习(MTL)方法在不同文本分类任务上显示出良好效果之后,本文将其用于阿拉伯语语音分类任务。数据集由六个公开可用的数据集组合而成。首先,对这些数据集进行整理,并将其完整划分为训练集、开发集和测试集(向公众开放),并在全文中为每个任务和数据集设定了基准。然后,在五种不同类型的特征上研究了三种不同的网络:长短期记忆网络(LSTM)、卷积神经网络(CNN)和全连接神经网络(FCNN);特征包括两种原始特征(MFCC和MEL)和三种预训练向量(i-vector、d-vector和x-vector)。LSTM和CNN网络使用原始特征(MFCC和MEL)实现,FCNN则在预训练向量上进行探索,同时调整这些网络的超参数,以便在每个数据集和任务上获得最佳结果。针对三个任务和六个数据集,将MTL与单任务学习(STL)方法进行了比较,其中MTL与预训练向量几乎始终优于STL。本文使用的所有数据和预训练模型均已公开,可供公众获取。
43. Iterative Utterance Segmentation for Neural Semantic Parsing [PDF] 返回目录
Yinuo Guo, Zeqi Lin, Jian-Guang Lou, Dongmei Zhang
Abstract: Neural semantic parsers usually fail to parse long and complex utterances into correct meaning representations, due to the lack of exploiting the principle of compositionality. To address this issue, we present a novel framework for boosting neural semantic parsers via iterative utterance segmentation. Given an input utterance, our framework iterates between two neural modules: a segmenter for segmenting a span from the utterance, and a parser for mapping the span into a partial meaning representation. Then, these intermediate parsing results are composed into the final meaning representation. One key advantage is that this framework does not require any handcraft templates or additional labeled data for utterance segmentation: we achieve this through proposing a novel training method, in which the parser provides pseudo supervision for the segmenter. Experiments on Geo, ComplexWebQuestions, and Formulas show that our framework can consistently improve performances of neural semantic parsers in different domains. On data splits that require compositional generalization, our framework brings significant accuracy gains: Geo 63.1 to 81.2, Formulas 59.7 to 72.7, ComplexWebQuestions 27.1 to 56.3.
摘要:由于缺乏对组合性原理的利用,神经语义解析器通常无法将冗长而复杂的话语解析为正确的意义表示。为了解决这个问题,我们提出了一种通过迭代话语分段来增强神经语义解析器的新颖框架。给定输入话语,我们的框架在两个神经模块之间进行迭代:用于从话语中分割跨度的分段器,以及用于将跨度映射为部分含义表示的解析器。然后,将这些中间解析结果组合成最终含义表示。一个主要优点是,该框架不需要任何手工模板或其他带有标签的数据即可进行话语细分:我们通过提出一种新颖的训练方法来实现这一目标,在该方法中,解析器为分割器提供了伪监督。在Geo,ComplexWebQuestions和公式上进行的实验表明,我们的框架可以不断提高不同领域中神经语义解析器的性能。在需要组合归纳的数据拆分上,我们的框架带来了显着的准确性提升:Geo 63.1至81.2,公式59.7至72.7,ComplexWebQuestions 27.1至56.3。
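A structural sketch of the iterate-segment-then-parse loop, with stub segmenter and parser functions standing in for the neural modules:

    # Structural sketch only (segmenter/parser are stubs, not the paper's
    # models): iterate between segmenting a span off the utterance and parsing
    # it into a partial meaning representation, then compose the partial results.
    def segmenter(utterance):
        """Stub: return the first clause as the next span to parse."""
        span, _, rest = utterance.partition(" that ")
        return span, rest

    def parser(span):
        """Stub: map a span to a toy partial logical form."""
        return f"pred({span.strip().replace(' ', '_')})"

    def parse_iteratively(utterance):
        partial_forms = []
        while utterance:
            span, utterance = segmenter(utterance)
            partial_forms.append(parser(span))
        return " AND ".join(partial_forms)          # composition step

    print(parse_iteratively("list rivers that traverse states that border texas"))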
44. Context-Enhanced Entity and Relation Embedding for Knowledge Graph Completion [PDF] 返回目录
Ziyue Qiao, Zhiyuan Ning, Yi Du, Yuanchun Zhou
Abstract: Most researches for knowledge graph completion learn representations of entities and relations to predict missing links in incomplete knowledge graphs. However, these methods fail to take full advantage of both the contextual information of entity and relation. Here, we extract contexts of entities and relations from the triplets which they compose. We propose a model named AggrE, which conducts efficient aggregations respectively on entity context and relation context in multi-hops, and learns context-enhanced entity and relation embeddings for knowledge graph completion. The experiment results show that AggrE is competitive to existing models.
摘要:大多数知识图完成的研究都学习实体和关系的表示,以预测不完整知识图中的缺失链接。 但是,这些方法不能充分利用实体和关系的上下文信息。 在这里,我们从它们组成的三元组中提取实体和关系的上下文。 我们提出了一个名为AggrE的模型,该模型在多跳中分别对实体上下文和关系上下文进行有效的聚合,并学习上下文增强的实体和关系嵌入以完成知识图的完成。 实验结果表明,AggrE与现有模型相比具有竞争力。
45. C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling [PDF] 返回目录
Yutai Hou, Sanyuan Chen, Wanxiang Che, Cheng Chen, Ting Liu
Abstract: Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while keeping semantic. Different from previous DA works that reconstruct utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances of the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows to consider the relations between generated instances and encourages diversity. Besides, encoding multiple existing utterances endows C2C with a wider view of existing expressions, helping to reduce generation that duplicates existing data. Experiments on ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%) and 5.76 (13.6%) F-scores respectively, when there are only hundreds of training utterances.
摘要:槽填充是口语理解的基础模块,经常受训练数据数量不足和多样性欠缺的困扰。为了解决这个问题,我们提出了一种新的用于数据增强(DA)的簇到簇生成框架,称为C2C-GenDA。它通过在保留语义的同时将现有话语重构为替代表达来扩大训练集。与以前逐条独立重构话语的DA工作不同,C2C-GenDA联合编码具有相同语义的多条现有话语,并同时解码多条未见过的表达。联合生成多条新话语可以考虑生成实例之间的关系,并鼓励多样性。此外,编码多条现有话语使C2C能够更全面地看到已有表达,有助于减少与现有数据重复的生成。在ATIS和Snips数据集上的实验表明,在只有数百条训练话语的情况下,经C2C-GenDA增强的实例分别将槽填充的F分数提高了7.99(11.9%)和5.76(13.6%)。
46. Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [PDF] 返回目录
Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
Abstract: Syntactic structure of a sentence text is correlated with the prosodic structure of the speech that is crucial for improving the prosody and naturalness of a text-to-speech (TTS) system. Nowadays TTS systems usually try to incorporate syntactic structure information with manually designed features based on expert knowledge. In this paper, we propose a syntactic representation learning method based on syntactic parse tree traversal to automatically utilize the syntactic structure information. Two constituent label sequences are linearized through left-first and right-first traversals from constituent parse tree. Syntactic representations are then extracted at word level from each constituent label sequence by a corresponding uni-directional gated recurrent unit (GRU) network. Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels. Upsampled syntactic representations and phoneme embeddings are concatenated to serve as the encoder input of Tacotron2. Experimental results demonstrate the effectiveness of our proposed approach, with mean opinion score (MOS) increasing from 3.70 to 3.82 and ABX preference exceeding by 17% compared with the baseline. In addition, for sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
摘要:句子文本的句法结构与语音的韵律结构相关,而韵律结构对于提高文本转语音(TTS)系统的韵律和自然度至关重要。如今的TTS系统通常尝试通过基于专家知识人工设计的特征来引入句法结构信息。本文提出了一种基于句法分析树遍历的句法表示学习方法,以自动利用句法结构信息。通过对成分句法分析树进行左先序和右先序遍历,将其线性化为两个成分标签序列;然后由相应的单向门控循环单元(GRU)网络从每个成分标签序列中提取词级别的句法表示。同时,引入核范数最大化损失以增强成分标签嵌入的可判别性和多样性。将上采样后的句法表示与音素嵌入拼接,作为Tacotron2的编码器输入。实验结果证明了我们所提方法的有效性:平均意见得分(MOS)从3.70提高到3.82,ABX偏好比基线高出17%。此外,对于具有多棵句法分析树的句子,可以从合成语音中清楚地感知到韵律差异。
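The two traversal orders can be sketched on a toy constituent tree (nested tuples; not the paper's feature pipeline), producing the left-first and right-first label sequences that feed the GRU encoders:

    # Sketch of linearising constituent labels by left-first and right-first
    # traversals over a toy parse tree.
    tree = ("S", [("NP", [("DT", []), ("NN", [])]),
                  ("VP", [("VBZ", []), ("NP", [("NN", [])])])])

    def traverse(node, right_first=False):
        label, children = node
        order = reversed(children) if right_first else children
        seq = [label]
        for child in order:
            seq.extend(traverse(child, right_first))
        return seq

    print(traverse(tree))                    # left-first label sequence
    print(traverse(tree, right_first=True))  # right-first label sequence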
47. Discriminative Pre-training for Low Resource Title Compression in Conversational Grocery [PDF] 返回目录
Snehasish Mukherjee, Phaniram Sayapaneni, Shankar Subramanya
Abstract: The ubiquity of smart voice assistants has made conversational shopping commonplace. This is especially true for low consideration segments like grocery. A central problem in conversational grocery is the automatic generation of short product titles that can be read out fast during a conversation. Several supervised models have been proposed in the literature that leverage manually labeled datasets and additional product features to generate short titles automatically. However, obtaining large amounts of labeled data is expensive and most grocery item pages are not as feature-rich as other categories. To address this problem we propose a pre-training based solution that makes use of unlabeled data to learn contextual product representations which can then be fine-tuned to obtain better title compression even in a low resource setting. We use a self-attentive BiLSTM encoder network with a time distributed softmax layer for the title compression task. We overcome the vocabulary mismatch problem by using a hybrid embedding layer that combines pre-trained word embeddings with trainable character level convolutions. We pre-train this network as a discriminator on a replaced-token detection task over a large number of unlabeled grocery product titles. Finally, we fine tune this network, without any modifications, with a small labeled dataset for the title compression task. Experiments on Walmart's online grocery catalog show our model achieves performance comparable to state-of-the-art models like BERT and XLNet. When fine tuned on all of the available training data our model attains an F1 score of 0.8558 which lags the best performing model, BERT-Base, by 2.78% and XLNet by 0.28% only, while using 55 times lesser parameters than both. Further, when allowed to fine tune on 5% of the training data only, our model outperforms BERT-Base by 24.3% in F1 score.
摘要:智能语音助手的普及已使对话购物变得司空见惯。对于杂货店等低关注度细分市场而言尤其如此。对话式杂货店的中心问题是自动生成简短的产品标题,可以在对话过程中快速读取这些产品标题。文献中已经提出了几种监督模型,这些模型利用了手动标记的数据集和其他产品功能来自动生成简短标题。但是,获取大量带标签的数据非常昂贵,并且大多数杂货店商品页面的功能不如其他类别丰富。为了解决这个问题,我们提出了一种基于预训练的解决方案,该方案利用未标记的数据来学习上下文产品表示,然后即使在资源不足的情况下也可以对其进行微调以获得更好的标题压缩。我们将带有时间分布的softmax层的自关注BiLSTM编码器网络用于标题压缩任务。通过使用将预训练的词嵌入与可训练的字符级卷积相结合的混合嵌入层,我们克服了词汇失配问题。我们针对大量未贴标签的杂货产品标题上的替换令牌检测任务,对该网络进行了预训练,以作为鉴别器。最后,我们对网络进行了微调,无需任何修改,并使用一个小的标签数据集来进行标题压缩任务。沃尔玛在线食品杂货目录上的实验表明,我们的模型所获得的性能可与BERT和XLNet等最新模型相媲美。在所有可用的训练数据上进行微调后,我们的模型的F1得分为0.8558,仅比性能最佳的模型BERT-Base落后2.78%,而XLNet则只有0.28%,而使用的参数却比两者均少55倍。此外,当仅允许对5%的训练数据进行微调时,我们的模型在F1得分方面比BERT-Base高24.3%。
48. AffectON: Incorporating Affect Into Dialog Generation [PDF] 返回目录
Zana Bucinca, Yucel Yemez, Engin Erzin, Metin Sezgin
Abstract: Due to its expressivity, natural language is paramount for explicit and implicit affective state communication among humans. The same linguistic inquiry (e.g., How are you?) might induce responses with different affects depending on the affective state of the conversational partner(s) and the context of the conversation. Yet, most dialog systems do not consider affect as constitutive aspect of response generation. In this paper, we introduce AffectON, an approach for generating affective responses during inference. For generating language in a targeted affect, our approach leverages a probabilistic language model and an affective space. AffectON is language model agnostic, since it can work with probabilities generated by any language model (e.g., sequence-to-sequence models, neural language models, n-grams). Hence, it can be employed for both affective dialog and affective language generation. We experimented with affective dialog generation and evaluated the generated text objectively and subjectively. For the subjective part of the evaluation, we designed a custom user interface for rating and provided recommendations for the design of such interfaces. The results, both subjective and objective demonstrate that our approach is successful in pulling the generated language toward the targeted affect, with little sacrifice in syntactic coherence.
摘要:由于其表达能力,自然语言对于人类之间显性和隐性的情感状态交流至关重要。相同的语言询问(例如“你好吗?”)可能会根据对话伙伴的情感状态和对话上下文引发带有不同情感的回应。然而,大多数对话系统并不将情感视为回复生成的构成要素。在本文中,我们介绍了AffectON,一种在推理过程中生成情感化回复的方法。为了生成带有目标情感的语言,我们的方法利用了概率语言模型和情感空间。AffectON与语言模型无关,因为它可以使用任何语言模型(例如序列到序列模型、神经语言模型、n元语法)生成的概率。因此,它既可用于情感对话生成,也可用于情感语言生成。我们在情感对话生成上进行了实验,并对生成的文本进行了客观和主观评估。在主观评估部分,我们设计了一个用于打分的自定义用户界面,并为此类界面的设计提供了建议。主观和客观结果均表明,我们的方法能够成功地将生成的语言拉向目标情感,而在句法连贯性上几乎没有损失。
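One possible way to combine a language model with an affective space at decoding time (hypothetical scores and weighting; the paper's exact combination rule is not reproduced here) is to bias next-token log-probabilities with a per-word affect score for the target affect:

    # Toy sketch of biasing next-token probabilities toward a target affect.
    import math

    lm_logprobs = {"great": math.log(0.30), "fine": math.log(0.50), "awful": math.log(0.20)}
    affect_score = {"great": 0.9, "fine": 0.1, "awful": -0.8}   # target: positive affect
    weight = 1.5                                                 # affect-vs-fluency trade-off

    combined = {w: lm_logprobs[w] + weight * affect_score[w] for w in lm_logprobs}
    best = max(combined, key=combined.get)
    print(best, combined)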
49. GDPNet: Refining Latent Multi-View Graph for Relation Extraction [PDF] 返回目录
Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng
Abstract: Relation Extraction (RE) is to predict the relation type of two entities that are mentioned in a piece of text, e.g., a sentence or a dialogue. When the given text is long, it is challenging to identify indicative words for the relation prediction. Recent advances on RE task are from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence. In this paper, we propose to construct a latent multi-view graph to capture various possible relationships among tokens. We then refine this graph to select important words for relation prediction. Finally, the representation of the refined graph and the BERT-based sequence representation are concatenated for relation extraction. Specifically, in our proposed GDPNet (Gaussian Dynamic Time Warping Pooling Net), we utilize Gaussian Graph Generator (GGG) to generate edges of the multi-view graph. The graph is then refined by Dynamic Time Warping Pooling (DTWPool). On DialogRE and TACRED, we show that GDPNet achieves the best performance on dialogue-level RE, and comparable performance with the state-of-the-arts on sentence-level RE.
摘要:关系提取(RE)用于预测文本中提到的两个实体的关系类型,例如句子或对话。当给定的文本很长时,为关系预测识别指示性单词是很困难的。 RE任务的最新进展来自基于BERT的序列建模和基于图形的序列中令牌之间关系的建模。在本文中,我们建议构造一个潜在的多视图图以捕获令牌之间的各种可能关系。然后,我们优化该图以选择重要的词来进行关系预测。最后,将精炼图的表示形式与基于BERT的序列表示形式进行级联以进行关系提取。具体来说,在我们提出的GDPNet(高斯动态时间规整池网络)中,我们利用高斯图生成器(GGG)生成多视图图的边缘。然后通过动态时间规整池(DTWPool)精简该图。在DialogRE和TACRED上,我们表明GDPNet在对话级RE上实现了最佳性能,并且与句子级RE上的最新技术具有可比的性能。
50. SenSeNet: Neural Keyphrase Generation with Document Structure [PDF] 返回目录
Yichao Luo, Zhengyan Li, Bingning Wang, Xiaoyu Xing, Qi Zhang, Xuanjing Huang
Abstract: Keyphrase Generation (KG) is the task of generating central topics from a given document or literary work, which captures the crucial information necessary to understand the content. Documents such as scientific literature contain rich meta-sentence information, which represents the logical-semantic structure of the documents. However, previous approaches ignore the constraints of document logical structure, and hence they mistakenly generate keyphrases from unimportant sentences. To address this problem, we propose a new method called Sentence Selective Network (SenSeNet) to incorporate the meta-sentence inductive bias into KG. In SenSeNet, we use a straight-through estimator for end-to-end training and incorporate weak supervision in the training of the sentence selection module. Experimental results show that SenSeNet can consistently improve the performance of major KG models based on seq2seq framework, which demonstrate the effectiveness of capturing structural information and distinguishing the significance of sentences in KG task.
摘要:关键短语生成(KG)的任务是从给定的文档或文学作品中生成中心主题,以捕获理解其内容所必需的关键信息。科学文献等文档包含丰富的元句信息,它们体现了文档的逻辑语义结构。然而,以往的方法忽略了文档逻辑结构的约束,因此会错误地从不重要的句子中生成关键短语。为了解决这个问题,我们提出了一种称为句子选择网络(SenSeNet)的新方法,将元句归纳偏置引入KG。在SenSeNet中,我们使用直通估计器进行端到端训练,并在句子选择模块的训练中引入弱监督。实验结果表明,SenSeNet可以持续提升基于seq2seq框架的主要KG模型的性能,证明了在KG任务中捕获结构信息并区分句子重要性的有效性。
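The straight-through estimator used for discrete sentence selection can be sketched as follows (illustrative, not the SenSeNet code): the forward pass uses hard 0/1 gates while gradients flow through the underlying selection probabilities.

    # Minimal straight-through estimator for binary sentence selection.
    import torch

    def straight_through_select(logits):
        probs = torch.sigmoid(logits)
        hard = (probs > 0.5).float()
        # forward value is `hard`, backward gradient is that of `probs`
        return hard + probs - probs.detach()

    sentence_scores = torch.randn(5, requires_grad=True)   # one score per sentence
    gates = straight_through_select(sentence_scores)        # 0/1 selection mask
    loss = (gates * torch.arange(5.0)).sum()                # any downstream loss
    loss.backward()
    print(gates, sentence_scores.grad)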
51. Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging [PDF] 返回目录
Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath
Abstract: End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR). Since unique label histories correspond to distinct models states, such models are decoded using an approximate beam-search process which produces a tree of hypotheses. In this work, we study the influence of the amount of label context on the model's accuracy, and its impact on the efficiency of the decoding process. We find that we can limit the context of the recurrent neural network transducer (RNN-T) during training to just four previous word-piece labels, without degrading word error rate (WER) relative to the full-context baseline. Limiting context also provides opportunities to improve the efficiency of the beam-search process during decoding by removing redundant paths from the active beam, and instead retaining them in the final lattice. This path-merging scheme can also be applied when decoding the baseline full-context model through an approximation. Overall, we find that the proposed path-merging scheme is extremely effective allowing us to improve oracle WERs by up to 36% over the baseline, while simultaneously reducing the number of model evaluations by up to 5.3% without any degradation in WER.
摘要:将输出标签序列置于所有先前预测的标签上的端到端模型已经成为自动语音识别(ASR)常规系统的流行替代方案。由于唯一的标签历史记录对应于不同的模型状态,因此使用近似波束搜索过程对此类模型进行解码,从而生成假设树。在这项工作中,我们研究了标签上下文量对模型准确性的影响及其对解码过程效率的影响。我们发现,在训练过程中,我们可以将递归神经网络换能器(RNN-T)的上下文限制为仅四个先前的词片标签,而不会降低相对于整个上下文基线的词错误率(WER)。限制上下文还提供了机会,可以通过从活动波束中删除多余路径,并将它们保留在最终晶格中来提高解码期间波束搜索过程的效率。当通过近似对基准全文本模型进行解码时,也可以应用此路径合并方案。总体而言,我们发现拟议的路径合并方案非常有效,使我们可以将oracle WER较基准提高多达36%,同时将模型评估的数量减少多达5.3%,而不会降低WER。
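A hedged sketch of the path-merging idea: once the label context is limited to the last four word-pieces, beam hypotheses that agree on those labels are indistinguishable to the model and can be merged. The merge rule below (log-sum-exp of scores, keeping the best-scoring path as representative) and the data layout are assumptions, not the paper's exact procedure.

```python
import math
from collections import defaultdict

def merge_paths(hypotheses, context_size=4):
    """Merge beam hypotheses whose last `context_size` labels coincide.

    `hypotheses` is a list of (label_sequence, log_prob) pairs. With a
    limited-context model such hypotheses lead to the same model state,
    so only one representative needs to stay in the active beam; here the
    scores are combined with log-sum-exp (an assumption; a max rule would
    also be possible).
    """
    buckets = defaultdict(list)
    for labels, logp in hypotheses:
        key = tuple(labels[-context_size:])
        buckets[key].append((labels, logp))

    merged = []
    for group in buckets.values():
        best_labels, _ = max(group, key=lambda x: x[1])          # representative path
        total = math.log(sum(math.exp(lp) for _, lp in group))   # combined score
        merged.append((best_labels, total))
    return merged

beam = [([3, 7, 7, 2, 9], -4.1), ([5, 7, 7, 2, 9], -4.6), ([3, 7, 1, 2, 9], -5.0)]
print(merge_paths(beam, context_size=4))  # the first two share the last 4 labels
```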
52. Mapping the Timescale Organization of Neural Language Models [PDF] 返回目录
Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher J. Honey
Abstract: In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the "processing timescales" of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies, and revealed a new cluster of previously unreported long-timescale units. Next, we explored the functional role of units by examining the relationship between their processing timescales and network connectivity. We identified two classes of long-timescale units: "Controller" units composed a densely interconnected subnetwork and strongly projected to the forget and input gates of the rest of the network, while "Integrator" units showed the longest timescales in the network, and expressed projection profiles closer to the mean projection profile. Ablating integrator and controller units affected model performance at different positions of a sentence, suggesting distinctive functions of these two sets of units. Finally, we tested the generalization of these results to a character-level LSTM model. In summary, we demonstrated a model-free technique for mapping the timescale organization in neural network models, and we applied this method to reveal the timescale and functional organization of LSTM language models.
摘要:在人脑中,语言输入序列是在分布式和层次结构中处理的,其中较高的处理阶段会在较长的时间范围内编码上下文信息。相反,在执行自然语言处理的递归神经网络中,我们对上下文信息的多个时间尺度是如何组织功能的了解甚少。因此,我们应用了神经科学领域开发的工具来在单词级LSTM语言模型内映射各个单元的“处理时间尺度”。这种时标映射方法将长时标分配给先前发现的用于跟踪远程句法相关性的单元,并揭示了一个以前未报告的长时标单元的新簇。接下来,我们通过检查单元的处理时间尺度和网络连接性之间的关系,探索了单元的功能作用。我们确定了两类长时标单元:“控制器”单元组成了紧密互连的子网,并强烈投射到网络其余部分的“忘记”门和输入门,而“积分器”单元则显示了网络中最长的时标,并表示投影轮廓更接近平均投影轮廓。消除积分器和控制器单元会影响句子在不同位置处的模型性能,这表明这两套单元的独特功能。最后,我们测试了将这些结果推广到字符级LSTM模型的方法。总而言之,我们展示了一种用于在神经网络模型中映射时间尺度组织的无模型技术,并且我们将该方法应用于揭示LSTM语言模型的时间尺度和功能组织。
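The abstract does not spell out the mapping procedure, so the following is only one plausible operationalization of a neuroscience-style timescale measure: fit an exponential decay to each unit's activation autocorrelation and report the decay constant as that unit's timescale. The NumPy sketch and the synthetic activations are assumptions for illustration.

```python
import numpy as np

def unit_timescales(activations, max_lag=50):
    """Estimate a processing timescale per hidden unit.

    `activations`: array of shape (time_steps, num_units), e.g. LSTM hidden
    states recorded while the model reads a long text. For each unit we
    compute the autocorrelation over lags 1..max_lag and fit an exponential
    decay exp(-lag / tau); tau is reported as the unit's timescale. This is
    one possible operationalization, not necessarily the paper's exact one.
    """
    T, U = activations.shape
    centered = activations - activations.mean(axis=0)
    var = (centered ** 2).mean(axis=0) + 1e-8
    taus = np.empty(U)
    lags = np.arange(1, max_lag + 1)
    for u in range(U):
        ac = np.array([(centered[:-k, u] * centered[k:, u]).mean() / var[u]
                       for k in lags])
        ac = np.clip(ac, 1e-3, None)  # keep the log well defined
        # Linear fit of log-autocorrelation against lag: slope = -1/tau.
        slope, _ = np.polyfit(lags, np.log(ac), 1)
        taus[u] = -1.0 / slope if slope < 0 else np.inf
    return taus

acts = np.random.randn(2000, 8).cumsum(axis=0) * 0.01 + np.random.randn(2000, 8)
print(unit_timescales(acts))
```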
53. Yelp Review Rating Prediction: Machine Learning and Deep Learning Models [PDF] 返回目录
Zefang Liu
Abstract: We predict restaurant ratings from Yelp reviews based on the Yelp Open Dataset. The data distribution is presented, and one balanced training dataset is built. Two vectorizers are experimented with for feature engineering. Four machine learning models including Naive Bayes, Logistic Regression, Random Forest, and Linear Support Vector Machine are implemented. Four transformer-based models, namely BERT, DistilBERT, RoBERTa, and XLNet, are also applied. Accuracy, weighted F1 score, and confusion matrix are used for model evaluation. XLNet achieves 70% accuracy on 5-star classification, compared with 64% for Logistic Regression.
摘要:我们根据Yelp开放数据集从Yelp点评中预测餐厅的评分。 提出了数据分布,并建立了一个平衡的训练数据集。 实验了两个矢量化器进行特征工程。 实现了四种机器学习模型,包括朴素贝叶斯,逻辑回归,随机森林和线性支持向量机。 还应用了四个基于变压器的模型,其中包括BERT,DistilBERT,RoBERTa和XLNet。 准确性,加权F1分数和混淆矩阵用于模型评估。 与Logistic回归的64%准确度相比,XLNet的五星级分类准确率达到70%。
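As a rough illustration of one classical baseline named above (a vectorizer feeding logistic regression), here is a scikit-learn sketch; the toy reviews, n-gram range, and other hyperparameters are placeholders rather than the paper's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Toy stand-ins for (review text, star rating) pairs from the Yelp Open Dataset.
train_texts = ["great food and service", "terrible, never again",
               "okay but overpriced", "absolutely loved it", "awful experience"]
train_stars = [5, 1, 3, 5, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # one possible vectorizer choice
    LogisticRegression(max_iter=1000),              # multinomial logistic regression
)
clf.fit(train_texts, train_stars)

test_texts = ["loved the food", "overpriced and awful"]
pred = clf.predict(test_texts)
print(pred, f1_score([5, 1], pred, average="weighted"))
```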
54. TF-CR: Weighting Embeddings for Text Classification [PDF] 返回目录
Arkaitz Zubiaga
Abstract: Text classification, the task of assigning categories to textual instances, is a very common task in information science. Methods learning distributed representations of words, such as word embeddings, have become popular in recent years as the features to use for text classification tasks. Despite the increasing use of word embeddings for text classification, these are generally used in an unsupervised manner, i.e. information derived from class labels in the training data is not exploited. While word embeddings inherently capture the distributional characteristics of words, and contexts observed around them in a large dataset, they are not optimised to consider the distributions of words across categories in the classification dataset at hand. To optimise text representations based on word embeddings by incorporating class distributions in the training data, we propose the use of weighting schemes that assign a weight to embeddings of each word based on its saliency in each class. To achieve this, we introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings. Our experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes, with a performance gap that increases as the size of the training data grows.
摘要:文本分类是将类别分配给文本实例的任务,是信息科学中非常常见的任务。近年来,学习单词的分布式表示形式(例如单词嵌入)的方法作为用于文本分类任务的功能而变得流行。尽管越来越多地使用词嵌入来进行文本分类,但是通常以无监督的方式使用这些词嵌入,即,不会利用从训练数据中的类标签得出的信息。虽然词嵌入固有地捕获了词的分布特征以及在大型数据集中观察到的上下文,但它们并未经过优化以考虑手头分类数据集中各个类别中词的分布。为了通过在训练数据中合并类别分布来基于单词嵌入优化文本表示,我们建议使用加权方案,该加权方案根据每个单词在每个类别中的显着性为每个单词的嵌入分配权重。为了实现这一目标,我们引入了一种新颖的加权方案,术语频数比(TF-CR),在计算单词嵌入时,它可以对较高频率的类别专有词加权。我们在16个分类数据集上的实验表明TF-CR的有效性,与现有的加权方案相比,可以改善性能得分,并且随着训练数据量的增加,性能差距也会增加。
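A hedged sketch of a TF-CR-style weight: multiply a word's frequency within a class by the share of the word's occurrences that fall in that class, then use the weights to average word embeddings. The exact normalization in the paper may differ; the toy corpus and random embeddings are placeholders.

```python
import numpy as np
from collections import Counter, defaultdict

def tf_cr_weights(docs, labels):
    """Compute a TF-CR-style weight per (class, word).

    TF(w, c): frequency of w among all tokens of class c.
    CR(w, c): share of w's total occurrences that appear in class c.
    Weight(w, c) = TF(w, c) * CR(w, c) -- high-frequency words that are
    (nearly) exclusive to a class get the largest weights. The paper's
    exact normalization may differ from this sketch.
    """
    class_counts, total_counts, class_tokens = defaultdict(Counter), Counter(), Counter()
    for doc, y in zip(docs, labels):
        toks = doc.lower().split()
        class_counts[y].update(toks)
        total_counts.update(toks)
        class_tokens[y] += len(toks)

    weights = {}
    for c, counts in class_counts.items():
        for w, n in counts.items():
            weights[(c, w)] = (n / class_tokens[c]) * (n / total_counts[w])
    return weights

def weighted_doc_embedding(doc, c, weights, emb):
    """Average word embeddings, scaled by the TF-CR weight for class c."""
    toks = [t for t in doc.lower().split() if t in emb]
    if not toks:
        return np.zeros(next(iter(emb.values())).shape)
    ws = np.array([weights.get((c, t), 0.0) for t in toks])
    vecs = np.stack([emb[t] for t in toks])
    return (ws[:, None] * vecs).sum(0) / (ws.sum() + 1e-8)

docs = ["cheap flights to rome", "goal scored in extra time", "flights delayed again"]
labels = ["travel", "sport", "travel"]
emb = {w: np.random.randn(4) for d in docs for w in d.split()}
w = tf_cr_weights(docs, labels)
print(weighted_doc_embedding(docs[0], "travel", w, emb))
```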
55. Extracting Training Data from Large Language Models [PDF] 返回目录
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences is included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
摘要:发布大型(十亿参数)语言模型已在私有数据集上进行训练,已变得普遍。本文证明,在这种情况下,对手可以执行训练数据提取攻击,以通过查询语言模型来恢复单个训练示例。我们演示了对GPT-2的攻击,GPT-2是在公共Internet上进行拼版训练的一种语言模型,并且能够从该模型的训练数据中提取数百个逐字记录文本序列。这些提取的示例包括(公共)个人身份信息(名称,电话号码和电子邮件地址),IRC对话,代码和128位UUID。即使以上每个序列仅包含在训练数据中的一个文档中,我们的攻击也是可能的。我们全面评估提取攻击,以了解促成其成功的因素。例如,我们发现较大的模型比较小的模型更容易受到攻击。最后,我们通过吸取教训并讨论训练大型语言模型的可能保障措施。
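The general recipe behind such an attack can be sketched as: sample many generations from the model and rank them by likelihood, since unusually likely outputs are candidates for memorized training text. The Hugging Face calls below are an assumption about tooling, and the paper's full pipeline additionally uses reference models and other membership signals not shown here.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sample_candidates(n=20, length=64):
    """Sample unconditioned generations from the model."""
    ids = torch.full((n, 1), tok.bos_token_id, dtype=torch.long)
    with torch.no_grad():
        out = model.generate(ids, do_sample=True, top_k=40,
                             max_length=length, pad_token_id=tok.eos_token_id)
    return [tok.decode(o, skip_special_tokens=True) for o in out]

def perplexity(text):
    """Per-token perplexity under the model; low values flag possible memorization."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

candidates = sample_candidates()
for ppl, text in sorted((perplexity(t), t) for t in candidates)[:5]:
    print(f"{ppl:7.2f}  {text[:60]!r}")
```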
56. Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes [PDF] 返回目录
Niklas Muennighoff
Abstract: This work presents Vilio, an implementation of state-of-the-art visio-linguistic models and their application to the Hateful Memes Dataset. The implemented models have been fitted into a uniform code-base and altered to yield better performance. The goal of Vilio is to provide a user-friendly starting point for any visio-linguistic problem. An ensemble of 5 different V+L models implemented in Vilio achieves 2nd place in the Hateful Memes Challenge out of 3,300 participants. The code is available at this https URL.
摘要:这项工作介绍了Vilio,这是一种最新的视觉语言模型的实现及其在仇恨模因数据集中的应用。 已实现的模型已安装到统一的代码库中,并进行了更改以产生更好的性能。 Vilio的目标是为任何视觉语言问题提供用户友好的起点。 在Vilio中实现的5种不同的V + L模型集成在3300名参与者的“仇恨模因挑战赛”中排名第二。 该代码可从以下https URL获得。
57. Towards Fairness in Classifying Medical Conversations into SOAP Sections [PDF] 返回目录
Elisa Ferracane, Sandeep Konam
Abstract: As machine learning algorithms are more widely deployed in healthcare, the question of algorithmic fairness becomes more critical to examine. Our work seeks to identify and understand disparities in a deployed model that classifies doctor-patient conversations into sections of a medical SOAP note. We employ several metrics to measure disparities in the classifier performance, and find small differences in a portion of the disadvantaged groups. A deeper analysis of the language in these conversations and further stratifying the groups suggests these differences are related to and often attributable to the type of medical appointment (e.g., psychiatric vs. internist). Our findings stress the importance of understanding the disparities that may exist in the data itself and how that affects a model's ability to equally distribute benefits.
摘要:随着机器学习算法在医疗保健中的应用越来越广泛,算法公平性的问题变得越来越重要。 我们的工作旨在识别和理解已部署模型中的差异,该模型将医患对话分为医疗SOAP注释的各个部分。 我们采用多种指标来衡量分类器性能上的差异,并在部分弱势群体中发现较小的差异。 在这些对话中对语言进行更深入的分析并进一步对人群进行分层表明,这些差异与医疗预约的类型(例如,精神病医生和内科医生)有关,并且通常归因于这种情况。 我们的发现强调了理解数据本身中可能存在的差异以及如何影响模型平均分配收益的能力的重要性。
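A hedged sketch of the kind of per-group disparity measurement described above, using scikit-learn; the SOAP section labels, group names, and predictions are placeholders.

```python
import numpy as np
from sklearn.metrics import f1_score

def group_disparities(y_true, y_pred, groups):
    """Per-group weighted F1 and the gap to the best-performing group.

    `groups` holds one group label per example (placeholder names below,
    e.g. appointment type); large gaps indicate the classifier does not
    distribute its benefit equally across groups.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    scores = {g: f1_score(y_true[groups == g], y_pred[groups == g],
                          average="weighted")
              for g in np.unique(groups)}
    best = max(scores.values())
    gaps = {g: best - s for g, s in scores.items()}
    return scores, gaps

y_true = ["S", "O", "A", "P", "S", "O", "A", "P"]
y_pred = ["S", "O", "A", "A", "S", "S", "A", "P"]
groups = ["psychiatric", "psychiatric", "psychiatric", "psychiatric",
          "internist", "internist", "internist", "internist"]
print(group_disparities(y_true, y_pred, groups))
```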
58. Argument Mining Driven Analysis of Peer-Reviews [PDF] 返回目录
Michael Fromm, Evgeniy Faerman, Max Berrendorf, Siddharth Bhargava, Ruoxia Qi, Yao Zhang, Lukas Dennert, Sophia Selle, Yang Mao, Thomas Seidl
Abstract: Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision. The process remains interpretable since the extracted arguments can be highlighted in a review without detaching them from their context.
摘要:同行评审是现代研究的核心过程,对于确保已发表作品的高质量和可靠性至关重要。同时,这是一个耗时的过程,并且对新兴领域的兴趣不断增加,通常会导致大量的审查工作,尤其是对于该领域的高级研究人员而言。如何解决这个问题是一个悬而未决的问题,并且在所有主要会议上都进行了生动地讨论。在这项工作中,我们提出了一种基于论元挖掘的方法,以帮助编辑,元审阅者和审阅者。我们证明了科学出版物领域的决策过程是由论据驱动的,并且自动论据识别在各种用例中都是有帮助的。我们的发现之一是,同行评审过程中使用的论点与其他领域的论点不同,这使得预训练模型的传递变得困难。因此,我们为社区提供了来自不同计算机科学会议的新的同行评议数据集,带有注释的论点。在我们广泛的实证评估中,我们表明可以使用Argument Mining从评论中高效提取最相关的部分,这对于发布决策至关重要。该过程保持可解释性,因为可以在评论中突出显示提取的参数,而无需将其与上下文分离。
59. Agglomerative Clustering of Handwritten Numerals to Determine Similarity of Different Languages [PDF] 返回目录
Md. Rahat-uz-Zaman, Shadmaan Hye
Abstract: Handwritten numerals of different languages have various characteristics. Similarities and dissimilarities of the languages can be measured by analyzing the extracted features of the numerals. Handwritten numeral datasets are available and accessible for many renowned languages of different regions. In this paper, several handwritten numeral datasets of different languages are collected. They are then used to find the similarity among those written languages by determining and comparing the similitude of the handwritten numerals. This helps to find which languages have the same or adjacent parent language. Firstly, a similarity measure of two numeral images is constructed with a Siamese network. Secondly, the similarity of the numeral datasets is determined with the help of the Siamese network and a new random-sampling-with-replacement similarity-averaging technique. Finally, agglomerative clustering is done based on the similarities of each dataset. This clustering technique shows some very interesting properties of the datasets. The property focused on in this paper is the regional resemblance of the datasets. By analyzing the clusters, it becomes easy to identify which languages originated from similar regions.
摘要:不同语言的手写数字具有不同的特征。可以通过分析数字的提取特征来衡量语言的相似性和相异性。手写数字数据集可用于不同地区的许多著名语言。本文收集了几种不同语言的手写数字数据集。然后通过确定和比较每个手写数字的相似性,将它们用于查找那些书面语言之间的相似性。这将有助于查找哪些语言具有相同或相邻的父语言。首先,用连体网络构造两个数字图像的相似性度量。其次,借助暹罗网络和具有替换相似度平均技术的新随机样本,确定数字数据集的相似度。最后,基于每个数据集的相似性进行聚集聚类。这种聚类技术显示了数据集的一些非常有趣的属性。本文关注的属性是数据集的区域相似性。通过分析聚类,可以轻松识别出哪些语言源自相似区域。
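Given a pairwise similarity matrix between language-specific numeral datasets (for example, averaged Siamese-network similarities), the agglomerative step can be run with SciPy as sketched below; the similarity values and language names are placeholders, and average linkage is an assumed choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

languages = ["Bangla", "Devanagari", "Arabic", "Latin"]

# Placeholder: average Siamese similarity between numeral datasets (symmetric, 1 on diagonal).
sim = np.array([[1.00, 0.82, 0.40, 0.35],
                [0.82, 1.00, 0.42, 0.33],
                [0.40, 0.42, 1.00, 0.55],
                [0.35, 0.33, 0.55, 1.00]])

dist = 1.0 - sim                       # convert similarity to a distance
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

Z = linkage(condensed, method="average")    # agglomerative clustering
clusters = fcluster(Z, t=2, criterion="maxclust")
for lang, c in zip(languages, clusters):
    print(lang, "-> cluster", c)
```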
60. Fork or Fail: Cycle-Consistent Training with Many-to-One Mappings [PDF] 返回目录
Qipeng Guo, Zhijing Jin, Ziyu Wang, Xipeng Qiu, Weinan Zhang, Jun Zhu, Zheng Zhang, David Wipf
Abstract: Cycle-consistent training is widely used for jointly learning a forward and inverse mapping between two domains of interest without the cumbersome requirement of collecting matched pairs within each domain. In this regard, the implicit assumption is that there exists (at least approximately) a ground-truth bijection such that a given input from either domain can be accurately reconstructed from successive application of the respective mappings. But in many applications no such bijection can be expected to exist and large reconstruction errors can compromise the success of cycle-consistent training. As one important instance of this limitation, we consider practically-relevant situations where there exists a many-to-one or surjective mapping between domains. To address this regime, we develop a conditional variational autoencoder (CVAE) approach that can be viewed as converting surjective mappings to implicit bijections whereby reconstruction errors in both directions can be minimized, and as a natural byproduct, realistic output diversity can be obtained in the one-to-many direction. As theoretical motivation, we analyze a simplified scenario whereby minima of the proposed CVAE-based energy function align with the recovery of ground-truth surjective mappings. On the empirical side, we consider a synthetic image dataset with known ground-truth, as well as a real-world application involving natural language generation from knowledge graphs and vice versa, a prototypical surjective case. For the latter, our CVAE pipeline can capture such many-to-one mappings during cycle training while promoting textural diversity for graph-to-text tasks. Our code is available at this http URL
摘要:周期一致的训练广泛用于共同学习两个感兴趣的域之间的正向和反向映射,而无需在每个域内收集匹配对的麻烦。在这方面,隐含的假设是存在(至少近似)地面真相双射,从而可以根据各个映射的连续应用来准确地重构来自任一域的给定输入。但是在许多应用中,预计不会出现这样的双射,并且较大的重构错误可能会损害周期一致训练的成功。作为此限制的一个重要实例,我们考虑了在域之间存在多对一或排斥性映射的与实际相关的情况。为了解决这个问题,我们开发了一种条件变分自动编码器(CVAE)方法,该方法可以看作是将射影映射转换为隐含双射,从而可以最大程度地减小两个方向的重构误差,并且作为自然副产品,可以在输出中获得现实的输出多样性。一对多的方向。作为理论动机,我们分析了一种简化的方案,其中,基于CVAE的拟议能量函数的最小值与地面真相射影映射的恢复保持一致。在经验方面,我们考虑具有已知真实性的合成图像数据集,以及涉及从知识图生成自然语言的真实世界应用程序,反之亦然,这是一个典型的排斥性案例。对于后者,我们的CVAE管道可以在循环训练期间捕获此类多对一映射,同时促进图形到文本任务的纹理多样性。我们的代码可从以下http URL获得
61. Classification of ALS patients based on acoustic analysis of sustained vowel phonations [PDF] 返回目录
Maxim Vashkevich, Yulia Rushkevich
Abstract: Amyotrophic lateral sclerosis (ALS) is an incurable neurological disorder with a rapidly progressive course. Common early symptoms of ALS are difficulty in swallowing and speech. However, the early acoustic manifestation of speech and voice symptoms is very variable, which makes their detection very challenging, both by human specialists and automatic systems. This study presents an approach to voice assessment for an automatic system that separates healthy people from patients with ALS. In particular, this work focuses on analysing sustained phonation of the vowels /a/ and /i/ to perform automatic classification of ALS patients. A wide range of acoustic features such as MFCC, formants, jitter, shimmer, vibrato, PPE, GNE, HNR, etc. were analysed. We also propose a new set of acoustic features for characterizing the harmonic structure of the vowels. Calculation of these features is based on pitch-synchronized voice analysis. A linear discriminant analysis (LDA) was used to classify the phonation produced by patients with ALS and that by healthy individuals. Several feature selection algorithms were tested to find the optimal feature subset for the LDA model. The study's experiments show that the most successful LDA model, based on 32 features picked out by the LASSO feature selection algorithm, attains 99.7% accuracy with 99.3% sensitivity and 99.9% specificity. Among the classifiers with a small number of features, we can highlight the LDA model with 5 features, which has 89.0% accuracy (87.5% sensitivity and 90.4% specificity).
摘要:肌萎缩性侧索硬化症(ALS)是一种无法治愈的神经系统疾病,病程进展迅速。 ALS的常见早期症状是吞咽和说话困难。但是,语音和语音症状的早期声学表现变化很大,无论是专业人员还是自动系统,都很难对其进行检测。这项研究提出了一种自动系统的语音评估方法,该方法可将健康人与ALS患者区分开。特别地,这项工作专注于分析元音/ a /和/ i /的持续发声,以对ALS患者进行自动分类。分析了多种声学特征,例如MFCC,共振峰,抖动,微光,颤音,PPE,GNE,HNR等。我们还提出了一套新的声学特征,用于表征元音的谐波结构。这些特征的计算基于音高同步语音分析。使用线性判别分析(LDA)对ALS患者和健康人产生的发声进行分类。测试了几种特征选择算法,以找到LDA模型的最佳特征子集。该研究的实验表明,最成功的基于LASSO特征选择算法挑选出的32个特征的LDA模型可获得99.7%的准确度,99.3%的灵敏度和99.9%的特异性。在具有少量特征的分类器中,我们可以突出显示具有5个特征的LDA模型,该模型具有89.0%的准确性(87.5%的灵敏度和90.4%的特异性)。
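A hedged sketch of the feature-selection-plus-LDA pipeline with scikit-learn: an L1-penalized logistic regression stands in for the paper's LASSO selector, capped at 32 features to mirror the best-performing subset size, and the acoustic feature matrix and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Placeholder acoustic feature matrix: rows = phonation recordings,
# columns = jitter, shimmer, HNR, vibrato, formants, MFCC statistics, ...
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 60))
y = rng.integers(0, 2, size=80)   # 0 = healthy, 1 = ALS (synthetic labels)

pipe = make_pipeline(
    StandardScaler(),
    # L1-penalized selector as a stand-in for LASSO feature selection,
    # keeping the 32 highest-weight features.
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
                    max_features=32, threshold=-np.inf),
    LinearDiscriminantAnalysis(),
)
print(cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean())
```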
62. Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval [PDF] 返回目录
Yuma Koizumi, Yasunori Ohishi, Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda
Abstract: The goal of audio captioning is to translate input audio into its description using natural language. One of the problems in audio captioning is the lack of training data due to the difficulty in collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since an audio input cannot be directly inputted into such a language model, we utilize guidance captions retrieved from a training dataset based on similarities that may exist in different audio. Then, the caption of the audio input is generated by using a pre-trained language model while referring to the guidance captions. Experimental results show that (i) the proposed method has succeeded to use a pre-trained language model for audio captioning, and (ii) the oracle performance of the pre-trained model-based caption generator was clearly better than that of the conventional method trained from scratch.
摘要:音频字幕的目标是使用自然语言将输入音频转换成其描述。 音频字幕的问题之一是由于难以通过爬网收集音频字幕对而导致缺乏训练数据。 在这项研究中,为了克服这个问题,我们建议使用预先训练的大规模语言模型。 由于无法将音频输入直接输入到这种语言模型中,因此我们基于可能存在于不同音频中的相似性,利用从训练数据集中检索的指导字幕。 然后,在参考指导字幕的同时,通过使用预训练的语言模型来生成音频输入的字幕。 实验结果表明,(i)所提出的方法已经成功地将预训练的语言模型用于音频字幕,并且(ii)基于预训练的基于模型的字幕发生器的预言性能明显优于传统方法。 从头开始训练。
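The retrieval step can be sketched as a cosine-similarity lookup over audio embeddings, as below; the embedding vectors and captions are placeholders, and the actual audio encoder used in the paper is not shown here.

```python
import numpy as np

def retrieve_guidance_captions(query_emb, train_embs, train_captions, k=3):
    """Return the k captions whose training-audio embeddings are most similar
    (cosine) to the query audio embedding. These captions would then be fed
    to the pre-trained language model as guidance."""
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    T = train_embs / (np.linalg.norm(train_embs, axis=1, keepdims=True) + 1e-8)
    sims = T @ q
    top = np.argsort(-sims)[:k]
    return [(train_captions[i], float(sims[i])) for i in top]

rng = np.random.default_rng(1)
train_embs = rng.normal(size=(5, 16))               # placeholder audio embeddings
train_captions = ["a dog barks twice", "rain on a tin roof", "a car engine starts",
                  "people applaud in a hall", "birds chirp in a garden"]
query = train_embs[1] + 0.05 * rng.normal(size=16)  # query close to caption 1
print(retrieve_guidance_captions(query, train_embs, train_captions))
```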
63. Learning Contextual Causality from Time-consecutive Images [PDF] 返回目录
Hongming Zhang, Yintong Huo, Xinran Zhao, Yangqiu Song, Dan Roth
Abstract: Causality knowledge is crucial for many artificial intelligence systems. Conventional textual-based causality knowledge acquisition methods typically require laborious and expensive human annotations. As a result, their scale is often limited. Moreover, as no context is provided during the annotation, the resulting causality knowledge records (e.g., ConceptNet) typically do not take the context into consideration. To explore a more scalable way of acquiring causality knowledge, in this paper, we jump out of the textual domain and investigate the possibility of learning contextual causality from the visual signal. Compared with pure text-based approaches, learning causality from the visual signal has the following advantages: (1) Causality knowledge belongs to the commonsense knowledge, which is rarely expressed in the text but rich in videos; (2) Most events in the video are naturally time-ordered, which provides a rich resource for us to mine causality knowledge from; (3) All the objects in the video can be used as context to study the contextual property of causal relations. In detail, we first propose a high-quality dataset, Vis-Causal, and then conduct experiments to demonstrate that with good language and visual representation models as well as enough training signals, it is possible to automatically discover meaningful causal knowledge from the videos. Further analysis also shows that the contextual property of causal relations indeed exists; taking it into consideration may be crucial if we want to use causality knowledge in real applications, and the visual signal could serve as a good resource for learning such contextual causality.
摘要:因果关系知识对于许多人工智能系统至关重要。传统的基于文本的因果关系知识获取方法通常需要费力且昂贵的人工注释。结果,它们的规模常常受到限制。而且,由于在注释期间没有提供上下文,因此所得到的因果关系知识记录(例如,ConceptNet)通常不考虑上下文。为了探索一种更可扩展的获取因果关系知识的方法,在本文中,我们跳出了文本领域,并研究了从视觉信号中学习上下文因果关系的可能性。与基于纯文本的方法相比,从视觉信号中学习因果关系具有以下优点:(1)因果关系知识属于常识知识,常识知识很少在文本中表达,而在视频中却很丰富; (2)视频中的大多数事件自然都是按时间顺序排列的,这为我们提供了丰富的资源来挖掘因果关系知识; (3)视频中的所有对象都可以用作上下文来研究因果关系的上下文属性。详细地,我们首先提出一个高质量的因果数据集,然后进行实验以证明,良好的语言和视觉表示模型以及足够的训练信号可以从视频中自动发现有意义的因果知识。进一步的分析还表明,因果关系的上下文属性确实存在,如果要在实际应用中使用因果关系知识,将其考虑可能至关重要,并且可视信号可以作为学习此类因果关系的良好资源。
64. KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning [PDF] 返回目录
Dandan Song, Siyi Ma, Zhanchen Sun, Sicheng Yang, Lejian Liao
Abstract: Reasoning is a critical ability towards complete visual understanding. To develop machines with cognition-level visual understanding and reasoning abilities, the visual commonsense reasoning (VCR) task has been introduced. In VCR, given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. The methods adopting the powerful BERT model as the backbone for learning joint representation of image content and natural language have shown promising improvements on VCR. However, none of the existing methods have utilized commonsense knowledge in visual commonsense reasoning, which we believe will be greatly helpful in this task. With the support of commonsense knowledge, complex questions can be answered with cognitive reasoning even if the required information is not depicted in the image. Therefore, we incorporate commonsense knowledge into the cross-modal BERT, and propose a novel Knowledge Enhanced Visual-and-Linguistic BERT (KVL-BERT for short) model. Besides taking visual and linguistic contents as input, external commonsense knowledge extracted from ConceptNet is integrated into the multi-layer Transformer. In order to preserve the structural information and semantic representation of the original sentence, we propose using relative position embedding and mask-self-attention to weaken the effect between the injected commonsense knowledge and other unrelated components in the input sequence. Our KVL-BERT outperforms other task-specific models and general task-agnostic pre-training models by a large margin.
摘要:推理是获得完整视觉理解的关键能力。为了开发具有认知级视觉理解和推理能力的机器,引入了视觉常识推理(VCR)任务。在VCR中,给定有关图像的具有挑战性的问题,机器必须正确回答,然后提供合理的理由来证明其回答。采用强大的BERT模型作为学习图像内容和自然语言联合表示的主干的方法在VCR上已显示出可喜的改进。但是,现有的方法都没有在视觉常识推理中利用常识知识,我们认为这在此任务中将大有帮助。在常识性知识的支持下,即使未在图像中描述所需的信息,也可以通过认知推理来回答复杂的问题。因此,我们将常识知识纳入了跨模式BERT,并提出了一种新颖的知识增强型视觉与语言BERT(简称KVL-BERT)模型。除了将视觉和语言内容作为输入之外,还将从ConceptNet提取的外部常识知识集成到多层Transformer中。为了保留原始句子的结构信息和语义表示,我们建议使用相对位置嵌入和掩码自我注意来减弱输入序列中注入的常识知识与其他无关成分之间的影响。与其他特定于任务的模型和与任务无关的常规训练模型相比,我们的KVL-BERT大大优于它们。
65. Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers [PDF] 返回目录
Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini, Andrea Acquaviva
Abstract: The analysis of source code through machine learning techniques is an increasingly explored research topic aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, leading to minimum energy consumption. Depending on the kernel to be executed, the energy optimal scaling configuration is not trivial. While recent work has focused on general-purpose systems to learn and predict the best execution target in terms of the execution time of a snippet of code or kernel (e.g. offload OpenCL kernel on multicore CPU or GPU), in this work we focus on static compile-time features to assess if they can be successfully used to predict the minimum energy configuration on PULP, an ultra-low-power architecture featuring an on-chip cluster of RISC-V processors. Experiments show that using machine learning models on the source code to select the best energy scaling configuration automatically is viable and has the potential to be used in the context of automatic system configuration for energy minimisation.
摘要:通过机器学习技术对源代码进行分析是一个日益探索的研究主题,旨在提高软件工具链中的智能性,从而以最佳方式利用现代体系结构。在低功耗,并行嵌入式架构的情况下,这意味着寻找配置(例如,根据内核数量),从而实现最低能耗。取决于要执行的内核,能量最佳缩放比例配置并非易事。虽然最近的工作集中在通用系统上,以根据代码段或内核(例如,多核CPU或GPU上的OpenCL内核的卸载)的执行时间来学习和预测最佳执行目标,但在这项工作中,我们专注于静态编译时功能,以评估它们是否可以成功用于预测PULP上的最低能耗配置,PULP是一种超低功耗架构,具有片上RISC-V处理器集群。实验表明,在源代码上使用机器学习模型自动选择最佳的能量缩放配置是可行的,并且有可能在自动系统配置的背景下用于最小化能量。
66. DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization [PDF] 返回目录
Shaoshi Ling, Yuzong Liu
Abstract: Recent success in speech representation learning enables a new way to leverage unlabeled data to train speech recognition models. In speech representation learning, a large amount of unlabeled data is used in a self-supervised manner to learn a feature representation. Then a smaller amount of labeled data is used to train a downstream ASR system using the new feature representations. Based on our previous work DeCoAR and inspiration from other speech representation learning approaches, we propose DeCoAR 2.0, a Deep Contextualized Acoustic Representation with vector quantization. We introduce several modifications over DeCoAR: first, we use Transformers in the encoding module instead of LSTMs; second, we introduce a vector quantization layer between the encoder and reconstruction modules; third, we propose an objective that combines the reconstructive loss with a vector quantization diversity loss to train speech representations. Our experiments show consistent improvements over other speech representations in different data-sparse scenarios. Without fine-tuning, a light-weight ASR model trained on 10 hours of LibriSpeech labeled data with DeCoAR 2.0 features outperforms the model trained on the full 960-hour dataset with filterbank features.
摘要:语音表示学习的最新成功提供了一种利用未标记数据来训练语音识别模型的新方法。在语音表示学习中,以自我监督的方式使用大量未标记的数据来学习特征表示。然后,使用新特征表示的标记数据量较少,可用于训练下游ASR系统。根据我们以前的DeCoAR工作和其他语音表示学习的启发,我们提出了DeCoAR 2.0,一种具有矢量量化的深度上下文化声学表示。我们对DeCoAR进行了一些修改:首先,我们在编码模块中使用了Transformers而不是LSTM。其次,我们在编码器和重构模块之间引入了矢量量化层。第三,我们提出了一个将重构损失与矢量量化分集损失相结合以训练语音表示的目标。我们的实验表明,在不同的数据稀疏方案中,与其他语音表示相比,它们具有持续改进。无需微调,使用DeCoAR 2.0功能在10个小时的LibriSpeech标记数据上训练的轻量级ASR模型要优于在具有滤波器组功能的整个960小时数据集上训练的模型。
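A minimal PyTorch sketch of a vector-quantization layer of the kind inserted between the encoder and reconstruction modules: nearest-codeword lookup, VQ-VAE-style codebook and commitment losses, and a straight-through gradient. The codebook size, dimensionality, and loss weighting are placeholder choices, not DeCoAR 2.0's actual settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-codeword quantization with straight-through gradients."""
    def __init__(self, num_codes=320, dim=256, commitment=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.commitment = commitment

    def forward(self, z):
        # z: (batch, time, dim) encoder outputs
        flat = z.reshape(-1, z.size(-1))
        # Squared distance to every codeword, then pick the nearest.
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)
        q = self.codebook(idx).view_as(z)
        # Codebook and commitment losses (a common VQ-VAE-style choice here).
        loss = F.mse_loss(q, z.detach()) + self.commitment * F.mse_loss(z, q.detach())
        # Straight-through: gradients bypass the discrete argmin.
        q = z + (q - z).detach()
        return q, loss, idx.view(z.shape[:-1])

vq = VectorQuantizer()
z = torch.randn(4, 50, 256)            # 4 utterances, 50 frames each
quantized, vq_loss, codes = vq(z)
print(quantized.shape, vq_loss.item(), codes.shape)
```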
注:中文为机器翻译结果!封面为论文标题词云图!