Contents
5. Detecting depression in dyadic conversations with multimodal narratives and visualizations [PDF] Abstract
Abstracts
1. Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning [PDF] Back to contents
Roei Schuster, Tal Schuster, Yoav Meri, Vitaly Shmatikov
Abstract: Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the "meaning" of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.
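To make objective (1) concrete, the sketch below ranks a word's neighbors by cosine similarity, which is the kind of ranking such an attack would try to change. It is an illustration only: the toy vectors, the function names `cosine` and `rank_neighbors`, and the use of cosine similarity as the proximity measure are assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_neighbors(embeddings: Dict[str, Sequence[float]], word: str) -> List[str]:
    """Return every other word in the vocabulary, most similar to `word` first."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    return sorted(others, key=lambda w: cosine(target, embeddings[w]), reverse=True)


# Hypothetical toy embedding (made-up vectors, not real GloVe or SGNS output).
toy = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
print(rank_neighbors(toy, "cat"))
```

In this toy setting, an attacker pursuing objective (1) would want a chosen word to move to the front of the returned list.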
2. Balancing the composition of word embeddings across heterogenous data sets [PDF] Back to contents
Stephanie Brandl, David Lassner, Maximilian Alber
Abstract: Word embeddings capture semantic relationships based on contextual information and are the basis for a wide variety of natural language processing applications. Notably, these relationships are learned solely from the data, so the data composition affects the semantics of the embeddings, which arguably can lead to biased word vectors. Given qualitatively different data subsets, we aim to align the influence of single subsets on the resulting word vectors while retaining their quality. To this end, we propose a criterion to measure the shift towards a single data subset and develop approaches to meet both objectives. We find that a weighted average of the two subset embeddings balances the influence of those subsets, but word similarity performance decreases. We further propose a promising optimization approach to balance the influence and quality of word embeddings.
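The weighted average mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: both subset embeddings are taken to live in the same vector space, only words present in both vocabularies are combined, and the names `weighted_average_embeddings` and `weight_a` are hypothetical. The paper's actual procedure may differ.

```python
from typing import Dict, List, Sequence


def weighted_average_embeddings(
    emb_a: Dict[str, Sequence[float]],
    emb_b: Dict[str, Sequence[float]],
    weight_a: float = 0.5,
) -> Dict[str, List[float]]:
    """Combine two embeddings as weight_a * emb_a + (1 - weight_a) * emb_b.

    Only words present in both vocabularies are kept (an assumption of this sketch).
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    shared = emb_a.keys() & emb_b.keys()
    return {
        word: [weight_a * x + weight_b * y for x, y in zip(emb_a[word], emb_b[word])]
        for word in shared
    }


# Hypothetical vectors standing in for embeddings trained on two data subsets.
subset_a = {"bank": [1.0, 0.0], "loan": [0.5, 0.5]}
subset_b = {"bank": [0.0, 1.0], "river": [0.2, 0.8]}
print(weighted_average_embeddings(subset_a, subset_b, weight_a=0.25))
```

Moving `weight_a` towards 0 or 1 shifts the combined vectors towards one subset, which is the kind of shift the proposed criterion is meant to measure.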
3. Bi-Decoder Augmented Network for Neural Machine Translation [PDF] Back to contents
Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai
Abstract: Neural Machine Translation (NMT) has become a popular technology in recent years, and the encoder-decoder framework is the mainstream among all the methods. The quality of the semantic representations produced by the encoder is crucial and can significantly affect the performance of the model. However, existing unidirectional source-to-target architectures can hardly produce a language-independent representation of the text because they rely heavily on the specific relations of the given language pairs. To alleviate this problem, we propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task. Besides the original decoder, which generates the target-language sequence, we add an auxiliary decoder that regenerates the source-language sequence at training time. Since each decoder transforms the representations of the input text into its corresponding language, joint training with two target ends gives the shared encoder the potential to produce a language-independent semantic space. We conduct extensive experiments on several NMT benchmark datasets, and the results demonstrate the effectiveness of our proposed approach.
4. On the Replicability of Combining Word Embeddings and Retrieval Models [PDF] Back to contents
Luca Papariello, Alexandros Bampoulidis, Mihai Lupu
Abstract: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distributions instead of Gaussian distributions would be beneficial because of the focus on cosine distances of both VMF and the vector space model traditionally used in information retrieval. Previous experiments had validated this hypothesis. Our replication was not able to validate it, despite a large parameter scan space.
5. Detecting depression in dyadic conversations with multimodal narratives and visualizations [PDF] Back to contents
Joshua Y. Kim, Greyson Y. Kim, Kalina Yacef
Abstract: Conversations contain a wide spectrum of multimodal information that gives us hints about the emotions and moods of the speaker. In this paper, we developed a system that supports humans in analyzing conversations. Our main contribution is the identification of appropriate multimodal features and the integration of such features into verbatim conversation transcripts. We demonstrate the ability of our system to take in a wide range of multimodal information and automatically generate a prediction score for the depression state of the individual. Our experiments showed that this approach yielded better performance than the baseline model. Furthermore, the multimodal narrative approach makes it easy to integrate learnings from other disciplines, such as conversational analysis and psychology. Lastly, this interdisciplinary and automated approach is a step towards emulating how practitioners record the course of treatment as well as emulating how conversational analysts have been analyzing conversations by hand.
6. A (Simplified) Supreme Being Necessarily Exists -- Says the Computer! [PDF] Back to contents
Christoph Benzmüller
Abstract: A simplified variant of Kurt Gödel's modal ontological argument is presented. Some of Gödel's, resp. Scott's, premises are modified, others are dropped, and modal collapse is avoided. The emended argument is shown to be valid already in quantified modal logic K. The presented simplifications have been computationally explored utilising the latest knowledge representation and reasoning technology based on higher-order logic. The paper thus illustrates how modern symbolic AI technology can contribute new knowledge to formal philosophy and theology.
7. Improved Robust ASR for Social Robots in Public Spaces [PDF] Back to contents
Charles Jankowski, Vishwas Mruthyunjaya, Ruixi Lin
Abstract: Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise at SNRs of 20 to 5 dB. Existing ASR models perform well for higher SNRs in this range, but degrade considerably with more noise. This work explores methods for providing improved ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance with SNR lower than 20 dB, demonstrating the feasibility of achieving relatively high-performing ASR with open-source toolkits and hundreds of hours of training data, which is commonly available.
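For readers unfamiliar with the SNR figures quoted above, the sketch below shows one common way to mix noise into a clean signal at a chosen SNR in dB. It is not the authors' pipeline and does not use Kaldi. The function names `mean_power`, `measured_snr_db`, and `mix_at_snr` and the synthetic signals are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def measured_snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB, defined as 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Scale `noise` so that speech + noise has the requested SNR, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve P_speech / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for a speech waveform and a noise recording.
speech = [math.sin(0.05 * t) for t in range(2000)]
noise = [0.3 if t % 2 == 0 else -0.3 for t in range(2000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
added = [m - s for m, s in zip(noisy, speech)]
print(round(measured_snr_db(speech, added), 3))
```

Lower `snr_db` values mean proportionally louder noise, so 5 dB is the harder end of the 20 to 5 dB range mentioned in the abstract.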
8. Faster Transformer Decoding: N-gram Masked Self-Attention [PDF] Back to contents
Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer
Abstract: Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption. Experiments on WMT EnDe and EnFr data sets show that the $N$-gram masked self-attention model loses very little in BLEU score for $N$ values in the range $4, \ldots, 8$, depending on the task.
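The truncated target-side window can be pictured as a banded causal attention mask. The sketch below builds such a mask under one assumed convention: each target position may attend to itself and to the $N-1$ positions immediately before it. The abstract does not spell out the exact window convention, so the function name `ngram_causal_mask` and the off-by-one choice are assumptions.

```python
from typing import List


def ngram_causal_mask(length: int, n: int) -> List[List[bool]]:
    """Build a banded causal mask.

    mask[i][j] is True when target position i may attend to target position j,
    i.e. when j is one of the n most recent positions up to and including i.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [[i - n < j <= i for j in range(length)] for i in range(length)]


# Print the mask for a 6-token target sequence with a window of 4.
for row in ngram_causal_mask(length=6, n=4):
    print("".join("x" if allowed else "." for allowed in row))
```

With `n` at least as large as the sequence length, the mask reduces to the ordinary causal (lower-triangular) mask, so the window size is the only change relative to standard decoder self-attention in this sketch.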