
[arXiv Papers] Computation and Language 2020-01-15


1. Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning
2. Balancing the composition of word embeddings across heterogenous data sets
3. Bi-Decoder Augmented Network for Neural Machine Translation
4. On the Replicability of Combining Word Embeddings and Retrieval Models
5. Detecting depression in dyadic conversations with multimodal narratives and visualizations
6. A (Simplified) Supreme Being Necessarily Exists -- Says the Computer!
7. Improved Robust ASR for Social Robots in Public Spaces
8. Faster Transformer Decoding: N-gram Masked Self-Attention


1. Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning
  Roei Schuster, Tal Schuster, Yoav Meri, Vitaly Shmatikov
Abstract: Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the "meaning" of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.
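The first adversarial objective in the abstract above, making one word a top-ranked neighbor of another, can be measured with a cosine nearest-neighbor rank. The sketch below only illustrates that measurement; it is not the authors' attack. The toy vocabulary, the 2-dimensional vectors, and the helper names `cosine_neighbors` and `neighbor_rank` are all hypothetical, and the "shift" edits the vector directly rather than poisoning a corpus.

```python
import numpy as np

def cosine_neighbors(emb, vocab, word):
    """Return all other words, sorted by descending cosine similarity to `word`."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != word]

def neighbor_rank(emb, vocab, source, target):
    """1-based rank of `target` among the cosine neighbors of `source`."""
    return cosine_neighbors(emb, vocab, source).index(target) + 1

# Toy embedding (hypothetical values, two semantic clusters).
vocab = ["doctor", "nurse", "banana", "apple"]
emb = np.array([
    [1.0, 0.1],   # doctor
    [0.9, 0.3],   # nurse
    [0.1, 1.0],   # banana
    [0.2, 0.9],   # apple
])

# Simulate the *effect* of a successful attack by moving one vector directly.
shifted = emb.copy()
shifted[vocab.index("banana")] = [1.0, 0.12]

print(neighbor_rank(emb, vocab, "doctor", "banana"))      # 3
print(neighbor_rank(shifted, vocab, "doctor", "banana"))  # 1
```

A rank of 1 after the shift corresponds to objective (1); tracking the same rank before and after retraining on a modified corpus would be one way to score an attack of this kind.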

2. Balancing the composition of word embeddings across heterogenous data sets
  Stephanie Brandl, David Lassner, Maximilian Alber
Abstract: Word embeddings capture semantic relationships based on contextual information and are the basis for a wide variety of natural language processing applications. Notably, these relationships are learned solely from the data, and subsequently the data composition affects the semantics of the embeddings, which arguably can lead to biased word vectors. Given qualitatively different data subsets, we aim to align the influence of single subsets on the resulting word vectors while retaining their quality. In this regard, we propose a criterion to measure the shift towards a single data subset and develop approaches to meet both objectives. We find that a weighted average of the two subset embeddings balances the influence of those subsets, while word similarity performance decreases. We further propose a promising optimization approach to balance influences and quality of word embeddings.
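The weighted average mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions: both embedding matrices cover the same vocabulary in the same row order and live in a comparable vector space, which the abstract does not specify. The function name `combine_embeddings` and the parameter `weight_a` are hypothetical.

```python
import numpy as np

def combine_embeddings(emb_a, emb_b, weight_a=0.5):
    """Weighted average of two embedding matrices with identical shape.

    Assumes row i of both matrices represents the same word and that the
    two spaces are comparable (an assumption, not stated in the abstract).
    """
    if emb_a.shape != emb_b.shape:
        raise ValueError("embedding matrices must have the same shape")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return weight_a * emb_a + (1.0 - weight_a) * emb_b

# Two toy subset embeddings for a two-word vocabulary (hypothetical values).
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[0.0, 1.0], [1.0, 0.0]])

balanced = combine_embeddings(emb_a, emb_b, weight_a=0.25)
print(balanced)
```

Sweeping `weight_a` between 0 and 1 moves the result from one subset's embedding to the other's, which is the kind of influence trade-off the abstract describes.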

3. Bi-Decoder Augmented Network for Neural Machine Translation
  Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai
Abstract: Neural Machine Translation (NMT) has become a popular technology in recent years, and the encoder-decoder framework is the mainstream approach. The quality of the semantic representations produced by the encoder is crucial and can significantly affect the performance of the model. However, existing unidirectional source-to-target architectures can hardly produce a language-independent representation of the text because they rely heavily on the specific relations of the given language pairs. To alleviate this problem, we propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task. Besides the original decoder, which generates the target language sequence, we add an auxiliary decoder that regenerates the source language sequence at training time. Since each decoder transforms the representations of the input text into its corresponding language, joint training with two target ends gives the shared encoder the potential to produce a language-independent semantic space. We conduct extensive experiments on several NMT benchmark datasets, and the results demonstrate the effectiveness of our proposed approach.
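One plausible reading of "joint training with two target ends" is a training loss that adds the auxiliary decoder's source-reconstruction loss to the usual translation loss. The sketch below assumes exactly that; the abstract does not give the actual objective, and `aux_weight`, `cross_entropy`, and `joint_loss` are hypothetical names. Decoder outputs are stood in for by plain logit arrays.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy from raw logits of shape (T, V)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def joint_loss(target_logits, target_ids, source_logits, source_ids, aux_weight=1.0):
    """Translation loss plus weighted source-reconstruction loss (assumed form)."""
    translation = cross_entropy(target_logits, target_ids)
    reconstruction = cross_entropy(source_logits, source_ids)
    return translation + aux_weight * reconstruction

# Uniform logits over a 4-word vocabulary give a cross-entropy of log(4) each.
target_logits = np.zeros((3, 4))
source_logits = np.zeros((2, 4))
loss = joint_loss(target_logits, [0, 1, 2], source_logits, [3, 0], aux_weight=0.5)
print(loss)
```

Because both terms depend on the same encoder output, gradients from the reconstruction term also update the shared encoder. The auxiliary decoder is used only at training time, so inference cost is unchanged.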

4. On the Replicability of Combining Word Embeddings and Retrieval Models
  Luca Papariello, Alexandros Bampoulidis, Mihai Lupu
Abstract: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distributions instead of Gaussian distributions would be beneficial because of the focus on cosine distances of both VMF and the vector space model traditionally used in information retrieval. Previous experiments had validated this hypothesis. Our replication was not able to validate it, despite a large parameter scan space.
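The link between von Mises-Fisher distributions and cosine distance noted in the abstract above follows from the form of the vMF density on the unit sphere, which is proportional to exp(kappa * mu^T x). For unit vectors, mu^T x is the cosine similarity. The sketch below omits the normalizing constant and uses hypothetical helper names (`normalize`, `vmf_log_kernel`); it is not the Fisher kernel pipeline from the replicated experiments.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def vmf_log_kernel(x, mu, kappa):
    """Unnormalized vMF log-density: kappa * cos(x, mu).

    The normalizing constant C_d(kappa) is omitted because it does not
    depend on x and so does not change any ranking of points.
    """
    return kappa * (normalize(x) @ normalize(mu))

docs = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])  # toy document vectors
mu = np.array([1.0, 0.0])                              # mean direction

scores = vmf_log_kernel(docs, mu, kappa=5.0)
cosines = normalize(docs) @ normalize(mu)
print(scores)
print(np.array_equal(np.argsort(-scores), np.argsort(-cosines)))
```

For any positive kappa, ranking points by vMF log-density is the same as ranking them by cosine similarity to the mean direction.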

5. Detecting depression in dyadic conversations with multimodal narratives and visualizations
  Joshua Y. Kim, Greyson Y. Kim, Kalina Yacef
Abstract: Conversations contain a wide spectrum of multimodal information that gives us hints about the emotions and moods of the speaker. In this paper, we develop a system that supports humans in analyzing conversations. Our main contribution is the identification of appropriate multimodal features and the integration of such features into verbatim conversation transcripts. We demonstrate the ability of our system to take in a wide range of multimodal information and automatically generate a prediction score for the depression state of the individual. Our experiments show that this approach yields better performance than the baseline model. Furthermore, the multimodal narrative approach makes it easy to integrate learnings from other disciplines, such as conversational analysis and psychology. Lastly, this interdisciplinary and automated approach is a step towards emulating how practitioners record the course of treatment, as well as how conversational analysts have been analyzing conversations by hand.

6. A (Simplified) Supreme Being Necessarily Exists -- Says the Computer!
  Christoph Benzmüller
Abstract: A simplified variant of Kurt Gödel's modal ontological argument is presented. Some of Gödel's (respectively Scott's) premises are modified, others are dropped, and modal collapse is avoided. The emended argument is shown to be valid already in quantified modal logic K. The presented simplifications have been computationally explored utilising the latest knowledge representation and reasoning technology based on higher-order logic. The paper thus illustrates how modern symbolic AI technology can contribute new knowledge to formal philosophy and theology.

7. Improved Robust ASR for Social Robots in Public Spaces
  Charles Jankowski, Vishwas Mruthyunjaya, Ruixi Lin
Abstract: Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise SNR of 20 to 5 dB. Existing ASR models perform well for higher SNRs in this range, but degrade considerably with more noise. This work explores methods for providing improved ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance with SNR lower than 20 dB, demonstrating the feasibility of achieving relatively high performing ASR with open-source toolkits and hundreds of hours of training data, which is commonly available.
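The 20 to 5 dB range quoted in the abstract above refers to the signal-to-noise ratio, SNR_dB = 10 * log10(P_speech / P_noise), where P is mean signal power. The sketch below shows how a noise signal can be scaled to reach a chosen SNR before mixing. It illustrates the definition only; the abstract does not say how the authors constructed their noisy data. The names `snr_db` and `mix_at_snr` and the synthetic signals are hypothetical.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from mean power of each waveform."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech-to-noise power matches `target_snr_db`, then mix."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise

# Synthetic stand-ins for a speech waveform and a noise recording.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.normal(size=t.shape)

for target in (20.0, 5.0):
    mixed, scaled_noise = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, scaled_noise), 6))
```

At 5 dB the noise power is roughly a third of the speech power, against one hundredth at 20 dB, which gives a sense of why recognition is harder at the low end of the range.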

8. Faster Transformer Decoding: N-gram Masked Self-Attention
  Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer
Abstract: Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption. Experiments on WMT EnDe and EnFr data sets show that the $N$-gram masked self-attention model loses very little in BLEU score for $N$ values in the range $4, \ldots, 8$, depending on the task.
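Truncating the target-side self-attention window, as described in the abstract above, can be expressed as a banded causal mask. The sketch below assumes each position attends to itself and the n - 1 positions before it; the exact window convention in the paper may differ. The helpers `ngram_causal_mask` and `masked_softmax` are hypothetical names.

```python
import numpy as np

def ngram_causal_mask(length, n):
    """Boolean mask where mask[i, j] is True if position i may attend to j.

    Position i sees only positions j with i - n < j <= i (assumed convention).
    """
    i = np.arange(length)[:, None]
    j = np.arange(length)[None, :]
    return (j <= i) & (i - j < n)

def masked_softmax(scores, mask):
    """Row-wise softmax over `scores`, with masked-out entries set to zero weight."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

mask = ngram_causal_mask(length=5, n=2)
weights = masked_softmax(np.zeros((5, 5)), mask)
print(mask.astype(int))
print(weights)
```

With a full causal mask, decoding step t attends over t earlier positions. With this mask it attends over at most n, so the per-step self-attention cost no longer grows with the length of the generated prefix.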
