Contents
8. What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? [PDF] Abstract
9. Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing [PDF] Abstract
10. An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models [PDF] Abstract
11. Can neural networks acquire a structural bias from raw linguistic data? [PDF] Abstract
12. BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer [PDF] Abstract
13. Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances [PDF] Abstract
14. Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR [PDF] Abstract
Abstracts
1. Investigation of Sentiment Controllable Chatbot [PDF] Back to Contents
Hung-yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen
Abstract: Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. In this paper, we investigate four models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug and play model, and CycleGAN, all based on the seq2seq model. We also develop machine-evaluated metrics to estimate whether the responses are reasonable given the input. These metrics, together with human evaluation, are used to analyze the performance of the four models in terms of different aspects; reinforcement learning and CycleGAN are shown to be very attractive.
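The sentiment-control idea above can be illustrated with a much simpler mechanism than the paper's four models: reranking candidate responses by a mix of plausibility and target-sentiment scores. The sketch below is not one of the paper's methods (those modify the seq2seq model itself); both scoring functions are toy stand-ins for trained models.

```python
def select_response(candidates, lm_score, sentiment_score, alpha=0.5):
    # trade off plausibility against closeness to the target sentiment
    combined = lambda c: (1 - alpha) * lm_score(c) + alpha * sentiment_score(c)
    return max(candidates, key=combined)

lm = lambda c: 1.0 / len(c.split())               # toy plausibility score
positive = lambda c: 1.0 if "glad" in c else 0.0  # toy positive-sentiment score

best = select_response(["i do not know", "glad to hear that"], lm, positive)
assert best == "glad to hear that"
```

With alpha = 0 this degenerates to the conventional highest-probability decoding the abstract criticizes; raising alpha trades fluency for sentiment.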
2. COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes [PDF] Back to Contents
Raj Kumar Gupta, Ajay Vishwanath, Yinping Yang
Abstract: This resource paper describes a large dataset covering over 63 million coronavirus-related Twitter posts from more than 13 million unique users from 28 January to 1 July 2020. As strong concerns and emotions are expressed in the tweets, we analyzed the tweet content using natural language processing techniques and machine-learning based algorithms, and inferred seventeen latent semantic attributes associated with each tweet, including 1) ten attributes indicating the tweet's relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively. To illustrate how the dataset can be used, we present descriptive statistics around the topics, sentiments and emotions attributes and their temporal distributions, and discuss possible applications in communication, psychology, public health, economics and epidemiology.
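The seventeen attributes described above can be pictured as a per-tweet record. The sketch below uses hypothetical field names; the dataset's actual column names may differ.

```python
from dataclasses import dataclass

@dataclass
class TweetAttributes:
    # hypothetical fields mirroring the abstract's seventeen attributes
    topic_relevance: dict    # ten detected topics -> relevance indicator
    valence: float           # unpleasantness/pleasantness intensity
    fear: float
    anger: float
    sadness: float
    joy: float
    sentiment_category: str  # qualitative sentiment label
    dominant_emotion: str    # most dominant emotion label

    def n_attributes(self) -> int:
        # 10 topic indicators + 5 intensities + 2 categorical labels = 17
        return len(self.topic_relevance) + 5 + 2

example = TweetAttributes(
    topic_relevance={f"topic_{i}": 0.0 for i in range(10)},
    valence=0.3, fear=0.7, anger=0.1, sadness=0.2, joy=0.0,
    sentiment_category="negative", dominant_emotion="fear",
)
assert example.n_attributes() == 17
```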
3. Modeling Voting for System Combination in Machine Translation [PDF] Back to Contents
Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu
Abstract: System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.
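The voting idea can be sketched in miniature: words from each system's hypothesis cast votes weighted by that system's influence, so words supported by more (or more influential) systems score higher. In the paper, influence and preference are learned jointly with the model; here they are fixed numbers purely for illustration.

```python
from collections import defaultdict

def vote(hypotheses, influence):
    # each word in each system's hypothesis votes for itself,
    # weighted by that system's (here, hand-fixed) influence
    scores = defaultdict(float)
    for hyp, w in zip(hypotheses, influence):
        for word in hyp.split():
            scores[word] += w
    return dict(scores)

hyps = ["the cat sat", "the cat slept", "a cat sat"]
scores = vote(hyps, influence=[1.0, 1.0, 0.5])
# "cat" occurs in all three hypotheses, so it receives the highest total vote
assert scores["cat"] == 2.5
assert scores["cat"] > scores["sat"] > scores["slept"]
```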
4. Contextualized Code Representation Learning for Commit Message Generation [PDF] Back to Contents
Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, Zenglin Xu
Abstract: Automatic generation of high-quality commit messages for code commits can substantially facilitate developers' works and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Several studies have been proposed to alleviate the challenge but none explicitly involves code contextual information during commit message generation. Specifically, existing research adopts static embedding for code tokens, which maps a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning method for commit message Generation (CoreGen). CoreGen first learns contextualized code representation which exploits the contextual information behind code commit sequences. The learned representations of code commits built upon Transformer are then transferred for downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models with an improvement of 28.18% in terms of BLEU-4 score. Furthermore, we also highlight the future opportunities in training contextualized code representations on larger code corpora as a solution to low-resource settings and adapting the pretrained code representations to other downstream code-to-text generation tasks.
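The static-embedding limitation the abstract points out is easy to demonstrate: a lookup table assigns a token the same vector regardless of its context, which is exactly what CoreGen's contextualized representations avoid. The vectors below are toy 3-d examples, not real embeddings.

```python
# toy static embedding table: one fixed vector per token
STATIC = {"return": [0.2, 0.1, 0.9], "x": [0.5, 0.5, 0.0]}

def static_embed(tokens):
    # the same token always maps to the same vector, context ignored
    return [STATIC[t] for t in tokens]

a = static_embed(["return", "x"])
b = static_embed(["x", "return"])
# "return" gets an identical vector in both contexts
assert a[0] == b[1]
```

A contextual encoder (such as CoreGen's Transformer) would instead produce different vectors for "return" depending on the surrounding commit sequence.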
5. Questionnaire analysis to define the most suitable survey for port-noise investigation [PDF] Back to Contents
Andrea Cerniglia, Davide Chiarella, Paola Cutugno, Lucia Marconi, Anna Magrini, Gelsomina Di Feo, Melissa Ferretti
Abstract: The high level of noise pollution affecting the areas between ports and logistic platforms represents a problem that can be faced from different points of view. Acoustic monitoring, mapping, short-term measurements, and port and road traffic flow analyses can give useful indications on the strategies to be proposed for better management of the problem. A survey campaign, based on questionnaires submitted to the population exposed to noise in the back-port areas, will help to better understand the subjective point of view. The paper analyses a sample of questions suitable for the specific research, chosen from the wide database of questionnaires internationally proposed for subjective investigations. The preliminary results of a first data collection campaign are used to verify the adequacy of the number and type of questions, and the type of sample noise used for the survey. The questionnaire will be optimized to be distributed in the TRIPLO project (TRansports and Innovative sustainable connections between Ports and LOgistic platforms). The results of this survey will be the starting point for the linguistic investigation carried out in combination with the acoustic monitoring, to improve understanding of the connections between personal feeling and technical aspects.
6. Language, communication and society: a gender based linguistics analysis [PDF] Back to Contents
P. Cutugno, D. Chiarella, R. Lucentini, L. Marconi, G. Morgavi
Abstract: The purpose of this study is to find evidence supporting the hypothesis that language is the mirror of our thinking, our prejudices and cultural stereotypes. In this analysis, a questionnaire was administered to 537 people. The answers have been analysed to see if gender stereotypes were present, such as the attribution of psychological and behavioural characteristics. In particular, the aim was to identify, if any, the stereotyped images which emerge in defining the roles of men and women in modern society. Moreover, the results can be a good starting point to understand if gender stereotypes, and the expectations they produce, can result in penalization or inequality. If so, language and its use would inherently create a gender bias, which influences evaluations both in work settings and in everyday life.
7. Our Evaluation Metric Needs an Update to Encourage Generalization [PDF] Back to Contents
Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral
Abstract: Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and 'hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance -- and thus overestimation in AI systems' capabilities -- we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.
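The exact WOOD Score formula is defined in the paper; the toy stand-in below only shows the general idea of an OOD-aware metric, where weighting OOD accuracy penalizes models that merely fit in-distribution biases.

```python
def wood_like_score(in_dist_acc, ood_acc, ood_weight=0.5):
    # illustrative stand-in, NOT the paper's actual WOOD Score formula:
    # blend in-distribution and OOD accuracy so that OOD failures count
    return (1 - ood_weight) * in_dist_acc + ood_weight * ood_acc

# A model that "hacks" the benchmark (high ID, low OOD accuracy) now scores
# worse than one that generalizes, despite its higher ID accuracy.
hacker = wood_like_score(0.95, 0.40)       # 0.675
generalizer = wood_like_score(0.88, 0.80)  # 0.84
assert generalizer > hacker
```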
8. What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? [PDF] Back to Contents
Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi, Sunita Sarawagi
Abstract: We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that, while such perturbations are natural, state-of-the-art trained models are surprisingly brittle on several tasks. The brittleness persists even with recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. We then provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method enhances robustness and increases accuracy on both natural and adversarial datasets.
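The same-typed-class replacement probe can be sketched as follows. The name pool is illustrative; a real probe would draw from typed entity lists and run the perturbed input through a trained BERT model to compare predictions.

```python
import random

PERSON_NAMES = ["Alice Walker", "John Smith", "Mei Chen"]  # illustrative pool

def perturb(sentence, entity, rng=random.Random(0)):
    # swap the entity for another name of the same typed class (PERSON);
    # a robust model's prediction should survive this substitution
    replacement = rng.choice([n for n in PERSON_NAMES if n != entity])
    return sentence.replace(entity, replacement), replacement

perturbed, new_name = perturb("Alice Walker wrote the report.", "Alice Walker")
assert new_name != "Alice Walker"
assert new_name in perturbed and "Alice Walker" not in perturbed
```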
9. Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing [PDF] Back to Contents
Yaman Kumar, Mehar Bhatia, Anubha Kabra, Jessy Junyi Li, Di Jin, Rajiv Ratn Shah
Abstract: Significant progress has been made in deep-learning-based Automatic Essay Scoring (AES) systems over the past two decades. Performance, commonly measured by standard metrics such as Quadratic Weighted Kappa (QWK) and accuracy, points to the same. However, testing these AES systems on common-sense adversarial examples reveals their lack of natural language understanding capability. Inspired by common student behaviour during examinations, we propose a task-agnostic adversarial evaluation scheme for AES systems to test their natural language understanding capabilities and overall robustness.
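A minimal picture of the failure mode such adversarial testing exposes: a deliberately naive scorer that rewards a superficial cue (here, sheer length) assigns a higher score to an essay padded with irrelevant "bluff" text. Real AES systems are far more sophisticated, but the paper's point is that they can be fooled by similarly superficial manipulations.

```python
def length_scorer(essay):
    # deliberately naive: grades purely by word count, capped at a 0-6 scale
    return min(len(essay.split()) / 100.0, 1.0) * 6.0

original = "The industrial revolution changed labour markets. " * 8
bluffed = original + "Colourless green ideas sleep furiously. " * 8

# an adversarial bluff (irrelevant padding) should not raise the score,
# yet under this scorer it does -- exactly what the probe is meant to catch
assert length_scorer(bluffed) > length_scorer(original)
```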
10. An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models [PDF] Back to Contents
Lifu Tu, Garima Lalwani, Spandana Gella, He He
Abstract: Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that the key to their success is generalization from a small number of counterexamples where the spurious correlations do not hold. When such minority examples are scarce, pre-trained models perform as poorly as models trained from scratch. In the case of extreme minority, we propose to use multi-task learning (MTL) to improve generalization. Our experiments on natural language inference and paraphrase identification show that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance. Further, we show that the gain from MTL mainly comes from improved generalization from the minority examples. Our results highlight the importance of data diversity for overcoming spurious correlations.
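The MTL objective can be sketched as the main-task loss plus a weighted sum of auxiliary-task losses. The weight below is illustrative; the paper chooses auxiliary tasks and weights empirically.

```python
def multi_task_loss(main_loss, aux_losses, aux_weight=0.1):
    # main-task loss plus a down-weighted sum of auxiliary-task losses;
    # aux_weight is a hand-picked illustration, not the paper's setting
    return main_loss + aux_weight * sum(aux_losses)

total = multi_task_loss(main_loss=0.9, aux_losses=[0.5, 0.3])
assert abs(total - 0.98) < 1e-9
```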
11. Can neural networks acquire a structural bias from raw linguistic data? [PDF] Back to Contents
Alex Warstadt, Samuel R. Bowman
Abstract: We evaluate whether BERT, a widely used neural network for sentence processing, acquires an inductive bias towards forming structural generalizations through pretraining on raw data. We conduct four experiments testing its preference for structural vs. linear generalizations in different structure-dependent phenomena. We find that BERT makes a structural generalization in 3 out of 4 empirical domains---subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses---but makes a linear generalization when tested on NPI licensing. We argue that these results are the strongest evidence so far from artificial learners supporting the proposition that a structural bias can be acquired from raw data. If this conclusion is correct, it is tentative evidence that some linguistic universals can be acquired by learners without innate biases. However, the precise implications for human language acquisition are unclear, as humans learn language from significantly less data than BERT.
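Subject-auxiliary inversion, one of the paper's test phenomena, illustrates the linear-versus-structural contrast: a linear rule fronts the first auxiliary in the string, while the structural rule fronts the main-clause auxiliary. In the sketch below, the main auxiliary's position is supplied by hand, standing in for a syntactic parse.

```python
AUX = {"is", "can"}

def invert_linear(tokens):
    # linear rule: front the FIRST auxiliary in the string (wrong in general)
    i = next(i for i, t in enumerate(tokens) if t in AUX)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def invert_structural(tokens, main_aux_index):
    # structural rule: front the MAIN-CLAUSE auxiliary, located here by hand
    return [tokens[main_aux_index]] + tokens[:main_aux_index] + tokens[main_aux_index + 1:]

# "The boy who is tall can swim" -> question should front "can", not "is"
sent = "the boy who is tall can swim".split()
assert invert_linear(sent)[0] == "is"  # linear rule fronts the embedded aux
assert invert_structural(sent, 5) == "can the boy who is tall swim".split()
```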
12. BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer [PDF] Back to Contents
N. Nikzad-Khasmakhi, M. A. Balafar, M.Reza Feizi-Derakhshi, Cina Motamed
Abstract: The objective of an expert recommendation system is to trace a set of candidates' expertise and preferences, recognize their expertise patterns, and identify experts. In this paper, we introduce a multimodal classification approach for expert recommendation system (BERTERS). In our proposed system, the modalities are derived from text (articles published by candidates) and graph (their co-author connections) information. BERTERS converts text into a vector using the Bidirectional Encoder Representations from Transformer (BERT). Also, a graph representation technique called ExEm is used to extract the features of candidates from the co-author network. The final representation of a candidate is the concatenation of these vectors and other features. Eventually, a classifier is built on the concatenation of features. This multimodal approach can be used in both the academic community and community question answering. To verify the effectiveness of BERTERS, we analyze its performance on multi-label classification and visualization tasks.
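The fusion step described above can be sketched simply: a candidate's final representation is the concatenation of a text vector (BERT in the paper), a graph vector (ExEm in the paper), and scalar side features. The dimensions, stand-in values, and side-feature names below are illustrative assumptions.

```python
# Minimal sketch of BERTERS-style multimodal fusion: concatenate per-modality
# vectors into one feature vector, then hand it to any classifier.

def candidate_representation(text_vec, graph_vec, side_features):
    """Concatenate the per-modality feature lists into one feature vector."""
    return text_vec + graph_vec + side_features  # list concatenation

text_vec = [0.1] * 768   # stand-in for a 768-dim BERT sentence embedding
graph_vec = [0.2] * 128  # stand-in for a 128-dim ExEm node embedding
side = [12.0, 3.0]       # e.g. publication count, h-index (assumed features)

x = candidate_representation(text_vec, graph_vec, side)
print(len(x))  # 898
```

In practice the concatenated vector would feed a multi-label classifier; the concatenation itself is modality-agnostic, which is what makes the approach portable between academic and community-question-answering settings.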
13. Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances [PDF] Back to Contents
Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton
Abstract: Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances---parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
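The two-stage pipeline described above can be sketched as: score each utterance for noteworthiness, keep only the top-k, and pass the much shorter filtered transcript to the downstream model. The keyword-counting scorer and example transcript below are stand-in assumptions; the paper trains a learned model to predict noteworthiness.

```python
# Hedged sketch of noteworthy-utterance filtering for long transcripts.

NOTEWORTHY_KEYWORDS = {"pain", "fever", "cough", "medication", "diagnosis"}

def noteworthiness_score(utterance):
    """Toy scorer: count clinical keywords (the paper uses a trained model)."""
    return sum(w in NOTEWORTHY_KEYWORDS for w in utterance.lower().split())

def filter_transcript(utterances, k):
    """Keep the k highest-scoring utterances, preserving transcript order."""
    ranked = sorted(range(len(utterances)),
                    key=lambda i: noteworthiness_score(utterances[i]),
                    reverse=True)[:k]
    return [utterances[i] for i in sorted(ranked)]

transcript = [
    "how was the drive over",
    "I have had a cough and some fever since Monday",
    "the weather has been terrible",
    "are you still taking the medication for the pain",
]
print(filter_transcript(transcript, 2))  # keeps the two clinical utterances
```

The point of the design is that a roughly 1500-word conversation shrinks to a handful of evidence-bearing utterances, which fits within the input limits of standard deep-learning classifiers.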
14. Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR [PDF] Back to Contents
Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
Abstract: Recently Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR. Their high complexity, however, makes them very difficult to apply in the first (single) pass of an online system. Recent studies showed that a considerable part of the knowledge of neural network Language Models (LM) can be transferred to traditional n-grams by using neural text generation based data augmentation. In our paper, we pre-train a GPT-2 Transformer LM on a general text corpus and fine-tune it on our Hungarian conversational call center ASR task. We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language. Therefore, we propose a new method called subword-based neural text augmentation, where we retokenize the generated text into statistically derived subwords. We show that this method can significantly reduce the WER while greatly reducing vocabulary size and memory requirements. Finally, we also show that subword-based neural text augmentation outperforms the word-based approach not only in terms of overall WER but also in recognition of OOV words.
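The subword retokenization idea above can be illustrated in a few lines: generated text is re-split into units from a statistically derived subword inventory, so rare morphological variants no longer inflate the vocabulary. The toy Hungarian-like inventory and the greedy longest-match segmenter are simplifying assumptions; real systems derive the units statistically (e.g. with BPE or Morfessor).

```python
# Sketch of subword-based retokenization of generated text.

SUBWORDS = {"ház", "ak", "ban", "kert", "ek", "ben"}  # toy subword inventory

def segment(word):
    """Greedy longest-match segmentation into known subwords."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

print(segment("házakban"))   # ['ház', 'ak', 'ban']  ("in houses")
print(segment("kertekben"))  # ['kert', 'ek', 'ben'] ("in gardens")
```

Two distinct inflected word forms here share five subword units instead of requiring two whole-word vocabulary entries, which is the effect that keeps vocabulary size and memory requirements bounded in a morphologically rich language.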
15. Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets [PDF] Back to Contents
Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan
Abstract: A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric -- between-set CIDEr (CIDErBtw) to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equivalent based on distinctiveness. Thus we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, which are based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.
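One of the training strategies above, using CIDErBtw in a weighted loss, can be sketched as follows: each ground-truth caption's loss is reweighted so that more distinctive captions (lower CIDErBtw, i.e. less similar to captions of similar images) contribute more. The inverse-score weighting formula below is an illustrative assumption, not the paper's exact formulation.

```python
# Illustrative sketch of a CIDErBtw-weighted caption loss.

def distinctiveness_weights(ciderbtw_scores):
    """Lower CIDErBtw (more distinctive caption) -> higher training weight.
    Weights are normalized to sum to the number of captions."""
    inv = [1.0 / (1.0 + s) for s in ciderbtw_scores]
    total = sum(inv)
    return [w * len(inv) / total for w in inv]

def weighted_caption_loss(per_caption_losses, ciderbtw_scores):
    """Mean of the per-caption losses under distinctiveness weighting."""
    weights = distinctiveness_weights(ciderbtw_scores)
    return sum(w * l for w, l in zip(weights, per_caption_losses)) / len(weights)

# Two annotations of one image: the second is more generic (higher CIDErBtw),
# so its loss is down-weighted relative to the distinctive first caption.
print(weighted_caption_loss([2.0, 1.0], [0.2, 1.0]))
```

The same CIDErBtw signal can alternatively serve as a reinforcement-learning reward, which is the paper's other proposed strategy.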
16. Sudo rm -rf: Efficient Networks for Universal Audio Source Separation [PDF] Back to Contents
Efthymios Tzinis, Zhepei Wang, Paris Smaragdis
Abstract: In this paper, we present an efficient neural network for end-to-end general-purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF), together with their aggregation, which is performed through simple one-dimensional convolutions. In this way, we are able to obtain high-quality audio source separation with a limited number of floating-point operations, low memory requirements, few parameters, and low latency. Our experiments on both speech and environmental sound separation datasets show that SuDoRMRF performs comparably to, and even surpasses, various state-of-the-art approaches that have significantly higher computational resource requirements.
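The successive-downsampling-and-resampling backbone above can be sketched as a toy multi-resolution block: a feature sequence is repeatedly downsampled, processed at each resolution (a trivial scaling below stands in for the one-dimensional convolutions), resampled back to full length, and aggregated by summation. All processing choices here are illustrative assumptions, not the paper's architecture details.

```python
# Toy sketch of a successive downsampling-and-resampling block.

def downsample(seq):
    """Halve the length by averaging adjacent pairs."""
    return [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq) - 1, 2)]

def upsample_to(seq, length):
    """Nearest-neighbor resampling back to the target length."""
    return [seq[min(i * len(seq) // length, len(seq) - 1)] for i in range(length)]

def sudo_rm_rf_block(seq, depths=3):
    """Process the input at several resolutions and sum the resampled results."""
    out = [0.0] * len(seq)
    cur = seq
    for _ in range(depths):
        cur = downsample(cur)
        processed = [2.0 * v for v in cur]  # stand-in for a conv at this scale
        for i, v in enumerate(upsample_to(processed, len(seq))):
            out[i] += v
    return out

x = [float(i) for i in range(8)]
print(sudo_rm_rf_block(x))
```

Because each resolution is cheap to process and the aggregation is a plain sum, the cost of the block grows only modestly with the number of resolutions, which is the source of the efficiency claims in the abstract.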