Contents
1. Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Supervision [PDF] Abstract
2. Efficient Constituency Parsing by Pointing [PDF] Abstract
3. A High-Quality Multilingual Dataset for Structured Documentation Translation [PDF] Abstract
4. One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble [PDF] Abstract
5. Classifying Referential and Non-referential It Using Gaze [PDF] Abstract
6. Supervised Understanding of Word Embeddings [PDF] Abstract
7. Automating Text Naturalness Evaluation of NLG Systems [PDF] Abstract
8. Document Classification for COVID-19 Literature [PDF] Abstract
9. Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings [PDF] Abstract
10. The NetHack Learning Environment [PDF] Abstract
11. Differentiable Window for Dynamic Local Attention [PDF] Abstract
12. Crossmodal Language Grounding in an Embodied Neurocognitive Model [PDF] Abstract
13. Black-box Adaptation of ASR for Accented Speech [PDF] Abstract
14. On Analyzing Annotation Consistency in Online Abusive Behavior Datasets [PDF] Abstract
15. Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes [PDF] Abstract
16. Robot Object Retrieval with Contextual Natural Language Queries [PDF] Abstract
Abstracts
1. Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Supervision [PDF] Back to contents
Nicolay Rusnachenko, Natalia Loukachevitch
Abstract: In the sentiment attitude extraction task, the aim is to identify <<attitudes>> -- sentiment relations between entities mentioned in text. In this paper, we provide a study of attention-based context encoders for the sentiment attitude extraction task. For this task, we adapt attentive context encoders of two types: (1) feature-based; (2) self-based. In our study, we utilize the corpus of Russian analytical texts RuSentRel and the automatically constructed news collection RuAttitudes for enriching the training set. We consider the problem of attitude extraction as two-class (positive, negative) and three-class (positive, negative, neutral) classification tasks for whole documents. Our experiments with the RuSentRel corpus show that the three-class classification models which employ the RuAttitudes corpus for training gain 10% in F1, with an extra 3% when the model architecture includes the attention mechanism. We also provide an analysis of attention weight distributions in dependence on the term type.
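The abstract does not include the encoder details, but the feature-based attention it mentions can be illustrated with a minimal sketch: score each context term against a feature vector (here a hypothetical entity embedding) and pool the weighted terms. All names and shapes below are illustrative, not from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attended_context(term_embs, feature_emb):
    """Feature-based attention: score each context term against a feature
    vector (here an entity embedding) and return the weighted context."""
    scores = term_embs @ feature_emb       # (n_terms,)
    weights = softmax(scores)              # attention distribution over terms
    return weights @ term_embs, weights    # context vector, attention weights

# Toy usage: 5 context terms with 4-dim embeddings, attended w.r.t. an entity.
rng = np.random.default_rng(0)
terms, entity = rng.normal(size=(5, 4)), rng.normal(size=4)
ctx, w = attended_context(terms, entity)
```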
2. Efficient Constituency Parsing by Pointing [PDF] Back to contents
Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, Xiaoli Li
Abstract: We propose a novel constituency parsing model that casts the parsing problem into a series of pointing tasks. Specifically, our model estimates the likelihood of a span being a legitimate tree constituent via the pointing score corresponding to the boundary words of the span. Our parsing model supports efficient top-down decoding and our learning objective is able to enforce structural consistency without resorting to the expensive CKY inference. The experiments on the standard English Penn Treebank parsing task show that our method achieves 92.78 F1 without using pre-trained models, which is higher than all the existing methods with similar time complexity. Using pre-trained BERT, our model achieves 95.48 F1, which is competitive with the state-of-the-art while being faster. Our approach also establishes new state-of-the-art in Basque and Swedish in the SPMRL shared tasks on multilingual constituency parsing.
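As a rough illustration of the pointing idea, one plausible reading is a bilinear score between the encoder states of a span's boundary words; the paper's exact parameterization and decoding are not given in the abstract, so the sketch below is only a guess at the flavor of the method.

```python
import torch

def span_scores(h, W):
    """Score every span (i, j) from its boundary-word representations with a
    bilinear form; entry (i, j) is a stand-in for the abstract's 'pointing
    score' of span i..j. h: (n, d) encoder states."""
    return h @ W @ h.T  # (n, n)

n, d = 8, 16
h = torch.randn(n, d)
W = torch.randn(d, d) / d ** 0.5
probs = torch.sigmoid(span_scores(h, W))  # likelihood each span is a constituent
```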
3. A High-Quality Multilingual Dataset for Structured Documentation Translation [PDF] Back to contents
Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong
Abstract: This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text segments from the online documentation for an enterprise software platform. These Web pages have been professionally translated from English into 16 languages and maintained by domain experts, and around 100,000 text segments are available for each language pair. We build and evaluate translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search. We also experiment with a non-English pair to show that our dataset has the potential to explicitly enable $17 \times 16$ translation settings. Our experiments show that learning to translate with the XML tags improves translation accuracy, and the beam search accurately generates XML structures. We also discuss trade-offs of using the copy mechanisms by focusing on translation of numerical words and named entities. We further provide a detailed human analysis of gaps between the model output and human translations for real-world applications, including suitability for post-editing.
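The abstract does not spell out how the beam search is constrained by XML; a natural guess is a per-step well-formedness check that tracks a stack of open tags, as in the hypothetical helper below.

```python
def xml_consistent(tokens):
    """Check whether a candidate token sequence keeps XML tags well formed,
    the kind of constraint an XML-constrained beam search could enforce at
    each decoding step (a guess at the mechanism, not the paper's code)."""
    stack = []
    for tok in tokens:
        if tok.startswith("</"):
            if not stack or stack.pop() != tok[2:-1]:
                return False             # closing tag without matching opener
        elif tok.startswith("<") and not tok.endswith("/>"):
            stack.append(tok[1:-1])      # push opening tag name
    return True                          # partial hypotheses may leave tags open

assert xml_consistent(["<ph>", "hello", "</ph>"])
assert not xml_consistent(["<ph>", "</b>"])
```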
4. One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble [PDF] Back to contents
Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg
Abstract: The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis. Similar to other speech and language processing tasks, in a scenario where only small-sized training data are available, learning G2P models is challenging. We describe a simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models are developed as part of our participation in the SIGMORPHON 2020 Shared Task 1, focused on G2P. Our best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER), a sizeable improvement over the shared task competitive baselines.
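For reference, the reported WER and PER are both edit-distance-based metrics; the sketch below computes PER with a standard Levenshtein distance (WER is the same computation over word sequences).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (single rolling row)."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

def per(hyp_phonemes, ref_phonemes):
    """Phoneme error rate: edits needed to turn the hypothesis into the
    reference, normalized by reference length."""
    return 100.0 * edit_distance(hyp_phonemes, ref_phonemes) / len(ref_phonemes)

print(per(["K", "AE", "T"], ["K", "AA", "T"]))  # one substitution -> 33.3
```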
5. Classifying Referential and Non-referential It Using Gaze [PDF] Back to contents
Victoria Yaneva, Le An Ha, Richard Evans, Ruslan Mitkov
Abstract: When processing a text, humans and machines must disambiguate between different uses of the pronoun it, including non-referential, nominal anaphoric or clause anaphoric ones. In this paper, we use eye-tracking data to learn how humans perform this disambiguation. We use this knowledge to improve the automatic classification of it. We show that by using gaze data and a POS-tagger we are able to significantly outperform a common baseline and classify between three categories of it with an accuracy comparable to that of linguistic-based approaches. In addition, the discriminatory power of specific gaze features informs the way humans process the pronoun, which, to the best of our knowledge, has not been explored using data from a natural reading task.
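The paper's feature set is not given in the abstract; the toy function below shows the kind of input one could assemble from a POS tagger plus gaze measures. The gaze field names are hypothetical.

```python
import nltk  # assumes the punkt and averaged_perceptron_tagger data packages

def it_features(tokens, i, gaze):
    """Hypothetical feature dict for an occurrence of 'it' at index i:
    neighbouring POS tags plus eye-tracking measures for the token."""
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    return {
        "prev_pos": tags[i - 1] if i > 0 else "BOS",
        "next_pos": tags[i + 1] if i + 1 < len(tags) else "EOS",
        "first_fixation_ms": gaze["first_fixation_ms"],  # assumed field name
        "total_reading_ms": gaze["total_reading_ms"],    # assumed field name
    }

feats = it_features(["It", "rained", "all", "day"], 0,
                    {"first_fixation_ms": 180, "total_reading_ms": 240})
```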
6. Supervised Understanding of Word Embeddings [PDF] Back to contents
Halid Ziya Yerebakan, Parmeet Bhatia, Yoshihisa Shinagawa
Abstract: Pre-trained word embeddings are widely used for transfer learning in natural language processing. The embeddings are continuous and distributed representations of the words that preserve their similarities in compact Euclidean spaces. However, the dimensions of these spaces do not provide any clear interpretation. In this study, we have obtained supervised projections in the form of linear keyword-level classifiers on word embeddings. We have shown that the method creates interpretable projections of original embedding dimensions. Activations of the trained classifier nodes correspond to a subset of the words in the vocabulary. Thus, they behave similarly to dictionary features while having the merit of continuous value output. Additionally, such dictionaries can be grown iteratively with multiple rounds by adding expert labels on top-scoring words to an initial collection of the keywords. Also, the same classifiers can be applied to aligned word embeddings in other languages to obtain corresponding dictionaries. In our experiments, we have shown that initializing higher-order networks with these classifier weights gives more accurate models for downstream NLP tasks. We further demonstrate the usefulness of supervised dimensions in revealing the polysemous nature of a keyword of interest by projecting its embedding using learned classifiers in different sub-spaces.
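A minimal sketch of the keyword-level classifier idea, with synthetic data standing in for real embeddings and expert keyword labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: emb is a (vocab, dim) embedding matrix and labels marks
# which vocabulary words an expert tagged as, say, "medical" keywords.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 50))
labels = rng.integers(0, 2, size=1000)

clf = LogisticRegression(max_iter=1000).fit(emb, labels)
scores = clf.decision_function(emb)   # a supervised 1-D projection of the space
top_words = np.argsort(-scores)[:20]  # candidates for the next labelling round
```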
7. Automating Text Naturalness Evaluation of NLG Systems [PDF] Back to contents
Erion Çano, Ondřej Bojar
Abstract: Automatic methods and metrics that assess various quality criteria of automatically generated texts are important for developing NLG systems because they produce repeatable results and allow for a fast development cycle. We present here an attempt to automate the evaluation of text naturalness, which is a very important characteristic of natural language generation methods. Instead of relying on human participants for scoring or labeling the text samples, we propose to automate the process by using a human likeliness metric that we define and a discrimination procedure based on large pretrained language models and their probability distributions. We analyze the text probability fractions and observe how they are influenced by the size of the generative and discriminative models involved in the process. Based on our results, bigger generators and larger pretrained discriminators are more appropriate for a better evaluation of text naturalness. A comprehensive validation procedure with human participants is required as a follow-up to check how well this automatic evaluation scheme correlates with human judgments.
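The human-likeliness metric is not defined in the abstract; one concrete proxy in the same spirit is the mean per-token negative log-likelihood of a text under a large pretrained LM, sketched here with GPT-2 via Hugging Face transformers.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_token_nll(text):
    """Mean negative log-likelihood per token under GPT-2; lower values
    suggest more 'human-like' text. A stand-in for the paper's metric,
    whose exact definition isn't given in the abstract."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

print(mean_token_nll("The cat sat on the mat."))
```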
8. Document Classification for COVID-19 Literature [PDF] Back to contents
Bernal Jiménez Gutiérrez, Juncheng Zeng, Dongdong Zhang, Ping Zhang, Yu Su
Abstract: The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset, a growing collection of 8,000 research papers regarding the novel 2019 coronavirus. We find that pre-trained language models fine-tuned on this dataset outperform all other baselines and that the BioBERT and novel Longformer models surpass all others with almost equivalent micro-F1 and accuracy scores of around 81% and 69% on the test set. We evaluate the data efficiency and generalizability of these models as essential features of any system prepared to deal with an urgent situation like the current health crisis. Finally, we explore 50 errors made by the best performing models on LitCovid documents and find that they often (1) correlate certain labels too closely together and (2) fail to focus on discriminative sections of the articles; both of which are important issues to address in future work. Both data and code are available on GitHub.
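Since results are reported in micro-F1, here is that metric for the multi-label case: pool true positives, false positives and false negatives over all labels before forming the score.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label classification, from binary
    indicator matrices of shape (n_documents, n_labels)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    return 2 * tp / (2 * tp + fp + fn)

# Two documents, three labels each.
print(micro_f1([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))  # 0.667
```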
9. Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings [PDF] Back to contents
David Chang, Ivana Balazevic, Carl Allen, Daniel Chawla, Cynthia Brandt, Richard Andrew Taylor
Abstract: Much of biomedical and healthcare data is encoded in discrete, symbolic form such as text and medical codes. There is a wealth of expert-curated biomedical domain knowledge stored in knowledge bases and ontologies, but the lack of reliable methods for learning knowledge representation has limited their usefulness in machine learning applications. While text-based representation learning has significantly improved in recent years through advances in natural language processing, attempts to learn biomedical concept embeddings so far have been lacking. A recent family of models called knowledge graph embeddings have shown promising results on general domain knowledge graphs, and we explore their capabilities in the biomedical domain. We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph, provide a benchmark with comparison to existing methods and in-depth discussion on best practices, and make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation. The embeddings, code, and materials will be made available to the community.
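The abstract does not list the embedding models it benchmarks; TransE is a canonical member of the family and makes the idea concrete: embeddings are trained so that head + relation ≈ tail for true triples, and the negated distance scores a candidate fact.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility of the triple (head, relation, tail): a smaller
    distance between h + r and t (i.e. a higher score) means a more
    plausible fact. One illustrative model, not necessarily one the
    paper evaluates."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=100) for _ in range(3))
print(transe_score(h, r, t))
```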
10. The NetHack Learning Environment [PDF] Back to contents
Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel
Abstract: Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source at this https URL.
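Based on the project's README around release time (the API may since have changed), NLE exposes NetHack through the standard gym interface:

```python
import gym
import nle  # registers the NetHack environments with gym

env = gym.make("NetHackScore-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```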
11. Differentiable Window for Dynamic Local Attention [PDF] Back to contents
Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, Xiaoli Li
Abstract: We propose Differentiable Window, a new neural module and general purpose component for dynamic window selection. While universally applicable, we demonstrate a compelling use case of utilizing Differentiable Window to improve standard attention modules by enabling more focused attentions over the input regions. We propose two variants of Differentiable Window, and integrate them within the Transformer architecture in two novel ways. We evaluate our proposed approach on a myriad of NLP tasks, including machine translation, sentiment analysis, subject-verb agreement and language modeling. Our experimental results demonstrate consistent and sizable improvements across all tasks.
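The mechanism is not detailed in the abstract; for contrast, here is the hard fixed-window local attention that a differentiable, dynamically selected window presumably relaxes, implemented as a band mask over the score matrix.

```python
import torch

def local_window_attention(q, k, v, w):
    """Attention restricted to a fixed window of w positions on each side:
    the hard, non-learnable selection that Differentiable Window would
    replace with a trainable, input-dependent window."""
    n, d = q.shape
    scores = q @ k.T / d ** 0.5
    idx = torch.arange(n)
    mask = (idx[:, None] - idx[None, :]).abs() > w   # True outside the band
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

out = local_window_attention(*(torch.randn(10, 16) for _ in range(3)), w=2)
```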
12. Crossmodal Language Grounding in an Embodied Neurocognitive Model [PDF] Back to contents
Stefan Heinrich, Yuan Yao, Tobias Hinz, Zhiyuan Liu, Thomas Hummel, Matthias Kerzel, Cornelius Weber, Stefan Wermter
Abstract: Human infants are able to acquire natural language seemingly easily at an early age. Their language learning seems to occur simultaneously with learning other cognitive functions as well as with playful interactions with the environment and caregivers. From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities, and acquired by means of crossmodal integration. However, characterising the underlying mechanisms in the brain is difficult and explaining the grounding of language in crossmodal perception and action remains challenging. In this paper, we present a neurocognitive model for language grounding which reflects bio-inspired mechanisms such as an implicit adaptation of timescales as well as end-to-end multimodal abstraction. It addresses developmental robotic interaction and extends its learning capabilities using larger-scale knowledge-based data. In our scenario, we utilise the humanoid robot NICO in obtaining the EMIL data collection, in which the cognitive robot interacts with objects in a children's playground environment while receiving linguistic labels from a caregiver. The model analysis shows that crossmodally integrated representations are sufficient for acquiring language merely from sensory input through interaction with objects in an environment. The representations self-organise hierarchically and embed temporal and spatial information through composition and decomposition. This model can also provide the basis for further crossmodal integration of perceptually grounded cognitive representations.
13. Black-box Adaptation of ASR for Accented Speech [PDF] Back to contents
Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi
Abstract: We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent. While leading online ASR services obtain impressive performance on main-stream accents, they perform poorly on sub-populations - we observed that the word error rate (WER) achieved by Google's ASR API on Indian accents is almost twice the WER on US accents. Existing adaptation methods either require access to model parameters or overlay an error-correcting module on output transcripts. We highlight the need for correlating outputs with the original speech to fix accent errors. Accordingly, we propose a novel coupling of an open-source accent-tuned local model with the black-box service where the output from the service guides frame-level inference in the local model. Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies. Experiments on Indian and Australian accents with three leading ASR models as service, show that we achieve as much as 28% relative reduction in WER over both the local and service models.
14. On Analyzing Annotation Consistency in Online Abusive Behavior Datasets [PDF] Back to contents
Md Rabiul Awal, Rui Cao, Roy Ka-Wei Lee, Sandra Mitrović
Abstract: Online abusive behavior is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have proposed, collected, and annotated online abusive content datasets. These datasets play a critical role in facilitating the research on online hate speech and abusive behaviors. However, the annotation of such datasets is a difficult task; it is often contentious what the true label of a given text should be, as the semantic difference between the labels may be blurred (e.g., abusive vs. hate) and often subjective. In this study, we proposed an analytical framework to study the annotation consistency in online hate and abusive content datasets. We applied our proposed framework to evaluate the consistency of the annotation in three popular datasets that are widely used in online hate speech and abusive behavior studies. We found that there is still a substantial amount of annotation inconsistency in the existing datasets, particularly when the labels are semantically similar.
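The paper's own framework is not described in the abstract; a standard building block for this kind of analysis is chance-corrected inter-annotator agreement, such as Cohen's kappa:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences: observed
    agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / n ** 2
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["abusive", "hate", "none", "hate"],
                   ["abusive", "abusive", "none", "hate"]))  # ~0.64
```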
15. Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes [PDF] Back to contents
Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li
Abstract: BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and huge amount of data, which result in long training time and impede development progress. Using stochastic gradient methods with large mini-batch has been advocated as an efficient tool to reduce the training time. Along this line of research, LAMB is a prominent example that reduces the training time of BERT from 3 days to 76 minutes on a TPUv3 Pod. In this paper, we propose an accelerated gradient method called LANS to improve the efficiency of using large mini-batches for training. As the learning rate is theoretically upper bounded by the inverse of the Lipschitz constant of the function, one cannot always reduce the number of optimization iterations by selecting a larger learning rate. In order to use larger mini-batch size without accuracy loss, we develop a new learning rate scheduler that overcomes the difficulty of using large learning rate. Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively. It takes 54 minutes on 192 AWS EC2 P3dn.24xlarge instances to achieve a target F1 score of 90.5 or higher on SQuAD v1.1, achieving the fastest BERT training time in the cloud.
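LANS's exact update is not given in the abstract, but it builds on LAMB's layerwise trust-ratio scaling, sketched below: each layer's adaptive step is rescaled by the ratio of the weight norm to the update norm, which is what keeps very large mini-batches stable.

```python
import torch

def lamb_style_update(param, adam_step, lr, eps=1e-6):
    """Layerwise trust-ratio scaling as in LAMB (a sketch of the recipe
    LANS refines, not the paper's exact update). adam_step is the
    already-computed Adam direction for this layer."""
    w_norm = float(param.detach().norm())
    u_norm = float(adam_step.norm())
    trust = w_norm / (u_norm + eps) if w_norm > 0 and u_norm > 0 else 1.0
    param.data.add_(adam_step, alpha=-lr * trust)

# Toy usage on one layer.
p = torch.nn.Parameter(torch.randn(32, 32))
step = torch.randn_like(p)
lamb_style_update(p, step, lr=1e-3)
```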
16. Robot Object Retrieval with Contextual Natural Language Queries [PDF] Back to contents
Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex
Abstract: Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the verb. Our model directly predicts an object's appearance from the object's use specified by a verb phrase. We do not need to explicitly specify an object's class label. Our approach allows us to predict high level concepts like an object's utility based on the language query. Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns in the commands. Our model correctly selects objects out of sets of five candidates to fulfill natural language commands, and achieves an average accuracy of 62.3% on a held-out test set of unseen ImageNet object classes and 53.0% on unseen object classes and unknown nouns. Our model also achieves an average accuracy of 54.7% on unseen YCB object classes, which have a different image distribution from ImageNet objects. We demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language descriptions of their usage. We also present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes.