
Natural language processing: state of the art, current trends and challenges

  • Published: 14 July 2022
  • Volume 82, pages 3713–3744 (2023)


  • Diksha Khurana,
  • Aditya Koli,
  • Kiran Khatter (ORCID: orcid.org/0000-0002-1000-6102) &
  • Sukhdev Singh


Natural language processing (NLP) has recently gained much attention for representing and analyzing human language computationally. Its applications have spread to various fields such as machine translation, email spam detection, information extraction, summarization, medical text processing, and question answering. In this paper, we first distinguish four phases by discussing different levels of NLP and components of Natural Language Generation, followed by presenting the history and evolution of NLP. We then discuss in detail the state of the art, presenting the various applications of NLP as well as current trends and challenges. Finally, we present a discussion on some available datasets, models, and evaluation metrics in NLP.


1 Introduction

A language can be defined as a set of rules or symbols, where symbols are combined and used for conveying or broadcasting information. Since not all users are well-versed in machine-specific languages, Natural Language Processing (NLP) caters to those users who do not have enough time to learn new languages or reach perfection in them. In fact, NLP is a tract of Artificial Intelligence and Linguistics devoted to making computers understand statements or words written in human languages. It came into existence to ease the user's work and to satisfy the wish to communicate with the computer in natural language. It can be classified into two parts, i.e. Natural Language Understanding (or Linguistics) and Natural Language Generation, which cover the tasks of understanding and generating text respectively. Linguistics is the science of language, which includes Phonology (sound), Morphology (word formation), Syntax (sentence structure), Semantics (meaning) and Pragmatics (understanding in context). Noam Chomsky, one of the most influential linguists of the twentieth century and a pioneer of syntactic theory, marked a unique position in the field of theoretical linguistics because he revolutionized the area of syntax (Chomsky, 1965) [ 23 ]. Further, Natural Language Generation (NLG) is the process of producing meaningful phrases, sentences and paragraphs from an internal representation. The first objective of this paper is to give insights into the various important terminologies of NLP and NLG.

In the existing literature, most of the work in NLP has been conducted by computer scientists, while various other professionals such as linguists, psychologists, and philosophers have also shown interest. One of the most interesting aspects of NLP is that it adds to our knowledge of human language. The field of NLP is related to different theories and techniques that deal with the problem of communicating with computers in natural language. A few of the researched tasks of NLP are Automatic Summarization (producing an understandable summary of a set of text, providing summaries or detailed information for text of a known type), Co-Reference Resolution (determining, within a sentence or larger set of text, all the words which refer to the same object), Discourse Analysis (identifying the discourse structure of connected text, i.e. the study of text in relation to social context), Machine Translation (automatic translation of text from one language to another), Morphological Segmentation (breaking words into individual meaning-bearing morphemes), Named Entity Recognition (NER; used in information extraction to recognize named entities and classify them into different classes), Optical Character Recognition (OCR; automatic text recognition that translates printed and handwritten text into machine-readable format), and Part-of-Speech Tagging (determining the part of speech of each word in a sentence). Some of these tasks have direct real-world applications, such as machine translation, named entity recognition and optical character recognition. Though NLP tasks are obviously very closely interwoven, they are frequently treated separately for convenience. Some of the tasks, such as automatic summarization and co-reference analysis, act as subtasks used in solving larger tasks. Nowadays NLP is much discussed because of its various applications and recent developments, although in the late 1940s the term wasn't even in existence. So, it is interesting to know about the history of NLP, the progress made so far, and some of the ongoing projects making use of NLP. The second objective of this paper focuses on these aspects. The third objective concerns datasets, approaches, evaluation metrics and the challenges involved in NLP. The rest of this paper is organized as follows. Section 2 deals with the first objective, mentioning the various important terminologies of NLP and NLG. Section 3 deals with the history of NLP, applications of NLP and a walkthrough of the recent developments. Datasets used in NLP and various approaches are presented in Section 4, and Section 5 covers evaluation metrics and challenges involved in NLP. Finally, a conclusion is presented in Section 6.

2 Components of NLP

NLP can be classified into two parts, i.e., Natural Language Understanding and Natural Language Generation, which cover the tasks of understanding and generating text respectively. Figure 1 presents the broad classification of NLP. The objective of this section is to discuss Natural Language Understanding (Linguistics) (NLU) and Natural Language Generation (NLG).

Figure 1. Broad classification of NLP

NLU enables machines to understand and analyze natural language by extracting concepts, entities, emotions, keywords etc. It is used in customer care applications to understand problems reported by customers either verbally or in writing. Linguistics is the science of language, covering its meaning, context and various forms. It is therefore important to understand the key terminologies of NLP and its different levels. We next discuss some of the commonly used terminologies at the different levels of NLP.

Phonology is the part of Linguistics which refers to the systematic arrangement of sound. The term phonology comes from Ancient Greek, in which the term phono means voice or sound and the suffix -logy refers to word or speech. Nikolai Trubetzkoy (1939) stated that phonology is "the study of sound pertaining to the system of language", whereas Lass (1998) [ 66 ] wrote that phonology deals broadly with the sounds of language, concerned as a sub-discipline of linguistics with the behavior and organization of sounds. Phonology also covers the systematic use of sound to encode meaning in any human language.

The different parts of a word represent the smallest units of meaning, known as Morphemes. Morphology, which concerns the nature and formation of words, is built on morphemes. As an example, the word precancellation can be morphologically analyzed into three separate morphemes: the prefix pre, the root cancella, and the suffix -tion. The interpretation of a morpheme stays the same across all words, so to grasp the meaning, humans can break any unknown word into morphemes. For example, adding the suffix -ed to a verb conveys that the action of the verb took place in the past. Words that cannot be divided and have meaning by themselves are called lexical morphemes (e.g.: table, chair). The affixes (e.g. -ed, -ing, -est, -ly, -ful) that are combined with lexical morphemes are known as grammatical morphemes (e.g. worked, consulting, smallest, likely, useful). Grammatical morphemes that only occur in combination are called bound morphemes (e.g. -ed, -ing). Bound morphemes can be divided into inflectional morphemes and derivational morphemes. Adding inflectional morphemes to a word changes grammatical categories such as tense, gender, person, mood, aspect, definiteness and animacy. For example, the addition of the inflectional morpheme -ed changes the root park to parked. Derivational morphemes change the semantic meaning of the word they are combined with. For example, in the word normalize, the addition of the bound morpheme -ize to the root normal changes the word from an adjective (normal) to a verb (normalize).

At the lexical level, humans, as well as NLP systems, interpret the meaning of individual words. Several types of processing contribute to word-level understanding, the first of these being the assignment of a part-of-speech tag to each word. In this processing, words that can act as more than one part of speech are assigned the most probable part-of-speech tag based on the context in which they occur. At the lexical level, semantic representations can be assigned to words that have a single meaning; in fact, in an NLP system the nature of the representation varies according to the semantic theory deployed. Therefore, at the lexical level, the structure of words is analyzed with respect to their lexical meaning and PoS. In this analysis, text is divided into paragraphs, sentences, and words, and assigning the correct PoS tag improves the understanding of the intended meaning of a sentence. The lexical level is also used for cleaning and feature extraction with techniques such as removal of stop words, stemming, and lemmatization. Stop words such as 'in', 'the', 'and' etc. are removed, as they don't contribute to any meaningful interpretation and their high frequency may affect the computation time. Stemming reduces the words of the text by removing the suffix of a word to obtain its root form. For example: consulting and consultant are converted to the word consult after stemming, using gets converted to us, and driver is reduced to driv. Lemmatization does not simply remove the suffix of a word; instead, it returns the source word with the use of a vocabulary. For example, in the case of the token drived, stemming results in "driv", whereas lemmatization attempts to return the correct basic form, either drive or drived, depending on the context in which it is used.
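These cleaning steps can be tried out in a few lines with NLTK. The following is a minimal sketch, assuming the stop-word and WordNet resources have been downloaded; exact outputs depend on the stemmer and lemmatizer chosen.

```python
# Minimal sketch of stop-word removal, stemming and lemmatization with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

words = ["the", "consulting", "consultant", "using", "driver", "drived"]
stop = set(stopwords.words("english"))
content_words = [w for w in words if w not in stop]  # drops 'the'

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for w in content_words:
    # Stemming strips suffixes mechanically; lemmatization consults WordNet.
    print(w, "->", stemmer.stem(w), "/", lemmatizer.lemmatize(w, pos="v"))
```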

After PoS tagging is done at the lexical level, words are grouped into phrases, phrases are grouped to form clauses, and clauses are combined into sentences at the syntactic level. This level emphasizes the correct formation of a sentence by analyzing its grammatical structure. The output of this level is a sentence that reveals the structural dependencies between words. This is also known as parsing, which uncovers the phrases that convey more meaning in comparison to the meaning of individual words. The syntactic level examines word order, stop words, morphology and PoS of words, which the lexical level does not consider. Changing word order changes the dependencies among words and may also affect the comprehension of sentences. For example, the sentences "ram beats shyam in a competition" and "shyam beats ram in a competition" contain the same words, but the different word order conveys different meanings [ 139 ]. This level retains stop words, as their removal changes the meaning of the sentence. It doesn't support lemmatization and stemming, because converting words to their basic forms changes the grammar of the sentence. It focuses on identifying the correct PoS of words in sentences. For example: in the phrase "frowns on his face", "frowns" is a noun, whereas it is a verb in the sentence "he frowns".
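A dependency parse makes the word-order effect concrete. Below is an illustrative sketch with spaCy, assuming the small English model en_core_web_sm is installed; the subject and object labels are expected to swap when the word order is reversed.

```python
# Sketch: dependency parsing the two example sentences with spaCy.
# Assumes 'en_core_web_sm' is installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
for text in ("ram beats shyam in a competition",
             "shyam beats ram in a competition"):
    doc = nlp(text)
    print([(tok.text, tok.dep_, tok.head.text) for tok in doc])
    # 'ram' is expected as the subject (nsubj) of 'beats' in the first
    # sentence but the object (dobj) in the second.
```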

At the semantic level, the most important task is to determine the proper meaning of a sentence. To understand the meaning of a sentence, human beings rely on knowledge about the language and the concepts present in that sentence, but machines can't count on these techniques. Semantic processing determines the possible meanings of a sentence by processing its logical structure to recognize the most relevant words and to understand the interactions among words or different concepts in the sentence. For example, it understands that a sentence is about "movies" even if it doesn't contain that actual word but contains related concepts such as "actor", "actress", "dialogue" or "script". This level of processing also incorporates the semantic disambiguation of words with multiple senses (Elizabeth D. Liddy, 2001) [ 68 ]. For example, the word "bark" as a noun can mean either the sound that a dog makes or the outer covering of a tree. The semantic level examines words for their dictionary interpretation, or for an interpretation derived from the context of the sentence. For example: the sentence "Krishna is good and noble." is either talking about Lord Krishna or about a person named "Krishna". That is why, to get the proper meaning of the sentence, the appropriate interpretation is chosen by looking at the rest of the sentence [ 44 ].
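The classic Lesk algorithm, shipped with NLTK, gives a quick taste of such word sense disambiguation. The sketch below picks a WordNet sense for "bark" from its sentence context; Lesk is a simple overlap heuristic, so the chosen sense is not guaranteed to be correct.

```python
# Sketch: dictionary-based word sense disambiguation with NLTK's Lesk.
# Assumes the 'wordnet' and tokenizer resources are available.
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

for sent in ("The dog let out a loud bark",
             "The bark of the old tree was rough"):
    sense = lesk(word_tokenize(sent), "bark")
    print(sense, "->", sense.definition() if sense else "no sense found")
```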

While the syntactic and semantic levels deal with sentence-length units, the discourse level of NLP deals with units longer than one sentence. It deals with the analysis of logical structure by making connections among words and sentences to ensure coherence. It focuses on the properties of the text that convey meaning by interpreting the relations between sentences and uncovering linguistic structures from texts at several levels (Liddy, 2001) [ 68 ]. Two of the most common tasks are Anaphora Resolution and Coreference Resolution. Anaphora resolution is achieved by recognizing the entity referenced by an anaphor, to resolve the references used within the text with the same sense. For example: (i) Ram topped in the class. (ii) He was intelligent. Here (i) and (ii) together form a discourse. Human beings can quickly understand that the pronoun "he" in (ii) refers to "Ram" in (i). The interpretation of "he" depends on the word "Ram" presented earlier in the text. Without determining the relationship between these two structures, it would not be possible to decide why Ram topped the class and who was intelligent. Coreference resolution is achieved by finding all expressions that refer to the same entity in a text. It is an important step in various NLP applications that involve high-level tasks such as document summarization, information extraction etc. In fact, anaphora is encoded through one of the processes of co-reference.
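To make the Ram/he example concrete, here is a deliberately naive heuristic: link each third-person pronoun to the most recent preceding PERSON entity found by spaCy's NER. This toy rule only illustrates what anaphora resolution must do; production coreference systems use learned models over many more signals.

```python
# Toy anaphora resolution: bind pronouns to the last-seen PERSON entity.
# Assumes spaCy's 'en_core_web_sm' model is installed; a sketch, not a
# real coreference resolver.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ram topped in the class. He was intelligent.")

last_person = None
for tok in doc:
    if tok.ent_type_ == "PERSON":
        last_person = tok.text
    elif tok.pos_ == "PRON" and tok.text.lower() in {"he", "she"}:
        print(f"'{tok.text}' refers to '{last_person}'")
```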

The pragmatic level focuses on knowledge or content that comes from outside the content of the document. It deals with what the speaker implies and what the listener infers; in fact, it analyzes what is not directly spoken in sentences. Real-world knowledge is used to understand what is being talked about in the text, and by analyzing the context, a meaningful representation of the text is derived. When a sentence is not specific and the context does not provide any specific information about it, pragmatic ambiguity arises (Walton, 1996) [ 143 ]. Pragmatic ambiguity occurs when different persons derive different interpretations of the text, depending on its context. The context of a text may include references to other sentences of the same document, which influence the understanding of the text, and the background knowledge of the reader or speaker, which gives meaning to the concepts expressed in that text. Semantic analysis focuses on the literal meaning of the words, whereas pragmatic analysis focuses on the inferred meaning that readers perceive based on their background knowledge. For example, the sentence "Do you know what time it is?" is interpreted as "asking for the current time" in semantic analysis, whereas in pragmatic analysis the same sentence may express resentment toward someone who missed the due time. Thus, semantic analysis is the study of the relationship between various linguistic utterances and their meanings, whereas pragmatic analysis is the study of the context which influences our understanding of linguistic expressions. Pragmatic analysis helps users to uncover the intended meaning of a text by applying contextual background knowledge.

The goal of NLP is to accommodate one or more specialties of an algorithm or system, and evaluating an algorithmic system against NLP metrics allows for the integration of language understanding and language generation. NLP is even used in multilingual event detection. Rospocher et al. [ 112 ] proposed a novel modular system for cross-lingual event extraction from English, Dutch, and Italian texts, using different pipelines for the different languages. The system incorporates a modular set of foremost multilingual NLP tools. The pipeline integrates modules for basic NLP processing as well as more advanced tasks such as cross-lingual named entity linking, semantic role labeling and time normalization. Thus, the cross-lingual framework allows for the interpretation of events, participants, locations and times, as well as the relations between them. The output of these individual pipelines is intended to be used as input for a system that obtains event-centric knowledge graphs. All modules take standard input, do some annotation, and produce standard output which in turn becomes the input for the next module in the pipeline. The pipelines are built as a data-centric architecture so that modules can be adapted and replaced. Furthermore, the modular architecture allows for different configurations and for dynamic distribution.
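The data-centric idea, where every module reads and enriches one shared document representation, is easy to sketch in a few lines. The module names below are illustrative stand-ins, not the components of the actual system.

```python
# Sketch of a data-centric pipeline: each module consumes a shared
# document dict, adds its annotations, and passes it on, so modules can
# be swapped or reordered. Module internals here are placeholders.
from typing import Callable, Dict, List

Document = Dict[str, object]

def tokenize(doc: Document) -> Document:
    doc["tokens"] = str(doc["text"]).split()
    return doc

def tag_entities(doc: Document) -> Document:
    # Placeholder: a real module would run multilingual NER here.
    doc["entities"] = [t for t in doc["tokens"] if t.istitle()]
    return doc

def run_pipeline(text: str, modules: List[Callable[[Document], Document]]) -> Document:
    doc: Document = {"text": text}
    for module in modules:  # standard input -> annotate -> standard output
        doc = module(doc)
    return doc

print(run_pipeline("Rospocher proposed a modular system", [tokenize, tag_entities]))
```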

Ambiguity is one of the major problems of natural language, occurring when one sentence can lead to different interpretations. It is usually faced at the syntactic, semantic, and lexical levels. In syntactic-level ambiguity, one sentence can be parsed into multiple syntactic forms. Semantic ambiguity occurs when the meaning of words can be misinterpreted. Lexical-level ambiguity refers to the ambiguity of a single word that can have multiple assertions. Each of these levels can produce ambiguities that can be resolved using the knowledge of the complete sentence. Ambiguity can be handled by various methods such as Minimizing Ambiguity, Preserving Ambiguity, Interactive Disambiguation and Weighting Ambiguity [ 125 ]. One of the methods proposed by researchers is preserving ambiguity, e.g. (Shemtov 1997; Emele & Dorna 1998; Knight & Langkilde 2000; Tong Gao et al. 2015; Umber & Bajwa 2011) [ 39 , 46 , 65 , 125 , 139 ]. Their objectives are closely in line with the removal or minimization of ambiguity. They cover a wide range of ambiguities, and there is a statistical element implicit in their approach.

Natural Language Generation (NLG) is the process of producing meaningful phrases, sentences and paragraphs from an internal representation. It is a part of Natural Language Processing and happens in four phases: identifying the goals, planning how the goals may be achieved by evaluating the situation and the available communicative sources, and realizing the plans as text (Fig. 2). It is the inverse of Natural Language Understanding.

Speaker and Generator

Figure 2. Components of NLG

To generate a text, we need to have a speaker or an application and a generator or a program that renders the application’s intentions into a fluent phrase relevant to the situation.

Components and Levels of Representation

The process of language generation involves the following interwoven tasks. Content selection: information should be selected for inclusion; depending on how this information is parsed into representational units, parts of the units may have to be removed, while others may be added by default. Textual organization: the information must be textually organized according to the grammar; it must be ordered both sequentially and in terms of linguistic relations like modification. Linguistic resources: to support the information's realization, linguistic resources must be chosen; in the end these resources come down to choices of particular words, idioms, syntactic constructs etc. Realization: the selected and organized resources must be realized as actual text or voice output. A toy end-to-end sketch of these four tasks follows.
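The following toy template-based generator only illustrates how the four tasks chain together; real NLG systems plan and realize text with far richer grammars and lexicons, and the facts and templates here are invented for the sketch.

```python
# Toy NLG pipeline: content selection -> organization -> realization.
facts = {"subject": "Ram", "event": "topped", "location": "the class"}

def select_content(f: dict) -> dict:          # content selection
    return {k: v for k, v in f.items() if v}

def organize(content: dict) -> list:          # textual organization
    return [content["subject"], content["event"], content["location"]]

def realize(units: list) -> str:              # realization
    return " ".join(units) + "."

print(realize(organize(select_content(facts))))  # Ram topped the class.
```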

Application or Speaker

The application is only for maintaining the model of the situation. Here the speaker just initiates the process and doesn't take part in the language generation itself. It stores the history, structures the potentially relevant content and deploys a representation of what it knows. All of this forms the situation, from which a subset of the propositions the speaker has is selected. The only requirement is that the speaker must make sense of the situation [ 91 ].

3 NLP: Then and now

In the late 1940s the term NLP wasn't in existence, but work on machine translation (MT) had started. Research in this period was not completely localized; Russian and English were the dominant languages for MT (Andreev, 1967) [ 4 ]. MT/NLP research almost died in 1966 following the ALPAC report, which concluded that MT was going nowhere. But later, some MT production systems were providing output to their customers (Hutchins, 1986) [ 60 ]. By this time, work on the use of computers for literary and linguistic studies had also started. As early as 1960, signature work influenced by AI began with the BASEBALL Q-A system (Green et al., 1961) [ 51 ]. LUNAR (Woods, 1978) [ 152 ] and Winograd's SHRDLU were natural successors of these systems, but they were seen as a step up in sophistication in terms of their linguistic and task-processing capabilities. There was a widespread belief that progress could only be made on two fronts: one was the ARPA Speech Understanding Research (SUR) project (Lea, 1980), and the other was major system development projects building database front ends. The front-end projects (Hendrix et al., 1978) [ 55 ] were intended to go beyond LUNAR in interfacing with large databases. In the early 1980s computational grammar theory became a very active area of research, linked with logics for meaning and knowledge that could deal with the user's beliefs and intentions and with functions like emphasis and themes.

By the end of the decade, powerful general-purpose sentence processors like SRI's Core Language Engine (Alshawi, 1992) [ 2 ] and Discourse Representation Theory (Kamp and Reyle, 1993) [ 62 ] offered a means of tackling more extended discourse within the grammatico-logical framework. This was a period of growing community resources: practical grammars, tools and parsers became available (for example, the Alvey Natural Language Tools) (Briscoe et al., 1987) [ 18 ]. The (D)ARPA speech recognition and message understanding (information extraction) conferences were notable not only for the tasks they addressed but for their emphasis on heavy evaluation, starting a trend that became a major feature of the 1990s (Young and Chase, 1998; Sundheim and Chinchor, 1993) [ 131 , 157 ]. Work on user modeling (Wahlster and Kobsa, 1989) [ 142 ] was another strand of research. Cohen et al. (2002) [ 28 ] put forward a first approximation of a compositional theory of tune interpretation, together with the phonological assumptions on which it is based and the evidence from which they drew their proposals. At the same time, McKeown (1985) [ 85 ] demonstrated that rhetorical schemas could be used for producing both linguistically coherent and communicatively effective text. Some research in NLP marked important topics for the future, like word sense disambiguation (Small et al., 1988) [ 126 ] and probabilistic networks; statistically colored NLP and the work on the lexicon also pointed in this direction. Statistical language processing became a major theme in the 1990s (Manning and Schütze, 1999) [ 75 ], involving far more than data analysts. Information extraction and automatic summarization (Mani and Maybury, 1999) [ 74 ] were also points of focus. Next, we present a walkthrough of the developments from the early 2000s.

3.1 A walkthrough of recent developments in NLP

The main objectives of NLP include the interpretation, analysis, and manipulation of natural language data for the intended purpose with the use of various algorithms, tools, and methods. However, there are many challenges involved, which may depend on the natural language data under consideration, and this makes it difficult to achieve all the objectives with a single approach. Therefore, the development of different tools and methods in the field of NLP and relevant areas of study has received much attention from several researchers in the recent past. The developments can be seen in Fig. 3:

Figure 3. A walkthrough of recent developments in NLP

In the early 2000s came neural language modeling, in which the probability of the next word (token) occurring is determined given the n previous words. Bengio et al. [ 12 ] proposed the concept of a feed-forward neural network with a lookup table which represents the n previous words in a sequence. Collobert et al. [ 29 ] proposed the application of multitask learning in the field of NLP, where two convolutional models with max pooling were used to perform parts-of-speech and named entity recognition tagging. Mikolov et al. [ 87 ] proposed a word embedding process in which the dense vector representation of text was addressed; they also reported the challenges faced by the traditional sparse bag-of-words representation. After the advancement of word embeddings, neural networks were introduced in the field of NLP, where variable-length input is taken for further processing. Sutskever et al. [ 132 ] proposed a general framework for sequence-to-sequence mapping, where encoder and decoder networks are used to map from sequence to vector and from vector to sequence respectively. In fact, the use of neural networks has played a very important role in NLP. One can observe from the existing literature that neural networks saw little use in NLP in the early 2000s, but by 2013 enough discussion had taken place about their use in the field that many things were transformed, paving the way to implement various neural networks in NLP. Convolutional neural networks (CNNs) first contributed to the field of image classification and the analysis of visual imagery; later, CNNs can be observed tackling problems associated with NLP tasks like Sentence Classification [ 127 ], Sentiment Analysis [ 135 ], Text Classification [ 118 ], Text Summarization [ 158 ], Machine Translation [ 70 ] and Answer Relations [ 150 ]. An article by Newatia (2019) [ 93 ] illustrates the general architecture behind any CNN model and how it can be used in the context of NLP; one can also refer to the work of Wang and Gang [ 145 ] for applications of CNN in NLP. Further, neural networks that are recurrent in nature, applying the same function to every element of a sequence (Recurrent Neural Networks, RNNs), have also been used in NLP and found ideal for sequential data such as text, time series, financial data, speech, audio and video, among others; see the article by Thomas (2019) [ 137 ]. One of the modified versions of RNNs is Long Short-Term Memory (LSTM), which is very useful in cases where only the important information needs to be retained for a much longer time while the irrelevant information is discarded, see [ 52 , 58 ]. Further development of the LSTM has also led to a slightly simpler variant, called the gated recurrent unit (GRU), which has shown better results than standard LSTMs in many tasks [ 22 , 26 ]. Attention mechanisms [ 7 ], which let a network learn what to pay attention to in accordance with the current hidden state and annotation, together with the use of transformers, have also brought significant developments in NLP, see [ 141 ]. It is to be noted that Transformers have the potential to learn longer-term dependencies but are limited by a fixed-length context in the setting of language modeling. In this direction, Dai et al. [ 30 ] recently proposed a novel neural architecture, Transformer-XL (XL for extra-long), which enables learning dependencies beyond a fixed length of words. Further, the work of Rae et al. [ 104 ] on the Compressive Transformer, an attentive sequence model which compresses memories for long-range sequence learning, may be helpful for readers. One may also refer to the recent work by Otter et al. [ 98 ] on uses of Deep Learning for NLP, and the relevant references cited therein. The use of the BERT (Bidirectional Encoder Representations from Transformers) model [ 33 ] and its successors has also played an important role in NLP.
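As a minimal sketch of the last step of this progression, the snippet below pulls contextual token representations out of the public bert-base-uncased checkpoint with the Hugging Face transformers library; vectors of this kind feed most downstream NLP tasks.

```python
# Sketch: contextual embeddings from a pretrained BERT with 'transformers'.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per subword token: shape (1, n_tokens, 768).
print(outputs.last_hidden_state.shape)
```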

Many researchers have worked on NLP, building the tools and systems which make NLP what it is today. Tools like sentiment analysers, parts-of-speech (POS) taggers, chunkers, named entity recognition (NER), emotion detection and semantic role labeling have made a huge contribution to NLP and are good topics for research. Sentiment analysis (Nasukawa et al., 2003) [ 156 ] works by extracting sentiments about a given topic, and it consists of topic-specific feature term extraction, sentiment extraction, and association by relationship analysis. It utilizes two linguistic resources for the analysis: the sentiment lexicon and the sentiment pattern database. It analyzes documents for positive and negative words and tries to give ratings on a scale of -5 to +5. The mainstream of currently used tagsets is obtained from English; the most widely used tagsets serving as standard guidelines are designed for Indo-European languages, while Asian and Middle Eastern languages are less researched. Various authors have done research on building parts-of-speech taggers for various languages such as Arabic (Zeroual et al., 2017) [ 160 ], Sanskrit (Tapswi & Jain, 2012) [ 136 ] and Hindi (Ranjan & Basu, 2003) [ 105 ] to efficiently tag and classify words as nouns, adjectives, verbs etc. The authors in [ 136 ] used a treebank technique for creating a rule-based POS tagger for the Sanskrit language: Sanskrit sentences are parsed to assign the appropriate tag to each word using a suffix-stripping algorithm, wherein the longest suffix is searched in the suffix table and tags are assigned. Diab et al. (2004) [ 34 ] used a supervised machine learning approach and adopted Support Vector Machines (SVMs), trained on the Arabic Treebank, to automatically tokenize, POS-tag and annotate base phrases in Arabic text.
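For English, an off-the-shelf tagger is one call away in NLTK. A minimal sketch, assuming the tokenizer and tagger resources have been downloaded (resource names vary slightly across NLTK versions):

```python
# Sketch: POS tagging with NLTK's default English tagger.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ...]
```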

Chunking is the process of separating phrases from unstructured text. Since simple tokens may not represent the actual meaning of the text, it is advisable to use phrases such as "North Africa" as a single unit instead of the separate words 'North' and 'Africa'. Chunking, also known as "shallow parsing", labels parts of sentences with syntactically correlated chunks like Noun Phrase (NP) and Verb Phrase (VP). Chunking is often evaluated using the CoNLL 2000 shared task. Various researchers (Sha and Pereira, 2003; McDonald et al., 2005; Sun et al., 2008) [ 83 , 122 , 130 ] used the CoNLL test data for chunking, with features composed of words, POS tags, and chunk tags.
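NLTK's regular-expression chunker is enough for a quick shallow-parsing sketch; the simple NP grammar below is illustrative and groups "North Africa" into a single noun phrase.

```python
# Sketch: NP chunking (shallow parsing) with NLTK's RegexpParser.
import nltk

grammar = "NP: {<DT>?<JJ>*<NNP|NN>+}"  # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize("He visited North Africa last year"))
tree = chunker.parse(tagged)
for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    print(subtree)  # e.g. (NP North/NNP Africa/NNP)
```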

There are particular words in a document that refer to specific entities or real-world objects like locations, people, organizations etc. To find words which have a unique context and are more informative, noun phrases are considered in text documents. Named entity recognition (NER) is a technique to recognize and separate named entities and group them under predefined classes. In the era of the Internet, however, people use slang rather than traditional or standard English, which cannot be processed well by standard natural language processing tools. Ritter (2011) [ 111 ] proposed the classification of named entities in tweets because standard NLP tools did not perform well on them. The authors rebuilt the NLP pipeline starting from PoS tagging, then chunking, then NER, and improved the performance in comparison to standard NLP tools.
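On clean newswire-style text, an off-the-shelf recognizer already does well. A minimal spaCy sketch follows (again assuming en_core_web_sm); as the Ritter work above shows, the same model would degrade on noisy tweets.

```python
# Sketch: named entity recognition with spaCy's pretrained pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new office in London in 2023.")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Apple', 'ORG'), ('London', 'GPE'), ('2023', 'DATE')]
```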

Emotion detection investigates and identifies types of emotion from speech, facial expressions, gestures, and text. Sharma (2016) [ 124 ] analyzed conversations in Hinglish, a mix of the English and Hindi languages, and identified usage patterns of PoS. Their work was based on the identification of language and POS tagging of mixed script. They tried to detect emotions in mixed script by combining machine learning and human knowledge. They categorized sentences into 6 groups based on emotions and used the TLBO technique to help users prioritize their messages based on the emotions attached to each message. Seal et al. (2020) [ 120 ] proposed an efficient emotion detection method that searches for emotional words in a pre-defined emotional keyword database and analyzes the emotion words, phrasal verbs, and negation words. Their proposed approach exhibited better performance than recent approaches.
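A stripped-down version of the keyword-lookup idea can be written in a few lines; the tiny lexicon and single-word negation rule below are invented for illustration, whereas a real system uses a large keyword database and handles phrasal verbs as well.

```python
# Toy keyword-based emotion detection with simple negation handling.
EMOTION_LEXICON = {"happy": "joy", "glad": "joy", "angry": "anger", "sad": "sadness"}
NEGATIONS = {"not", "never", "no"}

def detect_emotions(text: str) -> list:
    tokens = text.lower().split()
    found = []
    for i, tok in enumerate(tokens):
        if tok in EMOTION_LEXICON:
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            found.append(("negated " if negated else "") + EMOTION_LEXICON[tok])
    return found

print(detect_emotions("I am not happy but never angry"))
# ['negated joy', 'negated anger']
```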

Semantic Role Labeling (SRL) works by assigning semantic roles within a sentence. For example, in the PropBank formalism (Palmer et al., 2005) [ 100 ], one assigns roles to words that are arguments of a verb in the sentence. The precise arguments depend on the verb frame, and if multiple verbs exist in a sentence, words might carry multiple tags. State-of-the-art SRL systems comprise several stages: creating a parse tree, identifying which parse tree nodes represent the arguments of a given verb, and finally classifying these nodes to compute the corresponding SRL tags.

Event discovery in social media feeds (Benson et al., 2011) [ 13 ] uses a graphical model to analyze social media feeds and determine whether they contain the name of a person, the name of a venue, a place, a time etc. The model operates on noisy feeds of data to extract records of events by aggregating information across multiple messages; despite irrelevant messages and very irregular message language, the model was able to extract records with a broad array of features.

Having given insights into some of the above-mentioned tools and the relevant work done with them, we now move to the broad applications of NLP.

3.2 Applications of NLP

Natural Language Processing can be applied in various areas such as Machine Translation, Email Spam Detection, Information Extraction, Summarization, and Question Answering. Next, we discuss some of these areas and the relevant work done in those directions.

Machine Translation

As most of the world is online, the task of making data accessible and available to all is a challenge, and a major obstacle is the language barrier: there is a multitude of languages with different sentence structures and grammars. Machine translation translates phrases from one language to another with the help of a statistical engine such as Google Translate. The challenge with machine translation technologies is not translating words directly but keeping the meaning of sentences intact, along with grammar and tenses. Statistical machine translation systems gather as much data as they can find that appears to be parallel between two languages, and crunch it to estimate the likelihood that something in language A corresponds to something in language B. Google, in September 2016, announced a new machine translation system based on artificial neural networks and deep learning. In recent years, various methods have been proposed to automatically evaluate machine translation quality by comparing hypothesis translations with reference translations. Examples of such metrics are word error rate, position-independent word error rate (Tillmann et al., 1997) [ 138 ], generation string accuracy (Bangalore et al., 2000) [ 8 ], multi-reference word error rate (Nießen et al., 2000) [ 95 ], the BLEU score (Papineni et al., 2002) [ 101 ], and the NIST score (Doddington, 2002) [ 35 ]. All these criteria try to approximate human assessment and often achieve an astonishing degree of correlation with human subjective evaluations of fluency and adequacy (Papineni et al., 2001; Doddington, 2002) [ 35 , 101 ].
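As an illustration of the simplest of these metrics, word error rate is the word-level Levenshtein distance normalized by reference length; the following sketch is our illustration, not a reference implementation:

```python
# Word error rate as word-level edit distance divided by reference length.
def word_error_rate(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("he went to the store", "he goes to the store"))  # 0.2
```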

Text Categorization

Categorization systems take in a large flow of data such as official documents, military casualty reports, market data, and newswires, and assign the items to predefined categories or indices. For example, the Carnegie Group’s Construe system (Hayes, 1992) [ 54 ] takes in Reuters articles and saves much of the time otherwise spent by staff or human indexers. Some companies use categorization systems to classify trouble tickets or complaint requests and route them to the appropriate desks. Another application of text categorization is email spam filtering. Spam filters are becoming increasingly important as the first line of defence against unwanted email, and their false-positive and false-negative problem comes down to the core NLP challenge of extracting meaning from strings of text. A filtering solution applied to an email system uses a set of protocols to determine which incoming messages are spam and which are not. There are several types of spam filters available: content filters review the content within a message to determine whether it is spam; header filters review the email header looking for fake information; general blacklist filters stop all emails from blacklisted senders; rules-based filters apply user-defined criteria, such as stopping mail from a specific person or mail including a specific word; permission filters require anyone sending a message to be pre-approved by the recipient; and challenge-response filters require anyone sending a message to enter a code to gain permission to send email.

Spam Filtering

Spam filtering works using text categorization, and in recent times various machine learning techniques have been applied to text categorization or anti-spam filtering, such as rule learning (Cohen, 1996) [ 27 ], Naïve Bayes (Sahami et al., 1998; Androutsopoulos et al., 2000; Rennie, 2000) [ 5 , 109 , 115 ], memory-based learning (Sakkis et al., 2000b) [ 117 ], support vector machines (Drucker et al., 1999) [ 36 ], decision trees (Carreras and Marquez, 2001) [ 19 ], the maximum entropy model (Berger et al., 1996) [ 14 ], and Hash Forest with a rule encoding method (T. Xia, 2020) [ 153 ], sometimes combining different learners (Sakkis et al., 2001) [ 116 ]. Using these approaches is preferable because the classifier is learned from training data rather than built by hand. Naïve Bayes is often preferred because of its good performance despite its simplicity (Lewis, 1998) [ 67 ]. In text categorization, two types of models have been used (McCallum and Nigam, 1998) [ 77 ]. Both models assume that a fixed vocabulary is present. In the first model, a document is generated by first choosing a subset of the vocabulary and then using each selected word any number of times, at least once, irrespective of order. This is called the multi-variate Bernoulli model: it captures which words are used in a document, irrespective of their counts and order. In the second model, a document is generated by choosing a set of word occurrences and arranging them in any order. This model is called the multinomial model; in addition to what the multi-variate Bernoulli model captures, it also records how many times each word is used in a document. Most text categorization approaches to anti-spam email filtering have used the multi-variate Bernoulli model (Androutsopoulos et al., 2000) [ 5 , 15 ].
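The contrast between the two event models can be sketched with scikit-learn (assumed installed; tiny toy data, purely for illustration):

```python
# Multi-variate Bernoulli vs. multinomial naive Bayes for spam filtering.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

mails = ["win money now", "cheap meds win win", "meeting at noon",
         "project status meeting"]
labels = ["spam", "spam", "ham", "ham"]

# Multi-variate Bernoulli: binary presence/absence features
X_bin = CountVectorizer(binary=True).fit_transform(mails)
print(BernoulliNB().fit(X_bin, labels).predict(X_bin[:1]))   # ['spam']

# Multinomial: raw word counts, so repetition matters
X_cnt = CountVectorizer().fit_transform(mails)
print(MultinomialNB().fit(X_cnt, labels).predict(X_cnt[:1])) # ['spam']
```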

Information Extraction

Information extraction is concerned with identifying phrases of interest in textual data. For many applications, extracting entities such as names, places, events, dates, times, and prices is a powerful way of summarizing the information relevant to a user’s needs. In the case of a domain-specific search engine, the automatic identification of important information can increase the accuracy and efficiency of a directed search. Hidden Markov models (HMMs) have been used to extract the relevant fields of research papers; the extracted text segments are used to allow searches over specific fields, to provide effective presentation of search results, and to match references to papers. A familiar example is the pop-up ads on websites showing items you recently viewed in an online store, now with discounts. The multi-variate Bernoulli and multinomial models described above for text categorization (McCallum and Nigam, 1998) [ 77 ] have likewise been applied in information retrieval.

Knowledge discovery has become an important area of research in recent years. Knowledge discovery research uses a variety of techniques to extract useful information from source documents, such as: parts-of-speech (POS) tagging; chunking or shallow parsing; stop-word removal (keywords that must be removed before processing documents); stemming (mapping words to some base form), which has two methods, dictionary-based stemming, with higher accuracy but a higher cost of implementation, and Porter-style stemming (Porter, 1980) [ 103 ], with a lower implementation cost but usually insufficient for IR; compound or statistical phrases (indexing multi-token units instead of single tokens); and word sense disambiguation (the task of determining the correct sense of a word in context; when used for information retrieval, terms are replaced by their senses in the document vector).
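For illustration, Porter-style stemming is available off the shelf, for example in NLTK (assumed installed):

```python
# Porter-style stemming: cheap rule-based mapping of words to a base form.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["connect", "connected", "connection", "connecting"]])
# -> ['connect', 'connect', 'connect', 'connect']
```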

The extracted information can be applied to a variety of purposes, for example to prepare a summary, to build databases, to identify keywords, or to classify text items according to pre-defined categories. For example, CONSTRUE, developed for Reuters, is used to classify news stories (Hayes, 1992) [ 54 ]. It has been suggested that while many IE systems can successfully extract terms from documents, acquiring the relations between the terms is still difficult. PROMETHEE is a system that extracts lexico-syntactic patterns relative to a specific conceptual relation (Morin, 1999) [ 89 ]. IE systems should work at many levels, from word recognition to discourse analysis at the level of the complete document. Bondale et al. (1999) [ 16 ] applied the Blank Slate Language Processor (BSLP) approach to the analysis of a real-life natural language corpus consisting of responses to open-ended questionnaires in the field of advertising.

There is a system called MITA (MetLife’s Intelligent Text Analyzer) (Glasgow et al., 1998) [ 48 ] that extracts information from life insurance applications. Ahonen et al. (1998) [ 1 ] suggested a mainstream framework for text mining that uses pragmatic and discourse-level analyses of text.

Summarization

Information overload is a genuine problem in this digital age, and our reach and access to knowledge and information already exceed our capacity to understand it. This trend is not slowing down, so the ability to summarize data while keeping its meaning intact is highly valuable. Summarization is important not only for letting us recognize and understand the important information in a large body of data; it can also be used to surface deeper, emotional meanings. For example, a company may determine the general sentiment on social media and use it to assess its latest product offering, making the application a valuable marketing asset.

The type of text summarization depends on the number of documents, the two important categories being single-document and multi-document summarization (Zajic et al. 2008 [ 159 ]; Fattah and Ren 2009 [ 43 ]). Summaries can also be of two types: generic or query-focused (Gong and Liu 2001 [ 50 ]; Dunlavy et al. 2007 [ 37 ]; Wan 2008 [ 144 ]; Ouyang et al. 2011 [ 99 ]). The summarization task can be either supervised or unsupervised (Mani and Maybury 1999 [ 74 ]; Fattah and Ren 2009 [ 43 ]; Riedhammer et al. 2010 [ 110 ]). Training data is required in a supervised system for selecting relevant material from the documents, and a large amount of annotated data is needed for the learning techniques. A few techniques are as follows (a simple extractive baseline is sketched after this list):

Bayesian Sentence-based Topic Model (BSTM) uses both term-sentence and term-document associations for summarizing multiple documents (Wang et al. 2009 [ 146 ]).

Factorization with Given Bases (FGB) is a language model where sentence bases are the given bases and which utilizes document-term and sentence-term matrices. This approach groups and summarizes the documents simultaneously (Wang et al. 2011 [ 147 ]).

Topic Aspect-Oriented Summarization (TAOS) is based on topic factors. These topic factors are various features that describe topics; for example, capitalized words may be used to represent entities. Different topics can have different aspects, and different preferences over features are used to represent different aspects (Fang et al. 2015 [ 42 ]).
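To make the extractive-summarization task concrete, the following is a minimal frequency-based baseline of our own, deliberately much simpler than the BSTM/FGB/TAOS models above:

```python
# Frequency-based extractive baseline: score each sentence by the
# summed corpus frequency of its words, keep the top-k sentences.
from collections import Counter
import re

def summarize(text, k=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
                    reverse=True)
    top = set(scored[:k])
    # Return the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

doc = ("NLP systems summarize documents. Summarization keeps the meaning "
       "intact. The weather was pleasant yesterday.")
print(summarize(doc, k=1))
```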

Dialogue System

Dialogue systems are very prominent in real-world applications, ranging from providing support to performing particular actions. Support dialogue systems require context awareness, whereas action-oriented systems do not require much of it. Earlier dialogue systems focused on small applications such as home theater systems and utilized only the phonemic and lexical levels of language. Habitable dialogue systems that utilize all levels of a language offer the potential for fully automated dialogue (Liddy, 2001) [ 68 ]. This leads to systems that enable robots to interact with humans in natural language, such as Google’s Assistant, Microsoft’s Cortana, Apple’s Siri, and Amazon’s Alexa.

NLP is applied in the medical field as well. The Linguistic String Project-Medical Language Processor (LSP-MLP) is one of the large-scale NLP projects in medicine [ 21 , 53 , 57 , 71 , 114 ]. The LSP-MLP enables physicians to extract and summarize information on signs, symptoms, drug dosage, and response data, with the aim of identifying possible side effects of a medicine while highlighting or flagging data items [ 114 ]. The National Library of Medicine is developing the Specialist System [ 78 , 79 , 80 , 82 , 84 ], which is expected to function as an information extraction tool for biomedical knowledge bases, particularly Medline abstracts. Its lexicon was created using MeSH (Medical Subject Headings), Dorland’s Illustrated Medical Dictionary, and general English dictionaries. The Centre d’Informatique Hospitaliere of the Hopital Cantonal de Geneve has been working on an electronic archiving environment with NLP features [ 81 , 119 ]. In the first phase, patient records were archived. At a later stage the LSP-MLP was adapted for French [ 10 , 72 , 94 , 113 ], and finally a dedicated NLP system called RECIT [ 9 , 11 , 17 , 106 ] was developed using a method called Proximity Processing [ 88 ]. Its task was to implement a robust and multilingual system able to analyze and comprehend medical sentences and to map free text into a language-independent knowledge representation [ 107 , 108 ]. Columbia University in New York has developed an NLP system called MEDLEE (MEDical Language Extraction and Encoding System) that identifies clinical information in narrative reports and transforms the textual information into a structured representation [ 45 ].

3.3 NLP in talk

We next discuss some of the recent NLP projects implemented by various companies:

ACE Powered GDPR Robot Launched by RAVN Systems [ 134 ]

RAVN Systems, a leading expert in Artificial Intelligence (AI), Search and Knowledge Management Solutions, announced the launch of its ACE (“Applied Cognitive Engine”)-powered software robot to help and facilitate compliance with the GDPR (“General Data Protection Regulation”). The robot uses AI techniques to automatically analyze documents and other types of data in any business system subject to GDPR rules. It allows users to quickly and easily search, retrieve, flag, classify, and report on data deemed to be sensitive under GDPR. Users can also identify personal data within documents, view feeds on the latest personal data that requires attention, and obtain reports on the data suggested to be deleted or secured. RAVN’s GDPR robot is also able to speed up requests for information (Data Subject Access Requests, “DSAR”) in a simple and efficient way, removing the need for the physical handling of these requests, which tends to be very labor-intensive. Peter Wallqvist, CSO at RAVN Systems, commented: “GDPR compliance is of universal paramountcy as it will be exploited by any organization that controls and processes data concerning EU citizens.”

Link: http://markets.financialcontent.com/stocks/news/read/33888795/RAVN_Systems_Launch_the_ACE_Powered_GDPR_Robot

Eno: A Natural Language Chatbot Launched by Capital One [ 56 ]

Capital One announced a chatbot for customers called Eno. Eno is a natural language chatbot that people interact with through texting. Capital One claims that Eno is the first natural language SMS chatbot from a U.S. bank that allows customers to ask questions using natural language. Customers can interact with Eno by asking questions about their savings and other matters through a text interface, and Eno creates the feeling that a human is on the other end. This provides a different platform from brands that launch chatbots on Facebook Messenger and Skype. Capital One believed that Facebook has too much access to the private information of a person, which could get the bank into trouble with the privacy laws U.S. financial institutions operate under; for instance, a Facebook Page admin can access full transcripts of a bot’s conversations. If that were the case, the admins could easily view customers’ personal banking information, which is not acceptable.

Link: https://www.macobserver.com/analysis/capital-one-natural-language-chatbot-eno/

Future of BI in Natural Language Processing [ 140 ]

Several companies in the BI space are trying to get with the trend and working hard to ensure that data becomes more friendly and easily accessible, but there is still a long way to go. NLP will also make BI easier to access, as a GUI will not be needed: queries are increasingly made by text or voice command on smartphones. One of the most common examples is Google telling you today what tomorrow’s weather will be. But soon enough, we will be able to ask our personal data chatbot about customer sentiment today and how customers may feel about the brand next week, all while walking down the street. Today, NLP tends to be based on turning natural language into machine language. But as the technology matures, especially the AI component, the computer will get better at “understanding” the query and start to deliver answers rather than search results. Initially, given the question ‘how have revenues changed over the last three quarters?’, the data chatbot will probably return pages of data for you to analyze. But once it learns the semantic relations and inferences of the question, it will be able to automatically perform the filtering and formulation necessary to provide an intelligible answer, rather than simply showing you data.

Link: http://www.smartdatacollective.com/eran-levy/489410/here-s-why-natural-language-processing-future-bi

Using Natural Language Processing and Network Analysis to Develop a Conceptual Framework for Medication Therapy Management Research [ 97 ]

This work describes a theory derivation process used to develop a conceptual framework for medication therapy management (MTM) research. The MTM service model and the chronic care model were selected as parent theories. Abstracts of review articles targeting medication therapy management in chronic disease care were retrieved from Ovid Medline (2000–2016). Unique concepts in each abstract were extracted using MetaMap and their pair-wise co-occurrences were determined. This information was then used to construct a network graph of concept co-occurrence, which was further analyzed to identify content for the new conceptual model. In total, 142 abstracts were analyzed. Medication adherence was the most-studied drug therapy problem and co-occurred with concepts related to patient-centered interventions targeting self-management. The enhanced model consists of 65 concepts clustered into 14 constructs. The framework requires additional refinement and evaluation to determine its relevance and applicability across a broad audience, including underserved settings.

Link: https://www.ncbi.nlm.nih.gov/pubmed/28269895?dopt=Abstract

Meet the Pilot, world’s first language translating earbuds [ 96 ]

The world’s first smart earpiece, Pilot, will soon translate over 15 languages. According to Springwise, Waverly Labs’ Pilot can already translate five spoken languages (English, French, Italian, Portuguese, and Spanish) and seven additional written languages (German, Hindi, Russian, Japanese, Arabic, Korean, and Mandarin Chinese). The Pilot earpiece is connected via Bluetooth to the Pilot speech translation app, which uses speech recognition, machine translation, machine learning, and speech synthesis technology; the user hears the translated version of the speech on the second earpiece nearly simultaneously. Moreover, the conversation need not take place between only two people: several users can join in and converse as a group. As of now, the user may experience a lag of a few seconds between the speech and its translation, which Waverly Labs is working to reduce. The Pilot earpiece will be available from September but can be pre-ordered now for $249. The earpieces can also be used for streaming music, answering voice calls, and getting audio notifications.

Link: https://www.indiegogo.com/projects/meet-the-pilot-smart-earpiece-language-translator-headphones-travel#/

4 Datasets in NLP and state-of-the-art models

The objective of this section is to present the various datasets used in NLP and some state-of-the-art models in NLP.

4.1 Datasets in NLP

A corpus is a collection of linguistic data, either compiled from written texts or transcribed from recorded speech. Corpora are intended primarily for testing linguistic hypotheses, e.g., to determine how a certain sound, word, or syntactic construction is used across a culture or language. There are various types of corpus. In an annotated corpus, the implicit information in the plain text has been made explicit by specific annotations, while an un-annotated corpus contains plain text in its raw state. Different languages can be compared using a comparable corpus. Monitor corpora are non-finite collections of texts, mostly used in lexicography. A multilingual corpus contains small collections of monolingual corpora based on the same sampling procedure and categories for different languages. A parallel corpus contains texts in one language and their translations into other languages, aligned sentence by sentence or phrase by phrase. A reference corpus contains text of spoken (formal and informal) and written (formal and informal) language representing various social and situational contexts. A speech corpus contains recorded speech together with transcriptions and the time at which each word occurred. Various datasets are available for natural language processing; some are listed below for different use cases.

Sentiment Analysis: Sentiment analysis is a rapidly expanding field of natural language processing (NLP) used in a variety of domains such as politics and business. The most commonly used datasets for sentiment analysis are:

Stanford Sentiment Treebank (SST): Socher et al. introduced SST, containing sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences from movie reviews, posing novel challenges for sentiment compositionality [ 127 ].

Sentiment140: It contains 1.6 million tweets annotated with negative, neutral and positive labels.

Paper Reviews: It provides reviews of computing and informatics conferences written in English and Spanish languages. It has 405 reviews which are evaluated on a 5-point scale ranging from very negative to very positive.

IMDB: For natural language processing, text analytics, and sentiment analysis, this dataset offers thousands of movie reviews split into training and test sets. It was introduced by Maas et al. in 2011 [ 73 ].

Sentiraama: G. Rama Rohit Reddy of the Language Technologies Research Centre, KCIS, IIIT Hyderabad, created the corpus “Sentiraama.” The corpus is divided into four datasets, each annotated on a two-value scale that distinguishes between positive and negative sentiment at the document level. The corpus contains data from a variety of fields, including book reviews, product reviews, movie reviews, and song lyrics. The annotators meticulously followed the annotation procedure for each of them. The folder “Song Lyrics” in the corpus contains 339 Telugu song lyrics written in Telugu script [ 121 ].

Language Modelling: Language models analyse text data to estimate word probabilities. They use an algorithm to interpret the data, which establishes rules for context in natural language; the model then uses these rules to accurately predict or construct new sentences. In essence, the model learns the basic characteristics and features of the language and applies them to new phrases (a minimal bigram model is sketched after the dataset list below). The most commonly used datasets for language modeling are as follows:

Salesforce’s WikiText-103 dataset has 103 million tokens collected from 28,475 featured articles from Wikipedia.

WikiText-2 is a scaled-down version of WikiText-103. It contains 2 million tokens with a vocabulary size of 33,278.

The Penn Treebank portion of the Wall Street Journal corpus includes 929,000 tokens for training, 73,000 for validation, and 82,000 for testing. Its context is limited since it comprises sentences rather than paragraphs [ 76 ].

The Ministry of Electronics and Information Technology’s Technology Development Programme for Indian Languages (TDIL) launched its own data distribution portal ( www.tdil-dc.in ) which has cataloged datasets [ 24 ].
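To make the idea of estimating word probabilities concrete, the following is a minimal bigram language model sketch (toy corpus and add-one smoothing; our illustration only):

```python
# Bigram language model: estimate P(word | previous word) from counts,
# then score a new word sequence under the model.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(word, prev):
    # P(word | prev) with add-one smoothing over the toy vocabulary
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

sentence = "the cat sat on the rug".split()
p = 1.0
for prev, word in zip(sentence, sentence[1:]):
    p *= prob(word, prev)
print(p)  # probability of the word sequence under the bigram model
```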

Machine Translation: The task of converting text in one natural language into another language while preserving the meaning of the input is known as machine translation. The most commonly used datasets are as follows:

Tatoeba is a collection of multilingual sentence pairs. Each line of the dataset is a tab-delimited pair of an English text sequence and its translated French text sequence. Each text sequence may be as simple as a single sentence or as complex as a paragraph of several sentences.

The Europarl parallel corpus is derived from the European Parliament’s proceedings. It is available in 21 European languages [ 40 ].

WMT14 provides machine translation pairs for English-German and English-French; these comprise 4.5 million and 35 million sentence pairs, respectively. The sentences are encoded using Byte-Pair Encoding with 32K merge operations.

There are around 160,000 sentence pairs in the IWSLT 14 dataset, which covers English-German (En-De) and German-English (De-En). The IWSLT 13 dataset contains around 200K training sentence pairs.

The IIT Bombay English-Hindi corpus comprises parallel corpora for English-Hindi as well as monolingual Hindi corpora gathered from several existing sources and corpora generated over time at IIT Bombay’s Centre for Indian Language Technology.

Question Answering System: Question answering systems provide real-time responses and are widely used in customer care services. The datasets used for dialogue systems and question answering systems are as follows:

Stanford Question Answering Dataset (SQuAD): It is a reading comprehension dataset made up of questions posed by crowd workers on a collection of Wikipedia articles.

Natural Questions: It is a large-scale corpus presented by Google used for training and assessing open-domain question answering systems. It includes 300,000 naturally occurring queries as well as human-annotated responses from Wikipedia pages for use in QA system training.

Question Answering in Context (QuAC): This dataset is used to describe, comprehend, and participate in information seeking conversation. In this dataset, instances are made up of an interactive discussion between two crowd workers: a student who asks a series of open-ended questions about an unknown Wikipedia text, and a teacher who responds by offering brief extracts from the text.

Neural learning models are overtaking traditional models in NLP [ 64 , 127 ]. In [ 64 ], the authors used a CNN (convolutional neural network) model for sentiment analysis of movie reviews and achieved 81.5% accuracy, illustrating that CNNs were an appropriate replacement for the then state-of-the-art methods. The authors of [ 127 ] combined SST with a Recursive Neural Tensor Network for sentiment analysis of single sentences; this model improved accuracy by 5.4% for sentence classification compared to traditional NLP models. The authors of [ 135 ] proposed a combined recurrent neural network and Transformer model for sentiment analysis. This hybrid model was tested on three different datasets (Twitter US Airline Sentiment, IMDB, and Sentiment140) and achieved F1 scores of 91%, 93%, and 90%, respectively, outperforming the state-of-the-art methods.

Santoro et al. [ 118 ] introduced a relational recurrent neural network with the capacity to classify information and perform complex reasoning based on the interactions between compartmentalized information, using a relational memory core (RMC) to handle such interactions. The model was tested for language modeling on three different datasets (GigaWord, Project Gutenberg, and WikiText-103), and its performance was compared with traditional approaches to relational reasoning over compartmentalized information. The results achieved with the RMC show improved performance.

Merity et al. [ 86 ] extended conventional word-level language models based on the Quasi-Recurrent Neural Network and LSTM to handle granularity at both the character and word level. They tuned the parameters for character-level modeling using the Penn Treebank dataset and for word-level modeling using WikiText-103. In both cases, their model outperformed the state-of-the-art methods.

Luong et al. [ 70 ] applied neural machine translation to the WMT14 dataset, translating English text to French. The model demonstrated a significant improvement of up to 2.8 BLEU (bilingual evaluation understudy) points compared to various neural machine translation systems and outperformed the commonly used MT system on the WMT14 dataset.

Fan et al. [ 41 ] introduced a gradient-based neural architecture search algorithm that automatically finds architectures with better performance than the Transformer and conventional NMT models. They tested their model on WMT14 (English-German translation), IWSLT14 (German-English translation), and WMT18 (Finnish-English translation) and achieved 30.1, 36.1, and 26.4 BLEU points, respectively, outperforming the Transformer baselines.

Wiese et al. [ 150 ] introduced a deep learning approach based on domain adaptation techniques for handling biomedical question answering tasks. Their model achieved state-of-the-art performance on biomedical question answering and outperformed the previous state-of-the-art methods in the domain.

Seunghak et al. [ 158 ] designed a Memory-Augmented Machine Comprehension Network (MAMCN) to handle the dependencies faced in reading comprehension. The model achieved state-of-the-art performance at the document level using the TriviaQA and QUASAR-T datasets and at the paragraph level using the SQuAD dataset.

Xie et al. [ 154 ] proposed a neural architecture where candidate answers and their representation learning are constituent centric, guided by a parse tree. Under this architecture, the search space of candidate answers is reduced while preserving the hierarchical, syntactic, and compositional structure among constituents. Using SQuAD, the model delivers state-of-the-art performance.

4.2 State-of-the-art models in NLP

The rationalist or symbolic approach assumes that a crucial part of the knowledge in the human mind is not derived from the senses but is fixed in advance, presumably by genetic inheritance; Noam Chomsky was the strongest advocate of this approach. It was believed that machines could be made to function like the human brain by giving them some fundamental knowledge and reasoning mechanisms: linguistic knowledge is directly encoded in rules or other forms of representation, which supports the automatic processing of natural languages [ 92 ]. Statistical and machine learning approaches instead involve algorithms that allow a program to infer patterns: an iterative process fits a model's numerical parameters during a learning phase, optimized according to a numerical measure. Machine-learning models can be predominantly categorized as either generative or discriminative. Generative methods create rich models of probability distributions, and because of this they can generate synthetic data. Discriminative methods are more practical, directly estimating posterior probabilities based on observations. Srihari [ 129 ] illustrates generative models with the task of identifying an unknown speaker’s language: matching it generatively would require deep knowledge of numerous languages, whereas discriminative methods rely on a less knowledge-intensive approach that exploits the distinctions between languages. Generative models can become troublesome when many features are used, whereas discriminative models allow the use of more features [ 38 ]. Examples of discriminative methods are logistic regression and conditional random fields (CRFs); examples of generative methods are naive Bayes classifiers and hidden Markov models (HMMs).

Naive Bayes Classifiers

Naive Bayes is a probabilistic algorithm based on probability theory and Bayes’ theorem, used to predict the tag of a text such as a news item or customer review. It calculates the probability of each tag for the given text and returns the tag with the highest probability. Bayes’ theorem is used to predict the probability of a feature based on prior knowledge of conditions that might be related to that feature. Naive Bayes classifiers are applied in NLP to usual tasks such as segmentation and translation, but they have also been explored in unusual areas such as segmentation for infant learning and identifying documents as opinions or facts. Anggraeni et al. (2019) [ 61 ] used ML and AI to create a question-and-answer system for retrieving information about hearing loss. They developed I-Chat Bot, which understands the user input, provides an appropriate response, and produces a model that can be used in the search for information about required hearing impairments. A problem with naïve Bayes is that we may end up with zero probabilities when we meet words in the test data for a certain class that are not present in the training data.
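The zero-probability problem, and its standard remedy of Laplace (add-one) smoothing, can be illustrated with a toy sketch (hypothetical counts, not from any cited system):

```python
# Zero probabilities in naive Bayes, and Laplace (add-one) smoothing.
train_spam = "win cash now win prize".split()
vocab = set(train_spam) | {"hello"}      # "hello" never seen in spam training data

def p_word_given_spam(word, alpha=0.0):
    count = train_spam.count(word)
    return (count + alpha) / (len(train_spam) + alpha * len(vocab))

print(p_word_given_spam("hello"))             # 0.0 -> the whole product collapses
print(p_word_given_spam("hello", alpha=1.0))  # small but nonzero after smoothing
```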

Hidden Markov Model (HMM)

An HMM is a system in which transitions take place between several states, generating feasible output symbols with each switch. The sets of viable states and unique symbols may be large, but they are finite and known. We can observe the outputs, but the system’s internals are hidden. Several problems can be addressed with HMMs. Inference: given a certain sequence of output symbols, compute the probabilities of one or more candidate state sequences. Pattern matching: find the state-switch sequence most likely to have generated a particular output-symbol sequence. Training: given output-symbol data, estimate the state-switch and output probabilities that fit this data best.
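For the inference problem above, the Viterbi algorithm recovers the most likely state sequence; the following compact sketch uses made-up toy probabilities purely for illustration:

```python
# Viterbi decoding: the most likely hidden state sequence for an
# observed output sequence. All probabilities are toy values.
states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"dogs": 0.5, "run": 0.1},
        "VERB": {"dogs": 0.1, "run": 0.6}}

def viterbi(words):
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start[s] * emit[s][words[0]], [s]) for s in states}
    for w in words[1:]:
        best = {s: max(((p * trans[prev][s] * emit[s][w], path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda x: x[0])
                for s in states}
    return max(best.values(), key=lambda x: x[0])

print(viterbi(["dogs", "run"]))  # -> (0.126, ['NOUN', 'VERB'])
```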

Hidden Markov models are extensively used for speech recognition, where the output sequence is matched to the sequence of individual phonemes. HMMs are not restricted to this application; they have several others, such as bioinformatics problems, for example multiple sequence alignment [ 128 ]. Sonnhammer mentioned that Pfam holds multiple alignments and hidden Markov model-based profiles (HMM-profiles) of entire protein domains. Domain boundaries, family membership, and alignments are determined semi-automatically based on expert knowledge, sequence similarity, other protein family databases, and the capability of HMM-profiles to correctly identify and align the members. HMMs may be used for a variety of NLP applications, including word prediction, sentence production, quality assurance, and intrusion detection systems [ 133 ].

Neural Network

Earlier machine learning techniques such as naïve Bayes and HMMs were predominant in NLP, but during the 2010s neural networks transformed and enhanced NLP tasks by learning multilevel features. A major use of neural networks in NLP is word embedding, where words are represented as vectors. These vectors can be used to recognize similar words by observing their closeness in the vector space; other uses of neural networks are observed in information retrieval, text summarization, text classification, machine translation, sentiment analysis, and speech recognition. Initially the focus was on feedforward [ 49 ] and CNN (convolutional neural network) architectures [ 69 ], but later researchers adopted recurrent neural networks to capture the context of a word with respect to the surrounding words of a sentence. LSTM (Long Short-Term Memory), a variant of the RNN, is used in tasks such as word prediction and sentence topic prediction [ 47 ]. To observe word arrangement in both the forward and backward directions, researchers have explored bi-directional LSTMs [ 59 ]. For machine translation, an encoder-decoder architecture is used, where the dimensionality of the input and output vectors is not known in advance. Neural networks can be used to anticipate a state that has not yet been seen, such as future states for which predictors exist, whereas HMMs predict hidden states.
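For illustration, closeness in the embedding space can be inspected with gensim’s word2vec implementation (assumed installed; the toy corpus below is far too small for meaningful vectors and serves only to show the API):

```python
# Word embeddings: train a tiny word2vec model and query nearest neighbours.
from gensim.models import Word2Vec

sentences = [["king", "rules", "the", "kingdom"],
             ["queen", "rules", "the", "kingdom"],
             ["dog", "chases", "the", "cat"]] * 50   # repeat to give it some data
model = Word2Vec(sentences, vector_size=50, min_count=1, window=2, seed=1)
print(model.wv.most_similar("king", topn=2))  # nearby words in vector space
```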

Bidirectional Encoder Representations from Transformers (BERT) is a model pre-trained on the unlabeled text of BookCorpus and English Wikipedia. It can be fine-tuned to capture context for various NLP tasks such as question answering, sentiment analysis, text classification, sentence embedding, and interpreting ambiguity in text [ 25 , 33 , 90 , 148 ]. Earlier language models examined text in only one direction, which suits sentence generation by predicting the next word, whereas the BERT model examines the text in both directions simultaneously for better language understanding. BERT provides a contextual embedding for each word present in the text, unlike context-free models (word2vec and GloVe). For example, in the sentences “he is going to the river bank for a walk” and “he is going to the bank to withdraw some money”, word2vec has a single vector representation for “bank” in both sentences, whereas BERT produces different vector representations. Muller et al. [ 90 ] used the BERT model to analyze tweets on COVID-19 content. The use of the BERT model in the legal domain was explored by Chalkidis et al. [ 20 ].
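The “bank” example can be checked directly with a pre-trained BERT model; the sketch below assumes the Hugging Face transformers and torch packages are installed and downloads bert-base-uncased on first use:

```python
# Contextual embeddings: the vector for "bank" depends on its sentence.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Return BERT's contextual vector for the token "bank" in the sentence.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

v1 = bank_vector("he is going to the river bank for a walk")
v2 = bank_vector("he is going to the bank to withdraw some money")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: context-dependent vectors
```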

Since BERT considers at most 512 tokens, a long text sequence must be divided into multiple shorter sequences of up to 512 tokens each. This is a limitation of BERT, as it cannot directly handle long text sequences.

5 Evaluation metrics and challenges

The objective of this section is to discuss the metrics used to evaluate a model’s performance, together with the challenges involved.

5.1 Evaluation metrics

Since the number of labels in most classification problems is fixed, it is easy to determine the score for each class and, as a result, the loss with respect to the ground truth. In image generation problems, the output resolution and ground truth are both fixed, so the loss can be calculated at the pixel level. In NLP, however, although the output format is predetermined, its dimensions cannot be fixed, because a single statement can be expressed in multiple ways without changing its intent and meaning. Evaluation metrics are therefore important for assessing a model’s performance, particularly when one model is used for more than one task.

BLEU (BiLingual Evaluation Understudy) score: Each word in the output sentence scores 1 if it appears in any of the reference sentences and 0 if it does not. The number of words that appear in one of the reference translations is then divided by the total number of words in the output sentence, normalizing the count so that it is always between 0 and 1. For example, suppose the ground truth is “He is playing chess in the backyard” and the output sentences are S1: “He is playing tennis in the backyard”, S2: “He is playing badminton in the backyard”, S3: “He is playing movie in the backyard”, and S4: “backyard backyard backyard backyard backyard backyard backyard”. The scores of S1, S2, and S3 would all be 6/7, even though the information in S1 and S3 is not the same. This is because BLEU assumes that every word in a sentence contributes equally to its meaning, which is not the case in real-world scenarios. Using a combination of uni-grams, bi-grams, and higher-order n-grams, we can capture the order of a sentence. We may also limit how many times each word is counted based on how many times it appears in each reference phrase, which prevents excessive repetition from being rewarded.
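The worked example above can be reproduced with a few lines computing clipped unigram precision (a sketch of the BLEU ingredient just described, not the full BLEU metric):

```python
# Clipped unigram precision: each candidate word is counted at most
# as often as it appears in the reference, so S4 scores 1/7, not 7/7.
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    cand, ref_counts = candidate.split(), Counter(reference.split())
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

ref = "He is playing chess in the backyard"
for s in ["He is playing tennis in the backyard",
          "backyard backyard backyard backyard backyard backyard backyard"]:
    print(clipped_unigram_precision(s, ref))  # 6/7 = 0.857..., then 1/7 = 0.142...
```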

GLUE (General Language Understanding Evaluation) score: Previously, NLP models were almost always built to perform effectively on a single job. Models such as LSTM and Bi-LSTM were trained solely for one task and very rarely generalized to others; a model trained for named entity recognition, for instance, could not readily perform textual entailment. GLUE is a set of datasets for training, assessing, and comparing NLP models. It includes nine diverse task datasets designed to test a model’s language understanding. To acquire a comprehensive assessment of a model’s performance, GLUE tests the model on a variety of tasks rather than a single one: single-sentence tasks, similarity and paraphrase tasks, and inference tasks are among them. For example, in sentiment analysis of customer reviews, we might be interested in analyzing ambiguous reviews and determining which product the client is referring to. A model thus obtains a good “knowledge” of language in general from generalized pre-training, and when the time comes to apply it to a given task, this universal “knowledge” gives us an advantage. With GLUE, researchers can evaluate their model and score it on all nine tasks; the final performance score is the average of those nine scores. It makes little difference how the model is built internally as long as it can analyze inputs and predict outcomes for all of the tasks.

Keeping these metrics in mind helps when evaluating the performance of an NLP model on a particular task or across a variety of tasks.

5.2 Challenges

The applications of NLP have been growing day by day, and with them new challenges keep arising despite the considerable work done in the recent past. Some of the common challenges are the following. Contextual words and phrases: the same words and phrases can have different meanings in a sentence, which is easy for humans to understand but makes a challenging task for machines. A related challenge is synonymy: humans use many different words to express the same idea, and different levels of intensity (e.g., large, huge, and big) may be used by different people, which makes it difficult to process the language and design algorithms that handle all these issues. Homonyms, words that are pronounced the same but have different definitions, are problematic for question answering and speech-to-text applications because the input is not in written form. Sentences using sarcasm and irony are sometimes understood by humans in the opposite of their literal sense, so designing models that deal with such sentences is a genuinely challenging task in NLP. Sentences with any type of ambiguity, i.e., those that can be interpreted in more than one way, are likewise an area to work on where more accuracy can be achieved. Language containing informal phrases, expressions, idioms, and culture-specific lingo makes it difficult to design models intended for broad use; having a lot of data for training and updating on a regular basis may improve the models, but it remains a challenging task to deal with words that have different meanings in different geographic areas. Similar issues occur across domains: the meaning of a word or sentence in the education industry may differ from its meaning in health, law, defense, etc. So, an NLP model may work well for an individual domain or geographic area, but for broad use these challenges need to be tackled. Together with the above-mentioned challenges, misspelled or misused words can also create problems; although autocorrect and grammar-correction applications have improved a lot due to continuous development, predicting the intention of a writer from a specific domain or geographic area while accounting for sarcasm, expressions, informal phrases, etc. is still a big challenge. There is no doubt that NLP models for the most common, widely used languages have been doing very well and are improving day by day, but there is still a need for models that serve all people rather than requiring specific knowledge of a particular language and technology. One may further refer to the work of Sharifirad and Matwin (2019) [ 123 ] for the classification of different online harassment categories and challenges, Baclic et al. (2020) [ 6 ] and Wong et al. (2018) [ 151 ] for challenges and opportunities in public health, Kang et al. (2020) [ 63 ] for a detailed literature survey and technological challenges relevant to management research and NLP, and the recent review by Alshemali and Kalita (2020) [ 3 ] and the references cited therein.

In the recent past, models combining visual commonsense reasoning [ 31 ] and NLP have also been getting the attention of several researchers, and this seems a promising and challenging area to work on. These models try to extract information from an image or video using a visual reasoning paradigm, so that, as humans do, they can infer from a given image or video more than what is visually obvious, such as objects’ functions, people’s intents, and mental states. In this direction, Wen and Peng (2020) [ 149 ] recently suggested a model that captures knowledge from different perspectives and perceives common sense in advance; the results of experiments on the visual commonsense reasoning dataset VCR appear very satisfactory and effective. The work of Peng and Chi (2019) [ 102 ], which proposes a Domain Adaptation with Scene Graph approach to transfer knowledge from a source domain with the objective of improving cross-media retrieval in the target domain, and that of Yen et al. (2019) [ 155 ], are also very useful for further exploring the use of NLP in its relevant domains.

6 Conclusion

This paper was written with three objectives. The first objective is to give insights into the various important terminologies of NLP and NLG; this can be useful for readers interested in starting their early career in NLP and work relevant to its applications. The second objective focuses on the history, applications, and recent developments in the field of NLP. The third objective is to discuss the datasets, approaches, and evaluation metrics used in NLP. The relevant work in the existing literature, with its findings, and some of the important applications and projects in NLP are also discussed in the paper. The last two objectives may serve as a literature survey for readers already working in NLP and relevant fields, and may further motivate them to explore the fields mentioned in this paper. It should be noted that even though a great amount of work on natural language processing is available in literature surveys (one may refer to [ 15 , 32 , 63 , 98 , 133 , 151 ], each focusing on one domain, such as deep-learning techniques in NLP, techniques used for email spam filtering, medication safety, management research, intrusion detection, and the Gujarati language), there is still not much work on regional languages, which can be the focus of future research.

Change history

25 July 2022

Affiliation 3 has been added into the online PDF.

Ahonen H, Heinonen O, Klemettinen M, Verkamo AI (1998) Applying data mining techniques for descriptive phrase extraction in digital document collections. In research and technology advances in digital libraries, 1998. ADL 98. Proceedings. IEEE international forum on (pp. 2-11). IEEE

Alshawi H (1992) The core language engine. MIT press

Alshemali B, Kalita J (2020) Improving the reliability of deep neural networks in NLP: A review. Knowl-Based Syst 191:105210


Andreev ND (1967) The intermediary language as the focal point of machine translation. In: Booth AD (ed) Machine translation. North Holland Publishing Company, Amsterdam, pp 3–27


Androutsopoulos I, Paliouras G, Karkaletsis V, Sakkis G, Spyropoulos CD, Stamatopoulos P (2000) Learning to filter spam e-mail: A comparison of a naive bayesian and a memory-based approach. arXiv preprint cs/0009009

Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J (2020) Artificial intelligence in public health: challenges and opportunities for public health made possible by advances in natural language processing. Can Commun Dis Rep 46(6):161

Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In ICLR 2015

Bangalore S, Rambow O, Whittaker S (2000) Evaluation metrics for generation. In proceedings of the first international conference on natural language generation-volume 14 (pp. 1-8). Assoc Comput Linguist

Baud RH, Rassinoux AM, Scherrer JR (1991) Knowledge representation of discharge summaries. In AIME 91 (pp. 173–182). Springer, Berlin Heidelberg

Baud RH, Rassinoux AM, Scherrer JR (1992) Natural language processing and semantical representation of medical texts. Methods Inf Med 31(2):117–125

Baud RH, Alpay L, Lovis C (1994) Let’s meet the users with natural language understanding. Knowledge and Decisions in Health Telematics: The Next Decade 12:103

Bengio Y, Ducharme R, Vincent P (2001) A neural probabilistic language model. Proceedings of NIPS

Benson E, Haghighi A, Barzilay R (2011) Event discovery in social media feeds. In proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies-volume 1 (pp. 389-398). Assoc Comput Linguist

Berger AL, Della Pietra SA, Della Pietra VJ (1996) A maximum entropy approach to natural language processing. Computational Linguistics 22(1):39–71

Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29(1):63–92

Bondale N, Maloor P, Vaidyanathan A, Sengupta S, Rao PV (1999) Extraction of information from open-ended questionnaires using natural language processing techniques. Computer Science and Informatics 29(2):15–22

Borst F, Sager N, Nhàn NT, Su Y, Lyman M, Tick LJ, ..., Scherrer JR (1989) Analyse automatique de comptes rendus d'hospitalisation. In: Degoulet P, Stephan JC, Venot A, Yvon PJ (eds) Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris, pp 246–56

Briscoe EJ, Grover C, Boguraev B, Carroll J (1987) A formalism and environment for the development of a large grammar of English. IJCAI 87:703–708

Carreras X, Marquez L (2001) Boosting trees for anti-spam email filtering. arXiv preprint cs/0109015

Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: the muppets straight out of law school. arXiv preprint arXiv:2010.02559

Chi EC, Lyman MS, Sager N, Friedman C, Macleod C (1985) A database of computer-structured narrative: methods of computing complex relations. In proceedings of the annual symposium on computer application in medical care (p. 221). Am Med Inform Assoc

Cho K, Van Merriënboer B, Bahdanau D, Bengio Y, (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259

Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge, Massachusetts

Choudhary N (2021) LDC-IL: the Indian repository of resources for language technology. Lang Resources & Evaluation 55:855–867. https://doi.org/10.1007/s10579-020-09523-3

Chouikhi H, Chniter H, Jarray F (2021) Arabic sentiment analysis using BERT model. In international conference on computational collective intelligence (pp. 621-632). Springer, Cham

Chung J, Gulcehre C, Cho K, Bengio Y, (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555

Cohen WW (1996) Learning rules that classify e-mail. In AAAI spring symposium on machine learning in information access (Vol. 18, p. 25)

Cohen PR, Morgan J, Ramsay AM (2002) Intention in communication, Am J Psychol 104(4)

Collobert R, Weston J (2008) A unified architecture for natural language processing. In proceedings of the 25th international conference on machine learning (pp. 160–167)

Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R, (2019) Transformer-xl: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860

Davis E, Marcus G (2015) Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun ACM 58(9):92–103

Desai NP, Dabhi VK (2022) Resources and components for Gujarati NLP systems: a survey. Artif Intell Rev:1–19

Devlin J, Chang MW, Lee K, Toutanova K, (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

Diab M, Hacioglu K, Jurafsky D (2004) Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proceedings of HLT-NAACL 2004: Short papers (pp. 149–152). Assoc Computat Linguist

Doddington G (2002) Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In proceedings of the second international conference on human language technology research (pp. 138-145). Morgan Kaufmann publishers Inc

Drucker H, Wu D, Vapnik VN (1999) Support vector machines for spam categorization. IEEE Trans Neural Netw 10(5):1048–1054

Dunlavy DM, O’Leary DP, Conroy JM, Schlesinger JD (2007) QCS: A system for querying, clustering and summarizing documents. Inf Process Manag 43(6):1588–1605

Elkan C (2008) Log-linear models and conditional random fields. http://cseweb.ucsd.edu/welkan/250B/cikmtutorial.pdf. Accessed 28 Jun 2017

Emele MC, Dorna M (1998) Ambiguity preserving machine translation using packed representations. In proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics-volume 1 (pp. 365-371). Association for Computational Linguistics

Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT Summit 2005

Fan Y, Tian F, Xia Y, Qin T, Li XY, Liu TY (2020) Searching better architectures for neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28:1574–1585

Fang H, Lu W, Wu F, Zhang Y, Shang X, Shao J, Zhuang Y (2015) Topic aspect-oriented summarization via group selection. Neurocomputing 149:1613–1619

Fattah MA, Ren F (2009) GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Comput Speech Lang 23(1):126–144

Feldman S (1999) NLP meets the jabberwocky: natural language processing in information retrieval. Online-Weston Then Wilton 23:62–73

Friedman C, Cimino JJ, Johnson SB (1993) A conceptual model for clinical radiology reports. In proceedings of the annual symposium on computer application in medical care (p. 829). Am Med Inform Assoc

Gao T, Dontcheva M, Adar E, Liu Z, Karahalios K DataTone: managing ambiguity in natural language interfaces for data visualization, UIST ‘15: proceedings of the 28th annual ACM symposium on User Interface Software & Technology, November 2015, 489–500, https://doi.org/10.1145/2807442.2807478

Ghosh S, Vinyals O, Strope B, Roy S, Dean T, Heck L (2016) Contextual lstm (clstm) models for large scale nlp tasks. arXiv preprint arXiv:1602.06291

Glasgow B, Mandell A, Binney D, Ghemri L, Fisher D (1998) MITA: an information-extraction approach to the analysis of free-form text in life insurance applications. AI Mag 19(1):59

Goldberg Y (2017) Neural network methods for natural language processing. Synthesis lectures on human language technologies 10(1):1–309

Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 19-25). ACM

Green BF Jr, Wolf AK, Chomsky C, Laughery K (1961) Baseball: an automatic question-answerer. In: Papers presented at the May 9-11, 1961, western joint IRE-AIEE-ACM computer conference (pp. 219-224). ACM

Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2016) LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems 28(10):2222–2232


Grishman R, Sager N, Raze C, Bookchin B (1973) The linguistic string parser. In proceedings of the June 4-8, 1973, national computer conference and exposition (pp. 427-434). ACM

Hayes PJ (1992) Intelligent high-volume text processing using shallow, domain-specific techniques. Text-based intelligent systems: current research and practice in information extraction and retrieval, 227-242.

Hendrix GG, Sacerdoti ED, Sagalowicz D, Slocum J (1978) Developing a natural language interface to complex data. ACM Transactions on Database Systems (TODS) 3(2):105–147

"Here’s Why Natural Language Processing is the Future of BI (2017) " SmartData Collective. N.p., n.d. Web. 19

Hirschman L, Grishman R, Sager N (1976) From text to structured information: automatic processing of medical reports. In proceedings of the June 7-10, 1976, national computer conference and exposition (pp. 267-275). ACM

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991

Hutchins WJ (1986) Machine translation: past, present, future (p. 66). Ellis Horwood, Chichester

Jurafsky D, Martin J (2008) H. Speech and language processing. 2nd edn. Prentice-Hall, Englewood Cliffs, NJ

Kamp H, Reyle U (1993) Tense and aspect. In from discourse to logic (pp. 483-689). Springer Netherlands

Kang Y, Cai Z, Tan CW, Huang Q, Liu H (2020) Natural language processing (NLP) in management research: A literature review. Journal of Management Analytics 7(2):139–172

Kim Y. (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

Knight K, Langkilde I (2000) Preserving ambiguities in generation via automata intersection. In AAAI/IAAI (pp. 697-702)

Lass R (1998) Phonology: An Introduction to Basic Concepts. Cambridge, UK; New York; Melbourne, Australia: Cambridge University Press. p. 1. ISBN 978–0–521-23728-4. Retrieved 8 January 2011Paperback ISBN 0–521–28183-0

Lewis DD (1998) Naive (Bayes) at forty: The independence assumption in information retrieval. In European conference on machine learning (pp. 4–15). Springer, Berlin Heidelberg

Liddy ED (2001). Natural language processing

Lopez MM, Kalita J (2017) Deep learning applied to NLP. arXiv preprint arXiv:1703.03091

Luong MT, Sutskever I, Le Q V, Vinyals O, Zaremba W (2014) Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206

Lyman M, Sager N, Friedman C, Chi E (1985) Computer-structured narrative in ambulatory care: its use in longitudinal review of clinical data. In proceedings of the annual symposium on computer application in medical care (p. 82). Am Med Inform Assoc

Lyman M, Sager N, Chi EC, Tick LJ, Nhan NT, Su Y, ..., Scherrer, J. (1989) Medical Language Processing for Knowledge Representation and Retrievals. In Proceedings. Symposium on Computer Applications in Medical Care (pp. 548–553). Am Med Inform Assoc

Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (pp. 142-150)

Mani I, Maybury MT (eds) (1999) Advances in automatic text summarization, vol 293. MIT press, Cambridge, MA

Manning CD, Schütze H (1999) Foundations of statistical natural language processing, vol 999. MIT press, Cambridge

MATH   Google Scholar  

Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of english: the penn treebank. Comput Linguist 19(2):313–330

McCallum A, Nigam K (1998) A comparison of event models for naive bayes text classification. In AAAI-98 workshop on learning for text categorization (Vol. 752, pp. 41-48)

McCray AT (1991) Natural language processing for intelligent information retrieval. In Engineering in Medicine and Biology Society, 1991. Vol. 13: 1991., Proceedings of the Annual International Conference of the IEEE (pp. 1160–1161). IEEE

McCray AT (1991) Extending a natural language parser with UMLS knowledge. In proceedings of the annual symposium on computer application in medical care (p. 194). Am Med Inform Assoc

McCray AT, Nelson SJ (1995) The representation of meaning in the UMLS. Methods Inf Med 34(1–2):193–201

McCray AT, Razi A (1994) The UMLS knowledge source server. Medinfo MedInfo 8:144–147

McCray AT, Srinivasan S, Browne AC (1994) Lexical methods for managing variation in biomedical terminologies. In proceedings of the annual symposium on computer application in medical care (p. 235). Am Med Inform Assoc

McDonald R, Crammer K, Pereira F (2005) Flexible text segmentation with structured multilabel classification. In proceedings of the conference on human language technology and empirical methods in natural language processing (pp. 987-994). Assoc Comput Linguist

McGray AT, Sponsler JL, Brylawski B, Browne AC (1987) The role of lexical knowledge in biomedical text understanding. In proceedings of the annual symposium on computer application in medical care (p. 103). Am Med Inform Assoc

McKeown KR (1985) Text generation. Cambridge University Press, Cambridge

Book   Google Scholar  

Merity S, Keskar NS, Socher R (2018) An analysis of neural language modeling at multiple scales. arXiv preprint arXiv:1803.08240

Mikolov T, Chen K, Corrado G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems

Morel-Guillemaz AM, Baud RH, Scherrer JR (1990) Proximity processing of medical text. In medical informatics Europe’90 (pp. 625–630). Springer, Berlin Heidelberg

Morin E (1999) Automatic acquisition of semantic relations between terms from technical corpora. In proc. of the fifth international congress on terminology and knowledge engineering-TKE’99

Müller M, Salathé M, Kummervold PE (2020) Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter. arXiv preprint arXiv:2005.07503

"Natural Language Processing (2017) " Natural Language Processing RSS. N.p., n.d. Web. 25

"Natural Language Processing" (2017) Natural Language Processing RSS. N.p., n.d. Web. 23

Newatia R (2019) https://medium.com/saarthi-ai/sentence-classification-using-convolutional-neural-networks-ddad72c7048c . Accessed 15 Dec 2021

Nhàn NT, Sager N, Lyman M, Tick LJ, Borst F, Su Y (1989) A medical language processor for two indo-European languages. In proceedings. Symposium on computer applications in medical care (pp. 554-558). Am Med Inform Assoc

Nießen S, Och FJ, Leusch G, Ney H (2000) An evaluation tool for machine translation: fast evaluation for MT research. In LREC

Ochoa, A. (2016). Meet the Pilot: Smart Earpiece Language Translator. https://www.indiegogo.com/projects/meet-the-pilot-smart-earpiece-language-translator-headphones-travel . Accessed April 10, 2017

Ogallo, W., & Kanter, A. S. (2017). Using natural language processing and network analysis to develop a conceptual framework for medication therapy management research. https://www.ncbi.nlm.nih.gov/pubmed/28269895?dopt=Abstract . Accessed April 10, 2017

Otter DW, Medina JR, Kalita JK (2020) A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems 32(2):604–624

Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237

Palmer M, Gildea D, Kingsbury P (2005) The proposition bank: an annotated corpus of semantic roles. Computational linguistics 31(1):71–106

Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In proceedings of the 40th annual meeting on association for computational linguistics (pp. 311-318). Assoc Comput Linguist

Peng Y, Chi J (2019) Unsupervised cross-media retrieval using domain adaptation with scene graph. IEEE Transactions on Circuits and Systems for Video Technology 30(11):4368–4379

Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137

Rae JW, Potapenko A, Jayakumar SM, Lillicrap TP, (2019) Compressive transformers for long-range sequence modelling. arXiv preprint arXiv:1911.05507

Ranjan P, Basu HVSSA (2003) Part of speech tagging and local word grouping techniques for natural language parsing in Hindi. In Proceedings of the 1st International Conference on Natural Language Processing (ICON 2003)

Rassinoux AM, Baud RH, Scherrer JR (1992) Conceptual graphs model extension for knowledge representation of medical texts. MEDINFO 92:1368–1374

Rassinoux AM, Michel PA, Juge C, Baud R, Scherrer JR (1994) Natural language processing of medical texts within the HELIOS environment. Comput Methods Prog Biomed 45:S79–S96

Rassinoux AM, Juge C, Michel PA, Baud RH, Lemaitre D, Jean FC, Scherrer JR (1995) Analysis of medical jargon: The RECIT system. In Conference on Artificial Intelligence in Medicine in Europe (pp. 42–52). Springer, Berlin Heidelberg

Rennie J (2000) ifile: An application of machine learning to e-mail filtering. In Proc. KDD 2000 Workshop on text mining, Boston, MA

Riedhammer K, Favre B, Hakkani-Tür D (2010) Long story short–global unsupervised models for keyphrase based meeting summarization. Speech Comm 52(10):801–815

Ritter A, Clark S, Etzioni O (2011) Named entity recognition in tweets: an experimental study. In proceedings of the conference on empirical methods in natural language processing (pp. 1524-1534). Assoc Comput Linguist

Rospocher M, van Erp M, Vossen P, Fokkens A, Aldabe I, Rigau G, Soroa A, Ploeger T, Bogaard T(2016) Building event-centric knowledge graphs from news. Web Semantics: Science, Services and Agents on the World Wide Web, In Press

Sager N, Lyman M, Tick LJ, Borst F, Nhan NT, Revillard C, … Scherrer JR (1989) Adapting a medical language processor from English to French. Medinfo 89:795–799

Sager N, Lyman M, Nhan NT, Tick LJ (1995) Medical language processing: applications to patient data representation and automatic encoding. Methods Inf Med 34(1–2):140–146

Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian approach to filtering junk e-mail. In learning for text categorization: papers from the 1998 workshop (Vol. 62, pp. 98-105)

Sakkis G, Androutsopoulos I, Paliouras G, Karkaletsis V, Spyropoulos CD, Stamatopoulos P (2001) Stacking classifiers for anti-spam filtering of e-mail. arXiv preprint cs/0106040

Sakkis G, Androutsopoulos I, Paliouras G et al (2003) A memory-based approach to anti-spam filtering for mailing lists. Inf Retr 6:49–73. https://doi.org/10.1023/A:1022948414856

Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, ..., Lillicrap T (2018) Relational recurrent neural networks. Adv Neural Inf Proces Syst, 31

Scherrer JR, Revillard C, Borst F, Berthoud M, Lovis C (1994) Medical office automation integrated into the distributed architecture of a hospital information system. Methods Inf Med 33(2):174–179

Seal D, Roy UK, Basak R (2020) Sentence-level emotion detection from text based on semantic rules. In: Tuba M, Akashe S, Joshi A (eds) Information and communication Technology for Sustainable Development. Advances in intelligent Systems and computing, vol 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_42

Chapter   Google Scholar  

Sentiraama Corpus by Gangula Rama Rohit Reddy, Radhika Mamidi. Language Technologies Research Centre, KCIS, IIIT Hyderabad (n.d.) ltrc.iiit.ac.in/showfile.php?filename=downloads/sentiraama/

Sha F, Pereira F (2003) Shallow parsing with conditional random fields. In proceedings of the 2003 conference of the north American chapter of the Association for Computational Linguistics on human language technology-volume 1 (pp. 134-141). Assoc Comput Linguist

Sharifirad S, Matwin S, (2019) When a tweet is actually sexist. A more comprehensive classification of different online harassment categories and the challenges in NLP. arXiv preprint arXiv:1902.10584

Sharma S, Srinivas PYKL, Balabantaray RC (2016) Emotion Detection using Online Machine Learning Method and TLBO on Mixed Script. In Proceedings of Language Resources and Evaluation Conference 2016 (pp. 47–51)

Shemtov H (1997) Ambiguity management in natural language generation. Stanford University

Small SL, Cortell GW, Tanenhaus MK (1988) Lexical Ambiguity Resolutions. Morgan Kauffman, San Mateo, CA

Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642)

Sonnhammer EL, Eddy SR, Birney E, Bateman A, Durbin R (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res 26(1):320–322

Srihari S (2010) Machine Learning: Generative and Discriminative Models. http://www.cedar.buffalo.edu/wsrihari/CSE574/Discriminative-Generative.pdf . accessed 31 May 2017.]

Sun X, Morency LP, Okanohara D, Tsujii JI (2008) Modeling latent-dynamic in shallow parsing: a latent conditional model with improved inference. In proceedings of the 22nd international conference on computational linguistics-volume 1 (pp. 841-848). Assoc Comput Linguist

Sundheim BM, Chinchor NA (1993) Survey of the message understanding conferences. In proceedings of the workshop on human language technology (pp. 56-60). Assoc Comput Linguist

Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems

Sworna ZT, Mousavi Z, Babar MA (2022) NLP methods in host-based intrusion detection Systems: A systematic review and future directions. arXiv preprint arXiv:2201.08066

Systems RAVN (2017) "RAVN Systems Launch the ACE Powered GDPR Robot - Artificial Intelligence to Expedite GDPR Compliance." Stock Market. PR Newswire, n.d. Web. 19

Tan KL, Lee CP, Anbananthen KSM, Lim KM (2022) RoBERTa-LSTM: A hybrid model for sentiment analysis with transformers and recurrent neural network. IEEE Access, RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

Tapaswi N, Jain S (2012) Treebank based deep grammar acquisition and part-of-speech tagging for Sanskrit sentences. In software engineering (CONSEG), 2012 CSI sixth international conference on (pp. 1-4). IEEE

Thomas C (2019)  https://towardsdatascience.com/recurrent-neural-networks-and-natural-language-processing-73af640c2aa1 . Accessed 15 Dec 2021

Tillmann C, Vogel S, Ney H, Zubiaga A, Sawaf H (1997) Accelerated DP based search for statistical translation. In Eurospeech

Umber A, Bajwa I (2011) “Minimizing ambiguity in natural language software requirements specification,” in Sixth Int Conf Digit Inf Manag, pp. 102–107

"Using Natural Language Processing and Network Analysis to Develop a Conceptual Framework for Medication Therapy Management Research (2017) " AMIA ... Annual Symposium proceedings. AMIA Symposium. U.S. National Library of Medicine, n.d. Web. 19

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I, (2017) Attention is all you need. In advances in neural information processing systems (pp. 5998-6008)

Wahlster W, Kobsa A (1989) User models in dialog systems. In user models in dialog systems (pp. 4–34). Springer Berlin Heidelberg, User Models in Dialog Systems

Walton D (1996) A pragmatic synthesis. In: fallacies arising from ambiguity. Applied logic series, vol 1. Springer, Dordrecht)

Wan X (2008) Using only cross-document relationships for both generic and topic-focused multi-document summarizations. Inf Retr 11(1):25–49

Wang W, Gang J, 2018 Application of convolutional neural network in natural language processing. In 2018 international conference on information Systems and computer aided education (ICISCAE) (pp. 64-70). IEEE

Wang D, Zhu S, Li T, Gong Y (2009) Multi-document summarization using sentence-based topic models. In proceedings of the ACL-IJCNLP 2009 conference short papers (pp. 297-300). Assoc Comput Linguist

Wang D, Zhu S, Li T, Chi Y, Gong Y (2011) Integrating document clustering and multidocument summarization. ACM Transactions on Knowledge Discovery from Data (TKDD) 5(3):14–26

Wang Z, Ng P, Ma X, Nallapati R, Xiang B (2019) Multi-passage bert: A globally normalized bert model for open-domain question answering. arXiv preprint arXiv:1908.08167

Wen Z, Peng Y (2020) Multi-level knowledge injecting for visual commonsense reasoning. IEEE Transactions on Circuits and Systems for Video Technology 31(3):1042–1054

Wiese G, Weissenborn D, Neves M (2017) Neural domain adaptation for biomedical question answering. arXiv preprint arXiv:1706.03610

Wong A, Plasek JM, Montecalvo SP, Zhou L (2018) Natural language processing and its implications for the future of medication safety: a narrative review of recent advances and challenges. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy 38(8):822–841

Woods WA (1978) Semantics and quantification in natural language question answering. Adv Comput 17:1–87

Xia T (2020) A constant time complexity spam detection algorithm for boosting throughput on rule-based filtering Systems. IEEE Access 8:82653–82661. https://doi.org/10.1109/ACCESS.2020.2991328

Xie P, Xing E (2017) A constituent-centric neural architecture for reading comprehension. In proceedings of the 55th annual meeting of the Association for Computational Linguistics (volume 1: long papers) (pp. 1405-1414)

Yan X, Ye Y, Mao Y, Yu H (2019) Shared-private information bottleneck method for cross-modal clustering. IEEE Access 7:36045–36056

Yi J, Nasukawa T, Bunescu R, Niblack W (2003) Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques. In data mining, 2003. ICDM 2003. Third IEEE international conference on (pp. 427-434). IEEE

Young SJ, Chase LL (1998) Speech recognition evaluation: a review of the US CSR and LVCSR programmes. Comput Speech Lang 12(4):263–279

Yu S, et al. (2018) "A multi-stage memory augmented neural network for machine reading comprehension." Proceedings of the workshop on machine reading for question answering

Zajic DM, Dorr BJ, Lin J (2008) Single-document and multi-document summarization techniques for email threads using sentence compression. Inf Process Manag 44(4):1600–1610

Zeroual I, Lakhouaja A, Belahbib R (2017) Towards a standard part of speech tagset for the Arabic language. J King Saud Univ Comput Inf Sci 29(2):171–178

Download references

Acknowledgements

The authors would like to express their gratitude to the Research Mentors from CL Educate: Accendere Knowledge Management Services Pvt. Ltd. for their comments on earlier versions of the manuscript; any remaining errors are our own and should not tarnish the reputations of these esteemed persons. We would also like to thank the Editor, Associate Editor, and anonymous referees for their constructive suggestions, which led to many improvements in an earlier version of this manuscript.

Author information

Authors and Affiliations

Department of Computer Science, Manav Rachna International Institute of Research and Studies, Faridabad, India

Diksha Khurana & Aditya Koli

Department of Computer Science, BML Munjal University, Gurgaon, India

Kiran Khatter

Department of Statistics, Amity University Punjab, Mohali, India

Sukhdev Singh


Corresponding author

Correspondence to Kiran Khatter.

Ethics declarations

Conflict of interest

The first draft of this paper was written under the supervision of Dr. Kiran Khatter and Dr. Sukhdev Singh, who are associated with CL Educate: Accendere Knowledge Management Services Pvt. Ltd. and deputed at Manav Rachna International University. The draft is also available on arXiv at https://arxiv.org/abs/1708.05148

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Khurana, D., Koli, A., Khatter, K. et al. Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl 82, 3713–3744 (2023). https://doi.org/10.1007/s11042-022-13428-4


Received: 03 February 2021

Revised: 23 March 2022

Accepted: 02 July 2022

Published: 14 July 2022

Issue Date: January 2023

DOI: https://doi.org/10.1007/s11042-022-13428-4

Keywords

  • Natural language processing
  • Natural language understanding
  • Natural language generation
  • NLP applications
  • NLP evaluation metrics

Flitto DataLab


Four innovative NLP research papers in ACL 2023


Part 3 of the ACL 2023 Review Series: NLP research papers on recent findings and suggestions

The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) showcased numerous outstanding NLP research papers from the first half of this year. In the first two installments of this series, we looked at some widely recognized papers and ones that reflect where the technology stands today.

Concluding the series, Flitto DataLab highlights several NLP research papers that particularly caught our interest. These studies offer an innovative perspective on where the technology can head next.

Causes and Cures for Interference in Multilingual Translation

Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy and Shruti Bhosale

This research has direct implications for any language translation service.

In his opening, presenter Uri Shaham explained how multilingual translation models can benefit from synergy, or suffer from interference, between language pairs. For example, training an MT model to translate English to Finnish may also improve its Estonian translation while degrading its Chinese translation.

Uri Shaham and co-authors present their NLP research paper

Shaham led his team in examining the major factors that aggravate interference in multilingual translation, along with mitigation strategies. They found that language similarity and the number of languages are not major contributors to the model’s loss; what matters instead is model size, the amount of data for the language pair of interest, and the amount of data for the other languages.


For this NLP research paper, Shaham’s team experimented with four Transformer variants of different sizes and parameter counts. They trained the models on datasets spanning 15 languages, then measured how much each language pair’s performance degraded due to interference.

They reported only a slight performance difference between English -> French and English -> Russian when training alongside 15.2 million English-Spanish samples, showing that the degree of similarity between languages was not the decisive factor for interference.

Meanwhile, limited data size and model size tended to cause severe interference: the smallest model performed worst because of its restricted parameter count. Shaham recommended tuning the sampling rate to minimize interference loss, particularly when training with a low-resource language.
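To make "interference" concrete, here is a minimal sketch (our illustration, not the paper's code) of one natural way to quantify it: the relative change in a language pair's test loss under multilingual versus bilingual training.

```python
# Sketch: interference as relative loss change (multilingual vs. bilingual).
# The numbers below are invented for illustration.

def interference(multilingual_loss: float, bilingual_loss: float) -> float:
    """Positive values mean the pair degraded under multilingual training."""
    return (multilingual_loss - bilingual_loss) / bilingual_loss

print(interference(2.10, 2.05))  # mild interference, ~0.024
print(interference(2.60, 2.30))  # severe interference, ~0.130
```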

Towards Speech Dialogue Translation Mediating Speakers of Different Languages

Shuichiro Shimizu, Chenhui Chu, Sheng Li and Sadao Kurohashi

Shuichiro Shimizu drew our attention to multilingual speech dialogue translation (SDT), an NLP discipline less developed than monologue translation.

Speech dialogue translation (SDT) combines automatic speech recognition with machine translation so that listeners can follow speech in their native language. For example, audio translation software incorporating SDT lets one person say “Konnichiwa” and another read “Hello,” its English translation.
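The two-stage pipeline can be sketched in a few lines; the lookup tables below are hypothetical stand-ins for real ASR and MT models.

```python
# Hypothetical SDT pipeline sketch: speech recognition feeds machine translation.
ASR_TOY = {b"<ja-audio>": "Konnichiwa"}          # audio -> transcript (toy ASR)
MT_TOY = {("ja", "en", "Konnichiwa"): "Hello"}   # (src, tgt, text) -> translation

def speech_dialogue_translate(audio: bytes, source: str, target: str) -> str:
    transcript = ASR_TOY[audio]                  # step 1: recognize the speech
    return MT_TOY[(source, target, transcript)]  # step 2: translate the transcript

print(speech_dialogue_translate(b"<ja-audio>", "ja", "en"))  # -> Hello
```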

NLP research papers in the session included Shuichiro Shimizu's research on SDT

To support further development in this discipline, Shimizu and his fellow researchers created the SpeechBSD dataset, curating crowdsourced audio from different platforms and taking both monolingual and bilingual contexts into consideration.


Shimizu highlighted the importance of the model’s ability to infer the speaker’s pronouns from the spoken dialogue. Subsequent experiments confirmed that applying bilingual context in the dataset improves the model’s performance when translating speech dialogue.

Towards Understanding Omission in Dialogue Summarization

Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li and Tao Gui

Pre-trained large language models play an increasingly important role in dialogue summarization, which aims to extract essential information in customer service, medical consultations, meetings, and other use cases. However, such models occasionally omit specific facts, leading to inaccurate summaries that limit their practicality in real-life applications.

Yicheng Zou presents his NLP research paper on omission in dialogue summarization

In his presentation, researcher Yicheng Zou highlighted the lack of effort to address the omission issues affecting language models. He compared several models across five domains to gauge the severity of the problem, noting that even established models like BART exhibit substantial omission issues.

Zou proposed a specific task to enable a model to identify and overcome omissions: presented with several candidate summaries, the model must predict which key information has been omitted. With this principle, the researchers created OLDS, a high-quality annotated dataset that trains a model to detect omission.


OLDS was constructed from summaries of datasets in different domains and consists of candidate summaries generated by different language models. Zou then tested the dataset with BERT and RoBERTa on several baseline tasks, including text matching, extractive summarization, and question answering. The results reaffirm Zou’s hypothesis that large language models may omit key points when summarizing, but he further showed that refining a model with the omitted content can improve its performance.
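As a rough illustration of the detection task (a toy heuristic of ours, not the learned detectors benchmarked on OLDS), one can flag reference sentences whose content a candidate summary fails to cover:

```python
# Toy omission detector: flag reference sentences poorly covered by a candidate.
# The word-overlap measure and threshold are illustrative stand-ins.

def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def omitted_sentences(reference: list[str], candidate: str, thresh: float = 0.5):
    return [s for s in reference if word_overlap(s, candidate) < thresh]

reference = ["The client asked for a refund.", "The agent escalated the case."]
candidate = "The client requested a refund."
print(omitted_sentences(reference, candidate))
# -> ['The agent escalated the case.']
```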

Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization

Rongxin Zhu, Jianzhong Qi and Jey Han Lau

In this session, Ph.D. student Rongxin Zhu shed light on the complexities of dialogue summarization. His paper discusses the challenges summarization models face in detecting fine-grained factual errors, which lead a model to misrepresent or fabricate facts from source conversations.

Rongxin Zhu's presentation screen for his NLP research paper

According to the speaker, there is a lack of datasets that could effectively benchmark a model’s ability to detect factual errors in dialogue. Specifically, Zhu pointed out that current datasets cannot answer pressing questions, such as:

  • Where the error appears
  • Why the error exists 
  • What the types of errors are

To support better factual error analysis in summarization models, Zhu’s team developed DiaSumFact, an annotated dataset covering various fine-grained factual errors, including entity, predicate, and circumstance errors. They then tested several models against the dataset, revealing questionable performance marked by both intrinsic and extrinsic factual errors.


Zhu also proposed EnDeRanker, an encoder-decoder model that scores and ranks spans of probable error to enable better prediction. It performs comparably to two state-of-the-art models that excel at detecting entity errors. Still, all models have major room for improvement, as none demonstrated impressive performance across every task.

Wrapping up the NLP research papers of 2023 so far…

ACL 2023 revealed remarkable insights and advancements in the 2023 NLP research landscape that continue to shape the future of language technology.

As we wrap up our exploration of these notable research papers, it’s clear that the field is evolving at an unprecedented pace. We’re excited about the possibilities AI-driven language understanding and communication will enable; with further progress, technologies like speech dialogue translation could truly revolutionize the field.

Stay tuned as we bring you further updates on the ever-evolving realm of NLP innovation.

By Flitto DataLab




Spot Intelligence


Top 10 Natural Language Processing (NLP) Research Papers Worth Reading For Beginners

by Neri Van Otten | Feb 7, 2023 | Data Science , Natural Language Processing

Reading research papers is integral to staying current and advancing in the field of NLP. Research papers are the primary way to share new ideas, discoveries, and innovations in NLP; they give more detailed and technical explanations of NLP concepts and techniques; and they provide benchmark results for different models and methods, which help practitioners and researchers make informed decisions about which models and techniques to use for a specific task.


Getting started with reading research papers in NLP can seem daunting, but with the right approach it can be a valuable and rewarding experience. This article provides tips for reading research papers and a top-10 list of papers to get you started.

Learning NLP from research papers is one of the best things you can do to improve your understanding.

Why read research papers in NLP?

Reading research papers is vital in the field of natural language processing (NLP) and other related fields for several reasons:

  • Advancement of knowledge: Research papers are the primary means of disseminating new ideas, findings, and innovations in NLP and other related fields. Reading research papers allows practitioners and researchers to stay up-to-date with the latest advancements.
  • A better understanding of NLP: Research papers often give a more detailed and technical explanation of NLP concepts and techniques, which can help practitioners and researchers learn more about the field.
  • Inspiration for new ideas: Reading research papers can inspire new ideas and approaches to NLP problems, leading to breakthroughs and innovations.
  • Benchmarking performance: Research papers often present the results of experiments and benchmarks, which can be used to compare the performance of different NLP models and techniques. This can help practitioners and researchers make informed decisions about which models and techniques to use for a specific task.
  • Collaboration and networking: Reading research papers can also help practitioners and researchers build connections with others in the field and find potential collaborators for future projects.

Reading research papers is one of the best ways to stay up-to-date and progress in the field of NLP and other related fields.

How to get started reading research papers in NLP?

Here are some tips for getting started reading research papers in NLP and other related fields:

  • Choose a specific area of interest: NLP is a broad field with many subfields, so it’s helpful to focus on a particular area of interest, such as machine translation, sentiment analysis, or question answering. This will help you narrow down the list of papers to read and make it easier to understand the context and significance of each paper.
  • Start with survey papers: Survey papers provide an overview of the current state-of-the-art in a specific subfield of NLP and can be a great starting point for getting up to speed. They often summarise important papers, concepts, and techniques in the field.
  • Read the abstract and introduction first: Before diving into the details of a paper, start by reading the abstract and introduction. These sections provide a high-level overview of the paper’s contribution and the context in which it was written.
  • Focus on the methodology: The methodology section is often the most important part of an NLP paper. It describes the techniques and models used and how they were evaluated. Make sure to understand the methodology before diving into the results.
  • Take notes and summarize the key points: While reading, take notes and summarize the key points of each paper. This will help you remember the most important information and make it easier to compare and contrast different papers.
  • Be bold and ask for help: If you have questions or trouble understanding a paper, feel free to ask a colleague or reach out to the authors. They will often be happy to help and may provide additional insights and perspectives on the work.
  • Practice, practice, practice: The more research papers you read, the easier it will become. Set aside time each week to read a few papers and practice summarizing the key points. Over time, you’ll develop a better understanding of NLP and the research in the field.

Top 10 research papers for NLP for beginners

1. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

This book by Daniel Jurafsky and James H. Martin provides an overview of NLP, computational linguistics, and speech recognition. The authors introduce key concepts and techniques used in the field, including syntax, semantics, and pragmatics.

2. A Primer on Neural Network Models for Natural Language Processing

The article by Yoav Goldberg explores the use of deep learning techniques in NLP. The author covers word embeddings, convolutional neural networks, recurrent neural networks, and attention mechanisms.

3. Efficient Estimation of Word Representations in Vector Space

An article by Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean introduces the concept of word embeddings and proposes a method for efficiently estimating them. The authors show how their method outperforms previous methods on various NLP tasks.
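To make the idea concrete, here is a minimal sketch of training such embeddings using the open-source gensim library's Word2Vec implementation (our choice for illustration, not the authors' original code); the two-sentence corpus is a toy, so the resulting vectors are not meaningful.

```python
# Minimal sketch: training skip-gram/CBOW-style word embeddings with gensim.
# The toy corpus is far too small to learn useful vectors; it only shows the API.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]
model = Word2Vec(sentences=corpus, vector_size=16, window=2, min_count=1, seed=0)

print(model.wv["king"].shape)                # each word maps to a 16-d vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between vectors
```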

4. Bag of Tricks for Efficient Text Classification

The article by Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov proposes a set of simple, effective techniques for text classification that can be combined to achieve state-of-the-art performance. The authors demonstrate the effectiveness of their approach on a range of benchmark datasets.

5. A Structured Self-Attentive Sentence Embedding

The article by Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio proposes a new method for creating sentence embeddings that incorporates both local and global information. The authors show that their method outperforms previous methods on various NLP tasks.

6. Attention Is All You Need

The article by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin proposes a new type of neural network architecture called the Transformer, which uses attention mechanisms instead of recurrence or convolutions. The authors show that the Transformer outperforms previous models on a range of NLP tasks.
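The core operation the paper introduces can be stated in a few lines. Below is a compact NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, with random matrices standing in for learned projections.

```python
# Scaled dot-product attention from "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # attention-weighted values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)    # -> (4, 8)
```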

7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

The article by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova proposes a new pre-training method for deep bidirectional transformers that outperforms previous models on a range of NLP tasks. Furthermore, the authors show that fine-tuning their pre-trained models on specific tasks significantly improves performance.
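As a hedged sketch of the fine-tuning recipe the paper popularized, the snippet below loads a pre-trained BERT encoder and attaches a fresh classification head via the Hugging Face transformers library (our choice for illustration, not the paper's original codebase); actual fine-tuning would then update the weights on task data.

```python
# Sketch: pre-trained BERT body + new task head, ready for fine-tuning.
# Uses the Hugging Face `transformers` library for illustration.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # randomly initialized classification head
)

inputs = tokenizer("NLP research papers are worth reading.", return_tensors="pt")
logits = model(**inputs).logits  # task scores; meaningful only after fine-tuning
print(logits.shape)              # -> torch.Size([1, 2])
```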

8. Language Models are Few-Shot Learners

The article by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, and their OpenAI colleagues proposes a new pre-training method for language models that outperforms previous models on a range of NLP tasks.

The authors demonstrate the effectiveness of their approach by training the largest language model to date, GPT-3, on a massive corpus of text. Furthermore, they show that the pre-trained GPT-3 can be fine-tuned to do better at many NLP tasks, such as answering questions, translating, and summarizing.

9. ELMo: Deep contextualized word representations

The article by Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer introduces a deep contextualized word representation method that outperforms previous word embedding strategies on a range of NLP tasks. The authors show that their approach, called ELMo, can capture the context-dependent semantics of words and significantly improve the performance of NLP models.

10. ULMFiT: Universal Language Model Fine-tuning for Text Classification

The article by Jeremy Howard and Sebastian Ruder proposes a transfer learning method for NLP that fine-tunes a pre-trained language model on a target task with limited training data. The authors show that their approach, called ULMFiT, outperforms previous models on a range of text classification tasks and demonstrates the effectiveness of transfer learning in NLP.

Conclusion – reading NLP research papers

In conclusion, Natural Language Processing (NLP) is a critical subfield of AI that plays a crucial role in many areas. Reading research papers is essential to staying current and advancing in the field of NLP. Research papers are a way to share new ideas, findings, and innovations and learn more about NLP’s ideas and methods.

Getting started with reading research papers in NLP can be challenging, but with the right approach it is a valuable and rewarding experience. You can learn more about NLP and the research in the field by focusing on a specific area of interest, starting with survey papers, reading the abstract and introduction first, focusing on the methodology, taking notes, summarising key points, and practising regularly.

Overall, reading research papers is an essential investment in your career and personal growth in NLP and other related fields.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.


Tech At Bloomberg

Bloomberg’s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023

December 08, 2023

During the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) in Singapore this week, researchers from Bloomberg’s AI Engineering Group are showcasing their expertise in natural language processing (NLP) by publishing four papers, one of which will appear in Findings of EMNLP 2023.

Through these papers, the authors and their collaborators highlight a variety of NLP applications, novel approaches and improved models used in key tasks, and other advances to the state-of-the-art in the field of computational linguistics.

We asked some of the authors to summarize their research and explain why the results were notable:

EntSUMv2: Data, Models and Evaluation for More Abstractive Entity-Centric Summarization

Dhruv Mehra (Bloomberg), Lingjue Xie (Bloomberg), Ella Hofmann-Coyle (Bloomberg), Mayank Kulkarni (work done while at Bloomberg) and Daniel Preoţiuc-Pietro (Bloomberg)

Poster Session 5 (Saturday, December 9, 2023 @ 11:00 AM SGT)


Please summarize your research. Why are your results notable?

Ella: Entity-centric summarization is a form of controllable summarization that requires producing a synopsis of a text document with respect to a specific entity. Our research focuses on abstractive summarization, which involves generating a new summary from scratch. This is in contrast to our previous work on extractive summarization, where the summary was constructed using only text that is present in the original text.

Exploration of this entity-centric summarization task was enabled by our past work at ACL 2022, where we introduced the EntSUM dataset. In this paper, we release the EntSUMv2 dataset, which builds upon the original EntSUM dataset to include new annotated abstractive summaries that are intentionally shortened to aid in generating more specific and useful entity-centric summaries.

In addition to releasing EntSUMv2, we explore supervised fine-tuning and instruction tuning of large language models to generate entity-specific abstractive summaries and perform evaluation against EntSUMv2.

Table 1. Automated metrics for the different fine-tuned and instruction-tuned summarization models on the EntSUMv2 dataset (bold typeface denotes the best performance overall and underlined numbers represent best performance within a class of methods).

Dhruv: As you can see, it is clear that fine-tuned models (the middle section) fare much better than instruction-tuned models (the last section), but it is not clear what the differences between each of these models are. Are they producing short and relevant summaries about an entity that are incomplete? Or are they producing verbose and complete summaries about an entity that contain extra, yet irrelevant, information?

Ella: To answer these questions, we propose a new method of qualitative human evaluation that evaluates each model across five crucial facets that high quality entity-centric summaries possess: Entity-Specificity, Factuality, Completeness, Fluency and Quality. These qualitative metrics provide a more fine-grained interpretation of the current state-of-the-art systems.

Table 2. Human evaluation results of three types of summarization models on a subset of the ENTSUMv2 dataset (bold typeface denotes the best performance).

Dhruv: We evaluated the best performing models in each category along these metrics, which reveal some insights. For example, GSum models give more relevant and complete summaries that are less fluent, while the T5-based models provide more fluent summaries that are less complete and less factually accurate.

How does your research advance the state-of-the-art in the field of natural language processing?

Dhruv: Our research provides a new dataset which can be used to evaluate models on the generative entity-centric summarization task, as well as provides a new framework for obtaining human evaluations which captures a more holistic view of the summaries as opposed to industry standard automated metrics.


Multilingual Large Language Models Are Not (Yet) Code-Switchers

Ruochen Zhang (Brown University), Samuel Cahyawijaya (HKUST), Jan Christian Blaise Cruz (Samsung R&D Institute Philippines), Genta Indra Winata (Bloomberg) and Alham Fikri Aji (MBZUAI)

Multilinguality and Linguistic Diversity 2 (Saturday, December 9, 2023 @ 11:00 AM SGT)


Genta: Large Language Models (LLMs) have shown their potential in the context of zero-shot and few-shot prompting. These successes have extended to multilingual settings, where models are specifically trained to learn individual languages, which has proven highly beneficial for monolingual tasks. However, in multilingual communities, people do not confine themselves to speaking only a single language; instead, they use two or more languages interchangeably during a conversation – a phenomenon known as code-switching. This allows individuals to communicate culture-specific concepts more effectively, signaling their group identity and reinforcing their social connections.

The main challenge of developing multilingual LLMs optimized for code-switching lies in data scarcity. Given the highly colloquial characteristic of code-switching, existing resources dedicated to code-switching are rare, and large-scale collection requires considerable annotation efforts.

In this paper, we benchmark the ability of LLMs to understand and generate code-switching on existing code-switching datasets to gauge the limitations of LLMs on four different tasks and a variety of language pairs. Figure 1 shows the illustration of tasks included in our benchmark study.

Figure 1. Illustration of NLP tasks included in the study.

Our results suggest that the scaling law is applicable to multilingual LLMs across diverse code-switching tasks and model architectures. However, smaller-scale, fine-tuned models substantially outperform the largest multilingual LLM with prompting methods. In addition, while hosted LLMs achieve scores comparable to our fine-tuned models, such performance remains uninterpretable due to their closed natures. We argue that existing multilingual LLMs exhibit limited proficiency in code-switching contexts, highlighting future research opportunities to transform them into true polyglots.

TempTabQA: Temporal Question Answering for Semi-Structured Tables

Vivek Gupta (University of Pennsylvania), Pranshu Kandoi (IIT Guwahati), Mahek Bhavesh Vora (IIT Guwahati), Shuo Zhang (Bloomberg), Yujie He (Bloomberg), Ridho Reinanda (Bloomberg) and Vivek Srikumar (University of Utah)

Resources and Evaluation 2 (Sunday, December 10, 2023 @ 12:00 PM SGT)


Shuo: Factual information pertaining to a particular entity often undergoes temporal changes, necessitating a thorough comprehension of the scope of knowledge and temporal intervals. This factual data is typically dispersed across semi-structured formats, such as tables, and includes both implicit and explicit representations (see Figure 2 for an example). The extensive presence of these characteristics presents significant challenges for NLP models, necessitating them to proficiently manage temporal changes and extract meaningful insights from time-sensitive data.

To address this issue effectively, we introduce a new task, referred to as “temporal question answering on entity-centric semi-structured tables,” demonstrated in Figure 2. Furthermore, we have curated a comprehensive, temporally-aligned dataset ( TempTabQA ), which covers a variety of domains and has undergone human verification. We conducted extensive experiments and found that temporal reasoning in TempTabQA presents a greater challenge compared to non-temporal reasoning in preceding tabular datasets.

Figure 2. A semi-structured table of women badminton players (source: Wikipedia), along with accompanying temporal questions and their respective answers from TempTabQA.

Our paper is a significant step forward because it’s the first to develop complex datasets for answering time-based questions that are specifically designed for tables focused on specific topics or entities. Our main goal was to introduce a new challenge – answering complex questions about time within this context.

The TempTabQA dataset requires not only high-level reasoning but also a solid understanding of how time works, as well as good math skills. Our work highlights how unique this dataset is because of its focus on time, making it different from existing models. We dig deep into this difference, providing a detailed set of statistics and analyses that show the many challenges of reasoning about time that the dataset presents. These findings help us better understand how to reason about time in tables and encourage more research in this area.
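To illustrate the kind of reasoning involved, here is a toy example (invented for this sketch, not drawn from TempTabQA): answering "how old was the player when she turned pro?" requires date arithmetic over infobox fields, not a simple lookup.

```python
# Toy illustration of temporal reasoning over a semi-structured (infobox) table.
# The fields and dates are invented for this sketch.
from datetime import date

infobox = {"born": date(1983, 3, 18), "turned_pro": date(2001, 6, 1)}

def age_at(event: str) -> int:
    """Age at `event`: needs date arithmetic, not just value lookup."""
    born, when = infobox["born"], infobox[event]
    return when.year - born.year - ((when.month, when.day) < (born.month, born.day))

print(age_at("turned_pro"))  # -> 18
```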

Semantic Similarity Covariance Matrix Shrinkage

Guillaume Becquin (Bloomberg) and Saher Esmeir (Bloomberg)

Findings of EMNLP 2023


Guillaume: When building an investment portfolio, asset managers often aim to maximize the expected returns while minimizing the expected volatility (a proxy for the portfolio level of risk). A common technique to reduce the volatility is to build a diversified portfolio – find uncorrelated assets so the volatility of the portfolio is significantly lower than its individual components. Unfortunately, estimating the degree of correlation between assets in a portfolio (covariance matrix) is very challenging since the number of random variables (components of the portfolio) is typically larger than the number of historical price observations.

Covariance shrinkage is an established regularization method in quantitative finance that regularizes the estimation of the covariance matrix. Our work extends the idea of shrinkage by making use of additional information from company fundamentals to regularize the covariance matrix. Embeddings (vector representations) of portfolio components (e.g., company stocks) can be generated using modern NLP techniques via sentence encoder or knowledge graphs. These embeddings are used to compute a similarity matrix for the portfolio assets that includes fundamental information about the assets, and are an effective regularization target for use in the well-established shrinkage framework.

Figure 3. The semantic similarity between companies is used as a target to shrink (regularize) the sample covariance matrix.

Natural language processing approaches are increasingly being adopted in the fields of finance and portfolio management. Previous work has mainly focused on improving the prediction of future returns to maximize expected profit. However, the estimation of portfolio volatility is also a critical element for finding the optimum portfolio at a given level of acceptable risk (risk-return trade-off).

Our research provides a robust framework that uses the output of NLP models to produce robust estimates of the portfolio covariance matrix by extending established methods in quantitative finance. Implemented as a simple post-processing step, it is widely applicable to any semantic model (including sentence embeddings and knowledge graph embeddings).
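A minimal sketch of this post-processing step, assuming the standard linear shrinkage form Sigma_shrunk = (1 - alpha) * S + alpha * T, with a target T built by rescaling a semantic similarity matrix to covariance scale; the weight alpha and all inputs below are illustrative, not the paper's calibrated values.

```python
# Sketch: shrink a sample covariance matrix toward a semantic-similarity target.
# alpha and the toy data are illustrative; a real system would estimate both.
import numpy as np

def shrink_covariance(sample_cov, similarity, alpha=0.3):
    vol = np.sqrt(np.diag(sample_cov))          # per-asset volatilities
    target = similarity * np.outer(vol, vol)    # put similarity on covariance scale
    return (1 - alpha) * sample_cov + alpha * target

rng = np.random.default_rng(0)
returns = rng.standard_normal((60, 5))          # 60 observations, 5 assets
S = np.cov(returns, rowvar=False)

emb = rng.standard_normal((5, 16))              # toy "NLP" embeddings per asset
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
similarity = emb @ emb.T                        # cosine similarity matrix

print(shrink_covariance(S, similarity).shape)   # -> (5, 5)
```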


Natural Language Processing

Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.

Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and can be run efficiently in a highly distributed environment.

Our syntactic systems predict part-of-speech tags for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, modification, and others. We focus on efficient algorithms that leverage large amounts of unlabeled data, and recently have incorporated neural net technology.
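As an illustration of this kind of syntactic analysis (using the open-source NLTK toolkit, not Google's internal systems):

```python
# Part-of-speech tagging sketch with NLTK (illustrative; not Google's systems).
import nltk

# One-time model downloads; resource names can differ across NLTK versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Google builds scalable language systems.")
print(nltk.pos_tag(tokens))
# e.g. [('Google', 'NNP'), ('builds', 'VBZ'), ('scalable', 'JJ'), ...]
```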

On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
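Google's production systems are not public, but a minimal sketch of these two layers of analysis (syntax and entity recognition) can be given with the open-source spaCy library; the pipeline name and example sentence below are illustrative assumptions, not Google's stack.

```python
import spacy

# Small English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai announced that Google will open a new office in Zurich.")

# Syntactic analysis: part-of-speech tags, morphological features
# (e.g. Number=Sing), and labeled dependency relations to each token's head.
for token in doc:
    print(token.text, token.pos_, token.morph, token.dep_, "->", token.head.text)

# Semantic analysis: named entities with coarse types.
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. PERSON, ORG, GPE
```

Coreference resolution and linking entities to a knowledge graph require additional components beyond this basic pipeline.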

Recent work has focused on incorporating multiple sources of knowledge and information to aid with analysis of text, as well as applying frame semantics at the noun phrase, sentence, and document level.


Best Papers

ACL’23 implemented the new award policy, which aims for broader recognition of exceptional research, in particular by significantly increasing the pool of outstanding papers to 1.5–2.5% of total submissions. So, this year we have a total of 3 best papers, 4 special award papers (Resource Award, Social Impact Award, Reproduction Award, Theme Paper Award), and 39 outstanding papers! Additionally, there are Area Chair Awards: the Senior Area Chairs of each track had the opportunity to nominate one of their papers for a separate award. Many thanks to our Best Paper Committee for helping us with the selection process!

This page lists all the awards and honorable mentions, as well as demo track and SRW awards. But we congratulate everybody who was considered for an award: only 1.6% of papers were even nominated by the reviewers. Next year, let’s all be more generous with nominations!

Best Paper Awards

Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest Jack Hessel, Ana Marasovic, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff and Yejin Choi

What the DAAM: Interpreting Stable Diffusion Using Cross Attention Raphael Tang, Linqing Liu, Akshat Pandey, Zhiying Jiang, Gefei Yang, Karun Kumar, Pontus Stenetorp, Jimmy Lin and Ferhan Ture

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models Shangbin Feng, Chan Young Park, Yuhan Liu and Yulia Tsvetkov

Special Awards

Reproduction Award:

Do CoNLL-2003 Named Entity Taggers Still Work Well in 2023? Shuheng Liu and Alan Ritter

Resource Award:

When Does Translation Require Context? A Data-driven, Multilingual Exploration Patrick Fernandes, Kayo Yin, Emmy Liu, André Martins and Graham Neubig

Social Impact Award:

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models Myra Cheng, Esin Durmus and Dan Jurafsky

Theme Paper Award:

Weaker Than You Think: A Critical Look at Weakly Supervised Learning Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan and Dietrich Klakow

Outstanding Papers

  • Backpack Language Models John Hewitt, John Thickstun, Christopher Manning and Percy Liang
  • CAME: Confidence-guided Adaptive Memory Efficient Optimization Yang Luo, Xiaozhe REN, Zangwei Zheng, ZHUO JIANG, Xin Jiang and Yang You
  • Causes and Cures for Interference in Multilingual Translation Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy and Shruti Bhosale
  • Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction Ashish Sharma, Kevin Rushton, Inna Lin, David Wadden, Khendra Lucas, Adam Miner, Theresa Nguyen and Tim Althoff
  • Compositional Generalization without Trees using Multiset Tagging and Latent Permutations Matthias Lindemann, Alexander Koller and Ivan Titov
  • Considerations for meaningful sign language machine translation based on glosses Mathias Müller, Zifan Jiang, Amit Moryossef, Annette Rios and Sarah Ebling
  • Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths Xiangqing Shen, Siwei Wu and Rui Xia
  • Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis Ta-Chung Chi, Ting-Han Fan, alexander rudnicky and Peter Ramadge
  • Distilling Script Knowledge from Large Language Models for Constrained Language Planning Siyu Yuan, Jiangjie Chen, Ziquan Fu, Xuyang Ge, Soham Shah, Charles Jankowski, Yanghua Xiao and Deqing Yang
  • Do PLMs Know and Understand Ontological Knowledge? Weiqi Wu, Chengyue Jiang, Yong Jiang, Pengjun Xie and Kewei Tu
  • Don’t Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments Yu Gu, Xiang Deng and Yu Su
  • Extrinsic Evaluation of Machine Translation Metrics Nikita Moghe, Tom Sherborne, Mark Steedman and Alexandra Birch
  • Faithful Low-Resource Data-to-Text Generation through Cycle Training Zhuoer Wang, Marcus Collins, Nikhita Vedula, Simone Filice, Shervin Malmasi and Oleg Rokhlenko
  • Generalizing Backpropagation for Gradient-Based Interpretability Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alex Warstadt and Ryan Cotterell
  • Hexatagging: Projective Dependency Parsing as Tagging Afra Amini, Tianyu Liu and Ryan Cotterell
  • Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks Yun Tang, Anna Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden Tomasello and Juan Pino
  • Improving Pretraining Techniques for Code-Switched NLP Richeek Das, Sahasra Ranjan, Shreya Pathak and Preethi Jyothi
  • Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation Kaiyu Huang, Peng Li, Jin Ma, Ting Yao and Yang Liu
  • Language model acceptability judgements are not always robust to context Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy and Adina Williams
  • Linear Classifier: An Often-Forgotten Baseline for Text Classification Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu and Chih-Jen Lin
  • Minding Language Models’ (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi and Yulia Tsvetkov
  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning Zhiyang Xu, Ying Shen and Lifu Huang
  • Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment Eshaan Tanwar, Subhabrata Dutta, Manish Borthakur and Tanmoy Chakraborty
  • Neural Machine Translation Methods for Translating Text to Sign Language Glosses Dele Zhu, Vera Czehmann and Eleftherios Avramidis
  • NLPositionality: Characterizing Design Biases of Datasets and Models Sebastin Santy, Jenny Liang, Ronan Le Bras, Katharina Reinecke and Maarten Sap
  • PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives Silin Gao, Beatriz Borges, Soyoung Oh, Deniz Bayazit, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji and Antoine Bosselut
  • QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee and Kristina Toutanova
  • Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for Tigrinya Fitsum Gaim, Wonsuk Yang, Hancheol Park and Jong Park
  • Scaling in Cognitive Modelling: a Multilingual Approach to Human Reading Times Andrea Gregor de Varda and Marco Marelli
  • SCOTT: Self-Consistent Chain-of-Thought Distillation Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin and Xiang Ren
  • The Mechanical Bard: An Interpretable Machine Learning Approach to Shakespearean Sonnet Generation Edwin Agnew, Michelle Qiu, Lily Zhu, Sam Wiseman and Cynthia Rudin
  • The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks Nikil Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot and Kai-Wei Chang
  • Towards Zero-Shot Multilingual Transfer for Code-Switched Responses Ting-Wei Wu, Changsheng Zhao, Ernie Chang, Yangyang Shi, Pierce Chuang, Vikas Chandra and Biing Juang
  • Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge Vasudha Varadarajan, Swanie Juhng, Syeda Mahwish, Xiaoran Liu, Jonah Luby, Christian Luhmann and H. Andrew Schwartz
  • VisText: A Benchmark for Semantically Rich Chart Captioning Benny Tang, Angie Boggust and Arvind Satyanarayan
  • What’s the Meaning of Superhuman Performance in Today’s NLU? Simone Tedeschi, Johan Bos, Thierry Declerck, Jan Hajič, Daniel Hershcovich, Eduard Hovy, Alexander Koller, Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova and Roberto Navigli
  • WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events Marco Antonio Stranisci, Rossana Damiano, Enrico Mensa, Viviana Patti, Daniele Radicioni and Tommaso Caselli
  • World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models Ziqiao Ma, Jiayi Pan and Joyce Chai

Area Chair Awards

Linguistic Diversity:

Small Data, Big Impact: Leveraging Minimal Data for Effective Machine Translation Jean Maillard, Cynthia Gao, Elahe Kalbassi, Kaushik Ram Sadagopan, Vedanuj Goswami, Philipp Koehn, Angela Fan and Francisco Guzman

Sentiment Analysis, Stylistic Analysis, and Argument Mining:

StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing Xuekai Zhu, Jian Guan, Minlie Huang and Juan Liu

Discourse and Pragmatics:

Resolving Indirect Referring Expressions for Entity Selection Mohammad Javad Hosseini, Filip Radlinski, Silvia Pareti and Annie Louis

Semantics: Sentence-level Semantics, Textual Inference, and Other Areas:

ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation Kuan-Hao Huang, Varun Iyer, I-Hung Hsu, Anoop Kumar, Kai-Wei Chang and Aram Galstyan

Question Answering:

DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor and Omri Abend

Semantics: Lexical:

LexSym: Compositionality as Lexical Symmetry Ekin Akyurek and Jacob Andreas

NLP Applications:

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tong Xu, Guangzhong Sun and Xing Xie

Speech and Multimodality:

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiu-Shi Zhu and Eng Siong Chng

Interpretability and Analysis of Models for NLP:

Entity Tracking in Language Models Najoung Kim and Sebastian Schuster

Linguistic Theories, Cognitive Modeling, and Psycholinguistics:

Exploring How Generative Adversarial Networks Learn Phonological Representations Jingyi Chen and Micha Elsner

Resources and Evaluation:

Tell2Design: A Dataset for Language-Guided Floor Plan Generation Sicong Leng, Yang Zhou, Mohammed Haroon Dupty, Wee Sun Lee, Sam Joyce and Wei Lu

Multilingualism and Cross-Lingual NLP:

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages Ayyoob ImaniGooghari, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André Martins, François Yvon and Hinrich Schütze

Demo Track Awards

  • Best Paper Award: VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering Zijun Yao, YUANYONG CHEN, Xin Lv, Shulin Cao, Amy Xin, Jifan Yu, Hailong Jin, jianjun xu, Peng Zhang, Lei Hou and Juanzi Li
  • Outstanding demo paper: CB2: Collaborative Natural Language Interaction Research Platform Jacob Sharf, Mustafa Omer Gul and Yoav Artzi
  • Outstanding demo paper: disco: a toolkit for Distributional Control of Generative Models Germán Kruszewski, Jos Rozen and Marc Dymetman

Student Research Workshop Awards

  • Assessing Chain-of-Thought Reasoning against Lexical Negation: A Case Study on Syllogism Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Hiroaki Funayama, Goro Kobayashi
  • Is a Knowledge-based Response Engaging?: An Analysis on Knowledge-Grounded Dialogue with Information Source Annotation Takashi Kodama, Hirokazu Kiyomaru, Yin Jou Huang, Taro Okahisa, Sadao Kurohashi
  • LECO: Improving Early Exiting via Learned Exits and Comparison-based Exiting Mechanism Jingfan Zhang, Ming Tan, Pengyu Dai, Wei Zhu
  • How-to Guides for Specific Audiences: A Corpus and Initial Findings Nicola Fanton, Agnieszka Falenska, Michael Roth

Honorable Mentions

  • ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models Jonas Belouadi and Steffen Eger
  • DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover and Duen Horng Chau
  • DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille and Pierre-Antoine Gourraud
  • Forgotten Knowledge: Examining the Citational Amnesia in NLP Janvijay Singh, Mukund Rungta, Diyi Yang and Saif Mohammad
  • From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding Li Sun, Florian Luisier, Kayhan Batmanghelich, Dinei Florencio and Cha Zhang
  • GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding Jia-Chen Gu, Zhenhua Ling, Quan Liu, Cong Liu and Guoping Hu
  • Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition Yuwei Bao, Barrett Lattimer and Joyce Chai
  • Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, alexander rudnicky and Peter Ramadge
  • Revisiting non-English Text Simplification: A Unified Multilingual Benchmark Michael Ryan, Tarek Naous and Wei Xu
  • Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe Xiang Yue, Huseyin Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan and Robert Sim
  • Theory-Grounded Computational Text Analysis Arya D. McCarthy and Giovanna Maria Dora Dore
  • Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer and Huan Sun
  • UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin and Ting Liu

Best Video Recordings

You can watch my seven-minute highlights of the best video recordings via the video below. After the highlights, all of the complete video recordings are viewable.

Most Viewed:

KILM: Knowledge Injection into Encoder-Decoder Language Models Yan Xu, Mahdi Namazifar, Devamanyu Hazarika, Aishwarya Padmakumar, Yang Liu, and Dilek Hakkani-Tür

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi

Highest Rated:

Neural Machine Translation for Mathematical Formulae Felix Petersen, Moritz Schubotz, André Greiner-Petter, and Bela Gipp

Outstanding Videos:

Generalizing Backpropagation for Gradient-Based Interpretability Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, and Ryan Cotterell

WikiHowQA: A Comprehensive Benchmark for Multi-Document Non-Factoid Question Answering Valeriia Bolotova, Vladislav Blinov, Sofya Filippova, Falk Scholer, and Mark Sanderson

Score It All Together: A Multi-Task Learning Study on Automatic Scoring of Argumentative Essays Yuning Ding, Marie Bexte, and Andrea Horbach

Main Conference

Long Papers

IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions Zhebin Zhang, Xinyu Zhang, Yuanhang Ren, Saijiang Shi, Meng Han, Yongkang Wu, Ruofei Lai, Zhao Cao

Absolute Position Embedding Learns Sinusoid-like Waves for Attention Based on Relative Position Yuji Yamamoto, Takuya Matsuzaki

Chinese Lexical Substitution: Dataset and Method Jipeng Qiang, Kang Liu, Ying Li, Yun Li, Yi Zhu, Yun-Hao Yuan, Xiaocheng Hu, Xiaoye Ouyang

Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting Chenkai Sun, Jinning Li, Yi Fung, Hou Chan, Tarek Abdelzaher, ChengXiang Zhai, Heng Ji

Holistic Inter-Annotator Agreement and Corpus Coherence Estimation in a Large-scale Multilingual Annotation Campaign Nicolas Stefanovitch, Jakub Piskorski

PHD: Pixel-Based Language Modeling of Historical Documents Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein

Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension Akira Kawabata, Saku Sugawara

Evaluating and Modeling Attribution for Cross-Lingual Question Answering Benjamin Muller, John Wieting, Jonathan Clark, Tom Kwiatkowski, Sebastian Ruder, Livio Soares, Roee Aharoni, Jonathan Herzig, Xinyi Wang

Sparse Universal Transformer Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

Theory of Mind for Multi-Agent Collaboration via Large Language Models Huao Li, Yu Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Charles Lewis, Katia Sycara

Let’s Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Wang

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP Md Tawkat Islam Khondaker, Abdul Waheed, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Dual-Channel Span for Aspect Sentiment Triplet Extraction Pan Li, Ping Li, Kai Zhang

Cultural Concept Adaptation on Multimodal Reasoning Zhi Li, Yin Zhang

Understanding Compositional Data Augmentation in Typologically Diverse Morphological Inflection Farhan Samir, Miikka Silfverberg

Evaluating Object Hallucination in Large Vision-Language Models Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, Ji-Rong Wen

Event Ontology Completion with Hierarchical Structure Evolution Networks Pengfei Cao, Yupu Hao, Yubo Chen, Kang Liu, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Jun Zhao

Parameter-efficient Tuning for Large Language Model without Calculating Its Gradients Feihu Jin, Jiajun Zhang, Chengqing Zong

Discourse Structures Guided Fine-grained Propaganda Identification Yuanyuan Lei, Ruihong Huang

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić

Improving Image Captioning via Predicting Structured Concepts Ting Wang, Weidong Chen, Yuanhe Tian, Yan Song, Zhendong Mao

GATITOS: Using a New Multilingual Lexicon for Low-resource Machine Translation Alexander Jones, Isaac Caswell, Orhan Firat, Ishank Saxena

Continually Improving Extractive QA via Human Feedback Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi

Using Interpretation Methods for Model Enhancement Zhuo Chen, Chengyue Jiang, Kewei Tu

An Expression Tree Decoding Strategy for Mathematical Equation Generation Wenqi Zhang, Yongliang Shen, Qingpeng Nong, Zeqi Tan, Yanna Ma, Weiming Lu

Diversity Enhanced Narrative Question Generation for Storybooks Hokeun Yoon, JinYeong Bak

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification Chengyu Dong, Zihan Wang, Jingbo Shang

How to Enhance Causal Discrimination of Utterances: A Case on Affective Reasoning Hang Chen, Xinyu Yang, Jing Luo, Wenjing Zhu

Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering Qingyi Si, Yuanxin Liu, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang

Selectively Answering Ambiguous Questions Jeremy Cole, Michael Zhang, Daniel Gillick, Julian Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein

Temporal Knowledge Graph Forecasting Without Knowledge Using In-Context Learning Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, Jay Pujara

Knowledge Graph Compression Enhances Diverse Commonsense Generation EunJeong Hwang, Veronika Thost, Vered Shwartz, Tengfei Ma

Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models Yiyuan Li, Rakesh Menon, Sayan Ghosh, Shashank Srivastava

LLM-FP4: 4-Bit Floating-Point Quantized Transformers Shih-yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers Chen Tang, Shun Wang, Tomas Goldsack, Chenghua Lin

Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting Xi Ye, Greg Durrett

HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation David Dale, Elena Voita, Janice Lam, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Loic Barrault, Marta Costa-jussà

Gradient-based Gradual Pruning for Language-Specific Multilingual Neural Machine Translation Dan He, Minh-Quang Pham, Thanh-Le Ha, Marco Turchi

LLM-powered Data Augmentation for Enhanced Cross-lingual Performance Chenxi Whitehouse, Monojit Choudhury, Alham Aji

Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition Chenxu Wang, Ping Jian, Mu Huang

VLIS: Unimodal Language Models Guide Multimodal Language Generation Jiwan Chung, Youngjae Yu

Conceptual structure coheres in human cognition but not in large language models Siddharth Suresh, Kushin Mukherjee, Xizheng Yu, Wei-Chun Huang, Lisa Padua, Timothy Rogers

Towards LLM-driven Dialogue State Tracking Yujie Feng, Zexin Lu, Bo Liu, Liming Zhan, Xiao-Ming Wu

Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis Haoyu Zhang, Yu Wang, Guanghao Yin, Kejun Liu, Yuanyuan Liu, Tianshu Yu

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

We’re Afraid Language Models Aren’t Modeling Ambiguity Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah Smith, Yejin Choi

Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell

GEMINI: Controlling The Sentence-Level Summary Style in Abstractive Text Summarization Guangsheng Bao, Zebin Ou, Yue Zhang

Analyzing Norm Violations in Live-Stream Chat Jihyung Moon, Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Chan Park, Minwoo Kim, Jonathan May, Jay Pujara, Sungjoon Park

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality Harman Singh, Pengchuan Zhang, Qifan Wang, Mengjiao Wang, Wenhan Xiong, Jingfei Du, Yu Chen

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge Shangbin Feng, Vidhisha Balachandran, Yuyang Bai, Yulia Tsvetkov

Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation Xuanli He, Qiongkai Xu, Jun Wang, Benjamin Rubinstein, Trevor Cohn

Symbol tuning improves in-context learning in language models Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc Le

The neural dynamics of word recognition and integration Jon Gauthier, Roger Levy

Incorporating Worker Perspectives into MTurk Annotation Practices for NLP Olivia Huang, Eve Fleisig, Dan Klein

Predict the Future from the Past? On the Temporal Data Distribution Shift in Financial Sentiment Classifications Yue Guo, Chenxi Hu, Yi Yang

Look-back Decoding for Open-Ended Text Generation Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma

Large Language Models Can Self-Improve Jiaxin Huang, Shixiang Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

CodeT5+: Open Code Large Language Models for Code Understanding and Generation Yue Wang, Hung Le, Akhilesh Gotmare, Nghi Bui, Junnan Li, Steven Hoi

Structural generalization in COGS: Supertagging is (almost) all you need Alban Petit, Caio Corro, François Yvon

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan

SeqXGPT: Sentence-Level AI-Generated Text Detection Pengyu Wang, Linyang Li, Ke Ren, Botian Jiang, Dong Zhang, Xipeng Qiu

QTSumm: Query-Focused Summarization over Tabular Data Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Ruizhe Chen, Xiangru Tang, Yumo Xu, Dragomir Radev, Arman Cohan

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Automatic Science Journalism Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Wang, Kai-Wei Chang

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu

Clinical Contradiction Detection Dave Makhervaks, Plia Gillis, Kira Radinsky

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah Smith, Yejin Choi, Hannaneh Hajishirzi

Text-Transport: Toward Learning Causal Effects of Natural Language Victoria Lin, Louis-Philippe Morency, Eli Ben-Michael

How Does Generative Retrieval Scale to Millions of Passages? Ronak Pradeep, Kai Hui, Jai Gupta, Adam Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Tran

Unveiling the Implicit Toxicity in Large Language Models Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang

Is ChatGPT a General-Purpose Natural Language Processing Task Solver? Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang

Length is a Curse and a Blessing for Document-level Semantics Chenghao Xiao, Yizhi Li, G Hudson, Chenghua Lin, Noura Al Moubayed

ALCUNA: Large Language Models Meet New Knowledge Xunjian Yin, Baizhou Huang, Xiaojun Wan

Location-Aware Visual Question Generation with Lightweight Models Nicholas Suwono, Justin Chen, Tun Hung, Ting-Hao Huang, I-Bin Liao, Yung-Hui Li, Lun-Wei Ku, Shao-Hua Sun

MemeCap: A Dataset for Captioning and Interpreting Memes EunJeong Hwang, Vered Shwartz

Where to start? Analyzing the potential value of intermediate models Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz

Transcending Scaling Laws with 0.1% Extra Compute Yi Tay, Jason Wei, Hyung Chung, Vinh Tran, David So, Siamak Shakeri, Xavier Garcia, Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc Le, Mostafa Dehghani

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy Chen, Zhengyuan Liu, Diyi Yang

Optimizing Retrieval-augmented Reader Models via Token Elimination Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat

WSDMS: Debunk Fake News via Weakly Supervised Detection of Misinforming Sentences with Contextualized Social Wisdom Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Zhiwei Yang

Robust Prompt Optimization for Large Language Models Against Distribution Shifts Moxin Li, Wenjie Wang, Fuli Feng, Yixin Cao, Jizhi Zhang, Tat-Seng Chua

Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction Martin Josifoski, Marija Sakota, Maxime Peyrard, Robert West

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules Haoran Xu, Weiting Tan, Shuyue Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray

The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell

Evaluating Cross-Domain Text-to-SQL Models and Benchmarks Mohammadreza Pourreza, Davood Rafiei

Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs Simone Conia, Min Li, Daniel Lee, Umar Minhas, Ihab Ilyas, Yunyao Li

Memory-Based Invariance Learning for Out-of-Domain Text Classification Chen Jia, Yue Zhang

Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jinyang Guo, Xianglong Liu

Three Stream Based Multi-level Event Contrastive Learning for Text-Video Event Extraction Jiaqi Li, Chuanyi Zhang, Miaozeng Du, Dehai Min, Yongrui Chen, Guilin Qi

Diversify Question Generation with Retrieval-Augmented Style Transfer Qi Gou, Zehua Xia, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li, Nguyen Cam-Tu

Fast and Accurate Factual Inconsistency Detection Over Long Documents Barrett Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang

Interpreting Embedding Spaces by Conceptualization Adi Simhi, Shaul Markovitch

Knowledge-Augmented Language Model Verification Jinheon Baek, Soyeong Jeong, Minki Kang, Jong Park, Sung Hwang

A Generation-based Deductive Method for Math Word Problems Yuxuan Hu, Jing Zhang, Haoyang Li, Cuiping Li, Hong Chen

Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation Zeyuan Yang, Peng Li, Yang Liu

Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning Ryan Shea, Zhou Yu

Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

Towards Example-Based NMT with Multi-Levenshtein Transformers Maxime Bouthors, Josep Crego, François Yvon

DUnE: Dataset for Unified Editing Afra Akyürek, Eric Pan, Garry Kuwanto, Derry Wijaya

“Fifty Shades of Bias”: Normative Ratings of Gender Bias in GPT Generated English Text Rishav Hada, Agrima Seth, Harshita Diddee, Kalika Bali

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness Jan Cegin, Jakub Simko, Peter Brusilovsky

Query-as-context Pre-training for Dense Passage Retrieval Xing W, Guangyuan Ma, Wanhui Qian, Zijia Lin, Songlin Hu

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

Democratizing Reasoning Ability: Tailored Learning from Large Language Model Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization Shmuel Amar, Liat Schiff, Ori Ernst, Asi Shefer, Ori Shapira, Ido Dagan

Byte Pair Encoding for Symbolic Music Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot

Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models Alejo Lopez-Avila, Víctor Suárez-Paniagua

Self-Influence Guided Data Reweighting for Language Model Pre-training Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar

TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models Zorik Gekhman, Jonathan Herzig, Roee Aharoni, Chen Elkind, Idan Szpektor

Tagging-Assisted Generation Model with Encoder and Decoder Supervision for Aspect Sentiment Triplet Extraction Luo Xianlong, Meng Yang, Yihao Wang

Norm of Word Embedding Encodes Information Gain Momose Oyama, Sho Yokoi, Hidetoshi Shimodaira

CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular Data Zhehao Zhang, Xitao Li, Yan Gao, Jian-Guang Lou

Promoting Topic Coherence and Inter-Document Consorts in Multi-Document Summarization via Simplicial Complex and Sheaf Graph Yash Atri, Arun Iyer, Tanmoy Chakraborty, Vikram Goyal

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau

Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency Eric Zelikman, Wanjing Ma, Jasmine Tran, Diyi Yang, Jason Yeatman, Nick Haber

Counter Turing Test (CT2): AI-Generated Text Detection is Not as Easy as You May Think - Introducing AI Detectability Index (ADI) Megha Chakraborty, S.M Towhidul Islam Tonmoy, S M Mehedi Zaman, Shreya Gautam, Tanay Kumar, Krish Sharma, Niyar Barman, Chandan Gupta, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

Revisiting the Optimality of Word Lengths Tiago Pimentel, Clara Meister, Ethan Wilcox, Kyle Mahowald, Ryan Cotterell

Document-level Relationship Extraction by Bidirectional Constraints of Beta Rules Yichun Liu, Zizhong Zhu, Xiaowang Zhang, Zhiyong Feng, Daoqi Chen, Yaxin Li

Instructed Language Models with Retrievers Are Powerful Entity Linkers Zilin Xiao, Ming Gong, Jie Wu, Xingyao Zhang, Linjun Shou, Daxin Jiang

Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text Xiang Li, Jinglu Wang, Xiaohao Xu, Muqiao Yang, Fan Yang, Yizhou Zhao, Rita Singh, Bhiksha Raj

PROSE: A Pronoun Omission Solution for Chinese-English Spoken Language Translation Ke Wang, Xiutian Zhao, Yanghui Li, Wei Peng

A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? Aniket Pramanick, Yufang Hou, Saif Mohammad, Iryna Gurevych

Does the Correctness of Factual Knowledge Matter for Factual Knowledge-Enhanced Pre-trained Language Models? Boxi Cao, Qiaoyu Tang, Hongyu Lin, Xianpei Han, Le Sun

Syntactic Substitutability as Unsupervised Dependency Syntax Jasper Jian, Siva Reddy

MProto: Multi-Prototype Network with Denoised Optimal Transport for Distantly Supervised Named Entity Recognition Shuhui Wu, Yongliang Shen, Zeqi Tan, Wenqi Ren, Jietian Guo, Shiliang Pu, Weiming Lu

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Learning the Visualness of Text Using Large Vision-Language Models Gaurav Verma, Ryan Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values Hannah Kirk, Andrew Bean, Bertie Vidgen, Paul Rottger, Scott Hale

TempTabQA: Temporal Question Answering for Semi-Structured Tables Vivek Gupta, Pranshu Kandoi, Mahek Vora, Shuo Zhang, Yujie He, Ridho Reinanda, Vivek Srikumar

Task-Level Thinking Steps Help Large Language Models for Challenging Classification Task Chunhui Du, Jidong Tian, Haoran Liao, Jindou Chen, Hao He, Yaohui Jin

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

Influence Scores at Scale for Efficient Language Data Sampling Nikhil Anand, Joshua Tan, Maria Minakova

G-Eval: NLG Evaluation using Gpt-4 with Better Human Alignment Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu

Learning Retrieval Augmentation for Personalized Dialogue Generation Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, Lilian Tang

The Troubling Emergence of Hallucination in Large Language Models - An Extensive Definition, Quantification, and Prescriptive Remediations Vipula Rawte, Swagata Chakraborty, Agnibh Pathak, Anubhav Sarkar, S.M Towhidul Islam Tonmoy, Aman Chadha, Amit Sheth, Amitava Das

NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders Livio Soares, Daniel Gillick, Jeremy Cole, Tom Kwiatkowski

Analyzing Modular Approaches for Visual Question Decomposition Apoorv Khandelwal, Ellie Pavlick, Chen Sun

Improving Summarization with Human Edits Zonghai Yao, Benjamin Schloss, Sai Selvaraj

The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages Chiyu Zhang, Khai Doan, Qisheng Liao, Muhammad Abdul-Mageed

BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology Odhran O’Donoghue, Aleksandar Shtedritski, John Ginger, Ralph Abboud, Ali Ghareeb, Samuel Rodriques

Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che

FinGPT: Large Generative Models for a Small Language Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, Hanna-Mari Kupari, Filip Ginter, Veronika Laippala, Niklas Muennighoff, Aleksandra Piktus, Thomas Wang, Nouamane Tazi, Teven Scao, Thomas Wolf, Osma Suominen, Samuli Sairanen, Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, Sampo Pyysalo

Boosting Summarization with Normalizing Flows and Aggressive Training Yu Yang, Xiaotong Shen

Indicative Summarization of Long Discussions Shahbaz Syed, Dominik Schwabe, Khalid Khatib, Martin Potthast

A Framework for Vision-Language Warm-up Tasks in Multimodal Dialogue Models Jaewook Lee, Seongsik Park, Seong-Heum Park, Hongjin Kim, Harksoo Kim

Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts Tengxiao Liu, Qipeng Guo, Yuqing Yang, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

GLEN: General-Purpose Event Detection for Thousands of Types Sha Li, Qiusi Zhan, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han

Hierarchical Pretraining on Multimodal Electronic Health Records Xiaochen Wang, Junyu Luo, Jiaqi Wang, Ziyi Yin, Suhan Cui, Yuan Zhong, Yaqing Wang, Fenglong Ma

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng

DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong

Can Language Models Laugh at YouTube Short-form Videos? Dayoon Ko, Sangho Lee, Gunhee Kim

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao

Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation Zhongjian Miao, Wen Zhang, Jinsong Su, Xiang Li, Jian Luan, Yidong Chen, Bin Wang, Min Zhang

HistAlign: Improving Context Dependency in Language Generation by Aligning with History David Wan, Shiyue Zhang, Mohit Bansal

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models Aitor Ormazabal, Mikel Artetxe, Eneko Agirre

Image Manipulation via Multi-Hop Instructions - A New Dataset and Weakly-Supervised Neuro-Symbolic Approach Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Mondal, Dinesh Khandelwal, Dinesh Garg, Parag Singla

Generative Spoken Language Model based on continuous word-sized audio tokens Robin Algayres, Yossi Adi, Tu Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, Bowen Zhou

Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining Emanuele Bugliarello, Aida Nematzadeh, Lisa Hendricks

Unsupervised Grammatical Error Correction Rivaling Supervised Methods Hannan Cao, Liping Yuan, Yuchen Zhang, Hwee Ng

S2abEL: A Dataset for Entity Linking from Scientific Tables Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li

Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif Mohammad

Lion: Adversarial Distillation of Proprietary Large Language Models Yuxin Jiang, Chunkit Chan, Mingyang Chen, Wei Wang

Evaluating Large Language Models on Controlled Generation Tasks Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Wieting, Nanyun Peng, Xuezhe Ma

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding Xiao-Yu Guo, Yuan-Fang Li, Reza Haf

Why LLMs Hallucinate, and How to Get (Evidential) Closure: Perceptual, Intensional, and Extensional Learning for Faithful Natural Language Generation Adam Bouyamourn

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo

SLOG: A Structural Generalization Benchmark for Semantic Parsing Bingzhi Li, Lucia Donatelli, Alexander Koller, Tal Linzen, Yuekun Yao, Najoung Kim

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher Manning

Can LLMs Facilitate Interpretation of Pre-trained Language Models? Basel Mousi, Nadir Durrani, Fahim Dalvi

Enhancing Low-resource Fine-grained Named Entity Recognition by Leveraging Coarse-grained Datasets Su Lee, Seokjin Oh, Woohwan Jung

Non-Autoregressive Math Word Problem Solver with Unified Tree Structure Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Shen

Improving Chinese Pop Song and Hokkien Gezi Opera Singing Voice Synthesis by Enhancing Local Modeling Peng Bai, Yue Zhou, Meizhen Zheng, Wujin Sun, Xiaodong Shi

What Else Do I Need to Know? The Effect of Background Information on Users’ Reliance on QA Systems Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare Voss, Marine Carpuat, Hal Daumé III

VIBE: Topic-Driven Temporal Adaptation for Twitter Classification Yuji Zhang, Jing Li, Wenjie Li

TOD-Flow: Modeling the Structure of Task-Oriented Dialogues Sungryull Sohn, Yiwei Lyu, Anthony Liu, Lajanugen Logeswaran, Dong-Ki Kim, Dongsub Shim, Honglak Lee

TopWORDS-Poetry: Simultaneous Text Segmentation and Word Discovery for Classical Chinese Poetry via Bayesian Inference Changzai Pan, Feiyue Li, Ke Deng

Knowledge Rumination for Pre-trained Language Models Yunzhi Yao, Peng Wang, Shengyu Mao, Chuanqi Tan, Fei Huang, Huajun Chen, Ningyu Zhang

Struct-XLM: A Structure Discovery Multilingual Language Model for Enhancing Cross-lingual Transfer through Reinforcement Learning Linjuan Wu, Weiming Lu

AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Patel, Goran Glavaš, Iryna Gurevych

Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models Xibo Li, Bowei Zou, Yifan Fan, Yanling Li, Ai Ti Aw, Yu Hong

TCFLE-8: a Corpus of Learner Written Productions for French as a Foreign Language and its Application to Automated Essay Scoring Rodrigo Wilkens, Alice Pintard, David Alfter, Vincent Folny, Thomas François

Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA David Heineman, Yao Dou, Mounica Maddela, Wei Xu

Confidence-based Ensembling of Perspective-aware Models Silvia Casola, Soda Lo, Valerio Basile, Simona Frenda, Alessandra Cignarella, Viviana Patti, Cristina Bosco

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer Xinpeng Wang, Xiaoyuan Yi, Han Jiang, Shanlin Zhou, Zhihua Wei, Xing Xie

GPT-RE: In-context Learning for Relation Extraction using Large Language Models Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment Sky CH-Wang, Arkadiy Saakyan, Oliver Li, Zhou Yu, Smaranda Muresan

INFORM : Information eNtropy based multi-step reasoning FOR large language Models Chuyue Zhou, Wangjie You, Juntao Li, Jing Ye, Kehai Chen, Min Zhang

Adaptive Gating in Mixture-of-Experts based Language Models Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu

On the Automatic Generation and Simplification of Children’s Stories Maria Valentini, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, Katharina von der Wense

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel

Identifying Informational Sources in News Articles Alexander Spangher, Nanyun Peng, Emilio Ferrara, Jonathan May

Retrofitting Light-weight Language Models for Emotions using Supervised Contrastive Learning Sapan Shah, Sreedhar Reddy, Pushpak Bhattacharyya

Longtriever: a Pre-trained Long Text Encoder for Dense Document Retrieval Junhan Yang, Zheng Liu, Chaozhuo Li, Guangzhong Sun, Xing Xie

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning Gurusha Juneja, Subhabrata Dutta, Soumen Chakrabarti, Sunny Manchanda, Tanmoy Chakraborty

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models James Michaelov, Catherine Arnett, Tyler Chang, Ben Bergen

ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph Jinhao Jiang, Kun Zhou, Xin Zhao, Yaliang Li, Ji-Rong Wen

Deep Natural Language Feature Learning for Interpretable Prediction Felipe Urrutia, Cristian Calderon, Valentin Barriere

ROBBIE: Robust Bias Evaluation of Large Generative Language Models David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Smith

Enhancing Task-oriented Dialogue Systems with Generative Post-processing Networks Atsumoto Ohashi, Ryuichiro Higashinaka

Adapting Language Models to Compress Contexts Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

Selective Labeling: How to Radically Lower Data-Labeling Costs for Document Extraction Models Yichao Zhou, James Wendt, Navneet Potti, Jing Xie, Sandeep Tata

TRAVEL: Tag-Aware Conversational FAQ Retrieval via Reinforcement Learning Yue Chen, Dingnan Jin, Chen Huang, Jia Liu, Wenqiang Lei

Continual Dialogue State Tracking via Example-Guided Question Answering Hyundong Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar

Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media Shubham Mittal, Megha Sundriyal, Preslav Nakov

COVID-19 Vaccine Misinformation in Middle Income Countries Jongin Kim, Byeo Bak, Aditya Agrawal, Jiaxi Wu, Veronika Wirtz, Traci Hong, Derry Wijaya

Contrastive Learning of Sentence Embeddings from Scratch Junlei Zhang, Zhenzhong Lan, Junxian He

A Rose by Any Other Name would not Smell as Sweet: Social Bias in Names Mistranslation Sandra Sandoval, Jieyu Zhao, Marine Carpuat, Hal Daumé III

Investigating Efficiently Extending Transformers for Long Input Summarization Jason Phang, Yao Zhao, Peter Liu

CS2W: A Chinese Spoken-to-Written Style Conversion Dataset with Multiple Conversion Types Zishan Guo, Linhao Yu, Minghui Xu, Renren Jin, Deyi Xiong

Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity Alan Ansell, Marinela Parović, Ivan Vulić, Anna Korhonen, Edoardo Ponti

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation Giuseppe Attanasio, Flor Plaza del Arco, Debora Nozza, Anne Lauscher

DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining Weifeng Jiang, Qianren Mao, Chenghua Lin, Jianxin Li, Ting Deng, Weiyi Yang, Zheng Wang

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Language Model is Suitable for Correction of Handwritten Mathematical Expressions Recognition Zui Chen, Jiaqi Han, Chaofan Yang, Yi Zhou

Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection Gretel De la Peña Sarracén, Paolo Rosso, Robert Litschko, Goran Glavaš, Simone Ponzetto

SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation Junfeng Jiang, Chengzhang Dong, Sadao Kurohashi, Akiko Aizawa

ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs Yang Bai, Wenqian Zhao, Shuo Yin, Zixiao Wang, Bei Yu

mRedditSum: A Multimodal Abstractive Summarization Dataset of Reddit Threads with Images Keighley Overbay, Jaewoo Ahn, Fatemeh Pesaran zadeh, Joonsuk Park, Gunhee Kim

Sparse Low-rank Adaptation of Pre-trained Language Models Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney Shachar Don-Yehiya, Leshem Choshen, Omri Abend

The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models Jingyuan Qi, Zhiyang Xu, Ying Shen, Minqian Liu, Di Jin, Qifan Wang, Lifu Huang

Ideology Takes Multiple Looks: A High-Quality Dataset for Multifaceted Ideology Detection Songtao Liu, Ziling Luo, Minghua Xu, Lixiao Wei, Ziyao Wei, Han Yu, Wei Xiang, Bang Wang

Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Myriam Tami, Victor Storchan, Ismail Ayed, Pablo Piantanida

MEGA: Multilingual Evaluation of Generative AI Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Mohamed Ahmed, Kalika Bali, Sunayana Sitaram

Support or Refute: Analyzing the Stance of Evidence to Detect Out-of-Context Mis- and Disinformation Xin Yuan, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

Video-Helpful Multimodal Machine Translation Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li

Large Language Models are Temporal and Causal Reasoners for Video Question Answering Dohwan Ko, Ji Lee, Woo-Young Kang, Byungseok Roh, Hyunwoo Kim

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation Yuanyuan Liang, Jianing Wang, Hanlun Zhu, Lei Wang, Weining Qian, Yunshi Lan

TrojanSQL: SQL Injection against Natural Language Interface to Database Jinchuan Zhang, Yan Zhou, Binyuan Hui, Yaxin Liu, Ziming Li, Songlin Hu

Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models Aly Kassem, Omar Mahmoud, Sherif Saad

MingOfficial: A Ming Official Career Dataset and a Historical Context-Aware Representation Learning Framework You-Jun Chen, Hsin-Yi Hsieh, Yu Lin, Yingtao Tian, Bert Chan, Yu-Sin Liu, Yi-Hsuan Lin, Richard Tsai

DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo, Hyukhun Koh, Kyomin Jung

Meta-Learning Online Adaptation of Language Models Nathan Hu, Eric Mitchell, Christopher Manning, Chelsea Finn

Self-Detoxifying Language Models via Toxification Reversal Chak Leong, Yi Cheng, Jiashuo Wang, Jian Wang, Wenjie Li

Interactive Text Generation Felix Faltings, Michel Galley, Kianté Brantley, Baolin Peng, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan

NeuSTIP: A Neuro-Symbolic Model for Link and Time Prediction in Temporal Knowledge Graphs Ishaan Singh, Navdeep Kaur, Garima Gaur, Mausam

Standardizing Distress Analysis: Emotion-Driven Distress Identification and Cause Extraction (DICE) in Multimodal Online Posts Gopendra Singh, Soumitra Ghosh, Atul Verma, Chetna Painkra, Asif Ekbal

Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future Linyi Yang, Yaoxian Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Jingming Zhuo, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang

Can Large Language Models Capture Dissenting Human Voices? Noah Lee, Na An, James Thorne

DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, Nancy Chen

Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning Hao Zhao, Jie Fu, Zhaofeng He

Towards Building More Robust NER datasets: An Empirical Study on NER Dataset Bias from a Dataset Difficulty View Ruotian Ma, Xiaolei Wang, Xin Zhou, Qi Zhang, Xuanjing Huang

GradSim: Gradient-Based Language Grouping for Effective Multilingual Training Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schuetze

Discovering Universal Geometry in Embeddings with ICA Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City Mikael Brunila, Jack LaViolette, Sky CH-Wang, Priyanka Verma, Clara Féré, Grant McKenzie

Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue Lang Qin, Yao Zhang, Hongru Liang, Jun Wang, Zhenglu Yang

Merging Generated and Retrieved Knowledge for Open-Domain QA Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang

Text Fact Transfer Nishant Balepur, Jie Huang, Kevin Chang

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Mirages. On Anthropomorphism in Dialogue Systems Gavin Abercrombie, Amanda Curry, Tanvi Dinkar, Verena Rieser, Zeerak Talat

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Adaptive Policy with Wait-k Model for Simultaneous Translation Libo Zhao, Kai Fan, Wei Luo, Wu Jing, Shushu Wang, Ziqian Zeng, Zhongqiang Huang

Cross-Document Event Coreference Resolution on Discourse Structure Xinyu Chen, Sheng Xu, Peifeng Li, Qiaoming Zhu

Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

Can We Edit Factual Knowledge by In-Context Learning? Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang

EDIS: Entity-Driven Image Search over Multimodal Web Content Siqi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Wang

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan

Text encoders bottleneck compositionality in contrastive vision-language models Amita Kamath, Jack Hessel, Kai-Wei Chang

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Kost, Christopher Carnahan, Jordan Boyd-Graber

MMNMT: Modularizing Multilingual Neural Machine Translation with Flexibly Assembled MoE and Dense Blocks Shangjie Li, Xiangpeng Wei, Shaolin Zhu, Jun Xie, Baosong Yang, Deyi Xiong

Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge Te-Lin Wu, Yu Zhou, Nanyun Peng

Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Muller, David Chiang

Prompting is not a substitute for probability measurements in large language models Jennifer Hu, Roger Levy

Parameter-Efficient Language Model Tuning with Active Learning in Low-Resource Settings Josip Jukić, Jan Snajder

CoLT5: Faster Long-Range Transformers with Conditional Computation Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontanon, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

DiSTRICT: Dialogue State Tracking with Retriever Driven In-Context Tuning Praveen Venkateswaran, Evelyn Duesterwald, Vatche Isahagian

Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales Winston Wu, Lu Wang, Rada Mihalcea

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers Theo Olausson, Alex Gu, Ben Lipkin, Cedegao Zhang, Armando Solar-Lezama, Joshua Tenenbaum, Roger Levy

Non-autoregressive Streaming Transformer for Simultaneous Translation Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng

ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing Nam Nguyen, Thang Phan, Duc-Vu Nguyen, Kiet Nguyen

RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction Shiao Meng, Xuming Hu, Aiwei Liu, Shuang Li, Fukun Ma, Yawen Yang, Lijie Wen

GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen

Cross-Modal Conceptualization in Bottleneck Models Danis Alukaev, Semen Kiselev, Ilya Pershin, Bulat Ibragimov, Vladimir Ivanov, Alexey Kornaev, Ivan Titov

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, Roy Lee

DREAM: Deployment of Recombination and Ensembles in Argument Mining Florian Ruosch, Cristina Sarasua, Abraham Bernstein

Query Rewriting in Retrieval-Augmented Large Language Models Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam Laradji

COHESENTIA: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts Aviya Maimon, Reut Tsarfaty

QUDeval: The Evaluation of Questions Under Discussion Discourse Parsing Yating Wu, Ritika Mangla, Greg Durrett, Junyi Li

PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao

Exploring Chain of Thought Style Prompting for Text-to-SQL Chang-Yu Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, Huan Sun

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages Alexandra Butoi, Tim Vieira, Ryan Cotterell, David Chiang

Harnessing Black-Box Control to Boost Commonsense in LM’s Generation Yufei Tian, Felix Zhang, Nanyun Peng

Representative Demonstration Selection for In-Context Learning with Two-Stage Determinantal Point Process Zhao Yang, Yuanzhe Zhang, Dianbo Sui, Cao Liu, Jun Zhao, Kang Liu

The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models Lovisa Hagström, Denitsa Saynova, Tobias Norlund, Moa Johansson, Richard Johansson

ViPE: Visualise Pretty-much Everything Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik Lensch

Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto

Elaborative Simplification as Implicit Questions Under Discussion Yating Wu, William Sheffield, Kyle Mahowald, Junyi Li

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations Amanpreet Singh, Mike D’Arcy, Arman Cohan, Doug Downey, Sergey Feldman

A Diachronic Perspective on User Trust in AI under Uncertainty Shehzaad Dhuliawala, Vilém Zouhar, Mennatallah El-Assady, Mrinmaya Sachan

CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability Minxuan Lv, Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents Hyungjoo Chae, Yongho Song, Kai Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo

Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives Mario Giulianelli, Sarenne Wallbridge, Raquel Fernández

Generating Commonsense Counterfactuals for Stable Relation Extraction Xin Miao, Yongqi Li, Tieyun Qian

C-STS: Conditional Semantic Textual Similarity Ameet Deshpande, Carlos Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan

Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis Seraphina Goldfarb-Tarrant, Björn Ross, Adam Lopez

Rumor Detection on Social Media with Crowd Intelligence and ChatGPT-Assisted Networks Chang Yang, Peng Zhang, Wenbo Qiao, Hui Gao, Jiaming Zhao

Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai

Controllable Contrastive Generation for Multilingual Biomedical Entity Linking Tiantian Zhu, Yang Qin, Qingcai Chen, Xin Mu, Changlong Yu, Yang Xiang

MediaHG: Rethinking Eye-catchy Features in Social Media Headline Generation Boning Zhang, Yang Yang

Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata Silei Xu, Shicheng Liu, Theo Culhane, Elizaveta Pertseva, Meng-Hsi Wu, Sina Semnani, Monica Lam

Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule Andrey Bout, Alexander Podolskiy, Sergey Nikolenko, Irina Piontkovskaya

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models Xinyi Chen, Raquel Fernández, Sandro Pezzelle

RainProof: An Umbrella to Shield Text Generator from Out-Of-Distribution Data Maxime Darrin, Pablo Piantanida, Pierre Colombo

KEPL: Knowledge Enhanced Prompt Learning for Chinese Hypernym-Hyponym Extraction Ningchen Ma, Dong Wang, Hongyun Bao, Lei He, Suncong Zheng

Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Lei Hou, Juanzi Li, Xu Bin

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions Lucie-Aimée Kaffee, Arnav Arora, Isabelle Augenstein

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding Sangmin Bae, Jongwoo Ko, Hwanjun Song, Se-Young Yun

End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions Libo Qin, Wenbo Pan, Qiguang Chen, Lizi Liao, Zhou Yu, Yue Zhang, Wanxiang Che, Min Li

Answering Questions by Meta-Reasoning over Multiple Chains of Thought Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Wang, Lei Li

Multi-level Contrastive Learning for Script-based Character Understanding Dawei Li, Hengyuan Zhang, Yanran Li, Shiping Yang

CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim

Automatic Debate Evaluation with Argumentation Semantics and Natural Language Argument Graph Networks Ramon Ruiz-Dolz, Stella Heras, Ana Garcia

Transfer-Free Data-Efficient Multilingual Slot Labeling Evgeniia Razumovskaia, Ivan Vulić, Anna Korhonen

Towards Interpretable Mental Health Analysis with Large Language Models Kailai Yang, Shaoxiong Ji, Tianlin Zhang, Qianqian Xie, Ziyan Kuang, Sophia Ananiadou

Learning to Rank Generation with Pairwise Partial Rewards Youngwon Lee, Jinu Lee, Seung-won Hwang

GreedyCAS: Unsupervised Scientific Abstract Segmentation with Normalized Mutual Information Yingqiang Gao, Jessica Lam, Nianlong Gu, Richard Hahnloser

Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue Aishwarya Padmakumar, Mert Inan, Spandana Gella, Patrick Lange, Dilek Hakkani-Tur

GEM: Gestalt Enhanced Markup Language Model for Web Understanding via Render Tree Zirui Shao, Feiyu Gao, Zhongda Qi, Hangdi Xing, Jiajun Bu, Zhi Yu, Qi Zheng, Xiaozhong Liu

Abstractive Open Information Extraction Kevin Pei, Ishan Jindal, Kevin Chang

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network Sreyan Ghosh, Manan Suri, Purva Chiniya, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction Jingheng Ye, Yinghui Li, Qingyu Zhou, Yangning Li, Shirong Ma, Hai-Tao Zheng, Ying Shen

SentiStream: A Co-Training Framework for Adaptive Online Sentiment Analysis in Evolving Data Streams Yuhao Wu, Karthick Sharma, Chun Seah, Shuhao Zhang

HyperNetwork-based Decoupling to Improve Model Generalization for Few-Shot Relation Extraction Liang Zhang, Chulun Zhou, Fandong Meng, Jinsong Su, Yidong Chen, Jie Zhou

Solving Hard Analogy Questions with Relation Embedding Chains Nitesh Kumar, Steven Schockaert

Modeling Empathic Similarity in Personal Narratives Jocelyn Shen, Maarten Sap, Pedro Colon-Hernandez, Hae Park, Cynthia Breazeal

Tree Prompting: Efficient Task Adaptation without Fine-Tuning Chandan Singh, John Morris, Alexander Rush, Jianfeng Gao, Yuntian Deng

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data Canwen Xu, Daya Guo, Nan Duan, Julian McAuley

Empathy Intent Drives Empathy Detection Liting Jiang, Di Wu, Bohui Mao, Yanbing Li, Wushour Slamu

Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot Filling Yuanjun Shi, Linzhi Wu, Minglai Shao

ReTAG: Reasoning Aware Table to Analytic Text Generation Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators Liang Chen, Yang Deng, Yatao Bian, Zeyu Qin, Bingzhe Wu, Tat-Seng Chua, Kam-Fai Wong

Compressing Context to Enhance Inference Efficiency of Large Language Models Yucheng Li, Bo Dong, Frank Guerin, Chenghua Lin

MoT: Memory-of-Thought Enables ChatGPT to Self-Improve Xiaonan Li, Xipeng Qiu

Can You Follow Me? Testing Situational Understanding for ChatGPT Chenghao Yang, Allyson Ettinger

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 Kellin Pelrine, Anne Imouza, Camille Thibault, Meilina Reksoprodjo, Caleb Gupta, Joel Christoph, Jean-François Godbout, Reihaneh Rabbany

Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models Junyi Li, Xiaoxue Cheng, Xin Zhao, Jian-Yun Nie, Ji-Rong Wen

Enabling Large Language Models to Generate Text with Citations Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen

Revisiting Machine Translation for Cross-lingual Classification Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer

Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schuetze, Kemal Oflazer, David Mortensen

Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning Quanyu Long, Wenya Wang, Sinno Pan

Dual-Feedback Knowledge Retrieval for Task-Oriented Dialogue Systems Tianyuan Shi, Liangzhi Li, Zijian Lin, Tao Yang, Xiaojun Quan, Qifan Wang

MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models Deepak Nathani, David Wang, Liangming Pan, William Wang

Granularity Matters: Pathological Graph-driven Cross-modal Alignment for Brain CT Report Generation Yanzhao Shi, Junzhong Ji, Xiaodan Zhang, Liangqiong Qu, Ying Liu

Enhancing Structured Evidence Extraction for Fact Verification Zirui Wu, Nan Hu, Yansong Feng

Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models Di Wu, Wasi Ahmad, Kai-Wei Chang

A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems Hannah Bast, Matthias Hertel, Natalie Prange

A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis Hongyi Wu, Xinshu Shen, Man Lan, Shaoguang Mao, Xiaopeng Bai, Yuanbin Wu

SKD-NER: Continual Named Entity Recognition via Span-based Knowledge Distillation with Reinforcement Learning Yi Chen, Liang He

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation Chengwei Qin, Chen Chen, Shafiq Joty

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks Eve Fleisig, Rediet Abebe, Dan Klein

Lazy-k Decoding: Constrained Decoding for Information Extraction Arthur Hemmer, Mickael Coustaty, Nicola Bartolo, Jerome Brachat, Jean-marc Ogier

Personalized Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty

Do Language Models Have a Common Sense regarding Time? Revisiting Temporal Commonsense Reasoning in the Era of Large Language Models Raghav Jain, Daivik Sojitra, Arkadeep Acharya, Sriparna Saha, Adam Jatowt, Sandipan Dandapat

Comparing Styles across Languages Shreya Havaldar, Matthew Pressimone, Eric Wong, Lyle Ungar

Event Causality Extraction via Implicit Cause-Effect Interactions Jintao Liu, Zequn Zhang, Kaiwen Wei, Zhi Guo, Xian Sun, Li Jin, Xiaoyu Li

Evaluation of African American Language Bias in Natural Language Generation Nicholas Deas, Jessica Grieser, Shana Kleiner, Desmond Patton, Elsbeth Turcan, Kathleen McKeown

A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems Songbo Hu, Han Zhou, Moy Yuan, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Anna Korhonen, Ivan Vulić

Cognate Transformer for Automated Phonological Reconstruction and Cognate Reflex Prediction V.S.D.S.Mahesh Akavarapu, Arnab Bhattacharya

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, Khyathi Chandu, Abhilasha Ravichander, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Lin, Skyler Hallinan, Lianhui Qin, Xiang Ren, Sean Welleck, Yejin Choi

Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering Kang-il Lee, Segwang Kim, Kyomin Jung

Taxonomy Expansion for Named Entity Recognition Karthikeyan K, Yogarshi Vyas, Jie Ma, Giovanni Paolini, Neha John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, Miguel Ballesteros

Rather a Nurse than a Physician - Contrastive Explanations under Investigation Oliver Eberle, Ilias Chalkidis, Laura Cabello, Stephanie Brandl

An Investigation of LLMs’ Inefficacy in Understanding Converse Relations Chengwen Qi, Bowen Li, Binyuan Hui, Bailin Wang, Jinyang Li, Jinwang Wu, Yuanjun Laili

Towards Low-Resource Automatic Program Repair with Meta-Learning and Pretrained Language Models Weishi Wang, Yue Wang, Steven Hoi, Shafiq Joty

ZGUL: Zero-shot Generalization to Unseen Languages using Multi-source Ensembling of Language Adapters Vipul Rathore, Rajdeep Dhingra, Parag Singla, Mausam

Log-FGAER: Logic-Guided Fine-Grained Address Entity Recognition from Multi-Turn Spoken Dialogue Xue Han, Yitong Wang, Qian Hu, Pengwei Hu, Chao Deng, Junlan Feng

Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning Sarkar Snigdha Sarathi Das, Haoran Zhang, Peng Shi, Wenpeng Yin, Rui Zhang

On the Representational Capacity of Recurrent Neural Language Models Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan

Benchmarking and Improving Text-to-SQL Generation under Ambiguity Adithya Bhaskar, Tushar Tomar, Ashutosh Sathe, Sunita Sarawagi

Non-autoregressive Text Editing with Copy-aware Latent Alignments Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu

Translating away Translationese without Parallel Data Rricha Jalota, Koel Chowdhury, Cristina España-Bonet, Josef van Genabith

CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding Yixiao Ma, Yueyue WU, Weihang Su, Qingyao Ai, Yiqun Liu

HiddenTables and PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies William Watson, Nicole Cho, Tucker Balch, Manuela Veloso

Causal Document-Grounded Dialogue Pre-training Yingxiu Zhao, Bowen Yu, Bowen Li, Haiyang Yu, Jinyang Li, Chao Wang, Fei Huang, Yongbin Li, Nevin Zhang

Accented Speech Recognition With Accent-specific Codebooks Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni

Linking Surface Facts to Large-Scale Knowledge Graphs Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš

Sentiment Analysis on Streaming User Reviews via Dual-Channel Dynamic Graph Neural Network Xin Zhang, Linhai Zhang, Deyu Zhou

DUMB: A Dutch Model Benchmark Wietse de Vries, Martijn Wieling, Malvina Nissim

OssCSE: Overcoming Surface Structure Bias in Contrastive Learning for Unsupervised Sentence Embedding Zhan Shi, Guoyin Wang, Ke Bai, Jiwei Li, Xiang Li, Qingjun Cui, Belinda Zeng, Trishul Chilimbi, Xiaodan Zhu

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation Juan Pablo Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

A Fine-Grained Taxonomy of Replies to Hate Speech Xinchen Yu, Ashley Zhao, Eduardo Blanco, Lingzi Hong

JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification Henry Zou, Cornelia Caragea

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 Kent Chang, Mackenzie Cramer, Sandeep Soni, David Bamman

CiteBench: A Benchmark for Scientific Citation Text Generation Martin Funkquist, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu

Effects of sub-word segmentation on performance of transformer language models Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber

Symbolic Planning and Code Generation for Grounded Dialogue Justin Chiu, Wenting Zhao, Derek Chen, Saujas Vaduguru, Alexander Rush, Daniel Fried

Universal Self-Adaptive Prompting Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Eisenschlos, Sercan Arik, Tomas Pfister

Content- and Topology-Aware Representation Learning for Scientific Multi-Literature Kai Zhang, Kaisong Song, Yangyang Kang, Xiaozhong Liu

Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks Zhaohui Yan, Songlin Yang, Wei Liu, Kewei Tu

Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models Daman Arora, Himanshu Singh, Mausam

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure Mattia Opper, Victor Prokhorov, Siddharth N

WiCE: Real-World Entailment for Claims in Wikipedia Ryo Kamoi, Tanya Goyal, Juan Rodriguez, Greg Durrett

Natural Disaster Tweets Classification Using Multimodal Data Mohammad Basit, Bashir Alam, Zubaida Fatima, Salman Shaikh

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

RoBoCoP: A Comprehensive ROmance BOrrowing COgnate Package and Benchmark for Multilingual Cognate Identification Liviu Dinu, Ana Uban, Alina Cristea, Anca Dinu, Ioan-Bogdan Iordache, Simona Georgescu, Laurentiu Zoicas

Instructive Dialogue Summarization with Query Aggregations Bin Wang, Zhengyuan Liu, Nancy Chen

Semantic matching for text classification with complex class descriptions Brian De Silva, Kuan-Wen Huang, Gwang Lee, Karen Hovsepian, Yan Xu, Mingwei Shen

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu

GLEN: Generative Retrieval via Lexical Index Learning Sunkyung Lee, Minjin Choi, Jongwuk Lee

Turn-Level Active Learning for Dialogue State Tracking Zihan Zhang, Meng Fang, Fanghua Ye, Ling Chen, Mohammad-Reza Namazi-Rad

ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue Haoqin Tu, Yitong Li, Fei Mi, Zhongliang Yang

Modeling Conceptual Attribute Likeness and Domain Inconsistency for Metaphor Detection Yuan Tian, Nan Xu, Wenji Mao, Daniel Zeng

Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network Ziling Huang, Shin’ichi Satoh

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro

SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables Xinyuan Lu, Liangming Pan, Qian Liu, Preslav Nakov, Min-Yen Kan

Training Simultaneous Speech Translation with Robust and Random Wait-k-Tokens Strategy Linlin Zhang, Kai Fan, Jiajun Bu, Zhongqiang Huang

SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples Deqing Fu, Ameya Godbole, Robin Jia

Task-Agnostic Low-Rank Adapters for Unseen English Dialects Zedian Xiao, William Held, Yanchen Liu, Diyi Yang

Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization Tianshi Che, Ji Liu, Yang Zhou, Jiaxiang Ren, Jiwen Zhou, Victor Sheng, Huaiyu Dai, Dejing Dou

TheoremQA: A Theorem-driven Question Answering Dataset Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

Scalable-DSC: A Structural Template Prompt Approach to Scalable Dialogue State Correction Haoxiang Su, Hongyan Xie, Hao Huang, Shuangyong Song, Ruiyu Fang, Xiaomeng Huang, Sijie Feng

Don’t Trust ChatGPT when your Question is not in English: A Study of Multilingual Abilities and Types of LLMs Xiang Zhang, Senyu Li, Bradley Hauer, Ning Shi, Grzegorz Kondrak

Empirical Study of Zero-Shot NER with ChatGPT Tingyu Xie, Qi Li, Jian Zhang, Yan Zhang, Zuozhu Liu, Hongwei Wang

Automatic Prompt Optimization with “Gradient Descent” and Beam Search Reid Pryzant, Dan Iter, Jerry Li, Yin Lee, Chenguang Zhu, Michael Zeng

Active Retrieval Augmented Generation Zhengbao Jiang, Frank Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig

Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation Chenxu Yang, Zheng Lin, Lanrui Wang, Chong Tian, Liang Pang, Jiangnan Li, Qirong Ho, Yanan Cao, Weiping Wang

Enhancing Biomedical Lay Summarisation with External Knowledge Graphs Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin

A Diffusion Weighted Graph Framework for New Intent Discovery Wenkai Shi, Wenbin An, Feng Tian, Qinghua Zheng, QianYing Wang, Ping Chen

A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection Thi-Nhung Nguyen, Hoang Ngo, Kiem-Hieu Nguyen, Tuan-Dung Cao

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang

Recurrent Neural Language Models as Probabilistic Finite-state Automata Anej Svete, Ryan Cotterell

Revisiting Source Context in Nearest Neighbor Machine Translation Xuanhong Li, Peng Li, Po Hu

Find-2-Find: Multitask Learning for Anaphora Resolution and Object Localization Cennet Oguz, Pascal Denis, Emmanuel Vincent, Simon Ostermann, Josef van Genabith

Background Summarization of Event Timelines Adithya Pratapa, Kevin Small, Markus Dreyer

Superlim: A Swedish Language Understanding Evaluation Benchmark Aleksandrs Berdicevskis, Gerlof Bouma, Robin Kurtz, Felix Morger, Joey Öhman, Yvonne Adesam, Lars Borin, Dana Dannélls, Markus Forsberg, Tim Isbister, Anna Lindahl, Martin Malmsten, Faton Rekathati, Magnus Sahlgren, Elena Volodina, Love Börjeson, Simon Hengchen, Nina Tahmasebi

Reasoning with Language Model is Planning with World Model Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, Zhiting Hu

LLM-enhanced Self-training for Cross-domain Constituency Parsing Jianling Li, Meishan Zhang, Peiming Guo, Min Zhang, Yue Zhang

Continual Named Entity Recognition without Catastrophic Forgetting Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang

DSI++: Updating Transformer Memory with New Documents Sanket Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

Editing Common Sense in Transformers Anshita Gupta, Debanjan Mondal, Akshay Sheshadri, Wenlong Zhao, Xiang Li, Sarah Wiegreffe, Niket Tandon

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers Hosein Mohebbi, Grzegorz Chrupała, Willem Zuidema, Afra Alishahi

Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal

How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad, Jun Wang

Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation Verna Dankers, Ivan Titov, Dieuwke Hupkes

DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4 Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Fei Liu

Gender Biases in Automatic Evaluation Metrics for Image Captioning Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng

QA-NatVer: Question Answering for Natural Logic-based Fact Verification Rami Aly, Marek Strong, Andreas Vlachos

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal

Generating Data for Symbolic Language with Large Language Models Jiacheng Ye, Chengzu Li, Lingpeng Kong, Tao Yu

IDTraffickers: An Authorship Attribution Dataset to link and connect Potential Human-Trafficking Operations on Text Escort Advertisements Vageesh Saxena, Benjamin Ashpole, Gijs van Dijck, Gerasimos Spanakis

Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott

Improving Dialogue Discourse Parsing via Reply-to Structures of Addressee Recognition Yaxin Fan, Feng Jiang, Peifeng Li, Fang Kong, Qiaoming Zhu

Improving Language Models’ Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary Myeongjun Jang, Thomas Lukasiewicz

DALE: Generative Data Augmentation for Low-Resource Legal NLP Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, S Ramaneswaran, S Sakshi, Utkarsh Tyagi, Dinesh Manocha

FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models Xinge Ma, Jiangming Liu, Jin Wang, Xuejie Zhang

trlX: A Framework for Large Scale Reinforcement Learning from Human Feedback Alexander Havrilla, Maksym Zhuravinskyi, Duy Phung, Aman Tiwari, Jonathan Tow, Stella Biderman, Quentin Anthony, Louis Castricato

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models Iker García-Ferrero, Begoña Altuna, Javier Alvez, Itziar Gonzalez-Dios, German Rigau

MT2: Towards a Multi-Task Machine Translation Model with Translation-Specific In-Context Learning Chunyou Li, Mingtong Liu, Hongxiao Zhang, Yufeng Chen, Jinan Xu, Ming Zhou

CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset Susanna Rücker, Alan Akbik

Disentangling Transformer Language Models as Superposed Topic Models Jia Lim, Hady Lauw

Conversational Semantic Parsing using Dynamic Context Graphs Parag Jain, Mirella Lapata

Not all quantifiers are equal: Probing Transformer-based language models’ understanding of generalised quantifiers Tharindu Madusanka, Iqra Zahid, Hao Li, Ian Pratt-Hartmann, Riza Batista-Navarro

Structure-aware Knowledge Graph-to-text Generation with Planning Selection and Similarity Distinction Feng Zhao, Hongzhi Zou, Cheng Yan

Regulation and NLP (RegNLP): Taming Large Language Models Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordás, Gerasimos Spanakis

MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation Zexue He, Yu Wang, An Yan, Yao Liu, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu

Seeing through the mess: evolutionary dynamics of lexical polysemy Andreas Baumann, Andreas Stephan, Benjamin Roth

Are Embedded Potatoes Still Vegetables? On the Limitations of WordNet Embeddings for Lexical Semantics Xuyou Cheng, Michael Schlichtkrull, Guy Emerson

Event-Location Tracking in Narratives: A Case Study on Holocaust Testimonies Eitan Wagner, Renana Keydar, Omri Abend

Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources Yerin Hwang, Yongil Kim, Hyunkyung Bae, Hwanhee Lee, Jeesoo Bang, Kyomin Jung

Learning to Predict Task Transferability via Soft Prompt Lingyun Feng

Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering Wang Zhu, Jesse Thomason, Robin Jia

Mirror: A Universal Framework for Various Information Extraction Tasks Tong Zhu, Junfei Ren, Zijian Yu, Mengsong Wu, Guoliang Zhang, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Min Zhang

“Mistakes Help Us Grow”: Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms Kunal Handa, Margarett Clapper, Jessica Boyle, Rose Wang, Diyi Yang, David Yeager, Dorottya Demszky

Detecting and Mitigating Hallucinations in Multilingual Summarisation Yifu Qiu, Yftah Ziser, Anna Korhonen, Edoardo Ponti, Shay Cohen

AMR Parsing with Causal Hierarchical Attention and Pointers Chao Lou, Kewei Tu

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Changjian Wang, Dongsheng Li, Dacheng Tao

IC3: Image Captioning by Committee Consensus David Chan, Austin Myers, Sudheendra Vijayanarasimhan, David Ross, John Canny

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models Potsawee Manakul, Adian Liusie, Mark Gales

Fair Without Leveling Down: A New Intersectional Fairness Definition Gaurav Maheshwari, Aurélien Bellet, Pascal Denis, Mikaela Keller

M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis Fei Zhao, Chunhui Li, Zhen Wu, Yawen Ouyang, Jianbing Zhang, Xinyu Dai

Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts Siyuan Chen, Zhiling Zhang, Mengyue Wu, Kenny Zhu

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance? Ahmed Alajrami, Katerina Margatina, Nikolaos Aletras

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang

Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings Mattia Atzeni, Mikhail Plekhanov, Frederic Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models Qifan Wang, Yuning Mao, Jingang Wang, Hanchao Yu, Shaoliang Nie, Sinong Wang, Fuli Feng, Lifu Huang, Xiaojun Quan, Zenglin Xu, Dongfang Liu

What’s “up” with vision-language models? Investigating their struggle with spatial reasoning Amita Kamath, Jack Hessel, Kai-Wei Chang

IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models Xiaoyue Wang, Xin Liu, Lijie Wang, Yaoxiang Wang, Jinsong Su, Hua Wu

Learning Preference Model for LLMs via Automatic Preference Data Generation Shijia Huang, Jianqiao Zhao, Yanyang Li, Liwei Wang

Causal Reasoning through Two Cognition Layers for Improving Generalization in Visual Question Answering Trang Nguyen, Naoaki Okazaki

StructGPT: A General Framework for Large Language Model to Reason over Structured Data Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Xin Zhao, Ji-Rong Wen

Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement Rosamond Thalken, Edward Stiglitz, David Mimno, Matthew Wilkens

Model-tuning Via Prompts Makes NLP Models Adversarially Robust Mrigank Raman, Pratyush Maini, J Kolter, Zachary Lipton, Danish Pruthi

Learning Co-Speech Gesture for Multimodal Aphasia Type Detection Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, Jinyoung Han

STINMatch: Semi-Supervised Semantic-Topological Iteration Network for Financial Risk Detection via News Label Diffusion Xurui Li, Yue Qin, Rui Zhu, Tianqianjin Lin, Yongming Fan, Yangyang Kang, Kaisong Song, Fubang Zhao, Changlong Sun, Haixu Tang, Xiaozhong Liu

Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection Vyoma Raman, Eve Fleisig, Dan Klein

Describe Me an Auklet: Generating Grounded Perceptual Category Descriptions Bill Noble, Nikolai Ilinykh

ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization Xiutian Zhao, Ke Wang, Wei Peng

On the Benefits of Learning to Route in Mixture-of-Experts Models Nishanth Dikkala, Nikhil Ghosh, Raghu Meka, Rina Panigrahy, Nikhil Vyas, Xin Wang

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur Parikh

We Need to Talk About Reproducibility in NLP Model Comparison Yan Xue, Xuefei Cao, Xingli Yang, Yu Wang, Ruibo Wang, Jihong Li

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

Just Adjust One Prompt: Enhancing In-Context Dialogue Scoring via Constructing the Optimal Subgraph of Demonstrations and Prompts Jiashu Pu, Ling Cheng, Lu Fan, Tangjie Lv, Rongsheng Zhang

Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers Dmitry Nikolaev, Tanise Ceron, Sebastian Padó

ART: rule bAsed futuRe-inference deducTion Mengze Li, Tianqi Zhao, Bai Jionghao, Baoyi He, Jiaxu Miao, Wei Ji, Zheqi Lv, Zhou Zhao, Shengyu Zhang, Wenqiao Zhang, Fei Wu

EpiK-Eval: Evaluation for Language Models as Epistemic Models Gabriele Prato, Jerry Huang, Prasanna Parthasarathi, Shagun Sodhani, Sarath Chandar

From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification Shanshan Xu, Santosh T.Y.S.S, Oana Ichim, Isabella Risini, Barbara Plank, Matthias Grabmair

On Bilingual Lexicon Induction with Large Language Models Yaoyiran Li, Anna Korhonen, Ivan Vulić

Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings Parker Seegmiller, Sarah Preum

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou

From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues Shivani Kumar, Ramaneswaran S, Md Akhtar, Tanmoy Chakraborty

SummEdits: Measuring LLM Ability at Factual Reasoning Through The Lens of Summarization Philippe Laban, Wojciech Kryscinski, Divyansh Agarwal, Alexander Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

DIVE: Towards Descriptive and Diverse Visual Commonsense Generation Jun-Hyung Park, Hyuntae Park, Youjin Kang, Eojin Jeon, SangKeun Lee

Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators Tin Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé III, Marine Carpuat

Bridging Background Knowledge Gaps in Translation with Automatic Explicitation HyoJung Han, Jordan Boyd-Graber, Marine Carpuat

A Quality-based Syntactic Template Retriever for Syntactically-Controlled Paraphrase Generation Xue Zhang, Songming Zhang, Yunlong Liang, Yufeng Chen, Jian Liu, Wenjuan Han, Jinan Xu

Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation Di Wu, Christof Monz

Quantifying the redundancy between prosody and text Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Wilcox, Tamar Regev

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut

A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot Aanisha Bhattacharyya, Yaman Singla, Balaji Krishnamurthy, Rajiv Shah, Changyou Chen

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

Active Learning for Natural Language Generation Yotam Perlitz, Ariel Gera, Michal Shmueli-Scheuer, Dafna Sheinwald, Noam Slonim, Liat Ein-Dor

Re³Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training Jiaxin Wen, Hao Zhou, Jian Guan, Jie Zhou, Minlie Huang

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David Mortensen, Noah Smith, Yulia Tsvetkov

Characterizing Mechanisms for Factual Recall in Language Models Qinan Yu, Jack Merullo, Ellie Pavlick

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark Dominik Macko, Robert Moro, Adaku Uchendu, Jason Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference? Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George Constantinides, Yiren Zhao

Reducing Sequence Length by Predicting Edit Spans with Large Language Models Masahiro Kaneko, Naoaki Okazaki

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han

Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models Xiaolei Wang, Xinyu Tang, Xin Zhao, Jingyuan Wang, Ji-Rong Wen

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Mohit Bansal

Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking Arian Askari, Mohammad Aliannejadi, Chuan Meng, Evangelos Kanoulas, Suzan Verberne

Make Every Example Count: On the Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets Irina Bejan, Artem Sokolov, Katja Filippova

Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews Hye Yun, Iain Marshall, Thomas Trikalinos, Byron Wallace

PromptST: Abstract Prompt Learning for End-to-End Speech Translation Tengfei Yu, Liang Ding, Xuebo Liu, Kehai Chen, Meishan Zhang, Dacheng Tao, Min Zhang

Text Rendering Strategies for Pixel Language Models Jonas Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott

APoLLo: Unified Adapter and Prompt Learning for Vision Language Models Sanjoy Chowdhury, Sayan Nag, Dinesh Manocha

SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2 Byungha Kang, Youhyun Shin

Contrastive Learning for Inference in Dialogue Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung

Editing Large Language Models: Problems, Methods, and Opportunities Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang

MarkQA: A large scale KBQA dataset with numerical reasoning Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu

Comparing Biases and the Impact of Multilingual Training across Multiple Languages Sharon Levy, Neha John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth

HutCRS: Hierarchical User-Interest Tracking for Conversational Recommender System Mingjie Qian, Yongsen Zheng, Jinghui Qin, Liang Lin

Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining Ting-Rui Chiang, Dani Yogatama

Simple and Effective Input Reformulations for Translation Brian Yu, Hansen Lillemark, Kurt Keutzer

Pointwise Mutual Information Based Metric and Decoding Strategy for Faithful Generation in Document Grounded Dialogs Yatin Nandwani, Vineet Kumar, Dinesh Raghu, Sachindra Joshi, Luis Lastras

The ACL OCL Corpus: Advancing Open Science in Computational Linguistics Shaurya Rohatgi, Yanxia Qin, Benjamin Aw, Niranjana Unnithan, Min-Yen Kan

Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset Arthur Amalvy, Vincent Labatut, Richard Dufour

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen

Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection Xinlin Peng, Ying Zhou, Ben He, Le Sun, Yingfei Sun

Contextual Interaction for Argument Post Quality Assessment Yiran Wang, Xuanang Chen, Ben He, Le Sun

Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification Mujeen Sung, James Gung, Elman Mansimov, Nikolaos Pappas, Raphael Shu, Salvatore Romeo, Yi Zhang, Vittorio Castelli

Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations Muhammet Ilaslan, Chenan Song, Joya Chen, Difei Gao, Weixian Lei, Qianli Xu, Joo Lim, Mike Shou

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil Aalst, Claudia Wagner

Unraveling Feature Extraction Mechanisms in Neural Networks Xiaobing Sun, Jiaxi Li, Wei Lu

CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion Xingwei He, Yeyun Gong, A-Long Jin, Hang Zhang, Anlei Dong, Jian Jiao, Siu Yiu, Nan Duan

Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks Yimu Wang, Xiangru Jian, Bo Xue

E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation Fengyi Fu, Lei Zhang, Quan Wang, Zhendong Mao

ALDi: Quantifying the Arabic Level of Dialectness of Text Amr Keleg, Sharon Goldwater, Walid Magdy

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

Goal-Driven Explainable Clustering via Language Descriptions Zihan Wang, Jingbo Shang, Ruiqi Zhong

Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models Jirui Qi, Raquel Fernández, Arianna Bisazza

Learning from Mistakes via Cooperative Study Assistant for Large Language Models Danqing Wang, Lei Li

Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models Joan Nwatu, Oana Ignat, Rada Mihalcea

Conceptor-Aided Debiasing of Large Language Models Li Yifei, Lyle Ungar, João Sedoc

AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite Jonas Groschwitz, Shay Cohen, Lucia Donatelli, Meaghan Fowlie

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

AD-NLP: A Benchmark for Anomaly Detection in Natural Language Processing Matei Bejan, Andrei Manolache, Marius Popescu

Enhancing the Ranking Context of Dense Retrieval through Reciprocal Nearest Neighbors George Zerveas, Navid Rekabsaz, Carsten Eickhoff

Cross-Lingual Cross-Target Stance Detection with Dual Knowledge Distillation Framework Ruike Zhang, Hanxuan Yang, Wenji Mao

PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Chuan He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu

An Iteratively Parallel Generation Method with the Pre-Filling Strategy for Document-level Event Extraction Guanhua Huang, Runxin Xu, Ying Zeng, Jiaze Chen, Zhouwang Yang, Weinan E

CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations Myra Cheng, Tiziano Piccardi, Diyi Yang

Reduce Human Labor On Evaluating Conversational Information Retrieval System: A Human-Machine Collaboration Approach Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv

BERTie Bott’s Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for Galician Micaella Bruton, Meriem Beloucif

Program Translation via Code Distillation Yufan Huang, Mengnan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, Neel Sundaresan

FaMeSumm: Investigating and Improving Faithfulness of Medical Summarization Nan Zhang, Yusen Zhang, Wu Guo, Prasenjit Mitra, Rui Zhang

Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

Systematic word meta-sense extension Lei Yu

Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory Ziang Xiao, Susu Zhang, Vivian Lai, Q. Vera Liao

Revisiting the Knowledge Injection Frameworks Peng Fu, Yiming Zhang, Haobo Wang, Weikang Qiu, Junbo Zhao

We Are What We Repeatedly Do: Inducing and Deploying Habitual Schemas in Persona-Based Responses Benjamin Kane, Lenhart Schubert

Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model Qi Jia, Siyu Ren, Yizhu Liu, Kenny Zhu

TaskWeb: Selecting Better Source Tasks for Multi-task NLP Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi

Improving Bias Mitigation through Bias Experts in Natural Language Understanding Eojin Jeon, Mingyu Lee, Juhyeong Park, Yeachan Kim, Wing-Lam Mok, SangKeun Lee

Semi-supervised multimodal coreference resolution in image narrations Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models Yi Zhou, Jose Camacho-Collados, Danushka Bollegala

Argument-based Detection and Classification of Fallacies in Political Debates Pierpaolo Goffredo, Mariana Espinoza, Serena Villata, Elena Cabrio

SpEL: Structured Prediction for Entity Linking Hassan Shavarani, Anoop Sarkar

Architectural Sweet Spots for Modeling Human Label Variation by the Example of Argument Quality: It’s Best to Relate Perspectives! Philipp Heinisch, Matthias Orlikowski, Julia Romberg, Philipp Cimiano

Explicit Planning Helps Language Models in Logical Reasoning Hongyu Zhao, Kangrui Wang, Mo Yu, Hongyuan Mei

clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents Kranti Chalamalasetti, Jana Götze, Sherzod Hakimov, Brielen Madureira, Philipp Sadler, David Schlangen

Explaining with Contrastive Phrasal Highlighting: A Case Study in Assisting Humans to Detect Translation Differences Eleftheria Briakou, Navita Goyal, Marine Carpuat

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Sultan, Christopher Potts

TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings Hans Hanley, Zakir Durumeric

Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Deciphering Stereotypes in Pre-Trained Language Models Weicheng Ma, Henry Scheible, Brian Wang, Goutham Veeramachaneni, Pratim Chowdhary, Alan Sun, Andrew Koulogeorge, Lili Wang, Diyi Yang, Soroush Vosoughi

An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives Young Cho, Sunny Rai, Lyle Ungar, João Sedoc, Sharath Guntuku

Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark Minje Choi, Jiaxin Pei, Sagar Kumar, Chang Shu, David Jurgens

Interventional Rationalization Linan Yue, Qi Liu, Li Wang, Yanqing An, Yichao Du, Zhenya Huang

Don’t Take This Out of Context!: On the Need for Contextual Models and Evaluations for Stylistic Rewriting Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

Axiomatic Preference Modeling for Longform Question Answering Corby Rosset, Guoqing Zheng, Victor Dibia, Ahmed Awadallah, Paul Bennett

Countering Misinformation via Emotional Response Generation Daniel Russo, Shane Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini

Seq2seq is All You Need for Coreference Resolution Wenzheng Zhang, Sam Wiseman, Karl Stratos

StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding Cheng Jiayang, Lin Qiu, Tsz Chan, Tianqing Fang, Weiqi Wang, Chunkit Chan, Dongyu Ru, Qipeng Guo, Hongming Zhang, Yangqiu Song, Yue Zhang, Zheng Zhang

Beyond Detection: A Defend-and-Summarize Strategy for Robust and Interpretable Rumor Analysis on Social Media Yi-Ting Chang, Yun-Zhu Song, Yi-Syuan Chen, Hong-Han Shuai

Crystal: Introspective Reasoners Reinforced with Self-Feedback Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Ye Zhongyi, Linli Xu

BioFEG: Generate Latent Features for Biomedical Entity Linking Xuhui Sui, Ying Zhang, Xiangrui Cai, Kehui Song, Baohang Zhou, Xiaojie Yuan, Wensheng Zhang

TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, Qun Liu

Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors Nikita Mehandru, Sweta Agrawal, Yimin Xiao, Ge Gao, Elaine Khoong, Marine Carpuat, Niloufar Salehi

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive Tharindu Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher Homan, Ashiqur KhudaBukhsh

Generating Summaries with Controllable Readability Levels Leonardo Ribeiro, Mohit Bansal, Markus Dreyer

CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs Taha Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Hakkani-Tur, Yang Liu, Mahdi Namazifar

ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos Te-Lin Wu, Zi-Yi Dou, Qingyuan Hu, Yu Hou, Nischal Chandra, Marjorie Freedman, Ralph Weischedel, Nanyun Peng

From Parse-Execute to Parse-Execute-Refine: Improving Semantic Parser for Complex Question Answering over Knowledge Base Wangzhen Guo, Linyin Luo, Hanjiang Lai, Jian Yin

CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation. Philipp Borchert, Jochen De Weerdt, Kristof Coussement, Arno De Caigny, Marie-Francine Moens

Models See Hallucinations: Evaluating the Factuality in Video Captioning Hui Liu, Xiaojun Wan

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Marek Kubis, Paweł Skórzewski, Marcin Sowański, Tomasz Ziętkiewicz

Can Language Models Understand Physical Concepts? Lei Li, Jingjing Xu, Qingxiu Dong, Ce Zheng, Xu Sun, Lingpeng Kong, Qi Liu

SPT: Learning to Selectively Insert Prompts for Better Prompt Tuning Wei Zhu, Ming Tan

Once Upon a Time in Graph: Relative-Time Pretraining for Complex Temporal Reasoning Sen Yang, Xin Li, Lidong Bing, Wai Lam

Expository Text Generation: Imitate, Retrieve, Paraphrase Nishant Balepur, Jie Huang, Kevin Chang

Enhancing Textbooks with Visuals from the Web for Improved Learning Janvijay Singh, Vilém Zouhar, Mrinmaya Sachan

Continual Event Extraction with Semantic Confusion Rectification Zitao Wang, Xinyi Wang, Wei Hu

An Empirical Study of Translation Hypothesis Ensembling with Large Language Models António Farinhas, José de Souza, André Martins

Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models Geewook Kim, Hodong Lee, Daehee Kim, Haeji Jung, Sanghee Park, Yoonsik Kim, Sangdoo Yun, Taeho Kil, Bado Lee, Seunghyun Park

Continual Learning for Multilingual Neural Machine Translation via Dual Importance-based Model Division Junpeng Liu, Kaiyu Huang, Hao Yu, Jiuyi Li, Jinsong Su, Degen Huang

SimCSE++: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives Jiahao Xu, Wei Shao, Lihui Chen, Lemao Liu

Unlearn What You Want to Forget: Efficient Unlearning for LLMs Jiaao Chen, Diyi Yang

Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration Yiquan Wu, Siying Zhou, Yifei Liu, Weiming Lu, Xiaozhong Liu, Yating Zhang, Changlong Sun, Fei Wu, Kun Kuang

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

When Language Models Fall in Love: Animacy Processing in Transformer Language Models Michael Hanna, Yonatan Belinkov, Sandro Pezzelle

Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs Qing Wang, Kang Zhou, Qiao Qiao, Yuepei Li, Qi Li

Paraphrase Types for Generation and Detection Jan Wahle, Bela Gipp, Terry Ruas

Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction Yice Zhang, Yifan Yang, Meng Li, Bin Liang, Shiwei Chen, Ruifeng Xu

PAC-tuning: Fine-tuning Pre-trained Language Models with PAC-driven Perturbed Gradient Descent Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Johnson, Rongrong Wang

Emergence of Abstract State Representations in Embodied Sequence Modeling Tian Yun, Zilai Zeng, Kunal Handa, Ashish Thapliyal, Bo Pang, Ellie Pavlick, Chen Sun

Accelerating Toeplitz Neural Network with Constant-time Inference Complexity Zhen Qin, Yiran Zhong

Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson

StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models Sullam Jeoung, Yubin Ge, Jana Diesner

Impressions: Visual Semiotics and Aesthetic Impact Understanding Julia Kruk, Caleb Ziems, Diyi Yang

DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery Wenbin An, Feng Tian, Wenkai Shi, Yan Chen, Qinghua Zheng, QianYing Wang, Ping Chen

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models Shuai Zhao, Jinming Wen, Anh Luu, Junbo Zhao, Jie Fu

UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, Qi Zhang

KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning Xiao Yu, Qingyang Wu, Kun Qian, Zhou Yu

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin

Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam

Bridging Information-Theoretic and Geometric Compression in Language Models Emily Cheng, Corentin Kervadec, Marco Baroni

Pre-training Language Models for Comparative Reasoning Mengxia Yu, Zhihan Zhang, Wenhao Yu, Meng Jiang

Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search Xiang Geng, Yu Zhang, Zhejian Lai, Shuaijie She, Wei Zou, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang

Text Embeddings Reveal (Almost) As Much As Text John Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander Rush

AutoTrial: Prompting Language Models for Clinical Trial Design Zifeng Wang, Cao Xiao, Jimeng Sun

Enhancing Generative Retrieval with Reinforcement Learning from Relevance Feedback Yujia Zhou, Zhicheng Dou, Ji-Rong Wen

Multi-Source Probing for Open-Domain Conversational Understanding Yuanxi Li, Hao Zhou, Jie Zhou, Minlie Huang

Hallucination Mitigation in Natural Language Generation from Large-Scale Open-Domain Knowledge Graphs Xiao Shi, Zhengyuan Zhu, Zeyu Zhang, Chengkai Li

Multi-Source Multi-Type Knowledge Exploration and Exploitation for Dialogue Generation Xuanfan Ni, Hongliang Dai, Zhaochun Ren, Piji Li

Focus Your Attention (with Adaptive IIR Filters) Shahar Lutati, Itamar Zimerman, Lior Wolf

Identifying Statements Crucial for Awareness of Interpretive Nonsense to Prevent Communication Breakdowns Tomoyuki Maekawa, Michita Imai

Multilingual Large Language Models Are Not (Yet) Code-Switchers Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Winata, Alham Aji

Reinforced Target-driven Conversational Promotion Huy Dao, Lizi Liao, Dung Le, Yuxiang Nie

Identification of Multimodal Stance Towards Frames of Communication Maxwell Weinzierl, Sanda Harabagiu

Unsupervised Sounding Pixel Learning Yining Zhang, Yanli Ji, Yang Yang

LM vs LM: Detecting Factual Errors via Cross Examination Roi Cohen, May Hamri, Mor Geva, Amir Globerson

Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding Bram van Dijk, Tom Kouwenhoven, Marco Spruit, Max Johannes van Duijn

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

MeaeQ: Mount Model Extraction Attacks with Efficient Queries Chengwei Dai, Minxuan Lv, Kun Li, Wei Zhou

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning Seungone Kim, Se Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye, Jamin Shin, Minjoon Seo

Explaining Interactions Between Text Spans Sagnik Choudhury, Pepa Atanasova, Isabelle Augenstein

Predictive Chemistry Augmented with Text Retrieval Yujie Qian, Zhening Li, Zhengkai Tu, Connor Coley, Regina Barzilay

System Combination via Quality Estimation for Grammatical Error Correction Muhammad Qorib, Hwee Ng

Rethinking Negative Pairs in Code Search Haochen Li, Xin Zhou, Anh Luu, Chunyan Miao

Question Answering as Programming for Solving Time-Sensitive Questions Xinyu Zhu, Cheng Yang, Bei Chen, Siheng Li, Jian-Guang Lou, Yujiu Yang

Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection Qianjin Du, Shiji Zhou, Xiaohui Kuang, Gang Zhao, Jidong Zhai

Controlling Pre-trained Language Models for Grade-Specific Text Simplification Sweta Agrawal, Marine Carpuat

CLEVR-Implicit: A Diagnostic Dataset for Implicit Reasoning in Referring Expression Comprehension Jingwei Zhang, Xin Wu, Yi Cai

“Are Your Explanations Reliable?” Investigating the Stability of LIME in Explaining Text Classifiers by Marrying XAI and Adversarial Attack Christopher Burger, Lingwei Chen, Thai Le

CQE: A Comprehensive Quantity Extractor Satya Almasian, Vivian Kazakova, Philipp Göldner, Michael Gertz

A Unified View of Evaluation Metrics for Structured Prediction Yunmo Chen, William Gantt, Tongfei Chen, Aaron White, Benjamin Van Durme

A Deeper (Autoregressive) Approach to Non-Convergent Discourse Parsing Oren Tsur, Yoav Tulpan

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields Jan Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif Mohammad

Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration Daniel Deutsch, George Foster, Markus Freitag

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi

Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs Zhiwei Hu, Victor Basulto, Zhiliang Xiang, Ru Li, Jeff Pan

MailEx: Email Event and Argument Extraction Saurabh Srivastava, Gaurav Singh, Shou Matsumoto, Ali Raz, Paulo Costa, Joshua Poore, Ziyu Yao

Optimized Tokenization for Transcribed Error Correction Tomer Wullach, Shlomo Chazan

Beware of Model Collapse! Fast and Stable Test-time Adaptation for Robust Question Answering Yi Su, Yixin Ji, Juntao Li, Hai Ye, Min Zhang

Generative Adversarial Training with Perturbed Token Detection for Model Robustness Jiahao Zhao, Wenji Mao

Multi-Task Knowledge Distillation with Embedding Constraints for Scholarly Keyphrase Boundary Classification Seo Park, Cornelia Caragea

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou

Be Selfish, But Wisely: Investigating the Impact of Agent Personality in Mixed-Motive Human-Agent Interactions Kushal Chawla, Ian Wu, Yu Rong, Gale Lucas, Jonathan Gratch

Doolittle: Benchmarks and Corpora for Academic Writing Formalization Shizhe Diao, Yongyu Lei, Liangming Pan, Tianqing Fang, Wangchunshu Zhou, Sedrick Keh, Min-Yen Kan, Tong Zhang

Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts Haochen Tan, Han Wu, Wei Shao, Xinyun Zhang, Mingjie Zhan, Zhaohui Hou, Ding Liang, Linqi Song

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

Character-LLM: A Trainable Agent for Role-Playing Yunfan Shao, Linyang Li, Junqi Dai, Xipeng Qiu

Natural Language Decompositions of Implicit Content Enable Better Text Representations Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik

A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports Xinyu Wang, Lin Gui, Yulan He

Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation Zhiling Zhang, Mengyue Wu, Kenny Zhu

How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning Rochelle Choenni, Dan Garrette, Ekaterina Shutova

COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation Nan Wang, Qifan Wang, Yi-Chia Wang, Maziar Sanjabi, Jingzhou Liu, Hamed Firooz, Hongning Wang, Shaoliang Nie

NameGuess: Column Name Expansion for Tabular Data Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Shen Wang, Huzefa Rangwala, George Karypis

BLESS: Benchmarking Large Language Models on Sentence Simplification Tannon Kew, Alison Chi, Laura Vásquez-Rodríguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow

To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing Sireesh Gururaja, Amanda Bertsch, Clara Na, David Widder, Emma Strubell

PALS: Personalized Active Learning for Subjective Tasks in NLP Kamil Kanclerz, Konrad Karanowski, Julita Bielaniewicz, Marcin Gruza, Piotr Miłkowski, Jan Kocon, Przemyslaw Kazienko

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification Yingjie Zhu, Jiasheng Si, Yibo Zhao, Haiyang Zhu, Deyu Zhou, Yulan He

An Exploration of Left-Corner Transformations Andreas Opedal, Eleftheria Tsipidi, Tiago Pimentel, Ryan Cotterell, Tim Vieira

Characterizing and Verifying Scientific Claims: Qualitative Causal Structure is All You Need Jinxuan Wu, Wenhan Chao, Xian Zhou, Zhunchen Luo

FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models Konstantin Dobler, Gerard de Melo

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games Ruoyao Wang, Graham Todd, Xingdi Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen

Skill-Based Few-Shot Selection for In-Context Learning Shengnan An, Bo Zhou, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Weizhu Chen, Jian-Guang Lou

MaNtLE: Model-agnostic Natural Language Explainer Rakesh Menon, Kerem Zaman, Shashank Srivastava

PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer Lichang Chen, Jiuhai Chen, Heng Huang, Minhao Cheng

Ling-CL: Understanding NLP Models through Linguistic Curricula Mohamed Elgaar, Hadi Amiri

Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance Shaomu Tan, Christof Monz

SEER: A Knapsack approach to Exemplar Selection for In-Context HybridQA Jonathan Tonglet, Manon Reusens, Philipp Borchert, Bart Baesens

Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations Jihyoung Jang, Minseong Boo, Hyounghun Kim

DueT: Image-Text Contrastive Transfer Learning with Dual-adapter Tuning Taku Hasegawa, Kyosuke Nishida, Koki Maeda, Kuniko Saito

Towards a Unified Conversational Recommendation System: Multi-task Learning via Contextualized Knowledge Distillation Yeongseo Jung, Eunseo Jung, Lei Chen

MoPe: Model Perturbation based Privacy Attacks on Language Models Marvin Li, Jason Wang, Jeffrey Wang, Seth Neel

q2d: Turning Questions into Dialogs to Teach Models How to Search Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb

Aligning Large Language Models through Synthetic Feedback Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Yoo, Minjoon Seo

You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models Alexander Baranov, Vladimir Kniazhevsky, Pavel Braslavski

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

Empower Nested Boolean Logic via Self-Supervised Curriculum Learning Hongqiu Wu, Linfeng Liu, Hai Zhao, Min Zhang

The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis Pranav Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca Passonneau, Shomir Wilson

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules Yanchen Liu, William Held, Diyi Yang

Unifying Discrete and Continuous Representations for Unsupervised Paraphrase Generation Mingfeng Xue, Dayiheng Liu, Wenqiang Lei, Jie Fu, Jian Lan, Mei Li, Baosong Yang, Jun Xie, Yidan Zhang, Dezhong Peng, Jiancheng Lv

The Benefits of Label-Description Training for Zero-Shot Text Classification Lingyu Gao, Debanjan Ghosh, Kevin Gimpel

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post

Finding Authentic Counterhate Arguments: A Case Study with Public Figures Abdullah Albanyan, Ahmed Hassan, Eduardo Blanco

Can We Edit Multimodal Large Language Models? Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, Ningyu Zhang

Exploring Discourse Structure in Document-level Machine Translation Xinyu Hu, Xiaojun Wan

ClusterLLM: Large Language Models as a Guide for Text Clustering Yuwei Zhang, Zihan Wang, Jingbo Shang

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code Shuyan Zhou, Uri Alon, Sumit Agarwal, Graham Neubig

Learn and Consolidate: Continual Adaptation for Zero-Shot and Multilingual Neural Machine Translation Kaiyu Huang, Peng Li, Junpeng Liu, Maosong Sun, Yang Liu

e-THERAPIST: I suggest you to cultivate a mindset of positivity and nurture uplifting thoughts Kshitij Mishra, Priyanshu Priya, Manisha Burja, Asif Ekbal

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages Shamsuddeen Muhammad, Idris Abdulmumin, Abinew Ayele, Nedjma Ousidhoum, David Adelani, Seid Yimam, Ibrahim Ahmad, Meriem Beloucif, Saif Mohammad, Sebastian Ruder, Oumaima Hourrane, Alipio Jorge, Pavel Brazdil, Felermino Ali, Davis David, Salomey Osei, Bello Shehu-Bello, Falalu Lawan, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Messelle, Hailu Balcha, Sisay Chala, Hagos Gebremichael, Bernard Opoku, Stephen Arthur

Quantifying Character Similarity with Vision Transformers Xinmei Yang, Abhishek Arora, Shao-Yu Jheng, Melissa Dell

Syllogistic Reasoning for Legal Judgment Analysis Wentao Deng, Jiahuan Pei, Keyi Kong, Zhe Chen, Furu Wei, Yujun Li, Zhaochun Ren, Zhumin Chen, Pengjie Ren

Improving Transformer-based Program Repair Model through False Behavior Diagnosis Youngkyoung Kim, Misoo Kim, Eunseok Lee

KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Sehyun Choi, Tianqing Fang, Zhaowei Wang, Yangqiu Song

CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL Mayank Kothyari, Dhruva Dhingra, Sunita Sarawagi, Soumen Chakrabarti

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

TLM: Token-Level Masking for Transformers Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

Addressing NER Annotation Noises with Uncertainty-Guided Tree-Structured CRFs Jian Liu, Weichang Liu, Yufeng Chen, Jinan Xu, Zhe Zhao

Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli

Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale Marta Costa-jussà, Pierre Andrews, Eric Smith, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Daniel Licht, Carleigh Wood

GlobalBench: A Benchmark for Global Progress in Natural Language Processing Yueqi Song, Simran Khanuja, Pengfei Liu, Fahim Faisal, Alissa Ostapenko, Genta Winata, Alham Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig

DetGPT: Detect What You Need via Reasoning Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

Language Models with Rationality Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

Self-Improvement of Non-autoregressive Model via Sequence-Level Distillation Yusheng Liao, Shuyang Jiang, Yiqi Li, Yu Wang, Yanfeng Wang

Mitigating Temporal Misalignment by Discarding Outdated Facts Michael Zhang, Eunsol Choi

Open-world Semi-supervised Generalized Relation Discovery Aligned in a Real-world Setting William Hogan, Jiacheng Li, Jingbo Shang

IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions Ziheng Zeng, Kellen Cheng, Srihari Nanniyur, Jianing Zhou, Suma Bhat

Bias Neutralization in Non-Parallel Texts: A Cyclic Approach with Auxiliary Guidance Karthic Madanagopal, James Caverlee

Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee

BRAINTEASER: Lateral Thinking Puzzles for Large Language Models Yifan Jiang, Filip Ilievski, Kaixin Ma, Zhivar Sourati

When are Lemons Purple? The Concept Association Bias of Vision-Language Models Yingtian Tang, Yutaro Yamada, Yoyo Zhang, Ilker Yildirim

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

Text Representation Distillation via Information Bottleneck Principle Yanzhao Zhang, Dingkun Long, Zehan Li, Pengjun Xie

Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang Zhang, Ashwin Kalyan

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Bras, Gunhee Kim, Yejin Choi, Maarten Sap

Exploring the Boundaries of GPT-4 in Radiology Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel Castro, Maria Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya Nori, Matthew Lungren, Ozan Oktay, Javier Alvarez-Valle

A Frustratingly Easy Post-Training Quantization Scheme for LLMs Yongkweon Jeon, Chungman Lee, Kyungphil Park, Ho-young Kim

A Comprehensive Evaluation of Biomedical Entity Linking Models David Kartchner, Jennifer Deng, Shubham Lohiya, Tejasri Kopparthi, Prasanth Bathala, Daniel Domingo-Fernández, Cassie Mitchell

Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals Sukannya Purkayastha, Anne Lauscher, Iryna Gurevych

LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages Milind Agarwal, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models Ruixuan Xiao, Yiwen Dong, Junbo Zhao, Runze Wu, Minmin Lin, Gang Chen, Haobo Wang

API-Assisted Code Generation for Question Answering on Varied Table Structures Yihan Cao, Shuyi Chen, Ryan Liu, Zhiruo Wang, Daniel Fried

Data Factors for Better Compositional Generalization Xiang Zhou, Yichen Jiang, Mohit Bansal

ChatEdit: Towards Multi-turn Interactive Facial Image Editing via Dialogue Xing Cui, Zekun Li, Pei Li, Yibo Hu, Hailin Shi, Chunshui Cao, Zhaofeng He

Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations James Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu

Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining Jingcong Liang, Rong Ye, Meng Han, Qi Zhang, Ruofei Lai, Xinyu Zhang, Zhao Cao, Xuanjing Huang, Zhongyu Wei

Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization Zihao Fu, Yixuan Su, Zaiqiao Meng, Nigel Collier

GNAT: A General Narrative Alignment Tool Tanzir Pial, Steven Skiena

UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning Ahmed Masry, Parsa Kavehzadeh, Do Long, Enamul Hoque, Shafiq Joty

Distance-Based Propagation for Efficient Knowledge Graph Reasoning Harry Shomer, Yao Ma, Juanhui Li, Bo Wu, Charu Aggarwal, Jiliang Tang

What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions Abhilasha Sancheti, Aparna Garimella, Balaji Srinivasan, Rachel Rudinger

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization Janghwan Lee, Minsoo Kim, Seungcheol Baek, Seok Hwang, Wonyong Sung, Jungwook Choi

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language Understanding Caoyun Fan, Jidong Tian, Yitian Li, Wenqing Chen, Hao He, Yaohui Jin

Large Language Models are Complex Table Parsers Bowen Zhao, Changkai Ji, Yuejie Zhang, Wen He, Yingwen Wang, Qing Wang, Rui Feng, Xiaobo Zhang

R2H: Building Multimodal Navigation Helpers that Respond to Help Requests Yue Fan, Jing Gu, Kaizhi Zheng, Xin Wang

Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries Ashish Mittal, Sunita Sarawagi, Preethi Jyothi, George Saon, Gakuto Kurata

Generative Table Pre-training Empowers Models for Tabular Prediction Tianping Zhang, Shaowen Wang, Shuicheng Yan, Li Jian, Qian Liu

Learning to Describe for Predicting Zero-shot Drug-Drug Interactions Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu

Privacy Implications of Retrieval-Based Language Models Yangsibo Huang, Samyak Gupta, Zexuan Zhong, Kai Li, Danqi Chen

IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems Xu Huang, Zhirui Zhang, Ruize Gao, Yichao Du, Lemao Liu, Guoping Huang, Shuming Shi, Jiajun Chen, Shujian Huang

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren

DiNeR: A Large Realistic Dataset for Evaluating Compositional Generalization Chengang Hu, Xiao Liu, Yansong Feng

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? Yang Chen, Hexiang Hu, Yi Luan, Haitian Sun, Soravit Changpinyo, Alan Ritter, Ming-Wei Chang

EDeR: Towards Understanding Dependency Relations Between Events Ruiqi Li, Patrik Haslum, Leyang Cui

It Ain’t Over: A Multi-aspect Diverse Math Word Problem Dataset Jiwoo Kim, Youngbin Kim, Ilwoong Baek, JinYeong Bak, Jongwuk Lee

Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness Bevan Koopman, Guido Zuccon

kNN-LM Does Not Improve Open-ended Text Generation Shufan Wang, Yixiao Song, Andrew Drozdov, Aparna Garimella, Varun Manjunatha, Mohit Iyyer

Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model Zeyu Liu, Tim Dettmers, Xi Victoria Lin, Veselin Stoyanov, Xian Li

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Jose, Alexander Toshev, Yantao Zheng, Jonathon Shlens, Ruoming Pang, Yinfei Yang

Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting Emmy Liu, Aditi Chaudhary, Graham Neubig

A linear time approximation of Wasserstein distance with word embedding selection Sho Otao, Makoto Yamada

Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication Zhangyue Yin, Qiushi Sun, Cheng Chang, Qipeng Guo, Junqi Dai, Xuanjing Huang, Xipeng Qiu

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction Cam Van Thi Nguyen, Tuan Mai, Son The, Dang Kieu, Duc-Trong Le

Connecting degree and polarity: An artificial language learning study Lisa Bylinina, Alexey Tikhonov, Ekaterina Garmash

Prompting with Pseudo-Code Instructions Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy, Danish Contractor, Srikanth Tamilselvam

CRAB: Assessing the Strength of Causal Relationships Between Real-world Events Angelika Romanou, Syrielle Montariol, Debjit Paul, Leo Laugier, Karl Aberer, Antoine Bosselut

NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly Yi Fung, Tuhin Chakrabarty, Hao Guo, Owen Rambow, Smaranda Muresan, Heng Ji

A State-Vector Framework for Dataset Effects Esmat Sahak, Zining Zhu, Frank Rudzicz

Challenges in Context-Aware Neural Machine Translation Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma

Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea

FACTIFY3M: A benchmark for multimodal fact verification with explainability through 5W Question-Answering Megha Chakraborty, Khushbu Pahwa, Anku Rani, Shreyas Chatterjee, Dwip Dalal, Harshit Dave, Ritvik G, Preethi Gurumurthy, Adarsh Mahor, Samahriti Mukherjee, Aditya Pakala, Ishan Paul, Janvita Reddy, Arghya Sarkar, Kinjal Sensharma, Aman Chadha, Amit Sheth, Amitava Das

Building Multi-domain Dialog State Trackers from Single-domain Dialogs Qi Zhu, Zheng Zhang, Xiaoyan Zhu, Minlie Huang

Specialist or Generalist? Instruction Tuning for Specific NLP Tasks Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai

Making Large Language Models Better Data Creators Dong-Ho Lee, Jay Pujara, Mohit Sewak, Ryen White, Sujay Jauhar

Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation Xiaohua Wang, Yuliang Yan, Longtao Huang, Xiaoqing Zheng, Xuanjing Huang

Guideline Learning for In-Context Information Extraction Chaoxu Pang, Yixuan Cao, Qiang Ding, Ping Luo

Open Information Extraction via Chunks Kuicai Dong, Aixin Sun, Jung-jae Kim, Xiaoli Li

Rethinking Word-Level Auto-Completion in Computer-Aided Translation Xingyu Chen, Lemao Liu, Guoping Huang, Zhirui Zhang, Mingming Yang, Shuming Shi, Rui Wang

Automatic Transcription of Handwritten Old Occitan Language Esteban Arias, Vallari Pai, Matthias Schöffel, Christian Heumann, Matthias Aßenmacher

CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities Sheng Xu, Peifeng Li, Qiaoming Zhu

Anaphor Assisted Document-Level Relation Extraction Chonggang Lu, Richong Zhang, Kai Sun, Jaein Kim, Cunwang Zhang, Yongyi Mao

All Things Considered: Detecting Partisan Events from News Media with Cross-Article Comparison Yujian Liu, Xinliang Zhang, Kaijian Zou, Ruihong Huang, Nicholas Beauchamp, Lu Wang

BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification Mithun Das, Animesh Mukherjee

ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts Lena Bolliger, David Reich, Patrick Haller, Deborah Jakobi, Paul Prasse, Lena Jäger

From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models Dongjun Kang, Joonsuk Park, Yohan Jo, JinYeong Bak

Analyzing Film Adaptation through Narrative Alignment Tanzir Pial, Shahreen Aunti, Charuta Pethe, Allen Kim, Steven Skiena

Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer Ruize Gao, Zhirui Zhang, Yichao Du, Lemao Liu, Rui Wang

Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment Ryo Nagata, Hiroya Takamura, Naoki Otani, Yoshifumi Kawasaki

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

A Training-Free Debiasing Framework with Counterfactual Reasoning for Conversational Emotion Detection Geng Tu, Ran Jing, Bin Liang, Min Yang, Kam-Fai Wong, Ruifeng Xu

Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations Wei-Lin Chen, Cheng-Kuang Wu, Yun-Nung Chen, Hsin-Hsi Chen

Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding Taolin Zhang, Ruyao Xu, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions Zexuan Zhong, Zhengxuan Wu, Christopher Manning, Christopher Potts, Danqi Chen

Stance Detection on Social Media with Background Knowledge Ang Li, Bin Liang, Jingqian Zhao, Bowen Zhang, Min Yang, Ruifeng Xu

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu

Leap-of-Thought: Accelerating Transformers via Dynamic Token Routing Yeachan Kim, Junho Kim, Jun-Hyung Park, Mingyu Lee, SangKeun Lee

Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning Swaroop Nath, Pushpak Bhattacharyya, Harshad Khadilkar

Fair Text Classification with Wasserstein Independence Thibaud Leteno, Antoine Gourru, Charlotte Laclau, Rémi Emonet, Christophe Gravier

TacoPrompt: A Collaborative Multi-Task Prompt Learning Method for Self-Supervised Taxonomy Completion Hongyuan Xu, Ciyi Liu, Yuhang Niu, Yunong Chen, Xiangrui Cai, Yanlong Wen, Xiaojie Yuan

Global Voices, Local Biases: Socio-Cultural Prejudices across Languages Anjishnu Mukherjee, Chahat Raj, Ziwei Zhu, Antonios Anastasopoulos

Graph vs. Sequence: An Empirical Study on Knowledge Forms for Knowledge-Grounded Dialogue Yizhe Yang, Heyan Huang, Yuhang Liu, Yang Gao

NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models Yongchao Chen, Rujul Gandhi, Yang Zhang, Chuchu Fan

Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia Dimitris Gkoumas, Matthew Purver, Maria Liakata

Elevating Code-mixed Text Handling through Auditory Information of Words Mamta Mamta, Zishan Ahmad, Asif Ekbal

Predict and Use: Harnessing Predicted Gaze to Improve Multimodal Sarcasm Detection Divyank Tiwari, Diptesh Kanojia, Anupama Ray, Apoorva Nunna, Pushpak Bhattacharyya

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao

Consistency Analysis of ChatGPT Myeongjun Jang, Thomas Lukasiewicz

Do Differences in Values Influence Disagreements in Online Discussions? Michiel van der Meer, Piek Vossen, Catholijn Jonker, Pradeep Murukannaiah

A Digital Language Coherence Marker for Monitoring Dementia Dimitris Gkoumas, Adam Tsakalidis, Maria Liakata

Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks Heng Wang, Wenqian Zhang, Yuyang Bai, Zhaoxuan Tan, Shangbin Feng, Qinghua Zheng, Minnan Luo

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction Mingyang Song, Huafeng Liu, Liping Jing

Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification Apoorva Singh, Siddarth Chandrasekar, Sriparna Saha, Tanmay Sen

Semantic Similarity Models for Depression Severity Estimation Anxo Pérez, Neha Warikoo, Kexin Wang, Javier Parapar, Iryna Gurevych

Hop, Union, Generate: Explainable Multi-hop Reasoning without Rationale Supervision Wenting Zhao, Justin Chiu, Claire Cardie, Alexander Rush

ToolWriter: Question Specific Tool Synthesis for Tabular Data Carlos Gemmell, Jeff Dalton

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations Yuan Tian, Zheng Zhang, Zheng Ning, Toby Li, Jonathan Kummerfeld, Tianyi Zhang

CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Hang Pu, Yu Lan, Chao Shen

AnyTOD: A Programmable Task-Oriented Dialog System Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu

Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization Chi Cheang, Hou Chan, Derek Wong, Xuebo Liu, Zhaocong Li, Yanming Sun, Shudong Liu, Lidia Chao

Zero-Shot Multi-Label Topic Inference with Sentence Encoders and LLMs Souvika Sarkar, Dongji Feng, Shubhra Kanti Karmaker Santu

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines Yoo Sung, Jordan Boyd-Graber, Naeemul Hassan

Learning From Free-Text Human Feedback – Collect New Datasets Or Extend Existing Ones? Dominic Petrak, Nafise Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych

Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language Michael Wiegand, Jana Kampfmeier, Elisabeth Eder, Josef Ruppenhofer

Exploring Distributional Shifts in Large Language Models for Code Analysis Shushan Arakelyan, Rocktim Das, Yi Mao, Xiang Ren

ATHENA: Mathematical Reasoning with Thought Expansion JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han

TIMELINE: Exhaustive Annotation of Temporal Relations Supporting the Automatic Ordering of Events in News Articles Sarah Alsayyahi, Riza Batista-Navarro

Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection Mingyang Song, Pengyu Xu, Yi Feng, Huafeng Liu, Liping Jing

More Than Spoken Words: Nonverbal Message Extraction and Generation Dian Yu, Xiaoyang Wang, Wanshun Chen, Nan Du, Longyue Wang, Haitao Mi, Dong Yu

Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance Molly Petersen, Lonneke van der Plas

FAME: Flexible, Scalable Analogy Mappings Engine Shahar Jacob, Chen Shani, Dafna Shahaf

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong

Multilingual Previously Fact-Checked Claim Retrieval Matúš Pikuliak, Ivan Srba, Robert Moro, Timo Hromadka, Timotej Smoleň, Martin Melišek, Ivan Vykopal, Jakub Simko, Juraj Podroužek, Maria Bielikova

ALCAP: Alignment-Augmented Music Captioner Zihao He, Weituo Hao, Wei-Tsung Lu, Changyou Chen, Kristina Lerman, Xuchen Song

Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

Composable Text Controls in Latent Space with ODEs Guangyi Liu, Zeyu Feng, Yuan Gao, Zichao Yang, Xiaodan Liang, Junwei Bao, Xiaodong He, Shuguang Cui, Zhen Li, Zhiting Hu

P5: Plug-and-Play Persona Prompting for Personalized Response Selection Joosung Lee, Minsik Oh, Donghun Lee

Reader: Model-based language-instructed reinforcement learning Nicola Dainese, Pekka Marttinen, Alexander Ilin

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference Biao Fu, Minpeng Liao, Kai Fan, Zhongqiang Huang, Boxing Chen, Yidong Chen, Xiaodong Shi

GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection Krishanu Maity, Raghav Jain, Prince Jha, Sriparna Saha, Pushpak Bhattacharyya

Document-Level Machine Translation with Large Language Models Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu

Multilingual Simplification of Medical Texts Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh Ramanathan, Wei Xu, Byron Wallace, Junyi Li

Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation Jiayu Lin, Rong Ye, Meng Han, Qi Zhang, Ruofei Lai, Xinyu Zhang, Zhao Cao, Xuanjing Huang, Zhongyu Wei

JASMINE: Arabic GPT Models for Few-Shot Learning El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Inciarte, Md Tawkat Islam Khondaker

NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports Mael Jullien, Marco Valentino, Hannah Frost, Paul O’Regan, Dónal Landers, Andre Freitas

Addressing Linguistic Bias through a Contrastive Analysis of Academic Writing in the NLP Domain Robert Ridley, Zhen Wu, Jianbing Zhang, Shujian Huang, Xinyu Dai

RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi

Detecting Propaganda Techniques in Code-Switched Social Media Text Muhammad Salman, Asif Hanif, Shady Shehata, Preslav Nakov

Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian Ruhiyah Widiaputri, Ayu Purwarianti, Dessi Lestari, Kurniawati Azizah, Dipta Tanaya, Sakriani Sakti

Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation Minwoo Lee, Hyukhun Koh, Kang-il Lee, Dongdong Zhang, Minsung Kim, Kyomin Jung

Code-Switching Metrics Using Intonation Units Rebecca Pattichis, Dora LaCasse, Sonya Trawick, Rena Cacoullos

Short Papers

Fine-grained Conversational Decoding via Isotropic and Proximal Search Yuxuan Yao, Han Wu, Qiling Xu, Linqi Song

Primacy Effect of ChatGPT Yiwei Wang, Yujun Cai, Muhao Chen, Yuxuan Liang, Bryan Hooi

Better Quality Pre-training Data and T5 Models for African Languages Akintunde Oladipo, Mofetoluwa Adeyemi, Orevaoghene Ahia, Abraham Owodunni, Odunayo Ogundepo, David Adelani, Jimmy Lin

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber-Genzel, Barbara Plank

Bootstrapping Small & High Performance Language Models with Unmasking-Removal Training Policy Yahan Yang, Elior Sulem, Insup Lee, Dan Roth

Fidelity-Enriched Contrastive Search: Reconciling the Faithfulness-Diversity Trade-Off in Text Generation Wei-Lin Chen, Cheng-Kuang Wu, Hsin-Hsi Chen, Chung-Chi Chen

Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, Jaewoo Kang

Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings Andrea Wen-Yi, David Mimno

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation Jian Wang, Yi Cheng, Dongding Lin, Chak Leong, Wenjie Li

Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation Wenhong Zhu, Hongkun Hao, Rui Wang

PEFTDebias: Capturing debiasing information using PEFTs Sumit Agarwal, Aditya Veerubhotla, Srijan Bansal

ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation Xinpeng Wang, Barbara Plank

VivesDebate-Speech: A Corpus of Spoken Argumentation to Leverage Audio Features for Argument Mining Ramon Ruiz-Dolz, Javier Sanchez

Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky

Did You Mean…? Confidence-based Trade-offs in Semantic Parsing Elias Stengel-Eskin, Benjamin Van Durme

Understanding the Effect of Model Compression on Social Bias in Large Language Models Gustavo Gonçalves, Emma Strubell

Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling Yuanhang Yang, Shiyi Qi, Chuanyi Liu, Qifan Wang, Cuiyun Gao, Zenglin Xu

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation Mateusz Lango, Ondrej Dusek

Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou

GROOViST: A Metric for Grounding Objects in Visual Storytelling Aditya Surikuchi, Sandro Pezzelle, Raquel Fernández

When Do Decompositions Help for Machine Reading? Kangda Wei, Dawn Lawrie, Benjamin Van Durme, Yunmo Chen, Orion Weller

Revisiting De-Identification of Electronic Medical Records: Evaluation of Within- and Cross-Hospital Generalization Yiyang Liu, Jinpeng Li, Enwei Zhu

Language Representation Projection: Can We Transfer Factual Knowledge across Languages in Multilingual Language Models? Shaoyang Xu, Junzhuo Li, Deyi Xiong

Are All Steps Equally Important? Benchmarking Essentiality Detection in Event Processes Haoyu Wang, Hongming Zhang, Yueguan Wang, Yuqian Deng, Muhao Chen, Dan Roth

ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision Anastasiia Sedova, Benjamin Roth

Uncertainty Guided Global Memory Improves Multi-Hop Question Answering Alsu Sagirova, Mikhail Burtsev

Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy? Md Sultan

Analyzing Cognitive Plausibility of Subword Tokenization Lisa Beinborn, Yuval Pinter

POE: Process of Elimination for Multiple Choice Reasoning Chenkai Ma, Xinya Du

Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis Hongyi Zheng, Abulhair Saparov

Best of Both Worlds: Towards Improving Temporal Knowledge Base Question Answering via Targeted Fact Extraction Nithish Kannen, Udit Sharma, Sumit Neelam, Dinesh Khandelwal, Shajith Ikbal, Hima Karanam, L Subramaniam

Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebron, Sumit Sanghai

BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg

MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of Indian Legal Case Judgments Debtanu Datta, Shubham Soni, Rajdeep Mukherjee, Saptarshi Ghosh

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher Manning

Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models Junpeng Li, Zixia Jia, Zilong Zheng

EntSUMv2: Dataset, Models and Evaluation for More Abstractive Entity-Centric Summarization Dhruv Mehra, Lingjue Xie, Ella Hofmann-Coyle, Mayank Kulkarni, Daniel Preotiuc-Pietro

Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study Freddy Heppell, Kalina Bontcheva, Carolina Scarton

HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts Truong Do, Le Khiem, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Binh Nguyen, Chenghao Liu, Savitha Ramasamy, Xiaoli Li, Steven Hoi

ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models Dheeraj Mekala, Jason Wolfe, Subhro Roy

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

Spoiler Detection as Semantic Text Matching Ryan Tran, Canwen Xu, Julian McAuley

Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods Jonathan Kamp, Lisa Beinborn, Antske Fokkens

BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages Joseph Imperial, Ekaterina Kochmar

4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees Carlos Gómez-Rodríguez, Diego Roca, David Vilares

Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding Shuwen Deng, Paul Prasse, David Reich, Tobias Scheffer, Lena Jäger

Understanding the Inner-workings of Language Models Through Representation Dissimilarity Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge

Efficient Classification of Long Documents via State-Space Models Peng Lu, Suyuchen Wang, Mehdi Rezagholizadeh, Bang Liu, Ivan Kobyzev

Construction Artifacts in Metaphor Identification Datasets Joanne Boisson, Luis Espinosa-Anke, Jose Camacho-Collados

EtiCor: Corpus for Analyzing LLMs for Etiquettes Ashutosh Dwivedi, Pradhyumna Lavania, Ashutosh Modi

Prompt-Based Monte-Carlo Tree Search for Goal-oriented Dialogue Policy Planning Xiao Yu, Maximillian Chen, Zhou Yu

UniMath: A Foundational and Multimodal Mathematical Reasoner Zhenwen Liang, Tianyu Yang, Jipeng Zhang, Xiangliang Zhang

Simple Temporal Adaptation to Changing Label Sets: Hashtag Prediction via Dense KNN Niloofar Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez, Ahmed El-Kishky, Taylor Berg-Kirkpatrick

A Study on Accessing Linguistic Information in Pre-Trained Language Models by Using Prompts Marion Di Marco, Katharina Hämmerl, Alexander Fraser

Copyright Violations and Large Language Models Antonia Karamolegkou, Jiaang Li, Li Zhou, Anders Søgaard

Somali Information Retrieval Corpus: Bridging the Gap between Query Translation and Dedicated Language Resources Abdisalam Badel, Ting Zhong, Wenxin Tai, Fan Zhou

Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT Biru Zhu, Lifan Yuan, Ganqu Cui, Yangyi Chen, Chong Fu, Bingxiang He, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu

Faithful Model Evaluation for Model-Based Metrics Qian Hu, Palash Goyal, Rahul Gupta

Language Model Quality Correlates with Psychometric Predictive Power in Multiple Languages Ethan Wilcox, Clara Meister, Ryan Cotterell, Tiago Pimentel

Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence Zhihong Zhu, Xuxin Cheng, Zhiqi Huang, Dongsheng Chen, Yuexian Zou

M³Seg: A Maximum-Minimum Mutual Information Paradigm for Unsupervised Topic Segmentation in ASR Transcripts Ke Wang, Xiutian Zhao, Yanghui Li, Wei Peng

GD-COMET: A Geo-Diverse Commonsense Inference Model Mehar Bhatia, Vered Shwartz

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering Wookje Han, Jinsol Park, Kyungjae Lee

SOUL: Towards Sentiment and Opinion Understanding of Language Yue Deng, Wenxuan Zhang, Sinno Pan, Lidong Bing

Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks Andrea Sottana, Bin Liang, Kai Zou, Zheng Yuan

Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text Qi Cao, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

Exploring Linguistic Probes for Morphological Inflection Jordan Kodner, Salam Khalifa, Sarah Ruth Brogden Payne

FLatS: Principled Out-of-Distribution Detection with Feature-Based Likelihood Ratio Score Haowei Lin, Yuntian Gu

Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications Manuel Faysse, Gautier Viaud, Céline Hudelot, Pierre Colombo

CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation Sathish Indurthi, Shamil Chollampatt, Ravi Agrawal, Marco Turchi

Improved Unsupervised Chinese Word Segmentation Using Pre-trained Knowledge and Pseudo-labeling Transfer Hsiu-Wen Li, Ying-Jia Lin, Yi-Ting Li, Chun Lin, Hung-Yu Kao

Multilingual k-Nearest-Neighbor Machine Translation David Stap, Christof Monz

Understanding Computational Models of Semantic Change: New Insights from the Speech Community Filip Miletić, Anne Przewozny-Desriaux, Ludovic Tanguy

Revisiting Automated Topic Model Evaluation with Large Language Models Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan, Elliott Ash

Query2doc: Query Expansion with Large Language Models Liang Wang, Nan Yang, Furu Wei

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions Bodhisattwa Majumder, Zexue He, Julian McAuley

Large Language Models are biased to overestimate profoundness Eugenio Herrera-Berg, Tomás Browne, Pablo León-Villagrá, Marc-Lluís Vives, Cristian Calderon

Prompting Scientific Names for Zero-Shot Species Recognition Shubham Parashar, Zhiqiu Lin, Yanan Li, Shu Kong

MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup Hua Shen, Vicky Zayats, Johann Rocholl, Daniel Walker, Dirk Padfield

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition Srijith Radhakrishnan, Chao-Han Yang, Sumeer Khan, Rohit Kumar, Narsis Kiani, David Gomez-Cabrero, Jesper Tegnér

Transformer-based Live Update Generation for Soccer Matches from Microblog Posts Masashi Oshika, Kosuke Yamada, Ryohei Sasano, Koichi Takeda

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models Lina Conti, Guillaume Wisniewski

What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies Amit Gajbhiye, Zied Bouraoui, Na Li, Usashi Chatterjee, Luis Espinosa-Anke, Steven Schockaert

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Wang, Miguel Eckstein, William Wang

Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models Tim Schott, Daniel Furman, Shreshta Bhat

Anchoring Fine-tuning of Sentence Transformer with Semantic Label Information for Efficient Truly Few-shot Classification Amalie Pauli, Leon Derczynski, Ira Assent

Data Similarity is Not Enough to Explain Language Model Performance Gregory Yauney, Emily Reif, David Mimno

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

mAggretriever: A Simple yet Effective Approach to Zero-Shot Multilingual Dense Retrieval Sheng-Chieh Lin, Amin Ahmad, Jimmy Lin

CodeFusion: A Pre-trained Diffusion Model for Code Generation Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

VECHR: A Dataset for Explainable and Robust Classification of Vulnerability Type in the European Court of Human Rights Shanshan Xu, Leon Staufer, Santosh T.Y.S.S, Oana Ichim, Corina Heri, Matthias Grabmair

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model Haikang Deng, Colin Raffel

Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces Usashi Chatterjee, Amit Gajbhiye, Steven Schockaert

Large-scale similarity search with Optimal Transport Cléa Laouar, Yuki Takezawa, Makoto Yamada

FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho Choi, Sung-Ju Lee

Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for Sentence Simplification Liam Cripwell, Joël Legrand, Claire Gardent

Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Marek Kadlčík, Michal Štefánik, Ondrej Sotolar, Vlastimil Martinek

CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks Hoang Nguyen, Ye Liu, Chenwei Zhang, Tao Zhang, Philip Yu

Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations Minh-Quang Pham, Sathish Indurthi, Shamil Chollampatt, Marco Turchi

Human Raters Cannot Distinguish English Translations from Original English Texts Shira Wein

Faster Minimum Bayes Risk Decoding with Confidence-based Pruning Julius Cheng, Andreas Vlachos

Revisiting Sparse Retrieval for Few-shot Entity Linking Yulin Chen, Zhenran Xu, Baotian Hu, Min Zhang

Context Compression for Auto-regressive Transformers with Sentinel Tokens Siyu Ren, Qi Jia, Kenny Zhu

Set Learning for Generative Information Extraction Jiangnan Li, Yice Zhang, Bin Liang, Kam-Fai Wong, Ruifeng Xu

Token Prediction as Implicit Classification to Identify LLM-Generated Text Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

On Evaluation of Bangla Word Analogies Mousumi Akter, Souvika Sarkar, Shubhra Kanti Karmaker Santu

Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents Jannis Vamvas, Rico Sennrich

CLAIR: Evaluating Image Captions with Large Language Models David Chan, Suzanne Petryk, Joseph Gonzalez, Trevor Darrell, John Canny

Poisoning Retrieval Corpora by Injecting Adversarial Passages Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen

Clustering Pseudo Language Family in Multilingual Translation Models with Fisher Information Matrix Xinyu Ma, Xuebo Liu, Min Zhang

SUT: Active Defects Probing for Transcompiler Models Mengnan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin Clement, Neel Sundaresan

This Reads Like That: Deep Learning for Interpretable Natural Language Processing Claudio Fanconi, Moritz Vandenhirtz, Severin Husmann, Julia Vogt

SMoP: Towards Efficient and Effective Prompt Tuning with Sparse Mixture-of-Prompts Joon-Young Choi, Junho Kim, Jun-Hyung Park, Wing-Lam Mok, SangKeun Lee

Outlier Dimensions Encode Task Specific Knowledge William Rudman, Catherine Chen, Carsten Eickhoff

Self-Ensemble of N-best Generation Hypotheses by Lexically Constrained Decoding Ryota Miyano, Tomoyuki Kajiwara, Yuki Arase

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao

Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Goro Kobayashi, Hiroaki Funayama

A Simple Baseline for Knowledge-Based Visual Question Answering Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization Ridwan Mahbub, Ifrad Khan, Samiha Anuva, Md Shahriar, Md Tahmid Rahman Laskar, Sabbir Ahmed

CoRec: An Easy Approach for Coordination Recognition Qing Wang, Haojie Jia, Wenfei Song, Qi Li

FinEntity: Entity-level Sentiment Classification for Financial Texts Yixuan Tang, Yi Yang, Allen Huang, Andy Tam, Justin Tang

Rationale-Enhanced Language Models are Better Continual Relation Learners Weimin Xiong, Yifan Song, Peiyi Wang, Sujian Li

Inverse Scaling Can Become U-Shaped Jason Wei, Najoung Kim, Yi Tay, Quoc Le

ScdNER: Span-Based Consistency-Aware Document-Level Named Entity Recognition Ying Wei, Qi Li

NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation Oliver Li, Mallika Subramanian, Arkadiy Saakyan, Sky CH-Wang, Smaranda Muresan

ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets Tobias Schimanski, Julia Bingler, Mathias Kraus, Camilla Hyslop, Markus Leippold

An Attribution Method for Siamese Encoders Lucas Moeller, Dmitry Nikolaev, Sebastian Padó

Are Compressed Language Models Less Subgroup Robust? Leonidas Gee, Andrea Zugarini, Novi Quadrianto

Length Does Matter: Summary Length can Bias Summarization Metrics Xiaobo Guo, Soroush Vosoughi

Fine-grained Medical Vision-Language Representation Learning for Radiology Report Generation Siyuan Wang, Bo Peng, Yichao Liu, Qi Peng

Automated Fact-Checking in Dialogue: Are Specialized Models Needed? Eric Chamoun, Marzieh Saeidi, Andreas Vlachos

Assessing the influence of attractor-verb distance on grammatical agreement in humans and language models Christos Zacharopoulos, Théo Desbordes, Mathias Sablé-Meyer

To Split or Not to Split: Composing Compounds in Contextual Vector Spaces Christopher Jenkins, Filip Miletić, Sabine Schulte im Walde

TaskDiff: A Similarity Metric for Task-Oriented Conversations Ankita Bhaumik, Praveen Venkateswaran, Yara Rizk, Vatche Isahagian

A Benchmark for Reasoning with Spatial Prepositions Iulia Comsa, Srini Narayanan

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation Yixin Liu, Alexander Fabbri, Yilun Zhao, Pengfei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding Steven Wang, Antoine Scardigli, Leonard Tang, Wei Chen, Dmitry Levkin, Anya Chen, Spencer Ball, Thomas Woodside, Oliver Zhang, Dan Hendrycks

PK-ICR: Persona-Knowledge Interactive Multi-Context Retrieval for Grounded Dialogue Minsik Oh, Joosung Lee, Jiwei Li, Guoyin Wang

A Self-training Framework for Automated Medical Report Generation Siyuan Wang, Zheng Liu, Bo Peng

A Picture is Worth a Thousand Words: Language Models Plan from Pixels Anthony Liu, Lajanugen Logeswaran, Sungryull Sohn, Honglak Lee

Relation-aware Ensemble Learning for Knowledge Graph Embedding Ling Yue, Yongqi Zhang, Quanming Yao, Yong Li, Xian Wu, Ziheng Zhang, Zhenxi Lin, Yefeng Zheng

When Reviewers Lock Horns: Finding Disagreements in Scientific Peer Reviews Sandeep Kumar, Tirthankar Ghosal, Asif Ekbal

COMMENTS

  1. Natural Language Processing

    Browse 701 tasks • 2146 datasets • 2134

  2. Top Natural Language Processing (NLP) Papers of January 2023

    Get ready for cutting-edge NLP research! Our top NLP papers for January 2023 cover language models, text generation, and summarization. Discover the latest advancements in language processing with Cohere's selection of the best research.

  3. Natural Language Processing: Transforming How Machines ...

    Natural Language Processing (NLP) stands as a pivotal advancement in the field of artificial intelligence, revolutionizing the way machines comprehend and interact with human language. This paper ...

  4. Emerging Trends in NLP Research: Top NLP Papers April 2023

    Explore top NLP papers for April 2023, curated by Cohere For AI, covering topics like toxicity evaluation, large language model limitations, neural scaling laws, and retrieval-augmented models. Stay updated in the fast-evolving NLP field, and consider joining Cohere's research community. Staying informed about the latest breakthroughs in ...

  5. The Best of NLP: February 2023's Top NLP Papers

    Stay ahead of the game: Get a sneak peek of the coolest natural language processing (NLP) research of February 2023! Our handpicked selection of the best NLP papers will keep you up-to-date on the latest advancements in language models, text generation, and summarization.

  6. Natural language processing: state of the art, current trends and

    Natural language processing (NLP) has recently gained much attention for representing and analyzing human language computationally. It has spread its applications in various fields such as machine translation, email spam detection, information extraction, summarization, medical, and question answering etc. In this paper, we first distinguish four phases by discussing different levels of NLP ...

  7. [2111.01243] Recent Advances in Natural Language Processing via Large

    Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches. We also present approaches that use pre-trained language models to generate data for ...

  8. Natural Language Processing Journal

    The NLP journal welcomes original research papers, review papers, position papers, tutorial and best practice papers. Special Issues proposals on specific current topics are welcomed. To foster trust, transparency, and reproducibility in AI research the NLP journal promotes open and FAIR data and software sharing practices.

  9. PDF The Decades Progress on Code-Switching Research in NLP: A Systematic

    We show quantitative evidence of the upward trend for CSW-related research in Figure 1. In this paper, we present the first large-scale comprehensive survey on CSW NLP research in a structured manner by collecting more than 400 papers published on open repositories, such as the ACL Anthology and ISCA proceedings (see §2).

  10. Four innovative NLP research papers in ACL 2023

    Part 3 of the ACL 2023 Review Series for NLP research papers that show an innovative standpoint on where the technology can head next.

  11. Efficient Methods for Natural Language Processing: A Survey

    Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require ...

  12. Exploring the Landscape of Natural Language Processing Research

    As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics ...

  13. The Recent Large Language Models in NLP

    Our paper compares the recent Language Models and their contributions to the field of NLP, and discusses future extensions. Published in: 2023 22nd International Symposium on Communications and Information Technologies (ISCIT)

  14. Top 10 NLP Research Papers Worth Reading For Beginners

    Getting started with reading research papers in NLP can seem daunting, but it can be a valuable and rewarding experience with the right approach. This article provides tips for reading research papers and a top-10 list of articles to get you started.

  15. Unlocking New Possibilities: March 2023's Top NLP Papers

    Dive into Cohere For AI's community selection of March 2023's NLP research, featuring cutting-edge language models, unparalleled text generation, and revolutionary summarization techniques! Stay ahead, and stay informed! 🌐🧠 TL;DR: Explore the C4AI community's top NLP research picks for March 2023. This post features an array of topics, encompassing the latest advancements in large ...

  16. Bloomberg's AI Engineering Group & CTO Office Publish 11 NLP Research

    Bloomberg's 11 papers at ACL 2023 highlight a variety of state-of-the-art NLP applications, novel approaches & improved models used in key tasks

  17. Bloomberg's AI Engineering Group Publishes 4 NLP Research Papers at

    Bloomberg's four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.

  18. Natural Language Processing

    Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.

  19. Best Papers

    ACL'23 implemented the new award policy, which aims for broader recognition of exceptional research, in particular by significantly increasing the pool of outstanding papers to 1.5-2.5% of the total submissions. So, this year we have a total of 3 best papers, 4 special awards papers (Resource Award, Social Impact Award, Reproduction Award, Theme Paper Award)—and 39 outstanding papers ...

  20. [2312.05688] NLLG Quarterly arXiv Report 09/23: What are the most

    Artificial Intelligence (AI) has witnessed rapid growth, especially in the subfields Natural Language Processing (NLP), Machine Learning (ML) and Computer Vision (CV). Keeping pace with this rapid progress poses a considerable challenge for researchers and professionals in the field. In this arXiv report, the second of its kind, which covers the period from January to September 2023, we aim to ...

  21. Main Conference

    The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell

    Evaluating Cross-Domain Text-to-SQL Models and Benchmarks Mohammadreza Pourreza, Davood Rafiei

  22. Emerging Trends in Generative AI Research: Top Research Papers August 2023

    Explore top NLP papers for August 2023, curated by Cohere For AI, covering topics like reducing hallucinations, addressing limitations in RLHF, instruction tuning, aligning LLMs, and more. Stay updated in the fast-evolving NLP field, and consider joining Cohere's research community. Generative AI enthusiasts and practitioners, get ready for a ...

  23. 2023 Conference

    Announcing the NeurIPS 2023 Paper Awards: Important Dates. Workshop Accept/Reject Notification Date: Oct 27 '23 (Anywhere on Earth). Camera Ready Deadline ... Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, ...

  24. Announcing the NeurIPS 2023 Paper Awards

    Announcing the NeurIPS 2023 Paper Awards. By Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine and Comms Chair, Sahra Ghalebikesabi. We are honored to announce the award-winning papers for NeurIPS 2023! This year's prestigious awards consist of the Test of Time Award plus two Outstanding Paper Awards in each of these three categories: Two Outstanding Main Track Papers ...

  25. PDF arXiv:2307.10652v5 [cs.CL] 24 Sep 2023

    as for future research remains absent. Contributing to closing this gap, we have systematically classified and analyzed research papers in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments in NLP, summarize our findings, and ...

  26. References

    References provide the information necessary for readers to identify and retrieve each work cited in the text. Consistency in reference formatting allows readers to focus on the content of your reference list, discerning both the types of works you consulted and the important reference elements with ease.