Mission Statement: To foster the development of new and improved ways of measuring the quality and understanding the properties of vector space representations in NLP.

Time & Location: Copenhagen, Denmark (EMNLP 2017 workshop).


Models that learn real-valued vector representations of words, phrases, sentences, and even documents are ubiquitous in today’s NLP landscape. These representations are usually obtained by training a model on large amounts of unlabeled data, and then employed in NLP tasks and downstream applications. While such representations should ideally be evaluated according to their value in these applications, doing so is laborious, and it can be hard to rigorously isolate the effects of different representations for comparison. There is therefore a need for evaluation via simple and generalizable proxy tasks. To date, these proxy tasks have focused mainly on lexical similarity and relatedness, and do not capture the full spectrum of interesting linguistic properties that are useful for downstream applications. This workshop challenges its participants to propose methods and/or design benchmarks for evaluating the next generation of vector space representations, for presentation and detailed discussion at the event.


We encourage researchers at all levels of experience to consider contributing to the discussion at RepEval by making a short submission to either of two tracks:

Shared Task

Starting this year, RepEval will host a shared task for evaluating general-purpose sentence representations. In addition to their system’s output, participants will be required to submit a system description and make their code and resources publicly available for posterity. More details can be found here.


Proposal Track

A proposal submission should propose a novel method for evaluating representations. It does not have to construct an actual dataset, but it should describe a way (or several optional ways) of collecting one. Proposals are expected to provide roughly 5-10 examples in the manuscript as a proof of concept.

In addition, each proposal should explicitly mention:

  • Which type of representation it evaluates (e.g. word, sentence, document)
  • For which downstream application(s) it functions as a proxy
  • Any linguistic/semantic/psychological properties it captures

Among other important points, proposals should take the following into consideration:

  • If the task captures some linguistic phenomenon via annotators, what evidence is there that it is robustly observed in humans (e.g., inter-annotator agreement)?
  • How easy would it be for other researchers to accurately reproduce the evaluation (not necessarily the dataset)?
  • Will the dataset be cost-effective to produce?
  • Is a specific family of models expected to perform particularly well (or poorly) on the task? In other words, which types of models is this evaluation targeted at?
  • How should the evaluation’s results be interpreted?
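
As an illustration of the inter-annotator agreement point above, here is a minimal sketch of one common agreement statistic, Cohen's kappa, for two annotators who label the same items. The workshop does not prescribe any particular statistic; the function name `cohens_kappa` and the toy labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (not taken from any real dataset).
annotator_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
annotator_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.50"
```

Kappa corrects raw agreement (here 6 of 8 items) for the agreement expected by chance, so it is usually more informative than a plain percentage. Tasks with graded or multi-annotator judgments may call for a different statistic.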

We hope that one or more of these proposals will evolve into next year’s shared task (RepEval 2018).

Submission Format

Submissions to both tracks should be 2-4 pages of content in EMNLP format, with an unlimited number of pages for references. For the proposal track, we encourage shorter content (2-3 pages), leaving more room for examples and their visualization.

Important Dates

  • June 14 (GMT-11, 23:59:59): Proposal papers due
  • July 3 (GMT-11, 23:59:59): Reviews due
  • July 6: Acceptance notification
  • July 21 (GMT-11, 23:59:59): Camera-ready papers due