Distributional Semantics for Linguistic Research

Presenters: Florent Perek

T1MRC2

Distributional semantics seeks to capture the meaning of words by automatically extracting information about their contexts of use from large corpora, under the assumption that words with similar meanings tend to be found in similar contexts. In a distributional semantic model (DSM), also called a word embedding model, the meaning of a word is represented by an array of numerical values derived from its co-occurrences. This turns the informal notion of meaning into a more precise quantification that is built from usage data and lends itself well to quantitative studies.
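To make the idea concrete, here is a minimal sketch in Python of a count-based model built from a toy corpus. The corpus, the window size, and the function names are all invented for illustration and are not taken from the workshop materials. Real DSMs are built from far larger corpora and usually apply weighting and dimensionality reduction.

```python
from collections import Counter, defaultdict
import math

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the car needs fuel",
]

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words found within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
sim_cat_car = cosine_similarity(vectors["cat"], vectors["car"])
print(f"cat ~ dog: {sim_cat_dog:.3f}")
print(f"cat ~ car: {sim_cat_car:.3f}")
```

In this toy setting, "cat" and "dog" share more contexts than "cat" and "car", so their vectors end up closer together. That is the distributional assumption at work.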

This workshop will provide an introduction to distributional semantics and to a range of ways it can be used to conduct empirical research in linguistics. I will first describe the basic ideas behind the distributional semantic approach, and the various ways in which it has been computationally implemented, notably the “bag-of-words” approach based on lexical co-occurrences, and neural network approaches like word2vec. I will discuss various off-the-shelf DSMs as well as tools to create tailor-made DSMs from your own corpus data. Finally, I will demonstrate various ways in which DSMs can be reliably used as a source of lexical semantic information, notably through semantic measures such as semantic distance, measures of semantic spread, and clustering into semantic classes. Examples will include research in syntactic productivity, language change, language development, and descriptive grammar. The workshop will feature a mixture of lectures and hands-on sessions.
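As a rough illustration of the semantic measures mentioned above, the following Python sketch computes a cosine-based semantic distance and one simple way to operationalize semantic spread: the mean pairwise distance within a set of words. The three-dimensional vectors are hypothetical values invented for this example. The choice of mean pairwise distance is an assumption, not necessarily the measure used in the workshop.

```python
import math
from itertools import combinations

# Hypothetical word vectors, invented for illustration.
# Real DSM vectors typically have hundreds of dimensions.
vectors = {
    "give": [0.90, 0.10, 0.00],
    "send": [0.80, 0.20, 0.10],
    "hand": [0.85, 0.15, 0.05],
    "sing": [0.10, 0.90, 0.20],
    "bake": [0.00, 0.30, 0.90],
}

def cosine_distance(u, v):
    """Semantic distance as 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_spread(words):
    """Mean pairwise distance between the words (one possible spread measure)."""
    pairs = list(combinations(words, 2))
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)

tight = semantic_spread(["give", "send", "hand"])
loose = semantic_spread(["give", "sing", "bake"])
print(f"spread of give/send/hand: {tight:.3f}")
print(f"spread of give/sing/bake: {loose:.3f}")
```

A set of semantically similar words yields a lower spread than a set of unrelated ones. This kind of comparison can be applied, for instance, to the words attested in a given syntactic slot.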

Prior knowledge of R is advised for the hands-on workshops, and some knowledge of a programming language like Python or Java is a plus to take full advantage of this workshop.

Keywords: AI, Communicative Efficiency, Computational Linguistics, Deep Learning, Production, Productivity, Semantics, Computational Modeling, Corpus Linguistics, Language Change, Learning, Theoretical Frameworks

When/Where:
Room STB 151, Mondays and Thursdays, July 7-July 21, 1:00pm - 2:20pm

Days:
Mondays and Thursdays

Presenters

Florent Perek

University of Birmingham, UK

Florent Perek holds a PhD in English Linguistics from the University of Freiburg (Germany), under a co-tutelle (French joint PhD programme) with the University of Lille (France). He has occupied postdoc positions at the University of Basel (Switzerland) and Princeton University. He is currently an Associate Professor at the Department of English Language and Linguistics at the University of Birmingham, UK. Florent is a cognitive linguist, a quantitative corpus linguist, and a construction grammarian. His main research interests lie in the study of grammar from a cognitive and corpus linguistic perspective. He focuses in particular on how syntactic constructions are mentally represented, how they are learned, and how they change over time.


