Foundations of Predictive Modeling with R
Instructors: Stefan Th. Gries
According to many practitioners and observers, linguistics has undergone a so-called "quantitative turn": over the last 25 years or so, the number of studies using statistical methods to analyze empirical data has been steadily increasing. A similar, if slightly delayed, increase can be observed in the number of studies that are multifactorial or multivariate, and the arguably most frequent statistical methods are by now techniques from the domain of predictive modeling, i.e. scenarios where, typically, one response variable's behavior is modeled on the basis of multiple predictor variables. The most frequently used techniques are regression models (most notably, linear and binary logistic regression modeling) and tree-based models (most notably, classification and regression trees and random forests based on them).
This course aims at meeting two objectives. First, it introduces these main predictive modeling techniques and exemplifies them in R using corpus-linguistic data on word durations, reaction time data, genitive choices, and a new data set on clause-ordering choices in complex sentences. In this first part, the focus is on exploration and preparation of data for predictive modeling, model fitting, and efficient model interpretation. The second part discusses (i) a variety of pitfalls in predictive modeling that one should try to avoid, exemplified partially on the basis of presented/published work (appropriately anonymized), and (ii) several simple techniques that increase the chances of 'getting the most' out of one's data set. In addition, this course aims to be good preparation for more advanced modeling courses.
Participants need some basic familiarity with R (loading data and descriptive statistics) but no prior knowledge of predictive modeling; they will get Quarto/RMarkdown documents to follow along and work with in class.
Keywords: Probabilistic Models, Quantitative Methods, Statistics, R, Corpus Linguistics
Mondays and Thursdays, July 7-July 21, 1:00pm - 2:20pm
Term 1 (July 7 - 22)
Instructors

UC Santa Barbara & JLU Giessen
Stefan Th. Gries is Professor of Linguistics at UC Santa Barbara and Chair of English Linguistics (Corpus Linguistics with a focus on quantitative methods, 25%) in the Department of English at the JLU Giessen. He is a quantitative corpus linguist at the intersection of corpus linguistics, usage-based/cognitive linguistics, and a bit of computational and psycholinguistics. He has worked on topics such as blend formation, grammatical variation, the syntax-lexis interface, semantics (polysemy, antonymy, near synonymy, and legal interpretation), learner corpus and varieties research, corpus-linguistic methodology, and the development and application of statistical methods in linguistics.