Syllabus: Meaningful Text Analysis with Word Embeddings

A week-long workshop, taught at Digital Humanities Summer Institute, Summer 2021

Instructor: Jonathan Reeve, Department of English and Comparative Literature, Columbia University.

Course website: https://dhsi2021.jonreeve.com (You’re probably reading it now.)

Dates: 14–18 June, 2021, at 11:00 Pacific / 14:00 New York / 18:00 UTC, for one hour.

Virtual classroom: https://meet.jit.si/dhsi2021-word-embeddings

Chatroom: #dhsi2021-word-embeddings:matrix.org on Matrix. Download a client.

WARNING: This syllabus is very much still a work-in-progress, and likely won’t be complete until the course start date, in June 2021.

Course Description

Word embeddings provide new ways of understanding language by incorporating contexts, meanings, and senses of words into their digital representations. They are a new technology, developed by researchers at Google, which now powers the most advanced computational language tasks, such as machine translation, automatic summarization, and information extraction. Since they represent more than just the surface forms of words, their applications for humanities scholarship are profound. This course will serve as a hands-on introduction to word embeddings, and will use the Python programming language in conjunction with the spaCy package for natural language processing. Participants are encouraged to bring their own collections of text to analyze, and will create meaningful explorations of them by the end of the course. No prior programming experience is necessary.
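
For a first taste of what it means to represent words as vectors, here is a minimal sketch in plain Python. It is not course material: the three-dimensional vectors and the names `toy_vectors` and `cosine_similarity` are invented for illustration. Real embeddings, such as those that ship with spaCy models, are learned from large corpora and have many more dimensions. The sketch only shows the basic idea that words with related meanings can be given vectors that point in similar directions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real embeddings
# are learned from large corpora and typically have hundreds of dimensions.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Compare a related pair of words with an unrelated pair.
royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
fruit = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])

print(f"king ~ queen: {royal:.2f}")
print(f"king ~ apple: {fruit:.2f}")
```

With these made-up numbers, the related pair scores closer to 1 than the unrelated pair. Cosine similarity is one common way to compare word vectors, chosen here because it depends only on the direction of the vectors and not on their length.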

Monday, 14 June: Theory of Word Embeddings

Lecture video: TBA
Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
Reading: Chapter 6 of Jurafsky, Dan, and James H. Martin. Speech and Language Processing. Third edition draft.

Tuesday, 15 June: Introduction to Python for Text Analysis

Lecture video: TBA
Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
Reading: Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).

Wednesday, 16 June: Hands-on With Pre-Trained Word Embeddings

Lecture video: TBA
Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
Reading: Kozlowski, Austin C., Matt Taddy, and James A. Evans. (2019) “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84:5.

Thursday, 17 June: Practicum in Text Analysis

Lecture video: TBA
Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
Reading: Garg, N., Schiebinger, L., Jurafsky, D., and Zou, J. (2018) “Word embeddings quantify 100 years of gender and ethnic stereotypes.” PNAS 115:16.

Friday, 18 June: Lab Work

Lecture video: TBA
Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
Reading: Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings.” arXiv preprint arXiv:1607.06520 (2016).