A week-long workshop, taught at the Digital Humanities Summer Institute, Summer 2021
Instructor: Jonathan Reeve, Department of English and Comparative Literature, Columbia University.
Email: jonathan.reeve@columbia.edu
Course website: https://dhsi2021.jonreeve.com (You’re probably reading it now.)
Dates: 14–18 June 2021, daily at 11:00 Pacific / 14:00 New York / 18:00 UTC, for one hour.
Virtual classroom: https://meet.jit.si/dhsi2021-word-embeddings
Chatroom: #dhsi2021-word-embeddings:matrix.org on Matrix (you will need a Matrix client to join).
WARNING: This syllabus is still very much a work in progress and likely won’t be complete until the course begins in June 2021.
Course Description
Word embeddings provide new ways of understanding language by incorporating the contexts, meanings, and senses of words into their digital representations. They are a relatively new technology, popularized by the word2vec models developed by researchers at Google, and they now power many of the most advanced computational language tasks, such as machine translation, automatic summarization, and information extraction. Because they represent more than just the surface forms of words, their applications for humanities scholarship are profound. This course will serve as a hands-on introduction to word embeddings, using the Python programming language together with the spaCy package for natural language processing. Participants are encouraged to bring their own collections of text to analyze, and by the end of the course they will have created meaningful explorations of them. No prior programming experience is necessary.
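To make the idea of a word's "digital representation" concrete, here is a minimal sketch in plain Python. It assumes each word is represented as a list of numbers (a vector) and compares words by cosine similarity, a common way of measuring how close two embeddings are. The three-dimensional vectors in `toy_vectors` are invented values for illustration only, and the names `toy_vectors`, `cosine_similarity`, and `most_similar` are hypothetical; real embeddings, such as those distributed with some spaCy models, have hundreds of dimensions learned from large corpora.

```python
import math

# Hypothetical 3-dimensional "embeddings" with made-up values, for illustration only.
# Real word embeddings have hundreds of dimensions learned from large text corpora.
toy_vectors = {
    "king":   [0.80, 0.65, 0.10],
    "queen":  [0.78, 0.70, 0.15],
    "banana": [0.05, 0.10, 0.90],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity to `word`'s."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(most_similar("king", toy_vectors))  # queen
```

Cosine similarity ignores vector length and looks only at direction, so words that appear in similar contexts (and therefore point in similar directions) score close to 1, while unrelated words score lower.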
Monday, 14 June: Theory of Word Embeddings
- Lecture video: TBA
- Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
- Reading: Chapter 6 of Jurafsky, Dan, and James H. Martin. Speech and Language Processing. Third edition draft.
Tuesday, 15 June: Introduction to Python for Text Analysis
- Lecture video: TBA
- Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
- Reading: Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013) “Efficient Estimation of Word Representations in Vector Space.” arXiv preprint arXiv:1301.3781.
Wednesday, 16 June: Hands-on With Pre-Trained Word Embeddings
- Lecture video: TBA
- Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
- Reading: Kozlowski, Austin C., Matt Taddy, and James A. Evans. (2019) “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84:5.
Thursday, 17 June: Practicum in Text Analysis
- Lecture video: TBA
- Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
- Reading: Garg, Nikhil, Londa Schiebinger, Dan Jurafsky, and James Zou. (2018) “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” PNAS 115:16.
Friday, 18 June: Lab Work
- Lecture video: TBA
- Class videoconference: 11:00 Pacific / 14:00 New York / 18:00 UTC, in our videoconference room on Jitsi.
- Reading: Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. (2016) “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” arXiv preprint arXiv:1607.06520.