Syllabus: Meaningful Text Analysis with Word Embeddings

A week-long workshop, taught at Digital Humanities Summer Institute, Summer 2021

Instructor: Jonathan Reeve, Department of English and Comparative Literature, Columbia University.

Email: jonathan.reeve@columbia.edu

Course website: https://dhsi2021.jonreeve.com (You’re probably reading it now.)

Dates: 14–18 June, 2021, at 11:00 Pacific / 14:00 New York / 18:00 UTC, for one hour.

Virtual classroom: https://meet.jit.si/dhsi2021-word-embeddings

Chatroom: #dhsi2021-word-embeddings:matrix.org on Matrix (you will need to download a Matrix client).

WARNING: This syllabus is very much still a work in progress, and likely won’t be complete until the course start date in June 2021.

Course Description

Word embeddings provide new ways of understanding language by incorporating the contexts, meanings, and senses of words into their digital representations. They are a new technology, developed by researchers at Google, that now powers many of the most advanced computational language tasks, such as machine translation, automatic summarization, and information extraction. Because they represent more than just the surface forms of words, they have profound applications for humanities scholarship. This course will serve as a hands-on introduction to word embeddings, using the Python programming language together with the spaCy package for natural language processing. Participants are encouraged to bring their own collections of text to analyze, and by the end of the course they will have created meaningful explorations of them. No prior programming experience is necessary.
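The core idea of a word embedding is that each word becomes a list of numbers (a vector). Words used in similar contexts end up with vectors that point in similar directions, usually measured with cosine similarity. The sketch below is purely illustrative. It uses hypothetical, hand-made three-dimensional vectors rather than learned ones, and plain Python rather than spaCy. Real pre-trained embeddings have hundreds of dimensions and are learned from large corpora. spaCy exposes comparable functionality through the `.vector` attribute and `similarity()` method on `Token` and `Doc` objects, provided you load a model that ships with vectors, such as `en_core_web_md`.

```python
import math

# Hypothetical, hand-made 3-dimensional vectors for illustration only.
# Real embeddings are learned from text and have many more dimensions.
TOY_VECTORS = {
    "ship":   [0.9, 0.1, 0.0],
    "boat":   [0.8, 0.2, 0.1],
    "vessel": [0.7, 0.3, 0.2],
    "poem":   [0.0, 0.9, 0.4],
    "sonnet": [0.1, 0.8, 0.5],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, n=2):
    """Return the n (word, score) pairs closest to `word`, excluding the word itself."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


print(most_similar("ship", TOY_VECTORS))
print(most_similar("poem", TOY_VECTORS))
```

With these made-up vectors, "ship" lands nearest to "boat" and "vessel", and "poem" lands nearest to "sonnet". This is the kind of neighbourhood structure that trained embeddings recover from real text.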

Monday, 14 June: Theory of Word Embeddings

Tuesday, 15 June: Introduction to Python for Text Analysis

Wednesday, 16 June: Hands-On with Pre-Trained Word Embeddings

Thursday, 17 June: Practicum in Text Analysis

Friday, 18 June: Lab Work