What’s of interest? LC & Flickr Commons: Analysis of user interactions on 40,000 images
Tell me more!
Flickr Commons is a program that brings the visual collections of cultural heritage organizations to new audiences. It gets these resources in front of people where they already are online, instead of leaving them siloed on an institution's own website or not online at all. It was a pretty groundbreaking project: the Library of Congress was the first participant and now has over 40,000 photos on Flickr. The program continues today under the Flickr Foundation. Since it started in 2008, a lot has been published about the project, including this webcast, a project report, and a 2024 impact report. The project predates my time at the library by a decade, and my job at LC has nothing to do with these collections, but I was really compelled by the prospect of potentially 17 years of data about interactions between the public and these materials. This post is going to analyze and visualize that data.
Data + Code
I’ll be using data from the public Flickr API that I harvested back in 2024. (I work on too many personal projects, unreliably, for years, and then eventually something causes me to finish one, like being furloughed during a US government shutdown.) So this is all public data, and the code I used to do everything on this page can be found in this GitHub repo.
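The post doesn't show how the harvest was done, so here is a minimal sketch of pulling comments from the public Flickr REST API, assuming the `flickr.photos.comments.getList` method is used for each photo. The API key and photo IDs are placeholders you would supply yourself, and the helper names (`build_url`, `parse_comments`, `comments_for_photo`) are hypothetical. The JSON field names in `parse_comments` are assumed to follow Flickr's documented response format.

```python
import json
import urllib.parse
import urllib.request

# Sketch only. Assumptions: you supply your own API key and photo IDs, and the
# response field names below follow Flickr's documented JSON format.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_url(method, api_key, **params):
    """Build a Flickr REST URL that returns plain JSON (no JSONP wrapper)."""
    query = {"method": method, "api_key": api_key,
             "format": "json", "nojsoncallback": 1, **params}
    return REST_ENDPOINT + "?" + urllib.parse.urlencode(query)


def fetch_json(url):
    """Perform the HTTP GET and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_comments(payload):
    """Flatten a flickr.photos.comments.getList response into simple dicts."""
    block = payload.get("comments", {})
    # Photos with no comments may omit the "comment" list entirely.
    return [{"photo_id": block.get("photo_id"),
             "author": c.get("authorname"),
             "created": int(c.get("datecreate", 0)),  # Unix timestamp
             "text": c.get("_content", "")}
            for c in block.get("comment", [])]


def comments_for_photo(photo_id, api_key):
    """Hypothetical helper: fetch and flatten all comments on one photo."""
    url = build_url("flickr.photos.comments.getList", api_key, photo_id=photo_id)
    return parse_comments(fetch_json(url))
```

For example, `build_url("flickr.photos.comments.getList", "YOUR_API_KEY", photo_id="123")` produces a request URL you could open in a browser to inspect the raw response. Looping `comments_for_photo` over every photo ID (and saving the results) would give a comment dataset like the one analyzed here, though a real harvest would also need rate limiting and error handling.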
Interactions
The comments are the big thing with this project: they are the largest surface of interaction between the public and the photos. With over 95,000 comments made on the photos over 17 years, I had a lot of questions about what people were saying. To organize them, I built embeddings for all 95K comments using the Google Gemini gemini-embedding-001 model. This produces a 3,072-dimensional vector for each comment, which I then reduced to two dimensions and clustered to find communities of similar comments. Finally, I sent a random sample from each cluster to an LLM to classify that cluster into a group based on the actual text of its comments.
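The post doesn't name the dimensionality reduction or clustering method, so the following is only a sketch of the general pipeline (embed, reduce to 2D, cluster, sample per cluster for an LLM) under stated assumptions. PCA, computed by power iteration, stands in for the 2D reduction; k-means with farthest-point initialization stands in for the community detection; and small synthetic 16-dimensional vectors stand in for the real 3,072-dimensional gemini-embedding-001 output. `build_label_prompt` is a hypothetical helper, and the actual LLM call is left out. It uses only the standard library so it runs anywhere.

```python
import math
import random

# Sketch only. Assumptions: PCA and k-means stand in for whatever reduction and
# clustering the post used; synthetic vectors stand in for Gemini embeddings.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(vectors):
    """Subtract the per-dimension mean so PCA measures variance around it."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return [[v[d] - means[d] for d in range(dims)] for v in vectors]


def top_component(rows, iters=100, seed=0):
    """Power iteration: leading eigenvector of the covariance matrix X^T X."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(iters):
        scores = [dot(r, w) for r in rows]                 # X w
        w = [sum(s * r[d] for s, r in zip(scores, rows))   # X^T (X w)
             for d in range(len(w))]
        norm = math.sqrt(dot(w, w)) or 1.0
        w = [x / norm for x in w]
    return w


def pca_2d(vectors):
    """Project each vector onto its first two principal components."""
    rows = center(vectors)
    first = top_component(rows)
    # Deflate: remove the first component so the next pass finds the second.
    deflated = []
    for r in rows:
        p = dot(r, first)
        deflated.append([x - p * f for x, f in zip(r, first)])
    second = top_component(deflated)
    return [(dot(r, first), dot(r, second)) for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def sample_per_cluster(comments, labels, n, seed=0):
    """Pick up to n random comments from each cluster to show an LLM."""
    rng = random.Random(seed)
    groups = {}
    for text, lab in zip(comments, labels):
        groups.setdefault(lab, []).append(text)
    return {lab: rng.sample(texts, min(n, len(texts))) for lab, texts in sorted(groups.items())}


def build_label_prompt(texts):
    """Hypothetical prompt text; sending it to an LLM is left out."""
    bullets = "\n".join(f"- {t}" for t in texts)
    return f"Give a short category name for these Flickr comments:\n{bullets}"


# Synthetic demo: three well-separated "topics" in 16 dimensions.
rng = random.Random(42)
dims = 16
topics = ["identifies the location", "thanks the library", "family connection"]
comments, vectors, true_topics = [], [], []
for t_index, topic in enumerate(topics):
    for i in range(20):
        comments.append(f"{topic} #{i}")
        true_topics.append(topic)
        vectors.append([(10.0 if d == t_index else 0.0) + rng.gauss(0, 0.5)
                        for d in range(dims)])

points = pca_2d(vectors)
labels = kmeans(points, k=3)
samples = sample_per_cluster(comments, labels, n=3)
for lab, texts in samples.items():
    print(lab, build_label_prompt(texts), sep="\n", end="\n\n")
```

On 95K real embeddings you would more likely reach for a library (for example scikit-learn's `PCA` and `KMeans`, or UMAP for the 2D projection) than hand-rolled loops; the point here is only to make each step concrete. Sampling a few comments per cluster keeps the LLM prompt small while still letting it see what the cluster is about.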
Where is it?: LC & Flickr Commons
This is one of many items I regularly tag in Pinboard as oegconnect, which are then automatically posted to Mastodon tagged #OEGConnect. Do you know of something else we should share like this? Just reply below and we will check it out.
Or share it directly to the OEG Connect Sharing Zone