Unsplash Dataset Documentation

June 12, 2026

The Unsplash Dataset is composed of multiple TSV (tab-separated values) files:

1 - photos.tsv

The photos.tsv dataset has one row per photo. It contains properties of the photo, the name of the contributor, the image URL, and overall stats.

| Field | Description |
| --- | --- |
| photo_id | ID of the Unsplash photo |
| photo_url | Permalink URL to the photo page on unsplash.com |
| photo_image_url | URL of the image file. Note: this is a dynamic URL, so you can apply resizing and customization operations directly on the image |
| photo_submitted_at | Timestamp of when the photo was submitted to Unsplash |
| photo_featured | Whether the photo was promoted to the Editorial feed or not |
| photo_width | Width of the photo in pixels |
| photo_height | Height of the photo in pixels |
| photo_aspect_ratio | Aspect ratio of the photo |
| photo_description | Description of the photo written by the photographer |
| photographer_username | Username of the photographer on Unsplash |
| photographer_first_name | First name of the photographer |
| photographer_last_name | Last name of the photographer |
| exif_camera_make | Camera make (brand) extracted from the EXIF data |
| exif_camera_model | Camera model extracted from the EXIF data |
| exif_iso | ISO setting of the camera, extracted from the EXIF data |
| exif_aperture_value | Aperture setting of the camera, extracted from the EXIF data |
| exif_focal_length | Focal length setting of the camera, extracted from the EXIF data |
| exif_exposure_time | Exposure time setting of the camera, extracted from the EXIF data |
| photo_location_name | Location of the photo |
| photo_location_latitude | Latitude of the photo |
| photo_location_longitude | Longitude of the photo |
| photo_location_country | Country where the photo was made |
| photo_location_city | City where the photo was made |
| stats_views | Total # of times that a photo has been viewed on the Unsplash platform |
| stats_downloads | Total # of times that a photo has been downloaded via the Unsplash platform |
| ai_description | Textual description of the photo, generated by a 3rd party AI |
| ai_primary_landmark_name | Landmark present in the photo, generated by a 3rd party AI |
| ai_primary_landmark_latitude | Latitude of the landmark, generated by a 3rd party AI |
| ai_primary_landmark_longitude | Longitude of the landmark, generated by a 3rd party AI |
| ai_primary_landmark_confidence | Landmark confidence of the 3rd party AI |
| blur_hash | BlurHash hash of the photo |
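
Because photo_image_url is a dynamic URL, resizing can be requested by adding query parameters to it. The following is a minimal sketch of that idea, not an official client: the parameter names `w` (width) and `q` (quality) and the example URL are assumptions for illustration, so check the Unsplash image URL documentation for the parameters actually supported.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def resized_image_url(photo_image_url, **params):
    """Return photo_image_url with extra query parameters merged in.

    Existing parameters are kept; a parameter passed here overrides
    one of the same name already present in the URL.
    """
    parts = urlparse(photo_image_url)
    query = dict(parse_qsl(parts.query))
    query.update({name: str(value) for name, value in params.items()})
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical URL; `w` and `q` are assumed parameter names.
print(resized_image_url("https://images.unsplash.com/photo-example", w=400, q=80))
# https://images.unsplash.com/photo-example?w=400&q=80
```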

2 - keywords.tsv

The keywords.tsv dataset has one row per photo-keyword pair. It contains data about how a keyword is connected to a photo and the conversions of the photo in our search engine for a particular keyword.

| Field | Description |
| --- | --- |
| photo_id | ID of the Unsplash photo |
| keyword | Keyword or search term |
| ai_service_1_confidence | Confidence for the keyword from a 3rd party AI (0-100) |
| ai_service_2_confidence | Confidence for the keyword from another 3rd party AI (0-100) |
| suggested_by_user | Whether the keyword was added by a user (human) |
| user_suggestion_source | The type of user that suggested or set the keyword (photographer, admin or unknown) |
| suggested_by_ai_service_3 | The keyword was suggested by another 3rd party AI |
| confirmed_by_ai_service_3 | The keyword was confirmed to be relevant by another 3rd party AI |

We use different AI services to generate tags. Some are no longer in use, but we still keep their historical data; others are more recent and only have data for more recent photos.
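
Since any given confidence column may be empty for a photo, a filter over keywords has to tolerate missing values. The sketch below keeps a keyword when a human suggested it or when either AI confidence reaches a threshold. It makes two assumptions that the documentation does not confirm: empty cells are empty strings, and booleans are encoded as `t`/`true`/`1`. The threshold of 50 is arbitrary.

```python
def confident_keywords(rows, min_confidence=50.0):
    """Yield (photo_id, keyword) pairs worth keeping.

    `rows` is any iterable of dicts keyed by the keywords.tsv field
    names, for example a csv.DictReader opened with delimiter="\t".
    """
    truthy = {"t", "true", "1"}  # assumed boolean encodings
    for row in rows:
        scores = []
        for field in ("ai_service_1_confidence", "ai_service_2_confidence"):
            value = (row.get(field) or "").strip()
            if value:  # skip services with no data for this photo
                scores.append(float(value))
        by_human = (row.get("suggested_by_user") or "").strip().lower() in truthy
        if by_human or any(score >= min_confidence for score in scores):
            yield row["photo_id"], row["keyword"]
```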

3 - collections.tsv

Note: A collection on Unsplash is a user-created grouping of photos. Collections are similar to boards on Pinterest and can often group photos in complex and creative ways. Another type of collection is the topic: topics are content-specific photo feeds available on the website.

The collections.tsv dataset has one row per photo-collection/topic pair. Whenever a photo belongs to a collection or a topic, it will appear as one row. Each row describes when the photo was added to the collection/topic and gives the title of the collection/topic.

| Field | Description |
| --- | --- |
| photo_id | ID of the Unsplash photo |
| collection_id | ID of the Unsplash collection containing the photo |
| collection_title | Title of the collection containing the photo |
| photo_collected_at | Timestamp of when the photo was added to the collection |
| collection_type | Type of the collection (collection or topic) |
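
Because each row is one photo-collection/topic pair, listing everything a photo belongs to means grouping rows by photo_id. A minimal sketch, assuming `rows` are dicts keyed by the field names above and that collection_type holds the literal values `collection` or `topic`:

```python
from collections import defaultdict


def titles_by_photo(rows, collection_type=None):
    """Map each photo_id to the titles of the collections/topics it is in.

    Pass collection_type="topic" or "collection" to restrict the result
    to one kind; None keeps both.
    """
    titles = defaultdict(list)
    for row in rows:
        if collection_type is None or row["collection_type"] == collection_type:
            titles[row["photo_id"]].append(row["collection_title"])
    return dict(titles)
```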

4 - conversions.tsv

Note: a conversion is currently defined as a user selecting an image to download it.

The conversions.tsv dataset has one row per search conversion. The dataset tells you which photo was downloaded for a search, the country of origin, and an anonymous identifier that distinguishes unique users. The data goes back up to 1 year before the release of each version of the dataset.

| Field | Description |
| --- | --- |
| converted_at | Timestamp of the conversion event |
| conversion_type | Type of conversion (download only for now) |
| keyword | Keyword that was searched and led to the conversion |
| photo_id | Photo ID of the photo that converted |
| anonymous_user_id | Anonymous user ID |
| conversion_country | Country code of the device geolocation |
| device_type | Type of device that interacted with the photo |
| search_orientation_filter | Orientation (portrait/landscape/all) filter applied to the search |
| search_ordering | Order setting for the search results (popular/recent) |
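
The anonymous_user_id field makes it possible to count distinct users rather than raw conversion events. The sketch below counts unique users per searched keyword. It assumes `rows` are dicts keyed by the field names above; the function name is illustrative only.

```python
from collections import defaultdict


def unique_users_per_keyword(rows):
    """Return {keyword: number of distinct anonymous users who converted}."""
    users = defaultdict(set)
    for row in rows:
        # A set ignores repeat downloads by the same anonymous user.
        users[row["keyword"]].add(row["anonymous_user_id"])
    return {keyword: len(ids) for keyword, ids in users.items()}
```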

5 - colors.tsv

Note: The coverage and score data come from a 3rd party AI.

The colors.tsv dataset has one row per major color present in the photo. The dataset tells which colors are contained within a photo, their coverage as a percentage, and a score for how in focus the color is.

| Field | Description |
| --- | --- |
| photo_id | ID of the Unsplash photo |
| hex | Hexadecimal representation of the color |
| red | Red component of the color in the RGB system |
| green | Green component of the color in the RGB system |
| blue | Blue component of the color in the RGB system |
| keyword | Name of the closest color as a CSS color keyword |
| coverage | Pixel coverage of the color as a percentage |
| score | Score of the color in the photo (including the notion of focus) |
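
The hex field and the red, green and blue fields describe the same color, and coverage allows colors within a photo to be ranked. The sketch below shows both ideas. Whether hex values carry a leading `#` is not stated here, so the helper accepts either form; the function names are illustrative.

```python
def hex_to_rgb(hex_value):
    """Convert a 6-digit hex color (with or without '#') to (red, green, blue)."""
    digits = hex_value.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def dominant_colors(rows):
    """Return {photo_id: keyword of the color with the highest coverage}."""
    best = {}
    for row in rows:
        coverage = float(row["coverage"])
        photo_id = row["photo_id"]
        if photo_id not in best or coverage > best[photo_id][1]:
            best[photo_id] = (row["keyword"], coverage)
    return {photo_id: name for photo_id, (name, _) in best.items()}
```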

Combining datasets

You can merge the different datasets through the primary key ID fields (usually the photo_id field). With this you'll be able to cross-reference properties from the photos dataset with data from the keywords or conversions dataset.
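
As a rough illustration of such a merge, the sketch below joins keyword rows to their photo rows on photo_id using only the Python standard library. It is an inner join: rows whose photo_id has no match in the photos data are dropped. It assumes the files parse with the `csv` module's default quoting, which may need adjusting for free-text fields such as descriptions. The inline sample data is made up.

```python
import csv
import io


def read_tsv(handle):
    """Iterate over a TSV file object as dicts keyed by the header row."""
    return csv.DictReader(handle, delimiter="\t")


def join_on_photo_id(photo_rows, other_rows):
    """Yield each row of other_rows merged with its matching photo row."""
    photos = {row["photo_id"]: row for row in photo_rows}
    for row in other_rows:
        photo = photos.get(row["photo_id"])
        if photo is not None:
            merged = dict(photo)
            merged.update(row)
            yield merged


# Made-up sample data standing in for photos.tsv and keywords.tsv.
photos_tsv = io.StringIO("photo_id\tphoto_width\nabc\t4000\nxyz\t3000\n")
keywords_tsv = io.StringIO("photo_id\tkeyword\nabc\tdog\nabc\tgrass\nzzz\tsea\n")

joined = list(join_on_photo_id(read_tsv(photos_tsv), read_tsv(keywords_tsv)))
for row in joined:
    print(row["photo_id"], row["photo_width"], row["keyword"])
```

With real files, replace the `io.StringIO` objects with handles from `open(path, newline="", encoding="utf-8")`. The photos index is held in memory, which is reasonable for photos.tsv but worth reconsidering if the larger table is placed on that side of the join.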


For help loading the dataset, see the how to docs.