Annotation

April 29, 2026

Contains information on the annotations in the Harvard Art Museums collections.

Each annotation indicates an area of interest in an image in our dataset. An area can be a small portion of an image or it can be the entire image. Small regions typically contain a person's face or words. Annotations on the full image are typically tags (e.g. cat, watermelon, rock) and descriptions (e.g. a cow standing in a field).

Annotations come from a number of different sources. Some of the annotations are created by humans. Most of the data is the result of machine processing our images with external services; the full list of services appears under source below.

We use these services to perform face detection, text detection, tagging, categorization, and captioning. We do not use custom training sets with these services. We feed in our images and data as-is, then process, store, and serve whatever comes out.

Note: We are paying customers of these services. We do not have partnerships with them, nor do we endorse any one particular service. All of the services are black boxes, so we use them and provide the data, in part, to call attention to the differences and biases inherent in AI services.

Get Annotations

GET /annotation returns all annotations.

Include one or more of the following parameters to filter the items.

Parameter     Value
apikey        YOUR API KEY (required)
q             FIELD:VALUE (see Elasticsearch Query String syntax for more options)
size          0-9+
page          0-9+
sort          FIELD NAME or "random" or "random:[SEED NUMBER]"
sortorder     asc or desc
fields        comma-separated list of data fields you want in the output
aggregation   see Elasticsearch aggregations
id            pipe-separated list of record IDs
image         IMAGE ID
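
As a quick sketch of these parameters in use, the following Python snippet (using the requests library) fetches a small filtered page. The API key is a placeholder, and the q filter and field list are arbitrary illustrations:

import requests

# Sketch: fetch a small page of annotations filtered with the q parameter.
params = {
    "apikey": "YOUR_API_KEY",        # placeholder: substitute your own key
    "size": 2,                       # records per page
    "page": 1,
    "q": "type:tag",                 # Elasticsearch query string (illustrative)
    "fields": "id,body,confidence",  # trim the output to a few fields
}
resp = requests.get("https://api.harvardartmuseums.org/annotation", params=params)
resp.raise_for_status()
for record in resp.json()["records"]:
    print(record["id"], record["body"], record["confidence"])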

Examples

https://api.harvardartmuseums.org/annotation
Returns all of the annotations.

Response

{
    "info": {
        "totalrecordsperquery": 2,
        "totalrecords": 6274466,
        "pages": 3137233,
        "page": 1,
        "next": "https://api.harvardartmuseums.org/annotation?size=2&page=2",
        "responsetime": "10 ms"
    },
    "records": [
        {
            "id": 1798459,
            "imageid": 371197,
            "annotationid": 1798459,
            "fileid": 359384,
            "target": "https://nrs.harvard.edu/urn-3:HUAM:DDC110593_dynmc/full/full/0/default.jpg",
            "body": "modern",
            "source": "Clarifai",
            "feature": "full",
            "confidence": 0.88893,
            "type": "tag",
            "raw": {
                "name": "modern",
                "app_id": "main",
                "value": 0.88892984,
                "id": "ai_4Qjv5PTH"
            },
            "createdate": "2018-03-22T21:30:54+00:00",
            "accesslevel": 1,
            "model": "unknown",
            "selectors": [
                {
                    "type": "FragmentSelector",
                    "value": "xywh=0,0,1024,916"
                }
            ]
        },
        {
            "raw": {
                "description": "A",
                "boundingPoly": {
                    "vertices": [
                        {
                            "y": 458,
                            "x": 171
                        },
                        {
                            "y": 458,
                            "x": 199
                        },
                        {
                            "y": 482,
                            "x": 199
                        },
                        {
                            "y": 482,
                            "x": 171
                        }
                    ]
                }
            },
            "body": "A",
            "createdate": "2018-06-27T22:53:39-04:00",
            "fileid": 155102,
            "confidence": -1,
            "type": "text",
            "imageid": 188184,
            "id": 333467,
            "lastupdate": "2018-10-30T13:24:40-0400",
            "annotationid": 333467,
            "source": "Google Vision",
            "selectors": [
                {
                    "value": "xywh=171,458,33,29",
                    "type": "FragmentSelector"
                }
            ],
            "target": "https://nrs.harvard.edu/urn-3:huam:INV102590_prdwork/full/full/0/default.jpg",
            "feature": "region"
        }
    ]
}
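
Note the info block: with millions of records split across pages, clients typically follow the info.next link until it disappears. A minimal pagination sketch, again with a placeholder API key; as in the response above, the next link carries size and page but not the key, so the key must be re-attached on every request:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own key

url = "https://api.harvardartmuseums.org/annotation"
params = {"apikey": API_KEY, "size": 100}

while url:
    data = requests.get(url, params=params).json()
    for record in data["records"]:
        pass  # process each annotation here
    # info.next is absent on the last page; it already encodes size and
    # page, so subsequent requests only need the apikey re-attached.
    url = data["info"].get("next")
    params = {"apikey": API_KEY}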

Get annotation

GET /annotation/ANNOTATION_ID returns the full record of the specified annotation.

annotationid describes the numeric unique identifier for a record

imageid describes the numeric identifier for the image on which the annotation occurs

body describes the actual annotation

selectors describes the region of the image where the annotation occurs
Selectors take many forms. By default, all of our selectors conform to the Media Fragments specification, as documented in the W3C's Web Annotation Data Model.
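
For example, a pixel-based media fragment such as xywh=171,458,33,29 encodes the x offset, y offset, width, and height of the region. A small parsing sketch:

def parse_xywh(value):
    """Parse a Media Fragments selector value such as 'xywh=171,458,33,29'
    into an (x, y, width, height) tuple of pixel coordinates."""
    assert value.startswith("xywh=")
    x, y, w, h = (int(n) for n in value[len("xywh="):].split(","))
    return x, y, w, h

x, y, w, h = parse_xywh("xywh=171,458,33,29")
# The annotated region spans x..x+w horizontally and y..y+h vertically
# in the target image.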

source describes how the annotation was generated. These are the possible values.

  • Manual - the annotation was created by a human
  • Amazon, Anthropic, AWS Rekognition, Azure OpenAI Service, Clarifai, Google Gemini, Google Vision, Imagga, Meta, Microsoft Cognitive Services, Mistral, Qwen - the annotation was generated by machine processing with the named service

model describes the model used for machine-generated annotations

feature describes the size of the annotation in relation to the image it is on. These are the possible values.

  • region - the annotation describes a portion of the image
  • full - the annotation describes the entire image

type describes generally what the annotation contains. These are the possible values.

  • face - a face is found within the annotation
  • text - some text is found within the annotation
  • category - a broad categorization of the contents of the image
  • description - a full sentence caption of the contents of the image
  • tag - a term or set of terms

confidence describes the service's degree of confidence in the annotation. Values range from 0 (no confidence) to 1 (very confident). A value of -1 means the confidence is unknown or was not provided by the service.

raw describes the unparsed version of the annotation directly from the source
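
Taken together, these fields combine naturally with the q parameter. As one illustrative sketch, assuming the q parameter accepts Elasticsearch range syntax (per the table above), the following asks for high-confidence face annotations; the API key is a placeholder:

import requests

# Sketch: machine-found faces above a confidence threshold.
params = {
    "apikey": "YOUR_API_KEY",                  # placeholder
    "q": "type:face AND confidence:>0.9",      # Elasticsearch query string
    "fields": "id,imageid,source,confidence",
    "size": 10,
}
resp = requests.get("https://api.harvardartmuseums.org/annotation", params=params)
for record in resp.json()["records"]:
    print(record["imageid"], record["source"], record["confidence"])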

Examples

https://api.harvardartmuseums.org/annotation/9329
Returns the full record for the annotation with the ID number 9329.

Response

{
    "id": 9329,
    "imageid": 441393,
    "annotationid": 9329,
    "fileid": 4967858,
    "target": "https://nrs.harvard.edu/urn-3:HUAM:755630/full/full/0/default.jpg",
    "body": "Düsseldorf",
    "source": "Google Vision",
    "feature": "region",
    "confidence": -1,
    "type": "text",
    "raw": {
        "description": "Düsseldorf",
        "boundingPoly": {
            "vertices": [
                {
                    "y": 641,
                    "x": 341
                },
                {
                    "y": 640,
                    "x": 425
                },
                {
                    "y": 662,
                    "x": 425
                },
                {
                    "y": 663,
                    "x": 341
                }
            ]
        }
    },
    "createdate": "2018-02-09T17:10:56+00:00",
    "accesslevel": 1,
    "model": "unknown",
    "selectors": [
        {
            "type": "FragmentSelector",
            "value": "xywh=1068,2005,268,77"
        }
    ]
}
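
For sources such as Google Vision that report a raw boundingPoly rather than a box, the equivalent xywh region is the minimal envelope of the vertices. A sketch of that conversion follows; note that the raw vertices and the stored selector may be expressed at different image scales, as in this record, so the numbers need not match:

def bounding_poly_to_xywh(vertices):
    """Collapse a list of {'x': ..., 'y': ...} vertices into the smallest
    enclosing 'xywh=' fragment selector string."""
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    x, y = min(xs), min(ys)
    return f"xywh={x},{y},{max(xs) - x},{max(ys) - y}"

# Raw vertices from the response above:
vertices = [
    {"y": 641, "x": 341},
    {"y": 640, "x": 425},
    {"y": 662, "x": 425},
    {"y": 663, "x": 341},
]
print(bounding_poly_to_xywh(vertices))  # -> xywh=341,640,84,23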