rehype-annotate
July 8, 2021 ยท View on GitHub
Version 1.0 and ESM support
With version 1.0+ this module is ESM only: Node 12+ is needed to use it and it must be imported instead of required.
Introduction
This rehype plugin matches W3C-style annotations to their targets in the parsed HTML file. It wraps text range selections in mark elements and adds attributes and hooks to matched locations that can be used in other processors or browser scripts to implement further behaviours.
Note: this modifies the original tree and in some cases can add class attributes. Make sure to sanitise the tree afterwards.
The script does not embed annotation-provided styles, although that is on the roadmap. We do assign the values of the styleClass property to the annotated nodes so the client can provide their own stylesheet.
License
Apache 2.0
Install
We haven't yet published this package on npm but you can install it directly from the GitHub repository.
npm:
npm install RebusFoundation/rehype-annotate
Use
rehype-annotate should be used as a rehype or unified plugin to match annotations to a hast syntax tree.
const vfile = require("to-vfile");
const unified = require("unified");
const annotate = require("rehype-annotate");
const parse = require("rehype-parse");
const stringify = require("rehype-stringify");
const report = require("vfile-reporter");
const glob = require("glob");
const path = require("path");
async function process(file, options) {
return unified()
.use(parse)
.use(annotate, options)
.use(stringify)
.process(await vfile.read(file));
}
const options = {
// Should be an array of W3C Web Annotations
annotations: require("./path/to/annotations/json"),
// the base url for the original html
url: "https://syndicated.example.com/annotated.html",
// the canonical url for the html
canonical: "https://example.org/annotated.html",
};
process("path/to/example/htmlfile.html", options)
.then((file) => {
console.log(report(file));
console.log(String(file));
})
.catch((err) => console.error(err));
The above code will print whatever issues are found out to the console, followed by the processed HTML.
The file.data.annotations property will contain the annotations that have been matched to the HTML, in the order that the appear in the HTML file itself.
API
processor.use(annotate[, options])
Configure processor to modify the [hast][hast] syntax tree to match annotations to their target locations in the HTML.
The following attributes are added when an element node is matched by a selector:
data-annotation-id: the id of the matched annotationdata-annotation-motivation: space-separated list of the motivations from the annotationmotivationproperty.data-annotation-purposespace-separated list of the purposes from the annotation body'spurposeproperty.classthe value of the annotation'sstyleClassproperty is added when present.
Example (single node match):
If the source HTML is as follows:
<h2 id="test-id">Stirs ending exceeding fond muster fall Bagshot.</h2>
And rehype-annotate is run with the following annotation:
{
"id": "http://example.com/annotations1",
"type": "Annotation",
"motivation": "bookmarking",
"creator": {
"id": "http://example.org/user1",
"type": "Person",
"name": "A. Person",
"nickname": "user1"
},
"created": "2015-10-13T13:00:00Z",
"stylesheet": {
"id": "http://example.org/stylesheet1",
"type": "CssStylesheet"
},
"body": [
{
"type": "TextualBody",
"purpose": "tagging",
"value": "love"
},
{
"type": "TextualBody",
"value": "<p>j'adore !</p>",
"format": "text/html",
"language": "fr",
"purpose": "describing"
}
],
"target": {
"source": "https://example.com/tests/fixtures/fragment-multibody.input.html",
"styleClass": "Bookmarked",
"selector": {
"type": "FragmentSelector",
"value": "test-id"
}
}
}
Then the result should be (provided the url or canonical options match the source):
<h2
id="test-id"
data-annotation-id="http://example.com/annotations1"
data-annotation-motivation="bookmarking"
class="Bookmarked"
data-annotation-purpose="tagging describing"
>
Stirs ending exceeding fond muster fall Bagshot.
</h2>
Example (text range match):
If the source HTML is as follows:
<p>
Resilient Garulf key quest abandon knives lifted niceties tonight disappeared
strongest plates. Farthing ginger large. Nobody tosses a Dwarf. Makes
Shadowfax nearly lesser south deceive hates 22nd missing others!
</p>
And rehype-annotate is run with the following annotation:
{
"id": "http://example.com/annotations1",
"type": "Annotation",
"motivation": "highlighting",
"created": "2015-10-13T13:00:00Z",
"body": [
{
"type": "TextualBody",
"value": "<p>j'adore !</p>",
"format": "text/html",
"language": "fr",
"purpose": "commenting"
}
],
"target": {
"source": "https://example.com/tests/fixtures/text-quote.input.html",
"styleClass": "Bookmarked",
"selector": {
"type": "TextQuoteSelector",
"exact": "Resilient Garulf key quest abandon knives"
}
}
}
Then the result should be (provided the url or canonical options match the source):
<p>
<mark
data-annotation-id="http://example.com/annotations1"
data-annotation-motivation="highlighting"
class="Bookmarked"
data-annotation-purpose="commenting"
>Resilient Garulf key quest abandon knives</mark
>
lifted niceties tonight disappeared strongest plates. Farthing ginger large.
Nobody tosses a Dwarf. Makes Shadowfax nearly lesser south deceive hates 22nd
missing others!
</p>
options
options.annotations
An array of annotations that conform to the W3C Web Annotations Data Model. See note below on selector support.
options.canonical and options.url
The annotation is only matched to the html source if the annotation.target.source property matches either canonical or url.
Selector Support
CssSelector: limited to the selectors supported byhast-util-selectXPathSelector: becauserehype/hastdoesn't come with built-inxpathsupport, these selectors only work when they are very simple. E.g./html/body/p[1]FragmentSelector: supports only HTML fragment ids.RangeSelector: supported when bothstartSelectorandendSelectorresolve to element nodes.TextQuoteSelector: implementation is loosely based on the excellentdom-anchor-text-quoteby Randall Leeds.TextPositionSelector
For performance reasons text quote and text position selectors that overlap each other in the document are not supported.
refinedBy
You can use the refinedBy property on a selector that resolves to a single element node (CssSelector, XPathSelector, FragmentSelector) to create a new scope or root for a another selector, including the TextPositionSelector or the TextQuoteSelector.