OfficeIMO.Reader.Html (Preview)
May 9, 2026
OfficeIMO.Reader.Html is a modular adapter for HTML ingestion.
Current scope:
- HTML -> Markdown (via `OfficeIMO.Markdown.Html`)
- Markdown chunk emission in `ReaderChunk` shape
- heading-aware chunk metadata (`Location.HeadingPath`, `Location.StartLine`) when `ReaderOptions.MarkdownChunkByHeadings = true`
- path and stream dispatch via `DocumentReader` handler registration
- warning chunk when HTML yields no markdown content
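To make the heading-aware metadata concrete, here is a minimal, self-contained sketch of how converted Markdown could be split at headings while tracking a heading path and a start line. It is an illustration only, not the adapter's implementation: `SketchChunk`, `HeadingChunkSketch`, the `" > "` path separator, and the 1-based line numbering are all hypothetical choices made for this example, and the real `ReaderChunk` and `Location` types may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration; NOT the library's ReaderChunk type.
public sealed record SketchChunk(string HeadingPath, int StartLine, string Markdown);

public static class HeadingChunkSketch {
    // Splits Markdown into one chunk per heading section.
    // Assumption: only ATX headings ("# Title") are recognized, and '#' lines
    // inside fenced code blocks are not special-cased.
    public static List<SketchChunk> Split(string markdown) {
        var chunks = new List<SketchChunk>();
        var path = new List<string>();   // current stack of enclosing headings
        var body = new List<string>();   // lines collected for the current chunk
        int startLine = 1;               // 1-based line where the current chunk begins

        string[] lines = markdown.Replace("\r\n", "\n").Split('\n');
        for (int i = 0; i < lines.Length; i++) {
            int level = HeadingLevel(lines[i]);
            if (level > 0) {
                // Close the previous section before the heading stack changes.
                Flush(chunks, path, body, startLine);
                // Pop headings at the same or a deeper level, then push the new one.
                while (path.Count >= level) path.RemoveAt(path.Count - 1);
                path.Add(lines[i].Substring(level).Trim());
                startLine = i + 1;
            }
            body.Add(lines[i]);
        }
        Flush(chunks, path, body, startLine);
        return chunks;
    }

    // Returns 1..6 for an ATX heading line, or 0 for any other line.
    private static int HeadingLevel(string line) {
        int n = 0;
        while (n < line.Length && line[n] == '#') n++;
        return n >= 1 && n <= 6 && n < line.Length && line[n] == ' ' ? n : 0;
    }

    // Emits the collected lines as one chunk, skipping whitespace-only sections.
    private static void Flush(List<SketchChunk> chunks, List<string> path,
                              List<string> body, int startLine) {
        if (body.Any(l => !string.IsNullOrWhiteSpace(l))) {
            chunks.Add(new SketchChunk(
                string.Join(" > ", path),
                startLine,
                string.Join("\n", body).TrimEnd()));
        }
        body.Clear();
    }
}
```

Calling `HeadingChunkSketch.Split("# Guide\nIntro\n## Setup\nSteps")` in this sketch yields two chunks: one with heading path `Guide` starting at line 1, and one with heading path `Guide > Setup` starting at line 3. Input with no non-whitespace lines produces no chunks at all, which is the situation the adapter reports with a warning chunk.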
Registration into OfficeIMO.Reader:
using OfficeIMO.Reader.Html;
DocumentReaderHtmlRegistrationExtensions.RegisterHtmlHandler();
Status:
- packaged as `OfficeIMO.Reader.Html`
- preview-scoped modular adapter for `OfficeIMO.Reader`