OfficeIMO.Reader.Html (Preview)

May 9, 2026

OfficeIMO.Reader.Html is a modular adapter that adds HTML ingestion to OfficeIMO.Reader.

Current scope:

  • HTML -> Markdown (via OfficeIMO.Markdown.Html)
  • Markdown chunk emission in ReaderChunk shape
  • heading-aware chunk metadata (Location.HeadingPath, Location.StartLine) when ReaderOptions.MarkdownChunkByHeadings = true
  • path and stream dispatch via DocumentReader handler registration
  • warning chunk emitted when the HTML yields no Markdown content
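
To make the heading-aware metadata more concrete, here is a minimal, self-contained sketch of how Markdown could be split at headings while tracking a heading path and a 1-based start line. It is not the OfficeIMO implementation: the function name ChunkByHeadings, the tuple shape, and the " > " path separator are all assumptions chosen for illustration, and the sketch ignores fenced code blocks and Setext headings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: two chunks, each tagged with its heading path and start line.
var sample = "# Guide\nIntro text.\n\n## Install\nRun the installer.\n";
foreach (var c in ChunkByHeadings(sample))
    Console.WriteLine($"{c.StartLine}: {c.HeadingPath}");
// prints "1: Guide" then "4: Guide > Install"

// Hypothetical helper: splits Markdown at ATX headings ("# " to "###### ").
List<(string HeadingPath, int StartLine, string Markdown)> ChunkByHeadings(string markdown)
{
    var chunks = new List<(string HeadingPath, int StartLine, string Markdown)>();
    var stack = new List<(int Level, string Title)>(); // open headings, outermost first
    var buffer = new List<string>();                   // lines of the chunk being built
    string currentPath = "";
    int startLine = 1;

    // Emit the buffered lines as one chunk, skipping whitespace-only buffers.
    void Flush()
    {
        if (buffer.Any(l => l.Trim().Length > 0))
            chunks.Add((currentPath, startLine, string.Join("\n", buffer).Trim()));
        buffer.Clear();
    }

    var lines = markdown.Replace("\r\n", "\n").Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        string line = lines[i];
        int level = 0;
        while (level < line.Length && line[level] == '#') level++;
        bool isHeading = level >= 1 && level <= 6
                         && level < line.Length && line[level] == ' ';
        if (isHeading)
        {
            Flush();
            // Close headings at the same or a deeper level before opening this one.
            while (stack.Count > 0 && stack[stack.Count - 1].Level >= level)
                stack.RemoveAt(stack.Count - 1);
            stack.Add((level, line.Substring(level + 1).Trim()));
            currentPath = string.Join(" > ", stack.Select(h => h.Title));
            startLine = i + 1; // 1-based line of the heading itself
        }
        buffer.Add(line);
    }
    Flush();
    return chunks;
}
```

Tracking the heading level alongside each title keeps the path correct when a document skips levels, for example an H3 directly under an H1.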

Registration into OfficeIMO.Reader:

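using OfficeIMO.Reader.Html;

// Register the HTML handler with DocumentReader so HTML paths and streams dispatch to this adapter.
DocumentReaderHtmlRegistrationExtensions.RegisterHtmlHandler();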

Status:

  • packaged as OfficeIMO.Reader.Html
  • preview-scoped modular adapter for OfficeIMO.Reader