Document text extraction

Text extraction is the process by which fragments of text are lifted from a larger body of text prior to tokenization.

For example, the text being indexed may be an XML or HTML document, and you may want to index only the text content of its elements rather than the markup itself:

var index = new FullTextIndexBuilder<int>()
    .WithTextExtraction<XmlTextExtractor>()
    .Build();
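To make the idea concrete, here is a minimal conceptual sketch in C# of what "index only the text content of the elements" means. It is not LIFTI's XmlTextExtractor, and it does not implement LIFTI's extractor interface. It simply uses System.Xml.Linq to collect the text nodes of a document and drop the tags and attributes. The Program class and the ExtractTextFragments helper are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class Program
{
    // Hypothetical helper, for illustration only (not part of LIFTI):
    // returns the text content of every element in document order and
    // discards element names, attributes, and other markup.
    public static List<string> ExtractTextFragments(string xml)
    {
        return XDocument.Parse(xml)
            .DescendantNodes()
            .OfType<XText>()          // text nodes (XCData derives from XText, so CDATA is included)
            .Select(node => node.Value)
            .Where(value => !string.IsNullOrWhiteSpace(value))
            .ToList();
    }

    public static void Main()
    {
        var fragments = ExtractTextFragments(
            "<doc><title>Hello</title><body>Full <b>text</b> search</body></doc>");

        // Each fragment would then be handed on to tokenization.
        foreach (var fragment in fragments)
        {
            Console.WriteLine($"[{fragment}]");
        }
    }
}
```

A production extractor may also need to track where each fragment sits in the source text, and to tolerate malformed HTML that XDocument.Parse would reject with an XmlException; this sketch does neither.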

Text extraction is applied only when indexing text, i.e. in calls to the IFullTextIndex.AddAsync overloads. It is never applied to query text when searching.

Last modified January 16, 2024: V6.0.0 (#107) (125ae87)