Custom stemmers
You can implement a custom stemmer if the default English Porter stemmer doesn’t meet your needs.
Let’s say that, for some reason, you need every indexed token to be stemmed to at most 3 characters:
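```csharp
using System.Text;
using Lifti.Tokenization.Stemming; // namespace of IStemmer in recent LIFTI versions; verify against yours

public class FirstThreeLettersStemmer : IStemmer
{
    // Truncation works for any casing or accents, so this stemmer
    // doesn't need the index to normalize text first.
    public bool RequiresCaseInsensitivity => false;
    public bool RequiresAccentInsensitivity => false;

    // Truncates the token in place to at most 3 characters.
    public void Stem(StringBuilder builder)
    {
        if (builder.Length > 3)
        {
            builder.Length = 3;
        }
    }
}
```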
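To see what Stem does to individual tokens, the truncation can be exercised outside an index. This is a minimal sketch that copies the body of Stem into a standalone local function so it runs with only the .NET base class library; the name TruncateToThree and the sample tokens are illustrative assumptions, not part of LIFTI.

```csharp
using System;
using System.Text;

// Same logic as FirstThreeLettersStemmer.Stem: the builder is
// shortened in place rather than replaced with a new string.
static void TruncateToThree(StringBuilder builder)
{
    if (builder.Length > 3)
    {
        builder.Length = 3;
    }
}

// Hypothetical sample tokens, in uppercase as a case-insensitive index would supply them.
string[] tokens = { "INDEX", "INDEXING", "INDIGO", "AT" };

foreach (var token in tokens)
{
    var builder = new StringBuilder(token);
    TruncateToThree(builder);
    Console.WriteLine($"{token} -> {builder}");
}
```

Running it prints INDEX -> IND, INDEXING -> IND, INDIGO -> IND and AT -> AT. Unrelated words collapse onto the same stem, so an index using this stemmer cannot tell "indexing" apart from "indigo"; that is why it only makes sense as a toy example.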
RequiresCaseInsensitivity and RequiresAccentInsensitivity are hints read by the index at creation time; returning true from either one forces the index to enable case or accent insensitivity, respectively. Case insensitivity means that any text passed to your stemmer will already be uppercase. Accent insensitivity means that accents will automatically be stripped before text is sent to the stemmer.
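The effect of these two hints can be approximated as a normalization step that runs before the stemmer. The sketch below is a hypothetical illustration, not LIFTI's actual implementation: it strips accents by decomposing the text with NormalizationForm.FormD and dropping the combining marks, then uppercases it with ToUpperInvariant. The function name NormalizeToken is made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical approximation of the normalization applied before stemming
// when case and/or accent insensitivity are enabled. Not LIFTI's own code.
static string NormalizeToken(string token, bool caseInsensitive, bool accentInsensitive)
{
    var text = token;

    if (accentInsensitive)
    {
        // Decompose characters (e.g. 'é' -> 'e' + combining acute accent),
        // then keep everything except the combining marks.
        var decomposed = text.Normalize(NormalizationForm.FormD);
        var stripped = new StringBuilder(decomposed.Length);
        foreach (var c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                stripped.Append(c);
            }
        }
        text = stripped.ToString().Normalize(NormalizationForm.FormC);
    }

    if (caseInsensitive)
    {
        text = text.ToUpperInvariant();
    }

    return text;
}

// With both options enabled, a stemmer would receive "CAFE" rather than "Café".
Console.WriteLine(NormalizeToken("Caf\u00e9", caseInsensitive: true, accentInsensitive: true)); // prints CAFE
```

A stemmer that returns true from RequiresCaseInsensitivity can therefore compare suffixes against uppercase literals only (for example "ING" rather than "ing"), because its input has already been through a step like this.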
Once you’ve implemented your stemmer, you just need to pass it to the FullTextIndexBuilder:
```csharp
using Lifti;

// Register the custom stemmer with the index's default tokenizer
var index = new FullTextIndexBuilder<int>()
    .WithDefaultTokenization(o => o.WithStemming(new FirstThreeLettersStemmer()))
    .Build();
```
Last modified January 16, 2024: V6.0.0 (#107) (125ae87)