Mitc.Support.FuzzySearch.EntityFramework 1.0.0

SQL Server fuzzy / approximate string matching for EF Core. Two layers in one package, no UDF deployment required.

  • Layer 1 — translatable primitives: three SQL Server built-ins (SOUNDEX, DIFFERENCE, PATINDEX) exposed as LINQ-translatable string extensions. Compose into Where / OrderBy clauses; the SQL stays SQL.
  • Layer 2 — in-memory edit-distance terminator: an IQueryable<T> extension that injects a lossless SQL length-window pre-filter, materializes a small candidate set, computes Levenshtein distance in C#, and returns matches ranked by distance.

Install

dotnet add package Mitc.Support.FuzzySearch.EntityFramework

Setup

Register the Layer 1 functions once in OnModelCreating:

using Mitc.Support.FuzzySearch;

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.RegisterFuzzySearchFunctions();
}

Layer 1 — translatable primitives

using Mitc.Support.FuzzySearch;

// Phonetic equality — translates to: SOUNDEX(Name) = SOUNDEX(@term)
context.Users.Where(u => u.Name.SoundsLike("Smith"));

// Phonetic similarity score from 0 to 4, where 4 is the closest match. Despite the
// method name, higher means more similar. Translates to: DIFFERENCE(Name, @term)
context.Users.Where(u => u.Name.PhoneticDistanceFrom("Smith") >= 3);
context.Users.OrderByDescending(u => u.Name.PhoneticDistanceFrom("Smith"));

// Pattern position — translates to: PATINDEX(@pattern, Name)
// Returns 1-based position of first match, 0 if no match.
// Return type is `long` (not `int`) because PATINDEX returns `bigint` on
// varchar(max) / nvarchar(max) columns; this avoids runtime cast errors.
context.Users.Where(u => u.Name.IsLike("%Mit%") > 0);
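context.Users
    .Where(u => u.Name.IsLike("%Mit%") > 0)    // drop non-matches first: they return 0 and would sort to the top
    .OrderBy(u => u.Name.IsLike("%Mit%"));     // earlier match comes first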

Layer 2 — in-memory Levenshtein terminator

// Returns matches with distance <= maxDistance, ordered by distance ascending.
List<User> matches = await context.Users
    .ToListByEditDistanceAsync(u => u.Name, "John", maxDistance: 2);

// Same selection, but each result is paired with its computed edit distance:
List<EditDistanceMatch<User>> ranked = await context.Users
    .ToMatchesByEditDistanceAsync(u => u.Name, "John", maxDistance: 2);

With maxDistance: 2, the terminator first runs WHERE LEN(Name) BETWEEN @term.Length - 2 AND @term.Length + 2 (in general, a window of ± maxDistance around the term length), materializes the surviving candidates, then computes Levenshtein distance in C#. The length pre-filter is lossless on non-NULL values: each insertion or deletion changes the length by exactly one and a substitution leaves it unchanged, so Levenshtein distance ≤ k implies |len_a − len_b| ≤ k. Rows where the selector value is NULL are excluded by the pre-filter and never match.
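The two-stage idea (cheap length window, then exact distance) can be sketched over an in-memory sequence. This is a minimal illustration, not the package's implementation: the EditDistanceSketch class and its Rank method are hypothetical names, the comparison is ordinal and case-sensitive, and in the real package the length window runs in SQL rather than in C#.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the package's API.
public static class EditDistanceSketch
{
    // Two-row dynamic-programming Levenshtein distance (ordinal, case-sensitive).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // substitution or match
            }
            (prev, curr) = (curr, prev); // the row just filled becomes the previous row
        }
        return prev[b.Length];
    }

    // Stage 1: length window (nulls dropped). Stage 2: exact distance, filter, rank.
    public static List<(string Value, int Distance)> Rank(
        IEnumerable<string> values, string term, int maxDistance)
    {
        return values
            .Where(v => v != null && Math.Abs(v.Length - term.Length) <= maxDistance)
            .Select(v => (Value: v, Distance: Levenshtein(v, term)))
            .Where(m => m.Distance <= maxDistance)
            .OrderBy(m => m.Distance)
            .ToList();
    }
}
```

Calling EditDistanceSketch.Rank(names, "John", 2) on a list containing "Jonathan" skips that value before any distance is computed, because its length differs from the term's by 4. No value the window rejects could have matched, which is what makes the pre-filter lossless.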

When to use which

Use case | Recommended primitives
User search by name | SoundsLike and/or PhoneticDistanceFrom (tight threshold), or ToListByEditDistanceAsync with low maxDistance for typo tolerance
Blog post / article search | IsLike for substring matching, optionally combined with ToListByEditDistanceAsync on the title for typo tolerance
Document / longer-text search | IsLike with looser patterns; full-text search is out of scope for this package

Performance: composing Layer 1 in front of Layer 2

The Layer 2 terminator's only built-in pre-filter is the lossless length window. On very large tables (millions of rows), the length window alone can still leave many candidates to materialize. Because Layer 1 primitives are translatable, you can chain them in front of ToListByEditDistanceAsync and they stay in SQL:

var matches = await context.InspectionObservations
    .Where(io => io.Observation.Name.PhoneticDistanceFrom(term) >= 2)   // Layer 1, in SQL
    .Include(io => io.Observation)                                       // selector reads nav prop
    .ToListByEditDistanceAsync(io => io.Observation.Name, term, maxDistance: 2);

The Where(... PhoneticDistanceFrom(term) >= 2) clause adds a DIFFERENCE(...) predicate that the database evaluates together with the package's length pre-filter, so fewer candidates are materialized. Unlike the length window, a phonetic filter is not lossless: tighter thresholds (>= 3 or >= 4) cut more aggressively but can drop pairs that are close in edit distance yet phonetically distant. >= 2 is a conservative default.

When the searched string lives on a small lookup table joined to a large fact table, run the edit-distance match against the lookup first and then filter the fact table by foreign key. This computes Levenshtein once per lookup row instead of once per fact-table row.
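The shape of that lookup-first pattern can be sketched with in-memory collections standing in for the two tables. Everything here is an assumption made for illustration: the Observation and InspectionObservation records, the ObservationId foreign key, and the FindFacts helper are hypothetical, and the distance function is passed in so any edit-distance implementation can be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities for illustration; these names are assumptions, not part of the package.
public record Observation(int Id, string Name);                 // small lookup table
public record InspectionObservation(int Id, int ObservationId); // large fact table

public static class LookupFirstSketch
{
    public static List<InspectionObservation> FindFacts(
        IEnumerable<Observation> lookup,
        IEnumerable<InspectionObservation> facts,
        string term,
        int maxDistance,
        Func<string, string, int> distance)
    {
        // Step 1: edit-distance match against the lookup only (one call per lookup row).
        HashSet<int> matchingIds = lookup
            .Where(o => distance(o.Name, term) <= maxDistance)
            .Select(o => o.Id)
            .ToHashSet();

        // Step 2: filter fact rows by foreign key; no string comparison happens here.
        return facts.Where(f => matchingIds.Contains(f.ObservationId)).ToList();
    }
}
```

With EF Core, step 1 would presumably be a ToListByEditDistanceAsync call on the lookup set, and step 2 a Where over the fact set using Contains on the collected ids, which EF Core translates to a SQL IN clause.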

Versioning

Multi-targets net5.0 through net10.0. Each target pins its EF Core dependency to the matching major version.

Version | Downloads | Last updated
1.0.0 | 0 | 5/1/2026