Levenshtein distance in PHP without the 255 character limit

I needed the levenshtein function from PHP, but unfortunately it has some restrictions.

  • A limit of 255 characters per string
  • Problems with UTF-8 (multibyte characters)
  • Only integers allowed for custom costs
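The UTF-8 problem is easy to reproduce with the built-in function alone, because it compares bytes rather than characters. The snippet below is a small illustration, assuming nothing beyond a stock PHP 7 or later installation; the variable names are made up for this example.

```php
<?php
// The built-in levenshtein() works on bytes, not on characters.
$ascii  = "a";
$umlaut = "\u{00E4}"; // "ä": one character, but two bytes in UTF-8

// strlen() also counts bytes, so this is 2 rather than 1.
$byteLength = strlen($umlaut);

// Going from "a" to "ä" is a single replacement for a human reader,
// but on byte level it is one replacement plus one insertion.
$builtInDistance = levenshtein($ascii, $umlaut);

echo $byteLength, " ", $builtInDistance, PHP_EOL;
```

A character-aware implementation would be expected to report a distance of 1 for this pair.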

I have implemented the levenshtein function on my own in plain PHP.

Install gordonlesti/levenshtein

You can find the small library on GitHub.

Requirements

The library requires PHP 7.

Composer

The recommended installation is via Composer.

composer require gordonlesti/levenshtein

Usage

Import the class GordonLesti\Levenshtein\Levenshtein.

use GordonLesti\Levenshtein\Levenshtein;

Calculate the Levenshtein distance with the default cost of 1 for every operation.

$levDist = Levenshtein::levenshtein("AC", "ABAA");

Calculate the Levenshtein distance with custom costs, for example 7 for the insert operation, 9 for the replace operation and 2 for the delete operation.

$levDist = Levenshtein::levenshtein("ACCB", "BC", 7, 9, 2);

Advantages

Here are some advantages of gordonlesti/levenshtein.

No string length limit

There is no limit on the string length, but keep in mind that the algorithm has a time complexity of O(n*m), where n and m are the lengths of the two strings.
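To make the n*m cost concrete, here is a minimal sketch of the classic dynamic programming approach (Wagner-Fischer) that keeps only two rows of the table. It is not the library's source code: the function name sketchLevenshtein is hypothetical, the argument order simply mirrors the calls shown in this post, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper for illustration only, not part of gordonlesti/levenshtein.
function sketchLevenshtein(
    string $a,
    string $b,
    float $ins = 1.0,
    float $rep = 1.0,
    float $del = 1.0
): float {
    // Split into UTF-8 characters instead of bytes (assumes valid UTF-8 input).
    $s = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $t = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($s);
    $m = count($t);

    // First row: building each prefix of $b from an empty string needs only inserts.
    $prev = [];
    for ($j = 0; $j <= $m; $j++) {
        $prev[$j] = $j * $ins;
    }

    // The nested loops visit every pair (i, j) once, hence the n*m run time.
    for ($i = 1; $i <= $n; $i++) {
        // First column: reducing a prefix of $a to an empty string needs only deletes.
        $curr = [$i * $del];
        for ($j = 1; $j <= $m; $j++) {
            $same = $s[$i - 1] === $t[$j - 1];
            $curr[$j] = min(
                $prev[$j] + $del,                       // delete a character of $a
                $curr[$j - 1] + $ins,                   // insert a character of $b
                $prev[$j - 1] + ($same ? 0.0 : $rep)    // keep or replace
            );
        }
        $prev = $curr;
    }

    return $prev[$m];
}
```

Keeping two rows limits the memory use to the length of the second string, but the run time still grows with the product of both lengths, so two strings of 10,000 characters each already mean about 100 million cell updates.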

UTF-8 support

The library supports UTF-8 thanks to the help of Jan Knipper.

$levDist = Levenshtein::levenshtein("♫⚓⚓♥", "♫♥⚓♫⚓♫");

Floating point custom costs

You can use floats for the custom costs of the insert, replace and delete operations.

$levDist = Levenshtein::levenshtein("ACCB", "BC", 7.7, 9.4, 2.5);

Disadvantages

The library is written in plain PHP, so it runs slower than the built-in implementation in C.
