SequenceMatcher
class SequenceMatcher
A Diff Sequence Matcher
Methods
__construct(string|array $a, string|array $b, string|array $junkCallback = null, array $options = []) - The constructor. The two sequences passed in are set on the sequence matcher, which then performs a basic cleanup and calculates the junk elements.
void setOptions(array $options) - Set options for the matcher.
void setSequences(string|array $a, string|array $b) - Set the first and second sequences to use with the sequence matcher.
void setSeq1(string|array $a) - Set the first sequence ($a) and reset the internal caches so that the calculation methods recompute their results on the next call.
void setSeq2(string|array $b) - Set the second sequence ($b) and reset the internal caches so that the calculation methods recompute their results on the next call.
array findLongestMatch(int $alo, int $ahi, int $blo, int $bhi) - Find the longest matching block in the two sequences, limited to the range $alo to $ahi in the first sequence and $blo to $bhi in the second.
bool linesAreDifferent(int $aIndex, int $bIndex) - Check whether the two lines at the given indexes differ.
array getMatchingBlocks() - Return a nested array describing all of the matching sub-sequences in $a and $b.
array getOpCodes() - Return a list of all of the opcodes for the differences between the two strings.
array getGroupedOpcodes(int $context = 3) - Return a series of nested arrays containing groups of generated opcodes for the differences between the strings, with up to $context lines of surrounding content.
float ratio() - Return a measure of the similarity between the two sequences.
Details
__construct(string|array $a, string|array $b, string|array $junkCallback = null, array $options = [])
The constructor. The two sequences passed in are set on the sequence matcher, which then performs a basic cleanup and calculates the junk elements.
void setOptions(array $options)
Set options for the matcher.
void setSequences(string|array $a, string|array $b)
Set the first and second sequences to use with the sequence matcher.
void setSeq1(string|array $a)
Set the first sequence ($a) and reset the internal caches so that the calculation methods recompute their results on the next call.
void setSeq2(string|array $b)
Set the second sequence ($b) and reset the internal caches so that the calculation methods recompute their results on the next call.
array findLongestMatch(int $alo, int $ahi, int $blo, int $bhi)
Find the longest matching block in the two sequences. The search is limited to the range $alo to $ahi in the first sequence and $blo to $bhi in the second.
Of all the maximal matching blocks, return the one that starts earliest in $a; of all those maximal matching blocks that start earliest in $a, return the one that starts earliest in $b.
If the junk callback is defined, do the above with the additional restriction that no junk element appears in the block. The block is then extended as far as possible by matching only junk elements in both $a and $b.
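The tie-breaking rule above can be illustrated with a small standalone sketch. It is not this class's implementation: longestMatchSketch is a hypothetical helper, it ignores junk handling entirely, and it assumes that $ahi and $bhi are exclusive upper bounds (as in Python's difflib, whose API this class resembles).

```php
<?php
// Hypothetical helper, not part of SequenceMatcher. Ignores junk handling
// and assumes $ahi / $bhi are exclusive upper bounds.
function longestMatchSketch(array $a, array $b, int $alo, int $ahi, int $blo, int $bhi): array
{
    $bestI = $alo;
    $bestJ = $blo;
    $bestSize = 0;

    // $runs[$j] holds the length of the matching run that ends at
    // $a[$i - 1] and $b[$j] (from the previous iteration of the outer loop).
    $runs = [];
    for ($i = $alo; $i < $ahi; $i++) {
        $newRuns = [];
        for ($j = $blo; $j < $bhi; $j++) {
            if ($a[$i] !== $b[$j]) {
                continue;
            }
            $k = ($runs[$j - 1] ?? 0) + 1;
            $newRuns[$j] = $k;
            // Strict ">" keeps the first block found at each size, which is
            // the one starting earliest in $a, then earliest in $b.
            if ($k > $bestSize) {
                $bestI = $i - $k + 1;
                $bestJ = $j - $k + 1;
                $bestSize = $k;
            }
        }
        $runs = $newRuns;
    }

    // Same layout as a matching block: start in $a, start in $b, length.
    return [$bestI, $bestJ, $bestSize];
}

$match = longestMatchSketch(str_split(' abcd'), str_split('abcd abcd'), 0, 5, 0, 9);
// $match is [0, 4, 5]: all of " abcd" matches $b starting at index 4.
```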
bool linesAreDifferent(int $aIndex, int $bIndex)
Check whether the two lines at the given indexes differ.
array getMatchingBlocks()
Return a nested array describing all of the matching sub-sequences in $a and $b.
Each block contains the starting index of the block in $a, the starting index of the block in $b, and the number of lines the block spans.
array getOpCodes()
Return a list of all of the opcodes for the differences between the two strings.
Each entry in the returned nested array describes one opcode and contains:
0 - The type of tag (as described below) for the opcode.
1 - The beginning line in the first sequence ($i1).
2 - The end line in the first sequence ($i2).
3 - The beginning line in the second sequence ($j1).
4 - The end line in the second sequence ($j2).
The different types of tags are:
replace - The string from $i1 to $i2 in $a should be replaced by the string in $b from $j1 to $j2.
delete - The string in $a from $i1 to $i2 should be deleted.
insert - The string in $b from $j1 to $j2 should be inserted at $i1 in $a.
equal - The two strings with the specified ranges are equal.
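To show how the opcodes fit together, the following sketch rebuilds $b from $a by walking a list of opcodes. applyOpcodesSketch is a hypothetical helper, the opcode list is written by hand rather than produced by getOpCodes(), and the end indexes are assumed to be exclusive.

```php
<?php
// Hypothetical helper, not part of SequenceMatcher. Each opcode is
// [tag, i1, i2, j1, j2]; end indexes are assumed to be exclusive.
function applyOpcodesSketch(array $a, array $b, array $opcodes): array
{
    $result = [];
    foreach ($opcodes as [$tag, $i1, $i2, $j1, $j2]) {
        switch ($tag) {
            case 'equal':
                // Unchanged lines are copied from $a.
                $result = array_merge($result, array_slice($a, $i1, $i2 - $i1));
                break;
            case 'replace':
            case 'insert':
                // New lines are taken from $b.
                $result = array_merge($result, array_slice($b, $j1, $j2 - $j1));
                break;
            case 'delete':
                // Deleted lines from $a are simply skipped.
                break;
        }
    }

    return $result;
}

$a = ['zero', 'one', 'two', 'three'];
$b = ['one', '2', 'three', 'four'];

// Hand-written opcodes for illustration only.
$opcodes = [
    ['delete',  0, 1, 0, 0],
    ['equal',   1, 2, 0, 1],
    ['replace', 2, 3, 1, 2],
    ['equal',   3, 4, 2, 3],
    ['insert',  4, 4, 3, 4],
];

$rebuilt = applyOpcodesSketch($a, $b, $opcodes);
// $rebuilt is identical to $b.
```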
array getGroupedOpcodes(int $context = 3)
Return a series of nested arrays containing groups of generated opcodes for the differences between the strings, with up to $context lines of surrounding content.
Large equal blocks are stripped out and the remaining changes are arranged into groups. This means that the sequence matcher and diffs do not need to include the full content of the files but can still provide context as to where the changes are.
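The grouping can be sketched as follows. This is modelled on the approach used by Python's difflib and is an assumption about how this class behaves, not a copy of its code: groupOpcodesSketch is a hypothetical helper, opcodes are [tag, i1, i2, j1, j2] with exclusive end indexes, and an empty opcode list simply yields no groups.

```php
<?php
// Hypothetical helper, not part of SequenceMatcher.
function groupOpcodesSketch(array $opcodes, int $context = 3): array
{
    if ($opcodes === []) {
        return [];
    }

    // Trim a leading equal block down to its last $context lines.
    if ($opcodes[0][0] === 'equal') {
        [$tag, $i1, $i2, $j1, $j2] = $opcodes[0];
        $opcodes[0] = [$tag, max($i1, $i2 - $context), $i2, max($j1, $j2 - $context), $j2];
    }

    // Trim a trailing equal block down to its first $context lines.
    $last = count($opcodes) - 1;
    if ($opcodes[$last][0] === 'equal') {
        [$tag, $i1, $i2, $j1, $j2] = $opcodes[$last];
        $opcodes[$last] = [$tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context)];
    }

    $groups = [];
    $group = [];
    foreach ($opcodes as [$tag, $i1, $i2, $j1, $j2]) {
        // A long equal block ends the current group and starts a new one,
        // keeping $context lines on either side.
        if ($tag === 'equal' && $i2 - $i1 > 2 * $context) {
            $group[] = [$tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context)];
            $groups[] = $group;
            $group = [];
            $i1 = max($i1, $i2 - $context);
            $j1 = max($j1, $j2 - $context);
        }
        $group[] = [$tag, $i1, $i2, $j1, $j2];
    }

    // Keep the final group unless it holds nothing but a single equal block.
    if ($group !== [] && !(count($group) === 1 && $group[0][0] === 'equal')) {
        $groups[] = $group;
    }

    return $groups;
}

// Hand-written opcodes for illustration only.
$groups = groupOpcodesSketch([
    ['equal',   0, 10,  0, 10],
    ['replace', 10, 11, 10, 11],
    ['equal',   11, 30, 11, 30],
    ['delete',  30, 31, 30, 30],
    ['equal',   31, 40, 30, 39],
], 3);
// Two groups result, each change surrounded by at most 3 equal lines.
```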
float ratio()
Return a measure of the similarity between the two sequences.
This will be a float value between 0 and 1.
Of all the ratio calculation functions, this is the most expensive to call if getMatchingBlocks or getOpCodes has not yet been called. The other calculation methods (quickRatio and realquickRatio) are quicker but may be less accurate.
The ratio is calculated as (2 * number of matches) / total number of elements in both sequences.
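As a worked example of the formula, the sketch below computes the ratio from a list of matching blocks in the [start in $a, start in $b, length] layout described for getMatchingBlocks(). ratioSketch is a hypothetical helper, and returning 1.0 for two empty sequences is an assumption rather than documented behaviour.

```php
<?php
// Hypothetical helper, not part of SequenceMatcher.
function ratioSketch(array $matchingBlocks, int $lengthA, int $lengthB): float
{
    $total = $lengthA + $lengthB;
    if ($total === 0) {
        // Assumption: two empty sequences count as identical.
        return 1.0;
    }

    // The number of matches is the sum of all block lengths.
    $matches = 0;
    foreach ($matchingBlocks as $block) {
        $matches += $block[2];
    }

    return 2.0 * $matches / $total;
}

// "abcd" vs "bcde": one block of length 3 ("bcd"), hand-written here.
$ratio = ratioSketch([[1, 0, 3]], 4, 4);
// $ratio is 0.75, i.e. (2 * 3) / (4 + 4).
```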