KMP Algorithm for Pattern Searching

What is KMP (Knuth-Morris-Pratt) algorithm? How to implement KMP algorithm in Python?

In computer science, the Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a “word” W within a main “text string” S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

LPS

Longest proper prefix which is also suffix

Algorithm

Prefix Suffix Proper prefix is prefix which doesnt include the string itself

lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]

Examples of lps[] construction:

Patternlps[]
“AAAA”[0, 1, 2, 3]
“ABCDE”[0, 0, 0, 0, 0]
“AABAACAABAA”[0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
“AAACAAAAAC”[0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
“AAABAAA”[0, 1, 2, 0, 1, 2, 3]

Implementation in Python

#!/usr/bin/env python3


def get_lps_array(pat):
    lps = [0] * len(pat)
    len = 0 # length of the previous longest prefix suffix
  
    lps[0] # lps[0] is always 0
    i = 1
  
    # the loop calculates lps[i] for i = 1 to M-1
    while i < len(pat):
        if pat[i]==pat[len]:
            len += 1
            lps[i] = len
            i += 1
        else:
            # This is tricky, consider the example:
            # AAACAAAA and i = 7 
            # The idea is similar to search step
            if len != 0:
                len = lps[len-1]
  
                # Also, note that we do not increment i here
            else:
                lps[i] = 0
                i += 1
    return lps

KMP Search Implementation

Implementation in Python

#!/usr/bin/env python3


def kmp_search(str, pat):
    lps = get_lps_array(pat)
    
    i = j = 0
    while i<len(str):
        if pat[j]==str[i]:
            i += 1
            j += 1
        
        if j==len(pat):
            print(f"Pattern found at index: {i-j}")
            j = lps[j-1]
        
        elif i<len(str) and pat[j]!=str[i]:
            if j!=0:
                j = lps[j-1]
            else:
                i += 1

Time Complexity

Time Complexity: O(n+m)