In computer science, the Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a “word” W within a main “text string” S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.
LPS
Longest proper prefix which is also suffix
Algorithm
Prefix Suffix Proper prefix is prefix which doesnt include the string itself
lps[i]
= the longest proper prefix ofpat[0..i]
which is also a suffix ofpat[0..i]
Examples of lps[]
construction:
Pattern | lps[] |
---|---|
“AAAA” | [0, 1, 2, 3] |
“ABCDE” | [0, 0, 0, 0, 0] |
“AABAACAABAA” | [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5] |
“AAACAAAAAC” | [0, 1, 2, 0, 1, 2, 3, 3, 3, 4] |
“AAABAAA” | [0, 1, 2, 0, 1, 2, 3] |
Implementation in Python
#!/usr/bin/env python3
def get_lps_array(pat):
lps = [0] * len(pat)
len = 0 # length of the previous longest prefix suffix
lps[0] # lps[0] is always 0
i = 1
# the loop calculates lps[i] for i = 1 to M-1
while i < len(pat):
if pat[i]==pat[len]:
len += 1
lps[i] = len
i += 1
else:
# This is tricky, consider the example:
# AAACAAAA and i = 7
# The idea is similar to search step
if len != 0:
len = lps[len-1]
# Also, note that we do not increment i here
else:
lps[i] = 0
i += 1
return lps
KMP Search Implementation
Implementation in Python
#!/usr/bin/env python3
def kmp_search(str, pat):
lps = get_lps_array(pat)
i = j = 0
while i<len(str):
if pat[j]==str[i]:
i += 1
j += 1
if j==len(pat):
print(f"Pattern found at index: {i-j}")
j = lps[j-1]
elif i<len(str) and pat[j]!=str[i]:
if j!=0:
j = lps[j-1]
else:
i += 1
Time Complexity
Time Complexity: O(n+m)