Unique Substrings in Wraparound String
Given a strings s
and p
, find how many unique non-empty substrings of p
appear in s
.
Consider the string s
to be the following infinite repetition of string
“abcdefghijklmnopqrstuvwxyz”.
String p
consists of only lowercase English letters and up to 10000 letters.
Input. The input file consists of one or more input strings, one per line and is terminated by an EOF character. Consider the following example.
Output. The output file consists the count of unique non-empty substrings for each input string. The following example corresponds to the previous example input.
Solution
For a given input, we count the unique non-empty substrings by counting the substrings that end with each given letter in the following way.
Consider input string “zab” and its substrings.
The input string is a substring of s
and for that reason all of its
substrings appear in s
. Therefore, the output for “zab” is the count
of substrings of “zab”, which is 6.
To determine the count of substrings of “zab”, we determine how many of those substrings end with each letter in “zab” and add the results. For each letter in “zab”, the corresponding substrings are the following.
For each letter in “zab”, the count of substrings that end with the letter is given by the position of the letter in “zab”. For example, for letter “b”, the count is 3. The reason is that there is a substring that ends in “b” for each position that precedes or is equal to the position of letter “b”.
When the input string p
is not a substring of s
, the count of
substrings that end with a given letter is the maximum count amongst
the substrings of s
that appear in p
and are longest.
Consider input string “zabyza”.
Substrings “zab” and “yza” appear in s
and are longest.
Given that letter “b” appears only in “zab”, the count of substrings
that end with “b” is given only by “zab”.
In the case of letter “y”, the count is given by “yza”.
For letters “a” and “z”, consider the corresponding substrings in
“zab” and “yza”.
For “yza”, the substrings that end with “a” and “z” are the following.
The count of substrings that ends with “a” is 3 because the maximum count between “zab” and “yza” is 3. The reason is that all substrings given by “zab” that end with “a” are also given by “yza”. In the case of “z”, the count is 2 and the maximum is also given by “yza”.
To determine the substrings of s
in p
that are longest, we
group each letter in p
with its predecessor when either of the
following conditions is true.
- The letter is ‘a’ and the predecessor letter is ‘z’.
- The predecessor letter is the previous letter in the alphabet.
Finally, when a letter is repeated in a longest substring, the count of substrings that end in the letter corresponds to the last position of the letter. For example, consider input string “abcdefghijklmnopqrstuvwxyza” and letter “a”. The count of substrings that end with “a” is 27, which is the last position of letter “a”.