It’s been a while.
Some days ago I thought to use this algorithm to remove duplicate escape characters in my md-toc program. I then realized I didn’t need it and also noticed that dealing correctly with escape characters seems very hard. So I generalized the algorithm to get any number of a specified charatcter in a string, except \
!!, to be halvened.
Algorithm #
Here’s the algorithm along with some unit tests:
import math
import unittest
def halve_characters(s, remove_char):
assert isinstance(s, str)
assert isinstance(remove_char, str)
assert len(remove_char) == 1
assert remove_char != '\\'
i = 0
final_string = str()
while i < len(s):
if s[i] == remove_char:
j = i
count = 1
match = True
while j < len(s) - 1 and match:
if s[j] == s[j + 1]:
count += 1
else:
match = False
j += 1
remove_char_count = math.floor(count / 2)
final_string += remove_char_count * remove_char
i += count
else:
final_string += s[i]
i += 1
return final_string
class Test(unittest.TestCase):
def test_halve_characters(self):
self.assertEqual(halve_characters('thiis is a stringiioo. Hiiii', 'i'), 'this s a strngioo. Hii')
self.assertEqual(halve_characters('****\n', '*'), '**\n')
self.assertEqual(halve_characters('\\\n\\\\\n', '\n'), '\\\\\\')
self.assertEqual(halve_characters('', 'a'), '')
self.assertEqual(halve_characters('abcdefffghifffff', 'f'), 'abcdefghiff')
if __name__ == '__main__':
unittest.main()
Explanation #
The idea behind it is that we iterate character by character through the string until we find the an element that needs to be removed: if s[i] == remove_char
If this condition is true, we set a counter count = 1
which is a counter for the number of remove_char
characters.
We assume that the next character in the string will also be remove_char
. If it is it then we continue to count the occurrencies of remove_char
.
Once we reach the end of the string or if there are no more consecutive remove_chars
we compute the halved number of remove characters with math.floor(count / 2)
and concatenate remove_char * math.floor(count / 2)
to the final string. Using the floor function implies that the count is approximated to the nearest smaller integer number.
If the current character is not if s[i] != remove_char
we simply add it to the final string.
Complexity #
The complexity is O(n) because each character is inspected only once:
while i < len(s):
if s[i] == remove_char:
[...]
# Remember what count is.
i += count
else:
i += 1
Enjoy!