It’s been a while.
Some days ago I thought to use this algorithm to remove duplicate escape characters in my md-toc program. I then realized I didn’t need it and also noticed that dealing correctly with escape characters seems very hard. So I generalized the algorithm to get any number of a specified charatcter in a string, except
\ !!, to be halvened.
Here’s the algorithm along with some unit tests:
import math import unittest def halve_characters(s, remove_char): assert isinstance(s, str) assert isinstance(remove_char, str) assert len(remove_char) == 1 assert remove_char != '\\' i = 0 final_string = str() while i < len(s): if s[i] == remove_char: j = i count = 1 match = True while j < len(s) - 1 and match: if s[j] == s[j + 1]: count += 1 else: match = False j += 1 remove_char_count = math.floor(count / 2) final_string += remove_char_count * remove_char i += count else: final_string += s[i] i += 1 return final_string class Test(unittest.TestCase): def test_halve_characters(self): self.assertEqual(halve_characters('thiis is a stringiioo. Hiiii', 'i'), 'this s a strngioo. Hii') self.assertEqual(halve_characters('****\n', '*'), '**\n') self.assertEqual(halve_characters('\\\n\\\\\n', '\n'), '\\\\\\') self.assertEqual(halve_characters('', 'a'), '') self.assertEqual(halve_characters('abcdefffghifffff', 'f'), 'abcdefghiff') if __name__ == '__main__': unittest.main()
The idea behind it is that we iterate character by character through the string until we find the an element that needs to be removed:
if s[i] == remove_char
If this condition is true, we set a counter
count = 1 which is a counter for the number of
We assume that the next character in the string will also be
remove_char. If it is it then we continue to count the occurrencies of
Once we reach the end of the string or if there are no more consecutive
remove_chars we compute the halved number of remove characters with
math.floor(count / 2) and concatenate
remove_char * math.floor(count / 2) to the final string. Using the floor function implies that the count is approximated to the nearest smaller integer number.
If the current character is not
if s[i] != remove_char we simply add it to the final string.
The complexity is O(n) because each character is inspected only once:
while i < len(s): if s[i] == remove_char: [...] # Remember what count is. i += count else: i += 1