Jens Neuhalfen

This & That

Random Ramblings about Life, the Universe, and Everything

LeetCode #3: Longest Substring Without Repeating Characters

$O(n)$ solution in python3: 44 ms, faster than 99.09%

[Jens Neuhalfen]

5 minutes read

Description

Given a string, find the length of the longest substring without repeating characters.

*Example 1*:

Input: "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

*Example 2*:

Input: "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

*Example 3*:

Input: "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
             Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

(From the Longest Substring Without Repeating Characters problem description)

Solutions

Problem API

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:

Tests

Test vector #1 (abcabcbb) tells that the longest “longest substring without repeating characters” is abc. This reveals that the question can be rephrased as “the longest substring with unique characters”.

The following edge cases need some special consideration:

Multiple substrings with the same length
Per definition this is not relevant because only the length of the longest string is needed. It is debatable if a longest substring exists when two different strings have the same length. Example 1 of the task implies that this is not relevant.
The whole input is the same character
Experience shows that these are sometimes edge cases for loops.
The whole input is only on character long
Experience shows that this is sometimes an edge case for loops.
No input
if s == "" there is no longest substring and 0 is returned.

This leads to the following tests:

import os
import pytest


class BaseTestClass:
    def test_vector_1(self):
        assert self.lengthOfLongestSubstring("abcabcbb") == 3

    def test_vector_2(self):
        assert self.lengthOfLongestSubstring("bbbbb") == 1

    def test_vector_3(self):
        assert self.lengthOfLongestSubstring("pwwkew") == 3

    def test_empty_vector_is_0(self):
        assert self.lengthOfLongestSubstring("") == 0

    def test_one_elem_vector_is_1(self):
        """ Test the special case of 'starting' and 'ending' at once. """
        assert self.lengthOfLongestSubstring("a") == 1

    def test_detect_longest_substring_at_eof(self):
        assert self.lengthOfLongestSubstring("ababc") == 3

Solution 1a: \(O(|s| * |longest\_substring|)\)

A straight forward (but slow) algorithm is to check for each character in s if it starts the longest string. In other words: For each character (indexed by leftmost) the algorithm looks ahead until a duplicate char is found and the substring ends.

The longest substring wins.

Notes:

Caching the length of s
s is constant, and so is len(s). Even if len(s) is very fast and \(O(1)\) (source) and premature optimization is the root of all evil (or at least most of it) in programming this low hanging fruit can be picked.

Source

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        len_s = len(s)     #  len_s is constant, so we can cache the value
        longest_substring_len = 0
        for leftmost in range(0, len_s):
            current_substring = set(s[leftmost])
            lookahead = leftmost + 1
            while lookahead < len_s and s[lookahead] not in current_substring:
                current_substring.add(s[lookahead])
                lookahead +=1

            if len(current_substring) > longest_substring_len:
                longest_substring_len = len(current_substring)

        return longest_substring_len

Test results

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/003-longest_substr_without_repeating_chars, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 6 items

test_1a-solution.py ......                                               [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 6 passed in 0.03 seconds ===========================


Tests ran in 0.1789331170002697 seconds

Runtime analysis

This implementation boils down to two nested loops:

  • An outer loop (for leftmost in range(0, len_s)) loops n = len(s) times: \(\rightarrow O(|s|)\) The outer loop nests the following
    • indexed access into s (s[leftmost]): \(O(1)\) (source)
    • set creation (set(...)): \(O(1)\)
    • A nested loop (while lookahead < len_s and s[lookahead] not in current_substring) is bound by the length of the longest substring: \(O(|longest\_substring|)\).
      • indexed access into s (s[lookahead]): \(O(1)\) (source)
      • set membership test (s[lookahead] not in current_substring): \(O(1)\) (source)
      • set add (current_substring.add(s[lookahead])): probably \(O(1)\) (source only says that for dict this is \(O(1)\))
      • \(\rightarrow\) \(O(1)\)) per loop iteration

\(\rightarrow\) the average case behavior is \(O(|s| * |longest\_substring|)\) \(\blacksquare\)

Solution 1b: Moving window: \(O(n)\)

This builds on the previous solution. The idea is that the previous version can be optimized by removing the lookahead.

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        longest_substring_len = 0
        current_substring = set()
        index_of_start = 0

        for rightmost_char in s:
            if rightmost_char in current_substring:
                # also end the substring but continue
                longest_substring_len = max(longest_substring_len, len(current_substring))

                # Shrink the window until we can add the next char
                while True:
                    current_substring.remove(s[index_of_start])
                    index_of_start += 1
                    if not rightmost_char in current_substring:
                        break

                current_substring.add(rightmost_char)
            else:
                current_substring.add(rightmost_char)

        # end of input: The longest string could also end with s
        longest_substring_len = max(longest_substring_len, len(current_substring))

        return longest_substring_len

Test results

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/003-longest_substr_without_repeating_chars, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 6 items

test_1b-solution.py ......                                               [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 6 passed in 0.02 seconds ===========================


Tests ran in 0.16662812398863025 seconds

Runtime analysis

This implementation boils down to two nested loops:

  • set creation (set(...)): \(O(1)\)
  • An outer loop (for rightmost_char in s) over the whole string s times \(\rightarrow O(|s|)\). The outer loop nests the following cases
    • Sliding window found a duplicate at the current position \(\rightarrow O(|longest\_substring|)\)
      • set membership test (rightmost_char in current_substring) \(\rightarrow O(1)\)
      • Reduction of the sliding window, max. \(O(|longest\_substring|)\) times. \(\rightarrow O(|longest\_substring|)\)
        • Remove element from set \(O(1)\) (source although it is only guaranteed for dict)
        • set membership test (rightmost_char in current_substring) \(\rightarrow O(1)\)
        • \(\rightarrow O(1)\) per iteration
    • Extend the substring (current_substring.add(rightmost_char)): \(O(1)\)

\(\rightarrow\) superficially the average case behavior is \(O(|s| * |longest\_substring|)\).

A closer look at the implementation shows a different picture. Over all iterations the nested loop cannot drop more than \(|s|\) elements from the window.

In other words, the amortized runtime of the nested loop is \(O(1)\).

\(\rightarrow\) This brings the runtime of the whole algorithm to \(O(n)\) \(\blacksquare\)

This also explains why this implementation is more than 12 times faster than the previous version when using the leetcode test set.

Summary

Solution Correct Link Runtime Memory Language
#1, Solution 1a (quadratic runtime) Yes Submission 544 ms, faster than 13.77% 12.9 MB, less than 100.00% Python3
#2, Solution 1b (linear runtime) Yes Submission 44 ms, faster than 99.09% 12.7 MB, less than 100.00% Python3

Recent posts

See more

Categories

About

Information security and software development are enablers for a successful company. During the last 20+ years of my working life I have always been convinced that outstanding success is only possible if everybody works together to find and reach a common goal.