Jens Neuhalfen

This & That

Random Ramblings about Life, the Universe, and Everything

LeetCode #1 : Two Sum

$O(n)$ solution in python3: 44 ms, faster than 97,63%

[Jens Neuhalfen]

8 minutes read

Description

Given an array of integers, return indices of the two numbers such that they add up to a specific target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

Example:

Given nums = [2, 7, 11, 15], target = 9,

Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].

(From the Two-Sum problem description)

Test Vectors

Requirement 1
If the nums vector does not contain a valid solution, return an empty list.
Requirement 2
The indexed sum matches the target: \(\sum_{i \in solution}(nums_{i}) = target\)
Requirement 3
No index in solution can be used twice.
Requirement 4
\(|solution| = 2\) - … return the indices of the two numbers …
nums target solution Comment
[1, 2] 4 [] Req. #1 - no possible solution
[] 1 [] Req. #1 - edge case
[2, 7, 11 ,15] 9 [0, 1] Req. #2
[7, 5, 2] 14 [] [0, 0] violates req. #3. [0, 1, 2] violates req. 4

Solutions

Problem API

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:

Test Results

This leads to the following tests:

import os
import pytest


class BaseTestClass:
    """
    Info: the results are order-insensitive. To make the tests easier, we sort them.
    """
    def test_original_vector_ok(self):
        result = self.twoSum( nums=[2,7,11,15], target=9)
        result.sort()
        assert result  == [0, 1]

    def test_no_nums_give_no_result(self):
        assert self.twoSum( nums=[], target=9)  == []

    def test_no_possible_solution_gives_no_result(self):
        assert self.twoSum( nums=[1,2,3], target=999)  == []

    def test_same_elements_can_be_used(self):
        # The requirement only cares about duplicate indices
        result = self.twoSum( nums=[7,7], target=14)
        result.sort()
        assert result == [0, 1]

    def test_same_elements_can_be_used_2(self):
        # The requirement only cares about duplicate indices
        result = self.twoSum( nums=[7,1,7], target=14)
        result.sort()
        assert result == [0, 2]

    def test_the_same_index_cannot_be_used_twice(self):
        # Without req. 3 [0,0] would be a solution
        assert self.twoSum( nums=[7,9], target=14)  == []

    def test_only_two_elements_are_used(self):
        # This vector has a solution with more than two numbers
        assert self.twoSum( nums=[7,6,1], target=14)  == []

    def test_finds_one_of_multiple_solutions(self):
        result = self.twoSum( nums=[1,2,3,4], target=5)
        result.sort()
        assert result in [[0,3], [1,2]]

    def test_very_long_inputs_succeed(self):
        target = 16000
        nums = [ x * 2 for x in range(1, target // 2)]
        res = self.twoSum(nums=nums, target=target)
        res_sum = nums[res[0]] + nums[res[1]]
        assert res_sum == target

Solution 1: Simple, testable, \(O(n**2)\)

A straight forward solution to this problem is to iterate through all permutations of 2 different elements of the nums. Since the indices are needed as a result, the permutations are generated over the indices 0..len(nums).

from typing import List

def two_element_permutations(count):
    """
    Create the two element permutations of the first count numbers (0..count-1).

    - duplicates, e.g. (1, 1) are not emitted
    - the results are unique regardles of order.
      E.g. if (a, b) has already been returned, (b, a) is not returned.
    """
    # range goes from a..b-1.
    for a in range(0, count - 1):
        for b in range(a + 1, count - 0):
            yield (a, b)

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        for (a,b) in two_element_permutations(len(nums)):
            if nums[a] + nums[b] == target: return [a, b]
        return []

Test Results

In addition to the functional tests, the generation of the permutations can be tested.

The tests for the permutation only test the first three cases: 0..3 elements because the cases 4.. are just case 3, only longer.

import os
import pytest

from solution1a import two_element_permutations

def test_list_of_none_is_empty():
    assert [ perm for perm in two_element_permutations(0) ] == []

def test_list_of_one_is_empty():
    assert [ perm for perm in two_element_permutations(1) ] == []

def test_list_of_two():
    assert [ perm for perm in two_element_permutations(2) ] == [(0,1)]

def test_list_of_three():
    assert [ perm for perm in two_element_permutations(3) ] == [(0,1), (0,2), (1,2)]

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/001-two_sum, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 13 items

test_1a-solution.py .........                                            [ 69%]
test_1a-permutations.py ....                                             [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
========================== 13 passed in 0.05 seconds ===========================


Tests ran in 0.19362594500125851 seconds

Solution 1b: Simple, slightly faster, \(O(n**2)\)

The first solution is very decomposed and each part is testable. Unfortunately yield is too slow for leetcode: with a large dataset a timeout is reached.

A quick fix is to inline the generator:

from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        count = len(nums)
        for a in range(0, count - 1):
            for b in range(a + 1, count - 0):
                if nums[a] + nums[b] == target: return [a, b]
        return []

Test Results

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/001-two_sum, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 9 items

test_1b-solution.py .........                                            [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 9 passed in 0.02 seconds ===========================


Tests ran in 0.16958735800290015 seconds

Solution 2a: Using a dictionary: \(O(n)\)

The first two solutions are easy to understand but they are inefficient for larger n (longer nums lists).

Fortunately the problem description gives some constraints that can be leveraged:

  • The result is exactly two values
  • Given a number a it is trivial to calculate b such that a + b = target

If there would be a way to quickly (meaning: better than \(O(n)\)) find out if a given value is contained in the nums list then the whole algorithm can be greatly improved.

A set is a data structure that provides \(O(1)\) tests for membership. A dictionary (map) also allows to associate a value to the key.

The new algorithm creates a mapping from \(num \mapsto index\) (index_by_value) with \(O(1)\) lookup. The nums list is iterated. For each number a the needed b is calculated. If b is found in the dictionary (and is at a different position than a) a valid result has been found.

If there exist an \(a,b\) with \(a + b = target \land index_a \neq index_b\) it will be found, because the algorithm searches the whole nums list and tests each element a \(\in\) nums for a matching b \(\in\) nums.

One edge case needs special consideration: a == b and a + b = target with \(index_a \lt index_b\).

index_by_value stores \(a \mapsto index_a\)
In this case the found indices would be [index_b, index_a].
index_by_value stores \(a \mapsto index_b\)
In this case the found indices would be [index_a, index_b].

In the twoSum([7,7], 14) example the dictionary would look like this: { 7 => 1}. The iteration would test the first 7 as a (a_index = 0), and calculate b as target - a = b = 7. A lookup in the dictionary would return 1 as b_index. The

from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        index_by_value = {k:v for (k,v) in zip(nums, range(0, len(nums)))}
        a_index  = 0
        for a in nums:
           b = target - a
           if b in index_by_value:
               b_index = index_by_value[b]
               # Requirement 4: no duplicate indices
               if a_index != b_index:
                  return [a_index, b_index]
           a_index += 1
        return []

Test Results

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/001-two_sum, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 9 items

test_2a-solution.py .........                                            [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 9 passed in 0.04 seconds ===========================


Tests ran in 0.1822243479982717 seconds

Solution 2b: Using a dictionary: \(O(n)\)

Solution 2a can be improved by filling the dictionary step by step.

  • In 2a the dictionary is pre-filled. This takes time and consumes memory.
  • In 2b the dictionary only collects the numbers already visited. This restricts the algorithm to only look back, whereas in 2a it looked at the whole set. The algorithm is still correct because for any valid pair \(a, b\) with \(index_a < index_b\) it finds \(a\) when testing at \(index_b\).

from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        index_by_value = dict()
        a_index  = 0
        for a in nums:
           b = target - a
           if b in index_by_value:
               b_index = index_by_value[b]
               # Requirement 4: no duplicate indices
               if a_index != b_index:
                  return [a_index, b_index]
           else:
               index_by_value[a] = a_index
           a_index += 1
        return []

Test Results

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/001-two_sum, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 9 items

test_2b-solution.py .........                                            [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 9 passed in 0.04 seconds ===========================


Tests ran in 0.1827685319876764 seconds

Summary

All solutions work (except #1 which leads to a timeout for large data sets).

Solution Correct Status Runtime Memory Language
#1, \(O(n^2)\), with generator Yes Submission Time Limit Exceeded Time Limit Exceeded Python3
#2, \(O(n^2)\), tight loop Yes Submission 5988 ms, faster than 7.73% 13.7 MB, less than 79.07% Python3
#3, \(O(n)\), dictionary Yes Submission 56 ms, faster than 74.89% 14.4 MB, less than 35.11% Python3
#4, \(O(n)\), optimized dictionary Yes Submission 44 ms, faster than 97.63% 14.3 MB, less than 52.56% Python3

Recent posts

See more

Categories

About

Information security and software development are enablers for a successful company. During the last 20+ years of my working life I have always been convinced that outstanding success is only possible if everybody works together to find and reach a common goal.