Skip to main content
  1. posts/

LeetCode #8 : String to Integer (atoi)

·3964 words·19 mins
Coding Leetcode Python Profiling

Preface
#

This problem is quite simple and does not allow for many algorithmic improvements. In fact it would be difficult to come up with a non-optimal (bigger than \(O(n)\)) solution. This makes this problem a good candidate for benchmarking.

I used the following tools:

CAVE: All tests and benchmarks are run while exporting this page to HTML. Since performance tests are not reliably reproducible numbers inside the text (… took \(X {\mu}s\)) are from previous runs and can be slightly different from the actual numbers in the benchmark results.

Without much further ado …

Description
#

Implement atoi which converts a string to an integer.

The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.

The string can contain additional characters after those that form the integral number, which are ignored and have no effect on the behavior of this function.

If the first sequence of non-whitespace characters in str is not a valid integral number, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed.

If no valid conversion could be performed, a zero value is returned.

Note:

  • Only the space character ’ ’ is considered as whitespace character.
  • Assume we are dealing with an environment which could only store integers within the 32-bit signed integer range: [−2^31, 2^31 − 1]. If the numerical value is out of the range of representable values, INT_MAX (2^31 − 1) or INT_MIN (−2^31) is returned.

Example 1:

Input: "42"
Output: 42

Example 2:

Input: "   -42"
Output: -42

Explanation: The first non-whitespace character is ‘-’, which is the minus sign. Then take as many numerical digits as possible, which gets 42.

Example 3:

Input: "4193 with words"
Output: 4193

Explanation: Conversion stops at digit ‘3’ as the next character is not a numerical digit.

Example 4:

Input: "words and 987"
Output: 0

Explanation: The first non-whitespace character is ‘w’, which is not a numerical digit or a +/- sign. Therefore no valid conversion could be performed.

Example 5:

Input: "-91283472332"
Output: -2147483648

Explanation: The number “-91283472332” is out of the range of a 32-bit signed integer. Thefore INT_MIN (−2^31) is returned.

(From the problem description)

Solutions
#

The problem itself is quite easy to solve. What bit me on the first run was that I did not read the problem description thouroughly:

[…] Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value. […]

I forgot to implement the leading ‘+’. C’est la vie …

The first working version was split into two steps: detect the number and calculate the value. Since this is not the most optimal solution (one loop for detecting the number, one for calculating the result) I only got to beat ~29% of the other implementations speed wise. Merging the value calculation brought me to “better than 66%”. Good, but not perfect. Pushing to use more python build-ins (lstrip(), isdigit()), and caching the result of len(string) got me faster than 93%. Good enough.

Problem API
#

class Solution:
    def myAtoi(self, string: str) -> int:

Tests
#

Testing is split into functional and performance testing.

import os
import pytest


class BaseTestClass:
    def test_positive_numbers_are_detected(self):
        assert 42 == self.myAtoi("42")

    def test_positive_numbers_with_plus_are_detected(self):
        assert 42 == self.myAtoi("+42")

    def test_positive_numbers_with_leading_zeroes_are_detected(self):
        assert 42 == self.myAtoi("00000000000042")

    def test_negative_numbers_are_detected(self):
        assert -42 == self.myAtoi("-42")

    def test_numbers_with_words_after_are_detected(self):
        assert 4193 == self.myAtoi("4193 with words")

    def test_words_before_yield_zero(self):
        assert 0 == self.myAtoi("words and 987")

    def test_caps_at_MIN_INT(self):
        assert -2147483648 == self.myAtoi("-91283472332")

    def test_caps_at_MAX_INT(self):
        assert 2147483647 == self.myAtoi("91283472332")

    def test_finds_above_at_MIN_INT(self):
        assert -2147483647 == self.myAtoi("-2147483647")

    def test_finds_below_MAX_INT(self):
        assert 2147483646 == self.myAtoi("2147483646")


    def test_just_text_fails(self):
        assert 0 == self.myAtoi("x")

    def test_just_whitespace_fails(self):
        assert 0 == self.myAtoi("  ")

    def test_just_plus_fails(self):
        assert 0 == self.myAtoi("+")

    def test_just_minus_fails(self):
        assert 0 == self.myAtoi("-")

    def test_leading_whitespaces_are_discarded(self):
        assert 42 == self.myAtoi(" 42"), "one whitespace"
        assert 42 == self.myAtoi("  42"), "two whitespaces"

The performance tests use pytest-benchmark to generate candlestick charts to detect data that leads to longer runtime.

import os
import pytest


class BaseBenchmarkClass:
    def test_benchmark_small_number(self, benchmark):
        assert 42 == benchmark(self.myAtoi, string="42")

    def test_benchmark_medium_number(self, benchmark):
        assert 12345 == benchmark(self.myAtoi, string="12345")

    def test_benchmark_long_number(self, benchmark ):
        assert 2**31 == benchmark(self.myAtoi, string="4298847327493274927349723947329")

    def test_benchmark_negative_number(self, benchmark):
        assert -42 == benchmark(self.myAtoi, string="-42")

    def test_benchmark_many_leading_space(self, benchmark):
        long_string = " "*100 +"124"
        assert 124 == benchmark(self.myAtoi, string=long_string)

    def test_benchmark_detect_non_number(self, benchmark):
        assert 0 == benchmark(self.myAtoi, string="words and 987")

Solution 1: Iterative \(O(n)\)
#

class Solution:
    INT_MAX = 2147483647
    INT_MIN = -2147483648

    def myAtoi(self, string: str) -> int:
        i = 0
        # Eat whitespaces
        while i < len(string) and string[i] == ' ':
            i+=1

        if i == len(string):
            return 0

        if string[i] == '-':
            sign = -1
            i +=1
        elif string[i] == '+':
            sign = 1
            i +=1
        else:
            sign = 1

        if i == len(string):
            return 0

        msd = i
        # TODO: test if '0' <= string[i] <= '9' is faster
        val = 0
        ord_0 = 48
        while i < len(string) and (0 <= ord(string[i]) - ord_0 <= 9):
           digit = ord(string[i]) - ord_0
           if val:
               val = val * 10 + digit
           else:
               val = digit
           # Early baily out for large numbers
           if val > Solution.INT_MAX:
               break
           i += 1

        val *= sign

        if val > Solution.INT_MAX:
            return Solution.INT_MAX
        elif val < Solution.INT_MIN:
            return Solution.INT_MIN
        else:
            return val

Test Results
#

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items

test_1a-solution.py ...............                                      [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================


Tests ran in 0.18480283400276676 seconds

Analysis
#

Although the solution is correct and reasonably fast (better than 66%), the length of the input seems to have a big impact on the runtime (\(8.8 {\mu}s\) for skipping 100 spaces), \(3.2 {\mu}s\) for a 32 bit number.

Solution 2: Iterative, more built-ins \(O(n)\)
#

The algorithm itself is hardly improvable: Nothing is touched twice, all shortcuts are taken. The next step was to improve on caching constant values (len(string)), to tackle space elimination via string.lstrip(), and to use string[i].isdigit() instead of '0' <= string[i] <= '9'.

class Solution:
    INT_MAX = 2147483647
    INT_MIN = -2147483648

    def myAtoi(self, string: str) -> int:
        i = 0
        # Eat whitespaces
        string = string.lstrip()
        len_string = len(string)
        if i == len_string:
            return 0

        if string[i] == '-':
            sign = -1
            i +=1
        elif string[i] == '+':
            sign = 1
            i +=1
        else:
            sign = 1

        if i == len_string:
            return 0

        val = 0
        ord_0 = 48
        while i < len_string and string[i].isdigit():
           digit = ord(string[i]) - ord_0
           if val:
               val = val * 10 + digit
           else:
               val = digit
           # Early baily out for large numbers
           if val > Solution.INT_MAX:
               break
           i += 1

        val *= sign

        if val > Solution.INT_MAX:
            return Solution.INT_MAX
        elif val < Solution.INT_MIN:
            return Solution.INT_MIN
        else:
            return val

Test Results
#

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items

test_1b-solution.py ...............                                      [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================


Tests ran in 0.1848075570014771 seconds

Analysis
#

Using built ins improved runtime by ~22% from 36ms down to 28ms. That is good enough (while not perfect, but what is, ever?).

Profiling allows deeper insight into the performance characteristics. Let’s first start with cProfile and gprof2dot to detect high level bottlenecks for the slowest case, parsing the large number from the previous tests 5.000 times:

Not much surprisingly most of the time is spend in myAtoi. Sadly cProfile does not trace on a per line level by default. Fortunately there is a solution: line_profiler (not shown: some magic to inject @profile into the solution).

The test should benchmark the average case, not only the worst case. Since the test cases are unknown I rolled the dice and came out with 80% short numbers and 20% very long numbers.

kernprof -l -v test_1b-profile-lines.py
Wrote profile results to test_1b-profile-lines.py.lprof
Timer unit: 1e-06 s

Total time: 0.054652 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                               def myAtoi(self, string: str) -> int:
     6      5000       1241.0      0.2      2.3          i = 0
     7                                                   # Eat whitespaces
     8      5000       1462.0      0.3      2.7          string = string.lstrip()
     9      5000       1626.0      0.3      3.0          len_string = len(string)
    10      5000       1462.0      0.3      2.7          if i == len_string:
    11                                                       return 0
    12
    13      5000       1637.0      0.3      3.0          if string[i] == '-':
    14                                                       sign = -1
    15                                                       i +=1
    16      5000       1434.0      0.3      2.6          elif string[i] == '+':
    17                                                       sign = 1
    18                                                       i +=1
    19                                                   else:
    20      5000       1387.0      0.3      2.5              sign = 1
    21
    22      5000       1309.0      0.3      2.4          if i == len_string:
    23                                                       return 0
    24
    25      5000       1332.0      0.3      2.4          val = 0
    26      5000       1263.0      0.3      2.3          ord_0 = 48
    27     22000       7844.0      0.4     14.4          while i < len_string and string[i].isdigit():
    28     18000       6183.0      0.3     11.3             digit = ord(string[i]) - ord_0
    29     18000       4859.0      0.3      8.9             if val:
    30     13000       4059.0      0.3      7.4                 val = val * 10 + digit
    31                                                      else:
    32      5000       1359.0      0.3      2.5                 val = digit
    33                                                      # Early baily out for large numbers
    34     18000       5467.0      0.3     10.0             if val > Solution.INT_MAX:
    35      1000        250.0      0.2      0.5                 break
    36     17000       5016.0      0.3      9.2             i += 1
    37
    38      5000       1423.0      0.3      2.6          val *= sign
    39
    40      5000       1615.0      0.3      3.0          if val > Solution.INT_MAX:
    41      1000        295.0      0.3      0.5              return Solution.INT_MAX
    42      4000       1160.0      0.3      2.1          elif val < Solution.INT_MIN:
    43                                                       return Solution.INT_MIN
    44                                                   else:
    45      4000        969.0      0.2      1.8              return val

Much to my surprise the premature optimization (if val > Solution.INT_MAX: ...) is responsible for roughly 10% of the total runtime!

This leads me to the question, if there is a faster way to detect a 32 bit overflow?


@profile
def benchmark_this():
    INT_MAX = 2**31-1
    FROM = INT_MAX - 1_000
    TO = INT_MAX + 1_000 + 2

    dummy_counter = 0
    # The expexted way: use '>'
    for i in range(FROM, TO):
        if i > INT_MAX: # compare
            dummy_counter += 1
        else:
            dummy_counter -= 1


    BIT_31 = 2**31
    dummy_counter = 0
    # The C way: use a bitwise test
    for i in range(FROM, TO):
        if i & BIT_31: # compare
            dummy_counter += 1
        else:
            dummy_counter -= 1


for i in range(0,5_000):
    benchmark_this()

There is not much difference between comparison and bitwise test:

kernprof -l -v benchmark_overflow_test.py | grep '# compare'
    10  10010000    2646044.0      0.3     16.4          if i > INT_MAX: # compare
    20  10010000    2823236.0      0.3     17.5          if i & BIT_31: # compare

In fact comparing with > is faster than testing bitwise. No luck there.

Solution 3: (with a bug) Iterative, more built-ins, pre-calculate \(O(n)\)
#

Solution 2 showed that the test to bail out on an overflow takes roughly 10% of the total runtime for numbers bigger/smaller than INT_MAX/INT_MIN.

The reason for the guard condition is break out early for really big numbers (e.g. \(10^{1000}\)). A best effort approach is good enough there.

The main loop looks like this:

        while i < len_string and string[i].isdigit():
           # calculate val
           val = ...
           if val > Solution.INT_MAX:
               break

The longest 31 bit number is \(\lceil log_{10}(2^{31}) \rceil = 10\) digits in base 10.

This lets us pull the bail out condition before the loop:

        # 10^10 > 2^31. Any 10 digit number is bigger than 2^31
        # Add 1 because the loop stops at len_string-1
        len_string = min(len_string, 10+1)

        while i < len_string and string[i].isdigit():
           # calculate val
           val = ...

Remark: This will not work for numbers with leading zeroes, e.g. “0001”.

Test Results
#

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items

test_1c-solution.py ..F............                                      [100%]

=================================== FAILURES ===================================
________ Test1c.test_positive_numbers_with_leading_zeroes_are_detected _________

self = <test_1c-solution.Test1c object at 0x7f04cbaad0d0>

    def test_positive_numbers_with_leading_zeroes_are_detected(self):
>       assert 42 == self.myAtoi("00000000000042")
E       AssertionError: assert 42 == 0
E        +  where 0 = <bound method Test1c.myAtoi of <test_1c-solution.Test1c object at 0x7f04cbaad0d0>>('00000000000042')
E        +    where <bound method Test1c.myAtoi of <test_1c-solution.Test1c object at 0x7f04cbaad0d0>> = <test_1c-solution.Test1c object at 0x7f04cbaad0d0>.myAtoi

test_solution.py:13: AssertionError
============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
===================== 1 failed, 14 passed in 0.05 seconds ======================


Tests ran in 0.21932784400996752 seconds

Analysis
#

The new optimization got worse results than the second solution. That was unexpected!

  • Parsing long numbers was slightly slower (\(~2.6 {\mu}s\) vs. \(~2.9 {\mu}s\)). Intuitively this is not completely unexpected since we now use a more coarse comparison (\(10^{10}\) in the looping condition vs. \(2^{31}\) before the looping condition)
  • Parsing short (2 digit) numbers also got worse(\(~1 {\mu}s\) vs. \(~1.1 {\mu}s\)). This is unexpected because in essence we just moved one comparison from within the loop outside of the loop.

Let’s first validate the assumption that large numbers that will get capped are slower because the loop runs more often (keep in mind that the parser ran 5.000 times to smooth out the results):

Wrote profile results to test_1bc-profile-lines-long.py.lprof
Timer unit: 1e-06 s

Total time: 0.12127 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                               def myAtoi(self, string: str) -> int:
     6      5000       1443.0      0.3      1.2          i = 0
     7                                                   # Eat whitespaces
     8      5000       1707.0      0.3      1.4          string = string.lstrip()
     9      5000       1708.0      0.3      1.4          len_string = len(string)
    10      5000       1719.0      0.3      1.4          if i == len_string:
    11                                                       return 0
    12
    13      5000       1678.0      0.3      1.4          if string[i] == '-':
    14                                                       sign = -1
    15                                                       i +=1
    16      5000       1602.0      0.3      1.3          elif string[i] == '+':
    17                                                       sign = 1
    18                                                       i +=1
    19                                                   else:
    20      5000       1465.0      0.3      1.2              sign = 1
    21
    22      5000       1501.0      0.3      1.2          if i == len_string:
    23                                                       return 0
    24
    25      5000       1459.0      0.3      1.2          val = 0
    26      5000       1397.0      0.3      1.2          ord_0 = 48
    27     50000      18845.0      0.4     15.5          while i < len_string and string[i].isdigit():
    28     50000      18160.0      0.4     15.0             digit = ord(string[i]) - ord_0
    29     50000      14345.0      0.3     11.8             if val:
    30     45000      16059.0      0.4     13.2                 val = val * 10 + digit
    31                                                      else:
    32      5000       1422.0      0.3      1.2                 val = digit
    33                                                      # Early baily out for large numbers
    34     50000      16217.0      0.3     13.4             if val > Solution.INT_MAX:
    35      5000       1416.0      0.3      1.2                 break
    36     45000      14063.0      0.3     11.6             i += 1
    37
    38      5000       1775.0      0.4      1.5          val *= sign
    39
    40      5000       1787.0      0.4      1.5          if val > Solution.INT_MAX:
    41      5000       1502.0      0.3      1.2              return Solution.INT_MAX
    42                                                   elif val < Solution.INT_MIN:
    43                                                       return Solution.INT_MIN
    44                                                   else:
    45                                                       return val

Total time: 0.127051 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1c.py
Function: myAtoi at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                               def myAtoi(self, string: str) -> int:
     6      5000       1556.0      0.3      1.2          i = 0
     7                                                   # Eat whitespaces
     8      5000       1834.0      0.4      1.4          string = string.lstrip()
     9      5000       1901.0      0.4      1.5          len_string = len(string)
    10      5000       1828.0      0.4      1.4          if i == len_string:
    11                                                       return 0
    12
    13      5000       1832.0      0.4      1.4          if string[i] == '-':
    14                                                       sign = -1
    15                                                       i +=1
    16      5000       1724.0      0.3      1.4          elif string[i] == '+':
    17                                                       sign = 1
    18                                                       i +=1
    19                                                   else:
    20      5000       1590.0      0.3      1.3              sign = 1
    21
    22      5000       1668.0      0.3      1.3          if i == len_string:
    23                                                       return 0
    24
    25      5000       1587.0      0.3      1.2          val = 0
    26      5000       1562.0      0.3      1.2          ord_0 = 48
    27
    28                                                   # 10^10 > 2^31. Any 10 digit number is bigger than 2^31
    29                                                   # Add 1 because the loop stops at len_string-1
    30      5000       2200.0      0.4      1.7          len_string = min(len_string, 10+1)
    31
    32     60000      24514.0      0.4     19.3          while i < len_string and string[i].isdigit():
    33     55000      21510.0      0.4     16.9             digit = ord(string[i]) - ord_0
    34     55000      17489.0      0.3     13.8             if val:
    35     50000      19293.0      0.4     15.2                 val = val * 10 + digit
    36                                                      else:
    37      5000       1560.0      0.3      1.2                 val = digit
    38     55000      17927.0      0.3     14.1             i += 1
    39
    40      5000       1890.0      0.4      1.5          val *= sign
    41
    42      5000       1811.0      0.4      1.4          if val > Solution.INT_MAX:
    43      5000       1775.0      0.4      1.4              return Solution.INT_MAX
    44                                                   elif val < Solution.INT_MIN:
    45                                                       return Solution.INT_MIN
    46                                                   else:
    47                                                       return val

We can clearly see that the second (“optimized”) version loops more often than the “non-optimized” version, explaining the discrepancy.

Let us now compare the performance for parsing short numbers (“42”):

Wrote profile results to test_1bc-profile-lines-short.py.lprof
Timer unit: 1e-06 s

Total time: 0.045616 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1b.py
Function: myAtoi at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                               def myAtoi(self, string: str) -> int:
     6      5000       1428.0      0.3      3.1          i = 0
     7                                                   # Eat whitespaces
     8      5000       1766.0      0.4      3.9          string = string.lstrip()
     9      5000       1817.0      0.4      4.0          len_string = len(string)
    10      5000       1735.0      0.3      3.8          if i == len_string:
    11                                                       return 0
    12
    13      5000       1748.0      0.3      3.8          if string[i] == '-':
    14                                                       sign = -1
    15                                                       i +=1
    16      5000       1703.0      0.3      3.7          elif string[i] == '+':
    17                                                       sign = 1
    18                                                       i +=1
    19                                                   else:
    20      5000       1477.0      0.3      3.2              sign = 1
    21
    22      5000       1534.0      0.3      3.4          if i == len_string:
    23                                                       return 0
    24
    25      5000       1484.0      0.3      3.3          val = 0
    26      5000       1487.0      0.3      3.3          ord_0 = 48
    27     15000       5922.0      0.4     13.0          while i < len_string and string[i].isdigit():
    28     10000       3799.0      0.4      8.3             digit = ord(string[i]) - ord_0
    29     10000       3157.0      0.3      6.9             if val:
    30      5000       1729.0      0.3      3.8                 val = val * 10 + digit
    31                                                      else:
    32      5000       1479.0      0.3      3.2                 val = digit
    33                                                      # Early baily out for large numbers
    34     10000       3380.0      0.3      7.4             if val > Solution.INT_MAX:
    35                                                          break
    36     10000       3312.0      0.3      7.3             i += 1
    37
    38      5000       1659.0      0.3      3.6          val *= sign
    39
    40      5000       1822.0      0.4      4.0          if val > Solution.INT_MAX:
    41                                                       return Solution.INT_MAX
    42      5000       1783.0      0.4      3.9          elif val < Solution.INT_MIN:
    43                                                       return Solution.INT_MIN
    44                                                   else:
    45      5000       1395.0      0.3      3.1              return val

Total time: 0.046275 s
File: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi/solution1c.py
Function: myAtoi at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                               def myAtoi(self, string: str) -> int:
     6      5000       1565.0      0.3      3.4          i = 0
     7                                                   # Eat whitespaces
     8      5000       1739.0      0.3      3.8          string = string.lstrip()
     9      5000       1942.0      0.4      4.2          len_string = len(string)
    10      5000       1897.0      0.4      4.1          if i == len_string:
    11                                                       return 0
    12
    13      5000       1745.0      0.3      3.8          if string[i] == '-':
    14                                                       sign = -1
    15                                                       i +=1
    16      5000       1751.0      0.4      3.8          elif string[i] == '+':
    17                                                       sign = 1
    18                                                       i +=1
    19                                                   else:
    20      5000       1655.0      0.3      3.6              sign = 1
    21
    22      5000       1518.0      0.3      3.3          if i == len_string:
    23                                                       return 0
    24
    25      5000       1587.0      0.3      3.4          val = 0
    26      5000       1561.0      0.3      3.4          ord_0 = 48
    27
    28                                                   # 10^10 > 2^31. Any 10 digit number is bigger than 2^31
    29                                                   # Add 1 because the loop stops at len_string-1
    30      5000       2159.0      0.4      4.7          len_string = min(len_string, 10+1)
    31
    32     15000       6227.0      0.4     13.5          while i < len_string and string[i].isdigit():
    33     10000       4028.0      0.4      8.7             digit = ord(string[i]) - ord_0
    34     10000       3353.0      0.3      7.2             if val:
    35      5000       1808.0      0.4      3.9                 val = val * 10 + digit
    36                                                      else:
    37      5000       1483.0      0.3      3.2                 val = digit
    38     10000       3471.0      0.3      7.5             i += 1
    39
    40      5000       1678.0      0.3      3.6          val *= sign
    41
    42      5000       1770.0      0.4      3.8          if val > Solution.INT_MAX:
    43                                                       return Solution.INT_MAX
    44      5000       1830.0      0.4      4.0          elif val < Solution.INT_MIN:
    45                                                       return Solution.INT_MIN
    46                                                   else:
    47      5000       1508.0      0.3      3.3              return val

In multiple runs the line-by-line profiler the optimized version sometimes even ran faster. So not much help there.

Solution 4: Unroll first loop \(O(n)\)
#

Moving the “bail out” before the loop did not help and introduced a bug. Next try: Move the if condition that tests for a zero val before the loop.

In essence transform this:

        # Original version
        while i < len_string and string[i].isdigit():
           digit = ord(string[i]) - ord_0
           # Move this before the loop
           if val:
               val = val * 10 + digit
           else:
               val = digit
           i += 1

into this:

        # Partially unroll the loop
        if not string[i].isdigit():
            return 0

        val =  ord(string[i]) - ord_0
        i += 1

        while i < len_string and string[i].isdigit():
           digit = ord(string[i]) - ord_0
           val = val * 10 + digit

           # Early baily out for large numbers
           if val > Solution.INT_MAX:
               break
           i += 1

Test Results
#

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items

test_1d-solution.py ...............                                      [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
========================== 15 passed in 0.03 seconds ===========================


Tests ran in 0.1924969839892583 seconds

Analysis
#

Although the if was moved before the loop, the solution ran slower. Let’s try something else.

Solution 5: NO premature optimizations \(O(n)\)
#

It seems that all premature optimizations made it only worse.

class Solution:
    INT_MAX = 2147483647
    INT_MIN = -2147483648

    def myAtoi(self, string: str) -> int:
        i = 0
        # Eat whitespaces
        string = string.lstrip()
        len_string = len(string)
        if i == len_string:
            return 0

        if string[i] == '-':
            sign = -1
            i +=1
        elif string[i] == '+':
            sign = 1
            i +=1
        else:
            sign = 1

        if i == len_string:
            return 0

        val =  0
        ord_0 = 48

        while i < len_string and string[i].isdigit():
           digit = ord(string[i]) - ord_0
           # The "if val:" check is removed
           val = val * 10 + digit

           # Early baily out for large numbers
           if val > Solution.INT_MAX:
               break
           i += 1

        val *= sign

        if val > Solution.INT_MAX:
            return Solution.INT_MAX
        elif val < Solution.INT_MIN:
            return Solution.INT_MIN
        else:
            return val

Test Results
#

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/008-string-to-integer-atoi, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 15 items

test_1e-solution.py ...............                                      [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
========================== 15 passed in 0.04 seconds ===========================


Tests ran in 0.19208616401010659 seconds

Analysis
#

Who would have thought it? Ditching the last premature optimization did not worsen the execution time but made the code more readable.

Summary
#

Solution Correct Status Runtime Memory Language
#1, parse left to right No Submission Forgot to parse leading ‘+’ Python3
#1, parse left to right Yes Submission 44 ms, faster than 29.9% 12.8 MB, less than 100% Python3
#1, parse right to left Yes Submission Runtime: 36 ms, faster than 66.28% 12.8 MB, less than 100% Python3
#2, use more built ins Yes Submission Runtime: 28 ms, faster than 93.82% 12.7 MB, less than 100% Python3
#5, use more built ins, no optimizations Yes Submission Runtime: 28 ms, faster than 93.43% 12.7 MB, less than 100% Python3