Jens Neuhalfen

This & That

Random Ramblings about Life, the Universe, and Everything

LeetCode #23 : Merge k Sorted Lists

$O(n * log(k))$ solution in python3: faster than 83.12%

[Jens Neuhalfen]

8 minutes read

Description

Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Example:

Input:
[
  1->4->5,
  1->3->4,
  2->6
]
Output: 1->1->2->3->4->4->5->6

(From the problem description)

Test Vectors

It is not said in the problem statement, but the implementation bases on the following assumptions

Ordering
All lists are ordered ascending and have comparable types.
Empty lists
Any list can be empty (None) but every node as a non-empty value.
nums solution Comment
[ [1,4,5], [1,3,4], [2,6] ] [1,1,2,3,4,4,5,6]
[] [] \(k = 0\) is valid
[ [1]] [1]
[ [], [] ] [] all empty lists is a valid input
[ [100], …, [1] ] [1..100]

Solutions

Problem API

# Definition for singly-linked list.
# class ListNode:
#     def __init__(self, x):
#         self.val = x
#         self.next = None
class Solution:
    def mergeKLists(self, lists: List[ListNode]) -> ListNode:

For debugging purposes it would be nice to print the whole list, so the ListNode class gets a minor upgrade:

from typing import List
class ListNode:

    __slots__ = ['val', 'next']

    def __init__(self, x):
        self.val = x
        self.next = None

    def __str__(self):
        if self.next:
            return f"{self.val}->{self.next}"
        else:
            return f"{self.val}->END"

    def __repr__(self):
        return self.__str__()

    def has_next(self):
        return self.next is not None

    def has_value(self):
        return self.val is not None

# --- helper methods for testing ---
def to_node(digits) -> ListNode:
    if not digits: return None
    # copy that .. we will change it
    digits = list(digits)
    head_node = ListNode(digits.pop(0))
    n = head_node
    while digits:
            n.next = ListNode(digits.pop(0))
            n = n.next
    return head_node

def to_nodes(lists) -> List[ListNode]:
    return [ to_node(sl) for sl in list(lists) ]

def from_node(head : ListNode) -> List:
    r = []
    while head:
        r.append(head.val)
        head = head.next
    return r

Tests

The test vectors are implemented by the following tests:

import os
import pytest

from typing import List

from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node

class BaseTestClass:
    def test_test_vector_from_problem_ok(self):
        lists = to_nodes([[1,4,5], [1,3,4], [2,6]])

        ret = self.mergeKLists(lists)

        assert [1,1,2,3,4,4,5,6] == from_node(ret)

    def test_k_equals_0_ok(self):
        assert [] == from_node(self.mergeKLists([]))

    def test_k_equals_1_ok(self):
        assert [1] == from_node(self.mergeKLists(to_nodes([[1]])))

    def test_empty_lists_ok(self):
        assert [] == from_node(self.mergeKLists(to_nodes([[], []])))

import os
import pytest

from typing import List

from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node

class BaseBenchmarkClass:

    def test_large_k_ok(self, benchmark):
        LARGE_NUM=10_000
        nodes = to_nodes([ [x] for x in reversed(range(0,LARGE_NUM))  ])

        assert list(range(0,LARGE_NUM)) == from_node(benchmark(self.mergeKLists, lists=nodes))

    def test_large_k_and_n_ok(self, benchmark):
        LARGE_NUM=500

        sublist = lambda seed : [ x * seed for x in range(0, LARGE_NUM)]
        matrix = [ sublist(x) for x in range(0,LARGE_NUM)]

        matrix_nodes = to_nodes(matrix)
        head = benchmark(self.mergeKLists, lists=matrix_nodes)

        count = 0
        while head:
            count += 1
            if head.next:
                assert head.val <= head.next.val
            head = head.next
        assert LARGE_NUM * LARGE_NUM == count

It would be bad if the test tooling has bugs, so these are also tested:

from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node

def test_to_node():
    """ Make sure that the test tooling works. """
    head = to_node([1])
    assert 1 == head.val
    assert None == head.next

    head = to_node([1,2])
    assert 1 == head.val
    assert 2 == head.next.val
    assert None == head.next.next

def test_from_node():
    """ Make sure that the test tooling works. """
    n = ListNode(1)
    assert [1] == from_node(n)
    n.next = ListNode(2)
    assert [1,2] == from_node(n)

def test_transformer():
    """ Make sure that the test tooling works. """
    for test_value in [ [1], [1,2], [1,2,3] ]:
        head = to_node(test_value)
        assert from_node(head) == test_value

def test_to_nodes():
    """ Make sure that the test tooling works. """
    two_nodes = to_nodes([[1,2], [3,4]])
    assert 2 == len(two_nodes)
    assert 1 == two_nodes[0].val
    assert 2 == two_nodes[0].next.val

    assert 3 == two_nodes[1].val
    assert 4 == two_nodes[1].next.val
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items

test_supplemental.py ....                                                [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 4 passed in 0.02 seconds ===========================


Tests ran in 0.16634120400703978 seconds

Solution 1: Nested Loops \(O( k * n_{max})\)

This is a very straight forward solution.

Since all lists are guaranteed to be ordered the implementation looks at each of the \(k\) lists to find the smallest element and appends it to the result. It does so until all lists are empty, at most \(n_{max}\) times.

from typing import List
from supplemental import ListNode

class Solution:
    """
    This is a very simple solution to the problem. In more complex environments (applications)
    this implementation has some drawbacks because all passed in data is modified.

    This could be solved by creating a copy of 'lists'. But leetcode has runtime constraints.
    """
    def mergeKLists(self, lists: List[ListNode]) -> ListNode:
        head = None
        tail = None

        # Done := the last element of the longest list has been processed -> no more elmenents
        # to process
        done = False
        while not done:
            # Python int is unbounded, so the MAX_INT trick does not work
            # Also: It should work on non integers
            lowest_list_value = None
            lowest_list_idx = None

            for n in range(0, len(lists)):
                l = lists[n]
                # an empty list (EOFed) is None
                if l and not l.val is None:
                    if lowest_list_value is None or l.val < lowest_list_value:
                        lowest_list_value = l.val
                        lowest_list_idx = n

            # IF no element has been found then
            # all lists are empty and we are done
            done = lowest_list_idx is None

            if not done:
                # The lowest value has been found.
                # store it
                if head is None:
                    head = ListNode(lowest_list_value)
                    tail = head
                else:
                    tail.next = ListNode(lowest_list_value)
                    tail = tail.next

                # the list where the lowest element has been found
                # 'eat' the found element
                lists[lowest_list_idx] = lists[lowest_list_idx].next

        return head

Test Results

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items

test_1a-solution.py ....                                                 [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 4 passed in 0.04 seconds ===========================


Tests ran in 0.18874403899826575 seconds

Benchmark:

Analysis

Since all lists are guaranteed to be ordered the implementation looks at each of the \(k\) lists to find the smallest element and appends it to the result. It does so until all lists are empty, at most \(n_{max}\) times.

  • Let \(k\) be the number of items passed in the lists parameter.
  • Let \(n_{max}\) be the longest list from lists
  • Let the total number of all elements be constant for Best/Avg/Worst case calculation.

The analysis follows the algorithms structure:

Best Case
The best case runtime is linear for \(k=1\) or \(n=1\): \(\Omega(max(k, n_{max}))\). To be more precise: The theoretic lower bound for this task is \(\Omega(e)\) where \(e\) is the number of elements in lists.
Average Case
The overall runtime is quadratic, as can be seen from the description of the algorithm (two nested loops): \(O(k*n_{max})\).
Worst Case
The worst case behavior shows itself when the outer (\(k\)) and inner (\(n\)) loops have the same length (the maximum of \(n*k\) with \(n+k\) is at \(n=k\)). This is still \(O(n*k)\).
Edge Cases
These edge cases can be considered as well:
  • \(k=0\) is trivial (\(O(1)\)).
  • \(n_{max}=0\) is also trivial (\(O(k)\)).

Solution 2: Heap based \(O(n * log k)\)

The first solution was not fast enough. But maybe there are some optimizations left?

Code optimizations
In python iterating over lists is expensive. Pushing some tasks into C code can substantially speed up the code. But this should be done last.
Algorithmic optimizations
The algorithm in the first solution is not to ideal when \(k \approx n\) because of the quadratic runtime. The expensive part is finding the next smallest element, which always is done \(n\) times. Reducing \(n\) to e.g. \(log n\) is not possible because the data structure forces a sequential walk through the lists with \(O(n)\). But finding the next smallest element can be improved. The first solution takes \(O(k)\). This can be improved to \(O(log k)\) by using e.g. a heap.

from typing import List
from supplemental import ListNode
import heapq

class Solution:
    """
    This solution uses a heap to find the next smallest element.
    """
    def mergeKLists(self, lists: List[ListNode]) -> ListNode:
        head = None
        tail = None

        # we do not want to modify lists.
        heads = list(lists)

        # Heap of tupels (val, index into hedas to the NodeList where val came from).
        # val is needed so that the elements can be sorted
        # NodeList (i) is needed so we can advance
        heap = []
        for i in range(0, len(heads)):
            l = heads[i]
            if l and l.val is not None:
                heapq.heappush(heap, (l.val, i))
                found_item = True

        # Done := all elements have been added to the heap
        while len(heap) > 0:
            val,index = heapq.heappop(heap)

            # advance the list of the found node.
            next_node = heads[index].next
            heads[index] = next_node
            if next_node and next_node.val is not None:
                heapq.heappush(heap, (next_node.val, index))

            if head is None:
                head = ListNode(val)
                tail = head
            else:
                tail.next = ListNode(val)
                tail = tail.next

        return head

Test Results

With the following test result:

============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items

test_2a-solution.py ....                                                 [100%]

============================ slowest test durations ============================

(0.00 durations hidden.  Use -vv to show these durations.)
=========================== 4 passed in 0.02 seconds ===========================


Tests ran in 0.1691520269960165 seconds

Benchmark:

Analysis

Average Case
The overall runtime is \(O(n_{max} * log(k))\). The improvement is replacing the \(O(k)\) lookup with a heap based \(O(log(k))\) lookup.

A lot of the performance improvements stem from pushing computation into C code. The boost seen in the tests is not explainable by the algorithm itself (esp. the \(k=1\) case).

Summary

All solutions work (except #1 which leads to a timeout for large data sets).

Solution Correct Status Runtime Memory Language
#1, \(O(n*k)\) No Submission Time Limit Exceeded Time Limit Exceeded Python3
#2, \(O(n*log k)\) Yes Submission 104 ms, faster than 83.12% 16.6 MB, less than 42.42% Python3

Recent posts

See more

Categories

About

Information security and software development are enablers for a successful company. During the last 20+ years of my working life I have always been convinced that outstanding success is only possible if everybody works together to find and reach a common goal.