## LeetCode #23 : Merge k Sorted Lists

$O(n * log(k))$ solution in python3: faster than 83.12%

## Description

Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

`Example: Input: [ 1->4->5, 1->3->4, 2->6 ] Output: 1->1->2->3->4->4->5->6`

(From the problem description)

### Test Vectors

It is not said in the problem statement, but the implementation bases on the following assumptions

- Ordering
- All lists are ordered ascending and have comparable types.
- Empty lists
- Any list can be empty (
`None`

) but every node as a non-empty value.

nums | solution | Comment |
---|---|---|

[ [1,4,5], [1,3,4], [2,6] ] | [1,1,2,3,4,4,5,6] | |

[] | [] | \(k = 0\) is valid |

[ [1]] | [1] | |

[ [], [] ] | [] | all empty lists is a valid input |

[ [100], …, [1] ] | [1..100] |

## Solutions

### Problem API

```
# Definition for singly-linked list.
# class ListNode:
# def __init__(self, x):
# self.val = x
# self.next = None
class Solution:
def mergeKLists(self, lists: List[ListNode]) -> ListNode:
```

For debugging purposes it would be nice to print the whole list, so the
`ListNode`

class gets a minor upgrade:

```
from typing import List
class ListNode:
__slots__ = ['val', 'next']
def __init__(self, x):
self.val = x
self.next = None
def __str__(self):
if self.next:
return f"{self.val}->{self.next}"
else:
return f"{self.val}->END"
def __repr__(self):
return self.__str__()
def has_next(self):
return self.next is not None
def has_value(self):
return self.val is not None
# --- helper methods for testing ---
def to_node(digits) -> ListNode:
if not digits: return None
# copy that .. we will change it
digits = list(digits)
head_node = ListNode(digits.pop(0))
n = head_node
while digits:
n.next = ListNode(digits.pop(0))
n = n.next
return head_node
def to_nodes(lists) -> List[ListNode]:
return [ to_node(sl) for sl in list(lists) ]
def from_node(head : ListNode) -> List:
r = []
while head:
r.append(head.val)
head = head.next
return r
```

### Tests

The test vectors are implemented by the following tests:

```
import os
import pytest
from typing import List
from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node
class BaseTestClass:
def test_test_vector_from_problem_ok(self):
lists = to_nodes([[1,4,5], [1,3,4], [2,6]])
ret = self.mergeKLists(lists)
assert [1,1,2,3,4,4,5,6] == from_node(ret)
def test_k_equals_0_ok(self):
assert [] == from_node(self.mergeKLists([]))
def test_k_equals_1_ok(self):
assert [1] == from_node(self.mergeKLists(to_nodes([[1]])))
def test_empty_lists_ok(self):
assert [] == from_node(self.mergeKLists(to_nodes([[], []])))
```

```
import os
import pytest
from typing import List
from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node
class BaseBenchmarkClass:
def test_large_k_ok(self, benchmark):
LARGE_NUM=10_000
nodes = to_nodes([ [x] for x in reversed(range(0,LARGE_NUM)) ])
assert list(range(0,LARGE_NUM)) == from_node(benchmark(self.mergeKLists, lists=nodes))
def test_large_k_and_n_ok(self, benchmark):
LARGE_NUM=500
sublist = lambda seed : [ x * seed for x in range(0, LARGE_NUM)]
matrix = [ sublist(x) for x in range(0,LARGE_NUM)]
matrix_nodes = to_nodes(matrix)
head = benchmark(self.mergeKLists, lists=matrix_nodes)
count = 0
while head:
count += 1
if head.next:
assert head.val <= head.next.val
head = head.next
assert LARGE_NUM * LARGE_NUM == count
```

It would be bad if the test tooling has bugs, so these are also tested:

```
from supplemental import ListNode
from supplemental import to_node, to_nodes, from_node
def test_to_node():
""" Make sure that the test tooling works. """
head = to_node([1])
assert 1 == head.val
assert None == head.next
head = to_node([1,2])
assert 1 == head.val
assert 2 == head.next.val
assert None == head.next.next
def test_from_node():
""" Make sure that the test tooling works. """
n = ListNode(1)
assert [1] == from_node(n)
n.next = ListNode(2)
assert [1,2] == from_node(n)
def test_transformer():
""" Make sure that the test tooling works. """
for test_value in [ [1], [1,2], [1,2,3] ]:
head = to_node(test_value)
assert from_node(head) == test_value
def test_to_nodes():
""" Make sure that the test tooling works. """
two_nodes = to_nodes([[1,2], [3,4]])
assert 2 == len(two_nodes)
assert 1 == two_nodes[0].val
assert 2 == two_nodes[0].next.val
assert 3 == two_nodes[1].val
assert 4 == two_nodes[1].next.val
```

```
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items
test_supplemental.py .... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
=========================== 4 passed in 0.02 seconds ===========================
Tests ran in 0.16634120400703978 seconds
```

### Solution 1: Nested Loops \(O( k * n_{max})\)

This is a very straight forward solution.

Since all lists are guaranteed to be ordered the implementation looks at each of the \(k\) lists to find the smallest element and appends it to the result. It does so until all lists are empty, at most \(n_{max}\) times.

```
from typing import List
from supplemental import ListNode
class Solution:
"""
This is a very simple solution to the problem. In more complex environments (applications)
this implementation has some drawbacks because all passed in data is modified.
This could be solved by creating a copy of 'lists'. But leetcode has runtime constraints.
"""
def mergeKLists(self, lists: List[ListNode]) -> ListNode:
head = None
tail = None
# Done := the last element of the longest list has been processed -> no more elmenents
# to process
done = False
while not done:
# Python int is unbounded, so the MAX_INT trick does not work
# Also: It should work on non integers
lowest_list_value = None
lowest_list_idx = None
for n in range(0, len(lists)):
l = lists[n]
# an empty list (EOFed) is None
if l and not l.val is None:
if lowest_list_value is None or l.val < lowest_list_value:
lowest_list_value = l.val
lowest_list_idx = n
# IF no element has been found then
# all lists are empty and we are done
done = lowest_list_idx is None
if not done:
# The lowest value has been found.
# store it
if head is None:
head = ListNode(lowest_list_value)
tail = head
else:
tail.next = ListNode(lowest_list_value)
tail = tail.next
# the list where the lowest element has been found
# 'eat' the found element
lists[lowest_list_idx] = lists[lowest_list_idx].next
return head
```

#### Test Results

With the following test result:

```
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items
test_1a-solution.py .... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
=========================== 4 passed in 0.04 seconds ===========================
Tests ran in 0.18874403899826575 seconds
```

Benchmark:

#### Analysis

Since all lists are guaranteed to be ordered the implementation looks at each of the \(k\) lists to find the smallest element and appends it to the result. It does so until all lists are empty, at most \(n_{max}\) times.

- Let \(k\) be the number of items passed in the
`lists`

parameter. - Let \(n_{max}\) be the longest list from
`lists`

- Let the total number of all elements be constant for Best/Avg/Worst case calculation.

The analysis follows the algorithms structure:

- Best Case
- The best case runtime is linear for \(k=1\) or \(n=1\):
\(\Omega(max(k, n_{max}))\). To be more precise: The theoretic lower bound for
this task is \(\Omega(e)\) where \(e\) is the number of elements in
`lists`

. - Average Case
- The overall runtime is quadratic, as can be seen from the description of the algorithm (two nested loops): \(O(k*n_{max})\).
- Worst Case
- The worst case behavior shows itself when the outer (\(k\)) and inner (\(n\)) loops have the same length (the maximum of \(n*k\) with \(n+k\) is at \(n=k\)). This is still \(O(n*k)\).
- Edge Cases
- These edge cases can be considered as well:
- \(k=0\) is trivial (\(O(1)\)).
- \(n_{max}=0\) is also trivial (\(O(k)\)).

### Solution 2: Heap based \(O(n * log k)\)

The first solution was not fast enough. But maybe there are some optimizations left?

- Code optimizations
- In python iterating over lists is expensive. Pushing some tasks into C code can substantially speed up the code. But this should be done last.
- Algorithmic optimizations
- The algorithm in the first solution is not to ideal when \(k \approx n\) because of the quadratic runtime. The expensive part is finding the next smallest element, which always is done \(n\) times. Reducing \(n\) to e.g. \(log n\) is not possible because the data structure forces a sequential walk through the lists with \(O(n)\). But finding the next smallest element can be improved. The first solution takes \(O(k)\). This can be improved to \(O(log k)\) by using e.g. a heap.

```
from typing import List
from supplemental import ListNode
import heapq
class Solution:
"""
This solution uses a heap to find the next smallest element.
"""
def mergeKLists(self, lists: List[ListNode]) -> ListNode:
head = None
tail = None
# we do not want to modify lists.
heads = list(lists)
# Heap of tupels (val, index into hedas to the NodeList where val came from).
# val is needed so that the elements can be sorted
# NodeList (i) is needed so we can advance
heap = []
for i in range(0, len(heads)):
l = heads[i]
if l and l.val is not None:
heapq.heappush(heap, (l.val, i))
found_item = True
# Done := all elements have been added to the heap
while len(heap) > 0:
val,index = heapq.heappop(heap)
# advance the list of the found node.
next_node = heads[index].next
heads[index] = next_node
if next_node and next_node.val is not None:
heapq.heappush(heap, (next_node.val, index))
if head is None:
head = ListNode(val)
tail = head
else:
tail.next = ListNode(val)
tail = tail.next
return head
```

#### Test Results

With the following test result:

```
============================= test session starts ==============================
platform linux -- Python 3.7.5, pytest-3.10.1, py-1.8.0, pluggy-0.12.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jens/Documents/leetcode.com/023-merge-k-sorted-lists, inifile:
plugins: benchmark-3.2.2, profiling-1.7.0
collected 4 items
test_2a-solution.py .... [100%]
============================ slowest test durations ============================
(0.00 durations hidden. Use -vv to show these durations.)
=========================== 4 passed in 0.02 seconds ===========================
Tests ran in 0.1691520269960165 seconds
```

Benchmark:

#### Analysis

- Average Case
- The overall runtime is \(O(n_{max} * log(k))\). The improvement is replacing the \(O(k)\) lookup with a heap based \(O(log(k))\) lookup.

A lot of the performance improvements stem from pushing computation into C code. The boost seen in the tests is not explainable by the algorithm itself (esp. the \(k=1\) case).

## Summary

All solutions work (except #1 which leads to a timeout for large data sets).

Solution | Correct | Status | Runtime | Memory | Language |
---|---|---|---|---|---|

#1, \(O(n*k)\) | No | Submission | Python3 | ||

#2, \(O(n*log k)\) | Yes | Submission | 104 ms, faster than 83.12% | 16.6 MB, less than 42.42% | Python3 |