How To Reduce/optimize Memory Usage When Calculating Area Of Skyline?
Solution 1:
You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where R = 1
and L = 100000
. Your items
dictionary will be filled with 1000000 items. Your basic idea of processing/removing overlaps is is sound, but the way you do it is massive overkill.
Like so much else in life, this is a graph problem in disguise. Imaging the vertices being the rectangles you are trying to process and the (weighted) edges being the overlaps. The complication is that you can not just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices will be removed as well, simplifying the graph, while in another case a new vertex is added:
Normally, if we have m
rectangles and n
overlaps between them, constructing the graph would be an O(m)
operation because we would have to check all vertices for overlaps against each other. However, we can bypass a construction of the input graph entirely to get a O(m + n)
traversal algorithm, which is going to be optimal since we will only analyze each rectangle once, and construct the output graph with no overlaps as efficiently as possible. O(m + n)
assumes that your input rectangles are sorted according to their left edges in ascending order. If that is not the case, the algorithm will be O(mlog(m) + n)
to account for the initial sorting step. Note that as the graph density increases, n
will go from ~m
to ~m
. This confirms the intuitive idea that the fewer overlaps there are, them more you would expect the process will run in O(m)
time, while the more overlaps there are, the closer you will run to O(m)
time.
The space complexity of the proposed algorithm will be O(m)
: each rectangle in the input will result in at most two rectangles in the output, and 2m = O(m)
.
Enough about complexity analysis and on to the algorithm itself. The input will be a sequence of rectangles defined by L
, R
, H
as you have now. I will assume that the input is sorted by the leftmost edge L
. The output graph will be a linked list of rectangles defined by the same parameters, sorted in descending order by the rightmost edge. The head of the list will be the rightmost rectangle. The output will have no overlaps between any rectangles, so the total area of the skyline will just be the sum of H * (R - L)
for each of the ~m
output rectangles.
The reason for picking a linked list is that the only two operations we need is iteration from the head node and the cheapest insertion possible to maintain the list in sorted order. The sorting will be done as part of overlap checking, so we do not need to do any kind of binary searches through the list or anything like that.
Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each rectangle added will be checked only against the rectangles it actually overlaps. We will do overlap checking and removal as shown in the diagram above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap with the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once, and that no non-overlapping rectangles are processed unnecessarily, making the algorithm optimal.
Before I show code, here is a diagram of the algorithm in action. Red rectangles are new rectangles; note that their left edges progress to the right. Blue rectangles are ones that are already added and have overlap with the new rectangle. Black rectangles are already added and have no overlap with the new one. The numbering represents the order of the output list. It is always done from the right. A linked list is a perfect structure to maintain this progression since it allows cheap insertions and replacements:
Here is an implementation of the algorithm which assumes that the input coordinates are passed in as an iterable of objects having the attributes l
, r
, and h
. The iteration order is assumed to be sorted by the left edge. If that is not the case, apply sorted
or list.sort
to the input first:
from collections import namedtuple
# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])
classLinkedList:
__slots__ = ['value', 'next']
"""
Implements a singly-linked list with mutable nodes and an iterator.
"""def__init__(self, value=None, next=None):
self.value = value
self.next = nextdef__iter__(self):
"""
Iterate over the *nodes* in the list, starting with this one.
The `value` and `next` attribute of any node may be modified
during iteration.
"""while self:
yield self
self = self.nextdef__str__(self):
"""
Provided for inspection purposes.
Works well with `namedtuple` values.
"""return' -> '.join(repr(x.value) for x in self)
defprocess_skyline(skyline):
"""
Turns an iterable of rectangles sharing a common baseline into a
`LinkedList` of rectangles containing no overlaps.
The input is assumed to be sorted in ascending order by left edge.
Each element of the input must have the attributes `l`, r`, `h`.
The output will be sorted in descending order by right edge.
Return `None` if the input is empty.
"""defintersect(r1, r2, default=None):
"""
Return (1) a flag indicating the order of `r1` and `r2`,
(2) a linked list of between one and three non-overlapping
rectangles covering the exact same area as `r1` and `r2`,
and (3) a pointer to the last nodes (4) a pointer to the
second-to-last node, or `default` if there is only one node.
The flag is set to True if the left edge of `r2` is strictly less
than the left edge of `r1`. That would indicate that the left-most
(last) chunk of the tuple came from `r2` instead of `r1`. For the
algorithm as a whole, that means that we need to keep checking for
overlaps.
The resulting list is always returned sorted descending by the
right edge. The input rectangles will not be modified. If they are
not returned as-is, a `Rect` object will be used instead.
"""# Swap so left edge of r1 < left edge of r2if r1.l > r2.l:
r1, r2 = r2, r1
swapped = Trueelse:
swapped = Falseif r2.l >= r1.r:
# case 0: no overlap at all
last = LinkedList(r1)
s2l = result = LinkedList(r2, last)
elif r1.r < r2.r:
# case 1: simple overlapif r1.h > r2.h:
# Chop r2
r2 = Rect(r1.r, r2.r, r2.h)
else:
r1 = Rect(r1.l, r2.l, r1.h)
last = LinkedList(r1)
s2l = result = LinkedList(r2, last)
elif r1.h < r2.h:
# case 2: split into 3
r1a = Rect(r1.l, r2.l, r1.h)
r1b = Rect(r2.r, r1.r, r1.h)
last = LinkedList(r1a)
s2l = LinkedList(r2, last)
result = LinkedList(r1b, s2l)
else:
# case 3: complete containment
result = LinkedList(r1)
last = result
s2l = default
return swapped, result, last, s2l
root = LinkedList()
skyline = iter(skyline)
try:
# Add the first node as-is
root.next = LinkedList(next(skyline))
except StopIteration:
# Empty input iteratorreturnNonefor new_rect in skyline:
prev = root
for rect in root.next:
need_to_continue, replacement, last, second2last = \
intersect(rect.value, new_rect, prev)
# Replace the rectangle with the de-overlapped regions
prev.next = replacement
ifnot need_to_continue:
# Retain the remainder of the list
last.next = rect.nextbreak# Force the iterator to move on to the last node
new_rect = last.value
prev = second2last
return root.next
Computing the total area is now trivial:
skyline = [
Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None
Notice the altered order of the input parameters (h
moved to the end). The resulting area
is 29
. This matches with what I get by doing the computation by hand. You can also do
>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)
This is to be expected from the diagram of the inputs/output shown below:
As an additional verification, I added a new building, Rect(-4, 9, 1)
to the start of the list. It overlaps all the others and adds three units to area
, or a final result of 32
. processed
comes out as:
Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
Note:
While I am sure that this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.
Technically, if a new rectangle does not overlap any other rectangles, it will be checked against one rectangle it does not overlap. If that extra check was always the case, the algorithm would have an additional m - 1
comparisons to do. Fortunately, m + m + n - 1 = O(m + n)
even if we always had to check one extra rectangle (which we don't).
Solution 2:
The reason for getting MemoryError
is huge size of the dictionary being created. In the worst case, the dict can have 10^10 keys, which would end up taking all your memory. If there really is a need, shelve
is a possible solution to make use of such large dict.
Let's say there is a building with 10 0 100
and another with 20 50 150
, then that list might have info like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]
. As you come across more buildings, you can add more entries in this list. This will be O(n^2)
.
This might help you further.
Post a Comment for "How To Reduce/optimize Memory Usage When Calculating Area Of Skyline?"