How To Reduce/optimize Memory Usage When Calculating Area Of Skyline?

September 16, 2024

I'm trying to calculate the area of a skyline (overlapping rectangles with the same baseline):

building_count = int(input())
items = {}  # dictionary, location on x axis is the key, height

Solution 1:

You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where L = 1 and R = 100000. Your items dictionary will be filled with roughly 100000 items for that one building. Your basic idea of processing/removing overlaps is sound, but the way you do it is massive overkill.
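To see the scale of the problem, here is a minimal sketch of a per-integer dictionary. It is not the asker's actual code (which is cut off above); the function name `per_integer_heights` and the `(l, r, h)` tuple layout are assumptions made for illustration only.

```python
def per_integer_heights(buildings):
    """Store the tallest height seen at every integer x (one dict entry per x)."""
    items = {}
    for l, r, h in buildings:
        for x in range(l, r):
            items[x] = max(items.get(x, 0), h)
    return items

buildings = [(1, 100000, 5)]  # a single building, given as (l, r, h)
items = per_integer_heights(buildings)
print(len(buildings), len(items))  # 1 building vs. 99999 dictionary entries
```

The dictionary grows with the width of the buildings rather than with their number, which is why storing one record per rectangle is so much cheaper.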

Like so much else in life, this is a graph problem in disguise. Imagine the vertices being the rectangles you are trying to process and the (weighted) edges being the overlaps. The complication is that you cannot just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices will be removed as well, simplifying the graph, while in another case a new vertex is added:

Normally, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m^2) operation because we would have to check all vertices for overlaps against each other. However, we can bypass the construction of the input graph entirely to get an O(m + n) traversal algorithm, which is going to be optimal since we will only analyze each rectangle once, and construct the output graph with no overlaps as efficiently as possible. O(m + n) assumes that your input rectangles are sorted according to their left edges in ascending order. If that is not the case, the algorithm will be O(m log(m) + n) to account for the initial sorting step. Note that as the graph density increases, n will go from ~m to ~m^2. This confirms the intuitive idea that the fewer overlaps there are, the closer the process will run to O(m) time, while the more overlaps there are, the closer it will run to O(m^2) time.
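To make m and n concrete, here is a minimal sketch of the naive pairwise check that building the graph explicitly would require. The name `count_overlaps` and the `(l, r, h)` tuple layout are assumptions for illustration; the sketch only shows where the O(m^2) cost comes from and is not part of the algorithm described here.

```python
from itertools import combinations

def count_overlaps(rects):
    """Count pairs of rectangles whose x-ranges overlap (the n edges of the graph)."""
    n = 0
    for a, b in combinations(rects, 2):  # m * (m - 1) / 2 pairs
        if a[0] < b[1] and b[0] < a[1]:  # ranges that only touch do not overlap
            n += 1
    return n

rects = [(-3, 0, 3), (-1, 1, 2), (2, 4, 4), (3, 7, 2), (6, 8, 3)]
print(len(rects), count_overlaps(rects))  # m = 5, n = 3
```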

The space complexity of the proposed algorithm will be O(m): each rectangle in the input will result in at most two rectangles in the output, and 2m = O(m).

Enough about complexity analysis and on to the algorithm itself. The input will be a sequence of rectangles defined by L, R, H as you have now. I will assume that the input is sorted by the leftmost edge L. The output graph will be a linked list of rectangles defined by the same parameters, sorted in descending order by the rightmost edge. The head of the list will be the rightmost rectangle. The output will have no overlaps between any rectangles, so the total area of the skyline will just be the sum of H * (R - L) for each of the ~m output rectangles.

The reason for picking a linked list is that the only two operations we need are iteration from the head node and the cheapest possible insertion that keeps the list in sorted order. The sorting will be done as part of overlap checking, so we do not need to do any kind of binary searches through the list or anything like that.

Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each rectangle added will be checked only against the rectangles it actually overlaps. We will do overlap checking and removal as shown in the diagram above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap with the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once, and that no non-overlapping rectangles are processed unnecessarily, making the algorithm optimal.

Before I show code, here is a diagram of the algorithm in action. Red rectangles are new rectangles; note that their left edges progress to the right. Blue rectangles are ones that are already added and have overlap with the new rectangle. Black rectangles are already added and have no overlap with the new one. The numbering represents the order of the output list. It is always done from the right. A linked list is a perfect structure to maintain this progression since it allows cheap insertions and replacements:

Here is an implementation of the algorithm which assumes that the input coordinates are passed in as an iterable of objects having the attributes l, r, and h. The iteration order is assumed to be sorted by the left edge. If that is not the case, apply sorted or list.sort to the input first:

from collections import namedtuple

# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])

class LinkedList:
    """
    Implements a singly-linked list with mutable nodes and an iterator.
    """

    __slots__ = ['value', 'next']

    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next

    def __iter__(self):
        """
        Iterate over the *nodes* in the list, starting with this one.

        The `value` and `next` attribute of any node may be modified
        during iteration.
        """
        node = self
        while node is not None:
            yield node
            node = node.next

    def __str__(self):
        """
        Provided for inspection purposes.

        Works well with `namedtuple` values.
        """
        return ' -> '.join(repr(x.value) for x in self)


def process_skyline(skyline):
    """
    Turns an iterable of rectangles sharing a common baseline into a
    `LinkedList` of rectangles containing no overlaps.

    The input is assumed to be sorted in ascending order by left edge.
    Each element of the input must have the attributes `l`, `r`, `h`.

    The output will be sorted in descending order by right edge.

    Return `None` if the input is empty.
    """
    def intersect(r1, r2, default=None):
        """
        Return (1) a flag indicating the order of `r1` and `r2`,
        (2) a linked list of between one and three non-overlapping
        rectangles covering the exact same area as `r1` and `r2`,
        (3) a pointer to the last node, and (4) a pointer to the
        second-to-last node, or `default` if there is only one node.

        The flag is set to True if the left edge of `r2` is strictly less
        than the left edge of `r1`. That would indicate that the left-most
        (last) chunk of the tuple came from `r2` instead of `r1`. For the
        algorithm as a whole, that means that we need to keep checking for
        overlaps.

        The resulting list is always returned sorted descending by the
        right edge. The input rectangles will not be modified. If they are
        not returned as-is, a `Rect` object will be used instead.
        """
        # Swap so left edge of r1 <= left edge of r2
        if r1.l > r2.l:
            r1, r2 = r2, r1
            swapped = True
        else:
            swapped = False

        if r2.l >= r1.r:
            # case 0: no overlap at all
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.r < r2.r:
            # case 1: simple overlap
            if r1.h > r2.h:
                # Chop r2
                r2 = Rect(r1.r, r2.r, r2.h)
            else:
                # Chop r1
                r1 = Rect(r1.l, r2.l, r1.h)
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.h < r2.h:
            # case 2: split into 3
            r1a = Rect(r1.l, r2.l, r1.h)
            r1b = Rect(r2.r, r1.r, r1.h)
            last = LinkedList(r1a)
            s2l = LinkedList(r2, last)
            result = LinkedList(r1b, s2l)
        else:
            # case 3: complete containment
            result = LinkedList(r1)
            last = result
            s2l = default

        return swapped, result, last, s2l

    root = LinkedList()

    skyline = iter(skyline)
    try:
        # Add the first node as-is
        root.next = LinkedList(next(skyline))
    except StopIteration:
        # Empty input iterator
        return None

    for new_rect in skyline:
        prev = root
        for rect in root.next:
            need_to_continue, replacement, last, second2last = \
                    intersect(rect.value, new_rect, prev)
            # Replace the rectangle with the de-overlapped regions
            prev.next = replacement
            if not need_to_continue:
                # Retain the remainder of the list
                last.next = rect.next
                break
            # Force the iterator to move on to the last node
            new_rect = last.value
            prev = second2last

    return root.next

Computing the total area is now trivial:

skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None

Notice the altered order of the input parameters (h moved to the end). The resulting area is 29. This matches with what I get by doing the computation by hand. You can also do

>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)

This is to be expected from the diagram of the inputs/output shown below:

As an additional verification, I added a new building, Rect(-4, 9, 1), to the start of the list. It overlaps all the others and adds three units to the area, for a final result of 32. processed comes out as:

Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
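The hand computation can also be cross-checked with an independent brute-force calculation. The sketch below is not part of the algorithm above; `brute_force_area` is a hypothetical helper that cuts the x-axis at every rectangle edge and takes the tallest rectangle covering each slab. It runs in O(m^2) time, so it is only suitable for checking small inputs.

```python
from collections import namedtuple

Rect = namedtuple('Rect', ['l', 'r', 'h'])

def brute_force_area(rects):
    """Sum (slab width) * (tallest covering rectangle) over all slabs between edges."""
    xs = sorted({x for rect in rects for x in (rect.l, rect.r)})
    area = 0
    for left, right in zip(xs, xs[1:]):
        height = max(
            (rect.h for rect in rects if rect.l <= left and right <= rect.r),
            default=0,  # gap between buildings
        )
        area += (right - left) * height
    return area

skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
print(brute_force_area(skyline))                     # 29
print(brute_force_area([Rect(-4, 9, 1)] + skyline))  # 32
```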

Note:

While I am sure that this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.

Technically, if a new rectangle does not overlap any other rectangles, it will be checked against one rectangle it does not overlap. If that extra check was always the case, the algorithm would have an additional m - 1 comparisons to do. Fortunately, m + m + n - 1 = O(m + n) even if we always had to check one extra rectangle (which we don't).

Solution 2:

The reason for getting MemoryError is the huge size of the dictionary being created. In the worst case, the dict can have 10^10 keys, which would end up taking all your memory. If there really is a need, shelve is a possible solution to make use of such a large dict.
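As a rough sketch of the shelve idea, the per-integer mapping can be kept in an on-disk shelf instead of an in-memory dict. The function name `area_with_shelve`, the `(l, r, h)` tuple layout, and the assumption of integer coordinates are mine, not part of the original code. Note that this only moves the data to disk; the amount of work per building is unchanged, so it is likely to be slow for wide buildings.

```python
import os
import shelve
import tempfile

def area_with_shelve(buildings, path):
    """Per-integer max heights kept in an on-disk shelf instead of a dict."""
    with shelve.open(path) as items:
        for l, r, h in buildings:
            for x in range(l, r):
                key = str(x)  # shelf keys must be strings
                items[key] = max(items.get(key, 0), h)
        # Each key is a unit-width column, so the area is the sum of the heights
        return sum(items.values())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "skyline")
    print(area_with_shelve([(0, 100, 10), (50, 150, 20)], path))  # 2500
```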

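A more compact alternative is to keep a list of (x, height) breakpoints, where each entry gives the skyline height starting at that x. Let's say there is a building given as 10 0 100 (height, left, right) and another as 20 50 150; then that list might look like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you can add more entries to this list. This will be O(n^2).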
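One possible way to maintain such a breakpoint list is sketched below. The helper names `add_building`, `compact`, and `area` are hypothetical, and the sketch assumes every coordinate lies within the ±10^9 sentinels used in the example. Each insertion scans part of a plain Python list, so n buildings cost O(n^2) overall.

```python
from bisect import bisect_left

BOUND = 10**9  # assumed coordinate limit, taken from the sentinels in the example

def add_building(points, h, l, r):
    """Raise the skyline to at least h on [l, r); points is a sorted list of [x, height]."""
    for x in (l, r):
        i = bisect_left(points, [x])
        if points[i][0] != x:
            # Split the segment covering x; the new breakpoint inherits its height
            points.insert(i, [x, points[i - 1][1]])
    i = bisect_left(points, [l])
    while points[i][0] < r:
        points[i][1] = max(points[i][1], h)
        i += 1

def compact(points):
    """Drop interior breakpoints that do not change the height."""
    out = [points[0]]
    for x, h in points[1:-1]:
        if h != out[-1][1]:
            out.append([x, h])
    out.append(points[-1])
    return out

def area(points):
    """Each breakpoint's height holds until the next breakpoint."""
    return sum((x2 - x1) * h for (x1, h), (x2, _) in zip(points, points[1:]))

points = [[-BOUND, 0], [BOUND, 0]]
add_building(points, 10, 0, 100)   # input order: height, left, right
add_building(points, 20, 50, 150)
print(compact(points))
print(area(points))  # 2500
```

After `compact`, the list for these two buildings has the same breakpoints as the example above.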

This might help you further.

tmahurin
