shape-warrior-t

joined 1 year ago
 

For some time now, I've been thinking about the concept of interactively manipulating mathematical expressions and equations via software. Think of doing some quick algebra in Notepad or similar, except with no potential for arithmetic/algebra errors, typos, etc. to ruin the results.

At the same time, I also wanted to experiment a bit with zippers from functional programming. You need some way of specifying what (sub)expression to perform operations on, and it seemed like this kind of data structure could help with that.

And so, I made AlgeZip, a small proof-of-concept of the whole general idea. Although this polished Python version was completed only a few days ago, there were various earlier versions in different languages and with worse-quality code. Usage instructions are on GitHub; it requires Python 3.12 to run.

For simplicity, I used boolean expressions instead of general numeric algebraic expressions/equations, and included only the minimum set of commands and functionality. As I understand it, it should be possible to transform any boolean expression into any other boolean expression in AlgeZip (without using the r! command except to set things up), though I could be wrong.

Thoughts, comments, and criticism on the idea as a whole, the program, or the source code are welcome, though I'm not sure if I'll be making any changes at this time.

shape-warrior-t@kbin.social 1 point 1 year ago

My implementation is memoized with functools.cache, but without memoization, repeated recomputation would indeed be a concern for recursive Fibonacci. So are stack overflows (RecursionError in Python), which are also a problem for my code, though, again, not for "reasonable" inputs: fibonacci(94) already exceeds 2^64.

Time complexity-wise, I was thinking more about the case where the numbers get so big that addition, multiplication, etc. can no longer be modelled as taking constant time, especially if math.prod and enumerate are implemented in ways that are less efficient for huge integers (I haven't thoroughly checked, and I'm not planning to).
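The recursion-depth concern can be sidestepped with a plain loop. This is a sketch, not the code from the solution below; the name fibonacci_iter is hypothetical. A cold call to the cached recursive version with a large n (say 5000) would likely exceed CPython's default recursion limit of 1000, whereas this loop only holds two numbers at a time. It does not change the big-integer cost discussed above, since each addition still grows with the size of the numbers.

```python
def fibonacci_iter(n: int) -> int:
    """Return the n-th Fibonacci number (fibonacci_iter(0) == 0) without recursion."""
    a, b = 0, 1  # invariant: a == F(i), b == F(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci_iter(93) < 2**64 <= fibonacci_iter(94))  # True
```

The printed check is consistent with the claim above: fibonacci(93) still fits in 64 bits, fibonacci(94) does not.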

shape-warrior-t@kbin.social 2 points 1 year ago

Given an input string c, the program outputs the number of distinct lists of strings lst such that:

  1. ''.join(lst) == c
  2. for every string s in lst, s consists of an arbitrary character followed by one or more characters from '0123456789'
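To sanity-check the O(n) solution on small inputs, the two conditions above can be transcribed directly into an exponential-time generator that lists every qualifying lst. The names decompressions and count_decompressions are hypothetical, and this is a brute-force reference, not the submitted solution.

```python
from collections.abc import Iterator

DIGITS = '0123456789'

def decompressions(c: str) -> Iterator[list[str]]:
    """Yield every list lst with ''.join(lst) == c in which each string is one
    arbitrary character followed by one or more characters from DIGITS."""
    if not c:
        yield []
        return
    # The first piece is c[:end]: c[0] plus a nonempty run of digits c[1:end].
    end = 1
    while end < len(c) and c[end] in DIGITS:
        end += 1
        for rest in decompressions(c[end:]):
            yield [c[:end]] + rest

def count_decompressions(c: str) -> int:
    return sum(1 for _ in decompressions(c))

print(list(decompressions('a333a3')))  # [['a3', '33', 'a3'], ['a333', 'a3']]
```

The two lists printed for a333a3 are exactly the two readings raised in the clarification question about that input.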

Sure hope I didn't mess this up, because I think the fundamental idea is quite elegant! It should run successfully for all "reasonable" inputs (that is, the numeric output fits in a uint64 and the input isn't ridiculously long). The core algorithm is O(n) if you assume all arithmetic is O(1). (If not, then I don't know what the time complexity is, and I don't feel like working it out.)

from functools import cache
from itertools import pairwise
from math import prod

@cache
def fibonacci(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

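def main(compressed: str) -> int:
    # A fragment starts at index 0 or at any non-digit character.
    is_fragment_start = [i == 0 or c not in '0123456789' for i, c in enumerate(compressed)]
    fragment_start_positions = [i for i, s in enumerate(is_fragment_start) if s]
    # Each fragment runs up to the next fragment start (or the end of the input).
    fragment_lengths = [stop - start for start, stop in pairwise(fragment_start_positions + [len(compressed)])]
    # Pieces never cross fragment boundaries, so the per-fragment counts multiply.
    return prod(fibonacci(fragment_length - 1) for fragment_length in fragment_lengths)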

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('compressed')
    print(main(parser.parse_args().compressed))

Idea: a00010 -> [a00010] -> [length 6] -> F(6)
01a102b0305 -> [01, a102, b0305] -> [length 2, length 4, length 5] -> F(2) * F(4) * F(5)
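where F(n) = fibonacci(n - 1) is the number of ways to partition a string of length n into a list of strings of length ≥2. (Within a fragment, every character after the first is a digit, so any such partition is a valid list of pieces.)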
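The fragment-splitting step can also be expressed as a single regular expression. This is a sketch under the same rule as main (a fragment begins at index 0 or at any character outside '0123456789', then absorbs the digits after it); the names FRAGMENT and fragment_lengths are hypothetical.

```python
import re

# '^.' takes the first character whatever it is; '[^0-9]' starts every later fragment.
# DOTALL lets '.' match a newline too, mirroring main, which treats any first character alike.
FRAGMENT = re.compile(r'(?:^.|[^0-9])[0-9]*', re.DOTALL)

def fragment_lengths(compressed: str) -> list[int]:
    return [len(fragment) for fragment in FRAGMENT.findall(compressed)]

print(FRAGMENT.findall('01a102b0305'))  # ['01', 'a102', 'b0305']
```

Feeding these lengths into the product of fibonacci(length - 1) should reproduce main's result, since the matches tile the whole input left to right.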

F(2) = 1 = fibonacci(1), F(3) = 1 = fibonacci(2), and F(n) = F(n - 2) + F(n - 1), so F is indeed just an offset version of the Fibonacci sequence.
To see why F(n) = F(n - 2) + F(n - 1), here are the ways to split up 'abcde': ['ab'] + (split up 'cde'), ['abc'] + (split up 'de'), and ['abcde'] (['abcd'] + 'e' is impossible, since 'e' alone is too short), corresponding to F(5) = F(3) + F(2) + 1.
And the ways to split up 'abcdef': ['ab'] + (split up 'cdef'), ['abc'] + (split up 'def'), ['abcd'] + (split up 'ef'), and ['abcdef'], corresponding to F(6) = F(4) + F(3) + F(2) + 1 = F(4) + F(5) = F(6 - 2) + F(6 - 1).
The same logic generalizes to all n >= 4.
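The recurrence can also be checked numerically. The sketch below (function names hypothetical) counts partitions into pieces of length ≥2 by summing over every possible last piece, which is the same case analysis as the 'abcde' and 'abcdef' examples, and compares the counts against an offset Fibonacci sequence.

```python
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def partitions_min2(n: int) -> int:
    """Number of ways to split a string of length n into consecutive pieces of length >= 2."""
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) split
    for length in range(2, n + 1):
        # The last piece is [start, length) with length - start >= 2.
        ways[length] = sum(ways[start] for start in range(length - 1))
    return ways[n]

print([partitions_min2(n) for n in range(1, 9)])  # [0, 1, 1, 2, 3, 5, 8, 13]
assert all(partitions_min2(n) == fib(n - 1) for n in range(1, 30))
```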

shape-warrior-t@kbin.social 1 point 1 year ago

So every list of strings, where each string is some character followed by one or more digits, is a distinct, valid decompression option. Thanks for clarifying!

shape-warrior-t@kbin.social 2 points 1 year ago

Thanks for the update on checking through solutions, and thanks in general for all the work you've put into this community!

I'd just like to clarify: what are the valid decompressed strings? For an input of a333a3, should we return 2 (either a333 a3 or a3 33 a3) or 1 (since a333 a3 isn't a possible compression: it would be a336 instead)? Do we have to handle cases like a00010, and if so, how?

shape-warrior-t@kbin.social 1 point 1 year ago

My solution (runs in O(n) time, as do all the other solutions so far, as far as I can tell):

from itertools import pairwise

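def main(s: str) -> str:
    # Pad with None so the first run and the end of the last run both register as transitions.
    characters = [None] + list(s) + [None]
    transitions = []
    for (_, left), (right_idx, right) in pairwise(enumerate(characters)):
        if left != right:
            # A new run of `right` begins at index right_idx.
            transitions.append((right_idx, right))
    # Each run's length is the distance to the next transition.
    repeats = [(stop - start, char) for (start, char), (stop, _) in pairwise(transitions)]
    return ''.join(f'{char}{length}' for length, char in repeats)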

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('s')
    print(main(parser.parse_args().s))

Runthrough:
'aaabb' -> [None, 'a', 'a', 'a', 'b', 'b', None] -> [(1, 'a'), (4, 'b'), (6, None)] -> [(4 - 1, 'a'), (6 - 4, 'b')]
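For comparison, itertools.groupby already groups consecutive equal characters, so the same run-length encoding can be sketched in a single expression. The name run_length_encode is hypothetical; this is an alternative, not the submitted solution.

```python
from itertools import groupby

def run_length_encode(s: str) -> str:
    # groupby yields (char, iterator over that run); emit the char and the run's length.
    return ''.join(f'{char}{sum(1 for _ in run)}' for char, run in groupby(s))

print(run_length_encode('aaabb'))  # a3b2
```

The padding-and-transitions approach above makes the run boundaries explicit; groupby hides them but relies on the same idea of splitting wherever adjacent characters differ.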

Golfed (just for fun, not a submission):

import sys
from itertools import pairwise as p
print(''.join(c+str(b-a)for(a,c),(b,_)in p([(i,r)for i,(l,r)in enumerate(p([None,*sys.argv[1],None]))if l!=r])))

shape-warrior-t@kbin.social 0 points 1 year ago

I actually found this challenge to be easier than this week's medium challenge. (Watch me say that and get this wrong while also getting the medium one correct...) Here's an O(n) solution:

bracket_pairs = {('(', ')'), ('[', ']'), ('{', '}')}

def main(brackets: str) -> str:
    n = len(brackets)
    has_match_at = {i: False for i in range(-1, n + 1)}
    acc = []
    for i, bracket in enumerate(brackets):
        acc.append((i, bracket))
        if len(acc) >= 2:
            opening_idx, opening = acc[-2]
            closing_idx, closing = acc[-1]
            if (opening, closing) in bracket_pairs:
                # The top two entries form a matched pair: remove both and record their indices.
                del acc[-2:]
                has_match_at[opening_idx] = has_match_at[closing_idx] = True
    longest_start, longest_end = 0, 0
    most_recent_start = None
    for left_idx, right_idx in zip(range(-1, n), range(0, n + 1)):
        has_match_left = has_match_at[left_idx]
        has_match_right = has_match_at[right_idx]
        if (has_match_left, has_match_right) == (False, True):
            most_recent_start = right_idx
        if (has_match_left, has_match_right) == (True, False):
            most_recent_end = right_idx
            if most_recent_end - most_recent_start > longest_end - longest_start:
                longest_start, longest_end = most_recent_start, most_recent_end
    return brackets[longest_start:longest_end]

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('brackets')
    print(main(parser.parse_args().brackets))

We start off by doing the same thing as this week's easy challenge, except we keep track of the indices of all of the matched brackets that we remove (opening or closing). We then identify the longest stretch of consecutive removed-bracket indices, and use that information to slice into the input to get the output.

For ease of implementation of the second part, I modelled the removed-bracket indices with a dict simulating a list indexed by [-1 .. n + 1), with the values indicating whether the index corresponds to a matched bracket. The extra elements on both ends are always set to False. For example, {([])()[(])}()] -> FFTTTTTTFFFFFTTFF, and ([{}]) -> FTTTTTTF. To identify stretches of consecutive indices, we can simply watch for when the value switches from False to True (start of a stretch), and from True to False (end of a stretch). We do that by pairwise-looping through the dict-list, looking for 'FT' and 'TF'.
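Given the "watch me get this wrong" caveat, a slow brute-force reference is handy for cross-checking on small inputs. This sketch (function names hypothetical) tries every substring from longest to shortest, leftmost first, so on ties it should pick the same answer as main, which only replaces its best stretch on a strictly longer one. It assumes the input contains only the six bracket characters.

```python
CLOSING_TO_OPENING = {')': '(', ']': '[', '}': '{'}

def is_balanced(s: str) -> bool:
    """True if s (assumed to contain only bracket characters) is balanced."""
    stack = []
    for ch in s:
        if ch in CLOSING_TO_OPENING:
            if not stack or stack[-1] != CLOSING_TO_OPENING[ch]:
                return False
            stack.pop()
        else:
            stack.append(ch)
    return not stack

def longest_balanced_brute_force(brackets: str) -> str:
    """Leftmost longest balanced substring, found by trying every substring (O(n^3))."""
    n = len(brackets)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            candidate = brackets[start:start + length]
            if is_balanced(candidate):
                return candidate
    return ''

print(longest_balanced_brute_force('{([])()[(])}()]'))  # ([])()
```

On the two examples above this gives ([])() and ([{}]), matching the T-stretches in FFTTTTTTFFFFFTTFF and FTTTTTTF.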

shape-warrior-t@kbin.social 3 points 1 year ago (last edited 1 year ago)

Here's an O(n) solution using a stack instead of repeated search & replace:

closing_to_opening = {')': '(', ']': '[', '}': '{'}
brackets = input()
acc = []
for bracket in brackets:
    if bracket in closing_to_opening:
        if acc and acc[-1] == closing_to_opening[bracket]:
            acc.pop()
        else:
            acc.append(bracket)
    else:
        acc.append(bracket)
print(''.join(acc))

Haven't thoroughly thought the problem through (so I'm not 100% confident in the correctness of the solution), but the general intuition here is that pairs of brackets can only match up if they only have other matching pairs of brackets between them. You can deal with matching pairs of brackets on the fly simply by removing them, so there's actually no need for backtracking.
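As a sanity check of the "no backtracking needed" intuition, here is a sketch of the repeated search & replace baseline next to the stack approach wrapped in a function (both names hypothetical). Two removable pairs such as '()' and '[]' can never overlap, so the order in which pairs are deleted doesn't change the final leftover string; that is one way to see why a single left-to-right stack pass should agree with the slower baseline.

```python
CLOSING_TO_OPENING = {')': '(', ']': '[', '}': '{'}

def reduce_by_replacement(brackets: str) -> str:
    """Worst-case quadratic reference: delete adjacent matched pairs until nothing changes."""
    while True:
        shorter = brackets.replace('()', '').replace('[]', '').replace('{}', '')
        if shorter == brackets:
            return brackets
        brackets = shorter

def reduce_by_stack(brackets: str) -> str:
    """The single-pass stack approach above, as a function."""
    acc = []
    for bracket in brackets:
        if bracket in CLOSING_TO_OPENING and acc and acc[-1] == CLOSING_TO_OPENING[bracket]:
            acc.pop()  # closes the bracket on top of the stack
        else:
            acc.append(bracket)
    return ''.join(acc)

print(reduce_by_stack('{([])()[(])}()]'), reduce_by_replacement('{([])()[(])}()]'))  # {[(])}] {[(])}]
```

For a broader check, the two functions can be compared on every string over '()[]{}' up to some small length (for example with itertools.product).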

Golfed, just for fun:

a=[]
[a.pop()if a and a[-1]==dict(zip(')]}','([{')).get(b)else a.append(b)for b in input()]
print(''.join(a))