• 0 Posts
  • 24 Comments
Joined 1 year ago
cake
Cake day: July 7th, 2023

help-circle
  • For people lacking context, Boeing split off and sold their division that became Spriti Aerosystems. The theory at the time was that Boeing’s core competency wasn’t building airplanes, it was managing relationships with other vendors. In particular, the actual plane manufacturing part of the company was undesirable due to perceived poor “Return on Net Assets.” The theory they pitched to shareholders was they should sell off non obviously profitable divisions so they reduced asset liability while keeping the same or better profits.

    That was their explanation, of course it was a terrible idea.








  • A major caveat I’ve noticed some people misunderstand: it’s corporate CLAs that are problematic. The Apache Foundation also requires contributors sign a CLA, but it’s to provide a legal fail safe and a way to update to say Apache 3.0 if need be one day. Apache’s non profit, open source mission aligns with respecting the rights of contributors and the community. Corporations, on the other hand, not so much.



  • CLAs can be abusive, but not necessarily. Apache Foundation contributors need to sign CLAs, which essentially codify in contract form the terms of the Apache 2.0 license. It’s a precaution, in case some jurisdiction doesn’t uphold the passive licensing scheme used otherwise. There’s also a relicensing clause, but that’s restricted to keeping in spirit, they can’t close the source.










  • Compression is actually a mathematical field that’s fairly well explored, and this isn’t compression. There are theoretical limits on how much you can compress data, so the data is always somewhere, either in the dictionary or the input. Trained models like these are gigantic, so even if it was perfect recall the ratio still wouldn’t be good. Lossy “compression” is another issue entirely, more of an engineering problem of determining how much data you can throw out while making acceptable compromises.


  • This is a classic problem for machine learning systems, sometimes called over fitting or memorization. By analogy, it’s the difference between knowing how to do multiplication vs just memorizing the times tables. With enough training data and large enough storage AI can feign higher “intelligence”, and that is demonstrably what’s going on here. It’s a spectrum as well. In theory, nearly identical recall is undesirable, and there are known ways of shifting away from that end of the spectrum. Literal AI 101 content.

    Edit: I don’t mean to say that machine learning as a technique has problems, I mean that implementations of machine learning can run into these problems. And no, I wouldn’t describe these as being intelligent any more than a chess algorithm is intelligent. They just have a much more broad problem space and the natural language processing leads us to anthropomorphize it.