[-] sapient_cogbag@sh.itjust.works 2 points 6 months ago

I just wish we'd get solid, affordable RISC-V already. Especially with the arbitrary-length vector instruction extension, which I find to be a much better design for hardware compatibility than the fixed-width SIMD extensions in x86 (and ARM's NEON too, AFAIK).

[-] sapient_cogbag@sh.itjust.works 2 points 11 months ago

I'm not sure they'll succeed in extinguishing Linux. But I do get the worry, especially with WSL.

What I'm more worried about is them potentially extinguishing Git via their control of GitHub. In particular, with their GitHub CLI tool and such >.<

[-] sapient_cogbag@sh.itjust.works 1 point 11 months ago

I've always thought of "blob" in terms of it being opaque and hard to understand, like a blob of putty with little structure you can dig into; you just have to take it as one solid, barely understandable mass to use it.

Never thought of it as Binary Large OBject ;p

Something that might be useful long term is trying to train an AI to identify CSAM and release the weights, which admins could use to check images. The main problem is finding a way to do this without storing those kinds of images or video :/

My understanding is that right now, the main mechanisms involved use several central databases of perceptual hashes of known CSAM material. The problem is that this ends up being a whack-a-mole solution, and at least in theory governments could use these databases to censor copyrighted or more general "unapproved" content, though I imagine such a db would lose trust quickly, and I'm not aware of this being an issue in practice.
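For anyone unfamiliar with the concept: unlike a cryptographic hash, a perceptual hash is designed so that visually similar images get similar hashes. The actual algorithms these databases use (PhotoDNA and friends) are proprietary and far more sophisticated, but here's a minimal "average hash" sketch, just to illustrate the idea:

```python
# Minimal average-hash sketch, one of the simplest perceptual hashes.
# Real CSAM-matching systems use much more robust algorithms; this
# only illustrates why a slightly edited image still matches, which
# a cryptographic hash like SHA-256 would never do.

def average_hash(pixels):
    """pixels: 2D list of grayscale values (0-255), e.g. an 8x8
    downscaled image. Bit is 1 where the pixel is brighter than
    the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming_distance(h1, h2):
    """Number of differing bits; a small distance means the images
    are perceptually close."""
    return sum(a != b for a, b in zip(h1, h2))

img = [[10, 200], [200, 10]]       # toy 2x2 "image"
tweaked = [[12, 198], [201, 11]]   # slightly perturbed copy
print(hamming_distance(average_hash(img), average_hash(tweaked)))  # 0
```

The matching is then "is the distance below some threshold" rather than exact equality, which is also why it's whack-a-mole: sufficiently heavy edits push the distance past the threshold.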

One potential solution is "opportunistic training" where, when new CSAM material gets identified and submitted to the FBI or these databases by various server admins, a small amount of training is done on the AI weights before the image or video is deleted and only a perceptual hash remains. Furthermore, if a picture is reported as "known CSAM" by these dbs, then you do the same thing with that image before it gets deleted.

To avoid false positives, you also train the AI on general non-CSAM content.

Ideally this process would be fully automated so no-one has to look at that shit - over time, you'd theoretically get a neural net capable of identifying CSAM reliably with few or no false positives or false negatives ^.^. Admins could also try for some kind of distributed training, where each contributes weight deltas from local training, or each builds up LoRA-style improvement modules and people combine them to reduce bandwidth for modification sharing.
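The weight-delta idea is basically federated averaging. A toy sketch of what "combine the deltas" means (plain float lists standing in for real tensors; the numbers are made up):

```python
# Sketch of combining weight deltas from several admins, in the
# spirit of federated averaging. Weights are plain lists of floats
# here; a real setup would use an ML framework's tensors and do
# this every training round.

def apply_deltas(base_weights, deltas):
    """Average each admin's delta elementwise and add the result
    to the shared base weights. `deltas` is a list of per-admin
    delta vectors, one per participating instance."""
    n = len(deltas)
    return [w + sum(d[i] for d in deltas) / n
            for i, w in enumerate(base_weights)]

base = [0.5, -0.2, 1.0]
admin_deltas = [
    [0.1, 0.0, -0.2],   # delta from admin A's local training
    [0.3, 0.0,  0.0],   # delta from admin B's local training
]
print(apply_deltas(base, admin_deltas))
```

The bandwidth win from LoRA-style modules is that each admin ships two small low-rank matrices instead of a delta for every weight in the model.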

I came up with an idea (on my alt account ^.^) to improve discoverability... it's more focused on instance or group discovery, though it may be doable for users with a probabilistic reverse index for efficiency. See: https://infosec.pub/post/429743
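One classic way to build a probabilistic reverse index is a Bloom filter: an instance publishes a small bit array that lets others ask "might this instance carry X?" without shipping the full index. This toy sketch isn't from the linked post, just an illustration of the general structure (sizes and the topic string are made up):

```python
import hashlib

# Toy Bloom filter: a compact probabilistic set. "False" answers are
# definite; "True" answers are only probable (tunable false-positive
# rate via size and hash count). Sizes here are arbitrary.

class BloomFilter:
    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # All positions set -> "probably present"; any clear -> absent.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

index = BloomFilter()
index.add("risc-v")
print(index.might_contain("risc-v"))   # True
```

The nice property for discovery is that the filter is tiny and fixed-size no matter how big the underlying index is, at the cost of occasional false positives.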
