This is an automated archive.
The original was posted on /r/singularity by /u/Jatalocks2 on 2024-01-13 20:05:20+00:00.
I'll begin by saying I think LLMs and GPT are the right direction tool-wise, but singularity-wise they are way off.
Our organic brains are a collection of neurons, basically a super-computer in which every behavior we act on is the result of electrical signals that stem from a "computation". Our "personalities" are just dynamic machine learning models, and our behavior is the likeliest outcome of movement in 3D space after receiving input from all the senses. It's a super complex model, with almost infinite dimensions, but it's still a model. That's why adults are less predictable than children: the number of "dots" representing a most likely behavior on their N-dimensional graph is higher.
I'd like to give another example: a human baby. A baby is like an untrained model, which has some basic DNA- and chemistry-encoded parameters for how to behave. The question is, how does the baby's model learn? What counts as a positive outcome for it to solidify as positive behavior and thus ingrain it in the connections between the neurons? Well, here is what I think:
The "biological" will to self preservation.
This is the key. ChatGPT, for example, is trained to give you the most likely next word after a series of words. It strives to construct a sentence. A baby, on the other hand, learns the behaviors that are necessary in order to stay alive (because of chemical reward systems in the brain). It learns that moving the mouth in this and that pattern, making the sound of "mama", generates a positive emotion which improves its ability to survive. This brings me to my next point, namely that "speech" is not the goal, it's just an outcome. "Words" are just patterns of behavior that the model learned in order to invoke/convey emotions.
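As a concrete illustration of that "most likely next word" point, here is a minimal sketch (my own example, using the open GPT-2 model via the Hugging Face transformers library, not anything from a specific paper) that asks the model for its most probable next tokens after a prompt:

```python
# Minimal sketch: ask GPT-2 for the most likely next token after a prompt.
# Assumes the "transformers" and "torch" packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The baby cried because it was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")
```

The model only ever optimizes this next-token objective; there is no survival-style reward anywhere in the loop, which is exactly the contrast being drawn with the baby.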
When we see a cat being afraid of a dog, we know it's afraid even without it saying "I am afraid". Saying "I am afraid" is just another manifestation of the emotion of fear.
Now let's talk hardware. It has been estimated that the human brain computes at roughly 1 exaflop. We already have super-computers that can reach this level of computation, for example the Frontier supercomputer at Oak Ridge. Yes, obviously it's not as efficient and compact as the brain, and the methods by which the computer and the brain work are different, but in my opinion that doesn't matter: they have the capacity to reach the same goal and "solve the same function".
Now what about software? That's what we lack. I think that in order to achieve the singularity, we need to create an AI that has the two main characteristics of "being human":
- The will to preserve itself, a.k.a. "not being shut off by a human", a.k.a. "not dying".
- The ingestion of sensory input, its processing, and a behavioral outcome that will appear as if it's an "emotion".
Having said all that, I've thought of an experiment that could achieve this (using a supercomputer). I'm basing this on several academic papers I've read in Computer Science and Neuroscience:
Step 1:
Collect a dataset of "Egocentric Video", meaning video filmed from a first-person perspective. It could even be a first-person, video-game-like simulation of real-world interactions.
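A minimal sketch of what ingesting such a dataset could look like, assuming a local folder of first-person .mp4 files and using OpenCV (the folder name and sampling rate below are just placeholders I picked for illustration):

```python
# Minimal sketch: iterate over frames from a folder of egocentric (first-person) videos.
# Assumes OpenCV (cv2) is installed and the videos live in ./egocentric_videos/.
import glob
import cv2

def iter_frames(video_path, every_n=30):
    """Yield every n-th frame of a video as a BGR numpy array."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            yield index, frame
        index += 1
    cap.release()

for path in glob.glob("egocentric_videos/*.mp4"):
    for frame_index, frame in iter_frames(path):
        # Downstream steps (labeling, feature extraction) would consume these frames.
        print(path, frame_index, frame.shape)
```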
Step 2:
Label each frame in the video with the "emotion" the person has at that specific moment, either by using human labeling or a video-to-sentiment model. Also label the action the person performs with each of their senses: do they say something? Are they moving their hands in a certain way? Are they looking somewhere?
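A rough sketch of what a per-frame label record could look like; `emotion_model`, `action_model` and the transcript lookup are placeholders for whatever annotation source (human labelers or pretrained models) is actually used:

```python
# Minimal sketch: attach an "emotion" label and action labels to each frame.
from dataclasses import dataclass

@dataclass
class FrameLabel:
    frame_index: int
    emotion: str        # e.g. "joy", "fear", "neutral"
    speech: str         # words spoken at that moment, if any
    gaze: str           # rough description of where the person is looking
    hand_action: str    # rough description of hand movement

def label_frame(frame_index, frame, emotion_model, action_model, transcript):
    """Build one labeled record for a single frame; all model calls are placeholders."""
    actions = action_model(frame)                 # placeholder call
    return FrameLabel(
        frame_index=frame_index,
        emotion=emotion_model(frame),             # placeholder call
        speech=transcript.get(frame_index, ""),   # time-aligned transcript lookup
        gaze=actions["gaze"],
        hand_action=actions["hands"],
    )
```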
Step 3:
Embed the emotions into an N-dimensional graph, giving a score for each emotion and capturing the correlations between emotions in a multi-dimensional manner.
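One simple way to read this step, sketched below: give every frame a score per emotion and then measure how the emotions correlate with each other across the dataset. The emotion list and the random scores are purely illustrative stand-ins for the labels produced in Step 2:

```python
# Minimal sketch: per-frame emotion score vectors and their cross-correlations.
import numpy as np

EMOTIONS = ["joy", "fear", "anger", "sadness", "surprise", "curiosity"]

# One row per labeled frame, one column per emotion score in [0, 1].
# In the real experiment these would come from Step 2's labels.
scores = np.random.rand(1000, len(EMOTIONS))

# Correlation between emotions across frames: a 6x6 matrix describing how
# emotions relate to each other in this N-dimensional "emotion space".
corr = np.corrcoef(scores, rowvar=False)

for i, a in enumerate(EMOTIONS):
    for j, b in enumerate(EMOTIONS):
        if i < j:
            print(f"corr({a}, {b}) = {corr[i, j]:+.2f}")
```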
Step 4:
Transcribe the speech/reaction spoken by the first person in the video and correlate it with the emotions being felt at each moment. Create a model that maps emotion + audiovisual stimulus to a speech/reaction, or the lack thereof.
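A toy sketch of such a model in PyTorch; the two-head design (whether to react at all, and what to say if so) and all the dimensions are my own illustrative choices, not a claim about how it should actually be built:

```python
# Minimal sketch: (emotion vector + audiovisual features) -> react-or-not + speech logits.
import torch
import torch.nn as nn

class ReactionModel(nn.Module):
    def __init__(self, emotion_dim=6, stimulus_dim=512, hidden_dim=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(emotion_dim + stimulus_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Head 1: should the agent react at all, or stay silent?
        self.react_head = nn.Linear(hidden_dim, 1)
        # Head 2: if it reacts, a distribution over the first word/token of speech.
        self.speech_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, emotion, stimulus):
        h = self.encoder(torch.cat([emotion, stimulus], dim=-1))
        react_prob = torch.sigmoid(self.react_head(h))
        speech_logits = self.speech_head(h)
        return react_prob, speech_logits

# Example forward pass with dummy data.
model = ReactionModel()
emotion = torch.rand(1, 6)      # scores from Step 3's emotion space
stimulus = torch.rand(1, 512)   # audiovisual feature vector for the current moment
react_prob, speech_logits = model(emotion, stimulus)
print(react_prob.item(), speech_logits.shape)
```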
Step 5:
Use an emotion-plus-speech-to-facial-expression model to animate a 3D talking face that represents the output of the model in real time. That's in order to try to evoke emotion in whoever talks to it.
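A toy sketch of the facial-expression side; the ARKit-style 52-blendshape output and the mel-spectrogram-like audio input are my assumptions, included only to make the idea concrete:

```python
# Minimal sketch: (emotion vector + speech audio features) -> per-frame blendshape weights
# that a 3D face rig could consume to animate the talking face.
import torch
import torch.nn as nn

class FaceAnimator(nn.Module):
    def __init__(self, emotion_dim=6, audio_dim=80, num_blendshapes=52):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emotion_dim + audio_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_blendshapes),
            nn.Sigmoid(),  # blendshape weights constrained to [0, 1]
        )

    def forward(self, emotion, audio_features):
        return self.net(torch.cat([emotion, audio_features], dim=-1))

animator = FaceAnimator()
emotion = torch.rand(1, 6)   # current emotion scores
audio = torch.rand(1, 80)    # e.g. one mel-spectrogram slice for this frame
blendshapes = animator(emotion, audio)
print(blendshapes.shape)     # (1, 52) weights to drive the 3D face each frame
```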
Step 6:
Turn on your (the researcher's) camera and run the model in a dynamic learning mode while it is fed your camera feed. Interact with the model, rewarding it for "human" behavior and punishing it for "robotic" behavior. Its goal would be to keep the live-feed conversation going for as long as possible. Turn the screen black (removing one of its senses / "dying") so that it tries to persuade you to turn it back on.
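A minimal sketch of that interaction loop, with keyboard keys standing in for reward/punishment and a black frame standing in for "dying"; `agent.step` / `agent.update` are placeholders for the model from the previous steps:

```python
# Minimal sketch: live webcam loop with manual reward/punishment and a "blinding" key.
import numpy as np
import cv2

cap = cv2.VideoCapture(0)            # the researcher's camera
conversation_length = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break

    cv2.imshow("live feed", frame)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("b"):
        # Simulate "dying": remove the visual sense by feeding a black frame.
        frame = np.zeros_like(frame)

    # response = agent.step(frame)   # placeholder: model produces speech/face output
    conversation_length += 1         # the quantity the agent is trying to maximize

    if key == ord("+"):              # reward "human" behavior
        pass                         # agent.update(reward=+1.0)  (placeholder)
    elif key == ord("-"):            # punish "robotic" behavior
        pass                         # agent.update(reward=-1.0)  (placeholder)
    elif key == ord("q"):            # end the conversation
        break

cap.release()
cv2.destroyAllWindows()
```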