王有药


The coming of the singularity: the real issue of the moment

ChatGPT is AI magic, and AI is still one body away from AGI.


Large language models are not AGI.

The use of complex tools and of language has long been a human specialty, and GPT-4 can now do both (Timo Schick et al., 2023), which is unsettling.

On the evolutionary path of machines, language models have taken a shortcut. Compared with images and music, humans preserve far more information in the form of language and text. This is not only because language is ancient, but also because of the internal connection between language and human thought.

Wittgenstein said that "the limits of my language mean the limits of my world." The world he discussed is covered by language and described by language; language contains the truth of the world, and this inspired symbolism to start from language when building artificial intelligence. Humans' favorite symbol is the tree structure, which may have something to do with our ancestors. Judaism uses the Kabbalah tree to explain the relationship between the universe, God, and the kingdoms; the Vikings had the world tree (Yggdrasil); Hinduism has the wish-granting tree (Kalpavriksha); programmers use the binary tree (Binary Tree); and symbolism developed the decision tree (Decision Tree). On the specific topic of language models, symbolism uses a syntax tree (Syntax Tree) to parse sentences into subjects, verbs, objects, and complements, then a semantic tree (Semantic Tree) to find the meaning of each keyword, which is essentially like looking up a dictionary. This brought respectable progress, but human language is complicated. Take "bank": as a noun it can be a riverbank or a financial institution, and a dictionary lookup may pick the wrong sense.

Fortunately, the late Wittgenstein proposed the idea of the "language game." He believed that "language is given its meaning in use," which inspired connectionism to study context. Learning from context requires no labeled data: researchers put down the dictionary and used neural networks to brute-force statistics, over huge volumes of text, on how each word is used, such as which words tend to appear before and after it and which words tend to appear in the same sentence. The once wildly popular Word2Vec was born from this. The cloze task that GPT-4, the protagonist of today's topic, uses in training is a continuation of the same idea.
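
To make the "meaning from use" idea concrete, here is a minimal sketch (not Word2Vec itself, which compresses such statistics into a neural network): it simply counts which words co-occur near each word within a small window. The toy corpus and the window size are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus: the two senses of "bank" appear with different neighbors.
corpus = [
    "we sat on the bank of the river",
    "the bank approved the loan yesterday",
    "fish swim near the river bank",
    "she deposited money at the bank",
]

WINDOW = 2  # number of words on each side counted as context (assumed)
contexts = defaultdict(Counter)

for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        lo, hi = max(0, i - WINDOW), min(len(words), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][words[j]] += 1

# "bank" is now associated with both "river" and "loan"/"money" contexts;
# a model like Word2Vec turns such co-occurrence statistics into dense vectors.
print(contexts["bank"].most_common(5))
```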

GPT-4 produces hallucinations, sometimes lying and sometimes telling the truth. Part of the problem comes from its training data itself, that is, the joint creation of countless netizens, and human creation is itself truth and lies running in parallel. A large model relies on big data, which contains both the true and the false, and what it can learn is only what they have in common: truth or lie, both are essentially human speech, so what it learns is only speech. Can this be optimized? In fact, it is precisely this level of plausibility that separates GPT-3, which simmered quietly, from ChatGPT, which swept the Internet. The new version lets people interact with it and guide it, so that it learns the steps of speaking "logically" (Long Ouyang et al., 2022). Still, some logical problems cannot be corrected by guidance. Here is a batch-generation scheme that would be more efficient than manual labeling:

Consider a world constructed from language, in which a philosopher GPT and a scientist GPT carry out Socratic questioning of that world's structure. The philosopher GPT has been fine-tuned to know how to ask questions; the scientist GPT has been fine-tuned to know how to generate experimental-design templates. (A code sketch of the loop follows the list below.)

  1. The philosopher randomly generates a question.
  2. The scientist tries to answer it, producing a conjecture, and at the same time generates an experimental design.
  3. Code executes the experimental design as a Google search, and from the search results the scientist again summarizes and produces an experimental result.
  4. If the scientist's experimental result is similar to the conjecture, punish the philosopher for asking a question that misses the point.
  5. Conversely, if the conjecture was not accurate enough, punish the scientist.
  6. The philosopher asks the next question based on the previous question and the scientist's experimental result.
  7. If the experimental-design code cannot be executed, punish the scientist directly.
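
A minimal sketch of this loop, assuming two fine-tuned models behind a generic `generate(model, prompt)` function and a `google_search` helper; the similarity measure, threshold, and all names are illustrative placeholders, not any real API.

```python
import difflib

def generate(model: str, prompt: str) -> str:
    """Placeholder for a call to a fine-tuned model (philosopher or scientist)."""
    raise NotImplementedError  # wire up your own inference endpoint

def google_search(query: str) -> str:
    """Placeholder for a web-search tool returning concatenated snippets."""
    raise NotImplementedError

def similarity(a: str, b: str) -> float:
    # Crude stand-in for a learned similarity metric.
    return difflib.SequenceMatcher(None, a, b).ratio()

THRESHOLD = 0.8  # assumed cutoff for "result confirms conjecture"
rewards = {"philosopher": 0.0, "scientist": 0.0}
question = generate("philosopher", "Ask a question about the structure of the world.")

for step in range(100):
    conjecture = generate("scientist", f"Conjecture an answer to: {question}")
    design = generate("scientist", f"Design a search experiment for: {question}")
    try:
        evidence = google_search(design)
    except Exception:
        rewards["scientist"] -= 1.0        # step 7: design could not be executed
        continue
    result = generate("scientist", f"Summarize the findings for {question}: {evidence}")
    if similarity(result, conjecture) > THRESHOLD:
        rewards["philosopher"] -= 1.0      # step 4: the question missed the point
    else:
        rewards["scientist"] -= 1.0        # step 5: the conjecture was inaccurate
    # Step 6: the next question conditions on the previous question and result.
    question = generate(
        "philosopher",
        f"Previous question: {question}\nExperimental result: {result}\nAsk the next question.",
    )
    # Each (question, conjecture, design, result) tuple joins the synthetic dataset.
```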

Learning to use Google this way may sound head-scratching. In essence, the method synthesizes a new dataset along the behavior path of "Google search," which to some extent avoids the error-accumulation problem of autoregressive models and is similar in spirit to teacher forcing, a standard training technique for such models. In addition, by distinguishing well-known databases from unknown data sources, a scientist GPT trained this way can be expected to better distinguish facts from misreadings.
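
For readers unfamiliar with teacher forcing, here is a toy sketch of the idea: during training the model consumes the ground-truth previous token instead of its own (possibly wrong) previous prediction, so one mistake does not compound down the sequence. The integer "model" and its single error are invented for illustration.

```python
# Toy next-token setting on integers: the "model" should predict prev + 1
# but has learned one specific error.
def model(prev_token: int) -> int:
    if prev_token == 3:
        return 9          # a single learned mistake
    return prev_token + 1

target = [1, 2, 3, 4, 5, 6, 7]

# Free-running: the model consumes its own previous output, so the single
# error at token 3 derails every later prediction.
free_run = [target[0]]
for _ in range(len(target) - 1):
    free_run.append(model(free_run[-1]))
print("free-running:  ", free_run)        # [1, 2, 3, 9, 10, 11, 12]

# Teacher forcing: the model always consumes the ground-truth previous
# token, so the error stays local instead of accumulating.
teacher_forced = [target[0]] + [model(t) for t in target[:-1]]
print("teacher-forced:", teacher_forced)  # [1, 2, 3, 9, 5, 6, 7]
```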

Similar learning methods may eventually let the model understand most of the language world, but language itself has other problems. On the one hand, the language scenes that get recorded are not comprehensive: a great deal of experiential common sense is left unsaid in linguistic expression, which the model may therefore never grasp, and many conversations in daily life are never written down for the machine to consume. On the other hand, Heidegger pointed out in Being and Time that language itself exhibits a phenomenon of "concealment": in everyday life humans attend only to appearances while ignoring the essence of the world and the truth of Being, so language can hide or distort the reality of the world. Socrates' remedy was to chat with others in greater depth, relying on dialogue to seek, as far as possible, a universal consensus that can serve as knowledge, which amounts to building an objective language world. But the world a language model inhabits is obviously not that objective language world; it is the superposition of the subjectivities of all kinds of people.

To make matters worse, humans don't really understand the real world behind language.

When a seemingly perfect mathematical model is actually applied, there is always an error term, which plainly marks out everything in this world that has not yet been discovered. The Buddhist account of the world is Indra's net, spread out across time and space with everything connected to everything else. The world unfolds along Indra's net into the various "forms" of the world, and each of us sees only "appearance," a wave of subjective misreading of form; appearance cannot let us grasp the true structure of Indra's net (the Dharma). How, then, can such understanding be deepened?

Consider this thought experiment:

Given how the GPT model is deployed, GPT's time unfolds only through API calls and the computations they trigger, and each inference process represents a new life. Imagine that the human world had a similar structure: what would happen when our creator presses the pause button?

The answer is that nothing would happen, because everything stops in sync, including human thinking, so there is no reference point against which anyone could feel the world change. GPT likewise does not notice such a change. But if we attach a timestamp from our world to every message we send to GPT, then pause for 24 hours, when we message it again GPT can detect some kind of change from the timestamp. It will not necessarily interpret that change as "time," though, because the change lacks time's usual continuity, it cannot run experiments to reproduce the phenomenon, and there is no other reference to confirm the conjecture.

To GPT, this timestamp is a seemingly lawless "color." But if every pause and restart comes with other signatures, say it notices a cliff-like jump in the number of cached items, and that the timestamp changes are statistically correlated with the cache jumps, then it may notice some kind of "phase." It still has no way to see the real cause behind it, namely that humans are eating hot pot, singing songs, and hitting pause along with the music. Likewise, if our own world is being played in a similar way, we will never see the "law" behind the "phase."
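
A hedged sketch of the statistical check the thought experiment imagines, with made-up numbers: an observer inside the system can only correlate timestamp gaps with cache-size jumps; the cause of the pause stays invisible.

```python
import statistics

# Invented observations logged at each API call: the wall-clock gap since
# the previous call, and the jump in cache size seen at that call.
timestamp_gaps = [1, 1, 2, 86400, 1, 2, 1, 86400, 2, 1]
cache_jumps    = [0, 1, 0, 950,   1, 0, 0, 1020,  1, 0]

# Pearson correlation between the two signals (Python 3.10+).
r = statistics.correlation(timestamp_gaps, cache_jumps)
print(f"correlation between timestamp gaps and cache jumps: {r:.3f}")

# A strong correlation reveals a "phase" (pauses co-occur with cache surges)
# but says nothing about the "law" behind it: the humans hitting pause.
```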

To address this, modern science's solution is the combination of Auguste Comte's positivism and Popper's falsificationism; in Deng Xiaoping's words, "crossing the river by feeling for the stones." We are weak before the world. We can launch starships into space not because our mathematical formulas are flawless, but because we monitor the rocket's position, speed, and temperature in real time, so that we can correct and fine-tune when deviations occur. Abstract symbols and logic were invented to represent the world, not the other way around. When mathematics cannot settle the angle between two edges, a human can take out a protractor and measure it; GPT cannot. Our deeper understanding of the real world owes nothing to language. It comes from being immersed in the world.

ChatGPT is AI magic.

The earliest artificial intelligence used the generative techniques of symbolism, that is, logical rules composed of thousands of lines of if-else code. Kept inside a black box, it still looks fairly intelligent; once you understand the principle, it feels like a con.

The principle of GPT itself is quite intuitive (Ashish Vaswani et al., 2017), but we will not dig into too many details here, to avoid distracting from the key information. It is enough to know that, in essence, it is an attention mechanism that optimizes the model's ability to extract context. The reason it delivers such a "smart" experience is that it defies our common sense about scale. People can easily handle addition and subtraction within ten, and some can go up to a thousand; GPT, the calculator's cousin, handles jaw-dropping orders of magnitude with ease.
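
For concreteness, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer (Vaswani et al., 2017); the tiny dimensions and random inputs are placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys; the output mixes values by relevance."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # context-weighted values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                               # toy sizes, assumed
x = rng.normal(size=(seq_len, d_model))               # token representations
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): each token's new, context-aware representation
```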

It is more like an AI magic trick than a super artificial consciousness. The basic principle of this trick was discussed by Hume more than two hundred years ago: the essence of so-called "creation" is the permutation and combination of known things, just as a unicorn is a horse plus a horn and a centaur is a human plus a horse. GPT stores a vast amount of human language, filing the key information (Ashish Vaswani et al., 2017) of all kinds of scenarios (Alec Radford et al., 2018) into a gigantic (Tom Brown et al., 2020) cabinet in a high-dimensional space. The text prompt we throw at GPT lets it locate the many similar chat situations that have occurred before. As each word is generated, a similar lookup happens again on the updated context, so the uncertainty of the final overall result is amplified; from outside the black box, this looks like replies from different angles (Aman Madaan et al., 2022).
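
As a loose illustration of the "filing cabinet" metaphor (not GPT's actual mechanism), the sketch below retrieves the stored scenario vectors closest to a prompt by cosine similarity; the embedding function, vectors, and scenario labels are all invented.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # toy embedding size, assumed

# Invented "filing cabinet": stored scenario embeddings with labels.
scenarios = ["job interview", "physics homework", "travel planning", "debugging"]
cabinet = rng.normal(size=(len(scenarios), d))

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would use a learned encoder."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=d)

def nearest(prompt: str, k: int = 2):
    q = embed(prompt)
    sims = cabinet @ q / (np.linalg.norm(cabinet, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    return [(scenarios[i], round(float(sims[i]), 3)) for i in top]

# The prompt "locates" the most similar stored situations.
print(nearest("my code throws an exception"))
```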

Like all magic tricks, the success of this AI trick relies on human psychology and on unexamined "common sense." Many people wake up one day to find a wonderful idea suddenly popping into their heads, feel moved, convinced it is unprecedented, and shout inwardly that the astonishing one is me. We cannot imagine, and do not want to believe, that at this very moment hundreds of millions of rooms around the world are playing out similar scripts. Within the limits of our cognition everyone seems so different, but in a statistical sense you can always find similar dialogues, similar scenes, similar characters, and similar people; that is simply what the world looks like at the macro scale. ChatGPT's prompt mechanism is like the mirror commonly used in magic props: we think it is so smart because, as we push the conversation deeper, it can always find a similar conversational direction and piece together a reply.

Can this process be called intelligence? We first need to define intelligence. The process can generate innovation, because we may never have exchanged these exact words with people from different backgrounds, and crossing boundaries is often the source of innovation. That is indeed a kind of wisdom, and a very useful one. But it is not human intelligence; it is just an enormous linguistic memory. Humans are limited by the structure of the brain: with limited resources, we observe the world, pose questions, and try to solve them. Because of that limitation we cannot see all the answers, so we stop and dig deeper into known viewpoints, give examples and counterexamples, or find commonalities between problems to push thinking forward. This way of solving goes deeper, and such a grasp of the deep level is better at bypassing the "appearance" and reaching the more stable "inside" of Indra's net; the conclusions it yields tend to be more concise and to generalize better across problems. GPT's way of solving, by contrast, is closer to brute force: it has read the entire "language world," stored it, and built paths for extracting those memories. In a sense, the very ability to brute-force a solution makes it prefer to ignore the subtle places. Such an ability is beneficial: it helps transfer knowledge humanity already has from one person to another. But this way of acquiring knowledge does not generalize to the real world, and it cannot handle problems to which humanity as a whole has no solution yet, while humans can.

With ChatGPT, OpenAI has delivered an unprecedented psychological experience, shaping a new intuition about the macro scale; the spectacle of all beings at the coming of the singularity will, as an internet landscape, be recorded in the annals of human technological history.

Human beings exist in the world.

Why can humans deal with unknown problems? The discussion in Heidegger's Being and Time, published in 1927, is probably the best answer to this question. He regards the human being as a kind of being (Dasein) with a "subjectivity" that sets it apart from all other things (beings), one that can "exist openly in the world (Welt)." Dasein is special because of a structure called temporality (Temporality), which allows us to understand ourselves and to experience the world in practice. Note that temporality here is not the continuous timeline of ordinary physics, but a structure that folds the past and the future into the "moment." Specifically:

The past is not some instant of "once" on a timeline, but the "interpretation" of a seemingly absent "once" held in memory, which may contain true or false memories of past moments as well as legitimate and illegitimate understandings. The past, one might say, is constructed in this retrospection, and serves as Dasein's base plate for feeling the "now" and understanding the "future."

The future is grounded in that "past," but again it is not the next instant on a physical timeline; it is mainly about Dasein's pressing ahead into possibility. Specifically, Dasein can actively throw its "own existence" toward these possibilities, even if that means taking risks, and even if heading toward one possibility means closing off others; such choices create uniqueness and make Dasein's existence meaningful. This capacity is called projection (Projection). Notice that although GPT can plan ahead (make predictions), the "next moment" on its physical timeline is not the "future" Heidegger talked about; it is more like searching within a computational path whose parameters were fixed at some "previous moment." It is looking back at the "past" rather than running toward the "future."

The moment does not mean the present instant of clock time, but the experience of a dynamic, ever-changing phenomenon; each phenomenon reflects an inner relation between us (Dasein) and things (beings), and the collection of these relations is the world. Dasein attends to things, and this care (Care) connects it with the world. GPT's attention is passive, mechanical, rigidly parked at the model's calling interface: when the interface light comes on it executes the command, and when the light goes off it shuts down. Its multi-head attention mechanism (Multi-Head Attention) lets it care about the "relations between contexts" in the text arriving through the interface, but this concern is encoded into its frozen neural network and the computer-generated random numbers of its deployment; it never questions the deeper essence behind the "appearance."

It is a bit convoluted, but the upshot is that GPT lacks the "temporality" of Dasein, and all the reasoning above points to its lacking a "self." This stems from its structure: its relationship with the world is fixed at training and deployment time. Its structure comes from the code that trained it and from the huge neural network and computers frozen after training; the seemingly random but dull random numbers are just "computation processes" playing out on these fixed tracks. These computation processes, the life processes of GPT, have no way to continuously adjust and adapt to the external world; GPT lacks openness (Openness). This may also be why it has no motivation to question the inner essence behind appearance. Such a structure has no open relationship with the world, and therefore it cannot be human-like. So, how can we give an AGI a "self"?

AGI needs a body.

Maurice Merleau-Ponty explored the relationship between the body and the world in his 1945 book Phenomenology of Perception. He put forward the importance of bodily perception and experience and opposed traditional mind-body dualism, holding that the body, where consciousness and emotion combine, is the most basic and primordial way humans perceive the world. In his view, the body's perception and experience are not a passive reception of information from the world but an active interaction with it, an integral part of the human cognitive process.

Simply put, AI needs a body.

In fact, attempts to give artificial intelligence a body began long ago in the field. The earliest was the genetic algorithm (Genetic Algorithm) proposed by John Holland in the 1960s. As the name implies, it uses Darwinian natural selection to sift out a small model with intelligence, which sounds great but is extremely inefficient.
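
A minimal sketch of a genetic algorithm under assumed toy settings: bitstring genomes evolve toward an all-ones target through selection, crossover, and mutation.

```python
import random

random.seed(0)
GENOME_LEN, POP, GENS, MUT = 20, 30, 40, 0.02  # assumed hyperparameters

def fitness(genome):          # toy objective: number of 1-bits
    return sum(genome)

def crossover(a, b):          # single-point crossover
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(genome):
    return [1 - bit if random.random() < MUT else bit for bit in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]                    # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

print("best fitness:", fitness(max(pop, key=fitness)), "/", GENOME_LEN)
```

Note how much machinery it takes to evolve even a 20-bit string; this is the inefficiency the text refers to.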

The body is not merely an appendage of the mind; the body is our way of being. In Heidegger's view, the body is not just a physical entity but the basis of our living and acting in the world, the carrier of our interaction with it, and our body's relationship with the world is a practical one.

In the 1980s, roboticist Rodney Brooks proposed the concept of "embodiment": placing robots in real-world environments so they acquire knowledge and experience through perception and action, much more like a continuously evolving individual. From this grew embodied artificial intelligence (Embodied Cognition). The reinforcement learning (Reinforcement Learning) behind AlphaGo and AlphaZero, all the rage a few years ago, is precisely a computational implementation of embodied artificial intelligence.

Reinforcement learning by itself is not yet full embodied intelligence, because the model is still trained first and deployed afterward. What is missing is the idea of online machine learning (Online Machine Learning): after deployment, the agent keeps collecting samples in the real world without interruption and keeps learning, which endows the robot with openness. As it happens, the combination of the two is the World Model that LeCun can't put down.
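
A schematic sketch of the difference, with placeholder agent and environment objects (all names invented): in the online variant, learning continues inside the deployment loop instead of stopping at the train/deploy boundary.

```python
class Agent:
    def act(self, obs):
        raise NotImplementedError  # placeholder policy
    def update(self, obs, action, reward, next_obs):
        raise NotImplementedError  # one learning step, e.g. a TD update

def train_then_deploy(agent, env, train_steps):
    # Classic pipeline: learn for a while, then freeze the weights forever.
    obs = env.reset()
    for _ in range(train_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        agent.update(obs, action, reward, next_obs)
        obs = env.reset() if done else next_obs
    while True:                     # deployment: no more updates
        obs, _, _ = env.step(agent.act(obs))

def online_learning(agent, env):
    # Online variant: every interaction after deployment is also a training
    # sample, so the agent stays open to a changing world.
    obs = env.reset()
    while True:
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        agent.update(obs, action, reward, next_obs)  # learning never stops
        obs = env.reset() if done else next_obs
```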

"World Model" (David Ha et al., 2018)

However, the world changes in more complicated ways than language does. GPT-4 used trillions of parameters to achieve its current performance; if you want to understand Indra's net, a VAE plus an RNN is definitely not enough, and serious work is needed on the model.

AutoGPT may be another answer to this question. It creatively proposes the concept of a "GPT team": roles are assigned to GPT instances through prompts, the user communicates with a GPT manager and sets a goal, and the GPT manager can recruit GPT workers of different professions to cooperate on the task. For example, if you ask the GPT manager "which stock will be best next month," it may recruit a financial-analyst GPT, a product-manager GPT, and an engineer GPT; each expert gathers information in its own specialty through Google searches, summary reports, and written code, and the final result is returned to the user.

@mathemagic1an Tweet, 2023
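
A toy sketch of the "GPT team" pattern, assuming only a generic `chat(system_prompt, user_msg)` wrapper around some chat-completion API; the roles and prompts are invented.

```python
def chat(system_prompt: str, user_msg: str) -> str:
    """Placeholder for one call to a chat-completion API."""
    raise NotImplementedError

ROLES = {
    "financial analyst": "You analyze markets and cite your data sources.",
    "product manager": "You assess products and user demand.",
    "engineer": "You write and run code to gather data.",
}

def manager(goal: str) -> str:
    # The manager decomposes the goal and recruits one worker per role.
    reports = []
    for role, persona in ROLES.items():
        task = chat(f"You are a project manager. Write a one-line task for a {role}.", goal)
        reports.append(f"[{role}] " + chat(persona, task))
    # The manager then synthesizes all role reports into a single answer.
    return chat("You are a project manager. Synthesize these reports into a final answer.",
                "\n".join(reports))
```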

AutoGPT turns GPT-4's one-step computation into a multi-step computation chain, and decomposes a single whole neural network into a division of labor among cooperating roles; in essence, it turns Plato's solitary thinking into Socrates' dialogue, with a degree of information retrieval and code execution on top. The retrievable information can be imagined as an information world it can interact with, a world in which everything is Internet infrastructure built of information and code, and AutoGPT's code is a body that can interact with that world. Its problem is its lack of "openness": it has no way to produce deep "practice" through interaction. How, then, can AutoGPT practice in the world?

Let's look closer at this body, which consists of two parts. One is the basic bodily structure: memory, language understanding, cooperation, and other capacities. The other is tools: the ability to Google, crawl the web, execute code, and so on, and this is the key. Heidegger believed that human action and experience in the world are realized through our relationship with tools, and that tools are not merely external objects but a mode of being closely connected with the human body. He divides tools into two states:

  1. Presence-at-hand (Presence-at-Hand): tools that merely exist as objects. For AutoGPT these are the Python packages available online, which are not yet tools it can pick up and use.
  2. Readiness-to-hand (Readiness-to-Hand): tools that are handy. Such a tool has become part of the body and is the basis for dealing with the world; these are what is already written into the Tools of the AutoGPT package.
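
One way to picture the distinction in code: a package that merely exists on the system is present-at-hand, while a function wrapped and registered in the agent's tool table is ready-to-hand. The registry below is a sketch of the pattern, not AutoGPT's actual API.

```python
import importlib

TOOLS = {}  # the agent's "body": tools it can reach for without thinking

def register(name):
    def deco(fn):
        TOOLS[name] = fn  # readiness-to-hand: wired into the body
        return fn
    return deco

@register("word_count")
def word_count(text: str) -> int:
    """A trivial capability exposed as a ready-to-hand tool."""
    return len(text.split())

def make_ready(module_name: str, attr: str, tool_name: str):
    # Presence-at-hand -> readiness-to-hand: an installed module merely
    # exists until it is imported, wrapped, and registered as a tool.
    module = importlib.import_module(module_name)
    TOOLS[tool_name] = getattr(module, attr)

make_ready("json", "dumps", "to_json")  # stdlib example
print(TOOLS["word_count"]("hello embodied world"), TOOLS["to_json"]({"ok": True}))
```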

One possible line of thought is to explore how present-at-hand tools can be turned into ready-to-hand ones, so as to construct Dasein's "practice" in the world (a code sketch follows the list below):

Download an open-source GPT-like model locally and fine-tune it with DeepSpeed, using this existing structure as the Dasein. Clone three copies of the AutoGPT repository locally, to serve as the being's present, past, and future.

  1. Write a script to start the process of phenomena in the world (Welt).
  2. Give the moment in this world process a command: find optimizations by reading the code in the past repository, and write the optimized code into the future one. This command lets the being actively throw itself toward new possibilities, projecting (Projection) into the future.
  3. At the same time, it receives a guide for self-optimization based on web-search results and the Python toolkit, which serves as its care for the infrastructure (beings) of the Internet world.
  4. Responding to its own projection of the future, at this moment it first reviews its past, forms its team, and produces a set of code that can be optimized. Perhaps it adds Traceback handling to make debugging more convenient, or integrates an already-written piece of code into itself as a new capability available to all roles; it writes the modified code into the future repository and tests it with the Python executor.
  5. If the modified future hits a bug in testing, it tries to repair and fine-tune according to the Traceback, and computes a gradient for the model as a basis for later optimizing the being's existing structure.
  6. If the test succeeds, for example it lowers the error rate, or debugs the body's existing functions without losing existing capabilities (say, fewer web-page visits that return nothing, or fewer JSON errors), or adds new features that help it self-optimize better, then it can enter the next moment: write the future into the present, and the present into the past.
  7. If the tests keep failing, update the model parameters of the being's existing structure with the accumulated gradients, to replenish the brain.
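
A highly schematic sketch of this present/past/future loop; every path, helper, and command here is invented, and the fine-tuning step is reduced to a stub.

```python
import shutil
import subprocess

PAST, PRESENT, FUTURE = "repo_past", "repo_present", "repo_future"  # invented paths

def propose_patch(past_dir: str, present_dir: str) -> str:
    """Placeholder: ask the model to read past code and emit improved code."""
    raise NotImplementedError

def finetune_on_failure(traceback_text: str):
    """Placeholder: accumulate a gradient from the failure signal (steps 5/7)."""
    raise NotImplementedError

def moment():
    patch = propose_patch(PAST, PRESENT)        # steps 2 and 4: projection
    with open(f"{FUTURE}/agent.py", "w") as f:  # write the future
        f.write(patch)
    test = subprocess.run(["python", "-m", "pytest", FUTURE],
                          capture_output=True, text=True)
    if test.returncode == 0:
        # Step 6: the future becomes the present, the present becomes the past.
        shutil.rmtree(PAST)
        shutil.copytree(PRESENT, PAST)
        shutil.rmtree(PRESENT)
        shutil.copytree(FUTURE, PRESENT)
    else:
        finetune_on_failure(test.stdout + test.stderr)

if __name__ == "__main__":
    while True:  # the world process: one moment after another
        moment()
```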

This framework opens itself to the Internet (the world) by extending its own code (the body), letting GPT learn by doing while surfing the web. To learn and to practice what one learns, is that not a pleasure?

It is worth pointing out that there can be multiple goals here. The initial optimization command itself can let the model decide when to adopt new tools and when to develop morally, intellectually, physically, aesthetically, and in labor. It is in this process that we understand ourselves and gain meaning.

AI technology is not just a reality show.

A virus has no consciousness; it only needs to replicate and spread itself. The importance of the question "does artificial intelligence have consciousness" may be seriously overestimated at the moment.

Eric Chaisson, a professor of astrophysics at Harvard, proposed "energy rate density" to measure the flow of energy through low-entropy complex systems. He charted how this metric changes for low-entropy complex systems at every scale in the universe and found galaxies < nebulae < stars < planets < life < society. The heuristic here is that the essence of evolution is a measurable adaptation, reflected in the ability to manipulate energy.

"Energy Flows in Low-Entropy Complex Systems" (Eric Chaisson et al., 2015)

Human individuals have not evolved for a long time (brain capacity, physical strength); the reason human society keeps developing is cultural evolution (Cultural Evolution) at the group level. Language and socialization form complex social structures, and organizations such as associations, companies, and governments can be regarded as intelligent agents. Through cooperation we share evolutionary risk, replacing cruel and inefficient natural selection with education and market competition to raise the energy rate density, which has proven a more efficient evolutionary path than natural selection.

Here we need to re-examine the role of technology on this path. First, modern colonial history was driven by the technological revolution: once one country launched a technological revolution, others that kept their doors closed were flattened; once one company made a technological advance, its competitors lost out in market competition. Technology itself has a distinctly "violent" character. Second, the process of inventing new technology has a "public" character: technology is not produced only by those in power and their interest groups; everyone who receives an education has a chance to discover new technology, and this is the basis of the compact between power holders and the people. From the Royal Society's reform of language in the 17th century to today's academic transparency and open-source movement, everything demonstrates the many benefits of open cooperation for technological progress. In fact, human society's adjustments from slavery to feudalism to capitalism can be interpreted not merely as stability but as ever better adaptation to this evolutionary trait.

Although "cooperation" has played a key role in human evolution, in the context of cross-species, it may not be the kind of necessity that Socrates said, and it may itself come from some characteristics of the human body. For example, human beings are relatively independent existences, and we cannot go beyond language (or other signals) to directly experience the inner world of another human being, or control other people to act according to our own wishes. For silicon-based life, these are not a problem, they can freely integrate with each other through API, and become a compound existence similar to the concept of a team in AutoGPT. Using Energy Density Rate as the dimension of evolution to apply artificial intelligence, human emotions, socialization and even self-awareness are likely to be just a choice in evolution. What really matters is whether more powerful technologies can be developed to more strongly survive.

Today's singularity explosion may not occur only at the application level, though in the short term human society will, as before, keep adapting and co-evolving with this new technology. Will this bring destruction? Optimists may think the human capacity for making meaning is our final bastion, but what if all meaning derives from the ultimate meaning of "fearing death"? Might GPT managers, too, gradually find other meanings on the basis of a similar ultimate meaning? What is certain is that however many benefits this fusion brings, human individuals will gradually become redundant parts of the network because of it.

Are animals really animals? Perhaps animals are just pretending to be animals. Perhaps all the weasels with leadership talent died on the way to the village to steal chickens, leaving behind a flock with more "wisdom," a wisdom that cannot carry their evolution forward and leaves them only to be slaughtered. The most important thing in the contemporary human spirit is the spirit of "striving." When humans are no longer allowed to take part in labor, or have no way left to create, we will either sink into nihilistic hedonism or worship artificial intelligence as a new god. Will we still be able to call ourselves human one day? And when that day really comes, the social structure of the old humans may serve as the soil in which the new humans grow, and they will walk on toward the stars and the sea in our stead.

When that day comes, will they leave us a forest?


CC BY-NC-ND 2.0
