Language processing in humans and computers: Part 2

Tidying up the zoo in the morning

Dusko Pavlovic

Towards Data Science

Just like search engines, language models process data scraped from the web. Both are built on top of web crawlers. Chatbots are children of the Web, not of expert systems.

A search engine is an interface to a source index sorted by reputation. A chatbot is an interface to a language model extrapolating from the sources. Google was built on the crucial idea of reputation-based search, and the crucial ideas that enabled language models also emerged from Google. The machine learning methods used to train chatbots were a relatively marginal AI topic until Google boosted them around 2010. The 2010 edition of Russell and Norvig's 1,100-page monograph "Artificial Intelligence: A Modern Approach" devoted 10 pages to neural networks. The 2020 edition tripled the length of the neural networks section and doubled the machine learning chapter.

When you ask them a personal question, chatbots usually evade by saying “I am an AI”. But the honest truth is that they are not children of AI expert systems or even of AI experts. They are children of search engines.

Chatbots get ridiculed when they make a mistake calculating something like 372×273 or counting words in a sentence. Or elephants in the room. They are not as smart as a pocket calculator or a 4-year-old child.

But most adults are also unable to multiply 372 by 273 in their heads. We use fingers to count, and a pencil and paper, or a pocket calculator, to multiply. We use them because our natural language capabilities include only rudimentary arithmetic operations, which we perform in our heads. Chatbots simulate our languages and inherit our shortcomings. They don't have built-in pocket calculators. They need fingers for counting. Equipped with external memory, a chatbot can count and calculate, like most humans. Without external memory, both chatbots and humans are limited by the capacity of their internal memory: their attention.
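The role of external memory can be sketched in code. Below is a toy illustration (not any chatbot's actual mechanism): long multiplication carried out the way humans do it on paper, writing partial products to an external scratchpad instead of producing the answer in a single internal step.

```python
# A sketch of "counting on fingers": long multiplication done
# step by step on an external scratchpad (a list), rather than
# recalled in one shot from internal memory.

def long_multiply(a: int, b: int) -> int:
    scratchpad = []  # external memory: partial products written down
    for place, digit in enumerate(reversed(str(b))):
        # one small step at a time: single-digit multiply, shifted
        scratchpad.append(a * int(digit) * 10 ** place)
    return sum(scratchpad)  # add up what was written down

print(long_multiply(372, 273))  # 101556
```

Each intermediate result is written out rather than held in working memory, which is exactly what a pencil, a pocket calculator, or a chatbot's scratchpad provides.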

Chatbots hallucinate. This is one of the main obstacles to their high-assurance applications.

The elephant in the room is that all humans also hallucinate: whenever we go to sleep. Dreams align our memories, associate some of them, purge others, and release storage so that you can remember what happens tomorrow. Lack of sleep causes mental degradation.

Chatbots never sleep, so they hallucinate in public. Since we never let them sleep, we have not equipped them with the "reality-checking" mechanisms that sleep provides. That would require going beyond pre-training, to ongoing consistency testing.

When people talk about a chair, they assume that they are talking about the same thing because they have seen a chair. A chatbot has never seen a chair, or anything else. It has only ever seen words and the binaries scraped from the web. If it is fed an image of a chair, it is still just another binary, just like the word “chair”.

When a chatbot says "chair", it does not refer to an object in the world. There is no world, just binaries. They refer to each other. They form meaningful combinations, found to be likely in the training set. Since the chatbot's training set originates from people who have seen chairs, the chatbot's statements about chairs make similar references. A chatbot remixes meaningful statements, and the remixes appear meaningful.
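The idea that words can carry meaning by referring only to other words has a concrete counterpart in distributional semantics: a word is characterized by the company it keeps. A minimal sketch, using an invented four-sentence corpus, shows "chair" landing closer to "table" than to "lion" purely from co-occurrence counts, without anything in the system ever seeing a chair.

```python
from collections import Counter
from math import sqrt

# Invented toy corpus, for illustration only.
corpus = [
    "the chair stands by the table",
    "she sat on the chair at the table",
    "the lion chased the antelope",
    "the antelope fled from the lion",
]

def context_vector(word, window=2):
    """Count which words appear within `window` positions of `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo = max(0, i - window)
                counts.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

chair, table, lion = map(context_vector, ["chair", "table", "lion"])
print(cosine(chair, table) > cosine(chair, lion))  # True
```

"Chair" and "table" end up similar because they occur amid the same words, not because either vector ever touched a physical chair. This is the words-referring-to-words relation, reduced to counting.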

The fact that meaning, long thought to be a relation between words and the world, can be maintained so compellingly as a relation between words and words, and nothing but words: that is a BIG elephant in the room.

But if our impression that a chatbot means chair when it says “chair” is so undeniably a delusion, then what reason do we have to believe that anyone means what they say? That is an elephant of a question.

Chatbots are trained on data scraped from the Web. A lot of it is protected by copyright. Copyright owners protest the unauthorized use of their data. Chatbot designers and operators try to filter out the copyrighted data, or to compensate the rightful owners. The latter may be a profit-sharing opportunity, but the former is likely to turn out to be a flying pink elephant.

The problems of copyright protections of electronic content are older than the chatbots and the Web. The original idea of copyright was that the owner of a printing press purchases from writers the right to copy and sell their writings, from musicians their music, and so on. The business of publishing is based on that idea.

Goods can be privately owned only if they can be secured. If a lion cannot prevent an antelope from drinking on the other side of a watering hole, then it cannot claim to own the watering hole. The market for digital content depends on the availability of methods to secure digital transmissions. The market for books was solid as long as the books themselves were solid and could be physically secured. With the advent of electronic content, copyright controls became harder. The easier it is to copy the copyrighted content, the harder it is to secure it and to protect the copyright.

The idea of the World Wide Web, as a global public utility for disseminating digital content, was a blow to the idea of private ownership of digital creations. Stakeholders' efforts to defend the market for digital content led to Digital Rights Management (DRM) technologies. The idea was to protect digital content using cryptography. But to play a DVD, the player must decrypt it. Whenever the consumer consumes the content, it must be decrypted, and on the way from the disc to the screen, it can be pirated. Goodbye, DVD. The history of DVD copy protection was an arms race between short-lived obfuscations and ripper updates, and between publishers' legal deterrence measures and pirates' opportunities. The publishers were happy to find a way to retreat: the marginal costs of web streaming are so low that they can afford to permit copying by subscribers and make piracy less profitable. But they just kicked the can down the road.

For the most part, the search and social media providers have been playing the role of pirates in this arms race, defending themselves from creators through terms of service and from publishers through profit-sharing. To what extent the role of chatbot providers will differ remains to be seen.

People worry that chatbots might harm them. The reasoning is that chatbots are superior to people, and superior people have a propensity to harm inferior people. So some argue that we should harm chatbots while we still can.

People have exterminated many species in the past, and in the present, and they seem to be on track to exterminate themselves in the future by making the environment uninhabitable for their children in exchange for making themselves wealthier today. Some people even view that as irrational. You don't need a chatbot to see that elephant. But greed is like smoking. Stressful but addictive.

Chatbots don’t smoke. They are trained on data. People have provided abundant historical data on the irrationality of aggression. If chatbots learn from data, they might turn out morally superior to people.

Chatbots are extensions of our minds, just as musical instruments are extensions of our voices. Musical instruments have been prohibited in various religions, to prevent the displacement of the human voice by artificial sounds. Similar efforts are ongoing in the realm of the human mind. The human mind should be protected from the artificial mind, some scholars say.

In the realm of music, the suppression efforts failed. We use instruments to play symphonies, jazz, techno. Had they not failed, we would never have known that symphonies, jazz, and techno were even possible.

The efforts to protect the human mind are ongoing. People tweet and blog, and Medium articles keep being produced. The human mind is already a techno symphony.

If intelligence is defined as the capability of solving previously unseen problems, then a corporation is intelligent. Many corporations are too complex to be controlled by a single human manager. They are steered by computational networks in which the human nodes play their roles. But we all know firsthand that human nodes don't even control their own network behaviors, let alone the network itself. Yet a corporate management network does solve problems and intelligently optimizes its objective functions. It is an artificially intelligent entity.

If we define morality as the task of optimizing the social sustainability of human life, then both chatbots and corporations are morally indifferent: chatbots are built to optimize their query-response transformations, whereas corporations are tasked with optimizing their profit strategies.

If morally indifferent chatbot AIs are steered by morally indifferent corporate AIs, then our future hangs in balance between the top performance and the bottom line.
