<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Experimental Epistemology]]></title><description><![CDATA[AI Epistemology Experiments using Machine Learning]]></description><link>https://experimental-epistemology.ai/</link><image><url>https://experimental-epistemology.ai/favicon.png</url><title>Experimental Epistemology</title><link>https://experimental-epistemology.ai/</link></image><generator>Ghost 5.2</generator><lastBuildDate>Mon, 27 Apr 2026 12:35:11 GMT</lastBuildDate><atom:link href="https://experimental-epistemology.ai/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[1. Why AI Works]]></title><description><![CDATA[
Intelligence = Understanding + Reasoning

]]></description><link>https://experimental-epistemology.ai/why-ai-works/</link><guid isPermaLink="false">62bb63a922c83b7e22055d7e</guid><category><![CDATA[Artificial Understanding]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Mon, 12 Jun 2017 00:00:00 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-wcvcyBTnbezjtU-Lpekh_g.png" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-wcvcyBTnbezjtU-Lpekh_g.png" alt="1. Why AI Works"><p>Interest in Artificial Intelligence is exploding, and for good reasons. Computers in cars, phone apps, and on the web can do amazing things that we simply could not do before 2012. What&#x2019;s going on?</p><p>This is an attempt to explain the current state of AI to a general audience without using mathematics, computer science, or neuroscience; discussions at these levels would focus on <em><strong>how</strong></em> AI works. Here I will discuss this at the level of Epistemology and will try to explain <em><strong>why</strong></em> it works.</p><p>&#x201C;Epistemology&#x201D; sounds scary, but it really isn&#x2019;t. It&#x2019;s mostly scary because it is unknown; it is not taught in schools anymore. Which is a problem, because we now desperately need this branch of Philosophy to guide our AI development. Epistemology discusses things like Reasoning, Understanding, Learning, Novelty, problem solving in the abstract, how to create Models of the world, etc. These are all concepts one would think would be useful when working with artificial intelligences. But most practitioners enter the field of AI without any exposure to Epistemology, which makes their work more mysterious and frustrating than it has to be.</p><p>I think of Epistemology as the general base for everything related to knowledge and problem solving; Science forms a small special case subset domain where we solve well-formed problems of the kind that Science is best at. In Epistemology outside of Science we are also free to productively discuss pre-scientific problem solving strategies, which is what brains are using most of the time. More later.</p><h2 id="intelligence-understanding-reasoning">Intelligence = Understanding + Reasoning</h2><p>In his book &#x201C;Thinking, Fast and Slow&#x201D;, Daniel Kahneman discusses the idea that human minds use two different and complementary processes, two different modes of thinking, which we call Understanding and Reasoning. The idea has been discussed for decades and has been verified by psychological studies and by neuroscience.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-4MMj1xWE9W5rlWc7mdp2PA.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>&#x201C;Subconscious Intuitive Understanding&#x201D; is the full name of the &#x201C;Fast Thinking&#x201D; or &#x201C;System 1&#x201D; thinking. It is fast because the brain can perform many parts of this task in parallel. The brain spends a lot of effort on this task.</p><p>&#x201C;Conscious Logical Reasoning&#x201D; is the full name of &#x201C;Slow Thinking&#x201D; or &#x201C;System 2&#x201D; thinking. To many people&#x2019;s surprise, this is very rarely used in practice. My soundbite for this is &#x201C;You can make breakfast without Reasoning&#x201D;. 
Almost everything we do on a daily basis in our rich mundane reality is done without a need to reason about it. We just repeat whatever worked last time we performed this task; we are experience driven.</p><p>&#x201C;Intuitive&#x201D; means that the system can very quickly &#x201C;provide solutions&#x201D; to very complex problems, but those solutions may not be correct every time.</p><p>&#x201C;Logical&#x201D; means that answers are always correct as long as input data is correct and sufficient. Which is not true in our rich mundane reality; it can only be true in a mathematically pure &#x201C;Model&#x201D; space. If you like Logic, you must also like Models.</p><p>&#x201C;Subconscious&#x201D; means we have no conscious (&#x201C;introspective&#x201D;) access to these processes. You are reading this sentence and you understand it fully, but you cannot explain to anyone, including yourself, <em><strong>how</strong></em> or <em><strong>why</strong></em> you understand it.</p><p>&#x201C;Conscious&#x201D; means we are aware of the thought; we can access it through introspection and we may find reasons for why we believe a certain idea.</p><p>&#x201C;Expensive&#x201D; is on the list because brains spend most of their effort on this Understanding part. We really shouldn&#x2019;t be surprised that AI now requires very powerful computers. More later.</p><p>In contrast, Reasoning is &#x201C;Efficient&#x201D;. It is most useful when you are stuck in a novel situation where experience and Understanding don&#x2019;t help you. Or perhaps you need to plan ahead, or need to find reasons for why something happened after the fact. It is used at a formal level in the sciences. Reasoning is important, but rarely needed or used.</p><p>Finally, Understanding is &#x201C;Model Free&#x201D; and Reasoning is &#x201C;Model Based&#x201D;. This is likely the most important distinction to people who are implementing intelligent systems since it provides a way to keep the implementation on the correct path when the going gets rough. We cannot discuss these issues quite yet, but if you are curious you can watch the videos at <a href="https://vimeo.com/showcase/5329344">vimeo.com</a>, which discuss this distinction at length. Think of their appearance in this table as a kind of foreshadowing.</p><p>All of this groundwork allows me to state the main point of this blog. We have known for a long time that brains use these two modes. But the AI research community has been spending overmuch effort on the Reasoning part and has been ignoring the Understanding part for sixty years.</p><p>We had several good reasons for this. Until quite recently, our machines were too small to run any useful-sized neural network. And also, we didn&#x2019;t have a clue about how to implement this Understanding. But that is exactly what changed in 2012, when a group of AI researchers from Toronto effectively demonstrated that Deep Neural Networks could provide a simple kind of shallow and hollow proto-Understanding (well, they didn&#x2019;t call it that, but I do). I will look just a little into the future, and overstate this just a little, in order to make it more memorable:</p><h2 id="deep-neural-networks-can-provide-understanding">Deep Neural Networks can provide Understanding</h2><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-mEDOqFGfXZEWiFphBPwYmw.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure>
Why AI Works" loading="lazy"></figure><p>This new phase of AI took decades to develop, but it would never have happened without people like the group led by Geoffrey Hinton at the University of Toronto, who spent 34+ years to develop the Deep Neural Network technology we now call &#x201C;Deep Learning&#x201D;. A number of breakthroughs from 1997 to 2006 led to a number of successful demonstrations (including first prizes in AI competitions) in 2012, and we therefore count this year as the birth year of Machine Understanding.</p><p>To an outsider, it may <em><strong>look</strong></em> like an older program or phone app might be &#x201C;Understanding&#x201D; whatever the app is doing, but that Understanding really only happened in the mind of the programmer creating the app. The programmer first simplified the problem in their own head by discarding a lot of irrelevant detail using &#x201C;Programmer&#x2019;s Understanding&#x201D;. This simplified mental &#x201C;Model&#x201D; of the problem domain could then be explained to a computer in the form of a computer program.</p><p>What is changing is that computers are now making these Models themselves.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-_hglbNB_XOv8RE2FrDtMIA.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>The first bullet point describes regular programming, including &#x201C;old style AI&#x201D; programs. AI has, since 1955, provided many novel and brilliant algorithms that we now use in programs everywhere. But when you contrast old style AI to Understanding systems, the old kind of AI is basically indistinguishable from any other kind of programming we do nowadays.</p><p>The second bullet point describes the recent developments. Deep Neural Networks are so different from regular programs that we have to acknowledge them as a different computational paradigm. This is why they took almost four decades to develop. And the paradigm, being pre-scientific and Model Free, is difficult to grasp if you received a solid Reductionist and Model Based education; it takes a long time for an established AI practitioner or experienced programmer to switch. People who are just starting out in AI have an easier time assimilating this new paradigm since they haven&#x2019;t had a full career&#x2019;s worth of experience and success using old style AI techniques.</p><p>The amount of work we have to do to get a Deep Neural Network to Understand is surprisingly small, and companies like Google and Syntience are working on eliminating the remaining effort of programming Neural Networks. This is where things will get really weird: when the Deep Neural Network (DNN) Understands enough about the world and about the problem it is faced with, then we no longer need a programmer to acquire this Understanding. Let me elaborate.</p><p>Programmers are employed to bridge two different domains. They first have to study whatever application domain they are working on. For instance, if they are writing an airline ticket reservation system they will have to learn a lot of detailed information about airlines, airline tickets, flights, luggage, etc. and then know to provide features for unusual cases such as canceled flights. 
And then the programmer uses their Understanding of the problem domain to explain to a computer how it can Reason about these things&#x2026; but the programmer cannot make the system Understand; they can only put in a hollow and fragile kind of Reasoning, as a program with many if-then cases. And any misunderstandings the programmer has about the problem domain will become &#x201C;bugs&#x201D; in the computer program. Notice the shift in terminology; more later.</p><p>But today, for certain classes of moderately complex problems, we can use a DNN to automatically learn for itself how to Understand the problem.</p><p>Which means we no longer need a programmer to Understand the problem.</p><p>We have delegated our Understanding to a machine.</p><p>And if you think about that for a minute you will see that that&#x2019;s exactly what an AI should be doing. It should Understand all kinds of things so that we humans won&#x2019;t have to.</p><p>And there are two common situations where this will be a really good idea. One is when we have a problem we cannot Understand ourselves. We know a lot of those, starting with cellular biology.</p><p>The other common case will be when we Understand the problem well, but making a machine Understand it well enough to get the job done is cheaper and easier than any alternative. Roombas accomplish this level using old style AI methods, but I predict we will one day be flooded with similar but DNN based devices that Understand several aspects of domestic maintenance as well as we do.</p><h2 id="do-machines-really-understand">Do machines really Understand?</h2><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-wcvcyBTnbezjtU-Lpekh_g.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>If we give a picture like this to a DNN trained on images, it will identify the important objects in the image and provide the rectangles, called &#x201C;bounding boxes&#x201D;, as approximations to where the objects are. The text on the right says &#x201C;Woman in white dress standing with tennis racket two people in green behind her&#x201D;. Which is not a bad description of the image. It could be used as the basis for a test for English skill level for adult education placement.</p><p>For all practical purposes, this is Understanding.</p><p>We had no idea how to make our computers do this before 2012. This is a really big deal. This feat requires not only a new algorithm, it requires a new computational paradigm.</p><p>An image is, to a computer, a single long sequence of numbers denoting red, green, and blue color values from 0 to 255; it also knows how wide the image is. How does it get from this very low level representation to knowing that there is a woman with a tennis racket in the image?</p><p>This is what William Calvin has called &#x201C;A river that flows uphill&#x201D;. There are very few mechanisms that can go in this direction, from low levels to high levels. Calvin used the term to describe Evolution, and I can use the same phrase to describe Understanding.</p><p>I like to think of Evolution as &#x201C;Nature&#x2019;s Understanding&#x201D; because the phenomena are very similar at several levels. 
Evolution can bring forth advanced species starting from simpler ones, in the same manner that Understanding discovers and re-uses high level concepts in low level input.</p><p>In contrast, Reasoning proceeds by breaking problems into subproblems and solving those, which is a &#x201C;flowing downhill&#x201D; kind of strategy. In mathematics we accept (and many mathematicians only accept this reluctantly) that we need to use induction to move &#x201C;uphill&#x201D; in abstractions. And that&#x2019;s a very limited uphill movement at that. Epistemology allows for much stronger uphill moves. This is known as &#x201C;jumping to conclusions on scant evidence&#x201D; and it&#x2019;s allowed in Epistemology based pre-scientific systems.</p><p>As an aside, here&#x2019;s a pretty deep related thought: Nature/Evolution re-uses anything that works. I like to think that Understanding is a spandrel of Evolution itself. Neural Darwinism certainly straddles this gap. Could be coincidence, or the only answer that will work at all. More later.</p><h2 id="we-doubled-our-ai-toolkit-in-2012">We Doubled Our AI Toolkit in&#xA0;2012</h2><p>We can now use these Deep Neural Networks as <em><strong>components</strong></em> in our systems to provide Understanding of certain things like vision, speech, and other problems that require that we discover high level concepts in low level data. The technical (Epistemology level) name for this uphill flowing process is &#x201C;Reduction&#x201D;, and we&#x2019;ll be using that term later after we explain what it means.</p><p>Let&#x2019;s look at what the industry is doing with their newfound toys.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-oBSh1FnXBXNkz15CtziaiA.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>This is my view, based on public sources, of what Tesla is doing in their self-driving (&#x201C;Autopilot&#x201D;) cars. Cameras feed Vision Understanding components based on Deep Learning, and Radar feeds Radar Understanding components. These supply bounding boxes in 2D or 3D, with additional information like &#x201C;There&#x2019;s a woman with a tennis racket ahead&#x201D;, to a Traffic Reasoning Component that uses regular programming or some old style AI like a rule-based system to actually control the car based on the vision and radar inputs, and the driver&#x2019;s desires.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-eb8ItglM_HPmbrmVvfXIRQ.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>But this is not the only possible configuration. George Hotz at Comma.ai, a team at nVidia corporation, and the DeepTesla class at MIT are using a simpler architecture that implements lane following and other simple driving behaviors directly in one single Deep Neural Network. There is room for improvement, but these systems are a big step in the direction we want to move in.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-t47nQ5N8ktlcGd5kTUZfLA.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>Future automotive systems will likely integrate everything about driving into one single neural network, or something that effectively behaves as one: 
Vision, traffic, the car itself (including various functionality like windscreen wipers, lights, and entertainment), how to drive in a safe and polite manner, and also the driver&#x2019;s (or &#x201C;car owner&#x2019;s&#x201D;) desires. And if we&#x2019;ve gotten that far, then it is a given that we will have speech input and output so that the driver can have a conversation with the car while driving, and can chastise it in case it does something wrong.</p><p>We are nowhere close to this today. But after a DNN breakthrough or two, who knows how quickly these kinds of systems will become available. We can already see an increasing stream of new features built using Understanding components.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-AI-Works/1-MAJvLFWqrBimuRveBhoO7A.png" class="kg-image" alt="1. Why AI Works" loading="lazy"></figure><p>This article and the next are expansions of a talk given at the BIL conference on June 10th 2017 in San Francisco.</p><p>More at <a href="https://vimeo.com/showcase/5329344">https://vimeo.com/showcase/5329344</a></p><p>A decade ago I created <a href="http://artificial-intuition.com">http://artificial-intuition.com</a>. I now have a lot more to say, but I need to split this meme-package into digestible chunks; this takes a lot of effort to get right. If you liked this article and would like to see more like it then you can support my writing and my research in many ways, small to large:</p><ol><li>Click on the &#x201C;like&#x201D; heart below to increase visibility of this article.</li><li>Subscribe to my posts here, on Facebook, Twitter, YouTube, and Vimeo.</li><li>Share my posts with someone who might want to invest in Syntience Inc. or might be otherwise interested in my research on a novel Language Understanding technology called Organic Learning. More later.</li><li>I do not receive external funding from any investors for this research. You can support my research and writing directly at <a href="http://artificial-intuition.com/donate.html">http://artificial-intuition.com/donate.html</a></li></ol>]]></content:encoded></item><item><title><![CDATA[2. Our Greatest Invention]]></title><description><![CDATA[
Model Based Problem Solving
]]></description><link>https://experimental-epistemology.ai/our-greatest-invention/</link><guid isPermaLink="false">62bb63a922c83b7e22055d7f</guid><category><![CDATA[Artificial Understanding]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Fri, 23 Jun 2017 00:00:00 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/06/1-KChGvEpwXAfZiNm-Neo3nA.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/2022/06/1-KChGvEpwXAfZiNm-Neo3nA.jpeg" alt="2. Our Greatest Invention"><p>The first blog &#x201C;<a href="https://experimental-epistemology.ai/why-ai-works">Why AI Works</a>&#x201D; provided the big picture of AI and Understanding Machines. Next we will focus on <em><strong>how to actually implement Understanding in a computer</strong></em>. But before we can attack that core issue we need to simplify the journey a bit by defining four important words and concepts. I&#x2019;ll define one in this blog, two in the next, and the concept of Reduction after that. We can then discuss the Epistemology-level algorithm for Understanding itself.</p><p>If you are already familiar with these concepts, just check the boldfaced headings and definitions below to ensure we are using these words roughly the way you use them.</p><p>You may have noticed I write certain (sometimes common) words, such as Model, with an uppercase first letter. This means I am using the word in a technical, well defined, unchanging sense. I will define all such technical terms over time and I will <em><strong>try</strong></em> not to use these terms until I have defined them. We defined 11 such terms in the first chapter, starting with Understanding and Reasoning. A dictionary of defined terms is in the works.</p><h3 id="models-are-simplifications-of-reality">Models are simplifications of&#xA0;reality</h3><p>In Epistemology and science, &#x201C;Models&#x201D; are simplifications of reality. Our rich mundane reality is too complex to lend itself directly to computation. In old TV science fiction shows, we would sometimes hear &#x201C;&#x2026; and then we fed all the information into the computer and this is what came out&#x201D;. Well, not anymore; audiences now know that&#x2019;s not how (regular) computers work.</p><p>Consider an automobile. It consists of thousands of parts, each with properties like materials, size, color, function, and sometimes complex interactions with other parts. What&#x2019;s &#x201C;all the information&#x201D; here? We can&#x2019;t just feed all of those properties and measurements and facts into a computer and expect to get &#x201C;an answer&#x201D;. We need to ask a question. And we also need to simplify the problem so that we can feed in just the facts or numbers that matter, so that our question can be answered with minimum effort. How do we do that?</p><p>We must identify or create, first in our minds, a very simple Model of a generic automobile, and use that Model for our computation. After we get the answer for the pure and simple Model case, we apply the answer, with some care, back to our complex reality where the real automobile (and the problem) exists.</p><p>What kind of Model we choose depends on our goals. As an example of a Model, Newton&#x2019;s second law states that force equals mass times acceleration: &#x201C;F = ma&#x201D;. This equation is a classical scientific Model.
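</p><p>As a toy numeric sketch of this Model in use (all figures are invented, and the car is treated as a point mass):</p><pre><code># A minimal sketch of Model use, with invented measurements.
# Newton's second law: F = ma.
mass = 1500.0         # kg, measured
acceleration = 3.0    # m/s^2, measured at some moment
speed = 20.0          # m/s, measured at the same moment

force = mass * acceleration     # F = ma, in newtons
power_watts = force * speed     # power = force x speed
power_hp = power_watts / 745.7  # one horsepower is about 745.7 W

print(round(force), "N,", round(power_hp), "hp")  # 4500 N, 121 hp
</code></pre>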
<p>As the sketch suggests, if we measure the mass, acceleration, and speed of a car, we can estimate how many horsepower the engine is delivering. To use this equation, we engineers would model (in our minds) the car as a single small &#x201C;point mass&#x201D; with all the mass of the car in that point; if we don&#x2019;t, we&#x2019;d have to worry about the car rotating and other complications.</p><p>This is how Model based science works. One or more scientists somehow derive a Model for some phenomenon. The Model is published as an equation, a formula, or a computer program. Scientists and engineers anywhere can now use this equation/program/Model, treating it as a quick shortcut that works every time, as long as they have correct input data and are competently applying the formula to a suitable problem in their reality.</p><h3 id="our-greatest-invention">Our Greatest Invention</h3><p>Model Based Problem Solving (aka Reductionism) is the greatest invention our species has ever made. The general strategy of simplifying problems before solving them must be tens of thousands of years old. In some sense, it is a prerequisite for all other inventions, including the use of fire. If you see a forest fire then you need to first imagine the utility of fire, as a Model, before you can figure out that it might be useful to carry home a burning branch.</p><p>We don&#x2019;t think of this problem solving strategy as an invention because it is already ubiquitous in our lives. We are all taught how to use Model based problem solving in school when we start solving story problems in math class. But most people never learn the names of these strategies and are missing the big Epistemology level picture; this rarely matters&#x2026; until you start working with AI, where lack of an Epistemological grounding may lead you astray into failing strategies. These blogs are an attempt to remedy that.</p><p>Na&#xEF;ve Model based methods were examined and refined into scientific methods over the past 450 years. Science is now a collection of thousands of Models that, taken together, allow science-competent people to solve problems quickly and efficiently without having to re-do all the work that scientists (like Newton) put into creating these Models in the first place. And the sum total of those Models covers many problems we want to solve scientifically, such as how to build a bridge, or travel to the moon. This reuse is what makes science so effective.</p><p>But not all sciences can benefit equally from this Model making; it works well for physics, chemistry, and most of biochemistry. As I&#x2019;m fond of saying, &#x201C;Physics is for simple problems&#x201D;. But as you get to more and more complex sciences&#x200A;&#x2014;&#x200A;as you get further away from physics and closer to <em><strong>life</strong></em>&#x200A;&#x2014;&#x200A;it gets harder to make decent Models. The Models used by, for instance, psychology, ecology, physiology, and medicine are generally more complex but also less powerful than Models in physics. Given some solid data, a physicist can compute the mass of the proton to six decimal places, but we would have a harder time predicting the number of muskrats in New England next summer because that outcome depends on millions of parameters. The life sciences base many of their Models on statistics. Statistical Models are among the weakest Models used in science. 
We use statistical Models when more powerful Models with better predictive capabilities cannot be used for complexity reasons.</p><p><strong>Models are</strong></p><ul><li><strong>hypotheses (unverified Models)</strong></li><li><strong>scientific theories (Models verified by peer review)</strong></li><li><strong>equations</strong></li><li><strong>formulas</strong></li><li><strong>complex scientific Models (simulations of climate, weather, etc.)</strong></li><li><strong>na&#xEF;ve Models (that we create to simplify our own lives)</strong></li><li><strong>computer programs</strong></li></ul><p>And what is Mathematics? It is a system that allows us to manipulate our Models to cover more cases. Mathematics is the purest, most context free of all scientific disciplines. As such, its greatest value to humanity is in its role as a helper discipline to all other disciplines. Einstein&#x2019;s famous E=mc&#xB2; Model was derived using mathematical manipulation of other Models known to Einstein at the time. But perhaps mathematics isn&#x2019;t as much a scientific discipline as an Epistemological one; I may explore this aside later.</p><h3 id="model-use-requires-understanding">Model use requires Understanding</h3><p>A good Model is context free, since that maximizes the number of contexts it can be applied in. Newton&#x2019;s second law F=ma works pretty much everywhere we have forces, masses, and accelerations. The tradeoff for this flexibility is that we ourselves need to Understand the problem domain. In rocket science, when maneuvering in space, F=ma will often work perfectly, but when you are applying it to the acceleration of your car you need to account for lots of effects like friction between the road and the wheels, wind resistance, and the like. So F=ma applied na&#xEF;vely would give you the wrong answer if friction is involved. This demonstrates the main disadvantage of Models: they require that both the Model Maker (scientists like Newton) and the Model users (STEM competent people everywhere) Understand enough about the problem domain to <em><strong>know whether the Model is applicable or not</strong></em>, and how to use it. This Understanding is the expensive part of science, since using science requires first getting a solid science education in order to avoid mistakes when using Models.</p><p>And since Models require Understanding, they cannot be used to create Understanding. This is a major problem for AI implementers.
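</p><p>To make the applicability problem concrete, here is the friction example above as a toy calculation (all figures are invented):</p><pre><code># A minimal sketch of Model applicability, with invented figures.
# F = ma is context free, but deciding which extra forces matter is not.
mass = 1500.0               # kg
acceleration = 3.0          # m/s^2
rolling_resistance = 200.0  # N, invented
air_drag = 400.0            # N, invented; grows with speed

force_naive = mass * acceleration  # fine for a spacecraft in free space
force_on_road = force_naive + rolling_resistance + air_drag

print(force_naive, force_on_road)  # 4500.0 5100.0
</code></pre><p>Nothing in the equation itself says which of these terms to include; that judgment is exactly the Understanding the Model user must supply.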

</p>]]></content:encoded></item><item><title><![CDATA[3. Two Dirty Words]]></title><description><![CDATA[Reductionism is the use of Models.
Holism is the avoidance of Models.
Models are scientific Models, Theories, Hypotheses, Formulas, Equations, Superstitions (!) and most computer programs.
]]></description><link>https://experimental-epistemology.ai/two-dirty-words/</link><guid isPermaLink="false">62bb63a922c83b7e22055d7d</guid><category><![CDATA[Artificial Understanding]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Mon, 26 Jun 2017 00:00:00 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/downloaded_images/Two-Dirty-Words/1-La9sUrgidz0-OeOx1g8e3g.jpeg" medium="image"/><content:encoded><![CDATA[<h3 id="reductionism-and-holism">Reductionism and&#xA0;Holism</h3><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Two-Dirty-Words/1-La9sUrgidz0-OeOx1g8e3g.jpeg" alt="3. Two Dirty Words"><p>After having sorted out what Models are, we can now discuss two complementary problem solving strategies (or perhaps meta-strategies). They are in many ways each other&#x2019;s opposites, but the classification can become an argument about meta-levels and definitions. I will initially pretend the division is clear and obvious, and will elaborate later.</p><h3 id="reductionism-is-the-use-of-models">Reductionism is the Use of&#xA0;Models</h3><p>In this blog series we will use exactly the above definition of the word &#x201C;Reductionism&#x201D;. If you look up the definition elsewhere you may find that some sources divide the strategy into sub-strategies. They also seem to miss the most important sub-strategy, which we&#x2019;ll discuss later. But what all these sub-strategies have in common is that they all provide ways to simplify observations of fragments of our rich mundane reality into much simpler Models, which we use for reasoning, computation, and sharing.</p><p>Reductionism is so central to how we do science&#x200A;&#x2014;&#x200A;the heavy reliance on Models, such as theories, equations and formulas, in physics, chemistry, etc.&#x200A;&#x2014;&#x200A;that we can speak of &#x201C;Model Based sciences&#x201D; or &#x201C;Reductionist sciences&#x201D; where such Model Making is easy and effective. And this classification excludes those sciences, like psychology, where such Model making is difficult and less often rewarded with reliable results.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Two-Dirty-Words/1-La9sUrgidz0-OeOx1g8e3g.jpeg" class="kg-image" alt="3. Two Dirty Words" loading="lazy"></figure><p>After considering all the advantages of Models we might wonder why we even bother discussing it. To many people, especially those with a solid STEM (Science, Technology, Engineering, and Mathematics) education, it may well look like the only choice. But there is also the other strategy:</p><h3 id="holism-is-the-avoidance-of-models">Holism is the Avoidance of&#xA0;Models</h3><p>This is where the questions start. This is where the paradoxes surface. This is where your worldview may get shaken up. Seriously. Especially if you are a scientist or engineer with a solid STEM education and decades of professional success using science and Models.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Two-Dirty-Words/1-H24wbmnXLfj6fjQyBVA3Rg.jpeg" class="kg-image" alt="3. Two Dirty Words" loading="lazy"></figure><p>In some sense, the goal of this entire blog series is to demonstrate that we need to use both problem solving strategies when creating our artificial Intelligences. Because that is what it is going to take. 
We need Holistic Understanding; we established that in the first chapter. As a sample of the new ideas that we will have to deal with, I will just mention:</p><ul><li>Reasoning is Reductionist</li><li>Understanding is Holistic</li><li>Neural Networks are Holistic</li><li>Holistic systems can jump to conclusions on scant evidence</li><li>Holistic systems can themselves know what is important and what isn&#x2019;t</li><li>Holistic systems can solve problems we ourselves cannot (or don&#x2019;t care to) Understand</li><li>Holistic systems are &#x201C;Model Free&#x201D;. They do not use any a priori Models of any problem domain</li><li>Reasoning systems inherit all problems and benefits of Reductionism</li><li>Understanding systems inherit all problems and benefits of Holism</li><li>Humans are born Holistic</li><li>Humans each solve thousands of little problems every day, and we are solving almost all these problems Holistically, using Understanding, and without a need to Reason at all. This includes fluent language use.</li><li>A STEM education instills a strict Reductionist discipline in order to mitigate problems with the fallibility of Holistic human minds</li><li>All intelligences are fallible</li></ul><p>These claims all deserve individual treatments, and we&#x2019;ll get to all of them in later blogs. But the major theme is clear:</p><h3 id="humans-are-mainly-holistic-problem-solvers-the-same-must-be-true-for-our-artificial-intelligences">Humans are mainly Holistic problem solvers. The same must be true for our Artificial Intelligences</h3><p>We had several reasons for focusing on Reductionist Methods (Models) and Reasoning during the first 60 years of AI. Our computers were too small to make Neural Networks work at all. But there were also ideological reasons. AI was born out of the math and computer science departments of our universities. And therefore most of the people working on AI were solidly oriented towards the goal of creating a logic based, Reductionist, infallible artificial mind. To build early AIs, like expert systems, we entered rules or programmed in lots of facts to Reason about. But this was building Reductionist castles in the air, composed of un-anchored facts that didn&#x2019;t tie to any Understanding whatsoever. The troubles with classical AI, such as brittleness (the tendency to make spectacular and expensive mistakes at the edges of their competence), can be directly traced to the lack of foundational Understanding to support these attempts at Reasoning.</p><p>Understanding Machines will not suffer from this brittleness, but will fail gracefully at the edges of their competence, much like humans. Most of the time they will &#x201C;know&#x201D; the answer; beyond that they will guess. And the guesses they make are based on a lifetime of experience (gained through learning from a large corpus), and so they have a good chance of being at least a workable choice, if not perfect.</p><h3 id="how-can-anyone-solve-problems-without-using-models">How can anyone solve problems without using&#xA0;Models?</h3><p>A lot of people coming from a STEM background cannot even imagine how to solve problems without using Models. But it&#x2019;s not hard, once you understand the difference. Mostly it&#x2019;s a matter of doing what worked last time. The problem is now figuring out whether we are in a situation that&#x2019;s similar enough that it will work again. This is mostly a pattern matching problem. More later.</p><p>What&#x2019;s the result? 
The Holistic answer is a quick guess at the best action, based on experience with similar situations. Most of the time it&#x2019;s correct, sometimes it&#x2019;s a little wrong, and every now and then, there&#x2019;s a noticeable mistake. And if we get things a little wrong, we may notice the outcome and correct the action. We learn from our mistakes. If we practice something a lot we will start doing it effectively and perfectly every time. Do we learn faster if we make more mistakes? Should we make mistakes on purpose? More later.</p><p>In situations where you cannot use Models, which are more common than many realize, the Holistic guess may also be your <em><strong>only</strong></em> option. Conversely, if you have an adequately well working Model based solution, just use that.</p><p>My video &#x201C;<a href="https://vimeo.com/showcase/5329344/video/36214970">Model Free Methods Workshop</a>&#x201D; demonstrates how a workshop group solves four different problems, at a high level, using both Reductionist and Holistic Methods.</p><h3 id="why-are-these-dirty-words-">Why are these &#x201C;Dirty&#xA0;words&#x201D;?</h3><p>Well, they are not dirty to Epistemologists.</p><p>Reductionism has been the default problem solving paradigm because it&#x2019;s the one that has to be taught. We are born with a Holistic problem solving apparatus. But Reductionist science doesn&#x2019;t come naturally; therefore it has to be taught in schools, practiced, and carried out according to certain rules. Perhaps that&#x2019;s why the sciences are called &#x201C;disciplines&#x201D;, because following the ideal scientific method requires practice and constant vigilance.</p><p>J. C. Smuts&#x2019; book &#x201C;Holism and Evolution&#x201D; (1926) established the terminology in the Epistemological literature. And Erwin Schr&#xF6;dinger wrote &#x201C;What is Life&#x201D; (1944), questioning the power of physics to provide useful explanations to the life sciences. Pirsig wrote &#x201C;Zen and the Art of Motorcycle Maintenance&#x201D; (1974), which contrasts something very Holistic, Zen Buddhism, with something very Reductionist, Motorcycle Maintenance. So the chasm between the strategies was identified a long time ago.</p><p>The strategies are each other&#x2019;s opposites. Holism based strategies for Understanding can handle many important kinds of complexity and can quickly provide a guessed answer, but these guesses are fallible, and often more expensive to compute. Reductionist education and strategies brought the benefits of cheap Model reuse and formal rigor to improve correctness, but they cannot handle complexity and are therefore dependent on an external Understander to determine applicability in real-world complexity-rich situations. And as part of that education, we are told that Holistic methods (such as jumping to conclusions on scant evidence) are bad&#x2026; in spite of the fact that our brains use Holistic methods thousands of times each day to successfully Understand the environment we live in.</p><p>We can all use either strategy as appropriate; if we don&#x2019;t have a STEM education we will still sometimes make na&#xEF;ve Models. But sometimes there is a choice and different people may prefer one or the other. When playing pool, some people estimate and compute bouncing angles and some people shoot &#x201C;by feel&#x201D;.</p><p>But we have our preferences, and it might be tempting to label a person with an overly strong preference as &#x201C;a Holist&#x201D; or &#x201C;a Reductionist&#x201D;. 
This is sometimes received badly, if perceived as a limitation. Some dictionaries even flag &#x201C;Reductionist&#x201D; as derogatory; and yet, some people use it as a self-assigned label. I try to use these terms only as shorthand for &#x201C;a person with a stated strong preference for Holistic (or Reductionist) Methods&#x201D;.</p><p>The two terms were very useful in Epistemology. But then someone invented the concept of &#x201C;Holistic Medicine&#x201D;: Instead of just treating a single medical problem, you analyzed the patient&#x2019;s entire situation, attempting to account for diet, exercise, sleep, work, habits, stress levels, allergies, family, friends, and environmental poisons. A good idea, in general. But the wide scope was unmanageable by the (traditionally Reductionist) medical establishment and the idea faded away. Instead, <em><strong>the whole idea of Holism became tainted as woo-woo</strong></em> when the term &#x201C;Holistic Medicine&#x201D; became associated with woo-woo merchants selling crystals and aromatherapy.</p><p>As explained above, &#x201C;Holism is the avoidance of Models&#x201D;, or better phrased, &#x201C;Holism is the meta-strategy of avoiding a priori Models of the problem domain&#x201D;. That extra precision rarely matters. There&#x2019;s nothing woo-woo about it. It does say &#x201C;science not required&#x201D;, but&#x2026;</p><h3 id="you-can-make-breakfast-without-reasoning">You can make breakfast without Reasoning</h3><p>It is important to note that Holistic methods are based on a lifetime of experience (in humans) and a training corpus&#x2019; worth of experience (in Neural Networks). When you are making breakfast, you are relying on this experience, mostly repeating whatever worked yesterday.</p><p>Some people claim they use Reasoning while making breakfast&#x2026; but they can make their breakfast while speaking to someone else on the phone and as they hang up, they find themselves suddenly sitting at the breakfast table with their coffee and hot oatmeal. Same thing when driving to work; you may get lost in thought, and then you find yourself parked at work. You didn&#x2019;t need to Reason, since all subproblems that occur in driving had been solved multiple times during years of driving. Subconscious Understanding is used for simple things like sequencing our leg muscles as we walk; you have no idea how you are doing that, it just works. Same thing with vision. You Understand that you are looking at a chair, but you do not have conscious access to the fifteenth rod/cone/pixel to the left of your center of vision, and have no idea how this Understanding works. Same thing with understanding and generating language. You do not have any explanation for how you are able to understand the meaning of this sentence. Understanding is Subconscious and Holistic.</p><p>So for the majority of things we do every day, we do <strong>not</strong> need Reasoning or Reductionist Methods. Some people would like to think they are &#x201C;logical thinkers&#x201D;, immune to most <a href="https://en.wikipedia.org/wiki/List_of_fallacies">cognitive fallacies</a>, but whether they are or not, at the lower levels, everyone is solving most of their problems Holistically.</p><p>I claimed that &#x201C;Reductionist Reasoning requires Holistic Understanding&#x201D;. 
In other words, I need to Understand the problem domain at hand before I can create and/or re-use Models to enable me to Reason about the domain.</p><p>So Holistic Understanding is much more important than Reductionist Reasoning because it is the most used strategy, by far, and the former is also a prerequisite for the latter. But the fallibility of Holistic Understanding forced us to create Reductionist science and to teach it in STEM educations. It is as if the purpose of science is to keep Holistic guessing in check. But this aversion to fallibility has a cost, because it means complexity-bound and &#x201C;irreducible&#x201D; problems cannot be solved. Like language Understanding, global resource allocation, and social interactions.</p><p>Reductionism and Model based science appeared around 1650 after a century of gestation. Excluding minor romantic interludes, it has held its position as the dominant paradigm for about 400 years. This is changing.</p><h3 id="the-reductionist-train-is-running-out-of-track">The Reductionist train is running out of&#xA0;track</h3><p>The remaining hard problems facing humanity are problems of irreducible complexity in domains where Reductionist Model Based Methods simply cannot work. Whether we like the idea or not, we need to accept these Holistic methods into our AI toolkits, starting now. We will use these methods either in their raw form, as Model Free Methods, or as Understanding Machines at any level from component to robot co-worker.
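</p><p>As a toy illustration of &#x201C;doing what worked last time&#x201D; (a nearest-neighbor sketch of my own; the situations and actions are invented), Model Free problem solving can be as simple as reusing the remembered action from the most similar past situation:</p><pre><code># A minimal sketch of experience-driven, Model Free action selection.
# No theory of the domain; just stored experience and similarity matching.
experience = [
    # (situation features, action that worked)
    ((0.9, 0.1), "brake"),
    ((0.2, 0.8), "accelerate"),
    ((0.5, 0.5), "coast"),
]

def dissimilarity(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def act(situation):
    # Jump to a conclusion on scant evidence: a fallible guess,
    # usually workable, occasionally noticeably wrong.
    _, action = min(experience, key=lambda e: dissimilarity(e[0], situation))
    return action

print(act((0.8, 0.2)))  # "brake", by resemblance rather than by Reasoning
</code></pre><p>A Deep Neural Network does something far richer than this table lookup, but the Epistemological character is the same: the answer comes from accumulated experience, not from an a priori Model of the domain.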

</p>]]></content:encoded></item><item><title><![CDATA[4. Reduction]]></title><description><![CDATA[Epistemic Reduction is a process that discovers higher level abstractions in lower level data by discarding everything at the lower layer that it recognizes as irrelevant]]></description><link>https://experimental-epistemology.ai/reduction/</link><guid isPermaLink="false">62bb63a922c83b7e22055d80</guid><category><![CDATA[Artificial Understanding]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Sun, 09 Jul 2017 00:00:00 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/06/Artificial-Understanding-Background.png" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/2022/06/Artificial-Understanding-Background.png" alt="4. Reduction"><p>We have seen the <a href="https://experimental-epistemology.ai/our-greatest-invention/">power of Models</a>. We have introduced the two problem solving meta-strategies of <a href="https://experimental-epistemology.ai/two-dirty-words/">Reductionism and Holism</a>. We also noted that the creation and use of Models requires an Intelligent agent that Understands the problem domain&#x200A;&#x2014;&#x200A;someone or something has to perform the Reduction. I will now discuss Reduction in some detail.</p><p>Until 2012, only humans and other animals with brains could perform Reduction. Now our Deep Neural Networks (DNNs) can perform limited Reduction. How do brains and DNNs accomplish this, and how can we improve these algorithms?</p><p>This may be, to some readers, the most rewarding blog in this series because it provides you the opportunity to learn a new and useful skill. Most people never think about the world at this level. Knowledge of Reduction provides a new point of view that you can use to better understand your environment, other intelligent agents around you, and modern AI systems.</p><h3 id="definition-of-reduction">Definition of Reduction</h3><blockquote>Reduction is a process that discovers higher level abstractions in lower level&#xA0;data</blockquote><p>We will initially note that Reduction is exactly the same as Abstraction. Why do we need a new word? Because the term &#x201C;Abstraction&#x201D; is mostly used by scientists already operating in a pure Model space, seeking a higher level of abstraction in that space. But to them, &#x201C;Abstraction&#x201D; is something that just magically happens in their heads, since there are no scientific theories for how Abstraction works. There cannot be, since Abstraction is a concept in Epistemology, not Science.</p><p>AI researchers are starting from something much closer to our rich mundane reality, where there is a lot of confounding context. We are solving the meta-problem of how to move from there into a space that is sufficiently abstract to solve the problem at hand. Here, Reduction is a much more appropriate term. We can&#x2019;t abstract a red pixel or the letter &#x201C;b&#x201D;, but we can Reduce a rich context containing that pixel or letter into a higher level concept.</p><h3 id="we-are-swimming-in-reduction">We Are Swimming In Reduction</h3><p>Paradoxically, one of the hardest things about teaching Reduction is that we don&#x2019;t see the need to learn about it because we all do it all the time, every millisecond, and the resulting Reductions (Models) become available to our conscious minds as if &#x201C;by magic&#x201D;. 
Brains Reduce away 99.999% of their sensory input, but this process is subconscious and hence invisible to us. The situation is much like (supposedly) a fish swimming in water. We are all masters of Reduction, but we don&#x2019;t know how we do it or that we even do it. We didn&#x2019;t know this would ever matter. And generally, it doesn&#x2019;t.</p><p>Well, it matters in Epistemology, and it matters in AI, since we need to actually implement that &#x201C;magic&#x201D;. We as Epistemologists must know how Abstraction is actually performed. And we give the Epistemology level equivalent of Abstraction the name &#x201C;Reduction&#x201D; because that&#x2019;s the recipe for how to accomplish it. We Reduce our rich mundane Reality by discarding (reducing away) what&#x2019;s irrelevant. And by using the name &#x201C;Reduction&#x201D; we (as AI Epistemologists) keep reminding ourselves how it is properly done.</p><p>Consider the following descriptions of a car. The slide is meant to be read from the bottom up, to match &#x201C;abstraction levels&#x201D; from low to high.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Reduction/1-NxMMMBDa53WrdqqFPdQH8g.jpeg" class="kg-image" alt="4. Reduction" loading="lazy"></figure><p>If I&#x2019;m driving to work, I&#x2019;d better be driving <em><strong>my</strong></em> car. If the police are looking for a stolen car, they would be looking for a Red 2010 Toyota Celica. If I&#x2019;m buying a new car then I might be looking for just a new Toyota Celica. And a self-driving car would likely only need to Understand whether an obstacle is a vehicle or not, in order to Model maximum speed for future movement.</p><p>We see that we want to pick the appropriate level of abstraction to deal with the same object (or topic) in different situations. But more importantly, we see that we can get from a more detailed description (at the bottom) to a more generic one (higher up) by simply discarding some detail.</p><p>I hasten to point out that Reduction is more complicated than this simple example of decreasing specificity shows, but we need to start somewhere and this image allows us to form intuitions that will serve for a while. True Reduction involves operations like shifting from syntax to semantics or from instance to type; the appearance of &#x201C;car&#x201D; as an abstraction of &#x201C;Toyota&#x201D;, and the step from &#x201C;My Toyota&#x201D; to &#x201C;A Toyota&#x201D;, illustrate these steps. Algorithms for these things are known.</p><h3 id="salience">Salience</h3><p>Part of the trick is to know what to discard. At each level of abstraction, something can typically be identified as the least important property. &#x201C;Red&#x201D; and &#x201C;Celica&#x201D; are more significant than &#x201C;2010&#x201D; for anyone looking for a car. If we had started from &#x201C;My red 2010 Toyota truck&#x201D;, then the word &#x201C;truck&#x201D; would not be discarded until the top level. Reduction requires Understanding what&#x2019;s relevant; in Reduction we &#x201C;keep that which is Salient&#x201D;. More later.</p><h3 id="partial-reductions">Partial Reductions</h3><p>Most of the time we do NOT perform Reduction all the way to Models. I cannot stress this enough. We discuss Reduction to Models for pedagogical reasons; it is easy to initially see the context-free Model as the goal of Reduction. 
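</p><p>As a toy sketch of the car example above (with the salience judgments hard-coded by hand, precisely because knowing what to discard is the part that requires Understanding), each Reduction step simply discards the detail that the task treats as irrelevant:</p><pre><code># A minimal sketch of Reduction as discarding irrelevant detail.
# The salience decisions are hand-coded; supplying them is the hard part.
def reduce_once(description, irrelevant):
    # One Reduction step: keep only what is still Salient for the task.
    return [d for d in description if d not in irrelevant]

car = ["my", "red", "2010", "Toyota", "Celica"]
police_level = reduce_once(car, {"my"})                     # an instance, not MY instance
shopper_level = reduce_once(police_level, {"red", "2010"})  # a type: Toyota Celica
obstacle_level = ["vehicle"]                                # top level: only the class

print(police_level, shopper_level, obstacle_level)
</code></pre><p>Each intermediate level in this sketch is a perfectly good stopping point. 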
In reality, in brains, we can stop Reducing the moment we recognize that we have a working answer or response, such as a command to contract some muscle or having Understood the meaning of a sentence subconsciously. At this point, there is still some residual context but we use that context productively rather than discard it to move to higher levels.</p><p>Some people claim we use Models for all our thinking, but I&#x2019;m using (in these blogs) &#x201C;Model&#x201D; only to describe a completely context free Abstraction. F=ma is an example of that. There is no need to check whether a car is a Red car or a Toyota, the equation works not only for all cars but for all forces, masses and accelerations. We might come up with a special equation for acceleration of Tesla cars, which would require different inputs like battery charge level and software settings; that would not be a context free Model since it would not work on a Toyota. For almost all tasks&#x200A;&#x2014;&#x200A;basically, in everything except Science, and even there, only rarely&#x200A;&#x2014;&#x200A;we only perform as much Reduction as is necessary to get the job done. When learning to ski, you only figure out how you yourself need to perform given your body and equipment; we do not need to parameterize our skiing skills for someone with twice the body mass, because that would be useless to us for the purpose of our own skiing. But a scientist would have to go that far, in order to parameterize away one more piece of context from the Model they are creating. For instance, when creating a skiing video game or designing a new ski.</p><p>If we consider the enormous amount of subconscious activity that happens in the brain we can safely say that partial Reductions are the most common Reductions. For instance, when we take a step forward, our subconscious has analyzed our posture and velocity by using Reduction based on low level nerve signals and is commanding leg muscles to contract in a precisely timed sequence. This activity is something we are unaware of; most of us don&#x2019;t even know what leg muscles we have. And there would be no time to perform Reduction all the way to Models. That process takes <a href="http://www.oxfordreference.com/view/10.1093/oi/authority.20110803100104529">a minimum of a half second</a> and you don&#x2019;t have that kind of time available to respond to an imbalance when walking or skiing.</p><h3 id="reduction-in-society">Reduction in&#xA0;Society</h3><p>Most of us get paid to Understand whatever we need to Understand in order to perform our jobs. In other words, most of us get paid to do Reduction. If you are approving building permits, you Reduce a stack of forms to a one-bit verdict of &#x201C;Approved&#x201D; or &#x201C;Rejected&#x201D;. We excel at Reduction, and this is the main reason most of us haven&#x2019;t been replaced by robots. But we see that when future Understanding Machines can perform Reduction by themselves, then we are unlikely to get paid for it.</p><h3 id="levels-of-reduction">Levels of Reduction</h3><p>Suppose a young man and a young woman fall in love, something happens to mess it all up, and then they sort this out and re-unite.</p><p>This is what happened in the man&#x2019;s &#x201C;rich mundane reality&#x201D;. 
Suppose the man wants to share this experience, because there was some moral to the story that he thinks would be interesting to others and possibly important.</p><p>He could analyze what happened and figure out which were the key events in the saga and then have actors on a stage re-enact the story as a play. This is a Reduction because the boring parts of the story would not be part of the play. They are discarded as irrelevant. But the story would be acted out by real people in front of a live audience. If you are in the audience, you can move your head to see behind any actor on the stage and you can clearly see everything on the stage, not just one actor speaking at a time.</p><p>He could make a movie about it. Now your point of view is predefined by the camera angle and cropping. You can no longer see behind an actor, and you can often only see those actors that are involved in the main action.</p><p>He could write a book about it. We can no longer see even the people described in the book, except in our imagination.</p><p>A critic reviewing the theatre play may Reduce it to &#x201C;Boy meets girl, boy loses girl, boy gets girl&#x201D;.</p><p>A drama school graduate may summarize it as &#x201C;A double-reversal plot&#x201D;. This is a description that is so free from context (it doesn&#x2019;t even specify boys or girls) that it could be argued it qualifies to be called &#x201C;A Model&#x201D;.</p><p>Plays, Movies, Books, Stories, Tropes, etc. are all Partial Reductions of Reality, and some are more Reduced than others. Just like in the My Red Toyota case, we need to find the appropriate level of Abstraction to work with.</p><p>The young man in the example, when writing a book or a screenplay, has much in common with a Scientist trying to describe something in nature in a re-usable context free manner by Reducing it to a Model. They are Model Makers, or are at least performing partial Reduction. They are discarding the irrelevant bits.</p><h3 id="the-opposite-of-reduction">The Opposite of Reduction</h3><p>We also need to be able to move in the opposite direction, from Models to Reality. Or at least from more abstract Partial Models to Partial Models closer to reality.</p><p>When an actor is given a screenplay, they know it only contains rough directions for what to do and what lines to say. The actor&#x2019;s job is to &#x201C;give a little of themselves&#x201D; to flesh out the screenplay to actual actions, including creating&#x200A;&#x2014;&#x200A;synthesizing&#x200A;&#x2014;&#x200A;the appropriate display of emotions, tone of voice, and body language. They use their experience as people and as actors; they use elements of their past lives and skills they have acquired by training to create something people in the audience might relate to. For example, they may re-purpose a personal experience (&#x201C;be as sad as when my hamster died&#x201D;), things they learned in drama school (speaking, singing, dancing, swordplay), from other actors (&#x201C;What would Bogart do&#x201D;), from fiction, from other movies and plays, etc.</p><p>The actor&#x2019;s art is to convey whatever the script intends to convey&#x200A;&#x2014;&#x200A;emotions, a morality cookie, a political position, titillation, surprise, etc.&#x200A;&#x2014;&#x200A;starting from a simple Model (the screenplay). Their job is similar to an engineer&#x2019;s when they are faced with a problem and use a Model to solve it. The engineer would use their experience to decide that &#x201C;m&#x201D; is the mass of the car and not the tire pressure. 
The actor decides that &#x201C;sadness&#x201D; is more appropriate than &#x201C;grief&#x201D; for a certain scene, etc.</p><p>I call this process, which is the opposite of Reduction, by the name used for it in problem solving: &#x201C;Application&#x201D;. We use a Model to simplify a problem situation, moving it into an abstract and purer &#x201C;Model space&#x201D;. We solve the problem there (by performing math, perhaps) and then &#x201C;Apply&#x201D; the answer to our rich reality, to the problem we are trying to solve.</p><p>Many of you may recognize the word &#x201C;Application&#x201D; or its abbreviation &#x201C;App&#x201D;. That&#x2019;s not as far-fetched as it might seem; apps are software based Models.</p><h3 id="reduction-and-application-in-brains">Reduction and Application in&#xA0;Brains</h3><p>Back to the issue of Partial Reductions. Consider the actor reading a screenplay. They are using their eyes to gather &#x201C;pixels&#x201D; of color and orientation. The brain then performs pattern matching&#x200A;&#x2014;&#x200A;Reduction&#x200A;&#x2014;&#x200A;from these low level signals to letters, to words, to language, to higher level concepts like love and separation, and eventually to a high level Understanding of the playwright&#x2019;s intents. The actor then takes this high level Understanding and, by performing Application, adds their own Experience to the script, to get closer to reality in their performance.</p><p>Our brains are capable of moving up and down many levels of abstraction at once. Perhaps they track all of them simultaneously, keeping the layers of abstraction separate. This is a clue for why Deep Neural Networks perform better than shallow ones. Which is what we&#x2019;ll discuss next.

</p>]]></content:encoded></item><item><title><![CDATA[5. Why Deep Learning Works]]></title><description><![CDATA[
Deep Learning Performs Epistemic Reduction
]]></description><link>https://experimental-epistemology.ai/why-deep-learning-works/</link><guid isPermaLink="false">62bb63a922c83b7e22055d7b</guid><category><![CDATA[Artificial Understanding]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Sat, 24 Feb 2018 00:00:00 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/06/1-_pf5V8u_UCJpWXTN6By4hA.png" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/2022/06/1-_pf5V8u_UCJpWXTN6By4hA.png" alt="5. Why Deep Learning Works"><p>A math-free, computer-science-free description of why Deep Learning works.</p><p>We have now built a base of theory for <a href="https://experimental-epistemology.ai/why-ai-works/">Why AI Works</a>, <a href="https://experimental-epistemology.ai/our-greatest-invention/">what Models are and how to create them</a>, <a href="https://experimental-epistemology.ai/two-dirty-words/">what Reductionism and Holism are</a>, and <a href="https://experimental-epistemology.ai/reduction/">what the process of Reduction is</a>. These are the fundamentals of AI Epistemology. This base allows us to discuss various strategies to move towards Understanding Machines in a well understood and controlled manner.</p><p>We are now ready to discuss why Deep Learning (DL) works. This is the fifth and last entry in the AI Epistemology Primer.</p><h3 id="deep-learning-performs-reduction">Deep Learning Performs Reduction</h3><p>This is an unsurprising claim, considering the preceding chapters. There are several mutually compatible <a href="http://www.deeplearningbook.org/version-2015-10-03/contents/manifolds.html">theories</a> for &#x201C;how&#x201D; Deep Learning works. But just as in the first chapter, we will now discuss the Epistemological aspects, &#x201C;why&#x201D; it works, from several viewpoints and levels, starting from the bottom. We will use examples from the TensorFlow system and its API as a stand-in for the whole Deep Learning family of algorithms and programs, because the available API functions heavily shape and constrain the solutions that can be implemented in this space, and the generalization is straightforward enough.</p><p>Consider the following illustration of image Understanding using Keras (an excellent abstraction layer on top of TensorFlow):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-Deep-Learning-Works/1-_pf5V8u_UCJpWXTN6By4hA.png" class="kg-image" alt="5. Why Deep Learning Works" loading="lazy"><figcaption>http://cv-tricks.com/tensorflow-tutorial/keras/</figcaption></figure><p>I like to refer to the input layer as being &#x201C;on the bottom&#x201D; rather than at the far left as in this image. Viewed my way, the low-to-high dimension in my rotated version of the above maps mentally to a low-to-high stack of abstraction levels; I&#x2019;m not the only one using this dimension this way. I hope the rotation isn&#x2019;t too confusing.</p><p>We can see that there is an obvious data Reduction and an obvious complexity Reduction. Can we determine whether the system is also performing what I&#x2019;d like to call &#x201C;Epistemic Reduction&#x201D;: Is it reducing away that which is unimportant, and if so, how does it accomplish this? 
How does an operator in a Deep Learning stack know what makes something important (Salient)?</p><p>A pure data &#x201C;reduction&#x201D; of sorts could be accomplished by compression schemes or even random deletion. This is undesirable. We need to discard the non-salient parts so that in the end, we are left with what is salient. Some people have not understood the importance of Salience based Reduction and use the <a href="http://prize.hutter1.net">lossless compression power of reversible algorithms as a measurement of intelligence</a>, which is no more useful than believing a simple video camera can understand what it sees.</p><p>So let me conjure up, a bit like in the movie &#x201C;Inside Out&#x201D;, a fairytale of what goes on in a Deep Learning network, except we&#x2019;ll do it &#x201C;Bottom Up&#x201D;. Suppose we have built a system for finding faces in an image, with the intent of incorporating that as a feature in a camera; many cameras have this feature already, so this is not a far-fetched example. We implement an image understanding neural network, show the system many kinds of images for a few days, perhaps using so-called supervised learning in order to improve this story, and then we show it an image of a family having a picnic in a park and ask the system to outline where the faces are so that the camera can focus sharply on them.</p><p>The input image is converted from RGB color values to an input array, and the data in this array is then shuffled through many layers of operators. For many of these layers, there are fewer outputs than there are inputs, as you can see in the above diagram, which means some things have to be discarded by the processing. Each layer receives signals &#x201C;from below&#x201D;, i.e. from the input or from lower levels of abstraction, and produces some reduced output to send to the next layer operator above.</p><p>To continue the tale, at some early level, some operator is given a few adjacent pixels and determines that there is a vertical, slightly curved line dividing a darker green area from a lighter green area, so it &#x201C;tells&#x201D; the operator above about this simpler line/color based description, using some encoding we don&#x2019;t really care about.</p><p>The operator at the level above might have gotten another matching curve and says &#x201C;these match what I saw a lot of when &#x2018;blade of grass&#x2019; was given as a ground truth label during (supervised) learning&#x201D;. If no label is known, then the signal is again some internal representation we don&#x2019;t care about. It is OK to propagate results without human-labeled signals because whatever signaling scheme is used will be learned by the level above.</p><p>The operator above that says &#x201C;when I get lots of blades-of-grass signals, I reduce all of that to a &#x2018;lawn&#x2019; signal as I send it upward&#x201D;.</p><p>And eventually we reach the higher operator layers, and someone there says &#x201C;We are a face finder application. We are completely uninterested in lawns&#x201D; and discards the lawn as non-salient.</p><p><strong>What remains after you discard all non-faces are the faces.</strong></p><p>You cannot discard anything until you know what it is, or can at least estimate whether it&#x2019;s worth learning. Specifically, until you understand it at the level of abstraction you are operating at. 
The low level blade-of-grass recognizers could not discard the grass because they had no clue about the high level saliencies of lawn-or-not and face-or-not that the higher layers specialize in.</p><p>You can only tell what&#x2019;s Salient or not (important or not) at the level of Understanding and Abstraction you are operating at. Each layer receives &#x201C;lower level descriptions&#x201D; from below, discards what it recognizes as irrelevant, and sends its own version of &#x201C;higher level descriptions&#x201D; upward until we reach someone who knows what we are really looking for.</p><p>This is of course why Deep Learning is deep.</p><p>This idea itself is not new. It was discussed by Oliver Selfridge in 1959; he described an idea called &#x201C;Pandemonium&#x201D; which was largely ignored by the AI community because of its radical departure from the logic based AI promoted by people like John McCarthy and Marvin Minsky. But Pandemonium presaged, by almost 60 years, the layer-by-layer architecture with signals passing up and down that is used today in all Deep Neural Networks. This is the reason my online handle is @pandemonica.</p><p>&#x2014; * &#x2014;</p><p>So do any TensorFlow operators support this Reduction?</p><p>Let&#x2019;s start by examining the Pooling operators; there are a few in the diagram, and they are conceptually simple. There are over 50 pooling operators in TensorFlow. One of them is the &#x201C;2x2 Max-Pool&#x201D; operator. In this diagram, it is used four times:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Why-Deep-Learning-Works/1-uE90WxyeoIMjm028c4F1tQ.png" class="kg-image" alt="5. Why Deep Learning Works" loading="lazy"><figcaption>http://bakeaselfdrivingcar.blogspot.com/2017/08/project-2-traffic-sign-recognition-with.html</figcaption></figure><p>It is given four inputs with varying values and propagates the highest value of those as its only output. Close to the input layer, these four values may be four adjacent pixels whose values might be brightness in some color channel, but higher up they mean whatever they mean. In effect, the 2x2 Max-Pool discards the &#x201C;least important&#x201D; 75% of its input data, preserving and propagating only one (highest) value.</p><p>In the case of pixels, it might mean the brightest color value. In the case of blades of grass, it might mean &#x201C;there is at least one blade of grass here&#x201D;. The interpretation of what is discarded depends on the layer, because in a very real sense, layers represent levels of Reduction; Abstraction levels, if you prefer that term.</p><p>And we should now clearly see one of the most important ideas in Deep Neural Networks: Reduction has to be done at multiple levels of abstraction. Each set of decisions about what is reduced away as irrelevant and what is kept as possibly relevant can only be made at an appropriate abstraction level. We cannot yet abstract away the lawn if all we know about is dark and light green areas.</p><p>This is a simplification; decisions made in this manner will be heeded only if they have contributed to positive outcomes in learning. Unreliable and useless decision makers will be ignored using any of several mechanisms that we may apply during learning. More later.</p>
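<p>To make the Max-Pool Reduction above concrete, here is a minimal sketch, assuming nothing beyond NumPy, of what a single 2x2 Max-Pool does: it keeps one value out of every four and discards the other three as less salient.</p><pre><code class="language-python">import numpy as np

def max_pool_2x2(x):
    # Reduce a (2h, 2w) array to (h, w) by keeping the largest value
    # in each non-overlapping 2x2 block: 75% of the input is discarded.
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x.reshape(h, 2, w, 2).max(axis=(1, 3))

pixels = np.array([[1, 3, 2, 1],
                   [4, 2, 0, 1],
                   [5, 1, 9, 2],
                   [0, 2, 3, 4]])

print(max_pool_2x2(pixels))
# [[4 2]
#  [5 9]]
</code></pre>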
<p>For now, we continue by examining the most popular subset of all TensorFlow operators &#x2013; the Convolution family. From the TensorFlow Manual:</p><blockquote>Note that although these ops are called &#x201C;convolution&#x201D;, they are strictly speaking &#x201C;cross-correlation&#x201D;</blockquote><p>Convolution layers discover cross-correlations and co-occurrences of various kinds. Co-occurrences with known patterns at various locations in the image; spatial relationships within the image itself, like Geoff Hinton&#x2019;s recent example of the mouth normally being found below the nose; and, more obviously, in the supervised learning case, correlations between discovered patterns and the available meta-information (tags, labels). This is what allows an Image Understander to tag the occurrence of a nose in an image with the text string &#x201C;nose&#x201D;. Beyond this, such systems may learn to Understand concepts like &#x201C;behind&#x201D; and &#x201C;under&#x201D;.</p><p>The information that is propagated to the higher levels in the network now describes these correlations. Uncorrelated information is viewed as non-salient and is discarded. In the diagram above, this discarding is done by a max pooling layer after the convolution+ReLU layers. ReLU is a kind of layer operator that discards negative values, introducing a non-linearity that is important for DL but not really important for our analysis.</p><p>This pattern of three layers&#x200A;&#x2014;&#x200A;convolution, then ReLU, then a pooling layer&#x200A;&#x2014;&#x200A;is quite popular because this combination performs one reliable Reduction step. These three layer types in this &#x201C;packaged&#x201D; sequence may appear many times in a DL computational graph. And each of these three-layer packages is reducing away things that levels below had no chance of evaluating for saliency because they didn&#x2019;t &#x201C;Understand&#x201D; their input at the correct level.</p><p>Again,</p><p><strong>This is why Deep Learning is deep&#x2026; Because you can only do Reduction by discarding the irrelevant if you Understand what is relevant and irrelevant at each different level of Abstraction.</strong></p><h2 id="is-deep-learning-science-or-not">Is Deep Learning Science or&#xA0;not?</h2><p>While the Deep Learning process can be described using mathematical notation, mostly using Linear Algebra, the process itself isn&#x2019;t Scientific. We cannot explain how this system is capable of forming any kind of Understanding by just staring at these equations, since Understanding is an emergent effect of repeated Reductions over many layers.</p><p>Consider the Convolution operators. As the TF manual quote clearly states, Convolution layers discover correlations. Many blades of grass together typically means a lawn. In TF, a lot of cycles are spent on discovering these correlations. Once found, the correlation leads to some adjustment of some weight to make the correct Reduction more likely to be re-discovered the next round (because this Reduction is done multiple times), but in essence all correlations are forgotten and have to be re-discovered in every pass through the Deep Learning loop of upward signaling and downward gradient descent, with minute adjustments to erring variables. The system is in effect <em><strong>learning from its mistakes</strong></em>, which is a good sign, since that may well be the only way to learn anything. At least at these levels.</p>
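<p>As a minimal, hedged sketch of the three-layer &#x201C;package&#x201D; just described, here is what a stack of convolution + ReLU + pooling steps looks like in Keras; the layer sizes and counts are illustrative assumptions, not a recipe:</p><pre><code class="language-python">from tensorflow import keras
from tensorflow.keras import layers

# Each Conv2D+ReLU discovers local correlations; each MaxPooling2D then
# discards the three quarters of its input judged least salient at that
# level. Stacking the package three times gives three levels of Reduction.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),           # raw pixels
    layers.Conv2D(16, 3, activation="relu"),   # edges and curves
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu"),   # textures ("grass")
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),   # parts ("lawn"? "face"?)
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),     # top level: face or not
])
model.summary()
</code></pre>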
<p>This up-and-down of signals and gradients may be repeated many times for each image in the learning set.</p><p>This up-and-down makes some sense for image Understanding. Some are using the same algorithms for text. Fortunately, in the text case, there are very efficient alternatives to this ridiculously expensive algorithm. For starters, we can represent the discovered correlations explicitly, using regular &#x201C;pointers&#x201D; or &#x201C;object references&#x201D; in our programming languages. Or &#x201C;synapses&#x201D; in brains. &#x201C;This (software) neuron correlates with that software neuron&#x201D; says a synapse or reference connecting this to that. We shall discuss such systems in the Organic Learning series of blog entries, coming up next.</p><p>Neither the Deep Learning family of algorithms nor Organic Learning is Scientific in any meaningful way. They jump to conclusions on scant evidence and trust correlations without insisting on provable causality; this is disallowed in scientific theory, where absolutely reliable causality is the coin of the realm. F=m*a or go home. Most Deep Neural Network programming is uncomfortably close to trial and error, with only minor clues about how to improve the system when reaching mediocre results. &#x201C;Adding more layers doesn&#x2019;t always help&#x201D;. These kinds of problems are the everyday reality for most practitioners of Deep Neural Networks. With no a priori Models, there will be no a priori guarantees. The best estimate of the reliability and correctness of any Deep Neural Network, or even any Holistic system we can ever devise, is going to be extensive testing. More on this later, in future blogs.</p><p>Why would we ever use engineered systems that cannot be guaranteed to provide the correct answer? Because we have no choice. We only use Holistic methods when the reliable Reductionist methods are unavailable. As is the case when the task requires the ability to perform Autonomous Reduction of context-rich slices of our rich complex reality as a whole. When the task requires Understanding.</p><p>Don&#x2019;t we have an alternative to these unreliable machines? Sure we do. There are billions of humans on the planet who are already masters of this complex task. Because they live in the rich world and need skills that Reductionist methods cannot provide, starting with low level things like object permanence. So you can replace a well-performing but theoretically unproven contraption &#x2013; a Holistic Understanding Machine built out of Deep Neural Networks &#x2013; with a well-performing human being using a deeply mystical kind of Understanding hidden in their opaque head. Who earns much more per hour. This doesn&#x2019;t look like much of an improvement. The machine cannot be proven correct because it doesn&#x2019;t function like normal computers. It is performing Reduction, the skill formerly restricted to animals.</p><p>A Holistic skill.</p><p>My favorite soundbite is a mere corollary to the Frame Problem by McCarthy and Hayes; you have seen it and you will see it again, since it is one of the stronger results of AI Epistemology. But we will, in but a few years, agree on a definition of intelligence that makes autonomous Reduction a requirement. This once semi-heretical soundbite will then be obvious to all. If it isn&#x2019;t already.</p><blockquote>All intelligences are fallible

</blockquote>]]></content:encoded></item><item><title><![CDATA[6. Experimental Epistemology for AI]]></title><description><![CDATA[We can now create computer based experimental implementations to Epistemology-level theories in order to test them and learn from the outcomes]]></description><link>https://experimental-epistemology.ai/experimental-epistemology-for-ai/</link><guid isPermaLink="false">62c06fe524244d48f59936a9</guid><category><![CDATA[Experimental Epistemology]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Sat, 02 Jul 2022 17:12:41 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/07/contextdifficulty.png" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><blockquote>
<img src="https://experimental-epistemology.ai/content/images/2022/07/contextdifficulty.png" alt="6. Experimental Epistemology for AI"><p>Experimental epistemology is the use of the experimental methods of the cognitive sciences to shed light on debates within epistemology, the philosophical study of knowledge and rationally justified belief. Some skeptics contend that &#x2018;experimental epistemology&#x2019; (or &#x2018;experimental philosophy&#x2019; more generally) is an oxymoron. If you are doing experiments, they say, you are not doing philosophy. You are doing psychology or some other scientific activity. It is true that the part of experimental philosophy that is devoted to carrying out experiments and performing statistical analyses on the data obtained is primarily a scientific rather than a philosophical activity. However, because the experiments are designed to shed light on debates within philosophy, the experiments themselves grow out of mainstream philosophical debate and their results are injected back into the debate, with an eye to moving the debate forward. This part of experimental philosophy is indeed philosophy&#x2014;not philosophy as usual perhaps, but philosophy nonetheless.<br>
<a href="https://www.acsu.buffalo.edu/~jbeebe2/Beebe%20Experimental%20Epistemology%20Survey.pdf">Experimental Epistemology by James R. Beebe</a></p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Traditional Experimental Epistemology conducted its experiments through interviews and psychological tests with human volunteers, or relied on population statistics.</p>
<p>As one of the newer branches of Cognitive Science, Machine Learning has now provided us with a very different approach to this domain. We can now create computer based experimental implementations of Epistemology-level theories in order to test them and learn from the outcomes.</p>
<p>In Machine Learning, the most important Epistemology level concepts and hypotheses are about Reasoning, Understanding, Learning, Epistemic Reduction, Abstraction, Creativity, Prediction, Attention, Instincts, Intuitions, Concepts, Saliency, Models, Reductionism, Holism, and other things all sharing these features:</p>
<ol>
<li>Science has no equations, formulas, or other Models for how they work. They are Epistemology level concepts, not Science level concepts.</li>
<li>Our theories about these concepts have to be sufficiently solid and detailed to allow for computer implementations.</li>
</ol>
<p>This is because Science itself is built on top of Epistemology level concepts. And practitioners need to be aware of this, or they will experience cognitive-dissonance-induced confusion and stress.</p>
<p>The Red Pill of Machine Learning confronts the Elephant in the Room of Machine Learning: <strong>Machine Learning is not Scientific</strong>.</p>
<h2 id="what-can-we-learn-from-ai-epistemology">What Can We Learn from AI Epistemology?</h2>
<p>An excerpt from <a href="https://experimental-epistemology.ai/the-red-pill-of-machine-learning/">The Red Pill</a>:</p>
<p>Consider the below statements from the domain of Epistemology, and how each of them can be viewed as an implementation hint for AI designers. We are already able to measure their effects on system competence.</p>
<p>&quot;You can only learn that which you already almost know&quot; -- Patrick Winston, MIT</p>
<p>&quot;All intelligences are fallible&quot; -- Monica Anderson</p>
<p>&quot;In order to detect that something is new you need to recognize everything old&quot; -- Monica Anderson</p>
<p>&quot;You cannot Reason about that which you do not Understand&quot; -- Monica Anderson</p>
<p>&quot;You are known by the company you keep&quot; -- Simple version of the Yoneda Lemma from Category Theory, and the justification for embeddings in Deep Learning (a minimal sketch of this follows the soundbites below)</p>
<p>&quot;All useful novelty in the universe is due to processes of variation and selection&quot; -- The Selectionist manifesto. Selectionism is the generalization of Darwinism. This is why Genetic Algorithms work.</p>
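<p>A hedged illustration of the Yoneda soundbite above: if we represent each word purely by the company it keeps (a co-occurrence vector over a toy corpus invented for this sketch), words used in similar contexts come out with similar vectors. Real embeddings are learned rather than counted, but the principle is the same.</p><pre><code class="language-python">import numpy as np

# Toy corpus, made up for this sketch.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish . the dog ate the bone .").split()

vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Each word is represented by counts of its neighbors (+/- 2 words):
# it is known only by the company it keeps.
counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if i != j:
            counts[index[w], index[corpus[j]]] += 1

def similarity(a, b):
    va, vb = counts[index[a]], counts[index[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

print(similarity("cat", "dog"))  # high: near-identical company
print(similarity("cat", "on"))   # noticeably lower
</code></pre>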
<p>Science &quot;has no equations&quot; for concepts like Understanding, Reasoning, Learning, Abstraction, or Modeling since they are all Epistemology level concepts. We cannot even start using Science until we have decided what Model to use. We must use our experience to perform Epistemic Reductions, discarding the irrelevant, starting from a messy real world problem situation until we are left with a scientific Model we can use, such as an equation. The focus in AI research should be on exactly how we can get our machines to perform this pre-scientific Epistemic Reduction by themselves.</p>
<p>And the answer to that cannot be found inside science.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[7. The Red Pill of Machine Learning]]></title><description><![CDATA[Reductionism is the use of Models.
Holism is the avoidance of Models.
Models are scientific models, theories, hypotheses, formulas, equations, naïve models based on personal experiences, superstitions (!), and traditional computer programs]]></description><link>https://experimental-epistemology.ai/the-red-pill-of-machine-learning/</link><guid isPermaLink="false">62bbd39522c83b7e22055f14</guid><category><![CDATA[Experimental Epistemology]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Fri, 01 Jul 2022 04:06:02 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/07/Red-Pill-2022-Slide.001.jpeg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://experimental-epistemology.ai/content/images/2022/07/Red-Pill-2022-Slide.001.jpeg" alt="7. The Red Pill of Machine Learning"><p>The Deep Learning revolution of 2012 changed how we think about Artificial Intelligence, Machine Learning, and Deep Neural Networks. What changed, and what does this mean going forward?</p>
<p>The new cognitive capabilities in our machines are the result of a shift in the way we think about problem solving. It is the most significant change <strong>ever</strong> in Artificial Intelligence (AI), if not in science as a whole. Machine Learning (ML) based systems are successfully attacking both simple and complex problems using novel methods that only became available after 2012.</p>
<p>We are experiencing a revolution at the level of Epistemology which will affect much more than just the field of Machine Learning. We want to add more of these novel methods to our standard problem solving toolkit, but we need to understand the tradeoffs and the conflict.</p>
<p>I argue that understanding Deep Neural Networks (DNNs) and other ML technologies requires that practitioners adopt a Holistic Stance which is (at important levels) <strong>blatantly incompatible</strong> with the Reductionist Stance of modern science. As ML practitioners we have to make hard choices that seemingly contradict many of our core scientific convictions. As a result we may get the feeling something is wrong. The conflict is real and important and the seemingly counter-intuitive choices make sense only when viewed in the light of Epistemology. Improved clarity in these matters should alleviate the cognitive dissonance experienced by some ML practitioners and should accelerate progress in these fields.</p>
<p>The title refers to the eye-opening clarity some Machine Learning practitioners achieve when adopting a Holistic Stance.</p>
<h2 id="parallel-dichotomies">Parallel Dichotomies</h2>
<p>Syntience Inc researches Natural Language Understanding (NLU; bottom row in the table below). We are creating novel systems that allow computers to learn to Understand&#xA0;human natural languages. Any one of them. We use Deep Neural Networks of our own design. The goal is to achieve some kind of human-like but not necessarily human-level Understanding<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>. This is very different from traditional Natural Language Processing (NLP), which relies on human made Models of some language, such as English, and perhaps Models of fragments of the world. The NLP and NLU disciplines have chosen opposite answers to their difficult two-way choices. They are now defined by these choices, and we can use their stances to highlight the main conflict.</p>
<p>The split is so deep that it cuts through many layers of our reality; the dichotomies listed below are all manifestations of this incompatibility at different levels, listed by impact, but discussed out of order below.</p>
<table>
<thead>
<tr>
<th>Domain</th>
<th>Science</th>
<th>The Complex, including the Mundane</th>
</tr>
</thead>
<tbody>
<tr>
<td>Epistemology</td>
<td>Reductionism</td>
<td>Holism</td>
</tr>
<tr>
<td>Brains</td>
<td>Reasoning</td>
<td>Understanding</td>
</tr>
<tr>
<td>Problem Solving</td>
<td>Plan it, then do it</td>
<td>Just do it</td>
</tr>
<tr>
<td>Artificial Intelligence</td>
<td>20th Century GOFAI</td>
<td>Machine Learning, Deep Neural Networks</td>
</tr>
<tr>
<td>Natural Language In Computers</td>
<td>NLP</td>
<td>NLU</td>
</tr>
</tbody>
</table>
<p>The Problem Solving level provides many familiar examples of these issues. In our mundane lives, we solve many kinds of problems every day but our strategies for solving them fall into just those two categories. For any complicated problem, we had better have a plan before we start. But most problems the brain deals with every day are things we never have to think about. Because we do not <strong>need</strong> to plan or Reason about them. These are the millions of low-level problems we encounter in our mundane life every day; and this is the world that our AIs will have to operate in.</p>
<p>Consider someone walking across the floor. Their brain signals their leg muscles to contract in the correct cadence. Do they need to consciously plan each step? Do they Reason about how to maintain their balance? No; they probably don&apos;t even know what leg muscles they have.</p>
<p>Consider understanding this sentence. Did you use Reasoning? Did you use grammar? If you are a fluent speaker, you do not need grammars to understand or produce language. And you do not have <strong>time</strong> to Reason about language while hearing it spoken. Reasoning is slow, but Understanding is instantaneous.</p>
<p>Consider someone braking for a stoplight. How hard should they push on the brake pedal? Do they compute the required differential equation? Should such equations be part of the drivers license tests?</p>
<p>Consider someone making breakfast. Did they have to Reason about anything or plan anything, or did they just do what worked yesterday, &quot;without thinking about it&quot;? Without consciously planning it?</p>
<p>Walking and talking, braking and breakfasting, like almost everything we use our brains for, rely on learning from our experiences in order to re-use anything that has worked in the past. And, over time, we learn to correct our mistakes. These strategies are simple enough that we can identify them in other life forms. Dogs Understand a lot but do not Reason much. And we can see how they could be implemented in something like Neurons in brains.</p>
<p>The split in our brains between Reasoning and Understanding was examined at length in &quot;Thinking fast and slow&quot; by Daniel Kahneman. The absolute majority of the brain&apos;s effort is spent processing low level sensory input, mostly from the eyes. He calls this &quot;System 1&quot;; it provides Understanding. Reasoning is done by &quot;System 2&quot; based on the Understanding from System 1. But most problems we deal with on a daily basis do not require System 2 at all.</p>
<h2 id="artificial-intelligence-and-machine-learning">Artificial Intelligence and Machine Learning</h2>
<p>Computers can solve any suitable problem when given sufficient human help, such as a complete plan for the solution in the form of a computer program and valid input data.</p>
<p>But since the AI/ML revolution of 2012 we now know how to make computers Understand certain problem domains through Machine Learning. The acquired Understanding allows the machine to &quot;just do it&quot; for many different problems in that domain, without any human planning, Reasoning, or programming, and using incomplete, unreliable, and noisy input data.</p>
<p>This is changing how we are building systems with cognitive capabilities. Everyone working in ML or AI needs to understand the tradeoffs we must make at the most fundamental --&#xA0;Epistemological --&#xA0;levels. Modern ML requires examining and seriously re-thinking many things we were taught to vigilantly strive for in our Science, Technology, Engineering, and Mathematics (STEM) educations. Things like &quot;Correlation is bad, but causality is good&quot; and &quot;Do not jump to conclusions on scant evidence&quot; are still solid advice everywhere inside science. But when building Understanding systems, these established strategies and modes of thinking no longer work, because correlation discovery and handling of sparse, unreliable, and inconsistent input data are exactly the kinds of tasks we will have to perform, and perform well, at these pre-scientific levels.</p>
<p>In order to understand how to do this, we must switch to a Holistic Stance.</p>
<h2 id="a-motivating-example">A Motivating Example</h2>
<p>Beginning Machine Learning students are given exercises like this: they receive a large spreadsheet which lists data about houses sold in a certain year in the US. This information includes, among other things, the zip code of the house, the living area in square feet, lot size, the number of bedrooms and bathrooms, the year the house was built, and the final sale price of the house.</p>
<p>We would like to be able to predict this final sale price, given the corresponding data for a current house we are about to list for sale. The given spreadsheet is the data the student will use to train a Deep Neural Network. It is&#xA0;the entire learning corpus; it contains everything the system will ever know.</p>
<p>These students can download Deep Neural Network libraries like &quot;Keras&quot; and &quot;TensorFlow&quot;, and runnable examples for many kinds of problems, including useful training data, from places like HuggingFace and GitHub. Next the student trains (&quot;learns&quot;) their network using the given data. This may take a while. But when learning finishes, they can give the system data for a house it has never seen and it will quite reliably predict what the house might sell for.</p>
<p>This was the goal of the exercise. The student has created a system that Understands how to estimate real estate prices from listings. But <strong>the student still does not Understand anything about real estate</strong>. The predictive capability (that many people working in real estate would be willing to pay money for) is 100% based on Understanding in the Deep Neural Network &#x2013; in the computer. And because all libraries and many pre-solved examples of this nature were freely available, the student did not have to do much programming either.</p>
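<p>For concreteness, here is a minimal sketch of the kind of network such a student might train. The file name, column layout, and layer sizes are assumptions invented for this sketch; the real exercise would come with its own data format.</p><pre><code class="language-python">import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical spreadsheet export: one row per sold house; the feature
# columns (living area, lot size, bedrooms, ...) come first, and the
# known final sale price is the last column.
data = np.loadtxt("houses.csv", delimiter=",", skiprows=1)
X, y = data[:, :-1], data[:, -1]

# Scale the features to comparable ranges.
X = (X - X.mean(axis=0)) / X.std(axis=0)

model = keras.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                      # the predicted sale price
])
model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(X, y, epochs=100, validation_split=0.2)

# A house the system has never seen, same columns, same scaling:
# model.predict(new_house_row)
</code></pre>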
<h2 id="the-vision">The Vision</h2>
<p>This is desirable. This is what AI should mean. The computer Understands the problem so that we don&apos;t have to. Programming in the future will be like having a conversation with a competent co-worker, and when the machine Understands exactly what we want done, it will simply do it. No programming required on our part, or on the part of the machine, once a suitable (partially Reductionist) framework exists. The rest is learning, and it can be done in any human language with equal ease.</p>
<p>We are on the right track towards something worthy of the name &quot;AI&quot; with current Machine Learning. Going forward, there are thousands of paths to choose from, and the ability to choose wisely will depend on our ability to understand and adopt a Holistic Stance.</p>
<h1 id="reductionism-and-holism">Reductionism and Holism</h1>
<p>These are important terms of the art in Epistemology. Both of them have numerous correct, useful, and compatible definitions. We will henceforth use the following definitions for reasons of usefulness and simplicity:</p>
<ul>
<li><strong>Reductionism is the use of Models</strong></li>
<li><strong>Holism is the avoidance of Models</strong></li>
</ul>
<p><strong>Models</strong> are scientific models, theories, hypotheses, formulas, equations, na&#xEF;ve models based on personal experiences, superstitions (!), and traditional computer programs. In a Reductionist paradigm, these Models are created by humans, ostensibly by scientists, and are then used, ostensibly by engineers, to solve real-world problems. Model creation and Model use both require that these humans Understand the problem domain, the problem at hand, the previously known shared Models available, and how to design and use Models. A Ph.D. degree could be seen as a formal license to create new Models<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>. Mathematics can be seen as a discipline for Model manipulation.</p>
<p>But now -- by avoiding the use of human made Models and switching to Holistic Methods -- data scientists, programmers, and others do not themselves have to Understand the problems they are given. They are no longer asked to provide a computer program or to otherwise solve a problem in a traditional Reductionist or scientific way. Holistic Systems like DNNs can provide solutions to many problems by first learning about the domain from data and solved examples, and then, in production, matching new situations to this gathered experience. These matches are guesses, but with sufficient learning the results can be highly reliable.</p>
<p>We will initially use computer-based Holistic Methods to solve individual and specific problems, such as self-driving cars. Over time, increasing numbers of Artificial Understanders will be able to provide immediate answers --&#xA0;guesses --&#xA0;to wider and wider ranges of problems. We can expect to see cellphone apps with such good command of language that it feels like talking to a competent co-worker. Voice will become the preferred way to interact with our personal AIs.</p>
<p>Early and low-level but useful AI will manifest as computers that can solve problems we ourselves cannot (or cannot be bothered to) solve. They need not be superhuman; all they need to have in order to be extremely useful is exactly the ability to autonomously discover higher level abstractions in some given problem domain, starting from low level sensory input, e.g. by learning from images or reading books. Such systems now exist.</p>
<p>If we want to understand Machine Learning, then we need to understand <strong>all strategies in the rightmost column in all the tables below</strong>. They are all part of a Holistic Stance and if we are working in Machine Learning, we need to adopt as many of them as possible.</p>
<h2 id="differences-at-the-level-of-epistemology">Differences at the Level of Epistemology</h2>
<table>
<thead>
<tr>
<th></th>
<th><strong>Reductionism / Science</strong></th>
<th><strong>Holism / Machine Learning</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>1.1</strong></td>
<td>The use of Models</td>
<td>The avoidance of Models</td>
</tr>
<tr>
<td><strong>1.2</strong></td>
<td>Reasoning</td>
<td>Understanding</td>
</tr>
<tr>
<td><strong>1.3</strong></td>
<td>Requires Human Understanding</td>
<td>Provides Human-like Understanding</td>
</tr>
<tr>
<td><strong>1.4</strong></td>
<td>Problems are solved in an abstract Model space</td>
<td>Problems are solved directly in the problem domain</td>
</tr>
<tr>
<td>1.5</td>
<td>Unbeatable strategy for dealing with a wide range of suitable problems faced by humans</td>
<td>May handle some problems in domains where Reductionist Models cannot be created or used, known as &quot;Bizarre Domains&quot;</td>
</tr>
<tr>
<td>1.6</td>
<td>Handles many important <strong>complicated</strong> problems such as going to the Moon or a highway system</td>
<td>Handles many important <strong>complex</strong> problems such as protein folding and playing Go.</td>
</tr>
<tr>
<td>1.7</td>
<td>Handles problems requiring planning or cooperation</td>
<td>Handles simple mundane problems such as Understanding language or vision, or making breakfast</td>
</tr>
</tbody>
</table>
<p>Many rows in these tables discuss hard tradeoffs where compromises are impossible or prohibitively expensive; these are identified by <strong>boldface numbers</strong> in the first column. Remaining rows may not be clear tradeoffs or even disjoint alternatives. Mixed systems are described in a separate chapter.</p>
<p>1.1, 1.2, 1.3: These form the core of these dichotomies and are discussed in most of what follows, but also in detail in the <a href="https://experimental-epistemology.ai/why-ai-works/">AI Epistemology introductory chapter</a> and in <a href="https://vimeo.com/showcase/5329344">videos of talks</a>.</p>
<p>1.4: A weather report is based on Models in meteorology. To solve the problem directly in the problem domain, open a window to check if it smells like rain.</p>
<p>1.5 is also discussed in table 3 below and in the <a href="https://vimeo.com/showcase/5329344/video/4873884">Bizarre Systems</a> video. A summary:</p>
<p><strong>Reductionism is the greatest invention our species has ever made</strong>. But Reductionist Models cannot be created or used when any one of a multitude of blocking issues are present. Models work &quot;in theory&quot; or &quot;in a laboratory&quot; where we can isolate a device, organism, or phenomenon from a changing environment. However, complex situations may involve tracking and responding to a large number of conflicting and unreliable signals from a constantly changing world or environment. Reductionism is here at a severe disadvantage and can rarely perform above the level of statistical Models.</p>
<p>In contrast, Holistic Machine Learning Methods learning from unfiltered inputs can discover correlations that humans might miss, and can construct internal pattern-based structures to provide recognition, Epistemic Reduction, abstraction, prediction, noise rejection, and other cognitive capabilities.</p>
<p>1.6, 1.7: Humans generally use Holistic Methods for seemingly simple (but in reality, complex) mundane problems, like Understanding vision, human language, learning to walk, or making breakfast. Computers use them for very complex problems (ML based AI in general, such as protein folding and playing Go) but also simpler ones, such as Real Estate pricing.</p>
<h2 id="main-tradeoffs">Main Tradeoffs</h2>
<table>
<thead>
<tr>
<th></th>
<th><strong>Reductionism / Science</strong></th>
<th><strong>Holism / Machine Learning</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>2.1</td>
<td>Optimality -- the best answer</td>
<td>Economy -- re-use known useful answers</td>
</tr>
<tr>
<td><strong>2.2</strong></td>
<td>Completeness -- all answers</td>
<td>Promptness -- accept first useful answer</td>
</tr>
<tr>
<td><strong>2.3</strong></td>
<td>Repeatability -- same answer every time</td>
<td>Learning -- results improve with practice</td>
</tr>
<tr>
<td><strong>2.4</strong></td>
<td>Extrapolation -- In low-dimensionality domains</td>
<td>Interpolation -- even in high-dimensionality domains</td>
</tr>
<tr>
<td><strong>2.5</strong></td>
<td>Transparency -- Understand the process to get the answer</td>
<td>Intuition -- accept useful answers even if achieved by unknown or  &quot;subconscious&quot; means</td>
</tr>
<tr>
<td><strong>2.6</strong></td>
<td>Explainability -- Understand the answer</td>
<td>Positive Ignorance -- no need to even understand the problem or problem domain</td>
</tr>
<tr>
<td><strong>2.7</strong></td>
<td>Share-ability -- abstract Models are taught and communicated using language or software</td>
<td>Copy-ability -- ML Understanding (a Competence) can be copied as a memory image</td>
</tr>
</tbody>
</table>
<p>2.1, 2.2,&#xA0;2.3: Optimality, Completeness, and Repeatability are only available in theoretical Model spaces and sometimes under laboratory conditions. Economy and Promptness had much higher survival value in evolutionary history than Optimality and Completeness. See also 3.1 and 3.2 below.</p>
<p>2.3: <strong>The strongest hint that a system is Holistic is that the results improve with practice because the system learns from its mistakes</strong>. In Machine Learning, a larger learning corpus is in general better than a smaller one because it provides more opportunities for making mistakes to learn from, such as corner cases.</p>
<p>2.4: Models created by humans have manageable numbers of parameters because the scientist or engineer working on the problem has done a (hopefully correct) Epistemic Reduction from a complex and messy world to a computable Model. This allows experimentation with what-if scenarios by varying Model parameters. It is up to the Model user to determine which extrapolations are reasonable.</p>
<p>In Holistic ML systems we are getting used to systems with millions or billions of parameters. These structures are very difficult to analyze, and just like with human intelligences, the best way to estimate their competence is through testing. Extrapolation is typically out of scope for Holistic systems.</p>
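<p>A quick hedged sketch of this limit, using an over-parameterized polynomial as a stand-in for a big learned system; the data and degree are made up for illustration. Inside the training range the guesses are good, outside it they fall apart:</p><pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, 200)                # training inputs in [0, 6]
y = np.sin(x) + rng.normal(0, 0.1, 200)   # noisy observations

coeffs = np.polyfit(x, y, deg=9)          # over-parameterized fit

print(np.polyval(coeffs, 3.0))    # interpolation: close to sin(3.0)
print(np.polyval(coeffs, 12.0))   # extrapolation: wildly wrong
</code></pre>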
<p>2.5: The majority of end users will have no interest in how some machine came up with some obviously correct answer; they will just accept it, the way we accept our own Understanding of language, even though we do not know how we do it.</p>
<p>2.6: We now find ourselves asking our machines to solve problems we either don&apos;t know how to solve, or can&apos;t be bothered to figure out how to solve.</p>
<p>We have reached a major benefit of AI -- we can be <strong>positively ignorant</strong> of many mundane things and will be happy to delegate such matters to our machines <strong>so that we may play</strong>, or focus on more important things.</p>
<p>Some schools of thought tend to overvalue explainability. To them, ML is a serious step down from results obtained scientifically where we can all inspect the causality &#x2013;&#xA0;for instance in Reductionist production (&quot;expert&quot;) systems.</p>
<p>But the bottom line is that today we can often choose between</p>
<ol>
<li>
<p>Understanding the problem domain, problem, the use of science and relevant Models, and the answer.</p>
</li>
<li>
<p>Or just getting a useful answer without even bothering to Understand the problem or the problem domain.</p>
</li>
</ol>
<p>The latter (Positive Ignorance) is a lot closer to AI than the first, and we can expect the use of Holistic Methods to continue to increase.</p>
<p>2.7 : Science strives towards a consensus world Model in order to facilitate communication and minimize costly engineering mistakes caused by ignorance and misunderstandings. Scientific communication requires a high-level context &#x2013;&#xA0; a World Model &#x2013;&#xA0; shared by participants, and agreed-upon signals such as words, math, and software.</p>
<p>But direct Understanding, such as the skills to become a chess grandmaster or a downhill skier, cannot be shared using words --&#xA0;the experience must be acquired using individual practice. Computer based systems that learn a skill through practice can share the entire Understanding so acquired by copying the memory content to another machine.</p>
<h2 id="advantages-of-holistic-methods">Advantages of Holistic Methods</h2>
<table>
<thead>
<tr>
<th></th>
<th>Reductionism / Science</th>
<th>Holism / Machine Learning</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>3.1</strong></td>
<td>NP-Hard problems cannot be solved</td>
<td>Finds valid solutions by guessing well based on a lifetime of experience</td>
</tr>
<tr>
<td><strong>3.2</strong></td>
<td>GIGO -- Garbage In, Garbage Out is a recognized problem</td>
<td>Copes with missing, erroneous, and misleading inputs</td>
</tr>
<tr>
<td><strong>3.3</strong></td>
<td>Brittleness -- Experience catastrophic failures at edges of competence</td>
<td>Antifragile; learns from mistakes, especially almost-correct guesses and small, correctable failures</td>
</tr>
<tr>
<td><strong>3.4</strong></td>
<td>Models of a constantly changing world are obsolete the moment they are created</td>
<td>Incremental learning provides continuous adaptation to a constantly changing world</td>
</tr>
<tr>
<td><strong>3.5</strong></td>
<td>Algorithms may be incorrect or may be incorrectly implemented</td>
<td>Self-repairing systems can tolerate or correct internal errors</td>
</tr>
</tbody>
</table>
<p>3.1: It is because we desire certainty, optimality, completeness, etc. that NP-Hardness becomes a problem. There are many problems where it is relatively easy to find a provably valid solution but where finding all solutions can be very expensive. Real world traveling salesmen merrily travel along reasonable routes.</p>
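<p>The traveling salesman point can be made concrete with a hedged sketch: a greedy &quot;guess well&quot; heuristic produces a provably valid (if suboptimal) route immediately, while exhaustively enumerating all 100! orderings is out of the question. The random cities are, of course, an assumption for illustration.</p><pre><code class="language-python">import math
import random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(100)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Always visit the nearest unvisited city: not optimal, but a valid
# route, found in milliseconds rather than in astronomical time.
route = [cities.pop(0)]
while cities:
    nearest = min(cities, key=lambda c: dist(route[-1], c))
    cities.remove(nearest)
    route.append(nearest)

length = sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))
print(f"valid route through 100 cities, length {length:.2f}")
</code></pre>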
<p>3.2: If a Reductionist system does not have complete and correct input data, it either cannot get started or produces questionable output. But it is an important requirement of real-world Understanding Machines that they be able to detect what is salient (important) in their input in order to avoid paying attention to --&#xA0;and learning from --&#xA0;noise. And they have to deal with incomplete, erroneous, and misleading input generated by millions of other intelligent agents with goals at odds with their own. They need to be able to detect omissions, duplications, errors, noise, lies, etc., and the only Epistemologically plausible way to do this is to relate the input to similar input they have understood in the past -- what they already know. They need to understand what matters, but if they can also Understand some of the noise (&quot;This is advertising&quot;), they can exploit that.</p>
<p>There are many image and video apps available featuring Image Understanding based on Deep Learning. These apps can remove backgrounds, sharpen details like eyelashes, restore damaged photographs, etc. We need to keep in mind that the ability of Holistic systems to fill in data and detect noise depends on them having learned from similar data in the past. We note that all the image improvements are confabulations based on prior experience from their learning corpora. But we can also note that image composition using these methods yields totally seamless images, very far from cut-and-paste of pixels.</p>
<p>And quite similarly, we find language confabulation by systems like GPT-3 to flow seamlessly between sentences and topics. They have nothing to say, but they say it well. However, they bring us closer to meaningful language generation and when we achieve that, the public perception of what computers are capable of will totally change.</p>
<p>3.3: Most of Cognition is Re-Cognition. Being able to recognize that something has occurred before and knowing what might happen next has enormous survival value for any animal species. A mature human has used their eyes and other senses for decades; this represents an enormous learning corpus and they can Understand anything they have prior experience of. The mistakes made by humans, animals and by Holistic ML systems are very often of a &quot;near miss&quot; variety which provides an opportunity to learn to do it better next time.</p>
<p>Contrast this to Reductionist software systems created for similar goals. Rule based systems have long been infamous for their Brittleness. As long as the rules in the ruleset match the current input and reality perfectly, the results will be useful, repeatable and reliable. But at the edges of their competence, where the matches become more tenuous, the quality rapidly drops. Minor mistakes in the rule sets (in the World Modeling) may lead such systems to return spectacularly incorrect results.</p>
<p>3.4: Sometimes repeatability is important, and sometimes tracking a changing world by continuously learning more about it is important. In ML, Continuous Incremental Learning makes it possible to stay up-to-date. If we want repeatability, we can emit a &quot;condensed, cleaned, and frozen Competence file&quot; from a learner that can be loaded into non-learning (read-only) cloud-based Understanding Machines that serve the world and provide repeatability between scheduled software and competence releases.<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup></p>
<p>In the case of Reductionist systems, such as cell phone OS releases, we are used to getting well tested new versions with minor bug fixes and occasional major features at regular intervals. Such systems &quot;learn&quot; only in the sense that the people who created them have learned more and put these insights into the new release.</p>
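<p>A minimal sketch of the learner/server split described in 3.4, with the &quot;competence file&quot; modeled as a plain Keras weights snapshot; the real condensed format in any given system is an assumption on my part:</p><pre><code class="language-python">import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

learner = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),
])
learner.compile(optimizer="adam", loss="mse")

# Continuous incremental learning: keep fitting as new data arrives.
x_new, y_new = np.random.rand(32, 4), np.random.rand(32)
learner.fit(x_new, y_new, epochs=1, verbose=0)

# Periodically emit a frozen competence snapshot...
learner.save_weights("competence.weights.h5")

# ...which read-only servers load and use, unchanged, until the next
# scheduled competence release. This restores repeatability.
server = keras.models.clone_model(learner)
server.load_weights("competence.weights.h5")
server.trainable = False
print(server.predict(np.random.rand(1, 4), verbose=0))
</code></pre>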
<p>3.5: Reductionist systems working with complete and correct input data are expected to provide correct and repeatable results according to the implementation of the algorithm. But both the algorithm and the implementation may have errors. If the algorithm does not adequately Model its reality, then we have Reduction Errors. In the implementation, we may have bugs.</p>
<p>Holistic software systems can be designed to a different standard of correctness. Since input data is normally incomplete and noisy, and results are based on emergent effects, we can expect similar enough results even if parts of the system have been damaged, for instance by Catastrophic Forgetting. Holistic systems can be made capable of self-repair using incremental learning. This has been observed in the Deep Learning community.</p>
<p>Another example: when using multiple parallel threads in learning, there may be conflicts that would normally require locking of some values. But if the operations are simple enough, such as just incrementing a value, we can forgo thread safety and locking, since the worst outcome is the loss of a single increment in a system that uses emergent results from millions of such values, and the mistake would (in a well designed system) be self-correcting in the long run. At the cloud level, absolute consistency may not be as hard a requirement as it is for Reductionist systems. Much larger mistakes can be expected to be attributable to misunderstandings of the corpus or poor corpus coverage.</p>
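<p>A hedged sketch of that argument: eight threads bump a shared counter with no lock. Depending on the interpreter and timing, a few read-modify-write cycles may be lost, but the aggregate barely moves, and a system built on millions of such values tolerates the loss.</p><pre><code class="language-python">import threading

count = 0

def worker(n):
    global count
    for _ in range(n):
        count += 1   # unprotected read-modify-write

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# This may print slightly under 800000 on some interpreters: a few
# increments were lost, an acceptable price for skipping the lock.
print(count)
</code></pre>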
<h2 id="general-strategies">General Strategies</h2>
<table>
<thead>
<tr>
<th></th>
<th>Reductionism / Science</th>
<th>Holism / Machine Learning</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.1</td>
<td>Decomposition into smaller problems</td>
<td>Generalization may lead to an easier problem</td>
</tr>
<tr>
<td><strong>4.2</strong></td>
<td>Human discards everything irrelevant based on how new information matches existing experience</td>
<td>Machine discards everything irrelevant based on how new information matches existing experience</td>
</tr>
<tr>
<td>4.3</td>
<td>Modularity</td>
<td>Composability</td>
</tr>
<tr>
<td>4.4</td>
<td>Gather valid, correct, and complete input data</td>
<td>Use whatever information is available, and use all of it</td>
</tr>
<tr>
<td><strong>4.5</strong></td>
<td>Formal, Rigorous Methods</td>
<td>Informal Ad-hoc Methods</td>
</tr>
<tr>
<td><strong>4.6</strong></td>
<td>Absolute control</td>
<td>Creativity</td>
</tr>
<tr>
<td><strong>4.7</strong></td>
<td>Intelligent Design</td>
<td>Evolution</td>
</tr>
</tbody>
</table>
<p>4.1: The Reductionist battle cry is &quot;The whole is equal to the sum of its parts&quot; which gives us a license to split a large complicated problem into smaller problems, to solve each of those using some suitable Model, and then to combine all the sub-solutions into a Model based solution for the original, larger, problem, such as in moon shots, highway systems, international banking, and generally in industrial intelligent design.</p>
<p>This works in simple and some <strong>complicated</strong> domains, but cannot be done in <strong>complex</strong> domains, where everything potentially affects everything else. Splitting a complex system may cause any emergent effects to disappear, confounding analysis. Examples of complex problem domains are politics, neuroscience, ecology, economy (including stock markets), and cellular biology. All life sciences operate in a complex problem domain because <strong>life itself is complex</strong>. Some say &quot;Biology has Physics envy&quot; because in the life sciences, Reductionist Models are difficult to create and justify. OTOH, &quot;Physics is for simple problems.&quot;</p>
<p>Problems with many complex interdependencies and unknown webs of causality can now be attacked using Deep Neural Networks. These systems discover useful correlations and may often find solutions using mere hints in the input which match their prior experience; Reductionist strategies with correctness requirements outlaw this. It is notable that one of the larger triumphs of Holistic Methods is Protein Folding, which is a problem at the very core of the life sciences.</p>
<p>So Holistic Understanding of a complex system can be acquired by observing it over time and learning from its behavior. There is no need to split the problem into pieces. Part of the Holistic Stance is that we give the machine everything --&#xA0;&quot;holism&quot; comes from the Greek &#x1F45;&#x3BB;&#x3BF;&#x3C2; -- &quot;the whole&quot;&#xA0;--&#xA0;i.e. all the information we have. If we start filtering the input data -- &quot;cleaning it up&quot; -- then the system will effectively learn from a Pollyanna version of the world, which will be confusing once it has to deal with real-life inputs in a production environment. If we want our machines to learn to Understand the world all by themselves, then we should not start by applying heavy-handed heuristic cleanup operations of our own design on their input data.</p>
<p>Sometimes, Reductionist strategies are clearly inferior. Natural Language Understanding is such a domain. Language Understanding in a fluent speaker is almost 100% Holistic because it is almost entirely based on prior exposure. We are now finding out that it is much easier to build a machine to learn any language on the planet from scratch than it is to build a 20th Century Reductionist &quot;Good Old Fashioned Artificial Intelligence&quot; (GOFAI) style machine that Understands a single language such as English.</p>
<p>4.2: The process where a human, by using their Understanding, discards everything irrelevant to arrive at what matters is called &quot;Epistemic Reduction&quot; and is discussed in the first five posts on this website. This is the most important operation in Reductionism, but discussions of Reductionism have tended to focus on other aspects; perhaps this is a new result. ML systems, with little fanfare, discard anything that was expected and that past experience marks as boring, harmless, or otherwise ignorable. They may also discard things significantly outside their experience as noise.</p>
<p>Things can only be reduced away at the semantic level at which they can be recognized. In a system capable of Epistemic Reduction at multiple layers, each layer discards anything that is understood at that layer and passes &quot;upward&quot; to the next higher semantic layer a summary of what it discarded plus everything it did not Understand at its own level. And the higher layers do the same.</p>
<p>This is why Deep Learning is deep.</p>
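<p>A toy sketch of this layered discard-and-pass-upward scheme (my own illustration in Python; real DNN layers are learned, not hand-written tables):</p>
<pre><code class="language-python"># Each layer recognizes patterns at its own semantic level. It discards
# what it Understands, passing upward only a short summary of the
# discarded material plus everything it could not recognize.
LAYERS = [
    {"the", "cat", "sat", "on", "mat"},   # word level (toy)
    {"the cat", "sat on", "the mat"},     # phrase level (toy)
]

def reduce_upward(items, layers):
    for depth, known in enumerate(layers):
        understood = [x for x in items if x in known]
        residue = [x for x in items if x not in known]
        summary = f"[layer {depth} understood {len(understood)} items]"
        items = [summary] + residue       # this is all the next layer sees
    return items

print(reduce_upward(["the", "cat", "sat", "zyzzyva"], LAYERS))
# ['[layer 1 understood 0 items]', '[layer 0 understood 3 items]', 'zyzzyva']
</code></pre>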
<p>4.3: Intelligently designed systems are often made up of interchangeable Modules, which allow for easy replacement in case of failure and, in some cases (especially in software), allow for customization of functionality by replacing or adding modules. These modules have well specified interfaces that allow for such interconnections.</p>
<p>In the Holistic case we can consider a human cell where thousands of proteins interact on contact or as required, with many substances floating around in the cellular fluid. It is not the result of intelligent design, and it shows. There are overlaps and redundancies that may contribute to more reliable operation and there are multiple (potentially complex) mechanisms keeping each other in check.</p>
<p>Or we can consider music, where multiple notes in a chord and different timbres in a symphony orchestra in a <strong>composition</strong> will conjure an emergent harmonic whole that sounds different than the sum of its parts. Or consider spices in a soup, or opinions in a meeting that leads to a consensus.</p>
<p>The word &quot;Composability&quot; fits this capability in the Holistic case. Unfortunately, in much literature it is merely used as a synonym for its Reductionist counterpart &quot;Modularity&quot;.</p>
<p>4.4: As discussed in the GIGO case in 3.2 above, Holistic ML systems can fill in missing details starting from very scant evidence --&#xA0;cf. the confabulations of systems like GPT-3 and image enhancement apps. They supply the missing details by jumping to conclusions based on few clues and lots of experience.</p>
<p>Since we are not omniscient and don&apos;t even know what is happening behind our backs, scant evidence is all we will ever have, but it is amazing how effective scant evidence can be in a familiar context. We can drive a car through fog or find an alarm clock in absolute darkness. The more the system has learned, the less input is needed to arrive at a reasonable identification of the problem and hence retrieve a previously discovered working solution.</p>
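<p>A small sketch of this (an invented example, not production code): identify a situation from a handful of clues by overlap with stored experience, then retrieve the solution that worked before. The richer the experience, the fewer clues are needed:</p>
<pre><code class="language-python"># Stored experience: feature sets seen before, and what worked then.
experience = {
    frozenset({"dark", "beeping", "bedside"}): "reach for the alarm clock",
    frozenset({"fog", "taillights", "lane markings"}): "slow down, follow the lights",
    frozenset({"smoke", "heat", "kitchen"}): "turn off the stove",
}

def identify(clues):
    # Jump to a conclusion: the best overlap wins, even on scant evidence.
    best = max(experience, key=lambda seen: len(seen.intersection(clues)))
    return experience[best]

# Two clues suffice in a familiar context.
print(identify({"dark", "beeping"}))   # reach for the alarm clock
</code></pre>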
<p>4.5: Formal methods and experimental rigor make for good science.</p>
<p>OTOH, Holistic methods can follow tenuous threads, hoping they lead to stronger threads toward some solution, with little effort spent on backtracking or documentation, because once a solution is found it is the only thing that matters. Such tracking has little value in non-repeating situations or when using Holistic Methods at massive scales, such as in Deep Learning.</p>
<p>4.6: Absolute control requires that we know exactly what the problems and solutions are, so that all we need to do is implement them. Once deployed, systems frozen in this manner, exactly implementing the Models of their creators, cannot improve by learning: there is no room for variation in the existing process, hence no experimentation, and no way to discover further improvements. <strong>Only Holistic systems can provide creativity and useful Novelty.</strong> We also observe that learning itself is a creative act, since it must fit new information into an existing network of prior experience.</p>
<p>4.7: Just like the term &quot;Holism&quot; has been abused, so has &quot;Intelligent Design&quot;, which is a perfectly reasonable term for a Reductionist industrial end-to-end practice that consistently provides excellent results. On the Holistic side, Evolution in nature has created wonderful solutions to all kinds of problems that plants and animals need to handle.</p>
<p>But we can put Evolution (also known in the general sense as Selectionism) to work for us in our Holistic machines. It can create wonderful new designs with a biological flavor to them that sometimes (depending on the problem) outperform intelligently designed alternatives.</p>
<p>Evolution is the most Holistic phenomenon we know. No goal functions, no Models, no equations. <strong>Evolution is not a scientific theory</strong>. Science cannot contain it; it must be discussed in Epistemology.</p>
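<p>Putting variation and selection to work takes very little code. Below is a minimal toy Genetic Algorithm of my own (no real system works this way, and unlike nature it uses an explicit fitness function) showing mutation, plus the crossover that lets independently discovered part-solutions merge into one individual:</p>
<pre><code class="language-python">import random

TARGET = "HOLISM"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def fitness(s):
    # Toy stand-in for "contributes to a good outcome".
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s):
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def crossover(a, b):
    # Two part-solutions can merge into one individual.
    i = random.randrange(len(a))
    return a[:i] + b[i:]

population = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(50)]
while TARGET not in population:
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]            # selection
    children = [mutate(crossover(random.choice(survivors),
                                 random.choice(survivors)))
                for _ in range(25)]        # variation
    population = survivors + children

print("evolved", TARGET)
</code></pre>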
<h1 id="mixed-systems">Mixed Systems</h1>
<p>Deep Neural Networks can perform autonomous Epistemic Reduction to find high level representations for low level input, such as pixels in an image or characters in text. Current vision Understanding systems can reliably identify thousands of different kinds of objects from many different angles in a variety of lighting conditions and weather. They can classify what they see, but they do not necessarily Understand much beyond that, such as the expected behaviors of other intentional agents like cars, pedestrians, or cats.</p>
<p>Therefore, at the moment in 2022, most deployments of Machine Learning use a mixture of Reductionist and Holistic methods: equations and formulas devised by humans and implemented as computer code, plus inputs from a Deep Neural Network solving some subproblem that requires it, such as vision Understanding.</p>
<p>Self-driving cars use DNNs for Understanding vision, radar, and lidar images (discovering high level information like &quot;a pedestrian on the side of the road&quot; from pixel based images) and this Understanding has (until recently) been fed to logic and rule based programs that implement the decision making (&quot;Avoid driving into anything, period&quot;) that is used to control the car.</p>
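<p>As a sketch of that division of labor (hypothetical names throughout; <code>dnn_perception</code> stands in for whatever network a vehicle actually uses), the Holistic part delivers Understanding and the Reductionist part applies hand written rules to it:</p>
<pre><code class="language-python">def dnn_perception(camera_frame):
    # Holistic part (stand-in): a learned network turns pixels into
    # high level labels with confidences.
    return {"pedestrian ahead": 0.93, "clear lane left": 0.88}

def decide(labels):
    # Reductionist part: explicit rules over the DNN's output.
    # "Avoid driving into anything, period."
    if labels.get("pedestrian ahead", 0.0) &gt; 0.5:
        return "brake"
    return "proceed"

print(decide(dnn_perception(camera_frame=None)))   # brake
</code></pre>
<p>The trend described next then amounts to moving the <code>decide</code> step, rule by rule, into the learned network itself.</p>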
<p>The trend here<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup> is to move more and more responsibilities into the Deep Neural Network, and over time to remove the hand coded parts. In essence, the network learns not only to see, but learns to Understand traffic. We are delegating more and more of our &quot;Understanding of how to drive&quot; to the vehicle itself.</p>
<p>This is desirable.</p>
<h1 id="experimental-epistemology">Experimental Epistemology</h1>
<p>&quot;Epistemology is the&#xA0;theory of knowledge.&#xA0;It is concerned with the mind&apos;s relation to reality.&quot;</p>
<p>This includes artificial minds. An introduction to Epistemology should benefit anyone working in the AI/ML field.</p>
<p>Scientific statements look like &quot;F=ma&quot; (Newton&apos;s second law) or E=mc<sup>2</sup> (Einstein&apos;s famous equation) and can all be proven and/or derived from other accepted results, or verified experimentally.</p>
<p>Algebra is built on axioms that are not part of Algebra; they cannot be proven inside of Algebra.</p>
<p>Similarly, Epistemological statements are not provable in science because science is built on top of Epistemology. But when science is not helping, such as in <a href="https://vimeo.com/showcase/5329344/video/4873884">Bizarre Domains</a>, then setting scientific methodology aside and &quot;dropping down to the level of Epistemology&quot; sometimes works.</p>
<p>Epistemology is, just like Philosophy in general, an armchair thinking exercise and the results are judged on internal coherence and consistency with other accepted theory rather than by proofs or experiments.</p>
<p>However, the availability of Understanding Machines such as DNNs now suddenly provides the opportunity for <strong>actual experiments in Epistemology</strong>. Consider the statements below from the domain of Epistemology, and how <strong>each of them can be viewed as an implementation hint for AI designers.</strong> We are already able to measure their effects on system competence.</p>
<ul>
<li>
<p>&quot;You can only learn that which you already almost know&quot; -- Patrick Winston, MIT</p>
</li>
<li>
<p>&quot;All intelligences are fallible&quot; -- Monica Anderson</p>
</li>
<li>
<p>&quot;In order to detect that something is new you need to recognize everything old&quot; -- Monica Anderson</p>
</li>
<li>
<p>&quot;You cannot Reason about that which you do not Understand&quot; --&#xA0;Monica Anderson</p>
</li>
<li>
<p>&quot;You are known by the company you keep&quot; --&#xA0;Simple version of the Yoneda Lemma from Category Theory and the justification for embeddings in Deep Learning (see the sketch after this list)</p>
</li>
<li>
<p>&quot;All useful novelty in the universe is due to processes of variation and selection&quot; --&#xA0;The Selectionist manifesto. Selectionism is the generalization of Darwinism. This is why Genetic Algorithms work.</p>
</li>
</ul>
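<p>To see how such statements become implementation hints, here is a tiny sketch of the &quot;company you keep&quot; idea (my own toy, not a real embedding pipeline): characterize each word purely by the words that appear next to it, and two words become similar when their company overlaps:</p>
<pre><code class="language-python">from collections import defaultdict

corpus = ("the cat drinks milk . the dog drinks water . "
          "a cat chases mice . a dog chases cars .").split()

# Each word is known only by its neighbors: the company it keeps.
company = defaultdict(set)
for i, word in enumerate(corpus):
    for j in (i - 1, i + 1):
        if j in range(len(corpus)):
            company[word].add(corpus[j])

def similarity(a, b):
    # Jaccard: shared company divided by combined company.
    shared = company[a].intersection(company[b])
    combined = company[a].union(company[b])
    return len(shared) / len(combined)

print(similarity("cat", "dog"))    # high: cats and dogs keep similar company
print(similarity("cat", "milk"))   # much lower
</code></pre>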
<p>Science &quot;has no equations&quot; for concepts like Understanding, Reasoning, Learning, Abstraction, or Modeling since they are all Epistemology level concepts. We cannot even <strong>start</strong> using science until we have decided what Model to use. We must use our experience to perform Epistemic Reductions, <strong>discarding the irrelevant</strong>, starting from a messy real world problem situation until we are left with a scientific Model we can use, such as an equation. The focus in AI research should be on exactly how we can get our machines to perform this pre-scientific Epistemic Reduction by themselves.</p>
<p>And the answer to that cannot be found inside of science.</p>
<h1 id="artificial-general-intelligence">Artificial General Intelligence</h1>
<p>Artificial General Intelligence (AGI) was a theoretical 20th Century Reductionist AI attempt to go beyond the &quot;Narrow AI&quot; of domain specific expert systems -- closer to the &quot;General Intelligence&quot; they thought humans had. The term was mostly used by independent researchers, amateurs and enthusiasts.</p>
<p>But the &quot;AGI&quot; term was never well enough defined, and it was not backed by sufficient theory to provide any AI implementation guidance. What little progress these groups had made was overtaken by Holistic Methods after 2012.</p>
<p>Today we know that the entire premise of 20th century Reductionist AGI was wrong:</p>
<p>&#x200B;                     <strong>Humans are not General Intelligences...</strong></p>
<p>at birth.</p>
<p>Instead, we are <strong>General Learners</strong>, capable of learning almost any skill or knowledge required in a wide range of problem domains. If we want human-compatible cognitive systems, then we should build them in our image in this respect: build machines that learn and jump to conclusions on scant evidence.</p>
<p>Decades ago, &quot;AGI&quot; implied a human-programmed, hand-coded Reductionist program based on Logic and Reasoning that could solve any problem because the programmers had anticipated it. To argue against claims that this was impossible, the AGI community came up with the promise or threat of self-improving AI<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup>.</p>
<p>But the amount of code in our cognitive systems has shrunk from 6 million propositions in CYC around 1990, to 600 lines of code to play video games around 2017, to about 13 lines of Keras code in some research reports. And now there&apos;s AutoML and other efforts at eliminating all remaining programming from ML.</p>
<p>The problems are not in the code. There is almost no code left to improve in modern Machine Learning systems. All that matters is the corpus. We can now (after 2012) see that Machine Learning is an absolute requirement for anything worthy of the name &quot;AI&quot;.</p>
<p>Which makes recursive self-improvement leading to evil superhuman omniscient logic-based godlike AGIs a 20th century Reductionist AI myth.</p>
<p>We must focus on <strong>AGL</strong>: Artificial General Learning.</p>
<h1 id="afterword">Afterword</h1>
<p>Science was <strong>created</strong> to stop people from overrating correlations and jumping to erroneous conclusions on scant evidence and then sharing those conclusions with others, leading to compounded mistakes and much wasted effort.</p>
<p>Consequently, promoting a Holistic Stance has long been a career-ending move in academia, and especially in Computer Science. But now we suddenly have Machine Learning that performs cognitive tasks such as protein folding, playing Go, and estimating house prices at useful levels using exactly a Holistic Stance.</p>
<p>So now <strong>Science itself has a cognitive dissonance</strong>. This is a conflict about what Science is. Or should be.</p>
<p>Ignorance of these stances leads people to develop significant <strong>personal</strong> cognitive dissonances, which is why discussions about these issues are very unpopular among people with solid STEM educations. But the dichotomy is real; we need to deal with it. Our choices so far seem to have been</p>
<ul>
<li>claim the dichotomy doesn&apos;t exist (... but Schr&#xF6;dinger and Pirsig also discuss it)</li>
<li>claim that the Holistic Stance doesn&apos;t work (... but Deep Learning works)</li>
<li>claim that Reductionist methods are a requirement (... hobbling our toolkits for a principle)</li>
</ul>
<p>The Reductionist Stance also makes it difficult to imagine and accept things like</p>
<ul>
<li>Systems capable of autonomous Epistemic Reduction</li>
<li>Systems that do not have a goal function</li>
<li>Systems that improve with practice</li>
<li>Systems that exploit emergent effects</li>
<li>Systems that by themselves make decisions about what matters most</li>
<li>Systems that occasionally give a wrong answer but are nevertheless very useful</li>
</ul>
<p>So after a serious education in Machine Learning, we hardly need to do any programming at all, and we don&apos;t need to understand anybody else&apos;s problem domains, because we don&apos;t have to perform any Epistemic Reduction ourselves.</p>
<p>We should recognize this for what it is. AI was supposed to solve our problems for us so we would not have to learn or understand any new problem domains. To not have to think. And that&apos;s what we have today, in Machine Learning, and with Holistic Methods in general. Why are some people surprised or unhappy about this? In my opinion, this is AI, this is what we have been trying to accomplish for decades.</p>
<p>People who claim &quot;Machine Understanding is not AI&quot; are asking for human-level human-centric Reasoning and are, at their peril, blind to the nascent ML based Understanding we can achieve today.</p>
<p>With expected reasonable improvements in Machine Understanding capabilities, familiarity and acceptance of the Holistic Stance will become a requirement for ML/AI based work. It will likely take years for our educational system to adjust.</p>
<h1 id="references">References</h1>
<p>Tor N&#xF8;rretranders: <a href="https://www.amazon.com/User-Illusion-Cutting-Consciousness-Penguin/dp/0140230122">&quot;The User Illusion&quot;</a>  -- Best book available about AI Epistemology</p>
<p>William Calvin: <a href="http://williamcalvin.com/bk4/bk4.htm">&quot;The Cerebral Symphony&quot;</a>  -- Neural Darwinism</p>
<p>William Calvin: <a href="http://williamcalvin.com/bk8/bk8ch1.htm">&quot;How Brains Think&quot;</a></p>
<p>Daniel Kahneman: <a href="https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555">&quot;Thinking, Fast and Slow&quot;</a> -- Understanding vs. Reasoning</p>
<p>Erwin Schr&#xF6;dinger: <a href="https://en.wikipedia.org/wiki/What_Is_Life%3F">&quot;What Is Life?&quot;</a> -- Nature is Holistic and difficult to Model</p>
<p>Robert Rosen: <a href="https://www.amazon.com/Essays-Life-Itself-Robert-Rosen/dp/0231105118">&quot;Essays on Life Itself&quot;</a> -- Modeling reality</p>
<p>Robert Sapolsky: <a href="https://www.youtube.com/watch?v=_njf8jwEGRo">&quot;Human Behavioral Biology&quot; Ep. 21</a>, <a href="https://www.youtube.com/watch?v=o_ZuWbX-CyE">Ep. 22</a>, and especially <a href="https://www.youtube.com/watch?v=SIOQgY1tqrU">Ep. 23</a></p>
<p>Douglas Hofstadter: <a href="https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567">&quot;G&#xF6;del, Escher, Bach --&#xA0;an Eternal Golden Braid&quot;</a> -- Reductionism and Holism</p>
<p>Daniel C. Dennett: <a href="https://www.amazon.com/Consciousness-Explained-Daniel-C-Dennett/dp/0316180661">&quot;Consciousness Explained&quot;</a> -- Clear explanations of what matters in intelligence</p>
<p>Hon. J. C. Smuts: <a href="https://www.amazon.com/Holism-evolution-J-C-Smuts/dp/935401741X">&quot;Holism and Evolution&quot;</a></p>
<p>Robert M. Pirsig: <a href="https://www.amazon.com/gp/product/B00HTK12TW">&quot;Zen and the Art of Motorcycle Maintenance&quot;</a> -- Classical = Reductionist, Romantic = Holistic</p>
<p>Monica Anderson: <a href="https://experimental-epistemology.ai/">&quot;Experimental Epistemology&quot;</a> (This site) Introduction to Experimental Epistemology.</p>
<p>Monica Anderson: <a href="https://vimeo.com/showcase/5329344">&quot;AI Epistemology and Understanding Machines&quot;</a> Videos of my talks</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>Common words with an uppercase first letter are used in an unchanging technical sense that is defined (explicitly or implicitly) in what follows or in the first page of the AI Epistemology introduction in <a href="https://experimental-epistemology.ai/why-ai-works">Why AI Works</a>] <a href="#fnref1" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn2" class="footnote-item"><p>As a shorthand, we may call a person who prefers to adopt a Reductionist Stance &quot;a Reductionist&quot; and similarly for &quot;a Holist&quot;. It doesn&apos;t imply they cannot switch their stance; it merely states a preference, and likely their first attempt at any new problem <a href="#fnref2" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn3" class="footnote-item"><p>The goal of my research for Syntience Inc is general Natural Language Understanding and Generation that feels like holding a conversation with a competent co-worker. Details forthcoming on this site. Support, grants, funding and licensing requests welcomed. <a href="#fnref3" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn4" class="footnote-item"><p>Publicly acknowledged by Tesla researchers on Tesla AI day and very likely the way forward for all autonomous vehicles <a href="#fnref4" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn5" class="footnote-item"><p>Discussed in more detail and debunked in a blog post on this site <a href="#fnref5" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
</ol>
</section>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[8. Organic Learning]]></title><description><![CDATA[An Evolution Based Alternative to Deep Learning for Natural Languages that can learn a useful amount of any language on the planet in five minutes on a laptop
]]></description><link>https://experimental-epistemology.ai/organic-learning/</link><guid isPermaLink="false">62bb63a922c83b7e22055d7c</guid><category><![CDATA[Organic Learning]]></category><category><![CDATA[Experimental Epistemology]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Mon, 25 Jul 2022 16:42:11 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/07/DDNN-family-tree.png" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/2022/07/DDNN-family-tree.png" alt="8. Organic Learning"><p>			An Evolution Based Machine Learning Algorithm<br>			for Natural Language Understanding.</p><h2 id="introduction">Introduction</h2><p>I have (along with other researchers at Syntience Inc) been researching Deep Neural Networks of my own design since Jan 1, 2001. We made a major breakthrough in summer of 2017 and are now seeking to productize a Cloud Microservice for Natural Language Understanding called UM1.</p><p>This post provides an overview of the ML algorithm used, which is called Organic Learning. A separate and detailed description of UM1, including links to downloadable test code, is available in <a href="https://experimental-epistemology.ai/um1">Chapter 9. Understanding Machine One</a>. The test code is written in python, but the UM1 service can be called from any language.</p><p>An Understanding Machine is not an Artificial Intelligence; Intelligence requires both Understanding and Reasoning. Reasoning is simply not part of the design; Organic Learning only learns to Understand language, not to Reason about it.</p><p>And UM1 is, as the name implies, a pure Understanding-Only Machine, not even capable of learning. It is also safe; the Understanding it outputs is the total impact it can have on the world. Further, everything at the algorithm level is fixed; it is a read-only system and cannot learn or improve between updated release versions, which is important for repeatability.</p><p>On the other hand, it is fully general in the &#x201C;AGL&#x201D; sense: It can (&#x201C;at the factory&#x201D; at Syntience Inc) learn any human language by just reading online books in that language. And it may well Understand many things beyond mere language. Future testing will determine how much of the World (as described in the books it reads) the system is capable of Understanding by reading. And it may come as a surprise to some that Understanding is what a brain spends over 99% of its cycles on. It is the central problem in AI, and we cannot move on to Reasoning until we get Understanding right. Because you cannot Reason about that which you do not Understand.</p><p>And as you will see in the next chapter, the API for UM1 is very easy to use. No linear algebra will be required. Some set theory would be useful, especially the Jaccard distance metric. Programmers capable of modifying a working python example should be able to add these capabilities to their apps no matter what programming language they are currently using.</p><h3 id="problem-statement">Problem Statement</h3><p>Traditional Natural Language Processing (NLP) methods can perform many filtering and classification tasks but these algorithms, being Reductionist and statistical, cannot cross the syntax-to-semantics gap that would provide actual language Understanding. Many NLP algorithms rely on statistical Models and &#x2013; as all Reductionist strategies tend to do &#x2013; discard too much context too early.
Neural networks, in contrast, exploit such contexts.</p><p>Deep Learning (DL) can now reliably identify objects in an image understanding situation and have made inroads into language Understanding, including semantics. But DL really is a poor fit for natural languages. Some results are impressive, such as DL based language translation and the amusing confabulations spewed by GPT/3 . DL algorithms used are computationally expensive, which is justified for image understanding. But language is linear and correlation discovery is a simple lookup in a reasonably recent past (instead of a convolution in a 2D image space, or worse). This means much more effective correlation discovery and Epistemic Reduction algorithms are available for text than for images.</p><p>The &#x201C;Generalization Gap&#x201D;, or sometimes, the &#x201C;Semantic Gap&#x201D;, is the step from syntax to semantics, from text to understanding, and is the single major hurdle to computer based Understanding of languages, and therefore, to understanding of the world. Some people still doubt that Deep Learning can bridge this gap for languages. But I think that most kinds of DNN systems are capable of this and will provide bona fide human-like (but not necessarily human-level) language Understanding.</p><p>If some company already has an existing NLP (Reductionist, Language Model Based) solution to a problem, why would they consider switching to NLU? Because they may need to handle things that word and grammar based systems cannot really handle or even identify without much effort and expense, such as</p><ul><li>Hypotheticals</li><li>Negations</li><li>Questions</li><li>Irony and satire</li><li>Pronouns and other co-references</li><li>Free form social noise like greetings</li><li>Non sequiturs</li><li>Rhetorical questions</li><li>Nested sentence structures</li><li>Unexpected out-of-domain words</li><li>Foreign language words and phrases</li><li>Colloquial, truncated or abbreviated phrases</li></ul><p>Skills in these matters can now be learned much easier than they could be analyzed and then programmed. And to do it in more languages, just learn more of them.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://experimental-epistemology.ai/content/images/downloaded_images/Organic-Learning/1-pk-grwthHEQfnDwuIHJtxA.png" class="kg-image" alt="8. Organic Learning" loading="lazy"><figcaption><strong>Deep Neural Networks Family Tree</strong></figcaption></figure><h3 id="definitions">Definitions</h3><p>A Deep Discrete Neuron Network (DDNN) is a Deep Neural Network (DNN) that uses software simulations of discrete neurons interconnected with explicit neuron-to-neuron synapse-like links. In contrast, Neural Networks used in Deep Learning use linear algebra style operations on arrays of floating point numbers. These representations are not isomorphic in spite of some claims to the opposite.</p><p>In what follows, I will use the term Deep Learning or &#x201C;DL&#x201D; to describe the family of hundreds of Deep Neural Network algorithms inspired by (and building on) the work of LeCun, Hinton, Bengio, Schmidhuber, Ng, Dean, et al. As the diagram above shows, both DL and DDNNs are DNNs.</p><p>DDNNs have been studied before; they are much more brain-like than DL based systems and therefore a plausible first attempt by researchers who approach DNNs from a neuroscience angle. But the resulting systems have so far not been able to compete with other methods, including DL, on useful tasks. 
These failures can be attributed exactly to the same reasons Deep Learning for image Understanding didn&#x2019;t work until 2012: Our machines were too small and too weak, and our training corpora were too small. Some algorithmic and Epistemological confusions also prevented Epistemic Reduction from working in these early attempts.</p><p>DL was the first DNN that worked. <strong>It is not the only one</strong>. We can finally create and use Deep Discrete Neuron Networks, because our computers are now up to the task.</p><h3 id="advantages-of-organic-learning">Advantages of Organic&#xA0;Learning</h3><p>Organic Learning is well positioned to replace Deep Learning (DL) in natural language domains.</p><ul><li><strong>OL</strong> was explicitly designed for sequential data like text, voice, or DNA.<br>DL was designed to understand images and its use for text is a poor fit.</li><li><strong>OL</strong> requires 5-6 orders of magnitude less computation and less energy than DL for NLU. </li><li><strong>OL</strong> does not require (and cannot even benefit from) a GPU, T-Chip, or other special hardware. Conventional von Neumann architectures work best. Specifically, there is no need for any kind of &quot;active memory&quot; or &quot;memory based computation&quot; or other forms of &quot;per-neuron&quot; parallelism.</li><li><strong>OL</strong> learns in a fraction of the learning time required in DL based language understanding. Minutes on a single thread on a laptop rather than days using hundreds of GPU based hosts in a cloud. Learning is effective with much smaller corpora than those used in DL, as long as SOTA accuracy is not a design requirement.</li><li><strong>OL</strong> is capable of unsupervised learning of any language given nothing more than plain (untagged) text in that language&#x200A;.</li><li><strong>OL</strong> scales as needed. The only limit is the size of main memory.</li><li><strong>OL</strong> inference (runtime) engine (UM1) requires much less memory.</li><li><strong>OL</strong> runtime clients require no end-user programming in TensorFlow, Keras, or other ML languages. Python sample code is available.</li></ul><h3 id="connectome-algorithms">Connectome Algorithms</h3><p>The learner starts out empty and then learns recurring sequential patterns it discovers in its input stream. The learning algorithm slowly builds a Connectome, a graph of Nodes and Links (like neurons and synapses) to reify the experiences.</p><p>The framework provides direct Java level access to the &#xA0;learned connectome and its individual neurons and can be directly used for Epistemology level experiments. Clients will not have access to this, but researchers at Syntience benefit from this as implementers. We call the results of programming at this Meta-learning-level &quot;<strong>Connectome Algorithms</strong>&quot;</p><p>We have a &quot;Clojure REPL&quot; running in the same JVM that is accessed asynchronously from an outside Emacs editor using an API. We can interactively conduct experiments in Clojure (a LISP) on the Connectome. Most of these involve graph traversal to discover concept correlations.</p><h3 id="potential">Potential</h3><p>It is worth emphasizing that OL is not an incremental improvement on Deep Learning. It is a <strong>completely novel capability, never seen before</strong>, with its own strengths, potential, and limitations. 
In patent language, it is a fruit of a different tree than DL, and was designed from the start using a Holistic Stance, as described in Chapter 7, The Red Pill of Machine Learning.</p><p>Because it is new, we have few clues about how to estimate its impact. We can foresee a range of future applications for general human language understanding by computers, even small and cheap ones. An algorithm for human-like (but not necessarily human-level) Natural Language Understanding &#x2013; NLU(*), as opposed to NLP &#x2013; &#xA0;such as Organic Learning, opens up what has been estimated to become a $1 Trillion NLU market out of a multi-Trillion general AI/ML market.</p><p>OL has been developed by 1-2 people and has worked only since 2017. In contrast, Deep Learning has benefited from thousands of researchers improving it since 2012. This is only the beginning for OL.</p><h3 id="towards-voice-input-and-output">Towards voice input and output</h3><p>OL and UM1 are text based. If we want voice input and output, then we can use speech-to-text and text-to-speech systems before and/or after OL or UM1. While useful, the main use of voice I/O will only be realized when we get decent ML based question answering systems, or better, systems that can handle &quot;Contiguous Rolling Context Mixed Initiative Dialog&quot;. We are continuing research in this direction. OL could probably handle speech by learning from speech input like current ML based voice input systems but we have not researched this.</p><h3 id="the-unreasonable-efficiency-of-organic-learning">The unreasonable efficiency of Organic Learning</h3><p>The OL and UM1 algorithms are very fast. Learning can stream, sustained, at 5,000 cps per thread, and Understanding has been run at 100,000 cps per thread on a laptop. Deep Learning based NLU projects like BERT and GPT/3 take months to learn in the cloud. Their results are not directly comparable to OL but either strategy may work in any specific application and should be evaluated on a case by case basis. The energy savings alone make a compelling argument for using OL.</p><p>Learning is thread-safe. In a 200 thread machine, the system could be reading from 200 different text files, such as different pages in Wikipedia, simultaneously, into the same global RAM image of Nodes and Links. This would be like having one single brain and 200 pairs of eyes reading books.</p><p>There are entire classes of applications that could use the &quot;Industry strength Understanding&quot; provided by OL. It may not beat State of the Art in academia, but it can be used with very little API programming effort and has a shallow learning curve for humans. We also expect OL to handle multiple languages well, and it might compete well on price because of energy and compute savings. A cloud based microservice could handle this for any kind of agent, such as language based phone apps.</p><h3 id="designed-for-sequence-processing">Designed for Sequence Processing</h3><p>Convolution of images in DL for the purpose of correlation discovery is expensive. When Understanding language, we still require correlation discovery, but text is linear and this operation becomes a much simpler problem in the time/sequence domain rather than (as in DL) in a 2D/3D/4D image/movie/point cloud domain. In effect, all correlations in text are references to something experienced in the past, and the past is linear, searchable and indexable. 
This, and the next paragraphs, explain the amazing learning speed of Organic Learning.</p><h3 id="only-activate-relevant-concepts">Only Activate Relevant&#xA0;Concepts</h3><p>In OL, unless we are reading about, say, a potato harvest, there is no reason to touch any of the software neurons involved in &#x201C;representing&#x201D; potato harvests. So we can have billions of learned concepts just sitting quietly in main memory, consuming no CPU cycles, until &#x201C;their topic&#x201D; is discussed. They activate, make a contribution to understanding, disambiguation, and learning, and they may even may learn something themselves. They then return to dormancy until next time the topic is discussed. This is a major advantage of discrete-neuron strategies.</p><p>Therefore, OL systems do not require explicit attention controlling mechanisms. Focus emerges from matches between the global gathered experience and the concepts discovered in the text in the input stream.</p><p>In contrast, in DL, every pixel of a processed image is pushed through the entire GPU processing stack many times. And each layer does some moderately expensive array arithmetic, which is why using GPUs is so important in DL.</p><h3 id="track-correlations-explicitly">Track Correlations Explicitly</h3><p>One of the main operations in Deep Learning is correlation discovery by convolution of images in multiple layers. This is done in the upward pass. The discovered correlations are used to somewhat adjust the weights of the DL network in the Stochastic Gradient Descent downward pass. Because of this, the network has changed, necessitating another upward pass of convolutions through the entire stack, until there is sufficient convergence.</p><p>In contrast, in Organic Learning, when the algorithm discovers a correlation it simply inserts a Synapse-like link to represent that correlation. Being an evolutionary system, the correlation will remain as long as it contributes to correct Understanding but will be removed in a Darwinian competition if it doesn&#x2019;t contribute to positive results. These mostly-persistent Synapse-based correlations make re-convolution unnecessary. Do it once, and it&apos;s done.</p><h3 id="neural-darwinism-rl-sex">Neural Darwinism = RL + Sex</h3><p>The idea of Neural Darwinism is that in brains, from millisecond to millisecond, there is a continuous battle for relevance among ideas. The ideas are evaluated for fit to both the current sensory input and the totality of gathered experience that has been reified in the Connectome.</p><p>Those parts of the Connectome that assist in providing a good outcome are rewarded and other parts may be punished. This is Reinforcement Learning (<strong>RL</strong>).</p><p>In a Neural Darwinism based Machine Learning system the best ideas get to breed with other good ideas, occasionally producing even better ideas. This is how we can generate ideas that are better, often in the meaning of &quot;better abstractions&quot;, than any of the ideas we started with. This is the reason sexual reproduction speeds up evolution so much, in nature and in our algorithms spaces. It allows multiple independently discovered inventions to merge into one individual &#x2013; a fact that is often neglected in na&#xEF;ve discussions about the effectiveness of Evolutionary Computation.</p><p>Evolution is perceived as slow because evolution in nature takes a lot of time. 
But computers can evolve populations of thousands of individuals in milliseconds, and Genetic Algorithms, when done right and attacking a problem in a suitable problem domain, are very competitive and are much faster than their reputation.</p><h3 id="character-based-input">Character based&#xA0;input</h3><p>An OL system does not require the input to be parsed into words; it reads text character by character which means it can also learn languages that do not use segmented (space separated) words, such as Chinese. Each character read from the input activates a &#x201C;character input neuron&#x201D; that then signals other neurons, etc., and this cascade of signaling is what determines both what the sentence means and whether and where to attach any new understanding learned from that character, given the past, up to that point, as context. This incremental attachment operation is what provides the &#x201C;Organic Growth&#x201D;.</p><h3 id="flexibility-and-code-size">Flexibility and Code&#xA0;Size</h3><p>OL is a simple java program, about 4000 lines of code for the learning algorithm itself. Add three times that for debugging, visualization, APIs, and test framework. &#xA0;The core algorithm exists in two versions: Learning (Organic Learning Learner) &#xA0;and Runtime (Understanding Machine One &#x2013; UM1). Learning, all told, is about 16,000 lines. The UM1 code is about 1,700 lines, of which about 500 lines implement the main algorithm. Um1 only imports I/O and Collection classes.</p><p>The learner creates, as its main output, a file containing a &#x201C;node wiring list&quot; describing which neurons connect to which other neurons. Such a Competence Creation event is typically scheduled after each 10 million characters of corpus have been read. Given a Competence file name, UM1 loads that list and re-creates the learned neural structure and then uses that as the basis of its Natural Language Understanding Service.</p><p>The learner is currently kept as a trade secret. The UM1 runtime can be source licensed to customers since it does not contain any of the learning code. But initially UM1 will only be available as our cloud based MicroService.</p><p>The OL Learner can also be saved in a full Java image format which can be re-started later to continue learning. This means any learned skill can be used as a base for further learned skills. After learning Basic English, we can top it off with Medical English and Business English, targeted at different application domains.</p><h3 id="test-strategies">Test Strategies</h3><p>We have used many different tests of language understanding competence for this research but the strategy described in Noah Smith&#x2019;s paper &#x201C;Adversarial Evaluation for Models of Natural Language&#x201D; is our favorite. We think our tests are closer to capturing Understanding than most NLP era tests; it can be made more or less difficult by choosing more subtle distinctions between the test cases; They are easy to understand, quick to execute, and can be adopted to many situations. We can execute these tests asynchronously at regular intervals such as after each 1 million characters of corpus. We designed the system to not be able to learn from its test data.</p><p>NLP era test cases emphasize Reasoning and various puzzles, which require cognitive capabilities that we decided to not even try to implement until we have productized our 100% pure Understanding systems. 
As a consequence, we cannot currently beat state of the art at GLUE or SUPERGLUE.</p><h1 id="demonstration-of-learning-english">Demonstration of Learning English</h1><p>The Organic Learning algorithm can learn a useful amount of any language on the planet in five minutes running on an old Mac laptop. With &quot;useful amount&quot; I mean that the system could be used for &quot;industrial strength&quot; Understanding, such as in a chatbot. Of course, as is common in Machine Learning situations, learning overnight or even for a few days will continue to improve the results.</p><p>Below is an annotated screenshot of our demo of the system learning English from scratch. This demo is running code from December 2019. </p><p>You can inspect the test file we used. Download it <a href="https://github.com/syntience-inc/um1">here</a>. The system cannot learn from tests, which is why we can run it as often as we want.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/OL-learner-log-explained.png" class="kg-image" alt="8. Organic Learning" loading="lazy" width="1102" height="1044" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/OL-learner-log-explained.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/OL-learner-log-explained.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/OL-learner-log-explained.png 1102w" sizes="(min-width: 720px) 720px"></figure><p>We can demonstrate this live, in person or in Zoom. &#xA0;Or you can watch this movie:</p><p>
</p><figure class="kg-card kg-video-card kg-card-hascaption"><div class="kg-video-container"><video src="https://experimental-epistemology.ai/content/media/2022/07/SyntienceLearnerDemo.mp4" poster="https://img.spacergif.org/v1/1050x792/0a/spacer.png" width="1050" height="792" playsinline preload="metadata" style="background: transparent url(&apos;https://experimental-epistemology.ai/content/images/2022/07/media-thumbnail-ember952.jpg&apos;) 50% 50% / cover no-repeat;"></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div class="kg-video-player-container"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#xD7;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/></svg></button><button class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div><figcaption>Organic Learning Algorithm learning English in five minutes on an old Mac laptop</figcaption></figure>]]></content:encoded></item><item><title><![CDATA[9. Understanding Machine One]]></title><description><![CDATA[A cloud MicroService providing Natural Language Understanding based on Organic Learning]]></description><link>https://experimental-epistemology.ai/um1/</link><guid isPermaLink="false">62da33108a4d6a275b3c9fd8</guid><category><![CDATA[Experimental Epistemology]]></category><category><![CDATA[Organic Learning]]></category><category><![CDATA[UM1]]></category><category><![CDATA[Understanding Machine One]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Mon, 25 Jul 2022 16:42:27 GMT</pubDate><media:content url="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.009-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.009-1.png" alt="9. 
Understanding Machine One"><p>UM1 is a cloud based REST service that provides Natural Language Understanding (NLU) as a Service (UaaS). It can be viewed as providing a &quot;large&quot; language Model in the style of &#xA0;GPT/3 and Google BERT but it currently cannot generate any language; it is strictly an Understanding Machine. And it isn&apos;t large.</p><p>If you send it some text, you will instantly get back UM1&apos;s Understanding of the text. Its Understanding may differ from yours, just like yours may differ from that of a co-worker, but will still be useful in apps requiring industrial strength NLU.</p><h3 id="how-um1-understands">How UM1 Understands</h3><p>The Understanding algorithm running in UM1 is much simpler than the Organic Learning algorithm running in the <a href="https://experimental-epistemology.ai/organic-learning">OL Learner</a>. The Learner emits a Language Competence file designed to be loaded into a cloud based UM1 allocated to handle that language. In what follows, all learning has been done and UM1 is just using the collection of patterns in its loaded Competence to Understand any incoming &quot;wild&quot; text.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.001.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.001.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.001.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.001.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>We first note that UM1 can read <strong>streaming</strong> text. Normally it is given text as documents ranging in size from a few characters (such as in tweets and chat messages) to larger documents such as emails. In these cases the Understanding is about the entire document and is returned when the reading has finished. API options allow clients to ask the system for per-character, per-line, or per-paragraph results, to be used with infinite streams.</p><p>Deep Learning systems (at least those of the past) split longer documents and process those in batches small enough to fit in the GPUs.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.003.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.003.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.003.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.003.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>OL and UM1 are reading the text one character at a time. Parsing text into words is implementing a Language Model, and we are philosophically deeply opposed to Language Models. 
Also, Chinese, Japanese, and several other Asian languages do not use word separating spaces, which makes word level input difficult.</p><p>If the atomic unit of input is a word, such as in traditional NLP and even in most DL based Understanding of language, then character level morphology features such as a plural-s suffix will be &quot;below the atomic level of the algorithm&quot; and now would require special processing such as stemming, which discards valuable information. Instead, OL/UM1 systems are designed to <strong>learn</strong> all character level (morphology) features. DL community has started to switch to character input.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.004.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.004.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.004.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.004.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>The letter currently being read is looked up in an &quot;Alphabet&quot; Map (a table) that binds each different UNICODE input character to a specific Node in the Understander. We call these &quot;Input Nodes&quot;. When a character is read, the framework will originate an initial &quot;signal&quot; that is sent to the Input Node bound to that character. Note that except for debugging purposes and (future) language generation purposes, the character we just read can now be discarded. As Friedrich Hayek said, in the brain there is no vision, no sound, no touch. There is just neurons signaling neurons; same thing in an OL or UM1 system. Retinal neurons could be said to correspond to the Input Nodes in OL and UM1. </p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.005.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.005.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.005.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.005.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>The Input Node then propagates the signal to other Nodes connected to it with outgoing synapse-like links. This is similar to a Neuron in a brain signaling its axonal tree.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.006.png" class="kg-image" alt="9. 
Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.006.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.006.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.006.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>Some of these signaled Nodes propagate the signal to <strong>their</strong> outgoing target links, depending on conditions.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.007.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.007.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.007.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.007.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>Every Node in the system is born with a unique ID number.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.008.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.008.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.008.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.008.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>Some Nodes are known to be more important (Salient) than others. The ID numbers of these Nodes are returned to the caller in a list. After Understanding the whole message, the list will contain all Salient Nodes that propagated.</p><figure class="kg-card kg-image-card"><img src="https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.009.png" class="kg-image" alt="9. Understanding Machine One" loading="lazy" width="1024" height="768" srcset="https://experimental-epistemology.ai/content/images/size/w600/2022/07/UM1-algo-in-images.009.png 600w, https://experimental-epistemology.ai/content/images/size/w1000/2022/07/UM1-algo-in-images.009.png 1000w, https://experimental-epistemology.ai/content/images/2022/07/UM1-algo-in-images.009.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>The list of ID Numbers <strong>is</strong> the Understanding you get from UM1. We call this &quot;The Numbered Neuron API&quot; and the resulting list is called a &quot;<strong>moniform</strong>&quot;. It deserves a special name because in production environments the moniform is more complex than a simple list; it may contain annotations like meta-information and structure information such as chapter, page, line numbers etc.</p><p>We are not using any tagged learning material, just plain UNICODE text files. This is an enormous advantage over supervised learning, because well tagged text is expensive. The reason we can do this is that each Node (Concept) in the system is automatically given a unique tag &#x2013;&#xA0;an ID number.</p><p>These ID numbers are opaque to the client. This is not a problem.</p><p>To find all documents on topic X, start with submitting one or more samples of topic X. 
If you want to detect the meaning of &quot;I would like to know the balance in my checking account&quot; in some tellerbot you are writing, then you can send that phrase as a topic-centroid-defining &quot;Canonical Probe Phrase&quot; to UM1 and save the resulting moniform in a table. The value in the &quot;right hand column&quot; in the table could be a token such as &quot;#CHECKINGBALANCE&quot; to use subsequently in Reductionist code, such as in a surrounding banking system.</p><p>UM1 is not a transformer; it can be described as a half-transformer, an <strong>encoder</strong> that encodes its understanding of incoming text as lists of numbers. The table of moniforms you build up when starting the client system will be used to <strong>decode</strong> the meaning. This is done entirely in the client end, in code you need to write or download.</p><p>To decode the Understanding received after sending UM1 a wild sentence (chatbot user input, a tweet to classify, etc) your client code will compare the numbers in the wild reply moniform to the numbers of all the moniforms in the probe table we built when we started, using Jaccard Similarity as the distance measure. The probe sentence that has the best matching moniform is the one we will say has semantics closest to the wild sentence.</p><p>Jaccard Distance tells how closely related two sets of items are by computing the set intersection and the set union between two sets A and B. The distance is computed by dividing the number of common elements (the intersection) by the total number of elements in either set. This provides a well behaved distance metric in the interval [0..1] as a floating point value. The canonical moniform with the highest Jaccard score is the best match.</p><p>In UM1, the ID numbers represent dimensions in a boolean semantic space. If the system has learned 20 million Nodes, each representing some identifiable language level concept, then we can view the numbers returned in the moniform as the dimension numbers of the dimensions which have the value &quot;true&quot;. Consider a moniform that has 20 numbers (it varies by message length and input-to-corpus matchability) selected from a possible 20 million to get an idea of the size of the semantic space available to OL.</p><p>In some DL systems for language, concepts are represented by vectors of 512 floating-point numbers. In this 512-dimensional space, DL can perform vector addition and subtraction and perform amazing feats of semantic arithmetic, like discovering that KING - MALE + FEMALE = QUEEN. With boolean 0/1 dimensions, closeness in the semantic space becomes a problem of matching up the nonzero dimensions, which is why Jaccard distance works so well. </p><p>Traditional NLP is often done as a pipeline of processing modules providing streaming functions to do word scanning, lowercasing, stemming, grammar based parsing, synonym expansion, dictionary lookup, and other such techniques. When using UM1 you do not have to do any of those things; just send in the text.</p><p>Note that UM1 does not do any of those operations either. It just reads and Understands. And because OL <strong>learned</strong> the morphology (such as plural-s on English words) used by UM1, the system can be expected to work in any other learned language, even if &#xA0;morphology is different.</p><h3 id="you-can-test-um1-yourself">You can test UM1 yourself</h3><p>A running UM1 is available for Alpha testing in the cloud. 
<p>Test code (and perhaps Python samples) is available for download at <a href="https://github.com/syntience-inc/um1">https://github.com/syntience-inc/um1</a>. Instructions are available in the repository.</p><p>UM1 is not intended for production use. Syntience does not provide any uptime guarantees. Also, being a demo system, it is not configured to scale very far. If you want to use UM1 for business, contact <a href="mailto:sales@syntience.com">sales@syntience.com</a> and we will provide a dedicated server as a subscription service. But anyone can use it to evaluate UM1 capabilities at this early stage.</p>]]></content:encoded></item><item><title><![CDATA[B1. Self Improving AI]]></title><description><![CDATA[The limits to intelligence are not in the code; they are not even technological. The limit is the complexity of the world. Omniscience is unavailable. All intelligences are fallible.]]></description><link>https://experimental-epistemology.ai/self-improving-ai/</link><guid isPermaLink="false">62c1f1fecf822408d8502acb</guid><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Sun, 03 Jul 2022 19:47:23 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Self-improving AI is a meme that has been circulating since the 1980s. Current proponents of the idea include Bostr&#xF6;m and Omohundro. My own summary goes something like this:</p>
<p>If we get any kind of AGI going, no matter how slow or buggy it is, we can give it access to its own source code, let it analyze that code, clean up and fix the bugs, and then re-write its code to be as good as it can make it. We then start up this slightly smarter AGI and repeat the process until the AGIs become superintelligent.</p>
<p>On the surface, this is irrefutable. We already have examples of systems improving themselves: We can buy a cheap 3D printer and then quite cheaply print out parts for a much better 3D printer. Or we can make computer chips that go into computers that design better computer chips.</p>
<p>Not to mention evolution of all species in nature.</p>
<p>I look at it from an Epistemologist&apos;s point of view and say &quot;That&apos;s a Hardline Reductionist idea that should not have made it out of the 20th century&quot;.</p>
<p>The idea, at its inception, imagined an AGI as something that was written by teams of human programmers using software development tools and mathematical equations.</p>
<p>But I think the closest this process can get to that outcome is code so good that humans and machines alike agree there are no more improvements to be made. And the resulting AGIs are still not superintelligent.</p>
<p>The most likely outcome is that we all realize the folly in this argument and won&apos;t even try.</p>
<h1 id="its-not-about-the-code">It&apos;s not about the code.</h1>
<p>The number of lines of code in AI-related projects has been declining rapidly.</p>
<ul>
<li>2004: Cyc, 6 million FOPC/CycL propositions</li>
<li>2012: 34,000 lines of .py/.cuda, Krizhevsky et al. for ImageNet</li>
<li>2013: 1,571 lines of Lua to play Atari games</li>
<li>2017: 196 lines of Keras to implement Deep Dream</li>
<li>2018: &lt;100 lines of Keras for research paper level results</li>
</ul>
<p>And all of these (except Cyc, included as the most famous example of a 20th century Reductionist AI system) demonstrate new levels of Machine Learning power.</p>
<p><strong>The limits to intelligence are not in the code</strong>. In fact, they are not even technological.</p>
<p>The limit of intelligence is the complexity of the world. <strong>Omniscience is unavailable</strong>. The main purpose of intelligence is to <strong>guess</strong>, to jump to conclusions on scant evidence, and to do it well, based on a large set of historical patterns of problems and their solutions or events and their consequences.</p>
<p>Because <strong>scant evidence is all we will ever have</strong>. We don&apos;t even know what goes on behind our back.</p>
<p>And because all intelligence is guessing, I have repeatedly claimed that &quot;<strong>All Intelligences Are Fallible</strong>&quot;.</p>
<p>We are already making machines that are better than humans at some aspects of guessing. Protein Folding and playing Go are examples of this.</p>
<p>And these machines will get bigger and better at what they do and will be superhuman in various ways and in many problem domains, simply based on a larger capacity to hold, look up, or search useful patterns.</p>
<p>The code doing that can be hand-optimized to the point where any AI improvement would be insignificant. My own code in the inner loop for Understanding any language on the planet (once it has learned it, in &quot;inference mode&quot;) is about 90 lines of Java. We can expect at best minor improvements to efficiency and speed.</p>
<p>It comes down to the corpus. In my domain, NLU, simple tests can be scored at 100% after a few minutes of learning on a laptop. Continued learning for days and weeks would provide a larger sample set of vocabulary-in-appropriate-contexts which would mainly correct misunderstandings in corner cases. But these corpora are not comparable, by several orders of magnitude, to the gathered life experience of a human at age 25.</p>
<p>In an ML situation, the main limit of intelligence is corpus size.</p>
<p>Future Artificial Intelligences will be nothing like what AGI fans have been fear-mongering about. These are 20th century Reductionist AI ideas; the proponents are blind to the most fundamental basics of epistemology. Reductionist GOFAI has been demonstrated to be inferior in its own domains to even semi-trivial Machine Learning methods.</p>
<p>We need AGL, not AGI.</p>
<h1 id="machines-learning-to-code">Machines learning to code</h1>
<p>As of this writing, there are a handful of available code-writing systems based on  ML technology that has learned from large quantities of open source code, for example GitHub Copilot, OpenAI Codex, and Amazon CodeWhisperer.</p>
<p>They have not yet surpassed human programmers.</p>
<p>But it&apos;s not about writing code either. AIs writing code is about as silly as AI magazine covers with pictures of robots typing. :-D</p>
<p>In the future, if we want a computer to do something, we will have a conversation (speaking and listening) with the computer. The conversation is at the level of discussing a problem with a competent co-worker or professional.</p>
<p>It may spontaneously ask clarifying questions. I call this &quot;Contiguously Rolling Topic Mixed Initiative Dialog&quot;; others talk of these bots as &quot;Dialog Agents&quot;. But this will go beyond Siri or Alexa. And when the computer Understands exactly what you want done, it just does it. Why would Reductionist style programming be a necessary step?</p>
<p>Yes, there will still be lots of places where we want to use code. But whether that code is written by humans or AIs will make much less of a difference than we might expect based on today&apos;s use of computers.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[B2. The Wisdom Salon]]></title><description><![CDATA[Wisdom Salon is an online World Cafe. The World Cafe Protocol is a recipe for organizing "Conversations That Matter" on a large scale. Thousands of people can cooperate in order to bring clarity to complex issues]]></description><link>https://experimental-epistemology.ai/wisdom-salon/</link><guid isPermaLink="false">62c32cea9e7109067d78d6ce</guid><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Mon, 04 Jul 2022 18:14:19 GMT</pubDate><content:encoded><![CDATA[<p>This is a post-mortem summary for my interrupted Wisdom Salon project. I have all the code in an archive but it requires a complete rewrite to fix the two biggest problems: The switch from Flash (yecch) to HTML5 for video and the cost of video connections. I know how to fix these but I&apos;m busy working on Understanding Machines ATM. I am looking for someone to take this over. I also observe that there is a need for something like this. I see things discussed on Quora that would make good topics for a Wisdom Salon. I happen to believe video and spoken words are an important component for many reasons.</p><h2 id="wisdom"><strong>Wisdom</strong></h2><p>Knowledge and Information can easily be found on the web. But what about Wisdom?</p><p><strong>Intelligence is based on gathered knowledge.</strong></p><p><strong>Wisdom is based on gathered experience.</strong></p><p>To get wiser, seek out more experiences. Engage yourself. Do more stuff. Travel. Talk to people to share their experiences. Conversation with others is the easiest way to gain Wisdom. But not all conversations are equal.</p><p><strong>We want Conversations That Matter.</strong></p><h2 id="the-world-cafe-protocol"><strong>The World Cafe Protocol</strong></h2><p>The World Cafe Protocol is a recipe for organizing such &quot;Conversations That Matter&quot; on a large scale. Thousands of people can cooperate in order to bring clarity to complex issues. To find out more, buy <strong><a href="https://www.amazon.com/World-Caf%C3%A9-Shaping-Futures-Conversations/dp/1576752585">the book</a></strong> or study the <a href="http://theworldcafe.com/"><strong>World Cafe Website</strong></a>, but this is how it typically works:</p><p>In some conference facility or gymnasium the organizers provide dozens to hundreds of square tables. Each has four chairs, a box of crayons, and a piece of butcher paper as a tablecloth.</p><p>Stakeholders from all walks of life get invited and sit down at the tables. This could be a mixture of farmers, teachers, politicians. In corporate environments, sometimes this is <em>everybody</em> in the company.</p><p>Organizers now unveil a carefully phrased focusing Question as the topic of the conversations. It is important that The Question is positive and focusing. For education reform, don&apos;t ask &quot;What is wrong with our education system&quot;. Instead, ask &quot;What could a great school also be&quot;.</p><p>The four people at each table now start a conversation around The Question. Everyone takes notes on the butcher paper using the crayons.</p><p>After twenty minutes, a gong rings. Three people - everyone except &quot;South&quot; in Duplicate Bridge terms - at each table get up and move to other tables at random. Three fresh random people sit down at each table.</p><p>&quot;South&quot; now first explains to the newcomers what the notes on the tablecloth mean. 
This provides a kind of lightweight continuity from the previous conversation at this table. The three newcomers comment on these notes and add fresh comments - the best parts of what was said at their previous tables. These conversations unfold very naturally. Four strangers can easily have a friendly conversation about complex things that matter. They don&apos;t even have to introduce themselves. They contribute their wisdom and experiences, not their resumes.</p><p>Conversations now continue for another twenty minutes. The gong rings again, and the shuffling repeats.</p><p>After 2-3 hours, the session is over and the butcher papers are gathered by the organizers into what is called &quot;The Harvest&quot;. They are summarized and some time later, perhaps &quot;after lunch&quot;, the results are shared with all the stakeholders.</p><h2 id="why-this-works-so-well"><strong>Why this works so well</strong></h2><p>Someone pushing a bad idea of theirs at every table can spam at worst 27 people in three hours. A good idea, introduced at the first table and repeated by all participants at subsequent tables, will reach over 100,000 people or the majority of the audience, whichever is smaller. The arithmetic: a three-hour session of twenty-minute rounds has nine rounds; a spammer meets three new people per round, for 3 x 9 = 27, while an idea carried forward by everyone who hears it can roughly quadruple its audience each round, and 4^9 is about 260,000. This is the filtering power of the World Cafe Protocol.</p><h1 id="wisdom-salon-is-an-online-world-cafe"><strong>Wisdom Salon is an online World Cafe</strong></h1><p>Sadly, the Wisdom Salon project has been suspended because of changing infrastructure and cost structure for online video transmissions and because of lack of time on my part. It is possible to restart the project using current video technology and with funding and a larger team. If interested in contributing to this, please get in touch.</p><p>What follows is the original high-level design specification, written in the present tense. &#x1F61E;</p><h2 id="design-specification"><strong>Design Specification</strong></h2><ul><li>The Wisdom Salon is a 24x7 online World Cafe implemented as a video chat site.</li><li>Conversations have four participants but each conversation can also have a passive and quiet audience of any size.</li><li>All conversations are always public.</li><li>All conversation participants are known by their login identities.</li></ul><p>Why would anyone want to participate? The main purpose of Wisdom Salon is increased wisdom and improved clarity in complex issues for the participants. This is your main benefit; this is why you would want to participate. You will not get likes, but you might earn a local currency (called &quot;Influons&quot;) that you can selectively use to extend your influence.</p><h2 id="goal"><strong>Goal</strong></h2><p>The goal is specifically NOT to find the best Grains of Wisdom in the Harvest; the grains are there mainly to provide continuity and shorten the time to get to talking about things that matter. The system is there to provide the users a chance to analyze large and complex issues with others in conversation, in an exchange of experiences.</p><ul><li>Do not underestimate how different an interactive conversation is from a web search or reading a book.</li><li>Have you ever spent days studying something without &quot;getting it&quot; only to have someone set you straight in two minutes of conversation?</li><li>Have you ever been in a meeting where the resolution is something none of the participants even understood when the meeting started?</li></ul><h2 id="sample-questions"><strong>Sample Questions</strong></h2><p>What kinds of Questions demonstrate the power of The Wisdom Salon? 
Consider these samples:</p><ul><li>I am considering a mid-life career change. WHAT MATTERS?</li><li>Where should I retire, and why there?</li><li>Should I pursue a career in engineering or medicine?</li><li>Lifestyle design in interesting times</li><li>What is the true promise of genetics research and why should I care?</li><li>What movies should I let my children watch, and why?</li><li>Musical education for my child - WHAT MATTERS? What instruments, and why?</li><li>What is it really like to be a soldier in places like Afghanistan and Iraq?</li><li>Should I retire in Costa Rica?</li></ul><h2 id="user-experience"><strong>User Experience</strong></h2><p>People arrive when they want and leave when they want. They can engage in multiple ways:</p><ul><li>Upon entering the site, users are presented with the (at the moment) most popular conversation &#x2013; the one with the largest audience.</li><li>Below the conversation there will be a list of other popular conversations, headed by conversations on topics the user may have watched or previously participated in.</li><li>They can browse all ongoing conversations much like watching talk shows on television.</li><li>They can select from hundreds of Questions to find something that interests them, or add their own.</li><li>Instead of a butcher paper, they can leave notes on each Question known as &quot;Grains of Wisdom&quot; to provide the lightweight continuity from table to table.</li><li>They can vote on these Grains of Wisdom so that the better results rise to the top. Results are immediately visible to all.</li><li>They can observe what other people say and how they behave and modify their own social graph to improve their chances of interaction with the best people.</li><li>A local currency is earned by passive engagement per hour, more of it is earned by participating in conversations, and the currency is used to pay for the privilege of posting a comment. Because posting costs currency, spamming the Grains of Wisdom will be limited.</li><li>A topic without currently active conversations still allows you to browse the Grains of Wisdom on the topic, and if you have Influons, you can vote on the Grains (notes) that you like or otherwise agree with, and you can restart the topic by creating a table and hoping others will join.</li></ul><h2 id="four-main-uses-of-wisdom-salon"><strong>Main Uses of Wisdom Salon</strong></h2><p>The site ENABLES, but doesn&apos;t ENFORCE, the World Cafe protocol. You can use the site for several different purposes:</p><ul><li>As entertainment and education, passively watching conversations among your peers, much like flipping channels on television.</li><li>To get both factual information and broad-ranging personalized advice from experts</li><li>To share your expertise in fields you understand. To do &quot;micro-mentoring&quot;</li><li>To find an audience for storytelling and sharing personal experiences from your life</li><li>To gain wisdom and personal clarity in complex issues</li><li>To debate the major issues of the day in-person in productively selected and well-behaved groups</li><li>To find new interesting and competent friends by observing their behavior and then befriending them, much like on other social media</li></ul><p>Any active conversation starts a 20-minute &quot;clock bar&quot; moving. You can leave anytime. The system provides some incentive to stay the full 20 minutes. On the other hand, you don&apos;t have to leave after twenty minutes. 
If you like, you can continue the conversation as long as you want. But we <em>expect</em> a large fraction of people to adhere to the protocol. We believe this maximizes the wisdom gain per session.</p><h2 id="without-the-right-people-the-system-is-worthless"><strong>Without the right people, the system is worthless.</strong></h2><p>Do not be discouraged. Facebook would be worthless with only ten people on it. Wisdom Salon really requires at least 50 people to be on the system before you are likely to find a conversation around a question you actually care about anytime you join. So nobody knows if this will work or not, and it may take a while before the system matures enough to attract a sufficient repeat audience to become what I designed it for. If you don&apos;t like it at first, please try again; it might well improve, and you might get lucky to get into an amazing conversation when you least expect it.</p><p><strong>Welcome to my experiment</strong></p><p>&#x2013; Monica Anderson</p>]]></content:encoded></item><item><title><![CDATA[B3. Model Free AI]]></title><description><![CDATA[<p><strong>Don&apos;t Model the World - just Model the Mind.</strong></p><p>It&apos;s a lot easier. With some poetic freedom, I&apos;d like to claim:</p><ol><li>Model the World: ten billion lines of code</li><li>Model the Brain: ten million lines of code</li><li>Model the Mind: ten thousand lines of code</li></ol>]]></description><link>https://experimental-epistemology.ai/model-free-ai/</link><guid isPermaLink="false">62c6141f61b6860f9694c5bd</guid><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Wed, 06 Jul 2022 23:14:04 GMT</pubDate><content:encoded><![CDATA[<p><strong>Don&apos;t Model the World - just Model the Mind.</strong></p><p>It&apos;s a lot easier. With some poetic freedom, I&apos;d like to claim:</p><ol><li>Model the World: ten billion lines of code</li><li>Model the Brain: ten million lines of code</li><li>Model the Mind: ten thousand lines of code</li></ol><p>#1 is regular programming. We make computers perform actions in a context that matches the programmer&apos;s mental Model of some relevant parts of the world.</p><p>#2 is Neuroscience-based Models of neurons, synapses, and other biological structures and systems in brains.</p><p>#3 is Epistemology-based &quot;Models&quot; of Learning, Understanding, Reasoning, Prediction, Abstraction, and other Holistic and Emergent phenomena.</p><p>Epistemology-based Methods require a rather minimal infrastructure to support whatever operations these concepts require.</p><p>I put &quot;Models&quot; within irony quotes because they are, strictly speaking, meta-Models used in meta-skills. They are not about skills, such as English or folding proteins; they are about how to acquire such skills:</p><p>By Learning from our mistakes.</p>]]></content:encoded></item><item><title><![CDATA[B4. Corpus Congruence]]></title><description><![CDATA[Understanding in brains and machines can be defined and measured as Corpus Congruence. 
Corpus Congruence as a metric spans almost all of NLP.]]></description><link>https://experimental-epistemology.ai/corpus-congruence/</link><guid isPermaLink="false">62c61dfb61b6860f9694c638</guid><category><![CDATA[Organic Learning]]></category><category><![CDATA[Experimental Epistemology]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Wed, 06 Jul 2022 23:51:40 GMT</pubDate><content:encoded><![CDATA[<p>Understanding in brains and machines can be defined and measured as Corpus Congruence.</p><p>Let&apos;s consider this in the Machine Learning sense. If a machine is Model Free (Holistic), as all general Understanders have to be in order to not get trapped into a limited Model, then all it ever knows comes from the corpus it was trained on. And all it really can say is &quot;This is more like my corpus than that&quot;. Or &quot;This is more like these documents in my corpus than those&quot;.</p><p>Corpus Congruence as a metric spans almost all of NLP. Because most of NLP is DocSim in various guises.</p><p>Given two documents A and B in some corpus, a classifier can say that an unknown document U is more like A than B. Given this capability, we can build:</p><ul><li>Classification and Clustering, by using A, B... N as defining classes</li><li>Filtering, by using A = wanted docs and B = unwanted docs</li><li>Sentiment Analysis, by using A = negative docs and B = positive docs</li><li>Entity Extraction, by softly matching terms against lists of known entities</li><li>DocSim: find me more documents like this one</li></ul><p>Reductionist NLP uses all of these at the &quot;bag of words&quot; or &quot;word count&quot; levels for things like web search, spam filtering, and clustering. Holistic NLU aims to do the same based on the <strong>meanings</strong> expressed in sentences and paragraphs.</p><p>But &quot;Semantic&quot; Corpus Congruence is still Corpus Congruence.</p><p>Common Sense now becomes &quot;Is the proposition before me congruent with my entire World Model, as acquired by learning things from my training corpus?&quot;. If the proposition is already well known, then we can likely ignore it this time.</p><p>And if it is not, then the next question will be &quot;Is it close enough that it might be worthwhile extending the World Model with this information?&quot;</p><p>If the answer is no, then the input is by definition nonsense. Otherwise it is either a new fact or a lie, but since we cannot tell, we have to accept it; possibly with a note that this is fresh, untested knowledge that may turn out to be irrelevant, false, counterproductive, or noise.</p><p>Next we can note that it doesn&apos;t matter whether &quot;documents&quot; are text or images, or input from a point cloud of sensors on a robot or an autonomous vehicle.</p><p>And finally we can note that this definition also holds for humans if we take our &quot;corpus&quot; to be &quot;Everything we&apos;ve experienced since birth&quot;.</p>]]></content:encoded></item><item><title><![CDATA[B5. Twitter&apos;s Forklift Upgrade]]></title><description><![CDATA[... 
manage to strike a popular balance between your freedom to say anything you want and my freedom not to have to hear it ...]]></description><link>https://experimental-epistemology.ai/twitters-forklift-upgrade/</link><guid isPermaLink="false">63810b65b6191a06c0b80f7e</guid><category><![CDATA[Blog]]></category><category><![CDATA[News]]></category><dc:creator><![CDATA[Monica Anderson]]></dc:creator><pubDate>Fri, 25 Nov 2022 18:57:02 GMT</pubDate><content:encoded><![CDATA[<p>Twitter is, technically, a Short Message Router. <br><br>I know a thing or two about such routers. I once worked in a company selling IRC servers with maintenance contracts to companies like Apple, and I wrote an IRC client for the Mac. I know the inventor of IRC chat and helped him get a job at Google. I attempted to start a chat project (a precursor to my later work) as a 20% project at Google. I co-founded a startup where we attempted to create a video chat social medium based on the &quot;<a href="https://theworldcafe.com/key-concepts-resources/world-cafe-method">World Cafe</a>&quot; concept, named Wisdom Salon, which died because of problems with the video technology available at the time.<br><br>And I have written an engineering whitepaper about a proposed Twitter alternative (not a replacement) called Bubble City. The whitepaper is available at <a href="https://syntience.com/BubbleCity2.pdf">https://syntience.com/BubbleCity2.pdf</a><br><br>Message routing systems like Twitter are (today) relatively simple to create. A startup with a team of four decent engineers with some cloud chops can assemble a Twitter-like app MVP in 4 months. If based on modern cloud service designs, it would already be designed to scale indefinitely. The text of all the tweets ever made would fit on four current spinning-rust hard disks. Media data just takes storage. Message Search is available as a cloud service, or locally. And AI has made lots of progress on filtering natural language.<br><br>All the components are there. And Twitter wouldn&apos;t even have to go to the cloud, as a competing startup would, since they have enough data centers of their own.<br><br> &#xA0; &#xA0;-- * --<br><br>Where am I going with this? I&apos;d like to provide a software architect&apos;s POV on the Twitter debacle. I believe the anti-Elon debate is driven by envy over his capability to execute, his money, and his intelligence. The stories of Elon coming in and attempting to fix Twitter by cutting services at random are newsbait. More likely these were standard system breakage when nobody is there to baby-sit it (yes, he did fire people), or malicious acts from fired employees. Anything that goes wrong is amplified by the press because it is prime content supporting their narrative.<br><br>Witness the many changes to Twitter attributed to Elon before he was even able to change anything.<br><br> &#xA0; &#xA0;-- * --<br><br>Suppose Elon actually knows what he&apos;s doing? What if he bought Twitter as a tearer-downer?<br><br>He himself is unlikely to touch any of the Twitter software at all. He has likely brought in software engineers from Tesla and SpaceX that he trusts, to liaise and cooperate with the remaining Twitter engineers. He may be leading software architecture and design sessions. If so, I wish those were recorded for later engineering education.<br><br>Any attempt to improve 16 years of crufty code would be a nightmare. 
IMO, that&apos;s not happening.</p><p>Back in the days of large mainframe computers there were occasions when a system upgrade would require taking the old computer out using a forklift.<br><br>A handful of engineers, something between 4 and 20, could rewrite Twitter from scratch in a few months to get a clean start using the latest software development tools and disciplines. They could make it attach to the existing reality interfaces (network APIs etc.) and be thoroughly tested on actual running traffic in parallel with the existing Twitter system.<br><br>Elon (and his software engineers and advisors, who are known to be excellent since they created things like the Dojo and the FSD computers and software) likely saw this early on in the process, and this is one reason Elon decided to buy it. But also, TBH, Twitter engineers were likely among the first to tell Elon this :-D<br><br>Elon has been planning to create a Twitter replacement (as usual, he has named it &quot;X&quot;) for some time and (IMO) simply bought Twitter for the name and the users... and to eliminate the main future competitor to X by replacing Twitter with a new version -- we can call this version &quot;TwitterX&quot;. <br><br>So Elon is likely busy developing TwitterX and ignoring everything else. Then one day, they throw a switch. They will call it Twitter, but it&apos;s just all Twitter users being shifted (without being asked) to using TwitterX. Old Twitter will be gone, but old tweets remain in the new one. That&apos;s one reason he could fire so many people -- he just needed a skeleton crew to keep the old system limping along for a few months while they developed the replacement. The salaries saved also pay for the development. :-|<br><br>If it is any good, and if they manage to strike a popular balance between your freedom to say anything you want and my freedom not to have to hear it, most people who left Twitter in a speculative pique over Elon buying it are likely to come back.<br><br>Speaking of which, Elon has hinted at using Shadow Banning in TwitterX to accomplish this compromise. This is a significant step towards my Bubble City concept, which is 100% based on Shadow Banning, in some sense, because it is a 100% pull system. For Bubble City, that would be explicitly part of the user agreement. For TwitterX, who knows. They can make any rules they want and the criterion will be how much return traffic they get, as is usual on the web.<br><br>Pull-based systems are those where you specify what kinds of messages you (as an individual user) want to read. Like web search, in some sense, but much more convenient. &quot;Shadow Banning&quot; is a very negative term, but it describes any system where some messages are given lower priority and/or are only shown to certain classes of users. &quot;Pull-based&quot; is a more positive term because the algorithm is 100% in the hands of the end users specifying the pull.<br><br>Elon and his POV of what matters, politically, will remain an influence on the future of this country and the world. But there is hope that his POV won&apos;t be the only one available on future TwitterX. 
This goes with his (and my) desire to provide a no-censorship messaging platform, possibly using similar strategies.<br><br>IMO, Elon taking over Twitter and replacing it is all good, and likely to become a net positive for the world in as little as a year.<br><br> &#xA0; &#xA0;-- * --<br><br>One fundamental misunderstanding that is never discussed, and which provides both press and amateurs alike with opportunities to make fools of themselves, is the fact that all social media have to select what you get to read.<br><br>All social media systems receive a post volume greater than any user who follows a reasonable number of sources or other users has the patience to read. If you follow 500 people and read 20 messages per day, it should be obvious that you cannot read everything posted by your followees. So all social media systems prioritize messages.<br><br>How this prioritization is done -- the admixture of your stream -- is what makes different social media different. &quot;Prioritization by what you have liked in the past&quot;, &quot;100% pull system&quot;, &quot;filtering to remove illegal content&quot;, and &quot;Shadow Banning&quot; are all pretty much the same thing. Some call this &quot;The Algorithm&quot;. :-) When you are deciding which social medium to use, you are making choices among competing selection algorithms.<br><br> &#xA0; &#xA0;-- * --<br><br>How plausible is this? I expect only minor changes to how the existing Twitter works, in the spirit of the tuning the company has been doing for years. If nothing significant changes in the way Twitter works today for a few months, then that might support my hypothesis.</p>]]></content:encoded></item></channel></rss>