7. The Red Pill of Machine Learning

The Deep Learning revolution of 2012 changed how we think about Artificial Intelligence, Machine Learning, and Deep Neural Networks. What changed, and what does this mean going forward?

The new cognitive capabilities in our machines are the result of a shift in the way we think about problem solving. It is the most significant change ever in Artificial Intelligence (AI), if not in science as a whole. Machine Learning (ML) based systems are successfully attacking both simple and complex problems using novel methods that only became available after 2012.

We are experiencing a revolution at the level of Epistemology which will affect much more than just the field of Machine Learning. We want to add more of these novel methods to our standard problem solving toolkit, but we need to understand the tradeoffs and the conflict.

I argue that understanding Deep Neural Networks (DNNs) and other ML technologies requires that practitioners adopt a Holistic Stance which is (at important levels) blatantly incompatible with the Reductionist Stance of modern science. As ML practitioners we have to make hard choices that seemingly contradict many of our core scientific convictions. As a result we may get the feeling something is wrong. The conflict is real and important and the seemingly counter-intuitive choices make sense only when viewed in the light of Epistemology. Improved clarity in these matters should alleviate the cognitive dissonance experienced by some ML practitioners and should accelerate progress in these fields.

The title refers to the eye-opening clarity some Machine Learning practitioners achieve when adopting a Holistic Stance.

Parallel Dichotomies

Syntience Inc researches Natural Language Understanding (NLU, in the bottom row of the table below). We are creating novel systems that allow computers to learn to Understand human natural languages, any one of them. We use Deep Neural Networks of our own design. The goal is to achieve some kind of human-like, but not necessarily human-level, Understanding[1]. This is very different from traditional Natural Language Processing (NLP), which relies on human-made Models of some language, such as English, and perhaps Models of fragments of the world. The NLP and NLU disciplines have chosen opposite answers to their difficult two-way choices. They are now defined by these choices, and we can use their stances to highlight the main conflict.

The split is so deep that it cuts through many layers of our reality; the dichotomies listed below are all manifestations of this incompatibility at different levels, listed by impact, but discussed out of order below.

Domain                        | Science             | The Complex, including the Mundane
Epistemology                  | Reductionism        | Holism
Brains                        | Reasoning           | Understanding
Problem Solving               | Plan it, then do it | Just do it
Artificial Intelligence       | 20th Century GOFAI  | Machine Learning, Deep Neural Networks
Natural Language In Computers | NLP                 | NLU

The Problem Solving level provides many familiar examples of these issues. In our mundane lives, we solve many kinds of problems every day, but our strategies for solving them fall into just those two categories. For any complicated problem, we had better have a plan before we start. But most problems the brain deals with every day are things we never have to think about, because we do not need to plan or Reason about them. These are the millions of low-level problems we encounter in our mundane life every day, and this is the world that our AIs will have to operate in.

Consider someone walking across the floor. Their brain signals their leg muscles to contract in the correct cadence. Do they need to consciously plan each step? Do they Reason about how to maintain their balance? No; they probably don't even know what leg muscles they have.

Consider understanding this sentence. Did you use Reasoning? Did you use grammar? If you are a fluent speaker, you do not need grammars to understand or produce language. And you do not have time to Reason about language while hearing it spoken. Reasoning is slow, but Understanding is instantaneous.

Consider someone braking for a stoplight. How hard should they push on the brake pedal? Do they compute the required differential equation? Should such equations be part of the driver's license tests?

Consider someone making breakfast. Did they have to Reason about anything or plan anything, or did they just do what worked yesterday, "without thinking about it"? Without consciously planning it?

Walking and talking, braking and breakfasting, like almost everything we use our brains for, rely on learning from our experiences in order to re-use anything that has worked in the past. And, over time, we learn to correct our mistakes. These strategies are simple enough that we can identify them in other life forms. Dogs Understand a lot but do not Reason much. And we can see how they could be implemented in something like Neurons in brains.

The split in our brains between Reasoning and Understanding was examined at length in "Thinking, Fast and Slow" by Daniel Kahneman. The vast majority of the brain's effort is spent processing low-level sensory input, mostly from the eyes. Kahneman calls this "System 1"; it provides Understanding. Reasoning is done by "System 2", based on the Understanding from System 1. But most problems we deal with on a daily basis do not require System 2 at all.

Artificial Intelligence and Machine Learning

Computers can solve any suitable problem when given sufficient human help, such as a complete plan for the solution in the form of a computer program and valid input data.

But since the AI/ML revolution of 2012 we now know how to make computers Understand certain problem domains through Machine Learning. The acquired Understanding allows the machine to "just do it" for many different problems in that domain, without any human planning, Reasoning, or programming, and using incomplete, unreliable, and noisy input data.

This is changing how we are building systems with cognitive capabilities. Everyone working in ML or AI needs to understand the tradeoffs we must make at the most fundamental -- Epistemological -- levels. Modern ML requires examining and seriously re-thinking many things we were taught to vigilantly strive for in our Science, Technology, Engineering, and Mathematics (STEM) educations. Things like "Correlation is bad, but causality is good" and "Do not jump to conclusions on scant evidence" are still solid advice everywhere inside science. But when building Understanding systems, these established strategies and modes of thinking no longer work, because correlation discovery and handling of sparse, unreliable, and inconsistent input data are exactly the kinds of tasks we will have to perform, and perform well, at these pre-scientific levels.

In order to understand how to do this, we must switch to a Holistic Stance.

A Motivating Example

Beginning Machine Learning students are given exercises like this: They are given a large spreadsheet which lists data about houses sold a certain year in the US. This information includes among other things the zip code of the house, the living area in square feet, lot size, the number of bedrooms and bathrooms, the year the house was built, and the final sale price of the house.

We would like to be able to predict this final sale price, given the corresponding data for a current house we are about to list for sale. The given spreadsheet is the data the student will use to train a Deep Neural Network. It is the entire learning corpus; it contains everything the system will ever know.

These students can download Deep Neural Network libraries like "Keras" and "TensorFlow", and runnable examples for many kinds of problems, including useful training data, from places like HuggingFace and GitHub. Next the student trains ("learns") their network using the given data. This may take a while. But when learning finishes, they can give the system data for a house it has never seen and it will quite reliably predict what the house might sell for.

This was the goal of the exercise. The student has created a system that Understands how to estimate real estate prices from listings. But the student still does not Understand anything about real estate. The predictive capability (that many people working in real estate would be willing to pay money for) is 100% based on Understanding in the Deep Neural Network -- in the computer. And because all libraries and many pre-solved examples of this nature were freely available, the student did not have to do much programming either.
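The shape of this exercise can be sketched in a few lines. What follows is only a minimal stand-in, not the Keras or TensorFlow network a student would actually train: it assumes a synthetic two-column "spreadsheet" (living area and bedrooms), a pricing rule invented for the illustration, and a single linear unit trained by gradient descent using nothing but the Python standard library. The names `make_corpus` and `fit` are hypothetical.

```python
import random

def make_corpus(n, seed=0):
    """Synthetic 'spreadsheet': (living_area_sqft, bedrooms) -> sale price."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        area = rng.uniform(500.0, 3500.0)
        beds = rng.randint(1, 5)
        # Invented pricing rule plus noise; the learner never sees this formula.
        price = 150.0 * area + 20000.0 * beds + 50000.0 + rng.gauss(0.0, 5000.0)
        rows.append(((area, float(beds)), price))
    return rows

def fit(rows, epochs=500, lr=0.1):
    """Learn from the corpus by gradient descent on mean squared error."""
    n, n_feat = len(rows), len(rows[0][0])
    # Standardize inputs and target so one learning rate suits every column.
    means = [sum(x[j] for x, _ in rows) / n for j in range(n_feat)]
    stds = [(sum((x[j] - means[j]) ** 2 for x, _ in rows) / n) ** 0.5
            for j in range(n_feat)]
    y_mean = sum(y for _, y in rows) / n
    y_std = (sum((y - y_mean) ** 2 for _, y in rows) / n) ** 0.5
    xs = [[(x[j] - means[j]) / stds[j] for j in range(n_feat)] for x, _ in rows]
    ys = [(y - y_mean) / y_std for _, y in rows]

    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(xs, ys):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(n_feat):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b

    def predict(x):
        z = sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, stds)) + b
        return z * y_std + y_mean
    return predict

predict = fit(make_corpus(200))
estimate = predict((2000.0, 3.0))   # a house the learner has never seen
print(round(estimate))
```

The person running this needs to know nothing about real estate: the estimate comes entirely from the corpus. A real Deep Neural Network differs in scale and in its ability to discover non-linear structure, but the workflow (corpus in, predictor out) is the same.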

The Vision

This is desirable. This is what AI should mean. The computer Understands the problem so that we don't have to. Programming in the future will be like having a conversation with a competent co-worker, and when the machine Understands exactly what we want done, it will simply do it. No programming required on our part, or on the part of the machine, once a suitable (partially Reductionist) framework exists. The rest is learning, and it can be done in any human language with equal ease.

We are on the right track towards something worthy of the name "AI" with current Machine Learning. Going forward, there are thousands of paths to choose from, and the ability to choose wisely will depend on our ability to understand and adopt a Holistic Stance.

Reductionism and Holism

These are important terms of art in Epistemology. Both of them have numerous correct, useful, and compatible definitions. We will henceforth use the following definitions for reasons of usefulness and simplicity:

  • Reductionism is the use of Models
  • Holism is the avoidance of Models

Models are scientific models, theories, hypotheses, formulas, equations, naïve models based on personal experiences, superstitions (!), and traditional computer programs. In a Reductionist paradigm, these Models are created by humans, ostensibly by scientists, and are then used, ostensibly by engineers, to solve real-world problems. Model creation and Model use both require that these humans Understand the problem domain, the problem at hand, the previously known shared Models available, and how to design and use Models. A Ph.D. degree could be seen as a formal license to create new Models[2]. Mathematics can be seen as a discipline for Model manipulation.

But now -- by avoiding the use of human made Models and switching to Holistic Methods -- data scientists, programmers, and others do not themselves have to Understand the problems they are given. They are no longer asked to provide a computer program or to otherwise solve a problem in a traditional Reductionist or scientific way. Holistic Systems like DNNs can provide solutions to many problems by first learning about the domain from data and solved examples, and then, in production, to match new situations to this gathered experience. These matches are guesses, but with sufficient learning the results can be highly reliable.

We will initially use computer-based Holistic Methods to solve individual and specific problems, such as self-driving cars. Over time, increasing numbers of Artificial Understanders will be able to provide immediate answers -- guesses -- to wider and wider ranges of problems. We can expect to see cellphone apps with such good command of language that it feels like talking to a competent co-worker. Voice will become the preferred way to interact with our personal AIs.

Early and low-level but useful AI will manifest as computers that can solve problems we ourselves cannot (or cannot be bothered to) solve. They need not be superhuman; all they need to have in order to be extremely useful is exactly the ability to autonomously discover higher level abstractions in some given problem domain, starting from low level sensory input, e.g. by learning from images or reading books. Such systems now exist.

If we want to understand Machine Learning, then we need to understand all strategies in the rightmost column in all the tables below. They are all part of a Holistic Stance and if we are working in Machine Learning, we need to adopt as many of them as possible.

Differences at the Level of Epistemology

Row | Reductionism / Science | Holism / Machine Learning
1.1 | The use of Models | The avoidance of Models
1.2 | Reasoning | Understanding
1.3 | Requires Human Understanding | Provides Human-like Understanding
1.4 | Problems are solved in an abstract Model space | Problems are solved directly in the problem domain
1.5 | Unbeatable strategy for dealing with a wide range of suitable problems faced by humans | May handle some problems in domains where Reductionist Models cannot be created or used, known as "Bizarre Domains"
1.6 | Handles many important complicated problems such as going to the Moon or a highway system | Handles many important complex problems such as protein folding and playing Go
1.7 | Handles problems requiring planning or cooperation | Handles simple mundane problems such as Understanding language or vision, or making breakfast

Many rows in these tables discuss hard tradeoffs where compromises are impossible or prohibitively expensive; these are identified by boldface numbers in the first column. Remaining rows may not be clear tradeoffs or even disjoint alternatives. Mixed systems are described in a separate chapter.

1.1, 1.2, 1.3: These form the core of these dichotomies and are discussed in most of what follows, but also in detail in the AI Epistemology introductory chapter and in videos of talks.

1.4: A weather report is based on Models in meteorology. To solve the problem directly in the problem domain, open a window to check if it smells like rain.

1.5 is also discussed in table 3 below and in the Bizarre Systems video. A summary:

Reductionism is the greatest invention our species has ever made. But Reductionist Models cannot be created or used when any one of a multitude of blocking issues is present. Models work "in theory" or "in a laboratory" where we can isolate a device, organism, or phenomenon from a changing environment. However, complex situations may involve tracking and responding to a large number of conflicting and unreliable signals from a constantly changing world or environment. Reductionism is here at a severe disadvantage and can rarely perform above the level of statistical Models.

In contrast, Holistic Machine Learning Methods learning from unfiltered inputs can discover correlations that humans might miss, and can construct internal pattern-based structures to provide recognition, Epistemic Reduction, abstraction, prediction, noise rejection, and other cognitive capabilities.

1.6, 1.7: Humans generally use Holistic Methods for seemingly simple (but in reality, complex) mundane problems, like Understanding vision, human language, learning to walk, or making breakfast. Computers use them for very complex problems (ML based AI in general, such as protein folding and playing Go) but also simpler ones, such as real estate pricing.

Main Tradeoffs

Row | Reductionism / Science | Holism / Machine Learning
2.1 | Optimality -- the best answer | Economy -- re-use known useful answers
2.2 | Completeness -- all answers | Promptness -- accept first useful answer
2.3 | Repeatability -- same answer every time | Learning -- results improve with practice
2.4 | Extrapolation -- in low-dimensionality domains | Interpolation -- even in high-dimensionality domains
2.5 | Transparency -- Understand the process to get the answer | Intuition -- accept useful answers even if achieved by unknown or "subconscious" means
2.6 | Explainability -- Understand the answer | Positive Ignorance -- no need to even understand the problem or problem domain
2.7 | Share-ability -- abstract Models are taught and communicated using language or software | Copy-ability -- ML Understanding (a Competence) can be copied as a memory image

2.1, 2.2, 2.3: Optimality, Completeness, and Repeatability are only available in theoretical Model spaces and sometimes under laboratory conditions. Economy and Promptness had much higher survival value in evolutionary history than Optimality and Completeness. See also 3.1 and 3.2 below.

2.3: The strongest hint that a system is Holistic is that the results improve with practice because the system learns from its mistakes. In Machine Learning, a larger learning corpus is in general better than a smaller one because it provides more opportunities for making mistakes to learn from, such as corner cases.
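"Results improve with practice" can be made concrete with a classic mistake-driven learner. The sketch below uses a perceptron purely as a convenient stand-in (it is not one of the Deep Neural Networks discussed here), on an invented toy task: deciding on which side of a line a point falls. The function names and the task are assumptions made for the illustration. The weights change only when a guess is wrong, so every improvement comes from a mistake.

```python
import random

def make_examples(n, seed=0):
    """Points in the square [-1, 1] x [-1, 1], labeled by the sign of x + y."""
    rng = random.Random(seed)
    examples = []
    while len(examples) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if abs(x + y) < 0.2:
            continue  # keep a margin so the toy task is cleanly learnable
        # The constant 1.0 lets the learner find its own offset (bias).
        examples.append(((x, y, 1.0), 1 if x + y > 0 else -1))
    return examples

def practice(examples, rounds):
    """Mistake-driven learning: adjust weights only after a wrong guess."""
    w = [0.0, 0.0, 0.0]
    mistakes_per_round = []
    for _ in range(rounds):
        mistakes = 0
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if label * score <= 0:  # wrong (or undecided) guess
                mistakes += 1
                # Nudge the weights toward the example that was missed.
                w = [wi + label * xi for wi, xi in zip(w, x)]
        mistakes_per_round.append(mistakes)
    return mistakes_per_round

history = practice(make_examples(200), rounds=200)
print(history[:5], "...", history[-1])
```

The first round always contains mistakes because the learner starts out knowing nothing. For this margin and data range, the classical perceptron mistake bound allows at most 150 corrections in total, so the later rounds are mistake-free.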

2.4: Models created by humans have manageable numbers of parameters because the scientist or engineer working on the problem has done a (hopefully correct) Epistemic Reduction from a complex and messy world to a computable Model. This allows experimentation with what-if scenarios by varying Model parameters. It is up to the Model user to determine which extrapolations are reasonable.

In Holistic ML systems we are getting used to systems with millions or billions of parameters. These structures are very difficult to analyze, and just like with human intelligences, the best way to estimate their competence is through testing. Extrapolation is typically out of scope for Holistic systems.
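The difference between interpolation and extrapolation shows up even in a deliberately tiny, one-dimensional sketch. It assumes a memory-based guesser (a nearest-neighbor lookup, chosen only because it is the simplest learner that answers from prior experience) and an invented hidden rule, y = x * x. The names `learn` and `guess` are hypothetical.

```python
def learn(xs):
    """Memorize solved examples of a hidden rule (here y = x * x)."""
    return [(x, x * x) for x in xs]

def guess(memory, x):
    """Answer with the outcome of the most similar remembered case."""
    _, answer = min(memory, key=lambda case: abs(case[0] - x))
    return answer

memory = learn([i / 100 for i in range(101)])  # experience covers 0.0 .. 1.0

inside = guess(memory, 0.505)   # between remembered cases: interpolation
outside = guess(memory, 3.0)    # far outside all experience: extrapolation

print(abs(inside - 0.505 ** 2))  # small error
print(abs(outside - 3.0 ** 2))   # large error: the guess is stuck at the edge of experience
```

Inside the range it has experienced, the guesser is accurate without knowing the rule. Outside that range it can only repeat the nearest thing it has seen, which is why testing within the intended domain is the practical way to estimate competence.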

2.5: The majority of end users will have no interest in how some machine came up with some obviously correct answer; they will just accept it, the way we accept our own Understanding of language, even though we do not know how we do it.

2.6: We now find ourselves asking our machines to solve problems we either don't know how to solve, or can't be bothered to figure out how to solve.

We have reached a major benefit of AI -- we can be positively ignorant of many mundane things and will be happy to delegate such matters to our machines so that we may play, or focus on more important things.

Some schools of thought tend to overvalue explainability. To them, ML is a serious step down from results obtained scientifically where we can all inspect the causality -- for instance in Reductionist production ("expert") systems.

But the bottom line is that today we can often choose between

  1. Understanding the problem domain, problem, the use of science and relevant Models, and the answer.

  2. Or just getting a useful answer without even bothering to Understand the problem or the problem domain.

The latter (Positive Ignorance) is a lot closer to AI than the first, and we can expect the use of Holistic Methods to continue to increase.

2.7: Science strives towards a consensus world Model in order to facilitate communication and minimize costly engineering mistakes caused by ignorance and misunderstandings. Scientific communication requires a high-level context -- a World Model -- shared by participants, and agreed-upon signals such as words, math, and software.

But direct Understanding, such as the skills to become a chess grandmaster or a downhill skier, cannot be shared using words -- the experience must be acquired using individual practice. Computer based systems that learn a skill through practice can share the entire Understanding so acquired by copying the memory content to another machine.

Advantages of Holistic Methods

Row | Reductionism / Science | Holism / Machine Learning
3.1 | NP-Hard problems cannot be solved | Finds valid solutions by guessing well based on a lifetime of experience
3.2 | GIGO -- Garbage In, Garbage Out is a recognized problem | Copes with missing, erroneous, and misleading inputs
3.3 | Brittleness -- Experience catastrophic failures at edges of competence | Antifragile; learns from mistakes, especially almost-correct guesses and small, correctable failures
3.4 | Models of a constantly changing world are obsolete the moment they are created | Incremental learning provides continuous adaptation to a constantly changing world
3.5 | Algorithms may be incorrect or may be incorrectly implemented | Self-repairing systems can tolerate or correct internal errors

3.1: It is because we desire certainty, optimality, completeness, etc. that NP-Hardness becomes a problem. There are many problems where it is relatively easy to find a provably valid solution but where finding all solutions can be very expensive. Real world traveling salesmen merrily travel along reasonable routes.
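The traveling salesman remark can be illustrated on a map small enough to solve both ways. This is a sketch under stated assumptions: the eight city coordinates are made up, and the "reasonable route" is produced by a simple go-to-the-nearest-unvisited-city rule, which merely stands in for the experienced guessing described above. The exhaustive search is only feasible because the map is tiny.

```python
import itertools
import math

# Made-up city coordinates for the illustration.
CITIES = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 8), (7, 9), (3, 3)]

def tour_length(cities, order):
    """Length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def greedy_tour(cities):
    """Promptness: always travel to the nearest city not yet visited."""
    unvisited = set(range(1, len(cities)))
    order = [0]
    while unvisited:
        here = cities[order[-1]]
        nearest = min(unvisited, key=lambda j: math.dist(here, cities[j]))
        order.append(nearest)
        unvisited.remove(nearest)
    return order

def optimal_tour(cities):
    """Optimality: examine every ordering, (n - 1)! of them."""
    rest = range(1, len(cities))
    best = min(itertools.permutations(rest),
               key=lambda p: tour_length(cities, (0,) + p))
    return [0] + list(best)

greedy = greedy_tour(CITIES)
optimal = optimal_tour(CITIES)
print(tour_length(CITIES, greedy), tour_length(CITIES, optimal))
```

The greedy route is valid and is found after a handful of comparisons. The exhaustive search checks 5,040 orderings for eight cities, and that count grows factorially with every added city.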

3.2: If a Reductionist system does not have complete and correct input data, it either cannot get started or produces questionable output. But it is an important requirement of real-world Understanding Machines that they be able to detect what is salient (important) in their input in order to avoid paying attention to -- and learning from -- noise. And they have to deal with incomplete, erroneous, and misleading input generated by millions of other intelligent agents with goals at odds with their own. They need to be able to detect omissions, duplications, errors, noise, lies, etc., and the only Epistemologically plausible way to do this is to relate the input to similar input they have understood in the past -- what they already know. They need to understand what matters, but if they can also Understand some of the noise ("This is advertising"), they can exploit that.

There are many image and video apps available featuring Image Understanding based on Deep Learning. These apps can remove backgrounds, sharpen details like eyelashes, restore damaged photographs, etc. We need to keep in mind that the ability of Holistic systems to fill in data and detect noise depends on them having learned from similar data in the past. We note that all the image improvements are confabulations based on prior experience from their learning corpora. But we can also note that image composition using these methods yields totally seamless images, very far from cut-and-paste of pixels.

And quite similarly, we find language confabulation by systems like GPT-3 to flow seamlessly between sentences and topics. They have nothing to say, but they say it well. However, they bring us closer to meaningful language generation and when we achieve that, the public perception of what computers are capable of will totally change.

3.3: Most of Cognition is Re-Cognition. Being able to recognize that something has occurred before and knowing what might happen next has enormous survival value for any animal species. A mature human has used their eyes and other senses for decades; this represents an enormous learning corpus and they can Understand anything they have prior experience of. The mistakes made by humans, animals and by Holistic ML systems are very often of a "near miss" variety which provides an opportunity to learn to do it better next time.

Contrast this to Reductionist software systems created for similar goals. Rule based systems have long been infamous for their Brittleness. As long as the rules in the ruleset match the current input and reality perfectly, the results will be useful, repeatable and reliable. But at the edges of their competence, where the matches become more tenuous, the quality rapidly drops. Minor mistakes in the rule sets (in the World Modeling) may lead such systems to return spectacularly incorrect results.

3.4: Sometimes repeatability is important, and sometimes tracking a changing world by continuously learning more about it is important. In ML, Continuous Incremental Learning makes it possible to stay up-to-date. If we want repeatability, we can emit a "condensed, cleaned, and frozen Competence file" from a learner that can be loaded into non-learning (read-only) cloud-based Understanding Machines that serve the world and provide repeatability between scheduled software and competence releases.[3]
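The split between a continuously learning system and frozen, repeatable releases can be sketched as follows. Everything here is hypothetical: the class names, the JSON snapshot format, and the toy "Competence" (a running average per situation) are assumptions made for the illustration, not a description of any actual Competence file format.

```python
import json

class Learner:
    """Toy incremental learner: tracks the average outcome per situation."""

    def __init__(self):
        self.counts = {}
        self.totals = {}

    def learn(self, situation, outcome):
        self.counts[situation] = self.counts.get(situation, 0) + 1
        self.totals[situation] = self.totals.get(situation, 0.0) + outcome

    def emit_competence(self):
        """Condense current experience into a frozen, copyable snapshot."""
        means = {s: self.totals[s] / self.counts[s] for s in self.counts}
        return json.dumps(means, sort_keys=True)

class ReadOnlyUnderstander:
    """Serves answers from one release; it never learns, so it is repeatable."""

    def __init__(self, competence_json):
        self._means = json.loads(competence_json)

    def answer(self, situation):
        return self._means.get(situation)

learner = Learner()
learner.learn("rush_hour_minutes", 30.0)
learner.learn("rush_hour_minutes", 40.0)

server = ReadOnlyUnderstander(learner.emit_competence())  # release 1
learner.learn("rush_hour_minutes", 50.0)                  # the learner keeps adapting

before = server.answer("rush_hour_minutes")               # unchanged by new learning
server_2 = ReadOnlyUnderstander(learner.emit_competence())  # release 2
after = server_2.answer("rush_hour_minutes")
print(before, after)  # 35.0 40.0
```

Because a release is just data, the same snapshot can be loaded into any number of read-only servers, and all of them will give identical answers until the next release.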

In the case of Reductionist systems, such as cell phone OS releases, we are used to getting well tested new versions with minor bug fixes and occasional major features at regular intervals. Such systems "learn" only in the sense that the people who created them have learned more and put these insights into the new release.

3.5: Reductionist systems working with complete and correct input data are expected to provide correct and repeatable results according to the implementation of the algorithm. But both the algorithm and the implementation may have errors. If the algorithm does not adequately Model its reality, then we have Reduction Errors. In the implementation, we may have bugs.

Holistic software systems can be designed to a different standard of correctness. Since input data is normally incomplete and noisy, and results are based on emergent effects, we can expect similar enough results even if parts of the system have been damaged, for instance by Catastrophic Forgetting. Holistic systems can be made capable of self-repair using incremental learning. This has been observed in the Deep Learning community.

Another technique applies when using multiple parallel threads in learning. There may be conflicts that would normally require locking of some values. But if the operations are simple enough, such as just incrementing a value, we can forgo thread safety and locking, since the worst outcome is the loss of a single increment in a system that uses emergent results from millions of such values, and the mistake would (in a well designed system) be self-correcting in the long run. At the cloud level, absolute consistency may not be as hard a requirement as it is for Reductionist systems. Much larger mistakes can be expected to be attributable to misunderstandings of the corpus or poor corpus coverage.
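A rough way to see why an occasional lost increment is tolerable is to simulate the loss rather than provoke a real race. The sketch below is single-threaded and uses an assumed loss rate of one increment in a thousand, which is an arbitrary figure chosen for the illustration; actual loss rates depend on the hardware, the runtime, and the amount of contention. The function names are hypothetical.

```python
import random

def count_events(events, loss_rate, seed=0):
    """Count events, silently dropping a fraction of the increments."""
    rng = random.Random(seed)
    counts = {}
    for event in events:
        if rng.random() < loss_rate:
            continue  # simulates an increment lost to an unsynchronized write
        counts[event] = counts.get(event, 0) + 1
    return counts

def frequencies(counts):
    """The emergent result: relative frequencies, not exact counts."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

# A made-up event stream with a known mix of three kinds of events.
events = ["a"] * 50000 + ["b"] * 30000 + ["c"] * 20000
random.Random(1).shuffle(events)

exact = frequencies(count_events(events, loss_rate=0.0))
lossy = frequencies(count_events(events, loss_rate=0.001))
max_shift = max(abs(exact[k] - lossy[k]) for k in exact)
print(max_shift)
```

The exact counts differ between the two runs, but the relative frequencies that downstream decisions would be based on barely move. A system that depended on any single exact count would not tolerate this shortcut.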

General Strategies

Row | Reductionism / Science | Holism / Machine Learning
4.1 | Decomposition into smaller problems | Generalization may lead to an easier problem
4.2 | Human discards everything irrelevant based on how new information matches existing experience | Machine discards everything irrelevant based on how new information matches existing experience
4.3 | Modularity | Composability
4.4 | Gather valid, correct, and complete input data | Use whatever information is available, and use all of it
4.5 | Formal, Rigorous Methods | Informal Ad-hoc Methods
4.6 | Absolute control | Creativity
4.7 | Intelligent Design | Evolution

4.1: The Reductionist battle cry is "The whole is equal to the sum of its parts" which gives us a license to split a large complicated problem into smaller problems, to solve each of those using some suitable Model, and then to combine all the sub-solutions into a Model based solution for the original, larger, problem, such as in moon shots, highway systems, international banking, and generally in industrial intelligent design.

This works in simple and some complicated domains, but cannot be done in complex domains, where everything potentially affects everything else. Splitting a complex system may cause any emergent effects to disappear, confounding analysis. Examples of complex problem domains are politics, neuroscience, ecology, economy (including stock markets), and cellular biology. All life sciences operate in a complex problem domain because life itself is complex. Some say "Biology has Physics envy" because in the life sciences, Reductionist Models are difficult to create and justify. OTOH, "Physics is for simple problems."

Problems with many complex interdependencies and unknown webs of causality can now be attacked using Deep Neural Networks. These systems discover useful correlations and may often find solutions using mere hints in the input which match their prior experience; Reductionist strategies with correctness requirements outlaw this. It is notable that one of the larger triumphs of Holistic Methods is Protein Folding, which is a problem at the very core of the life sciences.

So Holistic Understanding of a complex system can be acquired by observing it over time and learning from its behavior. There is no need to split the problem into pieces. Part of the Holistic Stance is that we give the machine everything -- "holism" comes from Greek όλος -- "the whole" -- i.e. all the information we have. If we start filtering the input data -- "cleaning it up" -- then the system will effectively learn from a Pollyanna version of the world, which will be confusing once it has to deal with real-life inputs in a production environment. If we want our machines to learn to Understand the world all by themselves, then we should not start by applying heavy-handed heuristic cleanup operations of our own design on their input data.

Sometimes, Reductionist strategies are clearly inferior. Natural Language Understanding is such a domain. Language Understanding in a fluent speaker is almost 100% Holistic because it is almost entirely based on prior exposure. We are now finding out that it is much easier to build a machine that learns any language on the planet from scratch than it is to build a 20th-century Reductionist "Good Old Fashioned Artificial Intelligence" (GOFAI) style machine that Understands a single language such as English.

4.2: The process where a human, by using their Understanding, discards everything irrelevant to arrive at what matters is called "Epistemic Reduction" and is discussed in the first five posts on this website. This is the most important operation in Reductionism, but for some reason discussions of Reductionism in the past have tended to focus on other aspects; perhaps this is a new result. ML systems discard with little fanfare anything that was expected and that has been seen before as boring, harmless, or otherwise ignorable. They may also discard things significantly out of their experience as noise.

Things can only be reduced away at the semantic level they can be recognized at. Operations capable of Epistemic Reduction at multiple layers discard anything that's understood at that layer, and they may pass on "upward" to the next higher semantic layer a summary of what they discarded plus everything they did not Understand at their level. And higher levels do the same.

This is why Deep Learning is deep.

4.3: Intelligently designed systems are often made up out of interchangeable Modules, which allow for easy replacement in case of failure, and in some cases (and especially in software) allow for customization of functionality by replacing or adding modules. These modules have well specified interfaces that allow for such interconnections.

In the Holistic case we can consider a human cell where thousands of proteins interact on contact or as required, with many substances floating around in the cellular fluid. It is not the result of intelligent design, and it shows. There are overlaps and redundancies that may contribute to more reliable operation and there are multiple (potentially complex) mechanisms keeping each other in check.

Or we can consider music, where multiple notes in a chord and different timbres in a symphony orchestra in a composition will conjure an emergent harmonic whole that sounds different from the sum of its parts. Or consider spices in a soup, or opinions in a meeting that lead to a consensus.

The word "Composability" fits this capability in the Holistic case. Unfortunately, in much literature it is merely used as a synonym for its Reductionist counterpart "Modularity".

4.4: As discussed in the GIGO case in 3.2 above, Holistic ML systems can fill in missing details starting from very scant evidence -- cf. confabulations of systems like GPT-3 and image enhancement apps. They supply the missing details by jumping to conclusions based on few clues and lots of experience.

Since we are not omniscient and don't even know what is happening behind our backs, scant evidence is all we will ever have, but it is amazing how effective scant evidence can be in a familiar context. We can drive a car through fog or find an alarm clock in absolute darkness. The more the system has learned, the less input is needed to arrive at a reasonable identification of the problem and hence retrieve a previously discovered working solution.

4.5: Formal methods and experimental rigor make for good science.

On the other hand, Holistic methods can follow tenuous threads, hoping for stronger threads towards some solution, with little effort spent on backtracking or documentation because once a solution is found, it is the only thing that matters. Tracking has little value in non-repeating situations or when using Holistic Methods at massive scales, such as in Deep Learning.

4.6: Absolute control requires that we know exactly what the problems and solutions are and all we need to do is implement them. Once deployed, systems frozen in this manner, which are exactly implementing the Models of their creators, cannot improve by learning since there is no room for variation in the existing process, and hence no experimentation, and no way to discover further improvements. Only Holistic systems can provide creativity and useful Novelty. We also observe that learning itself is a creative act since it must fit new information into an existing network of prior experience.

4.7: Just like the term "Holism" has been abused, so has "Intelligent Design", which is a perfectly reasonable term for a Reductionist industrial end-to-end practice that consistently provides excellent results. On the Holistic side, Evolution in nature has created wonderful solutions to all kinds of problems that plants and animals need to handle.

But we can put Evolution (also known in the general sense as Selectionism) to work for us in our Holistic machines. They can create new wonderful designs with a biological flavor to them that sometimes (depending on the problem) can outperform intelligently designed alternatives.

Evolution is the most Holistic phenomenon we know. No goal functions, no Models, no equations. Evolution is not a scientific theory. Science cannot contain it; it must be discussed in Epistemology.
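The idea in 4.7 of putting variation and selection to work inside a machine can be sketched in a few lines. This is only a toy under stated assumptions: unlike Evolution in nature, it needs a selection criterion supplied by us (here, similarity to a made-up target string), and the names `TARGET`, `fitness`, `mutate`, and `evolve` are hypothetical and chosen for illustration only.

```python
import random

TARGET = "holism"  # hypothetical target; exists only so candidates can be scored
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def fitness(candidate: str) -> int:
    """Selection criterion (an assumption of this toy): matching positions."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(parent: str, rng: random.Random) -> str:
    """Variation: replace one randomly chosen character."""
    i = rng.randrange(len(parent))
    return parent[:i] + rng.choice(ALPHABET) + parent[i + 1:]


def evolve(seed: int = 0, population_size: int = 50, max_generations: int = 2000):
    """Repeat variation and selection until the target is matched or we give up."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if best == TARGET:
            return best, generation
        # Variation: many slightly different offspring of the current best.
        offspring = [mutate(best, rng) for _ in range(population_size)]
        # Selection: keep the fittest among the parent and its offspring.
        best = max(offspring + [best], key=fitness)
    return best, max_generations


if __name__ == "__main__":
    result, generations = evolve()
    print(result, "after", generations, "generations")
```

Nothing in the loop knows how to spell the target; it only generates variants and keeps whatever scores best. Note also that no record is kept of the discarded candidates, which echoes the point in 4.5 about tracking.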

Mixed Systems

Deep Neural Networks can perform autonomous Epistemic Reduction to find high level representations for low level input, such as pixels in an image or characters in text. Current vision Understanding systems can reliably identify thousands of different kinds of objects from many different angles in a variety of lighting conditions and weather. They can classify what they see but do not necessarily Understand much beyond that, such as the expected behaviors of other intentional agents like cars, pedestrians, or cats.

Therefore, at the moment in 2022, most deployments of Machine Learning use a mixture of Reductionist and Holistic methods -- equations and formulas devised by humans implemented as computer code and some inputs from a Deep Neural Network solving a subproblem that requires it, such as vision Understanding.

Self-driving cars use DNNs for Understanding vision, radar, and lidar images (discovering high level information like "a pedestrian on the side of the road" from pixel based images) and this Understanding has (until recently) been fed to logic and rule based programs that implement the decision making ("Avoid driving into anything, period") that is used to control the car.
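The mixed architecture described above can be sketched as two stages: a learned perception stage that turns raw input into high level detections, and hand-coded rules that turn detections into a driving decision. This is a minimal sketch and not any vendor's actual design. The perception stage is a stub standing in for a trained network, and every name and threshold here (`Detection`, `perceive`, `decide`, the 6 m/s² deceleration, the 1.5 safety factor, the 30 m pedestrian range) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str         # high level label, e.g. "pedestrian"
    distance_m: float  # estimated distance ahead, in metres
    in_path: bool      # whether the object is in the planned path


def perceive(frame) -> List[Detection]:
    """Holistic part (stub). A real system would run a trained network on
    pixels; in this sketch the 'frame' is already a list of detections."""
    return list(frame)


def decide(detections: List[Detection], speed_mps: float) -> str:
    """Reductionist part: hand-coded rules acting on the detections."""
    # Assumed constant deceleration of 6 m/s^2 (illustrative value only).
    braking_distance = speed_mps ** 2 / (2 * 6.0)
    for d in detections:
        # "Avoid driving into anything": brake with a 1.5x safety margin.
        if d.in_path and d.distance_m <= 1.5 * braking_distance:
            return "brake"
    # Slow down when a pedestrian is nearby, even if not in the path.
    if any(d.label == "pedestrian" and d.distance_m < 30 for d in detections):
        return "slow"
    return "cruise"


if __name__ == "__main__":
    frame = [Detection("pedestrian", 20.0, in_path=False)]
    print(decide(perceive(frame), speed_mps=20.0))
```

Moving responsibility into the network would mean replacing `decide` with learned behavior as well, so that the hand-coded thresholds disappear.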

The trend here[4] is to move more and more responsibilities into the Deep Neural Network, and over time to remove the hand coded parts. In essence, the network learns not only to see, but learns to Understand traffic. We are delegating more and more of our "Understanding of how to drive" to the vehicle itself.

This is desirable.

Experimental Epistemology

"Epistemology is the theory of knowledge. It is concerned with the mind's relation to reality."

This includes artificial minds. An introduction to Epistemology should benefit anyone working in the AI/ML field.

Scientific statements look like "F=ma" (Newton's second law) or "E=mc²" (Einstein's famous equation) and can all be proven and/or derived from other accepted results, or verified experimentally.

Algebra is built on axioms that are not part of Algebra; they cannot be proven inside of Algebra.

Similarly, Epistemological statements are not provable in science because science is built on top of Epistemology. But when science is not helping, such as in Bizarre Domains, then setting scientific methodology aside and "dropping down to the level of Epistemology" sometimes works.

Epistemology is, just like Philosophy in general, an armchair thinking exercise and the results are judged on internal coherence and consistency with other accepted theory rather than by proofs or experiments.

However, the availability of Understanding Machines such as DNNs now suddenly provides the opportunity for actual experiments in Epistemology. Consider the below statements from the domain of Epistemology, and how each of them can be viewed as an implementation hint for AI designers. We are already able to measure their effects on system competence.

  • "You can only learn that which you already almost know" -- Patrick Winston, MIT

  • "All intelligences are fallible" -- Monica Anderson

  • "In order to detect that something is new you need to recognize everything old" -- Monica Anderson

  • "You cannot Reason about that which you do not Understand" -- Monica Anderson

  • "You are known by the company you keep" -- Simple version of Yoneda Lemma from Category Theory and the justification for embeddings in Deep Learning

  • "All useful novelty in the universe is due to processes of variation and selection" -- The Selectionist manifesto. Selectionism is the generalization of Darwinism. This is why Genetic Algorithms work.
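The "company you keep" statement can be illustrated with a toy count-based example: represent each word by the words that appear near it, then compare those representations. This is a sketch of the general idea only, not how embeddings are trained in Deep Learning systems. The corpus, the window size, and the function names `context_vectors` and `cosine` are all made up for illustration.

```python
from collections import Counter
from math import sqrt

# Tiny made-up corpus; any real use would need far more text.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
    "the car needs new tires",
    "the truck needs new brakes",
]


def context_vectors(sentences, window=2):
    """Describe each word by counts of the words within `window` positions."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            neighbours = words[lo:i] + words[i + 1:hi]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    v = context_vectors(CORPUS)
    print("cat~dog:", round(cosine(v["cat"], v["dog"]), 3))
    print("cat~car:", round(cosine(v["cat"], v["car"]), 3))
```

In this corpus "cat" and "dog" keep the same company and so come out more similar to each other than either is to "car", even though nothing in the code says what any of the words mean.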

Science "has no equations" for concepts like Understanding, Reasoning, Learning, Abstraction, or Modeling since they are all Epistemology level concepts. We cannot even start using science until we have decided what Model to use. We must use our experience to perform Epistemic Reductions, discarding the irrelevant, starting from a messy real world problem situation until we are left with a scientific Model we can use, such as an equation. The focus in AI research should be on exactly how we can get our machines to perform this pre-scientific Epistemic Reduction by themselves.

And the answer to that cannot be found inside of science.

Artificial General Intelligence

Artificial General Intelligence (AGI) was a theoretical 20th Century Reductionist AI attempt to go beyond the "Narrow AI" of domain specific expert systems -- closer to the "General Intelligence" they thought humans had. The term was mostly used by independent researchers, amateurs and enthusiasts.

But the "AGI" term was not well enough defined and was not backed by sufficient theory to provide any AI implementation guidance, and what little progress had been made by these groups was overtaken by Holistic Methods after 2012.

Today we know that the entire premise of 20th century Reductionist AGI was wrong:

Humans are not General Intelligences...

at birth.

Instead, we are General Learners, capable of learning almost any skill or knowledge required in a wide range of problem domains. If we want human-compatible cognitive systems, then we should build them in our image in this respect. To build machines that learn and jump to conclusions on scant evidence.

Decades ago, "AGI" implied a hand-coded Reductionist program based on Logic and Reasoning that could solve any problem because the programmers had anticipated it. To argue against claims that this was impossible, the AGI community came up with the promise or threat of self-improving AI[5].

But the amount of code in our cognitive systems has shrunk from 6 million propositions in CYC around 1990 to 600 lines of code to play video games around 2017 to about 13 lines of Keras code in some research reports. And now there's AutoML and other efforts at eliminating all remaining programming from ML.

The problems are not in the code. There is almost no code left to improve in modern Machine Learning systems. All that matters is the corpus. We can now (after 2012) see that Machine Learning is an absolute requirement for anything worthy of the name "AI".

Which makes recursive self-improvement leading to evil superhuman omniscient logic-based godlike AGIs a 20th century Reductionist AI myth.

We must focus on AGL: Artificial General Learners.


Science was created to stop people from overrating correlations and jumping to erroneous conclusions on scant evidence and then sharing those conclusions with others, leading to compounded mistakes and much wasted effort.

Consequently, promoting a Holistic Stance has long been a career-ending move in academia, and especially in Computer Science. But now we suddenly have Machine Learning that performs cognitive tasks such as protein folding, playing Go, and estimating house prices at useful levels using exactly a Holistic Stance.

So now Science itself has a cognitive dissonance. This is a conflict about what Science is. Or should be.

Ignorance of these stances leads people to develop significant personal cognitive dissonances, which is why discussions about these issues are very unpopular among people with solid STEM educations. But the dichotomy is real; we need to deal with it. Our choices so far seem to have been

  • claim the dichotomy doesn't exist (... but Schrödinger and Pirsig also discuss it)
  • claim that the Holistic Stance doesn't work (... but Deep Learning works)
  • claim that Reductionist methods are a requirement (... hobbling our toolkits for a principle)

The Reductionist Stance also makes it difficult to imagine and accept things like

  • Systems capable of autonomous Epistemic Reduction
  • Systems that do not have a goal function
  • Systems that improve with practice
  • Systems that exploit emergent effects
  • Systems that by themselves make decisions about what matters most
  • Systems that occasionally give a wrong answer but are nevertheless very useful

So after a serious education in Machine Learning we need to do almost no programming at all, and we don't need to understand anybody else's problem domains. Because we don't have to perform any Epistemic Reduction ourselves.

We should recognize this for what it is. AI was supposed to solve our problems for us so we would not have to learn or understand any new problem domains. To not have to think. And that's what we have today, in Machine Learning, and with Holistic Methods in general. Why are some people surprised or unhappy about this? In my opinion, this is AI, this is what we have been trying to accomplish for decades.

People who claim "Machine Understanding is not AI" are asking for human-level human-centric Reasoning and are, at their peril, blind to the nascent ML based Understanding we can achieve today.

With expected reasonable improvements in Machine Understanding capabilities, familiarity with and acceptance of the Holistic Stance will become a requirement for ML/AI based work. It will likely take years for our educational system to adjust.


Tor Nørretranders: "The User Illusion" -- Best book available about AI Epistemology

William Calvin: "The Cerebral Symphony" -- Neural Darwinism

William Calvin: "How Brains Think"

Daniel Kahneman: "Thinking, Fast and Slow" -- Understanding vs. Reasoning

Erwin Schrödinger: "What is Life" -- Nature is Holistic and difficult to Model

Robert Rosen: "Essays on Life Itself" -- Modeling reality

Robert Sapolsky: "Human Behavioral Biology" -- Ep. 21, Ep. 22, and especially Ep. 23

Douglas Hofstadter: "Gödel, Escher, Bach -- an Eternal Golden Braid" -- Reductionism and Holism

Daniel C. Dennett: "Consciousness Explained" -- Clear explanations of what matters in intelligence

Hon. J. C. Smuts: "Holism and Evolution"

Robert M. Pirsig: "Zen and the Art of Motorcycle Maintenance" -- Code: Classical = Reductionist, Romantic = Holistic

Monica Anderson: "Experimental Epistemology" (this site) -- Introduction to Experimental Epistemology

Monica Anderson: "AI Epistemology and Understanding Machines" -- Videos of my talks

  1. Common words with an uppercase first letter are used in an unchanging technical sense that is defined (explicitly or implicitly) in what follows or in the first page of the AI Epistemology introduction in "Why AI Works". ↩︎

  2. As a shorthand, we may call a person who prefers to adopt a Reductionist Stance "a Reductionist" and similarly for "a Holist". It doesn't imply they cannot switch their stance; it merely states a preference, and likely their first attempt at any new problem ↩︎

  3. The goal of my research for Syntience Inc is general Natural Language Understanding and Generation that feels like holding a conversation with a competent co-worker. Details forthcoming on this site. Support, grants, funding and licensing requests welcomed. ↩︎

  4. Publicly acknowledged by Tesla researchers on Tesla AI day and very likely the way forward for all autonomous vehicles ↩︎

  5. Discussed in more detail and debunked in a blog post on this site ↩︎