Understanding in brains and machines can be defined and measured as Corpus Congruence.
Let's consider this in the Machine Learning sense. If a machine is Model Free (Holistic), as all general Understanders have to be in order to not get trapped in a limited Model, then all it ever knows comes from the corpus it was trained on. And all it can really say is "This is more like my corpus than that", or "This is more like these documents in my corpus than those".
Corpus Congruence as a metric spans almost all of NLP, because most of NLP is DocSim in various guises.
Given two documents A and B in some corpus, a classifier can say that an unknown document U is more like A than B. Given this capability, we can build:
- Classification and Clustering, by using A, B... N as defining classes
- Filtering, by using A = wanted docs and B = unwanted docs
- Sentiment Analysis, by using A = negative docs and B = positive docs
- Entity Extraction, by softly matching terms against lists of known entities
- DocSim - Find me more documents like this one
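The "U is more like A than B" primitive behind the list above can be sketched in a few lines. This is only a minimal illustration at the Reductionist "bag of words" level, using cosine similarity over word counts as a stand-in for whatever similarity measure a real system would use. The function names (`bag_of_words`, `cosine`, `more_like`) and the sample documents are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like(unknown, labeled):
    """Return the label of the reference document most similar to `unknown`."""
    u = bag_of_words(unknown)
    return max(labeled, key=lambda label: cosine(u, bag_of_words(labeled[label])))


# Hypothetical reference documents: A = wanted docs, B = unwanted docs.
docs = {
    "wanted": "meeting agenda project schedule budget review",
    "unwanted": "win free prize money click now offer",
}

print(more_like("click now to win a free prize", docs))
```

With two labels this acts as a filter. With more labels the same function acts as a classifier, and ranking a whole corpus by `cosine` against one document gives DocSim.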
Reductionist NLP uses all of these at the "bag of words" or "word count" levels for things like web search, spam filtering, and clustering. Holistic NLU aims to do the same based on the meanings expressed in sentences and paragraphs.
But "Semantic" Corpus Congruence is still Corpus Congruence.
Common Sense now becomes "Is the proposition before me congruent with my entire World Model, as acquired by learning things from my training corpus?". If it is, the proposition is already well known and we can likely ignore it this time.
And if it is not, then the next question will be "Is it close enough that it might be worthwhile extending the World Model with this information?"
If the answer is no, then the input is by definition nonsense. Otherwise it is either a new fact or a lie, but since we cannot tell, we have to accept it; possibly with a note that this is fresh, untested knowledge that may turn out to be irrelevant, false, counterproductive, or noise.
Next we can note that it doesn't matter whether "documents" are text, images, or input from a cloud of sensors on robots or autonomous vehicles.
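The three-way decision described above (already known, close enough to add, or nonsense) can be sketched as a simple threshold test on a congruence score. This is a rough sketch under loud assumptions: the two thresholds are arbitrary illustrative values, the names `congruence` and `triage` are hypothetical, and the score is plain bag-of-words cosine similarity against the closest corpus document. A Holistic system would score congruence on meaning rather than word overlap; only the control flow is the point here.

```python
import math
import re
from collections import Counter

# Assumed, purely illustrative thresholds; real values would need tuning.
KNOWN_THRESHOLD = 0.9      # at or above this: already well known
PLAUSIBLE_THRESHOLD = 0.3  # at or above this: close enough to accept


def vectorize(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def congruence(proposition, corpus):
    """Score a proposition by its similarity to the closest corpus document."""
    p = vectorize(proposition)
    return max((cosine(p, vectorize(doc)) for doc in corpus), default=0.0)


def triage(proposition, corpus):
    """Classify a proposition as 'known', 'accepted-untested', or 'nonsense'."""
    score = congruence(proposition, corpus)
    if score >= KNOWN_THRESHOLD:
        return "known"              # congruent: nothing new to learn
    if score >= PLAUSIBLE_THRESHOLD:
        return "accepted-untested"  # new fact or lie; we cannot tell which
    return "nonsense"               # too far from everything we know


# Hypothetical toy corpus standing in for the World Model.
corpus = ["the cat sat on the mat", "dogs chase cats in the park"]

for text in ["the cat sat on the mat",
             "the cat sat on the sofa",
             "colorless green ideas sleep furiously"]:
    print(text, "->", triage(text, corpus))
```

A caller that receives "accepted-untested" would add the proposition to the corpus along with a flag marking it as fresh, untested knowledge.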
And finally we can note that this definition also holds for humans if we take our "corpus" to be "Everything we've experienced since birth".