Monday, February 11, 2013

Thoughts on Gold’s Theorem


Thoughts on Gold's Theorem and on the article "Gold’s Theorem and cognitive science":

http://psyling.psy.cmu.edu/papers/years/2004/logical/gold-johnson.pdf

* Gold's Theorem *

(a short summary of the theorem as I understand it)

I. Given an environment E and a language L, the learner learns L given E if there is some time tn such that at tn and at all times afterward, the learner correctly guesses that L is the target language present in the environment. (Gold himself called this condition "identification in the limit".)

 In my opinion, that is to say that someone has learned a language once she starts to distinguish the properly formed sentences from the improperly formed ones with respect to language L.
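
To make the criterion concrete, here is a minimal sketch in Python; the learner interface guess(examples) is my assumption, not part of Gold's formalism:

```python
# A minimal sketch, assuming a hypothetical learner interface:
# guess(examples) returns the learner's current hypothesis (e.g. the
# index n of a guessed language Ln) after seeing the examples so far.

def convergence_point(guess, text, target, horizon):
    """Return the first step tn such that every guess from tn up to
    `horizon` equals `target`, or None if the guesses never settle.
    A finite-horizon stand-in for Gold's criterion, which quantifies
    over all times after tn."""
    examples, tn = [], None
    for t in range(horizon):
        examples.append(next(text))    # the learner sees one more example
        if guess(examples) == target:
            if tn is None:
                tn = t                 # candidate convergence point
        else:
            tn = None                  # guess changed back: not settled yet
    return tn
```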

II. Let's deconstruct a language into a set of languages, each differing from the next by one sentence which is invalid in Ln but valid in Ln+1. This set is infinite, and each language is contained in all subsequent ones. The limit, Linf, contains all sentences from every language in the set.
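
One concrete instance of such a set (my illustrative choice, not taken from the article) is Ln = {"a", "aa", ..., "a" repeated n times}, where each Ln+1 adds exactly one new sentence:

```python
# Ln contains the sentences "a", "aa", ..., n repetitions of "a",
# so Ln+1 adds exactly one sentence and Linf is the union of them all.

def in_L(n, sentence):
    """Membership test for Ln: 1 to n repetitions of 'a'."""
    k = len(sentence)
    return sentence == "a" * k and 1 <= k <= n

def in_L_inf(sentence):
    """Membership test for Linf: any positive repetition of 'a'."""
    return len(sentence) >= 1 and sentence == "a" * len(sentence)

assert in_L(3, "aa") and not in_L(3, "aaaa")   # "aaaa" enters only at L4
assert in_L_inf("aaaa")                        # but it is always in Linf
```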

III. The theorem states that a function F that properly recognizes the language, given a set of example sentences, will either never converge for Linf or will jump to Linf without properly converging on some of the previous sub-languages.

 Formal definition: (GT) Any class of languages with the Gold Property is unlearnable,
 where a class of languages with the Gold Property is the infinite deconstruction set of a language.
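
The failure mode can be simulated. Under the Ln construction sketched above, the natural learner guesses the smallest Ln consistent with everything seen so far; on a presentation of any finite Ln it converges, but on a presentation of Linf its guess grows forever:

```python
# The learner guesses the smallest Ln consistent with the examples
# seen so far (the guess is the length of the longest sentence).

def smallest_consistent_guess(examples):
    return max(len(s) for s in examples)

def feed(sentences, learner=smallest_consistent_guess):
    seen, guesses = [], []
    for s in sentences:
        seen.append(s)
        guesses.append(learner(seen))
    return guesses

# A text for L3: the guesses settle on 3 and never change again.
print(feed(["a", "aaa", "aa", "aaa", "a"]))    # [1, 3, 3, 3, 3]

# A text enumerating Linf: the guess grows without bound, so there is
# no time tn after which it stays fixed -- exactly the failure above.
print(feed("a" * k for k in range(1, 6)))      # [1, 2, 3, 4, 5]
```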

* Interpretation *

My interpretation is that the "Gold Property" can actually only be observed in the deconstruction set of a single language. The theorem therefore concerns a single language and its unlearnability via an expanding set of examples, which is why I think the whole premise of the article is flawed.

* Alternative presentation of the theorem *

Here's how I would cover the same concept:
a) a language is infinite, as there are infinitely many sentences that conform to its grammar;
b) decomposing the language into sub-languages differing by a single sentence forms an infinite set which satisfies the Gold Property;
c) negative evidence, i.e. examples of incorrect sentences, will be points lying outside the vector representing the language;
d) learning the language means establishing a function that determines whether a point belongs to the vector.
Proof sketch:
Let the language be a vector, with each proper sentence a point on that vector; that makes each sub-language a line segment. Given a finite set of points on that vector, they will always be contained in more than one line segment. Successfully learning the language will therefore always rely on a "projection" of the received data points in order to encompass the vector and its infiniteness. Hence knowledge of a language is more than a set of proper examples plus negative evidence.
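
A small sketch of that ambiguity, using the picture above with sentences as integer points on a line and each sub-language Ln as the segment [1, n]:

```python
# Any finite sample of positive examples is contained in every
# segment whose right end reaches the largest sample point.

def consistent_segments(sample, candidates):
    """Return the n for which the segment [1, n] contains the sample."""
    hi = max(sample)
    return [n for n in candidates if n >= hi]

sample = {2, 3, 5}                           # positive examples only
print(consistent_segments(sample, range(1, 12)))
# [5, 6, 7, 8, 9, 10, 11] -- and so on without bound: the data are
# consistent with infinitely many sub-languages, plus Linf itself.
```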

* Thoughts on the Model *

I would like to develop this generic model further by allowing the exceptions to the grammar rules to lie outside of the vector. Ideally, each rule will define a plane in a multidimensional space, and the intersection of the planes will define the vector for the language. Some tensor will encompass the vector and all the exception points, and the volume of that tensor may be used to define the difficulty of learning the language. Alternatively, we can take the volume of the tensor encompassing 95% of all proper sentences, because one exception lying far from the vector may contribute more volume than many that are closer. Such a model also accounts for the fact that projection (of samples onto the vector) alone is insufficient for learning the language: a lot of samples will be required on top of the projection in order to account for all the exceptions.
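
Here is a rough numerical sketch of the difficulty-as-volume idea with invented data: proper sentences cluster near a line in three dimensions, a few exceptions lie far from it, and the difficulty proxy is the bounding-box volume of the 95% of points closest to the line:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 10, size=950)
regular = np.c_[t, t, t] + rng.normal(0, 0.1, (950, 3))  # near the line x=y=z
exceptions = rng.uniform(-5, 15, (50, 3))                # far-flung exceptions
points = np.vstack([regular, exceptions])

# Perpendicular distance of each point from the line along (1, 1, 1).
d = np.ones(3) / np.sqrt(3)
dist = np.linalg.norm(points - np.outer(points @ d, d), axis=1)

# Keep the 95% of points nearest the line, then measure their volume.
core = points[np.argsort(dist)[: int(0.95 * len(points))]]
volume = np.prod(core.max(axis=0) - core.min(axis=0))
print(f"difficulty proxy (95% bounding-box volume): {volume:.2f}")
```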
How do we form the space? If we use as many dimensions as there are grammar rules in the language, the end result will be a single point, as each rule eliminates one dimension. To bring this to a one-dimensional result we add a dimension enumerating the set of all proper sentences. But bringing this into view with the exceptions lying outside of the vector, we would have to say that there are alternative vectors originating from the same point, obeying all the rules yet defining a different set, which is wrong.
So we need to start with all possible sentences, regardless of language, and then chop off one dimension for each grammar rule we add. But there is no guarantee the end result will be a line; it might as well be a plane. That doesn't change the concept of having exceptions to the rules lie outside of the rule-restricted region, but it provides insight into why language acquisition is difficult: the more dimensions there are in the final region, the more difficult it will be to obtain a function that determines whether a point belongs to the proper grammar.
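
One way to see the dimension-chopping, under a simplifying assumption of mine that each grammar rule is a single linear constraint: the region obeying all rules is the null space of the rule matrix, and only independent rules remove a dimension:

```python
import numpy as np

n_features = 6                      # the space of "all possible sentences"
rules = np.array([
    [1, -1, 0,  0,  0, 0],          # rule 1: feature 0 equals feature 1
    [0,  0, 1,  0, -1, 0],          # rule 2: feature 2 equals feature 4
    [1, -1, 0,  0,  0, 0],          # rule 3: redundant restatement of rule 1
])

rank = np.linalg.matrix_rank(rules)
print(f"region dimension after rules: {n_features - rank}")   # 4
# Each independent rule removes one dimension; the redundant rule
# removes none, so nothing guarantees the end result is a line.
```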
There is another aspect of language that may be represented in the model - the ***. There are many sentences that follow proper grammar but that we do not see in day-to-day language use. These could be many, but let's take one example: "Noun is very very very very very ... very adjective". We can see sentences with two or three instances of "very" in them, but one with more than five starts to seem odd, even though technically we can repeat the word "very" infinitely many times without breaking the grammar. This indicates that there are whole segments of the multidimensional language shape that are not used. Introducing some additional (soft) rules would definitely reduce the language's volume and its dimensionality. These additional rules should be soft, since the sentences they rule out can still find use in some fringe situations.
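
Such a soft rule could be modeled as a graded penalty rather than a hard rejection; the scoring scheme below is invented for illustration:

```python
import math

def acceptability(sentence, soft_cap=3):
    """1.0 for ordinary sentences, decaying once 'very' repeats
    more than `soft_cap` times -- a soft rule, not a hard rejection."""
    verys = sentence.lower().split().count("very")
    return math.exp(-max(0, verys - soft_cap))

print(acceptability("The cat is very very big"))            # 1.0
print(acceptability("The cat is " + "very " * 8 + "big"))   # ~0.0067
```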

* End thoughts *

- Overall, I am convinced that both the rationalist and the empiricist approaches are needed to explain language acquisition.
- Learning a language is not a discrete event.
- Natural languages are not strictly defined.
- Even grammatical rules are "soft" and breaking them does not necessarily prevent communication.

Iliyan Bobev 2013 (c) all rights reserved.
