What is VC dimension?

In machine learning, the problem of learning is this: you have a set of data points (x1, x2, …, xn) together with their observed results (y1, y2, …, yn). The approach is to "fit" a function to the results on this data, in the hope that the same function will generalize to input data outside of the set (x1, x2, …, xn).
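As a minimal sketch of this setup (the data here is made up for illustration): training inputs x1…xn with observed results y1…yn, a fitted function, and a prediction at an input outside the training set.

```python
import numpy as np

# Toy training data (assumed): inputs x1..xn and results y1..yn,
# roughly following y = 2x with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

# "Fit" a function to the data -- here the chosen function is a line.
coeffs = np.polyfit(x, y, deg=1)

# Hope it generalizes: predict at x = 2.5, which is not in the training set.
y_new = np.polyval(coeffs, 2.5)
print(float(y_new))  # close to 5, since the data is roughly y = 2x
```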

In choosing the function, you pick from a set of candidate functions called the hypothesis set. If the hypothesis set is rich (complex) enough, you will likely find a function that fits the input/result data. Otherwise, you will only be able to obtain a poor approximation of the input/result data.

Now say you fit the input data/results very well. What guarantee is there that the function also approximates the data/results outside of the input data set well? This is the crux of learning theory.

You want to have two things in machine learning:

  1. a function that fits the input data/results well
  2. a function that generalizes to data outside of the input data set.

And it turns out these two goals place contradictory restrictions on the hypothesis set. To satisfy 1, you want your hypothesis set to be rich (complex). The VC theorem of learning theory states that to satisfy 2, you want your hypothesis set not to be too rich (complex).

For example, suppose you have 5000 input/result points, and you take as your hypothesis set the set of polynomials of degree 10000. Given any set of 5000 points, you are sure to find a polynomial (in fact, many polynomials, because the degree is much greater than the number of points) that fits with zero error. You have satisfied requirement 1. But theory gives no assurance that the polynomial you found fits well at a point that is not one of the 5000 given points. Two different polynomials that fit the 5000 input/result points perfectly might behave totally differently on the rest of the input space. So which one is the candidate for generalizing beyond the input/result points?
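A scaled-down sketch of this point (5 points and degree-5 polynomials instead of 5000 points and degree 10000, but the mechanism is the same): two distinct polynomials that both fit the training points with zero error, yet disagree at a point outside the training set.

```python
import numpy as np

# Toy data (assumed): 5 training points.
x = np.linspace(-1.0, 1.0, 5)
rng = np.random.default_rng(0)
y = rng.normal(size=5)

# p: the unique degree-4 polynomial interpolating the 5 points.
p = np.polyfit(x, y, deg=4)

# w: the degree-5 polynomial that is zero at every training point.
w = np.poly(x)  # coefficients of prod(x - xi)

# q = p + w also fits the training data exactly, but is a different function.
q = np.concatenate([[0.0], p]) + w  # pad p to degree 5 before adding

# Both achieve zero training error...
assert np.allclose(np.polyval(p, x), y)
assert np.allclose(np.polyval(q, x), y)

# ...but give different predictions at a point not in the training set.
x_new = -0.75
print(np.polyval(p, x_new), np.polyval(q, x_new))
```

Theory alone gives no reason to prefer p's prediction over q's at `x_new`, which is exactly the generalization problem the article describes.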

Thus, learning theory asks you to choose a hypothesis set that is only rich enough to give you low enough error (do not target a perfect fit, i.e., zero training error).

VC dimension is what characterizes the richness (complexity) of a hypothesis set.
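Concretely, the VC dimension of a hypothesis set is the largest number of points it can "shatter", i.e., label correctly under every possible ±1 assignment. For lines in the plane that number is 3: three non-collinear points can be shattered, but no line realizes the XOR labeling on four points. The brute-force check below illustrates this (it searches a fixed grid of line parameters, so the negative case is a demonstration, not a proof).

```python
import itertools
import numpy as np

def separable(points, labels, grid):
    # Search the grid for a line a*x + b*y + c = 0 whose sign
    # pattern on the points matches the labels exactly.
    for a, b, c in grid:
        s = np.sign(a * points[:, 0] + b * points[:, 1] + c)
        if np.all(s == labels):
            return True
    return False

vals = np.linspace(-2, 2, 9)
grid = list(itertools.product(vals, vals, vals))

# Three non-collinear points: every one of the 8 labelings is achievable,
# so lines shatter them -> VC dimension of 2-D linear classifiers >= 3.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
shattered = all(
    separable(tri, np.array(lab), grid)
    for lab in itertools.product([-1, 1], repeat=3)
)
print(shattered)  # True

# Four points with the XOR labeling cannot be split by any line,
# which is why the VC dimension of lines in the plane is exactly 3.
quad = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
xor_labels = np.array([-1, -1, 1, 1])
print(separable(quad, xor_labels, grid))  # False
```

A richer hypothesis set (say, quadratic boundaries) would shatter more points and therefore have a larger VC dimension, which is the sense in which VC dimension measures richness.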