**Problem Detail:**

Given a hypothesis $h:X\rightarrow Y$ ($h$ is returned by an Empirical Risk Minimization (ERM) strategy with realizable case i.e. $h$ is consistent with the sample examples) over $X=[0,1]\subseteq R$ where $Y=\{0,1\}$ and $D$ is the uniform distribution over $X$. How can I prove that for the accuracy parameter $0\leq\epsilon\leq 1$ we might get $err_D(h)>\epsilon$? where $err_D(h)$ is the true error of $h$ in the distribution $D$.

**This is a homework** and all I can do is using probabilities (as its shown in the course slides and other sources) to give some bounds on the error (i.e. $P(err_D(h)> \epsilon)\leq k(1-\epsilon)^m$) where $k$ is the number of *bad* hypotheses $|H_B|$ s.t. ($\bar{h}\in H_B | err_D(\bar{h})>\epsilon)$ and $m$ is the sample size.

MY question is: is there any other way (beside using probabilities) to prove this? It just seems odd to me when the question asks for a prove and the answer is about probability bound. I am new to the field of learning theory and this might seems like a silly question but I am really trying to understand the intuition behind using probabilities.

###### Asked By : seteropere

#### Answered By : Shaull

Consider a case where the actual label is $1$ for all $x\in [0,1]$. Suppose $H$ consists of two classifiers: one that realizes this (i.e. $\forall x\in [0,1], h(x)=1$), and one that is almost entirely wrong: $g(x)=0$ except for some finite set of numbers $S$.

If the sample is exactly $S$, then both classifiers have the same error on the sample set (0), so you may choose $g$. However, you'll get that $err_D(g)=1>\epsilon$.

As mentioned in the comments, this doesn't really prove anything useful, since the probability that the sample will be exactly $S$ is $0$.

###### Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/14670

**3.2K people like this**

## 0 comments:

## Post a Comment

Let us know your responses and feedback