
Definition and properties of support

Problem Detail: 

From Xiong, Hui, Shashi Shekhar, Pang-Ning Tan, and Vipin Kumar. "TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases." IEEE Transactions on Knowledge and Data Engineering 18, no. 4 (2006): 493–508:

Support measures the fraction of transactions that contain a particular subset of items. The notions of support and correlation may not necessarily agree with each other. This is because item pairs with high support may be poorly correlated while those that are highly correlated may have very low support. For instance, suppose we have an item pair {A, B}, where supp(A) = supp(B) = 0.8 and supp(A, B) = 0.64. Both items are uncorrelated because supp(A, B) = supp(A)supp(B). In contrast, an item pair {A, B} with supp(A) = supp(B) = supp(A, B) = 0.001 is perfectly correlated despite its low support.

I don't quite follow the example about support. Could anyone give a proof of $supp(A)supp(B) = supp(A, B) \iff A \text{ and } B \text{ are uncorrelated}$ ? I don't even quite understand what $supp(A, B)$ means. Is there a book or paper on support vs correlation?

Asked By : qed

Answered By : Kittsil

First, the definition is clear: support is exactly "the fraction of transactions that contain a particular subset of items." This is a data-mining term, not a statistics term.

$supp(A)$ is the fraction of transactions that contain item $A$. $supp(B)$ is the fraction of transactions that contain item $B$. $supp(A,B)$ is the fraction of transactions that contain the subset $\{A,B\}$.
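
To make these three definitions concrete, here is a minimal sketch that computes support over a small made-up list of transactions. The function name `supp` and the toy data are assumptions chosen for illustration; they do not come from the paper.

```python
from fractions import Fraction

def supp(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))

# Hypothetical toy data: five transactions over the items A, B, C.
transactions = [
    {"A", "B"},
    {"A"},
    {"B", "C"},
    {"A", "B", "C"},
    {"C"},
]

print(supp(transactions, {"A"}))       # 3/5
print(supp(transactions, {"B"}))       # 3/5
print(supp(transactions, {"A", "B"}))  # 2/5
```

Note that $supp(A,B)$ counts only the transactions containing both items, so it can never exceed $supp(A)$ or $supp(B)$.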

Your question seems to be more about how the authors build on this definition, which is admittedly a little unclear. Read statistically, support is the empirical probability that a randomly chosen transaction contains the given subset of items. That is, you can treat $supp(A)=\mathbf{P}(A)$, where $A$ now stands for the event "the transaction contains item $A$", and likewise $supp(A,B)=\mathbf{P}(A\cap B)$.

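Then you can see that $$supp(A)supp(B) = supp(A, B) \iff A \text{ and } B \text{ are uncorrelated}$$ comes from the definition of statistical independence, $$\mathbf{P}(A\cap B) = \mathbf{P}(A)\mathbf{P}(B) \iff A \perp B.$$ Strictly speaking, independence and uncorrelatedness are different notions, but for the 0/1 indicators of "the transaction contains $A$" and "the transaction contains $B$" they coincide: the covariance of the two indicators is $\mathbf{P}(A\cap B) - \mathbf{P}(A)\mathbf{P}(B)$, which is zero exactly when the product rule holds.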
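
The two examples in the quoted passage can be checked numerically. The sketch below assumes that "correlation" means the Pearson correlation of the two 0/1 indicator variables (often called the phi coefficient); the function name `phi` is an illustrative choice, and the formula is only defined when both supports lie strictly between 0 and 1.

```python
import math

def phi(supp_a, supp_b, supp_ab):
    """Pearson correlation of two 0/1 indicators, written in terms of support."""
    # Covariance of the indicators: P(A and B) - P(A) * P(B).
    cov = supp_ab - supp_a * supp_b
    # Product of the standard deviations of two Bernoulli variables.
    denom = math.sqrt(supp_a * (1 - supp_a) * supp_b * (1 - supp_b))
    return cov / denom

# High support, but supp(A, B) = supp(A) * supp(B): result is close to 0.
print(phi(0.8, 0.8, 0.64))

# Very low support, but A and B always occur together: result is close to 1.
print(phi(0.001, 0.001, 0.001))
```

Because of floating-point rounding, the printed values may differ from 0 and 1 in the last few digits.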


Question Source : http://cs.stackexchange.com/questions/45061
