Information: processed data
vs.
Knowledge: information that has been modeled to be useful
We need INFORMATION to be able to get KNOWLEDGE
X : random variable with distribution p(x)
Self-information of an outcome x:
$I(x) = \log_2\left(\frac{1}{p(x)}\right)$
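A minimal sketch of the definition above in Python; the function name `self_information` is illustrative, not from the notes:

```python
import math

def self_information(p: float) -> float:
    """Bits of surprise for an outcome with probability p: log2(1/p)."""
    return math.log2(1.0 / p)

# A fair coin flip (p = 0.5) carries exactly 1 bit.
print(self_information(0.5))    # 1.0
# A rarer event (p = 1/8) is more surprising: 3 bits.
print(self_information(0.125))  # 3.0
```

Note the intuition: halving the probability of an outcome adds one bit of surprise.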
Entropy H(Y) of a random variable Y:
the expected number of bits needed to encode a randomly drawn value of Y
$H(Y) = -\sum_{k=1}^{K} P(Y=k)\log_2 P(Y=k) = \sum_{k=1}^{K} P(Y=k)\log_2\left(\frac{1}{P(Y=k)}\right)$
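The entropy formula can be sketched directly in Python; `entropy` is an illustrative name, and terms with zero probability are skipped since they contribute nothing to the sum:

```python
import math

def entropy(probs) -> float:
    """H(Y) = -sum_k P(Y=k) * log2 P(Y=k), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair coin: maximal uncertainty over 2 outcomes -> 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# Biased coin: less uncertainty than a fair one.
print(entropy([0.9, 0.1]))
# Uniform over 4 outcomes -> 2 bits.
print(entropy([0.25] * 4))   # 2.0
```

Entropy is just the expectation of the self-information $\log_2(1/P(Y=k))$ under the distribution of Y, so a uniform distribution (every outcome equally surprising) maximizes it.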