**Problem Detail:**

One of the inputs to my neural network is a set. I have a set $S = \{s_1, s_2, \ldots, s_n\}$ whose elements $s_i$ are fixed and known in advance. An example of such a set could be the set of French wines (Beaujolais, Languedoc-Roussillon, Champagne) or the set of players in a sports event (Player A, Player B, ...). The input to the neural network is a subset $T$ of $S$ (e.g., Player A competing against Player B, or Beaujolais wine being served at a table, but nothing else).

Due to the restrictions of my neural network design, all input values must be normalized to the interval $[0,1]$. How would I encode the set $T$ to obtain an input to the neural network? How do I normalize the values in my set $S$ so that this condition is respected?

My current idea is to **use one boolean input per $s_i$**: there would be $\#(S)=n$ boolean inputs, all set to 0 except for those corresponding to elements of $T$, which would be set to 1. However, this has the obvious flaw that a large $n$ requires a lot of input neurons. Moreover, if one imposes the additional restriction that $T$ has *at most* $m$ elements, the encoding would not capture that restriction (e.g., what if $m+1$ inputs were set to true?).

Is there a better way of modeling such a situation? Or better, is there a *standard* way for handling input sets with multiple possibilities?

###### Asked By : Pickle

###### Answered By : D.W.

Yes, "one boolean per set element" is the standard way of encoding such a set. It is the subset version of "one-hot encoding" (where exactly one input is 1), and is sometimes called a "multi-hot" or indicator-vector encoding. Yes, there will be a lot of input neurons, but that's not necessarily a serious problem; current procedures for training neural networks are able to handle millions of nodes with no problems.
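As a minimal sketch of this encoding (the function name `encode_subset` and the three-player universe are made up for illustration), each element of $S$ gets a fixed position and the input is 1.0 at the positions of the elements of $T$ and 0.0 elsewhere, so every value already lies in $[0,1]$:

```python
def encode_subset(universe, subset):
    """Return a 0/1 indicator vector of length len(universe) for subset."""
    # Fix a position for every element of the universe S.
    index = {item: i for i, item in enumerate(universe)}
    vector = [0.0] * len(universe)
    for item in subset:
        # Raises KeyError if item is not an element of the universe.
        vector[index[item]] = 1.0
    return vector


S = ["Player A", "Player B", "Player C"]  # illustrative universe
T = {"Player A", "Player C"}              # illustrative subset
print(encode_subset(S, T))  # [1.0, 0.0, 1.0]
```

Note that nothing in this representation prevents more than $m$ inputs from being 1; if that restriction matters, it has to be enforced when the inputs are constructed.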

If you know that $T$ will contain at most $m$ elements, in principle there are alternative encodings (e.g., use $m \lg n$ wires, where you use $\lg n$ wires for each element of $T$ to encode which element of $S$ it is)... but in practice I do not expect them to perform well.
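For comparison, here is a rough sketch of that compact encoding. It makes two assumptions that are not part of the answer above: code 0 is reserved to mean "empty slot" (so each slot needs enough bits for the codes $0..n$), and the elements of $T$ are sorted by index so that each subset has a single canonical encoding. The name `encode_compact` is hypothetical.

```python
def encode_compact(universe, subset, m):
    """Encode a subset of at most m elements as m fixed-width binary codes."""
    if len(subset) > m:
        raise ValueError("subset has more than m elements")
    index = {item: i for i, item in enumerate(universe)}
    # Bits per slot: enough to represent codes 0..n, where 0 means "empty".
    bits = len(universe).bit_length()
    # Shift indices by one so that 0 stays free, and sort for a canonical order.
    codes = sorted(index[item] + 1 for item in subset)
    codes += [0] * (m - len(codes))  # pad unused slots with the empty code
    vector = []
    for code in codes:
        # Most significant bit first.
        vector.extend(float((code >> b) & 1) for b in reversed(range(bits)))
    return vector


S = ["A", "B", "C", "D", "E"]  # n = 5, so 3 bits per slot
x = encode_compact(S, {"B", "E"}, m=3)
print(x)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

This uses 9 inputs instead of 5 here, so it only saves inputs when $m$ is much smaller than $n / \lg n$. One plausible reason it may train poorly is that elements with numerically close codes share input bits even though they are unrelated.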

If the number of inputs is a concern, I think a better approach is to try to build a feature vector that is shorter than $n$ dimensions. Do you have some domain knowledge you can use to identify attributes of the elements? For instance, you could have a feature that says "how many elements of $T$ are red wines?" and "how many elements of $T$ are white wines?" and "how many elements of $T$ are dessert wines?" and so on. You'll have to use your domain knowledge about the task to identify what attributes might be relevant to the classification task. In this way, the number of input wires can be made much smaller than $n$.
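A small sketch of such attribute-count features follows. The attribute table `WINE_TYPE`, the category list `TYPES`, and the function `attribute_features` are all hypothetical and only serve as an illustration; dividing each count by an assumed maximum subset size `max_size` is one simple way to keep the inputs in $[0,1]$, as the question requires.

```python
# Hypothetical attribute table: maps each element of S to a coarse category.
WINE_TYPE = {
    "Beaujolais": "red",
    "Champagne": "white",
    "Chablis": "white",
    "Sauternes": "dessert",
}
TYPES = ["red", "white", "dessert"]  # one input per category, not per wine


def attribute_features(subset, max_size):
    """Count the elements of subset per category, scaled into [0, 1]."""
    if len(subset) > max_size:
        raise ValueError("subset has more than max_size elements")
    counts = {t: 0 for t in TYPES}
    for item in subset:
        counts[WINE_TYPE[item]] += 1
    # Each count is at most len(subset) <= max_size, so each ratio is <= 1.
    return [counts[t] / max_size for t in TYPES]


features = attribute_features({"Champagne", "Chablis", "Sauternes"}, max_size=4)
print(features)  # [0.0, 0.5, 0.25]
```

The input width is now the number of categories rather than $n$, at the cost of no longer distinguishing two wines of the same category.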

Question Source : http://cs.stackexchange.com/questions/67062
