
How can image processing neural networks be effectively trained?

Problem Detail: 

I was just thinking about image processing neural networks and how to train them effectively given the available dataset.

Let's say you want to build a neural network that can distinguish between pictures of cats and dogs. I don't have much experience in the field, but I think it would take a pretty big dataset before the network reaches an acceptable success rate.

How could a big enough dataset be acquired? You can't just automatically generate pictures of cats and dogs, so a human would be needed to categorize thousands and thousands of pictures before the data could be fed to the untrained neural network.

How is it done in practice? Are small datasets (say, 100 pictures) run over and over again? Or is there a completely different, more effective approach?

Asked By : Bobface
Answered By : D.W.

Yes. Typically a very large data set is required: tens or hundreds of thousands of labelled images, or perhaps even more. No, 100 images almost certainly won't be enough.

Just running over the same 100 images won't help. Doing that won't give the machine learning algorithm any new information that it doesn't already have.
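To see why, here is a minimal sketch. It uses hypothetical synthetic "images" (random feature vectors, not a real dataset) and the gradient of a plain logistic-regression loss as a stand-in for whatever the learning algorithm computes: the mean gradient over 100 examples is identical whether you feed them once or duplicate them ten times, so repetition contributes no new signal.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))    # 100 synthetic "images" as feature vectors
y = rng.integers(0, 2, size=100)  # labels: 0 = cat, 1 = dog (hypothetical)
w = np.zeros(64)                  # model weights

def mean_gradient(X, y, w):
    # gradient of the mean logistic loss w.r.t. the weights
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

g_once = mean_gradient(X, y, w)
g_repeated = mean_gradient(np.tile(X, (10, 1)), np.tile(y, 10), w)
assert np.allclose(g_once, g_repeated)  # duplicated data yields the same update
```

The same argument applies epoch after epoch: more passes can drive the training loss down (and eventually overfit), but the information available to the model is still bounded by those 100 examples.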

Where do we get these large data sets from? Basically, through tedious human effort. There are many variants of this:

  • Sometimes it involves people manually, tediously labelling images one by one.

  • Sometimes one can find an existing data set that someone else has already labelled for you (e.g., ImageNet).

  • Sometimes you can pay other people to do the labelling for you (e.g., crowdsourcing on Mechanical Turk or something like that).

  • Sometimes you can find images that are already labelled: for instance, a Google Image search for "cat" will return many, many images of cats. Well, most of them will be cats; there may be a few other images in there (e.g., a picture of a tiger, or a picture of someone named Cat), but the overwhelming majority will be cats. How does Google Image search know to return these images when you search for "cat"? Google might be looking at the keywords on the same page where the photo is found. So, basically, images on the Internet have already been labelled by whoever made them available, which makes this a particularly interesting special case of the above.

That's the basic idea. There are variations, of course, but I hope it gives you the basic idea. As you can see, coming up with the training set can be extremely tedious and/or expensive, and the availability of labelled data to use for training a machine learning classifier can be one of the main limiting factors on our ability to build better classifiers. This is one of the reasons behind the current hype around "big data": labelled data is valuable (because it can be used to train interesting classifiers), so companies that already have lots of (labelled) data have a valuable asset.
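As a small illustration of that last case (labels that already exist in the images' context), here is a sketch of harvesting labels from filenames. The `cat_001.jpg` / `dog_042.jpg` naming convention is an assumption for the example, not any real dataset's layout:

```python
from pathlib import Path

def label_from_filename(path):
    # hypothetical convention: files are named like "cat_001.jpg", "dog_042.jpg"
    name = Path(path).name.lower()
    for label in ("cat", "dog"):
        if name.startswith(label):
            return label
    return None  # no label recoverable from the name

files = ["cat_001.jpg", "dog_017.jpg", "cat_queen.jpg", "IMG_1234.jpg"]
dataset = [(f, label_from_filename(f)) for f in files]
# entries with label None would be dropped or sent to a human for labelling
```

The same pattern generalizes to page keywords, directory names, or alt text; as the answer notes, such "found" labels are mostly right but noisy, so some human checking is still typical.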
