Recognizing Handwritten Digits using Machine Learning
First of all, what is handwriting recognition?
Handwriting recognition is a computer’s ability to recognize and interpret handwritten input.
Machine learning plays an important role in computing: with its help, the human effort needed to recognize handwritten text or digits can be reduced. One application that comes to mind is OCR (Optical Character Recognition) software. OCR software must read handwritten text, or pages of printed books, and turn them into general electronic documents in which each character is well defined. Handwritten digits are not always of the same size, width, or orientation, nor are they always aligned to the margins, because handwriting differs from person to person. The general problem is therefore to classify digits correctly despite the similarity between pairs such as 1 and 7, 5 and 6, 3 and 8, 2 and 5, and 2 and 7. The problem gets harder when many people write the same digit in a variety of handwriting styles.
To address this issue in Python, the scikit-learn library provides a good example for understanding the issues involved and the possibility of making predictions. This article involves predicting a numeric value by reading and interpreting an image of a handwritten digit. We will use Google Colab to implement the recognition of handwritten digits, so first create a new notebook.
After creating the notebook, Colab opens it in the editor, where you write and run code in cells.
The Digits Dataset
The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction problems. For this problem there is a dataset of images called Digits. It consists of 1,797 images that are 8x8 pixels in size. Each image is a handwritten digit in grayscale.
You can load the Digits dataset into your notebook with the load_digits() function of the sklearn.datasets module.
After loading the dataset, you can analyze its content. First, you can read a lot of information about the dataset through its DESCR attribute.
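As a minimal sketch of this loading step (the variable name digits is chosen to match the names used later in the article):

```python
from sklearn import datasets

# Load the Digits dataset that ships with scikit-learn (no download needed)
digits = datasets.load_digits()

# The returned object holds the 8x8 images, a flattened copy of them, and the labels
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
```

The data array is the same content as images, with each 8x8 matrix flattened into a row of 64 values. That flattened form is what an estimator expects as input.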
A textual description of the dataset, the authors who contributed to its creation, and the references will appear.
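Reading the description could look like the following sketch. It reloads the dataset so that the cell runs on its own:

```python
from sklearn import datasets

digits = datasets.load_digits()

# DESCR is a plain string with the dataset summary, authors, and references
print(digits.DESCR)
```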
The images of the handwritten digits are contained in the digits.images array. Each element of this array is an image represented by an 8x8 matrix of numerical values that correspond to a grayscale from white, with a value of 0, to black, with a value of 16.
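To see what one of these matrices looks like, a short sketch such as the one below prints the first image and the range of values found in the whole array:

```python
from sklearn import datasets

digits = datasets.load_digits()

# First image of the dataset as an 8x8 matrix of grayscale intensities
first_image = digits.images[0]
print(first_image)
print(first_image.shape)  # (8, 8)

# Smallest and largest intensity across all images
print(digits.images.min(), digits.images.max())
```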
You can visually check the contents of this result using the matplotlib library.
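One possible way to draw the first image is sketched here. The reversed gray colormap (gray_r) is an assumption: it is chosen so that 0 is shown as white and high values as black, matching the description above.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Draw the first digit; "nearest" keeps the 8x8 pixels sharp instead of blurring them
image = plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()
```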
The numerical values represented by the images, i.e., the targets, are contained in the digits.target array. The dataset description reports that it consists of 1,797 images. You can check whether that is true by looking at the size of the target array.
Learning and Predicting
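A quick check of the targets and their count might look like this:

```python
from sklearn import datasets

digits = datasets.load_digits()

# One label (0 to 9) per image
print(digits.target)

# Number of labels, which should match the number of images
print(digits.target.size)  # 1797
```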
An estimator that is useful in this case is sklearn.svm.SVC, which uses the technique of Support Vector Classification (SVC). To use it, you have to import the svm module of the scikit-learn library. You can then create an estimator of SVC type and choose an initial setting, assigning generic values to C and gamma.
This dataset contains 1,797 elements, so you can use the first 1,791 as a training set and the last six as a validation set. You can look at these six handwritten digits in detail by using the matplotlib library.
Now you can train the svc estimator that you defined earlier. After a short time, the trained estimator will appear as text output.
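Creating the estimator could be sketched as follows. The article does not say which values were used, so C=100 and gamma=0.001 are assumptions, picked only as a reasonable starting point:

```python
from sklearn import svm

# C and gamma are assumed starting values, not values given in the article
svc = svm.SVC(C=100.0, gamma=0.001)
print(svc)
```

C controls how strongly misclassified training samples are penalized, and gamma sets the width of the default RBF kernel. Both are worth tuning on real problems.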
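The six validation digits can be displayed with a sketch like the one below. The 2x3 grid layout and the gray_r colormap are presentation choices made here, not something the article specifies:

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# The last six images (indices 1791 to 1796) form the validation set
validation_images = digits.images[1791:]

# Show them in a 2x3 grid without axis ticks
fig, axes = plt.subplots(2, 3)
for ax, img in zip(axes.ravel(), validation_images):
    ax.imshow(img, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.axis("off")
plt.show()
```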
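Training on the first 1,791 samples might look like this. The cell repeats the earlier setup so it runs on its own, and the C and gamma values are again assumed:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
svc = svm.SVC(C=100.0, gamma=0.001)  # assumed hyperparameters

# Training set: the first 1,791 flattened images and their labels
X_train = digits.data[:1791]
y_train = digits.target[:1791]

# fit() returns the trained estimator, which a notebook displays as output
svc.fit(X_train, y_train)
```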
Result
Now you have to test your estimator by having it interpret the six digits of the validation set.
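The prediction step can be sketched as below, under the same assumed hyperparameters and the same split as before:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
svc = svm.SVC(C=100.0, gamma=0.001)  # assumed hyperparameters
svc.fit(digits.data[:1791], digits.target[:1791])

# Ask the trained estimator to label the six held-out images
predicted = svc.predict(digits.data[1791:])
print(predicted)
```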
If you compare the predictions with the actual digits, which are the last six entries of digits.target, you can see that the svc estimator has learned correctly.
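One way to make this comparison explicit is to print both arrays side by side and count the matches. This is a sketch with the same assumed hyperparameters. Six samples are far too few for a reliable accuracy estimate, so treat the count as a sanity check only.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
svc = svm.SVC(C=100.0, gamma=0.001)  # assumed hyperparameters
svc.fit(digits.data[:1791], digits.target[:1791])

predicted = svc.predict(digits.data[1791:])
actual = digits.target[1791:]

print("predicted:", predicted)
print("actual:   ", actual)

# Count how many of the six predictions agree with the true labels
n_correct = int((predicted == actual).sum())
print(f"{n_correct} of {actual.size} digits recognized correctly")
```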
- "I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com"