Saturday, 7 November 2015

Using SVM Classifier

Support Vector Machine (SVM) is one of the most popular machine learning tools for classification problems. It is a supervised learning technique.

[Figure: SVM margin]

In this post we are going to see how to use the SVM classifier in Python.

Our demonstration uses the digits dataset. This dataset uses a 64-dimensional feature vector to identify handwritten digits [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. That is, 64 features extracted from each handwritten digit image are used to classify it into one of 10 classes.
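Before classifying, it helps to see what the dataset actually looks like. A minimal sketch (assuming scikit-learn is installed) that loads the digits dataset and inspects its shape:

```python
# Load the digits dataset and inspect its dimensions.
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)           # (1797, 64): 1797 samples, 64 features each
print(digits.target.shape)         # (1797,): one label per sample
print(sorted(set(digits.target)))  # the ten digit classes, 0 through 9
```

Each 64-dimensional feature vector is simply the flattened 8x8 grayscale image of a digit.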

So again we are going to use the scikit-learn (sklearn) Python package.

Let's look at some of the parameters of our classifier that can be used to customize the classification.
We discuss only a few of the most useful parameters.

  • kernel : Used to map a low-dimensional feature space to a higher-dimensional space in order to make the data separable. Examples: rbf, poly, linear.
  • C : Controls the trade-off between a smooth decision boundary and classifying all training points correctly. A high value leads to over-fitting, which is bad for generalisation.
  • gamma : Defines how far the influence of a single training example reaches. A high value of gamma leads to over-fitting, so for a classifier that generalises well we need a low value of gamma.
Other parameters are: cache_size, class_weight, coef0, decision_function_shape, max_iter, probability, random_state, shrinking, tol, verbose.
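To get a feel for the kernel parameter, here is an illustrative sketch (the kernel names and split ratio are my choices, not from the original post) that trains one SVC per kernel on a train/test split of the digits data and compares held-out accuracy:

```python
# Compare kernel choices on the digits dataset using a held-out test split.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

for kernel in ['linear', 'poly', 'rbf']:
    clf = SVC(kernel=kernel, gamma=0.001, C=1.0)  # gamma only affects poly/rbf
    clf.fit(X_train, y_train)
    print(kernel, 'test accuracy:', clf.score(X_test, y_test))
```

All three kernels do well on this dataset; the differences show up more clearly on data that is not linearly separable.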

What is Over-fitting 

[Figure 1: a properly fitted decision boundary vs. an over-fitted one]

From this figure you can see the difference. But the real question is: which situation is good? The answer is figure 1. The over-fitting situation is not good, even though it looks like it will give better results. When you apply such a classifier to new, unseen data, you will see that the performance of the over-fitted model is lower than that of a properly fitted hyperplane.

Hyperplane : a line or plane in n-dimensional space that is used as the classification boundary.
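You can reproduce the over-fitting effect numerically. In this sketch (the specific gamma values and split are my choices for illustration), a very large gamma memorises the training set almost perfectly but generalises poorly, while a small gamma does well on both:

```python
# Demonstrate over-fitting: large gamma -> near-perfect training accuracy
# but poor test accuracy; small gamma generalises well.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

for gamma in [0.001, 1.0]:
    clf = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X_train, y_train)
    print('gamma =', gamma,
          '| train accuracy:', clf.score(X_train, y_train),
          '| test accuracy:', clf.score(X_test, y_test))
```

The gamma=1.0 model is the over-fitted case: each training point only influences its immediate neighbourhood, so the boundary wraps tightly around the training data.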

SVM Classifier [Python] [Github link]

We use the rbf kernel.

__author__ = ''
# SVM Classifier
import sklearn.datasets as datasets
from sklearn.svm import SVC

# Get the digits dataset
data = datasets.load_digits()
# feature vectors
X = data.data
# label vector
Y = data.target

if __name__ == '__main__':
    # clsf = classifier
    clsf = SVC(kernel='rbf', gamma=0.001, C=0.1)  # SVM classifier
    # There are other arguments like
    # C, cache_size, class_weight, coef0,
    # decision_function_shape, gamma, kernel,
    # max_iter, probability, random_state, shrinking,
    # tol, verbose
    # that you can pass in order to customize your classifier.
    # Training the classifier
    clsf.fit(X, Y)
    # Now predict values with the trained classifier
    prediction = clsf.predict(X)
    print('printing data for a few classifications')
    for i in [4, 50, 200, 300, 600, 700, 900, 1100, 1500, 1600, 1700, 344, 1123]:
        print('Feature : ', X[i], '\t Real Digit :', Y[i], '\tPredicted Digit :', prediction[i])
        print('********************************')
    print('\n\n\n')
    # Accuracy test
    from sklearn.metrics import accuracy_score
    print('Accuracy Check ', accuracy_score(Y, prediction) * 100, '%  Wow _/\\_ that is GOOD :)')

Output :
[Screenshot: SVM classifier output on the digits dataset]
Result with approximately 98% accuracy.
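Note that the script above measures accuracy on the same data it was trained on, which tends to overstate how well the model generalises. A sketch of a fairer evaluation on a held-out split (the split ratio and random seed are my choices):

```python
# Evaluate the same classifier on data it has never seen during training.
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42)

clsf = SVC(kernel='rbf', gamma=0.001, C=0.1).fit(X_train, y_train)
print('Held-out accuracy:',
      accuracy_score(y_test, clsf.predict(X_test)) * 100, '%')
```

The held-out number is the one to trust when comparing parameter settings.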



