the basis of different aspects including religion. They defined hate speech in their work and then gathered data from Yahoo and the American Jewish Congress (AJC): Yahoo provided data from news groups, and the AJC supplied URLs of websites marked as offensive. The Yahoo data was not as lengthy as the text from Attenberg’s URLs, which contained large text descriptions. In their first attempt they classified data at the paragraph level and then had annotators manually annotate this data set. They focused on stereotypes and therefore decided to build a language model for stereotypes in order to mark hate speech. They built an anti-semitic speech classifier first. They identified 9000 paragraphs matching their regular expression and then removed the paragraphs that were not offensive. Seven further categories were then chosen to annotate the data. After annotating their gold corpus, they used a classifier with two-fold cross-validation to obtain a refined data set. A template-based classification strategy was adopted along with Brown clusters and part-of-speech tagging. Feature selection used the log-odds ratio, starting from 4379 features. These features were then fed to an SVM classifier, which reduced them to 3537 features after an elimination process (based on a log-odds threshold of 1.5). They achieved a baseline accuracy of 0.91. Using an SVM with a linear kernel and 10-fold cross-validation, they obtained an overall accuracy of 0.96, precision of 0.59, recall of 0.68 and F-measure of 0.63.
Motivated by the work done in [6], Kwok and Wang [4] proposed a method for detecting hate speech against Black people on Twitter. They assembled hundreds of tweets to analyse keywords or sentiments indicating hate speech. To judge the severity of arguments, a questionnaire was circulated among students of different races. A training data set of 24582 tweets was pre-processed to correct spelling variation, remove stop words and eliminate URLs etc.
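The pre-processing steps mentioned for the tweet data set (correcting spelling variation, removing stop words and eliminating URLs) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure used in [4]: the stop-word list is a small hypothetical sample, spelling variation is approximated by collapsing characters repeated three or more times, and the function name `preprocess` is chosen here for illustration.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (assumption: the actual
# list used in the surveyed work is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www links
REPEAT_RE = re.compile(r"(.)\1{2,}")            # same character three or more times
TOKEN_RE = re.compile(r"[a-z0-9']+")            # lower-case word-like tokens


def preprocess(tweet: str) -> List[str]:
    """Return the cleaned tokens of a single tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)           # eliminate URLs
    text = REPEAT_RE.sub(r"\1\1", text)    # crude spelling normalisation: "soooo" -> "soo"
    tokens = TOKEN_RE.findall(text)        # tokenise, dropping punctuation
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop words


print(preprocess("This is soooo bad http://t.co/abc123 and the end"))
# ['this', 'soo', 'bad', 'end']
```

The resulting token lists are the kind of input a unigram vocabulary or a bag-of-words classifier would be built from.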
To classify tweets, a Naive Bayes (NB) classifier was used to separate racist from non-racist tweets, and prominent features were identified from those tweets. The classifier showed an accuracy of 86%. Later, a unigram approach was used to build the vocabulary: 9437 unique words were identified in the racist training data set and 8401 unique words in the non-racist data set. An accuracy of 76% was achieved using 10-fold cross-validation.
Burnap and Williams [9] used a machine learning approach to classify tweets as hate speech or antagonistic, focusing on racism, ethnicity and religion. The motivation behind this work was the extensive public reaction on social media to the murder of a drummer, Lee Rigby, in Woolwich, UK. They collected 450,000 tweets at the height of this event. A sample of 2000 tweets was then annotated by four people; tweets on which at least three annotators (75%) agreed were selected and the others were discarded. In the feature selection phase, they employed the Stanford Lexical Parser along with a Context Free Lexical Model to extract typed dependencies from tweets. They used Bag of Words (BoW) as features in two phases. In the first phase, they used all typed dependencies as features, and in the second phase, they carried out a meta-analysis to find the best features for hate speech classification. For this purpose, they used Bayesian Logistic Regression. They used WEKA classifiers such as Random Forest Decision Trees and SVM. They achieved a 95% F-measure using 10-fold cross-validation.
Ting et al. [17] proposed an architecture for discovering hate groups on Facebook with the help of social network analysis and text mining. The extracted features include keywords that are frequently used in the groups. Sureka et al. [19] proposed an approach based on data mining and social network analysis for discovering hate-promoting videos, users and their hidden communities on YouTube. Chen et al.
[23] presented a framework for identification of extremist videos on YouTube. Author extracted
International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835 Volume-4, Issue-1, Jan.-2017