    • Journal of Computing and Security
    • Volume 1, Issue 4

    PSA: A Hybrid Feature Selection Approach for Persian Text Classification

    Author(s)
    Bagheri, Ayoub; Saraee, Mohamad; Nadi, Shiva
    File size:
    1.126 MB
    File type (MIME):
    PDF
    Document type
    Text
    Document language
    English
    Abstract
    In recent decades, as enormous amounts of data have accumulated, the number of text documents has grown vastly. E-mails, web pages, texts, news and articles are only part of this growth. Thus the need for text mining techniques, including automatic text classification, is rising. In automatic text classification, feature selection appears to be the most important step. Since the feature space in textual data includes tens of thousands of words, feature selection is used for dimension reduction. Different techniques for feature selection in text, from statistical to machine learning approaches, have been reported in the literature, each with advantages and disadvantages. However, up to now there has been very little research on combining the advantages of both learning and statistical approaches. In this paper, a new algorithm for feature selection in text is presented to improve classification performance substantially. The proposed approach, PSA, is based on the simulated annealing algorithm and the document frequency method, so it can benefit from the advantages of both statistical and learning techniques. The simulated annealing algorithm requires an appropriate function for fitness evaluation, and the document frequency method serves as an evaluation function with low computational cost. In addition, a new Persian text dataset, the Persian 7-NewsGroups dataset, is introduced for evaluating the proposed approach. To evaluate the approach, the performance of PSA is compared to well-known methods such as chi-square and correlation coefficient on the Persian 7-NewsGroups dataset. The results show that PSA has better overall performance than the other methods.
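The abstract pairs simulated annealing with document frequency (DF) as a cheap fitness function. The record does not give the paper's fitness definition, neighbour move, or cooling schedule, so the following is only a minimal sketch under stated assumptions: the names `document_frequency` and `anneal_select` are hypothetical, fitness is assumed to be the summed DF of the selected terms, a neighbour swaps one selected term for one unselected term, and the temperature decays geometrically.

```python
import math
import random


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return df


def anneal_select(docs, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick k terms by simulated annealing, scoring a subset by summed DF.

    The fitness, neighbour move, and cooling schedule are illustrative
    assumptions, not the settings used in the paper.
    """
    rng = random.Random(seed)
    df = document_frequency(docs)
    vocab = sorted(df)
    if k >= len(vocab):
        return set(vocab)

    current = set(rng.sample(vocab, k))
    score = sum(df[t] for t in current)
    best, best_score = set(current), score
    temp = t0

    for _ in range(steps):
        # Neighbour: swap one selected term for one unselected term.
        out_term = rng.choice(sorted(current))
        in_term = rng.choice([t for t in vocab if t not in current])
        delta = df[in_term] - df[out_term]
        # Accept improvements always; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current.remove(out_term)
            current.add(in_term)
            score += delta
            if score > best_score:
                best, best_score = set(current), score
        temp *= cooling
    return best


# Toy corpus of tokenised documents (made up for illustration).
docs = [
    ["text", "mining", "persian"],
    ["text", "mining"],
    ["text", "news"],
    ["text", "mining", "news"],
]
selected = anneal_select(docs, k=2)
```

With a purely additive fitness like this one, the search simply drifts toward the highest-DF terms, so the sketch shows the mechanics of the combination rather than any benefit over plain DF ranking; the paper's actual evaluation function may differ.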
    Keywords
    Text Classification
    Text Mining
    Feature Selection
    Simulated Annealing Algorithm
    Persian Language

    Issue number
    4
    Publication date
    2014-10-01
    1393-07-09
    Publisher
    University of Isfahan & Iranian Society of Cryptology
    Author affiliations
    Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer Engineering Department, Isfahan University of Technology, Isfahan, Iran
    School of Computing, Science and Engineering, University of Salford, Manchester, UK
    Islamic Azad University, Najafabad Branch, Isfahan, Iran

    ISSN
    2322-4460
    2383-0417
    URI
    http://jcomsec.ui.ac.ir/article_21859.html
    https://iranjournals.nlai.ir/handle/123456789/283148
