    Journal of Advances in Computer Engineering and Technology, Volume 5, Issue 2

    Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

    Author(s)
    Elhadi, Mohamed
    Document type
    Text
    Original Research Paper
    Document language
    English
    Abstract
    Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up classification in any text categorization task. It also serves as a tool for the creation of Arabic text corpora. In particular, we create a text classification process for Arabic news articles downloaded from web news portals and sites. The suggested procedure is a pilot project that uses a predefined set of documents that humans have assigned to subjects or categories. A vectorized term frequency-inverse document frequency (TF-IDF) representation was used for the initial verification of the categories. The resulting validated categories were then used to predict categories for new documents. The experiment used 1000 initial documents pre-assigned to five categories, with 200 documents in each. An initial set of 2195 documents was downloaded from a number of Arabic news sources and pre-processed to test the utility of the suggested classification procedure, which uses cosine similarity as the classifier. The results were encouraging, with satisfactory precision, recall, and F1-score. The authors intend to improve the procedure and to use it for Arabic corpora creation.
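The general idea described in the abstract (TF-IDF vectors built from pre-labelled seed documents, with cosine similarity deciding the category of a new document) can be illustrated with a minimal sketch. This is not the author's implementation: the whitespace tokenizer, the smoothed IDF formula, the use of one summed vector per category, and all function names (`fit_centroids`, `classify`, and so on) are assumptions made for illustration, and the toy English seed texts stand in for real pre-processed Arabic articles.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace splitting. Real Arabic text would need
    # normalization, stop-word removal, and possibly stemming.
    return text.lower().split()


def build_idf(token_lists):
    # Smoothed inverse document frequency over the seed documents
    # (one common variant, chosen here as an assumption).
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the seed set are dropped.
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fit_centroids(seed_docs):
    # seed_docs: list of (category, text) pairs labelled by humans.
    # Each category is represented by the sum of its seed TF-IDF vectors;
    # cosine similarity is scale-invariant, so no averaging is needed.
    token_lists = [tokenize(text) for _, text in seed_docs]
    idf = build_idf(token_lists)
    centroids = defaultdict(lambda: defaultdict(float))
    for (category, _), tokens in zip(seed_docs, token_lists):
        for term, weight in tfidf_vector(tokens, idf).items():
            centroids[category][term] += weight
    return idf, {cat: dict(vec) for cat, vec in centroids.items()}


def classify(text, idf, centroids):
    # Assign the category whose seed vector is most similar to the document.
    vec = tfidf_vector(tokenize(text), idf)
    scores = {cat: cosine(vec, c) for cat, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Toy seed set (hypothetical data, two categories instead of five).
seed = [
    ("sports", "team match goal player"),
    ("sports", "match league goal"),
    ("economy", "market bank price"),
    ("economy", "bank inflation market"),
]
idf, centroids = fit_centroids(seed)
label, scores = classify("the player scored a goal in the match", idf, centroids)
print(label)  # -> sports
```

A document that shares no vocabulary with the seed set scores 0.0 against every category, so a practical version would probably need a minimum-similarity threshold before accepting a label.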
    Keywords
    Arabic text classification
    TFIDF-Vector space model
    news articles
    Corpora creation
    Clustering and Classification

    Issue number
    2
    Publication date
    2019-05-01
    1398-02-11 (Solar Hijri calendar)
    Publisher
    Science and Research Branch, Islamic Azad University
    Author affiliation
    Computer Technology Department, Faculty of Information Technology, Zawia University, Libya

    ISSN
    2423-4192
    2423-4206
    URI
    http://jacet.srbiau.ac.ir/article_14021.html
    https://iranjournals.nlai.ir/handle/123456789/21315
