    Journal of AI and Data Mining, Volume 7, Issue 3

    A Joint Semantic Vector Representation Model for Text Clustering and Classification

    Author(s)
    Momtazi, S.; Rahbar, A.; Salami, D.; Khanijazani, I.
    File size:
    1.029 MB
    File type (MIME):
    PDF
    Document type
    Text
    Research/Original/Regular Article
    Document language
    English
    Abstract
    Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use semantic models for document vector representation. Latent Dirichlet allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this paper, we first study the conceptual difference between the two models and show that they behave differently and capture semantic features of texts from different perspectives. We then propose a hybrid approach for document vector representation that benefits from the advantages of both models. Experimental results on the 20newsgroup dataset show the superiority of the proposed model over each of the baselines on both text clustering and text classification. We achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification compared to the best baseline model.
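
The abstract does not say how the LDA and doc2vec representations are combined. As a rough illustration only, the sketch below assumes the simplest possible hybrid: each view is L2-normalized and the two are concatenated into one joint vector. The function names (`l2_normalize`, `joint_vector`, `cosine`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def joint_vector(topic_vec, embedding_vec):
    """Concatenate the normalized topic view and embedding view of one document.

    ASSUMPTION: plain concatenation stands in for the paper's hybrid model,
    whose exact combination rule is not given in the abstract.
    """
    return l2_normalize(topic_vec) + l2_normalize(embedding_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up inputs: a 3-topic distribution (as LDA might produce) and a
# 4-dimensional dense embedding (as doc2vec might produce) per document.
doc_a_topics, doc_a_embed = [0.7, 0.2, 0.1], [0.5, -1.0, 2.0, 0.0]
doc_b_topics, doc_b_embed = [0.6, 0.3, 0.1], [0.4, -0.8, 1.9, 0.1]

joint_a = joint_vector(doc_a_topics, doc_a_embed)
joint_b = joint_vector(doc_b_topics, doc_b_embed)

print("joint dimensionality:", len(joint_a))
print("cosine similarity of the joint vectors:", round(cosine(joint_a, joint_b), 3))
```

Normalizing each view before concatenation keeps either representation from dominating distance computations purely because of its scale. The resulting joint vectors could then be passed to any standard clustering or classification algorithm.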
    Keywords
    Text mining
    Semantic representation
    Topic modeling
    Neural document embedding
    Document and Text Processing

    Issue number
    3
    Publication date
    2019-07-01
    1398-04-10 (Solar Hijri)
    Publisher
    Shahrood University of Technology
    Author affiliation
    Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran.

    ISSN
    2322-5211
    2322-4444
    URI
    https://dx.doi.org/10.22044/jadm.2019.7400.1876
    http://jad.shahroodut.ac.ir/article_1457.html
    https://iranjournals.nlai.ir/handle/123456789/294922
