A New Method for Sentence Vector Normalization Using Word2vec
Authors
Abdolahi, Mohamad; Zahedi, Morteza
Document Type
Text / Research Paper
Document Language
English
Abstract
Word embeddings (WE) have recently received much attention as an architecture for mapping words to numeric vectors, and they have been a great asset for a wide variety of NLP tasks. Most text processing tasks first convert text components such as sentences into numeric matrices before applying their algorithms. However, a central problem in all word-vector-based text processing approaches is that sentences differ in length, and their matrices therefore differ in dimension. In this paper, we propose an efficient yet simple statistical method to convert sentences into normalized matrices of equal dimension. The proposed method combines three of the most effective techniques (embedding averaging, most-likely n-grams, and word mover's distance) to exploit their advantages while reducing their limitations. The size of the resulting matrix does not depend on the language, the subject or scope of the text, or the semantic concepts of the words. Our results demonstrate that the normalized matrices capture complementary aspects of most text processing tasks, such as coherence evaluation, text summarization, text classification, automatic essay scoring, and question answering.
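The paper's exact combination of the three techniques is not reproduced here, but the simplest of them, embedding averaging, can be sketched as follows. Averaging collapses a variable-length sentence into one fixed-size vector, which illustrates why sentence length stops mattering. The toy vocabulary and 4-dimensional vectors below are illustrative assumptions, not the paper's data; in practice the vectors would come from a trained Word2vec model.

```python
import numpy as np

# Toy embedding table standing in for a trained Word2vec model
# (in practice these vectors would be loaded from e.g. a gensim model).
EMB_DIM = 4
VOCAB = {
    "the": np.array([0.1, 0.2, 0.0, 0.5]),
    "cat": np.array([0.7, 0.1, 0.3, 0.0]),
    "sat": np.array([0.2, 0.6, 0.1, 0.1]),
    "mat": np.array([0.5, 0.0, 0.4, 0.2]),
}

def sentence_vector(tokens, vocab=VOCAB, dim=EMB_DIM):
    """Average the word vectors of a sentence into one fixed-size vector.

    Out-of-vocabulary words are skipped; an all-OOV sentence maps to zeros.
    """
    vecs = [vocab[t] for t in tokens if t in vocab]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

short = sentence_vector(["the", "cat"])
long_ = sentence_vector(["the", "cat", "sat", "on", "the", "mat"])
# Regardless of sentence length, both results have the same shape.
assert short.shape == long_.shape == (EMB_DIM,)
```

Averaging alone discards word order, which is one reason the paper pairs it with n-gram and word-mover's-distance information.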
Keywords
Text Preprocessing
Sentence Normalization
Word Embedding
Word Vector
Sentence Vector
Issue Number
2
Publication Date
2019-12-01 (1398-09-10, Solar Hijri)
Publisher
Semnan University
Authors' Affiliation
Kharazmi International Campus, Shahrood University of Technology, Shahrood, Iran




