      • Journal of Soft Computing in Civil Engineering
      • Volume 1, Issue 2

      Predicting Budget from Transportation Research Grant Description: An Exploratory Analysis of Text Mining and Machine Learning Techniques

      Author(s)
      Singhal, Ayush; Gopalakrishnan, Kasthurirangan; Khaitan, Siddhartha
      File size:
      1.175 MB
      File type (MIME):
      PDF
      Document type
      Text
      Regular Article
      Document language
      English
      Abstract
      Funding agencies such as the U.S. National Science Foundation (NSF), the U.S. National Institutes of Health (NIH), and the Transportation Research Board (TRB) of The National Academies maintain publicly available online grant databases that document a variety of information on grants funded over the past few decades. In this paper, based on a quantitative analysis of the TRB's Research In Progress (RIP) online database, we explore the feasibility of automatically estimating the appropriate funding level from the textual description of a transportation research project. We use statistical Text Mining (TM) and Machine Learning (ML) techniques to build this model from the more than 14,000 records in the TRB's RIP research grant database. Natural Language Processing (NLP) based text representation models, namely Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI), and Doc2Vec, are used to vectorize the project descriptions and generate semantic vectors. Each of these representations is then used to train supervised regression models such as Random Forest (RF) regression. Of the three latent feature generation models, we found that LDA gives the lowest Mean Absolute Error (MAE). However, the correlation coefficients indicate that the funding level cannot be predicted accurately from the unstructured project abstract alone, given the large variations in source agencies, subject areas, and funding levels. When separate prediction models were used for different types of funding agencies, funding levels were better correlated with the project abstract.
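      The abstract judges the models by two measures, Mean Absolute Error and correlation coefficients, and reports that fitting separate models per agency type improved the correlation. The following is a minimal sketch of how such a per-group evaluation could be computed. It is not the authors' code: the function names, the group labels (`state_dot`, `federal`), and all numeric values are hypothetical and chosen only for illustration, and Pearson's r is assumed as the correlation coefficient.

```python
from collections import defaultdict
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted budgets."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def evaluate_by_group(records):
    """Compute MAE and Pearson r separately for each agency type.

    `records` is a list of (agency_type, actual_budget, predicted_budget).
    """
    groups = defaultdict(lambda: ([], []))
    for agency, actual, predicted in records:
        groups[agency][0].append(actual)
        groups[agency][1].append(predicted)
    return {
        agency: {"mae": mean_absolute_error(a, p), "r": pearson_r(a, p)}
        for agency, (a, p) in groups.items()
    }


# Made-up toy values (not from the paper), in thousands of dollars.
toy_records = [
    ("state_dot", 100.0, 110.0),
    ("state_dot", 200.0, 190.0),
    ("state_dot", 300.0, 310.0),
    ("federal", 1000.0, 800.0),
    ("federal", 2000.0, 2300.0),
    ("federal", 3000.0, 2900.0),
]

for agency, scores in evaluate_by_group(toy_records).items():
    print(f"{agency}: MAE={scores['mae']:.1f}, r={scores['r']:.3f}")
```

      Reporting both measures per group matters because a low MAE alone can hide a model that predicts roughly the same budget for every project; the correlation shows whether predictions actually track the funded amounts.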
      Keywords
      Text Mining
      Transportation research
      Natural Language Processing (NLP)
      Big Data
      Deep Learning
      Soft Computing
      Data Mining

      Issue number
      2
      Publication date
      2017-10-01 (Gregorian)
      1396-07-09 (Solar Hijri)
      Publisher
      Pouyan Press
      Author affiliations
      R&D, Contata Solutions, LLC, Minneapolis, Minnesota, USA
      Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA
      Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA

      ISSN
      2588-2872
      URI
      https://dx.doi.org/10.22115/scce.2017.49604
      http://www.jsoftcivil.com/article_49604.html
      https://iranjournals.nlai.ir/handle/123456789/44853
