Show simple item record

dc.contributor.author: Pouramini, A. [en_US]
dc.contributor.author: Khaje Hassani, S. [en_US]
dc.contributor.author: Nasiri, Sh. [en_US]
dc.date.accessioned: 1399-07-09T06:04:19Z [fa_IR]
dc.date.accessioned: 2020-09-30T06:04:19Z
dc.date.available: 1399-07-09T06:04:19Z [fa_IR]
dc.date.available: 2020-09-30T06:04:19Z
dc.date.issued: 2018-07-01 [en_US]
dc.date.issued: 1397-04-10 [fa_IR]
dc.date.submitted: 2016-01-17 [en_US]
dc.date.submitted: 1394-10-27 [fa_IR]
dc.identifier.citation: Pouramini, A., Khaje Hassani, S., Nasiri, Sh. (2018). Data Extraction using Content-Based Handles. Journal of AI and Data Mining, 6(2), 399-407. doi: 10.22044/jadm.2017.990 [en_US]
dc.identifier.issn: 2322-5211
dc.identifier.issn: 2322-4444
dc.identifier.uri: https://dx.doi.org/10.22044/jadm.2017.990
dc.identifier.uri: http://jad.shahroodut.ac.ir/article_990.html
dc.identifier.uri: https://iranjournals.nlai.ir/handle/123456789/294898
dc.description.abstract: In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers that extract data records from web pages. Our approach relies mainly on the visible page content to identify data regions on a web page. The extraction algorithm is inspired by the way a human user scans page content for specific data. In particular, we use text features such as textual delimiters, keywords, constants, or text patterns, which we call handles, to construct patterns for the target data regions and data records. We offer a polynomial-time algorithm in which these patterns are checked against the page elements in a mixed bottom-up and top-down traversal of the DOM tree. The extracted data is mapped directly onto a hierarchical XML structure, which forms the output of the wrapper. The wrappers generated by this method are robust and independent of the HTML structure. Therefore, they can be adapted to similar websites to gather and integrate information. [en_US]
dc.format.extent: 1186
dc.format.mimetype: application/pdf
dc.language: English
dc.language.iso: en_US
dc.publisher: Shahrood University of Technology [en_US]
dc.relation.ispartof: Journal of AI and Data Mining [en_US]
dc.relation.isversionof: https://dx.doi.org/10.22044/jadm.2017.990
dc.subject: Web Data Record Extraction [en_US]
dc.subject: Web Wrapper Generation [en_US]
dc.subject: Web Information Extraction [en_US]
dc.subject: Document and Text Processing [en_US]
dc.title: Data Extraction using Content-Based Handles [en_US]
dc.type: Text [en_US]
dc.type: Research/Original/Regular Article [en_US]
dc.contributor.department: Department of Computer Engineering, University of Sirjan Technology, Sirjan, Iran. [en_US]
dc.contributor.department: Department of Computer Engineering, University of Sirjan Technology, Sirjan, Iran. [en_US]
dc.contributor.department: Department of Computer Engineering, University of Sirjan Technology, Sirjan, Iran. [en_US]
dc.citation.volume: 6
dc.citation.issue: 2
dc.citation.spage: 399
dc.citation.epage: 407


Files in this item

This item appears in the following collections:
