A deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling

dc.authoridgormez, yasin/0000-0001-8276-2030
dc.authoridARSLAN, Halil/0000-0003-3286-5159
dc.contributor.authorArslan, Halil
dc.contributor.authorIsik, Yunus Emre
dc.contributor.authorGormez, Yasin
dc.date.accessioned2024-10-26T18:02:37Z
dc.date.available2024-10-26T18:02:37Z
dc.date.issued2024
dc.departmentSivas Cumhuriyet Üniversitesi
dc.description.abstractThe volume of invoice traffic between companies has grown enormously. Invoices are crucial financial documents, and companies need to extract the information they contain in order to access and control it quickly when necessary. While electronic invoices can be easily transferred to a company's ERP system with the help of integrators, information from printed invoices must be entered into the ERP system. This entry is generally performed manually by company employees, so the probability of error is high. Automatic recognition of the information in printed invoices will reduce the possibility of error and save time and money by reducing workforce requirements. This study proposes a deep learning-based solution for detecting fields in invoice images, which is in high demand among businesses. The system offers an end-to-end solution, which includes a novel method for generating synthetic invoices and labeling them automatically. Three invoice templates were used to evaluate the usability of the system, and an adaptive fine-tuning-based solution is proposed for new invoice templates. Furthermore, six different object detection models were compared to find the most suitable one for the problem. The system was also tested on 1022 manually labeled real invoice images to assess real-world usage. The results indicated that the fine-tuned model achieved an accuracy 8.4% higher than the baseline models. In tests performed on CPU, the TOOD and Cascade-RCNN models were the most successful, while YOLOv5 was the fastest. Depending on the priority of the needs, either the more accurate models or the faster one can be preferred for real-time detection of invoice fields. The synthetic invoice generation code is available at https://github.com/SCU-CENG/Invoice-Generation.
dc.description.sponsorshipThe dataset used for this publication was collected by Detay Teknoloji R&D Center. We thank them for their collaborative work. The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).
dc.identifier.doi10.1007/s10032-023-00449-4
dc.identifier.endpage109
dc.identifier.issn1433-2833
dc.identifier.issn1433-2825
dc.identifier.issue1
dc.identifier.scopus2-s2.0-85168962762
dc.identifier.scopusqualityQ2
dc.identifier.startpage97
dc.identifier.urihttps://doi.org/10.1007/s10032-023-00449-4
dc.identifier.urihttps://hdl.handle.net/20.500.12418/28261
dc.identifier.volume27
dc.identifier.wosWOS:001059915000001
dc.identifier.wosqualityQ3
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer Heidelberg
dc.relation.ispartofInternational Journal on Document Analysis and Recognition
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.subjectInvoice processing
dc.subjectDigitalization
dc.subjectObject detection
dc.subjectAutomatic generation
dc.subjectAutomatic labelling
dc.titleA deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling
dc.typeArticle
