End to End Invoice Processing Application Based on Key Fields Extraction


Date

2022

Authors

Arslan, Halil

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This paper proposes an automatic invoice processing system, a capability in high demand among private and public companies. The system supports all invoice file types that companies submit. Companies submit invoices through a web interface or by email, and all submitted invoices are queued and processed sequentially. If the invoice is a text file, the invoice information is extracted from the text by template matching. If the invoice is an image, the text and table areas are detected and extracted. For table detection, both an image-processing-based method and a YOLOv5-based deep learning method were used. Cells were then extracted from the detected table images. As a result, all text regions and table cells were obtained as images, which were converted into machine-readable text with the open-source Tesseract OCR engine. Tesseract already provides trained models for English and Turkish; however, these models do not give satisfactory results on the Turkish invoices that companies submit. Therefore, a new model fine-tuned on Turkish invoices was used for OCR. The experimental results showed that the fine-tuned Turkish model was more accurate than the Turkish and English models provided by Tesseract. In addition, the YOLOv5-based table detection model was more accurate than the image-processing-based table detection method.
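
As an illustration of the template-matching step for text invoices, the following is a minimal sketch, not the system described in the paper. It assumes a template can be expressed as one regular expression per key field; the field names (`invoice_no`, `date`, `total`), the patterns, and the sample invoice text are all hypothetical.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical template: one regex per key field. The field names and
# patterns are illustrative assumptions, not taken from the paper.
TEMPLATE: Dict[str, Pattern[str]] = {
    "invoice_no": re.compile(r"Invoice\s*No\s*[:#]\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:\s*(\d{2}[./-]\d{2}[./-]\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*([\d.,]+)", re.IGNORECASE),
}


def extract_key_fields(
    text: str, template: Dict[str, Pattern[str]] = TEMPLATE
) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a field is absent."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in template.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up invoice text used only to exercise the sketch.
sample = "Invoice No: A-1024\nDate: 15.03.2022\nTotal: 1.250,00 TL\n"
result = extract_key_fields(sample)
```

In this sketch a missing field yields `None` rather than raising an error, so a caller could route incomplete invoices to manual review; a real deployment would likely need one template per invoice layout.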

Description

Keywords

Invoice processing, key fields extraction, text detection, deep learning, table extraction, optical character recognition

Source

IEEE Access

WoS Q Value

Q2

Scopus Q Value

N/A

Volume

10

Issue

Citation

Arslan, H. (2022). End to End Invoice Processing Application Based on Key Fields Extraction. IEEE Access, 10, 78398-78413.