I wondered whether I could easily build an OCR application that runs on the desktop, so I made one.
Environment: macOS Catalina 10.15.4 ・ Visual Studio Code ・ Python 3.8.1
1. PySimpleGUI (GUI)
```bash
pip install pysimplegui
```
2. Tesseract OCR (OCR), with PyOCR and Pillow
```bash
brew install tesseract
sudo pip3 install pyocr
sudo pip3 install pillow
```
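To check that the installation worked, a short script like the following asks pyocr which OCR tools it can find and which languages the first one supports. This is a sketch: `describe_ocr_setup` is a hypothetical helper that only reports what pyocr returns. Japanese recognition needs `'jpn'` to appear in the language list.

```python
def describe_ocr_setup():
    """Hypothetical helper: report what pyocr can see on this machine."""
    try:
        import pyocr
    except ImportError:
        # pyocr itself is not installed
        return {"pyocr_installed": False, "tools": [], "languages": []}
    tools = pyocr.get_available_tools()
    if not tools:
        # pyocr is installed but cannot find Tesseract (or another OCR engine)
        return {"pyocr_installed": True, "tools": [], "languages": []}
    return {
        "pyocr_installed": True,
        "tools": [tool.get_name() for tool in tools],
        # Languages of the first tool, which is the one the app uses (tools[0])
        "languages": list(tools[0].get_available_languages()),
    }


if __name__ == "__main__":
    report = describe_ocr_setup()
    print(report)
    if "jpn" not in report["languages"]:
        print("'jpn' was not found, so Japanese OCR will not work yet.")
```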
● For an explanation of these libraries, and for OCR in general, I relied heavily on the following article (thank you): Try simple OCR with Tesseract + PyOCR
● Windows users may find this one easier to follow: [Python] How to transcribe an image and convert it to text (tesseract-OCR, pyocr)
Now let's look at the code that reads text from an image (the imports are included in the complete source code at the end, so I'll omit them here).
```python
def scan_file_to_str(file_path, langage):
    """Generate a string from an image file.

    Args:
        file_path (str): Path of the image file to read
        langage (str): 'jpn' or 'eng'
    Returns:
        str: The string read from the image
    """
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    tool = tools[0]
    # Open the file passed as an argument (the with-block closes it afterwards)
    with Image.open(file_path) as image:
        text = tool.image_to_string(
            image,
            # Use the language passed as an argument ('jpn' or 'eng')
            lang=langage,
            # tesseract_layout=6: treat the image as a single uniform block of text
            builder=pyocr.builders.TextBuilder(tesseract_layout=6)
        )
    # Finally, return the string read from the image
    return text
```
It's surprising that you can read text from an image in only about 15 lines of code. I was impressed.
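The `tesseract_layout=6` argument is passed to Tesseract as its page segmentation mode (`--psm`), and 6 means "assume a single uniform block of text". Below is a sketch of a variant that makes the mode a parameter and raises an exception instead of calling `sys.exit`, which would end the whole GUI application. `scan_file_with_layout` and the `PSM_MODES` table are hypothetical, and the table lists only a few of Tesseract's modes.

```python
try:
    import pyocr
    import pyocr.builders
    from PIL import Image
except ImportError:  # lets the argument check below run even without the OCR stack
    pyocr = None

# A few of Tesseract's page segmentation modes (values of --psm).
# Hypothetical subset chosen for illustration; Tesseract defines more.
PSM_MODES = {
    3: "Fully automatic page segmentation, but no OSD (Tesseract's default)",
    4: "Assume a single column of text of variable sizes",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    11: "Sparse text: find as much text as possible in no particular order",
}


def scan_file_with_layout(file_path, lang, layout=6):
    """Hypothetical variant of scan_file_to_str with a configurable layout."""
    if layout not in PSM_MODES:
        raise ValueError(f"unsupported layout {layout}; choose one of {sorted(PSM_MODES)}")
    if pyocr is None:
        raise RuntimeError("pyocr and Pillow are required")
    tools = pyocr.get_available_tools()
    if not tools:
        # Raising (instead of sys.exit) lets a GUI catch the error and show a popup
        raise RuntimeError("No OCR tool found")
    with Image.open(file_path) as image:
        return tools[0].image_to_string(
            image,
            lang=lang,
            builder=pyocr.builders.TextBuilder(tesseract_layout=layout),
        )
```

In the GUI, the `RuntimeError` could then be caught around the OCR call and reported to the user with a popup instead of silently closing the window.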
Next, let's put this behind a GUI. Tkinter is probably the best-known GUI toolkit for Python, and I started out writing the code with it, but while researching I came across the following article.
[If you use Tkinter, try using PySimpleGUI](https://qiita.com/dario_okazaki/items/656de21cab5c81cabe59#exe%E5%8C%96%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6)
I was impressed that a GUI could be built with such simple code, so I decided to use PySimpleGUI.
Here is the code for the GUI part.
```python
# Set the theme (many themes are available)
sg.theme('Light Grey1')
# What to place where (the layout is easier to build once you see it is arranged row by row)
layout = [
    # 1st row (Text: a label)
    [sg.Text('File to read (multiple selections possible)', font=('IPA Gothic', 16))],
    # 2nd row (InputText: text box, FilesBrowse: file dialog)
    [sg.InputText(font=('IPA Gothic', 14), size=(70, 10)), sg.FilesBrowse('Select files', key='-FILES-')],
    # 3rd row (Text: a label, Radio: two radio buttons in the same group)
    [sg.Text('Language to read', font=('IPA Gothic', 16)),
     sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
     sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],
    # 4th row (Button: a button)
    [sg.Button('Read execution')],
    # 5th row (MLine: multi-line text area, 100 columns x 30 rows)
    [sg.MLine(font=('IPA Gothic', 14), size=(100, 30), key='-OUTPUT-')],
]
# Create the window (Window takes the title and the layout)
window = sg.Window('Easy OCR', layout)
# Now loop forever, waiting for events such as button clicks
while True:
    event, values = window.read()
    # None means the window's "✕" button was pressed: leave the loop and close the window
    if event is None:
        break

    # When the 'Read execution' button is pressed
    if event == 'Read execution':
        # Split the ';'-separated file names from key='-FILES-' into a list
        # (rebuilt on every click so files from a previous run are not read again)
        files = [f for f in values['-FILES-'].split(';') if f]
        if not files:
            sg.popup('Please select at least one file')
            continue
        # If the '-jpn-' radio button is selected the language is 'jpn', otherwise 'eng'
        language = 'jpn' if values['-jpn-'] else 'eng'
        text = ''
        # Loop over the selected files
        for i, file_path in enumerate(files):
            if i != 0:
                # Put a separator line between files
                text += '================================================================================================\n'
            # Receive the string read by scan_file_to_str, defined earlier
            text += scan_file_to_str(file_path, language)
            if language == 'jpn':
                # Japanese output contained a lot of extra spaces, so remove them
                text = text.replace(' ', '')
            # Add two line breaks before the next file's text
            text += '\n\n'
        # Show the read text in the MLine with key='-OUTPUT-'
        window['-OUTPUT-'].update(text)
        # Notify completion with a pop-up window
        sg.popup('Completed')
window.close()
```
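The `text.replace(' ', '')` step removes every half-width space, which also glues together any English words that appear inside Japanese text. A narrower sketch, assuming the extra spaces only need to go when they sit between two Japanese characters, uses a regular expression with lookbehind and lookahead. The function name and the Unicode ranges (CJK punctuation, hiragana, katakana, common kanji) are assumptions for illustration, not part of the original app.

```python
import re

# Assumed "Japanese character" ranges: CJK punctuation (excluding the
# ideographic space U+3000), hiragana, katakana, and CJK Unified Ideographs.
JA_CHAR = r"[\u3001-\u303f\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]"

# One or more half-width spaces with a Japanese character on both sides.
SPACES_BETWEEN_JA = re.compile(rf"(?<={JA_CHAR}) +(?={JA_CHAR})")


def remove_spaces_between_japanese(text: str) -> str:
    """Hypothetical helper: drop spaces only between two Japanese characters."""
    return SPACES_BETWEEN_JA.sub("", text)


if __name__ == "__main__":
    sample = "これ は Python の 本 です"
    print(sample.replace(" ", ""))                 # これはPythonの本です
    print(remove_spaces_between_japanese(sample))  # これは Python の本です
```

To try it in the app, the line `text = text.replace(' ', '')` in the loop would become `text = remove_spaces_between_japanese(text)`.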
There are a few other resources I relied on heavily for the GUI, so I'll list them here.
・ Learning Notes for K-TechLabo Seminar → The PDF is very easy to understand.
・ Create a UI that replaces VBA with PySimpleGUI (file dialog, list, log output) → Written by the same author as the article introduced earlier; I learned a lot from it as well.
Here is the complete source code.
```python
import sys

from PIL import Image
import PySimpleGUI as sg
import pyocr
import pyocr.builders


def scan_file_to_str(file_path, langage):
    """Generate a string from an image file.

    Args:
        file_path (str): Path of the image file to read
        langage (str): 'jpn' or 'eng'
    Returns:
        str: The string read from the image
    """
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    tool = tools[0]
    with Image.open(file_path) as image:
        text = tool.image_to_string(
            image,
            lang=langage,
            builder=pyocr.builders.TextBuilder(tesseract_layout=6)
        )
    return text


# Set the theme
sg.theme('Light Grey1')
layout = [
    # 1st row
    [sg.Text('File to read (multiple selections possible)', font=('IPA Gothic', 16))],
    # 2nd row
    [sg.InputText(font=('IPA Gothic', 14), size=(70, 10)), sg.FilesBrowse('Select files', key='-FILES-')],
    # 3rd row
    [sg.Text('Language to read', font=('IPA Gothic', 16)),
     sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
     sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],
    # 4th row
    [sg.Button('Read execution')],
    # 5th row
    [sg.MLine(font=('IPA Gothic', 14), size=(100, 30), key='-OUTPUT-')],
]
# Create the window
window = sg.Window('Easy OCR', layout)
while True:
    event, values = window.read()
    if event is None:  # window closed with the "✕" button
        break
    if event == 'Read execution':
        # Rebuild the file list on every click and skip empty entries
        files = [f for f in values['-FILES-'].split(';') if f]
        if not files:
            sg.popup('Please select at least one file')
            continue
        language = 'jpn' if values['-jpn-'] else 'eng'
        text = ''
        for i, file_path in enumerate(files):
            if i != 0:
                text += '================================================================================================\n'
            text += scan_file_to_str(file_path, language)
            if language == 'jpn':
                text = text.replace(' ', '')
            text += '\n\n'
        window['-OUTPUT-'].update(text)
        sg.popup('Completed')
window.close()
```
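To test the "Read execution" logic without opening a window, it can be pulled out of the event loop into plain functions. A minimal sketch, assuming the OCR function is passed in as a parameter so a fake can stand in for `scan_file_to_str`; `parse_selected_files`, `build_output`, and the separator length are hypothetical.

```python
from typing import Callable, List

# Separator placed between files (the length here is arbitrary)
SEPARATOR = "=" * 80 + "\n"


def parse_selected_files(raw: str) -> List[str]:
    """Split the ';'-separated string from FilesBrowse, dropping empty entries."""
    return [path for path in raw.split(";") if path.strip()]


def build_output(files: List[str], language: str,
                 ocr: Callable[[str, str], str]) -> str:
    """Hypothetical helper: OCR each file with ocr(path, language) and join the results."""
    parts = []
    for path in files:
        text = ocr(path, language)
        if language == "jpn":
            # Same clean-up as the app: drop the extra spaces in Japanese output
            text = text.replace(" ", "")
        parts.append(text + "\n\n")
    return SEPARATOR.join(parts)


if __name__ == "__main__":
    # A fake OCR function, so no images or Tesseract are needed
    def fake_ocr(path: str, language: str) -> str:
        return f"text from {path} ({language})"

    print(build_output(parse_selected_files("a.png;b.png;"), "eng", fake_ocr))
```

Inside the event loop, the branch body would then reduce to roughly `window['-OUTPUT-'].update(build_output(parse_selected_files(values['-FILES-']), language, scan_file_to_str))`.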
 
First, let's have it read two English images.
[English image 1 (from The White House Building)]

[English image 2]
☟
【result】

In this test, English was read quickly and with a high degree of accuracy.
[Japanese (from Aozora Bunko)]
☟
【result】

Japanese takes noticeably longer to read. Still, the accuracy seems good enough to be usable.
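To put a number on how long each image takes, each OCR call can be timed with `time.perf_counter`. A sketch, assuming a hypothetical `time_ocr` helper that receives the file list and the OCR function (in the app, `scan_file_to_str`); the file names in the usage part are placeholders.

```python
import time
from typing import Callable, Dict, List


def time_ocr(files: List[str], language: str,
             ocr: Callable[[str, str], str]) -> Dict[str, float]:
    """Hypothetical helper: seconds spent in ocr(path, language) for each file."""
    timings = {}
    for path in files:
        start = time.perf_counter()
        ocr(path, language)
        timings[path] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # A stand-in OCR function; in the app this would be scan_file_to_str
    def fake_ocr(path: str, language: str) -> str:
        time.sleep(0.01)
        return ""

    # Placeholder file names for illustration
    for path, seconds in time_ocr(["english1.png", "english2.png"], "eng", fake_ocr).items():
        print(f"{path}: {seconds:.3f} s")
```

Running it once with `'eng'` and once with `'jpn'` on comparable images would show how large the difference actually is.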
I actually wanted to turn this app into an executable that runs on the Mac or Windows desktop, but I couldn't get either PyInstaller or py2app to work, so I'm publishing the article in its current state. If I manage to do that in the future, I'll update it.
Also, if you have any suggestions or opinions, such as "Isn't this part wrong?" or "There's another way to do this," please feel free to leave them in the comments.