OCR stands for Optical Character Recognition: the conversion of images of text into machine-readable text. Say you have an image of a document that you need as text for a project. You can either type it out yourself or let OCR do it for you. The conversion may NOT be perfectly accurate, but in most cases it is pretty decent.
There is a Python tool called pytesseract (a wrapper around the Tesseract OCR engine) that allows you to convert images to text. We will create a web application using Flask (a Python web framework) that lets users upload an image and download the converted text. So let's begin!
Getting Stuff Ready
=>We need to download a few packages before we write the actual code. So open your terminal and type:
sudo apt-get install tesseract-ocr
pip install flask pytesseract pillow
=>Create a folder named ocrPython. Inside it, create a file named app.py and two folders: media and templates.
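If you would rather script this step, the layout can be created with a few lines of Python. This is only a sketch, assuming Python 3: `create_layout` and `tesseract_installed` are helper names made up for illustration, and the second one simply checks whether a `tesseract` executable can be found on your PATH.

```python
import shutil
from pathlib import Path


def create_layout(base="ocrPython"):
    """Create app.py plus the media/ and templates/ folders under base."""
    root = Path(base)
    # exist_ok=True makes the helper safe to run more than once
    (root / "media").mkdir(parents=True, exist_ok=True)
    (root / "templates").mkdir(parents=True, exist_ok=True)
    (root / "app.py").touch(exist_ok=True)
    return root


def tesseract_installed():
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None
```

Calling `create_layout()` from the directory where the project should live gives you the same structure as creating the folders by hand.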
The Backend
=>First of all, we need to import the required libraries. Open app.py and type:
from flask import Flask, request, redirect, render_template, url_for
from werkzeug.utils import secure_filename
import os
import sys
import pytesseract
from PIL import Image, ImageFilter
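=>On Python 2, we need to change the default encoding to utf-8, otherwise some characters won't be understood by the system and it will throw an error. Skip this step on Python 3: strings are already Unicode there, and neither the built-in reload nor sys.setdefaultencoding exists.
reload(sys)
sys.setdefaultencoding("utf-8")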
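If only the output file needs to be UTF-8, a less invasive option is to pass the encoding explicitly when writing it. The sketch below is an alternative, not what the app in this post does: `save_text` and `load_text` are hypothetical helper names, and it relies on `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3.

```python
import io


def save_text(path, text):
    """Write text to path as UTF-8, regardless of the system default encoding."""
    # On Python 2, text must be a unicode object (not a byte string)
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read the file at path back as UTF-8 text."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()
```

With helpers like these, the process-wide default encoding never has to be touched.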
=>Now, we create a Flask app and register the url endpoints to it. We'll keep the application simple with just two endpoints, one to upload the image and another to show the converted text.
Flask boilerplate:
app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = './media'  # Directory where the uploaded images will be stored
Index Endpoint: To upload an image
@app.route('/')
def index():
    return render_template('index.html')
/submitImage/ Endpoint: To get the uploaded image and convert it to text
@app.route('/submitImage/', methods=['POST'])
def submitImage():
    # Get the uploaded image and save it inside the media/ folder
    image = request.files['ocrImage']
    filename = secure_filename(image.filename)
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    image.save(image_path)

    # Open the saved image and convert it to text
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)  # Convert image to string

    # Save the text inside a .txt file next to the image
    text_filename = filename + '.txt'
    with open(os.path.join(app.config['UPLOAD_FOLDER'], text_filename), 'w') as f:
        f.write(text)

    # Pass the name of the text file (not the file object) to the template
    return render_template('textFile.html', text=text, filename=text_filename)
Note that this is NOT the safest or best way to handle an uploaded image. You should always validate uploads, for example by checking the file extension or MIME type before saving the image. Also, make the filenames unique by adding some unique value (such as the current time) to them, so that two uploads with the same name don't overwrite each other. For now, it gets our work done, so it's okay. But do NOT do it in an actual project.
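To make that advice concrete, here is a minimal sketch of both checks. The names `ALLOWED_EXTENSIONS`, `allowed_file` and `unique_filename` are made up for this example, and the list of accepted extensions is an assumption you should adjust. Keep in mind that an extension check only looks at the name, not at the file's actual contents.

```python
import time
import uuid

# Assumption: the image types we are willing to accept
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "tiff"}


def allowed_file(filename):
    """Return True if filename has an extension listed in ALLOWED_EXTENSIONS."""
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS


def unique_filename(filename):
    """Prefix filename with a timestamp and a random token to avoid clashes."""
    stamp = time.strftime("%Y%m%d%H%M%S")
    token = uuid.uuid4().hex[:8]
    return "{}_{}_{}".format(stamp, token, filename)
```

Inside submitImage, one way to use these would be to call `allowed_file(image.filename)` before saving (and reject the request if it returns False), then save under `unique_filename(secure_filename(image.filename))`.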
The statement:
text = pytesseract.image_to_string(img) #Convert image to string
is where the actual magic happens. Note that this may take some time, depending on your system and the nature of the uploaded image, so be patient after you upload an image!
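Since the conversion can be slow, it may be worth capping how long a single request is allowed to spend on it. The sketch below is one possible approach, assuming a reasonably recent pytesseract whose `image_to_string` accepts the `lang` and `timeout` arguments. `image_to_text` is a hypothetical helper, the 30 second default is arbitrary, and the grayscale conversion is just a common heuristic rather than something the original app does.

```python
def image_to_text(path, lang="eng", timeout=30):
    """OCR the image at path; return None if it takes longer than timeout seconds."""
    # Imported here so this module still loads where the OCR stack is missing
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        # Grayscale input sometimes helps on noisy scans (a heuristic, not a rule)
        gray = img.convert("L")
        try:
            return pytesseract.image_to_string(gray, lang=lang, timeout=timeout)
        except RuntimeError:
            # pytesseract raises RuntimeError when the timeout expires
            return None
```

A view function could then show an error page when the helper returns None instead of leaving the user waiting indefinitely.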
More boilerplate:
if __name__ == '__main__':
    app.run('0.0.0.0', 8000)  # Listen on all network interfaces, port 8000 (open http://localhost:8000 locally)
The Frontend
=>Create a file named index.html inside the templates folder that we created earlier. We'll create a simple HTML page using Bootstrap.
<html>
<head>
  <meta charset="utf-8">
  <title>Index | OCR Application</title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
  <center><h2>OCR Application using Flask</h2></center>
  <div class='container'>
    <div class="row">
      <div class='col-xs-12'>
        <form class="form" action="/submitImage/" method="POST" enctype="multipart/form-data">
          <div class='row'>
            <div class='col-xs-4 col-xs-offset-2'>
              <label for='ocrImage'>Choose Image: </label>
              <input class='form-control' type="file" name="ocrImage" id="ocrImage" src="" alt="" onchange='showImage(event)'>
            </div>
            <div class='col-xs-4'>
              <img src='' id='uploadedImage' width='200' alt='no image selected'>
            </div>
          </div>
          <div class='row'>
            <div class='col-xs-12 col-xs-offset-2'>
              <input class='btn btn-success' type="submit" value="Convert" />
            </div>
          </div>
        </form>
      </div>
    </div>
  </div>
</body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<script type="text/javascript" charset="utf-8">
  function showImage(event){
    uploadedImage = document.getElementById('ocrImage').files[0];
    console.log(uploadedImage);
    document.getElementById('uploadedImage').src = URL.createObjectURL(event.target.files[0]);
  }
</script>
</html>
We've created an HTML page consisting of a heading, a file input, an image box and a button. When the user selects an image, a preview is shown in the image box before the file is actually uploaded.
The form is submitted to the /submitImage/ endpoint. Note that we've used enctype="multipart/form-data" to allow file uploads via this form: with this encoding, the browser sends the file's raw bytes as a separate part of the request body.
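To see what multipart/form-data looks like on the wire, here is a rough sketch that builds such a request body by hand using only the standard library. `build_multipart` is a hypothetical helper written for this illustration; it handles a single file field and does no escaping of the field or file names.

```python
import uuid


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Return (body, content_type_header) for a form with one file field."""
    # The boundary is a random marker that separates the parts of the body
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: {t}\r\n"
        "\r\n"
    ).format(b=boundary, f=field, n=filename, t=content_type)
    tail = "\r\n--{b}--\r\n".format(b=boundary)
    # data is the raw file content as bytes
    body = head.encode("utf-8") + data + tail.encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary
```

Calling it with the field name `ocrImage` produces a body that request.files['ocrImage'] on the Flask side is meant to pick up, provided the returned header value is sent as the request's Content-Type.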
=>Now, create another file named textFile.html inside the templates folder. Add the following code to it:
<html>
<head>
  <meta charset="utf-8">
  <title>Result | OCR Application</title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
  <center><h2>OCR Application using Flask</h2></center>
  <div class='container-fluid'>
    <div class="row">
      <div class='col-xs-12'>
        <textarea id='textValue' readonly name="result" rows="8" cols='250' value=''>{{ text }}</textarea>
      </div>
    </div>
  </div>
  <form method="get" accept-charset="utf-8">
    <button type="button" class='btn btn-info' onclick='downloadText()'>Download</button>
  </form>
</body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<script type="text/javascript" charset="utf-8">
  function downloadText(){
    var text = document.getElementById('textValue').value;
    // encodeURIComponent replaces the deprecated escape(), which mangles non-ASCII characters
    var val = "data:x-application/text," + encodeURIComponent(text);
    window.open(val);
  }
</script>
</html>
This page is pretty straightforward. It shows a textarea that contains the extracted text.
On clicking the Download button, the contents of this textarea are downloaded to the user's system.
That's it! The entire source code is available on my GitHub repo.
Comment below for further queries or suggestions!