Parallel Bulk Image Resizing – Python

Posted on June 28, 2016


Multi-threaded programming can be tedious. One typically needs a thread-safe queue to hold the inputs. The threads pick an input from this queue, act on it and save the result. Setting this up can take most of an afternoon, depending on the job.
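For comparison, here is a minimal sketch of that hand-rolled queue-and-threads pattern, using only the standard library. The names run_workers and work are made up for this illustration, and the squaring function is just a stand-in for real work such as resizing an image.

```python
import queue
import threading

def run_workers(inputs, work, num_threads=4):
    """Apply work() to every input using a hand-rolled set of threads."""
    tasks = queue.Queue()        # thread-safe queue holding the inputs
    results = {}                 # maps each input to its result
    lock = threading.Lock()      # guards writes to the shared dict

    # Fill the queue before any thread starts, so an empty queue means "done"
    for item in inputs:
        tasks.put(item)

    def worker():
        while True:
            try:
                item = tasks.get_nowait()   # pick the next input
            except queue.Empty:
                return                      # nothing left to do
            value = work(item)              # act on it
            with lock:
                results[item] = value       # save the result

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Stand-in workload: square a few numbers
squares = run_workers(range(10), lambda n: n * n)
```

Even this small version needs a queue, a lock and explicit start/join bookkeeping, which is the overhead a ready-made pool takes care of.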

This recipe is adapted from [link]. It shows a way to do batch processing in parallel in Python, which is handy for people dealing with a lot of images.

With Python's multiprocessing package one can parallelize tasks quickly, usually in just minutes. In this demo, I have a few thousand images which need to be resized. I have written a Python function to resize one image.


# This function loads a file, resizes it and writes it to the output folder
def create_icon( inputFileName ):
  im = cv2.imread( inputFileName )
  small_im = cv2.resize( im, (400,400) )
  cv2.imwrite( 'out/'+inputFileName, small_im )

To distribute this function over several threads, I need a list (array) of all the files to process, and then I map() the function over this list using a thread pool. annotations.txt is a text file which contains the paths of all the image files, one per line.


# imagesList contains the paths of all the images
with open('annotations.txt') as f:
  imagesList = [line.rstrip('\n') for line in f]
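A blank line or stray whitespace in annotations.txt would end up in the list as a bogus file name. The sketch below shows one way to guard against that. The helper names clean_lines and read_image_list are hypothetical and not part of the original recipe.

```python
def clean_lines(lines):
    """Strip surrounding whitespace and drop empty lines."""
    stripped = (line.strip() for line in lines)
    return [name for name in stripped if name]

def read_image_list(list_path):
    """Read one file name per line from list_path, closing the file afterwards."""
    with open(list_path) as f:
        return clean_lines(f)
```

Calling read_image_list('annotations.txt') would then give a list that can be passed to map() directly.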

The final code looks like this:


import cv2
from multiprocessing.dummy import Pool as ThreadPool

# This function loads a file, resizes it and writes it to the output folder
def create_icon( inputFileName ):
  im = cv2.imread( inputFileName )
  small_im = cv2.resize( im, (100,100) )
  cv2.imwrite( 'out/'+inputFileName, small_im )

fileList = 'images/annotations.txt'

# Load the list of images into the array imagesList
with open(fileList) as f:
  imagesList = [line.rstrip('\n') for line in f]

# Create a pool of 4 threads and run create_icon on every image
pool = ThreadPool(4)
pool.map( create_icon, imagesList )

# Shut the pool down once all the images are processed
pool.close()
pool.join()
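One thing to keep in mind is that pool.map() re-raises an exception from a worker, so a single bad file can abort the whole batch. Below is a rough sketch of one way around that, assuming Python 3. The wrapper safe() and the stand-in fake_resize() are invented for this example. In real use, create_icon would take the place of fake_resize.

```python
from multiprocessing.dummy import Pool as ThreadPool

def safe(func):
    """Wrap func so that it reports an error instead of raising it."""
    def wrapper(item):
        try:
            func(item)
            return (item, None)      # success: no error to report
        except Exception as exc:
            return (item, exc)       # failure: hand the exception back
    return wrapper

def fake_resize(name):
    # Stand-in for create_icon: rejects anything that is not a .jpg
    if not name.endswith('.jpg'):
        raise ValueError('unsupported file: ' + name)

with ThreadPool(4) as pool:
    # map() returns the results in the same order as the inputs
    outcomes = pool.map(safe(fake_resize), ['a.jpg', 'b.txt', 'c.jpg'])

failed = [name for name, err in outcomes if err is not None]
print('failed:', failed)
```

The rest of the batch still gets processed, and the failed list shows which files need another look.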

I could resize about 400 images (320×240) in less than a second with 4 threads.
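To check how the thread count affects the run time on your own machine, a timing harness along these lines could be used. This is only a sketch: fake_resize() is a made-up stand-in that sleeps for 10 ms to imitate the wait of reading and writing one image, and the numbers it produces say nothing about real OpenCV performance.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def fake_resize(name):
    # Stand-in for create_icon: sleeps to imitate handling one image
    time.sleep(0.01)

def time_pool(num_threads, items):
    """Return the seconds taken to map fake_resize over items."""
    start = time.perf_counter()
    with ThreadPool(num_threads) as pool:
        pool.map(fake_resize, items)
    return time.perf_counter() - start

names = ['img%d.jpg' % i for i in range(20)]
timings = {n: time_pool(n, names) for n in (1, 2, 4)}
for n, seconds in timings.items():
    print('%d thread(s): %.3f s' % (n, seconds))
```

Swapping fake_resize for the real create_icon and names for imagesList would measure the actual job.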
