Multithreading in Python

November 30, 2018

We often build applications that need to run several tasks at the same time. This is where multithreading comes into play. This post explains how to use the threading module in Python.

Introduction

Multithreading, a.k.a. threading, in Python is a technique in which multiple threads are launched in the same process to achieve concurrency and multitasking within the same application. Running different threads is like running different programs or different functions within the same process.

What is Multithreading (Threading)?

Multithreading can be simply understood as executing multiple threads concurrently within the same process. These threads share the memory space of that process.
For example, an IDE such as PyCharm or a Jupyter Notebook keeps autosaving your code as you make changes, which illustrates multiple tasks being performed within the same process.

Threading is generally preferred for lightweight tasks because the threads run within one process and use the memory already allocated to that process.
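
As a small illustration of threads sharing their process's memory, the sketch below lets a worker thread and the main thread append to the same list. The names shared_items and add_item are made up for this example.

```python
import threading

shared_items = []  # lives in the process's memory, visible to every thread


def add_item(item):
    # Runs in a worker thread but appends to the same list object
    shared_items.append(item)


worker = threading.Thread(target=add_item, args=("added by worker thread",))
worker.start()
worker.join()  # wait for the worker so its append has definitely happened

shared_items.append("added by main thread")
print(shared_items)
```

Both entries end up in one list because no copy of the data is made when the thread starts.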

Multithreading shouldn’t be confused with multiprocessing. In multiprocessing, an application runs two or more independent processes, each with its own memory space, so they do not share state with one another by default.

When to use Multithreading?

  • When the tasks are lightweight.
  • When objects need to be shared across the same application.
  • When you want to create responsive web UIs.
  • When you have multiple tasks that are mostly I/O bound.
  • When you want a lag-free application with a backend database connection.
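
To make the I/O-bound point concrete, here is a rough sketch that compares running three simulated I/O tasks one after another with running them in threads. time.sleep stands in for a blocking call such as a network or database request, and the function names are invented for this illustration. Exact timings will vary by machine.

```python
import threading
import time


def fake_io_task(delay):
    # time.sleep stands in for a blocking I/O call (network, disk, database)
    time.sleep(delay)


def run_sequential(delays):
    start = time.perf_counter()
    for d in delays:
        fake_io_task(d)
    return time.perf_counter() - start


def run_threaded(delays):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every task to finish before stopping the clock
    return time.perf_counter() - start


delays = [0.2, 0.2, 0.2]
print(f"Sequential: {run_sequential(delays):.2f}s")
print(f"Threaded:   {run_threaded(delays):.2f}s")
```

The threaded run should take roughly as long as the slowest single task, because the waits overlap instead of adding up.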

Threading Example

To create a thread, pass the function to run as target, along with its positional and keyword arguments, to threading.Thread.

The example below illustrates how a function can be called inside a thread,

import threading

def new_function(a, b, kw2=10, kw1=None):
    print("Hello")

# Create a thread that will call new_function(1, 2, kw1=1, kw2='2')

new_thread = threading.Thread(target=new_function, args=(1, 2), kwargs={'kw1': 1, 'kw2': '2'})

To start the above thread,

new_thread.start()

Let’s take a simple example of an application where you extract the negative and positive elements of a list in 2 separate threads.

Note: This is just a simple example for the purpose of demonstrating threading.

Let’s write 2 functions,

  • compute_negative() - to obtain the negative elements from the list
  • compute_positive() - to obtain the positive elements from the list

    import threading
    
    def compute_negative(arr):
        new_arr = []
        for i in arr:
            if i<0:
                new_arr.append(i)
        print(f"Negative Elements: {new_arr}")
    
    def compute_positive(arr):
        new_arr = []
        for i in arr:
            if i>0:
                new_arr.append(i)  
        print(f"Positive Elements: {new_arr}")
    
    if __name__ == '__main__':
        
        a = [1, 2, -4, 5, -7, 6, 10, -50, 100, -87, 20]
        print(f"Input List: {a}")
        
        t1 = threading.Thread(target=compute_negative, args=(a,)) # Create Thread 1
        t2 = threading.Thread(target=compute_positive, args=(a,)) # Create Thread 2
    
        t1.start()  # Thread 1 starts here
        t2.start()  # Thread 2 starts here
    
    Input List: [1, 2, -4, 5, -7, 6, 10, -50, 100, -87, 20]
    Negative Elements: [-4, -7, -50, -87]
    Positive Elements: [1, 2, 5, 6, 10, 100, 20]
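
The functions above print their results from inside the threads. If the caller needs the values back, one possible approach (not used in the original example) is the standard library's concurrent.futures.ThreadPoolExecutor, sketched below. The functions get_negative and get_positive are hypothetical variants that return their lists instead of printing them.

```python
from concurrent.futures import ThreadPoolExecutor


def get_negative(arr):
    # Same filtering as compute_negative, but the list is returned
    return [i for i in arr if i < 0]


def get_positive(arr):
    # Same filtering as compute_positive, but the list is returned
    return [i for i in arr if i > 0]


a = [1, 2, -4, 5, -7, 6, 10, -50, 100, -87, 20]

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules the call on a worker thread and returns a Future
    negative_future = executor.submit(get_negative, a)
    positive_future = executor.submit(get_positive, a)
    # result() blocks until the corresponding call has finished
    negatives = negative_future.result()
    positives = positive_future.result()

print(f"Negative Elements: {negatives}")
print(f"Positive Elements: {positives}")
```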
    

    Multithreading Practical Use Case

    Let’s visualize a simple practical example where multithreading comes into play.

    Say, for example, you build a frontend GUI that shows the user an exported version of the data from a database. As and when the data is refreshed in the database, the UI must be refreshed as well. If this is performed as a sequential task, the UI would freeze for some time while the backend data is being refreshed.

    Approach with Multithreading:
    For the above use case, we can have 2 threads running at the same time, where one refreshes the data in the background and the other displays the data available at the moment. This way the user is not hindered while the backend data is being refreshed in a separate thread.
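
A minimal sketch of this two-thread arrangement follows. There is no real database or GUI here: table_data, refresh_data and display_loop are hypothetical stand-ins, and the refresh simply bumps a version number. A lock guards the shared data, and an Event is used to tell the background thread when to stop.

```python
import threading
import time

table_data = {"rows": [], "version": 0}  # stand-in for the data shown in the UI
data_lock = threading.Lock()
stop_event = threading.Event()


def refresh_data(interval):
    # Hypothetical refresher: a real one would query the database here
    while not stop_event.is_set():
        with data_lock:
            table_data["version"] += 1
            table_data["rows"] = [f"row from refresh {table_data['version']}"]
        stop_event.wait(interval)  # pause, but wake early if asked to stop


def display_loop(times, interval):
    # Stand-in for the UI loop: it keeps running while the refresher works
    seen = []
    for _ in range(times):
        with data_lock:
            seen.append(table_data["version"])
        time.sleep(interval)
    return seen


refresher = threading.Thread(target=refresh_data, args=(0.05,))
refresher.start()
versions_seen = display_loop(times=5, interval=0.05)
stop_event.set()   # ask the background thread to finish
refresher.join()
print(f"Versions seen by the display loop: {versions_seen}")
```

The display loop never waits for a refresh to complete; it just shows whatever version is available at that moment.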

    What is join in Threading?

    Before we define what join in threading is, let’s analyse our problem statement.

    There is an application whose main thread starts 2 threads in parallel. Thread 1 completes 30 seconds sooner than thread 2.

    So, after starting the threads, would the main thread wait for them to complete before running the rest of the program?

    Well, if you guessed that it would not, then you are right. The main thread carries on with the code that follows immediately, even though the Python process itself stays alive until every non-daemon thread has finished. This is where the join method of a thread comes into play.

    join makes the calling thread wait for the given thread to finish. So, calling join on each thread after starting it makes the application wait until all of those threads have completed.

    Let’s make more sense of join with the below 2 examples,

    Example 1 : Threading without Join

    Let’s create a program to display the time thrice with a 0.5 second delay and have some code after the threads.

    Let’s look at the execution.

    import threading
    import time
    
    
    def print_time(val):
        """
        Display the time 3 times 
        with a 0.5 second delay
        """
        for i in range(3):
            time.sleep(0.5)
            print("Process:{0} Time is {1}".format(val, time.time()))
    
    
    if __name__ == '__main__':
        
        t1 = threading.Thread(target=print_time, args=(1,))
        t2 = threading.Thread(target=print_time, args=(2,))
    
        t1.start()
        t2.start()
        
        print("Threading Complete. We are at the end of the program.")
    
    Threading Complete. We are at the end of the program.
    Process:1 Time is 1543436647.7216341
    Process:2 Time is 1543436647.722194
    Process:1 Time is 1543436648.2265742
    Process:2 Time is 1543436648.227299
    Process:1 Time is 1543436648.729373
    Process:2 Time is 1543436648.731555
    


    It is evident from the above example that even before the 2 threads complete their execution, the line of code present after starting the 2 threads is being executed.

    So, how do you wait for the threads to complete before you continue with the execution of the rest of the program ?

    This is where join comes in as the perfect solution. Now let’s analyze the same example with join.

    Here the program waits for the threads to complete before reaching the end.

    Example 2 : Threading with Join

    import threading
    import time
    
    
    def print_time(val):
        """
        Display the time 3 times 
        with a 0.5 second delay
        """
        for i in range(3):
            time.sleep(0.5)
            print("Process:{0} Time is {1}".format(val, time.time()))
    
    
    if __name__ == '__main__':
        
        t1 = threading.Thread(target=print_time, args=(1,))
        t2 = threading.Thread(target=print_time, args=(2,))
    
        t1.start()
        t2.start()
        
        t1.join()
        t2.join()
        
        print("Threading Complete. We are at the end of the program.")
    
    Process:1 Time is 1543436975.869845
    Process:2 Time is 1543436975.8704278
    Process:2 Time is 1543436976.37433
    Process:1 Time is 1543436976.37479
    Process:2 Time is 1543436976.87863
    Process:1 Time is 1543436976.878934
    Threading Complete. We are at the end of the program.
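
When there are more than two threads, a common pattern is to keep them in a list and join them in a loop. The sketch below also shows join with a timeout, which returns after at most that many seconds whether or not the thread has finished, so is_alive is checked afterwards. The worker function and its delays are made up for this illustration.

```python
import threading
import time


def worker(delay, results, index):
    time.sleep(delay)
    results[index] = f"worker {index} done"


results = [None] * 3
threads = [
    threading.Thread(target=worker, args=(0.1 * (i + 1), results, i))
    for i in range(3)
]

for t in threads:
    t.start()

# join() accepts a timeout in seconds; it returns either when the thread
# finishes or when the timeout expires, so check is_alive() afterwards.
threads[-1].join(timeout=0.01)
print(f"Last worker still running: {threads[-1].is_alive()}")

for t in threads:
    t.join()  # wait for each thread without a timeout

print(results)
```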
    

    Shared Objects - Thread Lock

    Now, what if you wish to share data among the running threads? Sharing data across 2 or more threads is one of the most useful aspects of threading.

    What happens if two or more threads try to make changes to a shared object at the same time?
    This can produce unexpected, inconsistent results, a problem known as a race condition. Thread locks help to combat this issue.

    A thread lock can be held by only one thread at a time, so code that changes a shared object while holding the lock cannot run in two threads at once.

    This locking mechanism synchronizes the threads, thereby avoiding unexpected results due to simultaneous modification.

    Practical Use Case: For example, sharing objects would be very useful in a case where there is a frontend UI to display a table’s data and this table’s data is manipulated from 2 data sources that are refreshed every 5 minutes. If there’s a delay in either of these 2 data refreshes, and both threads try to manipulate the same object at the same time, it might lead to inconsistent results.

    Example 1 : Threading without Lock

    Let’s create a scenario in which a race condition occurs, thereby yielding inconsistent results.

    Our program consists of 2 functions,

  • refresh_val() - increments val 100000 times
  • main() - creates 2 threads which call refresh_val simultaneously

    We will call this main function 10 times in our code.

    import threading
    
    val = 0 # global variable val
    
    def refresh_val():
        """
        Increment val 100000 times
        """
        global val
        counter = 100000
        while counter > 0:
            val += 1
            counter -= 1
    
    
    def main():
        global val
        val = 0
        
        # creating threads
        t1 = threading.Thread(target=refresh_val)
        t2 = threading.Thread(target=refresh_val)
    
        # start threads
        t1.start()
        t2.start()
    
        # wait until threads complete
        t1.join()
        t2.join()
    
    
    if __name__ == "__main__":
        for i in range(1,11):
            main()
            print("Step {0}: val = {1}".format(i, val))
    
    Step 1: val = 200000
    Step 2: val = 191360
    Step 3: val = 200000
    Step 4: val = 200000
    Step 5: val = 200000
    Step 6: val = 199331
    Step 7: val = 200000
    Step 8: val = 200000
    Step 9: val = 157380
    Step 10: val = 200000
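
Why do some steps fall short of 200000? The statement val += 1 is not a single operation: it reads val, adds 1, and writes the result back. The sketch below replays one unlucky interleaving by hand, without real threads, using two hypothetical threads A and B. Whether and how often such lost updates show up in the threaded program depends on the Python version and the machine.

```python
val = 0

# "val += 1" is really three steps: read val, add 1, write the result back.
# Below, the steps of two hypothetical threads A and B are interleaved by hand.
a_read = val          # thread A reads 0
b_read = val          # thread B reads 0 before A has written
val = a_read + 1      # thread A writes 1
val = b_read + 1      # thread B also writes 1, overwriting A's update

print(f"Two increments ran, but val = {val}")
```

One of the two increments is lost, which is the same effect that makes the totals above come out lower than expected.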
    


    Let’s perform the same operation with the locking mechanism present in threading.

    A threading.Lock object provides 2 methods,

  • acquire - acquires the lock, blocking until it is released if another thread holds it
  • release - releases the lock so that another thread can acquire it

    When a thread has acquired the lock guarding a shared object, no other thread that uses the same lock can make changes to this object at the same time. If another thread attempts to acquire the lock, it has to wait until the lock is released.
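
The behaviour of acquire and release can be observed from a single thread with a small sketch. It relies on acquire(blocking=False), which returns immediately with True or False instead of waiting, and on locked(), which reports whether the lock is currently held. The variable names are only for illustration.

```python
import threading

lock = threading.Lock()

first = lock.acquire()                  # lock is free, so this succeeds
second = lock.acquire(blocking=False)   # already held: returns False at once
held = lock.locked()                    # True while the lock is held

lock.release()
third = lock.acquire(blocking=False)    # free again, so this succeeds
lock.release()

print(first, second, held, third)  # True False True True
```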

    Methods to use a lock

    Method 1 :

    import threading
    
    lock = threading.Lock() # create a lock
    lock.acquire() # Acquire the lock
    try:
        pass # code goes here
    finally:
        lock.release() # Release the lock, even if the code above raises
    


    Method 2 :

    import threading
    
    lock = threading.Lock() # create a lock
    with lock: # acquired on entry, released automatically on exit
        pass # code goes here
    


    Example 2 : Threading with Lock

    Let’s perform the same operation using a lock.

    As you can see below, the value is incremented as expected without any inconsistency.

    import threading
    
    val = 0 # global variable val
    
    lock = threading.Lock() # create a lock
    
    def refresh_val():
        """
        Increment val 100000 times
        """
        global val
        counter = 100000
        while counter > 0:
            lock.acquire() # Acquire the lock
            val += 1
            lock.release() # Release the lock
            counter -= 1
    
    
    def main():
        global val
        val = 0
        
        # creating threads
        t1 = threading.Thread(target=refresh_val)
        t2 = threading.Thread(target=refresh_val)
    
        # start threads
        t1.start()
        t2.start()
    
        # wait until threads complete
        t1.join()
        t2.join()
    
    
    if __name__ == "__main__":
        for i in range(1,11):
            main()
            print("Step {0}: val = {1}".format(i, val))
    
    Step 1: val = 200000
    Step 2: val = 200000
    Step 3: val = 200000
    Step 4: val = 200000
    Step 5: val = 200000
    Step 6: val = 200000
    Step 7: val = 200000
    Step 8: val = 200000
    Step 9: val = 200000
    Step 10: val = 200000
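
The same result can be had with the with form of the lock. One possible arrangement, sketched below, keeps the value and the lock that guards it together in a small class. SafeCounter and increment_many are hypothetical names for this illustration, not part of the threading module.

```python
import threading


class SafeCounter:
    """Hypothetical helper that bundles a value with the lock guarding it."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # acquired here, released when the block exits
            self.value += 1


def increment_many(counter, times):
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=increment_many, args=(counter, 100000))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"val = {counter.value}")  # val = 200000
```

Keeping the lock private to the class means callers cannot forget to acquire it before changing the value.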
    

    When not to use Multithreading?

  • Not suitable for CPU-intensive tasks.
  • Having multiple heavyweight threads can slow down your main process.
  • Individual threads are not killable.
  • Creating too many threads for a single application can make your code more complex and the process slower.
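
To see the CPU-intensive point in practice, the rough sketch below times a pure-Python countdown run twice in sequence and then twice in threads. On the standard CPython interpreter the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threaded version typically is not faster. The function names are invented here, and the timings depend on the machine and the Python build (builds without the GIL may behave differently), so no expected numbers are given.

```python
import threading
import time


def count_down(n):
    # Pure-Python loop: CPU-bound, with no I/O to wait on
    while n > 0:
        n -= 1


def timed_sequential(n, runs):
    start = time.perf_counter()
    for _ in range(runs):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, runs):
    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(runs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


n = 2_000_000
print(f"Sequential: {timed_sequential(n, 2):.2f}s")
print(f"Threaded:   {timed_threaded(n, 2):.2f}s")
```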

    Conclusion

    To summarise, multithreading can be used when you would like to perform multiple tasks within the same application while accessing some shared objects.

    To avoid inconsistent results from race conditions, the threading lock mechanism can be used.

    Hopefully, by the end of this post, you can leverage multithreading based on your requirements.

    Comments and feedback are welcome. Cheers!
