Table of Contents
Small changes, big differences
1. Faster String Concatenation: Choose “join()” or “+” Skillfully
2. Faster List Creation: Use “[]” Over “list()”
3. Faster Membership Testing: Use a Set Over a List
4. Faster Data Generation: Use Comprehensions Over For Loops
5. Faster Loops: Prioritize Local Variables
6. Faster Execution: Prioritize Built-In Modules and Libraries
7. Faster Function Calls: Leverage Cache Decorator for Easy Memoization
8. Faster Infinite Loop: Prefer “while 1” Over “while True”
9. Faster Start-Up: Import Python Modules Smartly
“Python is too slow.”
This sentiment echoes frequently in discussions about programming languages, often overshadowing Python’s numerous strengths.
The truth is, Python can be fast if you write it in a Pythonic way.
The devil is in the details. Experienced Python developers are armed with an arsenal of subtle yet powerful tricks to significantly enhance their code’s performance.
These tricks might seem minor at first glance, but they can lead to substantial improvements in efficiency. Let’s delve into 9 of these approaches, transforming the way you write and optimize Python code.
1. Faster String Concatenation: Choose “join()” or “+” Skillfully

String concatenation will become a bottleneck of your Python program if a large number of strings need to be handled.
Basically, there are two ways of string concatenation in Python:

- The `join()` function, to combine a list of strings into one
- The `+` or `+=` operator, to add every single string onto another

So which way is faster?
Talk is cheap. Let’s define 3 different functions for concatenating the same strings:
```python
mylist = ["Yang", "Zhou", "is", "writing"]

# Using '+'
def concat_plus():
    result = ""
    for word in mylist:
        result += word + " "
    return result

# Using 'join()'
def concat_join():
    return " ".join(mylist)

# Direct concatenation without the list
def concat_directly():
    return "Yang" + "Zhou" + "is" + "writing"
```
Based on your first impression, which function do you think is the fastest, and which is the slowest?
The real result may surprise you:
```python
import timeit

print(timeit.timeit(concat_plus, number=10000))
# 0.002738415962085128
print(timeit.timeit(concat_join, number=10000))
# 0.0008482920238748193
print(timeit.timeit(concat_directly, number=10000))
# 0.00021425005979835987
```
As shown above, for concatenating a list of strings, the `join()` method is faster than adding the strings one by one in a for loop.

The reason is straightforward. On one hand, strings are immutable in Python, so each `+=` operation creates a new string and copies the old one, which is computationally expensive.

On the other hand, the `.join()` method is specifically optimized for joining a sequence of strings. It precalculates the size of the resulting string and then builds it in one go, avoiding the overhead of repeated `+=` operations in a loop. Hence it’s faster.
However, the fastest function in our testing is the one that concatenates string literals directly. Its high speed is due to:

- The Python interpreter optimizes the concatenation of adjacent string literals at compile time, folding them into a single string constant.
- There is no loop or extra function call overhead, such as invoking the `.join()` method.

In a word, if you need to concatenate a list of strings, choose `join()` over `+=`. If you just want to concatenate string literals directly, use `+` to do it.
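As a side note, when the pieces are generated inside a loop rather than stored in a list upfront, the same advice applies: collect them in a list and call `join()` once at the end. A minimal sketch of this pattern:

```python
# Build the pieces in a list, then join once at the end.
parts = []
for i in range(5):
    parts.append(f"line {i}")  # cheap list append instead of string +=

text = "\n".join(parts)  # a single, optimized concatenation
print(text)
```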
2. Faster List Creation: Use “[]” Over “list()”

Creating a list is not a big deal. Two common ways are:

- Using the `list()` function
- Using `[]` directly

Let’s use a simple code snippet to test their performance:
```python
import timeit

print(timeit.timeit('[]', number=10 ** 7))
# 0.1368238340364769
print(timeit.timeit(list, number=10 ** 7))
# 0.2958830420393497
```
As the result shows, calling the `list()` function is slower than using `[]` directly.

This is because `[]` is a literal syntax, while `list()` is a constructor call. Calling a function takes extra time, without a doubt.
By the same logic, when creating a dictionary, we should also prefer `{}` over `dict()`.
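We can verify this claim with the same `timeit` approach; a minimal sketch (the exact numbers will vary by machine):

```python
import timeit

# Literal syntax vs. constructor call for dictionaries
print(timeit.timeit('{}', number=10 ** 7))  # literal: typically faster
print(timeit.timeit(dict, number=10 ** 7))  # constructor call: slower
```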
3. Faster Membership Testing: Use a Set Over a List

The performance of a membership-checking operation heavily depends on the underlying data structure:
```python
import timeit

large_dataset = range(100000)
search_element = 2077

large_list = list(large_dataset)
large_set = set(large_dataset)

def list_membership_test():
    return search_element in large_list

def set_membership_test():
    return search_element in large_set

print(timeit.timeit(list_membership_test, number=1000))
# 0.01112208398990333
print(timeit.timeit(set_membership_test, number=1000))
# 3.27499583363533e-05
```
As the above code demonstrates, membership testing in a set is much faster than in a list.

Why is it?

- Membership testing in a list (`element in list`) is done by iterating over each element until the desired element is found or the end of the list is reached. Therefore, this operation has a time complexity of O(n).
- For membership testing in a set (`element in set`), Python uses a hashing mechanism, whose time complexity is O(1) on average.

The point here is to carefully consider the underlying data structure when writing programs. Harnessing the right data structure can speed up our code significantly.
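In practice, this means that if you need to test membership against the same list many times, it is usually worth converting it to a set once first. A minimal sketch, assuming the elements are hashable:

```python
allowed_ids = [4, 8, 15, 16, 23, 42]

# One-time O(n) conversion, then every lookup is O(1) on average.
allowed_ids_set = set(allowed_ids)

for user_id in range(100):
    if user_id in allowed_ids_set:  # fast hash-based lookup
        print(f"user {user_id} is allowed")
```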
4. Faster Data Generation: Use Comprehensions Over For Loops

There are four types of comprehensions in Python: list, dictionary, set, and generator. They not only provide a more concise syntax for creating these data structures, but also have better performance than for loops, because they are optimized in CPython’s C implementation.
```python
import timeit

def generate_squares_for_loop():
    squares = []
    for i in range(1000):
        squares.append(i * i)
    return squares

def generate_squares_comprehension():
    return [i * i for i in range(1000)]

print(timeit.timeit(generate_squares_for_loop, number=10000))
# 0.2797503340989351
print(timeit.timeit(generate_squares_comprehension, number=10000))
# 0.2364629579242319
```
The above code is a simple speed comparison between a list comprehension and a for loop. As the result shows, the list comprehension is faster.
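The same syntax extends to the other three comprehension types. A quick sketch of each:

```python
nums = range(10)

squares_dict = {n: n * n for n in nums}    # dictionary comprehension
unique_remainders = {n % 3 for n in nums}  # set comprehension
lazy_squares = (n * n for n in nums)       # generator expression (lazy)

print(squares_dict[3])     # 9
print(unique_remainders)   # {0, 1, 2}
print(sum(lazy_squares))   # 285
```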
5. Faster Loops: Prioritize Local Variables

In Python, accessing a local variable is faster than accessing a global variable or an attribute of an object.

Here is an instance that proves this:
```python
import timeit

class Example:
    def __init__(self):
        self.value = 0

obj = Example()

def test_dot_notation():
    for _ in range(1000):
        obj.value += 1

def test_local_variable():
    value = obj.value
    for _ in range(1000):
        value += 1
    obj.value = value

print(timeit.timeit(test_dot_notation, number=1000))
# 0.036605041939765215
print(timeit.timeit(test_local_variable, number=1000))
# 0.024470250005833805
```
This is how Python works. When a function is compiled, the local variables inside it are known and can be accessed directly, but variables from outer scopes take extra time to look up.

This is a minor detail, but we can leverage it to optimize our code when handling large amounts of data.
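One common application of this trick is caching a frequently used global or method in a local variable before a hot loop. A minimal sketch:

```python
import math

def sum_of_square_roots(n):
    # Hoist the global/attribute lookup into a local before the hot loop.
    local_sqrt = math.sqrt
    total = 0.0
    for i in range(n):
        total += local_sqrt(i)  # fast local lookup on every iteration
    return total

print(sum_of_square_roots(1000))
```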
6. Faster Execution: Prioritize Built-In Modules and Libraries

When engineers say Python, they usually mean CPython, since CPython is the default and most widely used implementation of the language.

Given that most of its built-in modules and libraries are written in C, a faster, lower-level language, we should utilize the built-in arsenal and avoid reinventing the wheel.
```python
import timeit
import random
from collections import Counter

def count_frequency_custom(lst):
    frequency = {}
    for item in lst:
        if item in frequency:
            frequency[item] += 1
        else:
            frequency[item] = 1
    return frequency

def count_frequency_builtin(lst):
    return Counter(lst)

large_list = [random.randint(0, 100) for _ in range(1000)]

print(timeit.timeit(lambda: count_frequency_custom(large_list), number=100))
# 0.005160166998393834
print(timeit.timeit(lambda: count_frequency_builtin(large_list), number=100))
# 0.002444291952997446
```
The above program compares two approaches to counting element frequencies in a list. As we can see, leveraging the built-in `Counter` from the `collections` module is faster, neater, and better than writing a for loop ourselves.
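The same principle applies to many everyday operations. For example, the built-in `sum()` runs its loop in C and usually beats a handwritten accumulation loop; a minimal sketch:

```python
import timeit

numbers = list(range(1000))

def total_custom():
    total = 0
    for n in numbers:
        total += n
    return total

def total_builtin():
    return sum(numbers)  # the loop runs in C inside the interpreter

print(timeit.timeit(total_custom, number=10000))
print(timeit.timeit(total_builtin, number=10000))
```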
7. Faster Function Calls: Leverage Cache Decorator for Easy Memoization

Caching is a commonly used technique to avoid repeated computations and speed up programs.
Fortunately, in most cases we don’t need to write our own caching code, since Python provides an out-of-the-box decorator for this purpose: `@functools.cache` (available since Python 3.9; on older versions, `@functools.lru_cache(maxsize=None)` is equivalent).

For instance, the following code executes two Fibonacci number generation functions, one with the caching decorator and one without:
```python
import timeit
import functools

def fibonacci(n):
    if n in (0, 1):
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

@functools.cache
def fibonacci_cached(n):
    if n in (0, 1):
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

# Test the execution time of each function
print(timeit.timeit(lambda: fibonacci(30), number=1))
# 0.09499712497927248
print(timeit.timeit(lambda: fibonacci_cached(30), number=1))
# 6.458023563027382e-06
```
The result proves how much the `functools.cache` decorator speeds up our code.

The basic `fibonacci` function is inefficient because it recomputes the same Fibonacci numbers many times while evaluating `fibonacci(30)`.

The cached version is significantly faster because it stores the results of previous calls. Each Fibonacci number is computed only once, and subsequent calls with the same argument are retrieved from the cache.
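As a usage note, the decorator also exposes helpers for inspecting and clearing the cache, which is handy when verifying that memoization is actually happening:

```python
fibonacci_cached.cache_clear()  # start from an empty cache
fibonacci_cached(30)
print(fibonacci_cached.cache_info())
# CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)
```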
Merely adding a built-in decorator can make such a big improvement. This is what Pythonic means. 😎
8. Faster Infinite Loop: Prefer “while 1” Over “while True”

To make an infinite while loop, we can use either `while True` or `while 1`.

The difference in their performance is usually negligible, but it’s fun to know that `while 1` can be slightly faster.
Historically, this stems from the fact that `1` is a literal, while in Python 2 `True` was a global name that had to be looked up at runtime, adding a minuscule overhead. In Python 3, `True` became a keyword that the compiler treats as a constant, so both forms compile to essentially the same bytecode.
Still, let’s compare these two ways in a real code snippet:
```python
import timeit

def loop_with_true():
    i = 0
    while True:
        if i >= 1000:
            break
        i += 1

def loop_with_one():
    i = 0
    while 1:
        if i >= 1000:
            break
        i += 1

print(timeit.timeit(loop_with_true, number=10000))
# 0.1733035419601947
print(timeit.timeit(loop_with_one, number=10000))
# 0.16412191605195403
```
As we can see, `while 1` was indeed slightly faster in this run.

However, modern Python interpreters (like CPython) are highly optimized, and such differences are typically insignificant, so we don’t need to worry about them. Not to mention that `while True` is more readable than `while 1`.
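If you want to verify this yourself, the standard `dis` module shows how CPython compiles the two loop headers:

```python
import dis

dis.dis(loop_with_true)
dis.dis(loop_with_one)
# On recent CPython versions, the two disassemblies are identical:
# the always-true loop condition is optimized away in both cases.
```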
9. Faster Start-Up: Import Python Modules Smartly

It seems natural to import all modules at the top of a Python script.

Actually, we don’t have to do that.

Furthermore, if a module is heavy, importing it only where it’s needed is a better idea.
```python
def my_function():
    import heavy_module
    # rest of the function
```
As the code above shows, `heavy_module` is imported inside a function. This is the idea of “lazy loading”: the import is deferred until `my_function` is called.

The benefit of this approach is that if `my_function` is never called during the execution of our script, `heavy_module` is never loaded, saving resources and reducing the start-up time of our script.
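For a concrete, small-scale illustration with a real standard-library module:

```python
def pretty_print(data):
    # 'json' is only imported on the first call to pretty_print,
    # not when the script starts up.
    import json
    print(json.dumps(data, indent=2))

# If pretty_print is never called, the import never happens.
pretty_print({"name": "Yang", "status": "writing"})
```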