Why Is Python’s set So Fast? Understanding the Difference from list and the Power of Hash Tables
Contents
Introduction
When working with Python, checking whether a certain element exists in a collection is a common operation. For example:
if value in my_list:
# do something
At a glance, this code seems simple and efficient. However, when working with large datasets, it can become very slow.
Surprisingly, just by switching from a list
to a set
, the execution time can drop from hundreds of seconds to just a few milliseconds.
In this article, we’ll explore why set
can be so much faster, digging into the underlying data structures and illustrating the difference with a fun example from Dragon Ball.
List vs Set: The Basics
Feature | list | set |
---|---|---|
Data structure | Array | Hash table |
Order maintained | Yes | No |
Duplicates allowed | Yes | No |
Lookup speed (avg) | O(n) | O(1) |
Why is list
slow?
Python’s list
is essentially an array. When you write something like value in my_list
, Python has to scan each element one by one until it finds a match — or reaches the end.
For example:
a = [1, 3, 5, 7, 9]
print(5 in a) # → Python checks 1 → 3 → 5 (found!)
This linear search is fine for small lists. But as the number of elements increases (e.g. 100,000+), performance drops significantly.
Why is set
fast?
Python’s set
, on the other hand, is implemented using a hash table. This means each element is converted internally into a hash value, which acts like a “map” to its location in memory.
Here’s a simplified view:
"apple"
→ hash function → 42- Store
"apple"
in bucket #42 - On lookup:
"apple"
→ 42 → direct access!
Because it skips the full scan and jumps straight to the relevant location (on average), value in my_set
is blazingly fast.
Experiment: Goku vs Bulma – A Lookup Showdown
To demonstrate the difference, let’s imagine a world where Goku searches for Dragon Balls manually (like a list
), while Bulma uses the high-tech Dragon Radar (like a set
).
import time
from random import randint
print("Starting Dragon Ball search experiment!\n")
# --- World Dragon Balls (Data Setup) ---
world_dragonballs = [randint(1, 10_000_000) for _ in range(100_000)]
radar_signals = [randint(1, 10_000_000) for _ in range(100_000)]
# --- Goku's Manual Search (list) ---
print('Goku: "Alright! I’ll search every single one!" (list)')
start = time.time()
found_list = sum(1 for signal in radar_signals if signal in world_dragonballs)
elapsed_list = time.time() - start
print(f"[List Search] Found: {found_list}, Time: {elapsed_list:.3f}s")
# --- Bulma's Radar (set) ---
print('\nBulma: "Goku, just use the Dragon Radar!" (set)')
dragon_radar = set(world_dragonballs)
start = time.time()
found_set = sum(1 for signal in radar_signals if signal in dragon_radar)
elapsed_set = time.time() - start
print(f"[Set Search] Found: {found_set}, Time: {elapsed_set:.3f}s")
# --- Result ---
print("\nResult:")
if elapsed_list > elapsed_set:
print(f"Bulma’s Dragon Radar was {round(elapsed_list / elapsed_set)}x faster than Goku’s manual search!")
else:
print("Unbelievable... Goku’s raw effort actually won this time?")
Sample Output:
Starting Dragon Ball search experiment!
Goku: "Alright! I’ll search every single one!" (list)
[List Search] Found: 987, Time: 83.714s
Bulma: "Goku, just use the Dragon Radar!" (set)
[Set Search] Found: 987, Time: 0.009s
Result:
Bulma’s Dragon Radar was 9155x faster than Goku’s manual search!
Summary
In Python, set
is a powerful choice when:
- You need to check existence quickly
- You don’t care about the order of elements
- You want to remove duplicates
Use list
when:
- You need to maintain order
- You want to store duplicates
- You’re doing operations that depend on element positions
Conclusion
Python’s set
is more than just a container that rejects duplicates — it’s a high-performance tool thanks to its underlying hash table.
If you ever find yourself writing slow in
checks over large lists, give set
a try. You might just save thousands of milliseconds — or in Goku’s case, thousands of searches.
コメントを送信