Why Is Python’s set So Fast? Understanding the Difference from list and the Power of Hash Tables

Introduction

When working with Python, checking whether a certain element exists in a collection is a common operation. For example:

if value in my_list:
    # do something

At a glance, this code seems simple and efficient. However, when working with large datasets, it can become very slow.

Surprisingly, just by switching from a list to a set, the execution time can drop from hundreds of seconds to just a few milliseconds.

In this article, we’ll explore why set can be so much faster, digging into the underlying data structures and illustrating the difference with a fun example from Dragon Ball.


List vs Set: The Basics

Featurelistset
Data structureArrayHash table
Order maintainedYesNo
Duplicates allowedYesNo
Lookup speed (avg)O(n)O(1)

Why is list slow?

Python’s list is essentially an array. When you write something like value in my_list, Python has to scan each element one by one until it finds a match — or reaches the end.

For example:

a = [1, 3, 5, 7, 9]
print(5 in a)  # → Python checks 1 → 3 → 5 (found!)

This linear search is fine for small lists. But as the number of elements increases (e.g. 100,000+), performance drops significantly.


Why is set fast?

Python’s set, on the other hand, is implemented using a hash table. This means each element is converted internally into a hash value, which acts like a “map” to its location in memory.

Here’s a simplified view:

  • "apple" → hash function → 42
  • Store "apple" in bucket #42
  • On lookup: "apple" → 42 → direct access!

Because it skips the full scan and jumps straight to the relevant location (on average), value in my_set is blazingly fast.


Experiment: Goku vs Bulma – A Lookup Showdown

To demonstrate the difference, let’s imagine a world where Goku searches for Dragon Balls manually (like a list), while Bulma uses the high-tech Dragon Radar (like a set).

import time
from random import randint

print("Starting Dragon Ball search experiment!\n")

# --- World Dragon Balls (Data Setup) ---
world_dragonballs = [randint(1, 10_000_000) for _ in range(100_000)]
radar_signals = [randint(1, 10_000_000) for _ in range(100_000)]

# --- Goku's Manual Search (list) ---
print('Goku: "Alright! I’ll search every single one!" (list)')

start = time.time()
found_list = sum(1 for signal in radar_signals if signal in world_dragonballs)
elapsed_list = time.time() - start

print(f"[List Search] Found: {found_list}, Time: {elapsed_list:.3f}s")

# --- Bulma's Radar (set) ---
print('\nBulma: "Goku, just use the Dragon Radar!" (set)')

dragon_radar = set(world_dragonballs)

start = time.time()
found_set = sum(1 for signal in radar_signals if signal in dragon_radar)
elapsed_set = time.time() - start

print(f"[Set Search] Found: {found_set}, Time: {elapsed_set:.3f}s")

# --- Result ---
print("\nResult:")
if elapsed_list > elapsed_set:
    print(f"Bulma’s Dragon Radar was {round(elapsed_list / elapsed_set)}x faster than Goku’s manual search!")
else:
    print("Unbelievable... Goku’s raw effort actually won this time?")

Sample Output:

Starting Dragon Ball search experiment!

Goku: "Alright! I’ll search every single one!" (list)
[List Search] Found: 987, Time: 83.714s

Bulma: "Goku, just use the Dragon Radar!" (set)
[Set Search] Found: 987, Time: 0.009s

Result:
Bulma’s Dragon Radar was 9155x faster than Goku’s manual search!

Summary

In Python, set is a powerful choice when:

  • You need to check existence quickly
  • You don’t care about the order of elements
  • You want to remove duplicates

Use list when:

  • You need to maintain order
  • You want to store duplicates
  • You’re doing operations that depend on element positions

Conclusion

Python’s set is more than just a container that rejects duplicates — it’s a high-performance tool thanks to its underlying hash table.

If you ever find yourself writing slow in checks over large lists, give set a try. You might just save thousands of milliseconds — or in Goku’s case, thousands of searches.


References

コメントを送信

You May Have Missed