perf: Defer heavy imports in ServiceLocator to reduce import time #1836

Open
Lidang-Jiang wants to merge 1 commit into apify:master from Lidang-Jiang:perf/reduce-import-time
@Lidang-Jiang
Summary

Ref #1253 — Importing the crawler class takes over 2 seconds.

This is a partial optimization that addresses the ServiceLocator module's contribution to slow import times. The _service_locator.py module was eagerly importing heavy dependencies (Configuration, EventManager, LocalEventManager, FileSystemStorageClient, StorageClient, ServiceConflictError) at module level, even though these are only needed when services are first accessed.

Changes:

  • Move Configuration, EventManager, LocalEventManager, FileSystemStorageClient, StorageClient, and ServiceConflictError imports from module level into the methods where they are used
  • Add TYPE_CHECKING guards for type annotations that reference these classes
  • No behavioral changes — all services are still lazily instantiated as before
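The pattern described in the changes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual _service_locator.py: LazyLocator is a hypothetical class, and the standard library's configparser.ConfigParser stands in for a heavy dependency such as Configuration.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; never executed at runtime, so importing
    # this module does not pay the dependency's import cost.
    from configparser import ConfigParser


class LazyLocator:
    """Hypothetical stand-in for a service locator with deferred imports."""

    def __init__(self) -> None:
        self._configuration: ConfigParser | None = None

    def get_configuration(self) -> ConfigParser:
        if self._configuration is None:
            # Deferred import: runs on first access. Later calls hit the
            # sys.modules cache, so repeating the statement is cheap.
            from configparser import ConfigParser

            self._configuration = ConfigParser()
        return self._configuration


locator = LazyLocator()
config = locator.get_configuration()
print(type(config).__name__)  # prints "ConfigParser"
```

The `from __future__ import annotations` line keeps the annotations as unevaluated strings, which is what allows the names to exist only under the TYPE_CHECKING guard.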

Impact:

  • import crawlee is ~73% faster (0.40s → 0.11s)
  • The full from crawlee.crawlers import ParselCrawler import is not significantly affected, because _basic_crawler.py eagerly imports many of the same modules; optimizing that module further could be a follow-up
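One way to check which modules a given import pulls in, for example to see what the crawler import still loads eagerly, is to diff sys.modules around the import in a fresh interpreter. The sketch below is an assumption about how one might do this, not part of the PR; modules_loaded_by is a hypothetical helper, demonstrated on the standard library's colorsys rather than on crawlee.

```python
from __future__ import annotations

import subprocess
import sys

# Child script: snapshot sys.modules, import the target, print what was added.
_CHILD = (
    "import sys; before = set(sys.modules); import {module}; "
    "print('\\n'.join(sorted(set(sys.modules) - before)))"
)


def modules_loaded_by(module: str) -> list[str]:
    """Return the modules newly loaded when `module` is imported.

    Runs in a fresh interpreter so that modules already imported by the
    current process do not hide anything.
    """
    result = subprocess.run(
        [sys.executable, "-c", _CHILD.format(module=module)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


loaded = modules_loaded_by("colorsys")
print(loaded)
```

For per-module timings rather than just names, CPython's `-X importtime` option prints an import-time breakdown to stderr.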

The ServiceLocator docstring already states "All services are initialized to its default value lazily" — this change makes the imports consistent with that design intent.

Before
$ python -c "import time; start = time.perf_counter(); import crawlee; print(f'{time.perf_counter() - start:.4f}s')"
0.4220s
0.3898s
0.3837s
After
$ python -c "import time; start = time.perf_counter(); import crawlee; print(f'{time.perf_counter() - start:.4f}s')"
0.1188s
0.1069s
0.1123s
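The one-liner above can be wrapped so that each run starts a fresh interpreter and the runs are averaged. This is a sketch only; time_import and reduction are hypothetical helper names, and the example call uses the standard library's json module so it runs without crawlee installed.

```python
from __future__ import annotations

import statistics
import subprocess
import sys

# Same measurement as the one-liner above, parameterized by module name.
_TIMER = (
    "import time; start = time.perf_counter(); import {module}; "
    "print(time.perf_counter() - start)"
)


def time_import(module: str, runs: int = 3) -> list[float]:
    """Time `import module` in `runs` fresh interpreters, in seconds."""
    timings = []
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", _TIMER.format(module=module)],
            capture_output=True,
            text=True,
            check=True,
        )
        timings.append(float(result.stdout))
    return timings


def reduction(before: list[float], after: list[float]) -> float:
    """Fractional reduction in mean import time."""
    return 1 - statistics.mean(after) / statistics.mean(before)


print(time_import("json", runs=2))
```

Applied to the figures listed above, `reduction([0.4220, 0.3898, 0.3837], [0.1188, 0.1069, 0.1123])` comes out at roughly 0.72, in line with the ~73% quoted from the rounded 0.40s and 0.11s values.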
Test results (113 tests pass)
$ python -m pytest tests/unit/ -v --timeout=120 -x -k "service_locator or storage_client or configuration"

tests/unit/test_service_locator.py::test_default_configuration PASSED
tests/unit/test_service_locator.py::test_custom_configuration PASSED
tests/unit/test_service_locator.py::test_configuration_overwrite_not_possible PASSED
tests/unit/test_service_locator.py::test_configuration_conflict PASSED
tests/unit/test_service_locator.py::test_default_event_manager PASSED
tests/unit/test_service_locator.py::test_custom_event_manager PASSED
tests/unit/test_service_locator.py::test_event_manager_overwrite_not_possible PASSED
tests/unit/test_service_locator.py::test_event_manager_conflict PASSED
tests/unit/test_service_locator.py::test_default_storage_client PASSED
tests/unit/test_service_locator.py::test_custom_storage_client PASSED
tests/unit/test_service_locator.py::test_storage_client_overwrite_not_possible PASSED
tests/unit/test_service_locator.py::test_storage_client_conflict PASSED
... (+ 101 storage_client and configuration tests)

==================== 113 passed, 1574 deselected in 14.90s =====================

Test plan

  • All 12 test_service_locator.py tests pass
  • All 113 service_locator/storage_client/configuration tests pass
  • ruff check passes
  • import crawlee benchmark shows ~73% improvement
  • CI passes

Move top-level imports of Configuration, EventManager, LocalEventManager,
FileSystemStorageClient, StorageClient, and ServiceConflictError into the
methods where they are actually used. These modules are now only imported
on first access rather than at module load time.

This reduces `import crawlee` time by ~73% (0.40s -> 0.11s) by deferring
the import of pyee, pydantic-based Configuration, and storage client
initialization until they are actually needed.

The ServiceLocator docstring already states "All services are initialized
to its default value lazily" - this change makes the imports match that
design intent.

Ref apify#1253