When using a custom PlaywrightBrowserPlugin with Camoufox's AsyncNewBrowser and passing persistent_context=True along with user_data_dir, the crawler crashes with an AttributeError during browser initialization.
Steps to Reproduce
- Create a custom CamoufoxPlugin extending PlaywrightBrowserPlugin
- Override new_browser() to launch Camoufox with a persistent context
- Pass persistent_context=True and user_data_dir to AsyncNewBrowser
- Run the crawler
from camoufox import AsyncNewBrowser
from typing_extensions import override

from crawlee._utils.context import ensure_context
from crawlee.browsers import (
    PlaywrightBrowserPlugin,
    PlaywrightBrowserController,
    BrowserPool,
)
from crawlee.crawlers import PlaywrightCrawler

from .routes import router


class CamoufoxPlugin(PlaywrightBrowserPlugin):
    @ensure_context
    @override
    async def new_browser(self) -> PlaywrightBrowserController:
        if not self._playwright:
            raise RuntimeError("Playwright browser plugin is not initialized.")

        return PlaywrightBrowserController(
            browser=await AsyncNewBrowser(
                self._playwright,
                headless=False,
                persistent_context=self._user_data_dir is not None,
                user_data_dir=self._user_data_dir,
            ),
            max_open_pages_per_browser=1, # Increase, if camoufox can handle it in your usecase.
            header_generator=None, # This turns off the crawlee header_generation. Camoufox has its own.
        )


async def main() -> None:
    """The crawler entry point."""
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        request_handler=router,
        browser_pool=BrowserPool(
            plugins=[CamoufoxPlugin(user_data_dir="./user_data_dir")]
        ),
        ignore_http_error_status_codes=[401],
        max_request_retries=0,
    )

    await crawler.run(
        [
            "https://crawlee.dev/",
        ]
    )
The crawler should launch Camoufox with a persistent context and run successfully, reusing the specified user_data_dir.
The crawler crashes with the following traceback:
Traceback (most recent call last):
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/crawlers/_basic/_context_pipeline.py", line 100, in __call__
    result = await middleware_instance.action()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/crawlers/_basic/_context_pipeline.py", line 40, in action
    self.output_context = await self.generator.__anext__()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/crawlers/_playwright/_playwright_crawler.py", line 334, in _open_page
    crawlee_page = await self._browser_pool.new_page(proxy_info=context.proxy_info)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/_utils/context.py", line 45, in async_wrapper
    return await method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/browsers/_browser_pool.py", line 281, in new_page
    return await self._get_new_page(page_id, plugin, proxy_info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/browsers/_browser_pool.py", line 313, in _get_new_page
    browser_controller = await asyncio.wait_for(self._launch_new_browser(page_id, plugin), timeout)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/asyncio/tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/browsers/_browser_pool.py", line 365, in _launch_new_browser
    browser = await plugin.new_browser()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/_utils/context.py", line 45, in async_wrapper
    return await method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/crawler/example/main.py", line 21, in new_browser
    return PlaywrightBrowserController(
        browser=await AsyncNewBrowser(
    ...<6 lines>...
        header_generator=None, # This turns off the crawlee header_generation. Camoufox has its own.
    )
  File "/home/user/crawler/.venv/lib64/python3.13/site-packages/crawlee/browsers/_playwright_browser_controller.py", line 98, in __init__
    self._browser.contexts[0] if len(self._browser.contexts) > 0 else None
                                     ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'BrowserContext' object has no attribute 'contexts'
The issue appears to be in PlaywrightBrowserController.__init__. When Camoufox is launched with persistent_context=True, AsyncNewBrowser returns a BrowserContext object rather than a Browser object. The controller then accesses .contexts on what it assumes is a Browser instance, and because the object is actually a BrowserContext, the attribute does not exist.
This behavior differs from launching without a persistent context, where AsyncNewBrowser returns a standard Playwright Browser object that does have a .contexts attribute.
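To illustrate the mismatch without needing Playwright or Camoufox installed, here is a minimal sketch using stand-in classes. StubBrowser, StubBrowserContext, first_context_unchecked and first_context_checked are hypothetical names invented for this example; they are not part of crawlee, Playwright, or Camoufox. The unchecked helper mirrors the expression shown in the traceback, and the checked helper shows one possible defensive approach, not necessarily how the library should fix it.

```python
from dataclasses import dataclass, field


@dataclass
class StubBrowserContext:
    """Stand-in for a Playwright BrowserContext: it has pages but no `.contexts`."""

    pages: list = field(default_factory=list)


@dataclass
class StubBrowser:
    """Stand-in for a Playwright Browser: it exposes a `.contexts` list."""

    contexts: list = field(default_factory=list)


def first_context_unchecked(browser):
    # Same shape as the expression in the traceback: it assumes `.contexts` exists.
    return browser.contexts[0] if len(browser.contexts) > 0 else None


def first_context_checked(browser_or_context):
    # Hypothetical defensive variant: a persistent launch hands back the context
    # itself, so use it directly instead of looking it up on a browser.
    if isinstance(browser_or_context, StubBrowserContext):
        return browser_or_context
    contexts = browser_or_context.contexts
    return contexts[0] if contexts else None


if __name__ == "__main__":
    persistent = StubBrowserContext()
    try:
        first_context_unchecked(persistent)
    except AttributeError as exc:
        print(f"unchecked access fails: {exc}")
    print(first_context_checked(persistent) is persistent)
```

With a real Playwright object the same idea would need a check against the actual BrowserContext type, which is an assumption this sketch does not verify.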