Zotero Paper Fetcher (Automator)

This script automates saving an academic paper (or any URL) to Zotero using the official Zotero Connector extension for Google Chrome, driven by Playwright for Python.

Why this approach?

The Zotero Connector is the official Chrome extension for saving items to Zotero, and it is the most robust way to capture high-quality metadata and, where proxy or site access allows, full-text PDFs. However, Chrome's old headless mode, which standard browser automation typically uses, does not run extensions.

This script works around the problem by:

  1. Automatically downloading the latest Zotero Connector extension.
  2. Unpacking it from its .crx format.
  3. Launching Chromium using Playwright in --headless=new mode (which DOES allow extensions, unlike the old headless mode) with a persistent user data directory.
  4. Auto-closing setup tabs.
  5. Invoking the Zotero Connector programmatically by accessing its background service worker (Zotero.Connector_Browser.saveWithTranslator(...)).
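Step 2 can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the standard CRX2/CRX3 layouts (a "Cr24" magic, a little-endian version number, a signature/header section, then an ordinary ZIP archive), and the names crx_to_zip_bytes and unpack_crx are hypothetical.

```python
import io
import struct
import zipfile
from pathlib import Path

CRX_MAGIC = b"Cr24"


def crx_to_zip_bytes(data: bytes) -> bytes:
    """Strip the CRX header and return the embedded ZIP archive (sketch)."""
    if data[:4] != CRX_MAGIC:
        raise ValueError("not a CRX file (missing 'Cr24' magic)")
    version = struct.unpack("<I", data[4:8])[0]
    if version == 3:
        # CRX3: magic, version, header length, protobuf header, then the ZIP.
        header_len = struct.unpack("<I", data[8:12])[0]
        offset = 12 + header_len
    elif version == 2:
        # CRX2: magic, version, key length, signature length, key, signature, ZIP.
        key_len, sig_len = struct.unpack("<II", data[8:16])
        offset = 16 + key_len + sig_len
    else:
        raise ValueError(f"unsupported CRX version {version}")
    return data[offset:]


def unpack_crx(crx_path, dest_dir) -> Path:
    """Extract a .crx file into dest_dir (e.g. ./zotero_extension/)."""
    data = Path(crx_path).read_bytes()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(crx_to_zip_bytes(data))) as zf:
        zf.extractall(dest)
    return dest
```

Parsing the header explicitly, rather than relying on a ZIP reader to skip unknown leading bytes, makes a malformed download fail loudly with a clear error.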

Prerequisites

  1. uv must be installed.
  2. Zotero Desktop must be running on your machine (the extension talks to the Zotero desktop app through its local connector server, which listens on http://127.0.0.1:23119 by default).
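Before launching the browser, it can help to confirm that Zotero is reachable. The sketch below assumes the connector server's default address (127.0.0.1:23119) and its /connector/ping endpoint; zotero_is_running is a hypothetical helper, not part of the script.

```python
import http.client
import urllib.request


def zotero_is_running(host: str = "127.0.0.1", port: int = 23119, timeout: float = 2.0) -> bool:
    """Return True if the Zotero connector server answers the ping endpoint.

    Assumption: Zotero Desktop serves GET /connector/ping on this port while running.
    """
    url = f"http://{host}:{port}/connector/ping"
    # Bypass any configured HTTP proxy: the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or HTTP error: treat as "not running".
        return False
```

A script could call this first and exit with a clear message instead of waiting on a save that can never complete.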

Setup

First, install the dependencies and the Playwright-managed Chromium browser using uv:

uv sync
uv run playwright install chromium

Usage

Simply pass the URL of the paper you want to add to Zotero:

uv run zotero_automator.py "https://arxiv.org/abs/1706.03762"

To watch the browser while it runs (useful for debugging when a site requires a captcha or login, or just to confirm the extension is working), pass the --headed flag:

uv run zotero_automator.py "https://arxiv.org/abs/1706.03762" --headed
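The command-line interface above could be expressed with argparse roughly as follows. This is a sketch: only the positional URL and the --headed flag come from this README; the parse_args name and the help strings are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the CLI: a required URL and an optional --headed switch (sketch)."""
    parser = argparse.ArgumentParser(
        description="Save a paper (or any URL) to Zotero via the Zotero Connector extension."
    )
    parser.add_argument("url", help="URL of the page to save to Zotero")
    parser.add_argument(
        "--headed",
        action="store_true",
        help="show the browser window instead of running in --headless=new mode",
    )
    return parser.parse_args(argv)
```

Passing argv explicitly keeps the function testable; with no argument, argparse reads sys.argv as usual.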

How It Works

  • setup_extension(): Locates the Zotero extension on the Chrome Web Store by its EKHAGK... identifier, downloads the raw .crx payload, and unpacks its contents into ./zotero_extension/.
  • save_to_zotero(): Starts an async_playwright session pointing to a local profile folder (./chrome_profile/). The extension injects its translator scripts on network idle. We find the extension's background service worker, trigger the programmatic save, and then poll sessionProgress until Zotero finishes downloading the PDFs and metadata.
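The launch-and-poll flow of save_to_zotero() might look roughly like the sketch below. It assumes the extension-loading Chromium flags from Playwright's Chrome-extension guide (--disable-extensions-except, --load-extension) plus --headless=new; build_launch_args, open_extension_worker and wait_until are hypothetical names. The JavaScript that actually triggers the save and reads sessionProgress is left out, because its exact form is internal to the Zotero Connector.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable


def build_launch_args(extension_dir: str, headed: bool) -> list[str]:
    """Chromium flags that load only the unpacked Zotero extension (sketch)."""
    ext = str(Path(extension_dir).resolve())
    args = [f"--disable-extensions-except={ext}", f"--load-extension={ext}"]
    if not headed:
        # New headless mode: unlike the old one, it can run extensions.
        args.append("--headless=new")
    return args


async def open_extension_worker(pw, extension_dir: str, profile_dir: str = "./chrome_profile", headed: bool = False):
    """Launch a persistent context and return it with the extension's service worker.

    `pw` is the object yielded by `async with async_playwright() as pw:`.
    """
    context = await pw.chromium.launch_persistent_context(
        profile_dir,
        headless=False,  # headless behaviour is controlled via --headless=new above
        args=build_launch_args(extension_dir, headed),
    )
    workers = context.service_workers
    worker = workers[0] if workers else await context.wait_for_event("serviceworker")
    return context, worker


async def wait_until(check: Callable[[], Awaitable[bool]], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll an async predicate until it returns True or the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await check():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

For example, after context, worker = await open_extension_worker(pw, "./zotero_extension"), one could call await wait_until(lambda: worker.evaluate(check_js)), where check_js is a hypothetical expression that evaluates to true once the connector reports the save as finished.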