# Zotero Paper Fetcher (Automator)
This script automates saving an academic paper (or any URL) to Zotero using the official **Zotero Connector** Google Chrome extension, driven by Playwright in Python.
## Why this approach?
Zotero Connector is the official Chrome extension for saving items to Zotero. It is the most reliable way to capture high-quality metadata and full-text PDFs (when proxy or site access allows). However, Chrome's legacy headless mode, which browser automation tools have traditionally relied on, does not load extensions.
This script works around that limitation by:
1. Automatically downloading the latest Zotero Connector extension.
2. Unpacking it from its `.crx` format.
3. Launching Chromium using Playwright in `--headless=new` mode (which DOES allow extensions, unlike the old headless mode) with a persistent user data directory.
4. Auto-closing setup tabs.
5. Invoking the Zotero Connector programmatically by accessing its background service worker (`Zotero.Connector_Browser.saveWithTranslator(...)`).
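
Step 2 can be sketched as follows. A `.crx` file is a ZIP archive with a small binary header in front, so unpacking amounts to skipping that header. This is a minimal sketch rather than the script's actual code: the function names `crx_zip_offset` and `unpack_crx` are hypothetical, and the header layout assumed here is the commonly described CRX2/CRX3 one.

```python
import io
import struct
import zipfile
from pathlib import Path


def crx_zip_offset(data: bytes) -> int:
    """Return the byte offset where the ZIP archive starts inside CRX bytes."""
    if data[:4] != b"Cr24":
        raise ValueError("not a CRX file (missing 'Cr24' magic)")
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 3:
        # CRX3: magic, version, header length, then a header of that length.
        (header_len,) = struct.unpack_from("<I", data, 8)
        return 12 + header_len
    if version == 2:
        # CRX2: magic, version, public-key length, signature length.
        key_len, sig_len = struct.unpack_from("<II", data, 8)
        return 16 + key_len + sig_len
    raise ValueError(f"unsupported CRX version: {version}")


def unpack_crx(data: bytes, dest: Path) -> list[str]:
    """Extract the extension files into dest and return the archive member names."""
    payload = data[crx_zip_offset(data):]
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Something like `unpack_crx(Path("zotero.crx").read_bytes(), Path("zotero_extension"))` would then produce a folder that Chromium can load as an unpacked extension; the `zotero.crx` filename is only an example.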
## Prerequisites
1. [**uv**](https://github.com/astral-sh/uv) installed.
2. **Zotero Desktop** must be running on your machine: the extension talks to the desktop app over its local HTTP connector server on `127.0.0.1`, port `23119` by default.
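
A quick pre-flight check can confirm that Zotero is reachable before the browser is launched. The sketch below assumes Zotero's local connector server answers `GET /connector/ping` on `127.0.0.1:23119` (adjust the port if your installation differs). The function name and defaults are illustrative and are not taken from `zotero_automator.py`.

```python
import urllib.error
import urllib.request


def zotero_is_running(host: str = "127.0.0.1", port: int = 23119, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP 200 on the assumed Zotero ping endpoint."""
    url = f"http://{host}:{port}/connector/ping"
    # Bypass any system proxy: this request should never leave the machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `zotero_is_running()` at the start of a run makes it possible to fail early with a clear message instead of waiting for the save to time out.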
## Setup
First, install the dependencies and the Playwright Chromium build using `uv`:
```bash
uv sync
uv run playwright install chromium
```
## Usage
Simply pass the URL of the paper you want to add to Zotero:
```bash
uv run zotero_automator.py "https://arxiv.org/abs/1706.03762"
```
If you want to watch the browser process visually (helpful for debugging if a site requires a captcha or login, or just to verify the extension is working), pass the `--headed` flag:
```bash
uv run zotero_automator.py "https://arxiv.org/abs/1706.03762" --headed
```
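As a rough illustration of how a `--headed` flag could map onto Chromium launch options, the sketch below builds the extension-related arguments and opens a persistent context. It is an assumption-laden sketch, not the script's actual code: `chromium_args` and `launch_context` are hypothetical names, and the pattern of passing `--headless=new` as an argument while leaving Playwright's own `headless` option off may need adjusting for your Playwright version.

```python
from pathlib import Path


def chromium_args(extension_dir: Path, headed: bool = False) -> list[str]:
    """Build Chromium flags that load one unpacked extension (hypothetical helper)."""
    ext = str(extension_dir.resolve())
    args = [
        f"--disable-extensions-except={ext}",  # allow only this extension
        f"--load-extension={ext}",             # load it from the unpacked folder
    ]
    if not headed:
        # Request the new headless mode, which can run extensions.
        args.append("--headless=new")
    return args


async def launch_context(playwright, extension_dir: Path, profile_dir: Path, headed: bool = False):
    """Open a persistent Chromium context; `playwright` comes from async_playwright()."""
    return await playwright.chromium.launch_persistent_context(
        str(profile_dir),
        headless=False,  # Playwright's own switch stays off; headless is requested via args
        args=chromium_args(extension_dir, headed),
    )
```

Keeping the argument list in a plain function makes the headed and headless variants easy to compare without starting a browser.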
## How It Works
- **`setup_extension()`**: Uses the Zotero Connector's Chrome Web Store identifier (`EKHAGK...`) to download the raw `.crx` payload, then unpacks the contents into `./zotero_extension/`.
- **`save_to_zotero()`**: Starts an `async_playwright` session pointing to a local profile folder (`./chrome_profile/`). The extension injects its translator scripts on network idle. We find the extension's background service worker, trigger the programmatic save, and then poll `sessionProgress` until Zotero finishes downloading the PDFs and metadata.
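
The polling step at the end of `save_to_zotero()` can be sketched as a small async loop with a deadline. This is a sketch under stated assumptions: `poll_until_done` and the `fetch_progress` callable are hypothetical, and the `done` key in the progress payload is assumed rather than confirmed by this README.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_done(
    fetch_progress: Callable[[], Awaitable[dict]],
    interval: float = 0.5,
    timeout: float = 60.0,
) -> dict:
    """Call fetch_progress repeatedly until it reports done, or raise TimeoutError."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        progress = await fetch_progress()
        if progress.get("done"):  # assumed completion flag
            return progress
        if loop.time() >= deadline:
            raise TimeoutError("Zotero did not report completion in time")
        await asyncio.sleep(interval)
```

In the real script, `fetch_progress` would wrap whatever call reads `sessionProgress`; passing it in as a callable keeps the loop testable without a browser or a running Zotero instance.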