# Zotero Paper Fetcher (Automator)

This script saves an academic paper (or any URL) to Zotero by driving the official **Zotero Connector** Chrome extension with Playwright in Python.

## Why this approach?

Zotero Connector is the official Chrome extension for saving items to Zotero, and it is generally the most reliable way to get high-quality metadata and full-text PDFs (when a proxy or site access allows). However, Chrome's legacy headless mode, the usual choice for browser automation, does not load extensions.

This script works around that limitation by:

1. Automatically downloading the latest Zotero Connector extension.
2. Unpacking it from its `.crx` format.
3. Launching Chromium through Playwright in `--headless=new` mode (which, unlike the old headless mode, supports extensions) with a persistent user data directory.
4. Automatically closing any setup tabs that open.
5. Invoking the Zotero Connector programmatically through its background service worker (`Zotero.Connector_Browser.saveWithTranslator(...)`).
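
The unpacking step can be sketched as follows. This is a minimal illustration that assumes the download is in CRX3 format (a `Cr24` magic, a version field, a header length, the header, and then an ordinary ZIP archive); the function names `crx_to_zip_bytes` and `unpack_crx` are hypothetical and are not taken from the script itself.

```python
import io
import struct
import zipfile
from pathlib import Path


def crx_to_zip_bytes(crx: bytes) -> bytes:
    """Strip the CRX3 header and return the embedded ZIP archive bytes."""
    if crx[:4] != b"Cr24":
        raise ValueError("not a CRX file (missing 'Cr24' magic)")
    # Two little-endian uint32 values follow the magic: version and header length.
    version, header_len = struct.unpack("<II", crx[4:12])
    if version != 3:
        raise ValueError(f"unsupported CRX version: {version}")
    # Everything after the 12-byte preamble and the header is the ZIP payload.
    return crx[12 + header_len:]


def unpack_crx(crx: bytes, dest: Path) -> None:
    """Extract the extension files contained in CRX bytes into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(crx_to_zip_bytes(crx))) as archive:
        archive.extractall(dest)
```

Older CRX2 files use a different header layout, so a real implementation may need a second branch for them.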

## Prerequisites

1. [**uv**](https://github.com/astral-sh/uv) installed.
2. **Zotero Desktop** must be running on your machine (the extension talks to the Zotero desktop app over a local HTTP connection, by default on port `23119`).
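
Before launching the browser, it can help to confirm that Zotero is actually reachable. The sketch below is one way to do that with a plain TCP connection attempt; the helper name `is_listening` is hypothetical, and the port is left as a parameter so you can pass whichever port your Zotero installation listens on.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection only shows that some process is listening on that port, not that the process is Zotero, so treat this as a quick sanity check.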

## Setup

First, install the dependencies and Playwright's Chromium build using `uv`:

```bash
uv sync
uv run playwright install chromium
```

## Usage

Pass the URL of the paper you want to add to Zotero:

```bash
uv run zotero_automator.py "https://arxiv.org/abs/1706.03762"
```

To watch the browser while it runs (helpful for debugging when a site requires a CAPTCHA or login, or just to verify that the extension is working), pass the `--headed` flag:

```bash
uv run zotero_automator.py "https://arxiv.org/abs/1706.03762" --headed
```
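
As a rough illustration of how a `--headed` flag could map onto Chromium's launch arguments, here is a minimal sketch. The function names `parse_args` and `chromium_args` are hypothetical and the script's real argument handling may differ; `--load-extension`, `--disable-extensions-except`, and `--headless=new` are standard Chromium switches.

```python
import argparse


def parse_args(argv=None):
    """Parse the URL and the optional --headed flag."""
    parser = argparse.ArgumentParser(
        description="Save a URL to Zotero via the Zotero Connector."
    )
    parser.add_argument("url", help="URL of the paper to save")
    parser.add_argument("--headed", action="store_true", help="show the browser window")
    return parser.parse_args(argv)


def chromium_args(extension_dir: str, headed: bool) -> list[str]:
    """Build Chromium switches that load one unpacked extension."""
    args = [
        f"--disable-extensions-except={extension_dir}",
        f"--load-extension={extension_dir}",
    ]
    if not headed:
        # The new headless mode supports extensions; the legacy one does not.
        args.append("--headless=new")
    return args
```

With Playwright, a list like this would typically be passed as the `args` parameter of `chromium.launch_persistent_context(...)`, together with the profile directory.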

## How It Works

- **`setup_extension()`**: Looks up the Zotero extension on the Chrome Web Store by its `EKHAGK...` identifier, downloads the raw `.crx` payload, and unpacks the contents into `./zotero_extension/`.
- **`save_to_zotero()`**: Starts an `async_playwright` session that uses a local profile folder (`./chrome_profile/`). The extension injects its translator scripts once the page reaches network idle. The script then finds the extension's background service worker, triggers the programmatic save, and polls `sessionProgress` until Zotero finishes downloading the PDFs and metadata.
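
The service-worker lookup and the polling loop described above might look roughly like the sketch below. It is not the script's actual code: `find_extension_worker` and `poll_until` are hypothetical helper names, and the sketch assumes the extension's worker is the first one registered in the context. `context.service_workers` and `context.wait_for_event("serviceworker")` are Playwright `BrowserContext` APIs.

```python
import asyncio
import time


async def find_extension_worker(context):
    """Return the extension's service worker from a Playwright BrowserContext."""
    # Assumption: the only service worker in this profile belongs to the extension.
    if context.service_workers:
        return context.service_workers[0]
    return await context.wait_for_event("serviceworker")


async def poll_until(check, timeout: float = 60.0, interval: float = 0.5):
    """Await check() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = await check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("save did not finish before the timeout")
        await asyncio.sleep(interval)
```

In use, `check` would be a small coroutine function that calls `worker.evaluate(...)` with whatever JavaScript expression reports the save progress; that expression depends on the extension's internals and is deliberately left out here.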