
I Built a Chrome Extension to Rip My Data Out of Jira's Cold, Clammy Hands

8 min read
Victor Jimenez
Software Engineer & AI Agent Builder

I needed to get data out of Jira. Not just the title, but the full description, comments, and all attachments, packaged neatly for use in other scripts. The official way involves wrestling with an API that feels like it was designed by a committee that never spoke to each other. The unofficial way involves paying $20/month for a SaaS tool that is just a glorified curl command wrapped in a pretty dashboard. I chose the third way: build it myself.

This is the story of jiraextractor, a simple, privacy-first Chrome extension that does one thing and does it well: it rips a Jira ticket's entire contents into a clean, local ZIP file. No servers, no data collection, no nonsense.

The Problem: Data Hostage Situations

You have a Jira ticket. You need its contents. Your options are, frankly, terrible.

  1. Manual Labor: Copy-paste the description, manually save every image and attachment one-by-one, and try to piece it all together. This is soul-destroying, error-prone, and takes forever.
  2. The "Official" API: Generate an API token, make a request to /rest/api/3/issue/{issueKey}, get back a 5000-line JSON monstrosity. The description comes back as Atlassian Document Format (ADF), Jira's proprietary JSON tree. Attachments are just metadata; downloading each one requires a separate authenticated API call. It is a project in itself.
  3. The SaaS Grift: Pay a monthly fee for a service that does the above for you. Your private ticket data gets processed on their servers.

I don't want to authenticate, I don't want to parse a novel of JSON, and I certainly don't want to pay someone else to fetch for me. I'm already logged into Jira in my browser. The data is right there on the page. Why can't I just take it?

The Solution: A Browser-Based Heist

The extension works by leveraging the fact that I'm already authenticated in the browser. It uses a content script to scrape the DOM, a background worker to fetch attachments and package everything, and the browser's own download manager to save the result. It is a completely client-side operation.

Tech Stack

Component        | Technology                     | Why
-----------------|--------------------------------|-----------------------------------------------
Platform         | Chrome Extension (Manifest V3) | Direct access to authenticated browser context
DOM scraping     | Content script (content.js)    | Runs in the Jira tab context
Attachment fetch | Background service worker      | Handles cross-origin requests
CORS fix         | declarativeNetRequest          | Strips Authorization header for Atlassian media
Packaging        | JSZip                          | In-memory ZIP creation, no server needed
Icons            | Python + Pillow                | A few lines beats any web-based icon generator
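
All of this has to be declared up front in the manifest. The sketch below is my reconstruction of a plausible manifest.json for this stack, not the extension's published file; the host patterns and script names are assumptions based on the article.

```json
{
  "manifest_version": 3,
  "name": "jiraextractor",
  "version": "1.0",
  "permissions": ["activeTab", "downloads", "declarativeNetRequest"],
  "host_permissions": [
    "https://*.atlassian.net/*",
    "https://*.media.atlassian.com/*"
  ],
  "background": { "service_worker": "background.js" },
  "content_scripts": [
    {
      "matches": ["https://*.atlassian.net/*"],
      "js": ["extractor.js", "content.js"]
    }
  ],
  "action": { "default_popup": "popup.html" }
}
```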

1. Triggering the Extraction

The UI is a simple button. When clicked, it sends a message to the content script injected into the active Jira tab.

content.js
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'extract') {
    extractTicket().then(result => {
      sendResponse(result);
    }).catch(error => {
      sendResponse({ error: error.message });
    });
    return true; // Keep message channel open for async response
  }
});
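
On the other end of that channel sits the popup. A minimal sender might look like this; it is a sketch of the pattern, not the extension's actual popup.js, and the function name is mine.

```javascript
// Find the active tab and ask its content script to run the extraction.
// The { action: 'extract' } message mirrors the listener above.
function requestExtraction() {
  return new Promise((resolve) => {
    chrome.tabs.query({ active: true, currentWindow: true }, ([tab]) => {
      chrome.tabs.sendMessage(tab.id, { action: 'extract' }, resolve);
    });
  });
}
```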

The extractTicket() function (in extractor.js) is the core scraper. It hunts through the Jira DOM using a list of potential selectors for the title, description, comments, and attachments. Jira's class names can vary between versions and instances, so you can't rely on a single selector. The scraper tries several known patterns to find the data, making it surprisingly resilient.
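
The fallback pattern is simple to sketch. The selector strings below are illustrative guesses at the kind of list the scraper walks, not the extension's real ones.

```javascript
// Try each candidate selector in order; return the first element that
// matches, or null if none do.
function queryFirst(root, selectors) {
  for (const sel of selectors) {
    const el = root.querySelector(sel);
    if (el) return el;
  }
  return null;
}

// Example candidates for the ticket title (illustrative only).
const TITLE_SELECTORS = [
  '[data-testid*="summary.heading"]', // newer Jira Cloud layouts
  'h1#summary-val',                   // older server layouts
  'h1',                               // last-ditch fallback
];
```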

2. The CORS Dragon and declarativeNetRequest

Here's where it gets interesting. The scraped attachment URLs point to resources on domains like api.media.atlassian.com. My background script needs to fetch these URLs.

A cross-origin fetch from a background script triggers a CORS preflight OPTIONS request whenever it carries non-simple headers like Authorization. Even though my script never sets that header, Chrome sometimes attaches the session's auth headers anyway. Atlassian's media servers see the resulting preflight and immediately reject it.

Chrome Silently Adds Auth Headers to Cross-Origin Requests

Even if your extension code doesn't set an Authorization header, Chrome may attach session auth headers automatically. Atlassian's S3-signed media URLs reject the resulting preflight. The fix is declarativeNetRequest to strip the header before it leaves the browser.

The modern, correct way is with chrome.declarativeNetRequest. It lets you define rules to modify requests on the fly, handled by the browser itself.

background.js
chrome.runtime.onInstalled.addListener(() => {
  chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1], // Clear old rule
    addRules: [
      {
        id: 1,
        priority: 1,
        action: {
          type: 'modifyHeaders',
          requestHeaders: [
            { header: 'Authorization', operation: 'remove' }
          ]
        },
        condition: {
          urlFilter: 'media.atlassian.com',
          resourceTypes: ['xmlhttprequest', 'other']
        }
      }
    ]
  });
});

This rule tells Chrome: "If the service worker tries to fetch anything from a media.atlassian.com URL, remove the Authorization header before you send it." The preflight request is avoided, the simple GET goes through, and the download succeeds.
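
With the rule installed, the download step itself can be a plain fetch; the session cookies ride along because the request comes from the extension's authenticated browser profile. A minimal sketch (the function name is mine, not from the extension's source):

```javascript
// Download one attachment in the service worker. The header-stripping
// rule above means no Authorization header, no preflight, no drama.
async function fetchAttachment(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Attachment fetch failed (${res.status}): ${url}`);
  }
  return res.blob(); // JSZip accepts Blobs directly
}
```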

3. Packaging with JSZip

Once the attachments are fetched as blobs, the background script uses the JSZip library to create a ZIP archive in memory.

  1. A ticket.json file is created with the scraped text content.
  2. Each fetched attachment blob is added to an attachments/ directory within the virtual ZIP.
  3. JSZip generates the final .zip file as a blob.
  4. The chrome.downloads.download() API is called to save the blob to the user's disk.
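
One wrinkle worth handling in step 2: attachment names can collide or contain characters that are illegal in ZIP paths. A hypothetical helper for that (safeName is my naming, not taken from the extension's source):

```javascript
// Sanitize an attachment filename and de-duplicate it against names
// already placed in the attachments/ folder of the ZIP.
function safeName(name, used) {
  const base = name.replace(/[\\/:*?"<>|]/g, '_');
  let candidate = base;
  let i = 1;
  while (used.has(candidate)) {
    const dot = base.lastIndexOf('.');
    candidate = dot > 0
      ? `${base.slice(0, dot)}_${i}${base.slice(dot)}`
      : `${base}_${i}`;
    i += 1;
  }
  used.add(candidate);
  return candidate;
}
```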

The final JSON is clean and predictable:

ticket.json
{
  "title": "Ticket Title",
  "ticketKey": "PROJ-123",
  "url": "https://...",
  "extractedAt": "2024-01-01T00:00:00.000Z",
  "description": "Clean text description without HTML",
  "comments": [
    {
      "id": 1,
      "author": "John Doe",
      "timestamp": "3 days ago",
      "body": "Comment text"
    }
  ],
  "attachments": [
    {
      "id": 1,
      "name": "file.pdf",
      "url": "https://..."
    }
  ]
}

What I Learned

  • Scraping > API (Sometimes): For read-only tasks where I am already authenticated, a well-written DOM scraper is often faster to build and more practical than wrestling with a bloated, over-abstracted REST API. The web page is the real API.
  • declarativeNetRequest is a Superpower: This API is the definitive solution for handling complex CORS and header-related issues in Manifest V3. It is faster and safer than the old webRequest API. For any extension that interacts with third-party resources, this is the one to learn.
  • Keep it Client-Side: Not everything needs a server. By using JSZip and the Downloads API, the entire process runs on the user's machine. Faster, cheaper, and infinitely more secure than piping user data through a random backend.
  • Build Your Own Damn Tools: The project even includes a small Python script using Pillow to generate the extension icons. I refuse to waste time in a web-based "icon generator" or install some massive npm dependency. A few lines of code I own and understand are always better.

Full extension file structure:

jiraextractor/
  manifest.json
  popup.html
  popup.js
  content.js
  background.js
  extractor.js
  lib/
    jszip.min.js
  icons/
    icon16.png
    icon48.png
    icon128.png
  create_icons.py

Project Hygiene

The repository now includes an MIT LICENSE.

Why this matters for Drupal and WordPress

Agencies and product teams that run Drupal or WordPress projects often track work in Jira — migrations, feature backlogs, and client tickets. Getting full ticket content (description, comments, attachments) into a form you can script (e.g. for backlog import, content migration, or agent context) usually means fighting the Jira API or paying for a third-party export tool. A browser extension that runs where you're already logged in and outputs clean local JSON keeps data on your machine and fits into pipelines that feed Drupal/WordPress tooling without extra APIs or SaaS.

Looking for an Architect who doesn't just write code, but builds the AI systems that multiply your team's output? View my enterprise CMS case studies at victorjimenezdev.github.io or connect with me on LinkedIn.