If you do bookkeeping once a month, you’ve probably lived this loop:
- invoices arrive via Gmail (sometimes forwarded, replied-to, or resent)
- you download the PDFs
- you store them in Drive
- you file them during bookkeeping (move/delete/rename)
- and then automation re-imports the same PDFs again 😩
This post shows a Google Apps Script setup that:
✅ scans your entire Gmail (excluding spam/trash)
✅ exports PDF attachments into Google Drive folders by month (YYYY-MM)
✅ detects duplicates by content hash (MD5)
✅ never re-imports the same email again—even if you move/delete files in Drive
✅ still labels threads in Gmail for visibility (but does not rely on labels for correctness)
The core idea
Relying on a Gmail thread label to skip work is tempting, but it breaks this common scenario:
A thread gets labeled “processed”… then later a new reply comes in with a new PDF attachment.
If you skip by label, you miss that PDF.
Instead, we do two things:
1) Message-ID tracking (stored in PropertiesService)
- Every Gmail message is processed exactly once.
- This is what prevents re-imports after you manually move/delete files.
2) MD5 de-duplication per month folder
- Forwarded invoices and resent PDFs won’t create duplicates in Drive.
We still set a label (pdf-imported) for visibility, but the automation does not depend on it.
What you’ll get in Google Drive
A structure like this:
4_Gmail-Files/
2026-02/
2026-02-11__MESSAGEID__A0__Invoice_123.pdf
2026-02-14__MESSAGEID__A0__Receipt.pdf
2026-03/
...
You can do monthly bookkeeping and manually move/delete files however you like. The importer won’t recreate them.
Step 1: Create a Drive folder
Create a folder in Google Drive (e.g. 4_Gmail-Files) and copy its folder ID from the URL:
https://drive.google.com/drive/folders/<FOLDER_ID>
Step 2: Add the Apps Script
Go to script.google.com → New project → paste this code into Code.gs.
Replace
ROOT_FOLDER_IDwith your own folder ID.
/**
* Gmail → PDF attachments → Google Drive monthly folders (YYYY-MM)
*
* ✅ scans ALL mail (in:anywhere), excluding spam/trash
* ✅ catches new replies in old threads via LOOKBACK window
* ✅ stores PDFs in <ROOT>/<YYYY-MM>/
* ✅ prevents duplicates:
* - message-id tracking (each Gmail message processed once)
* - MD5 content de-dupe (identical PDFs per month folder stored once)
* ✅ Gmail thread label for visibility (not used for skipping)
*/
const ROOT_FOLDER_ID = "PASTE_YOUR_FOLDER_ID_HERE";
// Lookback window: must be large enough to catch “old thread, new reply”
const LOOKBACK_DAYS = 365;
const GMAIL_QUERY =
`in:anywhere -in:spam -in:trash newer_than:${LOOKBACK_DAYS}d has:attachment filename:pdf`;
// Label threads where we actually saved at least one PDF (visibility only)
const THREAD_LABEL = "pdf-imported";
// Mark thread as read after saving something?
const MARK_THREAD_AS_READ = false;
// How many threads to process per run
const MAX_THREADS_PER_RUN = 50;
// Tracker sizing: large enough so you don’t “forget” processed messages
const MAX_TRACKED_MESSAGES = 30000;
// Must be >= LOOKBACK_DAYS (recommend bigger)
const KEEP_MESSAGE_IDS_DAYS = 900;
function run() {
const root = DriveApp.getFolderById(ROOT_FOLDER_ID);
const labelObj = THREAD_LABEL ? getOrCreateGmailLabel_(THREAD_LABEL) : null;
const tracker = loadMessageTracker_();
pruneMessageTracker_(tracker);
const threads = GmailApp.search(GMAIL_QUERY, 0, MAX_THREADS_PER_RUN);
if (!threads.length) {
saveMessageTracker_(tracker);
return;
}
// monthKey -> { monthFolder, md5Set }
const monthCache = new Map();
for (const thread of threads) {
const messages = thread.getMessages();
let savedAnyInThread = false;
for (const msg of messages) {
const messageId = msg.getId();
// Prevent re-imports even if files are moved/deleted in Drive
if (tracker.ids[messageId]) continue;
const msgDate = msg.getDate();
const monthKey = formatMonthKey_(msgDate);
const { monthFolder, md5Set } = getMonthContext_(root, monthKey, monthCache);
const attachments = msg.getAttachments({ includeInlineImages: false });
for (let i = 0; i < attachments.length; i++) {
const att = attachments[i];
const originalName = att.getName() || `attachment_${i}.pdf`;
const isPdf =
att.getContentType() === "application/pdf" ||
originalName.toLowerCase().endsWith(".pdf");
if (!isPdf) continue;
const blob = att.copyBlob();
const md5 = md5Hex_(blob);
// Content de-dupe in the month folder
if (md5Set.has(md5)) continue;
const safeName = sanitizeFilename_(originalName);
const datePrefix = Utilities.formatDate(
msgDate,
Session.getScriptTimeZone(),
"yyyy-MM-dd"
);
const uniqueDriveName = `${datePrefix}__${messageId}__A${i}__${safeName}`;
monthFolder.createFile(blob).setName(uniqueDriveName);
md5Set.add(md5);
savedAnyInThread = true;
}
// Mark this Gmail message as processed (even if it had no PDFs)
tracker.ids[messageId] = Date.now();
tracker.order.push(messageId);
}
if (savedAnyInThread) {
if (labelObj) thread.addLabel(labelObj);
if (MARK_THREAD_AS_READ) thread.markRead();
}
}
saveMessageTracker_(tracker);
}
// ===== Tracker (PropertiesService) =====
function loadMessageTracker_() {
const props = PropertiesService.getUserProperties();
const raw = props.getProperty("PDF_TRACKER_V1");
if (!raw) return { ids: {}, order: [] };
try {
const obj = JSON.parse(raw);
if (!obj.ids) obj.ids = {};
if (!obj.order) obj.order = [];
return obj;
} catch (e) {
return { ids: {}, order: [] };
}
}
function saveMessageTracker_(tracker) {
pruneMessageTracker_(tracker);
PropertiesService.getUserProperties().setProperty("PDF_TRACKER_V1", JSON.stringify(tracker));
}
function pruneMessageTracker_(tracker) {
const cutoff = Date.now() - KEEP_MESSAGE_IDS_DAYS * 24 * 60 * 60 * 1000;
const newOrder = [];
for (const id of tracker.order) {
const ts = tracker.ids[id];
if (!ts) continue;
if (ts >= cutoff) newOrder.push(id);
else delete tracker.ids[id];
}
tracker.order = newOrder;
if (tracker.order.length > MAX_TRACKED_MESSAGES) {
const removeCount = tracker.order.length - MAX_TRACKED_MESSAGES;
for (let i = 0; i < removeCount; i++) {
const id = tracker.order[i];
delete tracker.ids[id];
}
tracker.order = tracker.order.slice(removeCount);
}
}
// ===== Month context + MD5 =====
function getMonthContext_(rootFolder, monthKey, monthCache) {
if (monthCache.has(monthKey)) return monthCache.get(monthKey);
const monthFolder = getOrCreateSubfolder_(rootFolder, monthKey);
const md5Set = new Set();
const files = monthFolder.getFiles();
while (files.hasNext()) {
const f = files.next();
const md5 = f.getMd5Checksum();
if (md5) md5Set.add(String(md5).toLowerCase());
}
const ctx = { monthFolder, md5Set };
monthCache.set(monthKey, ctx);
return ctx;
}
function md5Hex_(blob) {
const bytes = blob.getBytes();
const digest = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, bytes);
return digest
.map(b => {
const v = (b < 0 ? b + 256 : b);
return (v < 16 ? "0" : "") + v.toString(16);
})
.join("");
}
// ===== Helpers =====
function formatMonthKey_(dateObj) {
return Utilities.formatDate(dateObj, Session.getScriptTimeZone(), "yyyy-MM");
}
function getOrCreateSubfolder_(parentFolder, name) {
const it = parentFolder.getFoldersByName(name);
if (it.hasNext()) return it.next();
return parentFolder.createFolder(name);
}
function sanitizeFilename_(name) {
return String(name)
.replace(/[\/\:*?"<>|]/g, "_")
.replace(/\s+/g, " ")
.trim();
}
function getOrCreateGmailLabel_(labelName) {
let label = GmailApp.getUserLabelByName(labelName);
if (!label) label = GmailApp.createLabel(labelName);
return label;
}
Step 3: Authorize the script
In Apps Script:
1) Select function run
2) Click Run
3) Accept permissions (Gmail read + Drive access)
Step 4: Set a trigger (every 6 hours)
1) Click the clock icon (“Triggers”)
2) Add Trigger:
- Function:
run - Event source: Time-driven
- Type: Hour timer
- Interval: Every 6 hours
Why this works for monthly bookkeeping
When you do bookkeeping, you’ll usually move/delete/rename PDFs inside the month folder.
This automation doesn’t care what happens in Drive afterward.
- It remembers processed Gmail message IDs.
- So the same email is never imported again, even if you remove the stored PDF.
- And it still catches new PDFs posted later in an old thread.
Tuning tips
- If you get invoices as replies to very old threads, increase
LOOKBACK_DAYS. - Always keep
KEEP_MESSAGE_IDS_DAYS >= LOOKBACK_DAYS(recommended: much bigger). - If you have high email volume, increase
MAX_THREADS_PER_RUN(and keep an eye on Apps Script quotas).
Happy automating 👋
Top comments (0)