Safe Ways to Delete Files Older Than X Days Using Scripts and Tools

Managing storage costs, keeping data tidy, and complying with retention policies often require automatically removing files older than a certain age. This article covers practical approaches to deleting files older than N days in three popular cloud storage platforms: Amazon S3, Google Drive, and Microsoft OneDrive. For each service I'll explain native policy options, script-based methods, and examples you can adapt, along with safety precautions, testing strategies, and suggestions for logging and monitoring.


Key considerations before deleting files

  • Retention and compliance: Verify legal or business retention requirements before deleting any data.
  • Backups and snapshots: Ensure backups exist if you might need deleted data later.
  • Versioning and recycle bins: Understand how versioning, trash, and soft-delete work on the platform (they can affect cost and recoverability).
  • Testing: Always test deletion logic on a small sample or non-production bucket/folder.
  • Logging and monitoring: Keep logs of deletions and set up alerts for failed jobs.

Amazon S3

S3 Lifecycle Rules are the simplest, most reliable way to expire objects older than N days. They run on AWS’s side and don’t require running your own servers.

Example: delete objects after N days

  • Create a lifecycle rule targeting a bucket (optionally a prefix or tags).
  • Set “Expire current version of object” to N days.
  • If your bucket uses versioning and you want to remove previous versions, also set rules for noncurrent versions and for incomplete multipart uploads.

AWS Management Console:

  1. Open S3 → select bucket → Management → Lifecycle rules → Create lifecycle rule.
  2. Scope: choose prefix/tags if needed.
  3. Lifecycle rule actions: choose “Expire current versions” and set number of days.
  4. Save.

Terraform example:

resource "aws_s3_bucket_lifecycle_configuration" "example" {   bucket = "my-bucket"   rule {     id     = "expire-objects-after-n-days"     status = "Enabled"     filter {       prefix = "path/to/folder/"     }     expiration {       days = 30     }   } } 

Notes:

  • Lifecycle rules are evaluated once a day; deletion timing is not immediate at the exact N-day boundary.
  • Costs may persist briefly for delete markers and previous versions; lifecycle can be configured for these too.
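
If you'd rather manage the rule from code than through the console or Terraform, the same configuration can be applied programmatically. A minimal boto3 sketch, assuming the bucket and prefix from the example above (the rule ID and the 7-day multipart window are illustrative):

import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={'Rules': [{
        'ID': 'expire-objects-after-n-days',
        'Status': 'Enabled',
        'Filter': {'Prefix': 'path/to/folder/'},
        'Expiration': {'Days': 30},
        # Also clean up old versions and abandoned multipart uploads
        'NoncurrentVersionExpiration': {'NoncurrentDays': 30},
        'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
    }]}
)

Note that this call replaces the bucket's entire lifecycle configuration, so include any existing rules you want to keep.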

Scripted approaches (for custom logic)

Use AWS CLI, boto3 (Python), or SDKs when you need conditional deletions beyond lifecycle rules (e.g., based on metadata).

AWS CLI (deleting objects older than N days in a prefix):

# Requires GNU date; on macOS install coreutils and use gdate
N=30
cutoff=$(date -d "$N days ago" --utc +%Y-%m-%dT%H:%M:%SZ)
aws s3api list-objects-v2 --bucket my-bucket --prefix path/to/folder/ \
  --query "Contents[?LastModified<='${cutoff}'].{Key:Key}" --output json \
  | jq -r '.[].Key' \
  | while read -r key; do
      echo "Deleting $key"
      aws s3api delete-object --bucket my-bucket --key "$key"
    done

Python (boto3) example:

import boto3
from datetime import datetime, timezone, timedelta

s3 = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'path/to/folder/'
days = 30
cutoff = datetime.now(timezone.utc) - timedelta(days=days)

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        if obj['LastModified'] < cutoff:
            print("Deleting", obj['Key'])
            s3.delete_object(Bucket=bucket, Key=obj['Key'])

Caveats:

  • For large buckets, use pagination and batch deletes: the DeleteObjects API accepts up to 1,000 keys per request (see the sketch after this list).
  • If versioning is enabled, delete markers and previous versions must be handled separately.
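
A minimal batching sketch, reusing the bucket, prefix, and cutoff from the boto3 example above:

import boto3
from datetime import datetime, timezone, timedelta

s3 = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'path/to/folder/'
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

batch = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        if obj['LastModified'] < cutoff:
            batch.append({'Key': obj['Key']})
        if len(batch) == 1000:  # DeleteObjects accepts at most 1000 keys per call
            s3.delete_objects(Bucket=bucket, Delete={'Objects': batch})
            batch = []
if batch:
    s3.delete_objects(Bucket=bucket, Delete={'Objects': batch})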

Google Drive

Google Drive doesn't provide a built-in lifecycle rule like S3. Enterprise Google Workspace accounts can enforce retention through Google Vault, but for general use you'll typically use the Drive API or Google Apps Script to remove older files.

Google Apps Script (easy, runs on Google servers)

This script deletes files in a specified folder older than N days.

function deleteFilesOlderThanN() {
  var folderId = 'YOUR_FOLDER_ID';
  var folder = DriveApp.getFolderById(folderId);
  var days = 30;
  var cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - days);
  var files = folder.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    var modified = file.getLastUpdated();
    if (modified < cutoff) {
      Logger.log('Trashing: ' + file.getName() + ' (' + file.getId() + ')');
      file.setTrashed(true); // move to trash; use file.setTrashed(false) to restore
    }
  }
}
  • Deploy the function on a time-driven trigger (e.g., daily) in the Apps Script editor (Triggers → Add trigger).
  • setTrashed(true) moves files to Drive Trash, where they remain recoverable for 30 days by default.

For domain-wide or more complex needs, use the Drive REST API with a service account or OAuth2 credentials. Query files by modifiedTime:

Sample Python using google-api-python-client:

from googleapiclient.discovery import build
from google.oauth2 import service_account
from datetime import datetime, timedelta, timezone

SCOPES = ['https://www.googleapis.com/auth/drive']
SERVICE_ACCOUNT_FILE = 'service-account.json'
FOLDER_ID = 'your-folder-id'
DAYS = 30

creds = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('drive', 'v3', credentials=creds)

cutoff = (datetime.now(timezone.utc) - timedelta(days=DAYS)).isoformat()
query = f"'{FOLDER_ID}' in parents and modifiedTime < '{cutoff}' and trashed = false"

page_token = None
while True:
    response = service.files().list(q=query, spaces='drive',
                                    fields='nextPageToken, files(id, name, modifiedTime)',
                                    pageToken=page_token).execute()
    for file in response.get('files', []):
        print('Trashing', file['name'])
        service.files().update(fileId=file['id'], body={'trashed': True}).execute()
    page_token = response.get('nextPageToken', None)
    if page_token is None:
        break

Notes:

  • Setting trashed = true (as above) moves a file to Trash and is recoverable; files.delete() permanently removes it, bypassing Trash.
  • Respect API quotas and implement exponential backoff for errors.
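
A minimal backoff sketch for the second point, assuming the service object from the example above (execute_with_backoff is a hypothetical helper name):

import random
import time
from googleapiclient.errors import HttpError

def execute_with_backoff(request, max_retries=5):
    # Retry rate-limit and transient server errors with exponential backoff plus jitter.
    # Note: 403 can also mean a genuine permission error; in production,
    # inspect the error reason before retrying.
    for attempt in range(max_retries):
        try:
            return request.execute()
        except HttpError as err:
            if err.resp.status in (403, 429, 500, 503) and attempt < max_retries - 1:
                time.sleep((2 ** attempt) + random.random())
            else:
                raise

# Usage, trashing as before:
# execute_with_backoff(service.files().update(fileId=file['id'], body={'trashed': True}))
# Permanent, unrecoverable deletion instead:
# execute_with_backoff(service.files().delete(fileId=file['id']))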

Microsoft OneDrive

OneDrive has built-in retention in enterprise Microsoft 365 using retention policies and Microsoft Purview. For personal OneDrive or custom rules, use Microsoft Graph API or Power Automate.

Power Automate (no-code)

Create a scheduled Flow:

  1. Trigger: Recurrence (daily).
  2. Action: List files in folder (OneDrive for Business: List files in folder).
  3. Use a filter or condition to compare lastModifiedDateTime to utcNow() minus N days.
  4. For matching files, add action Delete file.
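
For the condition in step 3, an expression along the lines of addDays(utcNow(), -30) compared against each file's LastModifiedDateTime value usually works; exact field names vary by connector, so treat this as a sketch rather than exact syntax.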

Pros: easy to set up, no code to host, and flows are visible to Microsoft 365 admins. Cons: with many files, runs can be slow or hit connector throttling limits.

Microsoft Graph API (scripted)

Use Graph API to list drive items and check lastModifiedDateTime.

Python example using msal + requests:

import requests
import msal
from datetime import datetime, timedelta, timezone

TENANT_ID = 'your-tenant-id'
CLIENT_ID = 'your-client-id'
CLIENT_SECRET = 'your-client-secret'
# /me requires a delegated token; with app-only (client credentials) auth,
# address a drive explicitly, e.g. /users/{user-id}/drive/... or /drives/{drive-id}/...
DRIVE_ITEM_PATH = '/me/drive/root:/path/to/folder:/children'
DAYS = 30

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f'https://login.microsoftonline.com/{TENANT_ID}',
    client_credential=CLIENT_SECRET)
token = app.acquire_token_for_client(scopes=['https://graph.microsoft.com/.default'])['access_token']
headers = {'Authorization': 'Bearer ' + token}

cutoff = datetime.now(timezone.utc) - timedelta(days=DAYS)
r = requests.get('https://graph.microsoft.com/v1.0' + DRIVE_ITEM_PATH, headers=headers)
items = r.json().get('value', [])  # single page; see the pagination note below
for item in items:
    lm = datetime.fromisoformat(item['lastModifiedDateTime'].replace('Z', '+00:00'))
    if lm < cutoff:
        print('Deleting', item['name'])
        requests.delete(f"https://graph.microsoft.com/v1.0/me/drive/items/{item['id']}", headers=headers)

Notes:

  • For OneDrive for Business and SharePoint, use the appropriate drive URLs (/drives/{id}).
  • Graph API supports delta queries for incremental scans to improve efficiency.
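
Note also that the listing call in the example returns at most one page of results. A sketch of following @odata.nextLink, continuing the example above, so large folders are fully scanned:

# Follow @odata.nextLink until the last page (the property is absent there).
url = 'https://graph.microsoft.com/v1.0' + DRIVE_ITEM_PATH
while url:
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    data = resp.json()
    for item in data.get('value', []):
        ...  # same age check and delete as above
    url = data.get('@odata.nextLink')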

Safety, testing, and logging

  • Start with trash/soft-delete where available, then permanently delete after a retention window.
  • Use a dry-run mode: list the objects that would be deleted without removing them (see the sketch after this list).
  • Maintain an audit log: record file IDs, names, timestamps, user who triggered deletion, and job run id.
  • Use exponential backoff and error handling in scripts for API rate limits.
  • For high-volume workloads consider batching (S3 DeleteObjects, Drive batch requests where supported).
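
A minimal dry-run sketch (delete_or_log is a hypothetical helper; pass any platform's delete call as delete_fn):

import logging

def delete_or_log(delete_fn, identifier, dry_run=True):
    # In dry-run mode, only log what would be removed.
    if dry_run:
        logging.info("DRY RUN - would delete %s", identifier)
        return
    logging.info("Deleting %s", identifier)
    delete_fn()

# Example with the boto3 client from earlier:
# delete_or_log(lambda: s3.delete_object(Bucket=bucket, Key=key), key, dry_run=True)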

Example workflow templates

  1. S3 (recommended): set lifecycle rule → verify with S3 Inventory → enable logging.
  2. Google Drive: Apps Script daily trigger that sets files to Trash → monitor Trash → permanently delete after 30 days.
  3. OneDrive: Power Automate scheduled flow for small folders; Graph API + delta tokens for larger scale.

Comparison: S3 vs Google Drive vs OneDrive

Platform | Native lifecycle | Soft-delete/trash | Best for large-scale automation
Amazon S3 | Yes (lifecycle rules) | Versioning + delete markers | Lifecycle rules + SDKs
Google Drive | No (use Apps Script/API) | Yes (Trash) | Drive API with a service account
OneDrive | Retention via M365/Purview; no simple per-folder rules for personal accounts | Yes (Recycle Bin) | Microsoft Graph API / Power Automate

Troubleshooting tips

  • No files deleted: check timestamps (timezones), filters/prefixes, and query syntax.
  • Hitting rate limits: add exponential backoff, reduce page sizes, or use native policies where possible.
  • Versioning keeps old data: add lifecycle rules to remove noncurrent versions (S3) or permanently delete versions (Drive with revisions API).

Final checklist before running deletions

  • Confirm retention/compliance rules.
  • Test on a sample folder or sandbox account.
  • Keep soft-delete enabled for recovery window.
  • Enable logging and alerts.
  • Schedule regular audits to ensure rules behave as expected.
