Open Source

StreamVault: Mass Asset Retrieval Platform

A high-performance S3 bulk downloader that addresses a critical gap in the AWS S3 ecosystem: the lack of an efficient way to download entire folder hierarchies containing thousands of files.

Tested with 25,000+ files
50GB+ archives
ISC License
StreamVault Download Process
$ streamvault init --bucket my-assets
Initializing StreamVault...
Connected to S3 bucket: my-assets
$ streamvault download --path "project/design-assets" --archive
Scanning folder structure... Found 258 files (124 MB)
Processing files... 100% complete
$ streamvault download --path "marketing/product-images" --archive
Scanning folder structure... Found 532 files (350 MB)
✓ Download complete! Archive created successfully.
Generating download URL: https://my-bucket.s3.amazonaws.com/archives/batch-20240503.zip
Processed: 790 files · Completed: 2/2 jobs · Cache valid: 59 min
The Challenge

AWS S3's Native Interface Limitations

AWS S3's native interface presents significant challenges when managing bulk operations.

  • No Native Folder Downloads

    The S3 Console doesn't support downloading entire folder structures.

  • Limited Batch Operations

    Managing thousands of files becomes unwieldy through the standard interface.

  • Technical Barriers

    Alternative solutions require AWS CLI proficiency or custom API development.

  • Resource Consumption

    Naive implementations risk memory exhaustion and connection timeouts.

[Illustration: the AWS S3 Console browsing my-important-bucket/ (project/, assets/, documents/, media/). Attempting a folder download fails with "Cannot download directory" and "Error: Operation not supported"; both the Console and the API are limited to single-file operations.]
StreamVault solves these limitations with efficient streaming architecture
Core Features

Key Storage and Job Handling Features

StreamVault delivers a scalable, memory-efficient approach to bulk downloads through advanced queuing systems and streaming architecture.

S3-Based Archive Storage

All generated ZIP archives are stored directly in your S3 bucket at the destination path configured in your .env file.

Intelligent Job Caching

If multiple users request the same folder, StreamVault returns the existing archive URL instead of regenerating the archive.

Configurable Cache Expiry

Job results are maintained for a configurable period (default: 1 hour) before being expired from the system.
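The caching and expiry behaviour above can be sketched in a few lines. Here a plain `Map` stands in for Redis, and the key layout and helper names are illustrative assumptions, not StreamVault's actual schema:

```javascript
// Sketch of job caching with a TTL: the first request for a folder builds the
// archive; repeat requests within the TTL get the cached URL back.
// A Map stands in for Redis here (names and layout are assumptions).
const cache = new Map(); // s3Key -> { url, expiresAt }
const CACHE_TTL_MS = 60 * 60 * 1000; // default: 1 hour

function getOrCreateArchive(s3Key, createArchive) {
  const entry = cache.get(s3Key);
  if (entry && entry.expiresAt > Date.now()) {
    return { url: entry.url, cached: true }; // reuse the existing archive
  }
  const url = createArchive(s3Key); // expensive: download + zip + upload
  cache.set(s3Key, { url, expiresAt: Date.now() + CACHE_TTL_MS });
  return { url, cached: false };
}
```

With a real Redis backend the TTL would be enforced by `EXPIRE` rather than a timestamp check, but the request flow is the same.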

Resource-Efficient Processing

By leveraging S3 for storage and Redis for job state management, the system maintains minimal local resource usage.

On-the-fly Archive Creation

Progressive ZIP generation without requiring full local storage, enabling efficient processing of large archives.

Secure Delivery Mechanism

Configurable authentication for download access with pre-signed URLs and customizable expiration times.

Architecture

StreamVault Solution Architecture

StreamVault employs a sophisticated, microservices-based architecture to process bulk S3 downloads efficiently.

Component Roles
  • API Server

    Handles client requests, job validation, and queue management

  • Redis Queue

    Distributes workload and maintains job state

  • Worker Nodes

    Execute download and archival operations with resource management

  • Monitoring Dashboard

    Provides real-time visibility into system operations

  • Job Processor

    Manages job execution, including download and archival processes
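The flow between these components can be sketched as a minimal in-process queue. StreamVault's queue is Redis-backed; a plain array stands in here, and all names are illustrative rather than StreamVault's actual API:

```javascript
// Minimal in-process sketch of the API server -> queue -> worker flow.
// An array stands in for the Redis queue so the flow is visible end to end.
const queue = [];
const jobs = new Map(); // jobId -> { state, result }

function enqueueJob(s3Key) {            // API server: validate + enqueue
  const jobId = `job_${jobs.size + 1}`;
  jobs.set(jobId, { state: 'waiting', result: null });
  queue.push({ jobId, s3Key });
  return jobId;
}

function runWorker(processJob) {        // Worker node: drain the queue
  while (queue.length > 0) {
    const { jobId, s3Key } = queue.shift();
    const job = jobs.get(jobId);
    job.state = 'active';
    job.result = processJob(s3Key);     // download + archive + upload
    job.state = 'completed';
  }
}
```

The `waiting`/`active`/`completed` states mirror the job lifecycle shown on the monitoring dashboard.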

System Architecture
[Diagram: StreamVault architecture]

  • API Response Time: <100ms
  • Memory Usage: ~500MB
  • Concurrency: Configurable
  • Scalability: Horizontal
  • Storage: Amazon S3
  • Queue System: Redis
Technical Capabilities

  • Maximum Download Size: Tested up to 50GB (theoretically unlimited)
  • File Count Capacity: Successfully processed 25,000+ files in testing
  • Memory Efficiency: Constant memory usage regardless of download size
  • Throughput: Limited primarily by network bandwidth
  • Concurrency: Configurable parallel processing (default: 2 workers)
  • Infrastructure: Self-hosted or cloud deployment options
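Configurable concurrency of this kind is typically implemented as a small limiter that keeps at most N tasks in flight. A generic sketch, not StreamVault's actual scheduler:

```javascript
// Run the given async tasks with at most `limit` in flight at once
// (StreamVault's default is 2 workers). Results keep their input order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function runner() {
    while (next < tasks.length) {
      const i = next++;          // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: limit }, runner));
  return results;
}
```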
Getting Started

Deploy StreamVault in Minutes

Follow these simple steps to get StreamVault up and running.

terminal
# Clone the repository
$ git clone https://github.com/Slacky300/StreamVault.git
$ cd StreamVault
# Configure environment variables
$ cp .env.example .env
# Edit .env with your AWS credentials and settings
# Deploy with Docker Compose
$ docker-compose up -d
Creating network "streamvault_default" with the default driver
Creating streamvault_redis_1 ... done
Creating streamvault_api_1 ... done
Creating streamvault_worker_1 ... done
# StreamVault is now running!
Core Environment Variables
# Service Configuration
NODE_ENV=development             # Environment mode (development/production)
PORT=3000                        # API service port

# Redis Configuration
REDIS_HOST=redis                 # Redis server host
REDIS_PORT=6379                  # Redis server port
REDIS_CONNECTION_TIMEOUT=60000   # Redis connection timeout in ms
REDIS_PASSWORD=your_redis_password # Redis password (if applicable)

# AWS Configuration
AWS_REGION=us-east-1             # AWS region
AWS_ACCESS_KEY_ID=your_key       # AWS credentials
AWS_SECRET_ACCESS_KEY=your_secret # AWS credentials
AWS_BUCKET_NAME=your_bucket_name # S3 bucket name
AWS_S3_EXPORT_DESTINATION_PATH=your_export_path # S3 export path
AWS_S3_GENERATE_PRESIGNED_URL=true # Generate pre-signed URLs for downloads
AWS_S3_PRESIGNED_URL_EXPIRATION=3600 # Pre-signed URL expiration time in seconds
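As a sketch, a service consuming these variables might read them with the defaults shown above and fail fast on missing credentials. The validation rules here are illustrative, not StreamVault's actual startup code:

```javascript
// Read the core settings from process.env, applying the documented defaults
// and refusing to start without AWS credentials (illustrative sketch).
function loadConfig(env = process.env) {
  const required = ['AWS_REGION', 'AWS_ACCESS_KEY_ID',
                    'AWS_SECRET_ACCESS_KEY', 'AWS_BUCKET_NAME'];
  for (const key of required) {
    if (!env[key]) throw new Error(`Missing required env var: ${key}`);
  }
  return {
    port: Number(env.PORT ?? 3000),
    redis: {
      host: env.REDIS_HOST ?? 'redis',
      port: Number(env.REDIS_PORT ?? 6379),
    },
    aws: {
      region: env.AWS_REGION,
      bucket: env.AWS_BUCKET_NAME,
      presignedUrlExpiry: Number(env.AWS_S3_PRESIGNED_URL_EXPIRATION ?? 3600),
    },
  };
}
```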
Performance

Performance Benchmarks

StreamVault delivers consistent performance even with large workloads, maintaining low memory usage and rapid processing.

| Scenario | Files | Total Size | Processing Time* | Peak Memory | Testing Status |
|----------|-------|------------|------------------|-------------|----------------|
| Small Archive | 100 | 500MB | 45s | 220MB | Tested |
| Medium Archive | 1,000 | 5GB | 8m 20s | 340MB | Tested |
| Large Archive | 25,000 | 50GB | 1h 45m | 480MB | Tested |
| Enterprise Scenario | 100,000+ | 500GB+ | ~18h** | 510MB** | Projected |

*Processing times depend on network bandwidth and S3 throttling limits

**Projected values based on small-to-large scale testing; not yet verified with actual 500GB+ workloads

Testing Environment

All benchmarks were conducted on an AWS t2.large instance with the following specifications:

  • CPU: 2 vCPUs
  • RAM: 8GB
  • Network: Up to 1Gbps (burst capacity)

This demonstrates that StreamVault can handle substantial workloads even on moderately-sized infrastructure, making it suitable for teams with various resource constraints.

Screenshots

See StreamVault in Action

Visual overview of the StreamVault interface and monitoring capabilities.

  • Dashboard Overview (BullMQ Dashboard)
  • Active Jobs
  • Job Logs
  • Completed Jobs
API

API Reference

Integrate StreamVault into your applications with our simple API.

POST
Create Download Job
POST /create-job

Request Body:

{
  "s3Key": "path/to/s3/folder"
}

Response:

{
  "message": "Small download job created successfully",
  "s3Key": "__outputs/",
  "sizeOfFolder": "320.40 MB",
  "thresholdValueInGB": 10,
  "createdAt": "04/29/2025 01:54:21 PM",
  "jobId": "__outputs_-small",
  "isLargeDownload": false
}
GET
Check Job Status
GET /job/:jobId

Response:

{
  "message": "Job details retrieved successfully",
  "jobDetails": {
    "jobId": "job_14a72b9e3d",
    "name": "large-download",
    "data": {
      "s3Key": "path/to/s3/folder",
      "sizeOfFolder": 16492674825,
      "progress": 78,
      "state": "active"
    }
  }
}
Job Completion Response
{
  "downloadUrl": "https://bucket-name.s3.bucket-region.amazonaws.com/path-of-archive",
  "status": "completed",
  "totalSize": "320.40 MB",
  "noOfFiles": 1227
}
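Putting the two endpoints together, a client can create a job and poll until the completion payload appears. This sketch assumes the completion response above is returned from the status endpoint once the job finishes; `fetchImpl` is injected so the flow can run without a live server, and `downloadFolder` and `baseUrl` are illustrative names:

```javascript
// Create a download job via POST /create-job, then poll GET /job/:jobId
// until the completion payload (with downloadUrl) appears.
async function downloadFolder(baseUrl, s3Key, fetchImpl, delayMs = 2000) {
  const createRes = await fetchImpl(`${baseUrl}/create-job`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ s3Key }),
  });
  const { jobId } = await createRes.json();

  for (;;) {
    const statusRes = await fetchImpl(`${baseUrl}/job/${jobId}`);
    const status = await statusRes.json();
    if (status.downloadUrl) return status.downloadUrl; // job completed
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

A production client would also cap the number of polling attempts and handle failed job states.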

Ready to Transform Your S3 Download Experience?

Join the StreamVault community today and overcome the limitations of AWS S3's native interface.

Contributions are accepted under the terms of the ISC License.