Open Source

StreamVault: Mass Asset Retrieval Platform

A high-performance S3 bulk downloader that addresses a critical gap in the AWS S3 ecosystem: the lack of an efficient way to download entire folder hierarchies containing thousands of files.

Tested with 25,000+ files
50GB+ archives
ISC License
StreamVault Download Process
$ streamvault init --bucket my-assets
Initializing StreamVault...
Connected to S3 bucket: my-assets
$ streamvault download --path "project/design-assets" --archive
Scanning folder structure... Found 258 files (124 MB)
Processing files... 100% complete
$ streamvault download --path "marketing/product-images" --archive
Scanning folder structure... Found 532 files (350 MB)
✓ Download complete! Archive created successfully.
Generating download URL: https://my-bucket.s3.amazonaws.com/archives/batch-20240503.zip
Processed: 790 files · Completed: 2/2 jobs · Cache valid: 59 min
The Challenge

AWS S3's Native Interface Limitations

AWS S3's native interface presents significant challenges when managing bulk operations.

  • No Native Folder Downloads

    The S3 Console doesn't support downloading entire folder structures.

  • Limited Batch Operations

    Managing thousands of files becomes unwieldy through the standard interface.

  • Technical Barriers

    Alternative solutions require AWS CLI proficiency or custom API development.

  • Resource Consumption

    Naive implementations risk memory exhaustion and connection timeouts.

[Illustration: the AWS S3 Console browsing my-important-bucket/ (project/, assets/, documents/, media/). Attempting a folder download fails with "Cannot download directory" and "Error: Operation not supported"; both the Console and the API are limited to single-file operations.]
StreamVault solves these limitations with efficient streaming architecture
Core Features

Key Storage and Job Handling Features

StreamVault delivers a scalable, memory-efficient approach to bulk downloads through advanced queuing systems and streaming architecture.

S3-Based Archive Storage

All generated ZIP archives are stored directly in your S3 bucket at the destination path configured in your .env file.

Intelligent Job Caching

If multiple users request the same folder, StreamVault returns the existing archive URL instead of regenerating the archive.

Configurable Cache Expiry

Job results are maintained for a configurable period (default: 1 hour) before being expired from the system.
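The caching and expiry behaviour above can be sketched in a few lines. Here a plain `Map` stands in for Redis, and the key layout and helper names are illustrative assumptions, not StreamVault's actual schema:

```javascript
// Sketch of job caching with a TTL: the first request for a folder builds the
// archive; repeat requests within the TTL get the cached URL back.
// A Map stands in for Redis here (names and layout are assumptions).
const cache = new Map(); // s3Key -> { url, expiresAt }
const CACHE_TTL_MS = 60 * 60 * 1000; // default: 1 hour

function getOrCreateArchive(s3Key, createArchive) {
  const entry = cache.get(s3Key);
  if (entry && entry.expiresAt > Date.now()) {
    return { url: entry.url, cached: true }; // reuse the existing archive
  }
  const url = createArchive(s3Key); // expensive: download + zip + upload
  cache.set(s3Key, { url, expiresAt: Date.now() + CACHE_TTL_MS });
  return { url, cached: false };
}
```

With a real Redis backend the TTL would be enforced by `EXPIRE` rather than a timestamp check, but the request flow is the same.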

Resource-Efficient Processing

By leveraging S3 for storage and Redis for job state management, the system maintains minimal local resource usage.

On-the-fly Archive Creation

Progressive ZIP generation without requiring full local storage, enabling efficient processing of large archives.

Secure Delivery Mechanism

Configurable authentication for download access with pre-signed URLs and customizable expiration times.

Architecture

StreamVault Solution Architecture

StreamVault employs a sophisticated, microservices-based architecture to process bulk S3 downloads efficiently.

Component Roles
  • API Server

    Handles client requests, job validation, and queue management

  • Redis Queue

    Distributes workload and maintains job state

  • Worker Nodes

    Execute download and archival operations with resource management

  • Monitoring Dashboard

    Provides real-time visibility into system operations

  • Job Processor

    Manages job execution, including download and archival processes
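The flow between these components can be sketched as a minimal in-process queue. StreamVault's queue is Redis-backed; a plain array stands in here, and all names are illustrative rather than StreamVault's actual API:

```javascript
// Minimal in-process sketch of the API server -> queue -> worker flow.
// An array stands in for the Redis queue so the flow is visible end to end.
const queue = [];
const jobs = new Map(); // jobId -> { state, result }

function enqueueJob(s3Key) {            // API server: validate + enqueue
  const jobId = `job_${jobs.size + 1}`;
  jobs.set(jobId, { state: 'waiting', result: null });
  queue.push({ jobId, s3Key });
  return jobId;
}

function runWorker(processJob) {        // Worker node: drain the queue
  while (queue.length > 0) {
    const { jobId, s3Key } = queue.shift();
    const job = jobs.get(jobId);
    job.state = 'active';
    job.result = processJob(s3Key);     // download + archive + upload
    job.state = 'completed';
  }
}
```

The `waiting`/`active`/`completed` states mirror the job lifecycle shown on the monitoring dashboard.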

System Architecture
[Diagram: StreamVault architecture]

  • API Response Time: <100ms
  • Memory Usage: ~500MB
  • Concurrency: Configurable
  • Scalability: Horizontal
  • Storage: Amazon S3
  • Queue System: Redis
Technical Capabilities

  • Maximum Download Size: Tested up to 50GB (theoretically unlimited)
  • File Count Capacity: Successfully processed 25,000+ files in testing
  • Memory Efficiency: Constant memory usage regardless of download size
  • Throughput: Limited primarily by network bandwidth
  • Concurrency: Configurable parallel processing (default: 2 workers)
  • Infrastructure: Self-hosted or cloud deployment options
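Configurable concurrency of this kind is typically implemented as a small limiter that keeps at most N tasks in flight. A generic sketch, not StreamVault's actual scheduler:

```javascript
// Run the given async tasks with at most `limit` in flight at once
// (StreamVault's default is 2 workers). Results keep their input order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function runner() {
    while (next < tasks.length) {
      const i = next++;          // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: limit }, runner));
  return results;
}
```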
Getting Started

Deploy StreamVault in Minutes

Follow these simple steps to get StreamVault up and running.

terminal
# Clone the repository
$ git clone https://github.com/Slacky300/StreamVault.git
$ cd StreamVault
# Configure environment variables
$ cp .env.example .env
# Edit .env with your AWS credentials and settings
# Deploy with Docker Compose
$ docker-compose up -d
Creating network "streamvault_default" with the default driver
Creating streamvault_redis_1 ... done
Creating streamvault_api_1 ... done
Creating streamvault_worker_1 ... done
# StreamVault is now running!
Core Environment Variables
# Service Configuration
NODE_ENV=development             # Environment mode (development/production)
PORT=3000                        # API service port

# Redis Configuration
REDIS_HOST=redis                 # Redis server host
REDIS_PORT=6379                  # Redis server port
REDIS_CONNECTION_TIMEOUT=60000   # Redis connection timeout in ms
REDIS_PASSWORD=your_redis_password # Redis password (if applicable)

# AWS Configuration
AWS_REGION=us-east-1             # AWS region
AWS_ACCESS_KEY_ID=your_key       # AWS credentials
AWS_SECRET_ACCESS_KEY=your_secret # AWS credentials
AWS_BUCKET_NAME=your_bucket_name # S3 bucket name
AWS_S3_EXPORT_DESTINATION_PATH=your_export_path # S3 export path
AWS_S3_GENERATE_PRESIGNED_URL=true # Generate pre-signed URLs for downloads
AWS_S3_PRESIGNED_URL_EXPIRATION=3600 # Pre-signed URL expiration time in seconds
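As a sketch, a service consuming these variables might read them with the defaults shown above and fail fast on missing credentials. The validation rules here are illustrative, not StreamVault's actual startup code:

```javascript
// Read the core settings from process.env, applying the documented defaults
// and refusing to start without AWS credentials (illustrative sketch).
function loadConfig(env = process.env) {
  const required = ['AWS_REGION', 'AWS_ACCESS_KEY_ID',
                    'AWS_SECRET_ACCESS_KEY', 'AWS_BUCKET_NAME'];
  for (const key of required) {
    if (!env[key]) throw new Error(`Missing required env var: ${key}`);
  }
  return {
    port: Number(env.PORT ?? 3000),
    redis: {
      host: env.REDIS_HOST ?? 'redis',
      port: Number(env.REDIS_PORT ?? 6379),
    },
    aws: {
      region: env.AWS_REGION,
      bucket: env.AWS_BUCKET_NAME,
      presignedUrlExpiry: Number(env.AWS_S3_PRESIGNED_URL_EXPIRATION ?? 3600),
    },
  };
}
```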
Performance

Performance Benchmarks

StreamVault delivers consistent performance even with large workloads, maintaining low memory usage and rapid processing.

| Scenario | Files | Total Size | Processing Time* | Peak Memory | Testing Status |
|----------|-------|------------|------------------|-------------|----------------|
| Small Archive | 100 | 500MB | 45s | 220MB | Tested |
| Medium Archive | 1,000 | 5GB | 8m 20s | 340MB | Tested |
| Large Archive | 25,000 | 50GB | 1h 45m | 480MB | Tested |
| Enterprise Scenario | 100,000+ | 500GB+ | ~18h** | 510MB** | Projected |

*Processing times depend on network bandwidth and S3 throttling limits

**Projected values based on small-to-large scale testing; not yet verified with actual 500GB+ workloads

Testing Environment

All benchmarks were conducted on an AWS t2.large instance with the following specifications:

  • CPU: 2 vCPUs
  • RAM: 8GB
  • Network: Up to 1Gbps (burst capacity)

This demonstrates that StreamVault can handle substantial workloads even on moderately-sized infrastructure, making it suitable for teams with various resource constraints.

Screenshots

See StreamVault in Action

Visual overview of the StreamVault interface and monitoring capabilities.

  • Dashboard Overview (BullMQ Dashboard)
  • Active Jobs
  • Job Logs
  • Completed Jobs
API

API Reference

Integrate StreamVault into your applications with our simple API.

POST
Create Download Job
POST /create-job

Request Body:

{
  "s3Key": "path/to/s3/folder"
}

Response:

{
  "message": "Small download job created successfully",
  "s3Key": "__outputs/",
  "sizeOfFolder": "320.40 MB",
  "thresholdValueInGB": 10,
  "createdAt": "04/29/2025 01:54:21 PM",
  "jobId": "__outputs_-small",
  "isLargeDownload": false
}
GET
Check Job Status
GET /job/:jobId

Response:

{
  "message": "Job details retrieved successfully",
  "jobDetails": {
    "jobId": "job_14a72b9e3d",
    "name": "large-download",
    "data": {
      "s3Key": "path/to/s3/folder",
      "sizeOfFolder": 16492674825,
      "progress": 78,
      "state": "active"
    }
  }
}
Job Completion Response
{
  "downloadUrl": "https://bucket-name.s3.bucket-region.amazonaws.com/path-of-archive",
  "status": "completed",
  "totalSize": "320.40 MB",
  "noOfFiles": 1227
}
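Putting the two endpoints together, a client can create a job and poll until the completion payload appears. This sketch assumes the completion response above is returned from the status endpoint once the job finishes; `fetchImpl` is injected so the flow can run without a live server, and `downloadFolder` and `baseUrl` are illustrative names:

```javascript
// Create a download job via POST /create-job, then poll GET /job/:jobId
// until the completion payload (with downloadUrl) appears.
async function downloadFolder(baseUrl, s3Key, fetchImpl, delayMs = 2000) {
  const createRes = await fetchImpl(`${baseUrl}/create-job`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ s3Key }),
  });
  const { jobId } = await createRes.json();

  for (;;) {
    const statusRes = await fetchImpl(`${baseUrl}/job/${jobId}`);
    const status = await statusRes.json();
    if (status.downloadUrl) return status.downloadUrl; // job completed
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

A production client would also cap the number of polling attempts and handle failed job states.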

Ready to Transform Your S3 Download Experience?

Join the StreamVault community today and overcome the limitations of AWS S3's native interface.

Contributions are accepted under the terms of the ISC License.