✅ Completed Components
1. AWS Infrastructure (Validated)
- ✅ Docker image building with renv
- ✅ ECR authentication and push
- ✅ Image management with content hashing
- ✅ Configuration system
- ✅ Worker script for task execution
- ✅ S3 integration for task data
2. Direct API Implementation (NEW)
- ✅
starburst_map()- Main parallel map function - ✅
starburst_cluster()- Cluster management object - ✅ Chunk-based task distribution
- ✅ Progress tracking
- ✅ Cost estimation
- ✅ Result polling and aggregation
📋 Setup Requirements
IAM Roles Setup
staRburst requires two IAM roles for ECS Fargate execution.
Quick Setup:
Roles Created: 1. starburstECSExecutionRole - For ECS/ECR/CloudWatch access 2. starburstECSTaskRole - For S3 data access
Detailed Documentation: See docs/IAM_SETUP.md for: - Complete trust policies - Required permissions - Manual setup instructions - Troubleshooting guide
📖 Usage Examples
Basic Usage
library(starburst)
# Simple parallel map
results <- starburst_map(
1:100,
function(x) x^2,
workers = 10
)Advanced Usage
# With custom configuration
results <- starburst_map(
data_list,
expensive_function,
workers = 50,
cpu = 4,
memory = "8GB",
region = "us-east-1"
)
# Using cluster object
cluster <- starburst_cluster(workers = 20, cpu = 8, memory = "16GB")
results1 <- cluster$map(data1, function(x) analyze(x))
results2 <- cluster$map(data2, function(x) process(x))🚀 Next Steps
- Create IAM Roles: Set up the two required IAM roles in AWS
-
End-to-End Test: Run full test with
starburst_map()on AWS - Documentation: Update README with new API examples
- Future Backend (Optional v2): Implement full Future API for furrr compatibility
💡 Design Decisions
Why Direct API vs Future Backend?
Chose Direct API because: - ✅ Simpler implementation (~300 lines vs ~800+ lines) - ✅ Immediate value - works today - ✅ Easier to debug and maintain - ✅ Uses all validated AWS infrastructure - ✅ Clear, intuitive interface
Future backend can be added later as an enhancement for users who want furrr compatibility.
