major changes
- Added documentation via mdbook
- Created basic VS Code extension
- Implemented if else blocks and changed some syntax
- Fixed some issues
52
docs/src/examples/data-pipeline.md
Normal file
@@ -0,0 +1,52 @@
# Data Pipeline

> ⚠️ This documentation is AI-generated and may contain errors.

Data processing pipeline showing how to process multiple batches in parallel.

## Script

See `/examples/data_pipeline.rsh` in the repository.

```rush
#!/usr/bin/env rush

DATASET = "user_analytics"
INPUT_DIR = "$HOME/data/raw"
OUTPUT_DIR = "$HOME/data/processed"
BATCH_SIZE = "1000"

echo "Data Processing Pipeline: $DATASET"

# Pre-processing stages
for stage in validate clean normalize {
    STAGE_UPPER = "$stage"
    echo " Stage: $STAGE_UPPER"
}

# Process batches in parallel
BATCH_1_IN = "$INPUT_DIR/batch_001.csv"
BATCH_1_OUT = "$OUTPUT_DIR/batch_001.json"
# ... (define other batches)

parallel {
    run {
        echo "[batch_001] Processing $BATCH_1_IN -> $BATCH_1_OUT"
        echo "[batch_001] Transformed 1000 records"
    }

    run {
        echo "[batch_002] Processing $BATCH_2_IN -> $BATCH_2_OUT"
        echo "[batch_002] Transformed 1000 records"
    }
    # ... (more batches)
}

echo "All batches processed successfully"
```

## Key Concepts

- **Parallel data processing**: Process multiple batches simultaneously
- **Path construction**: Building input/output file paths
- **Pipeline stages**: Sequential setup, parallel processing, sequential summary
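
For readers more familiar with general-purpose languages, the three concepts above can be sketched in Python. This is only an analogy, not how rush implements `parallel` or `run`: the helper names `batch_paths`, `process_batch`, and `run_pipeline` are hypothetical, the directories are relative stand-ins for the `$HOME`-based paths in the script, and a thread pool is assumed as the parallel mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumed stand-ins for INPUT_DIR / OUTPUT_DIR in the rush script.
INPUT_DIR = Path("data/raw")
OUTPUT_DIR = Path("data/processed")


def batch_paths(batch_id: int) -> tuple[Path, Path]:
    """Path construction: derive input/output paths from a batch number."""
    name = f"batch_{batch_id:03d}"
    return INPUT_DIR / f"{name}.csv", OUTPUT_DIR / f"{name}.json"


def process_batch(batch_id: int) -> str:
    """Stand-in for one `run { ... }` block; it only builds a log line."""
    src, dst = batch_paths(batch_id)
    return f"[batch_{batch_id:03d}] Processing {src.as_posix()} -> {dst.as_posix()}"


def run_pipeline(batch_ids: list[int]) -> list[str]:
    # Sequential setup: the pre-processing stages run one after another.
    log = [f"Stage: {stage}" for stage in ("validate", "clean", "normalize")]

    # Parallel processing: each batch is handled by a worker thread.
    # pool.map returns results in input order, so the log stays deterministic.
    with ThreadPoolExecutor() as pool:
        log.extend(pool.map(process_batch, batch_ids))

    # Sequential summary: runs only after every batch has finished.
    log.append("All batches processed successfully")
    return log


if __name__ == "__main__":
    for line in run_pipeline([1, 2]):
        print(line)
```

Generating the paths from a batch number avoids hand-writing one pair of variables per batch, at the cost of assuming the `batch_NNN` naming shown in the script holds for every batch.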