# Data Pipeline

> ⚠️ This documentation is AI-generated and may contain errors.

A data processing pipeline showing how to process multiple batches in parallel.

## Script

See `/examples/data_pipeline.rsh` in the repository.

```rush
#!/usr/bin/env rush

DATASET = "user_analytics"
INPUT_DIR = "$HOME/data/raw"
OUTPUT_DIR = "$HOME/data/processed"
BATCH_SIZE = "1000"

echo "Data Processing Pipeline: $DATASET"

# Pre-processing stages (run sequentially, in order)
for stage in validate clean normalize {
    STAGE_NAME = "$stage"
    echo "  Stage: $STAGE_NAME"
}

# Process batches in parallel
BATCH_1_IN = "$INPUT_DIR/batch_001.csv"
BATCH_1_OUT = "$OUTPUT_DIR/batch_001.json"
# ... (define other batches)

parallel {
    run {
        echo "[batch_001] Processing $BATCH_1_IN -> $BATCH_1_OUT"
        echo "[batch_001] Transformed 1000 records"
    }
    run {
        echo "[batch_002] Processing $BATCH_2_IN -> $BATCH_2_OUT"
        echo "[batch_002] Transformed 1000 records"
    }
    # ... (more batches)
}

echo "All batches processed successfully"
```

## Key Concepts

- **Parallel data processing**: Process multiple batches simultaneously
- **Path construction**: Building input/output file paths from shared directory variables
- **Pipeline stages**: Sequential setup, parallel processing, sequential summary
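
## Shell Equivalent

The `parallel { run { ... } }` pattern above can be approximated in plain POSIX shell, where each batch runs as a background job and `wait` joins them before the summary line. This is a minimal sketch, not rush code: the `process_batch` function and the batch names are illustrative placeholders, and the echoes stand in for real transform work.

```shell
#!/bin/sh
# Sketch of the parallel-batch pattern in POSIX shell (not rush).
# Directories mirror the rush example; batch names are placeholders.
INPUT_DIR="${HOME}/data/raw"
OUTPUT_DIR="${HOME}/data/processed"

# Hypothetical stand-in for the real per-batch transform step.
process_batch() {
    echo "[$1] Processing ${INPUT_DIR}/$1.csv -> ${OUTPUT_DIR}/$1.json"
    echo "[$1] Transformed 1000 records"
}

# Launch each batch as a background job, then join them all with wait.
for batch in batch_001 batch_002 batch_003; do
    process_batch "$batch" &
done
wait

echo "All batches processed successfully"
```

Backgrounding with `&` gives coarse-grained parallelism with no concurrency limit; for large batch counts, a job-queue tool or a counting loop around `wait` would be needed to cap simultaneous processes.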