Question 1 of 65
A machine learning team needs to prepare a large dataset stored in Amazon S3 for training. The dataset contains millions of records with missing values, categorical variables that need encoding, and numerical features requiring normalization. The team wants a serverless, scalable solution that can handle this ETL workload efficiently. Which approach should they use?