Question

    A company has a large dataset with a mix of numeric and categorical data. To ensure fair comparisons between variables, which data transformation technique should the analyst apply?

    A Imputation Correct Answer Incorrect Answer
    B Standardization Correct Answer Incorrect Answer
    C Normalization Correct Answer Incorrect Answer
    D Encoding Correct Answer Incorrect Answer
    E Aggregation Correct Answer Incorrect Answer

    Solution

    Normalization is a data transformation technique that rescales numeric values to a common scale, often between 0 and 1, while retaining relative differences between them. This method is crucial when dealing with mixed data types, as it allows fair comparisons between numerical variables, especially when they are on different scales. Normalization helps to mitigate the influence of large values dominating smaller ones in the analysis, particularly in machine learning models. When working with mixed data, normalization ensures that each variable contributes equally to the analysis without scale bias. The other options are incorrect because: • Option 1 (Imputation) deals with missing data, not rescaling variables. • Option 2 (Standardization) adjusts for mean and variance but does not rescale to a fixed range, which may not be suitable for all models. • Option 4 (Encoding) converts categorical data to numeric but doesn’t affect numeric variable scales. • Option 5 (Aggregation) combines data points but doesn’t standardize or normalize them.

    Practice Next