Question
Which data transformation technique would be best for converting categorical variables, such as "Gender" (Male, Female), into a format usable in machine learning models?
Solution
One-hot encoding is a technique used to convert categorical variables into a numerical format, where each category is represented by a binary variable. For instance, in the "Gender" variable, one-hot encoding would create two binary columns: "Male" and "Female." Each observation will have a value of 1 in one column and 0 in the other, making the data usable in machine learning algorithms that require numerical input. One-hot encoding prevents ordinal relationships from being falsely implied, ensuring accurate representation of non-numeric data in modeling.
The other options are incorrect because:
• Option 1 (normalization) scales data but is ineffective for categorical conversion.
• Option 3 (logarithmic transformation) is used for continuous data to reduce skew, not categorical data.
• Option 4 (binning) groups continuous data into categories rather than encoding existing categories.
• Option 5 (polynomial transformation) applies to numerical features and is unrelated to categorical conversion.
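The encoding described in the solution can be illustrated with a minimal sketch in plain Python. The function name `one_hot_encode` and the sample `genders` list are assumptions made for illustration only; they do not come from the question.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row holds one 0/1 flag per category."""
    # Sorting the unique categories gives a stable, predictable column order.
    categories = sorted(set(values))
    # For every observation, mark 1 in the column matching its category, else 0.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical sample data for the "Gender" variable.
genders = ["Male", "Female", "Female", "Male"]
columns, encoded = one_hot_encode(genders)

print(columns)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, so no ordering between "Male" and "Female" is implied. In practice, library helpers such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` are commonly used for the same purpose.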