Introducing Predictive Analytics with High Cardinality Features (Part 1 of 4)
This is the first in a 4-part series in which Anders Larson and Shea Parkes discuss predictive analytics with high cardinality features. In this episode, they introduce what high cardinality features are, why you might want to work with them, and some of the basic approaches to including them in predictive models.
Most real-world data available for predictive modeling is not purely numeric. Datasets often include columns (features) of categorical data, such as product or customer identifiers and zip codes. Sometimes a categorical feature has many unique values; such a feature is called a high cardinality feature. High cardinality features can carry a strong predictive signal, but they can also be tricky to work with.
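As a rough illustration of what "many unique values" means in practice, the sketch below counts the distinct values in each column of a small table and flags the columns above a cutoff. The episode does not define a specific threshold, so the cutoff of 50 is a hypothetical choice, and the column names and synthetic rows are made up for illustration.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical cutoff: the source does not define where "high" cardinality begins.
HIGH_CARDINALITY_THRESHOLD = 50


def cardinality_by_column(rows: Sequence[Dict[str, str]]) -> Dict[str, int]:
    """Count the distinct values observed in each column."""
    distinct: Dict[str, Set[str]] = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    return {column: len(values) for column, values in distinct.items()}


def high_cardinality_columns(
    rows: Sequence[Dict[str, str]], threshold: int = HIGH_CARDINALITY_THRESHOLD
) -> List[str]:
    """Return, in sorted order, the columns whose distinct-value count exceeds the threshold."""
    counts = cardinality_by_column(rows)
    return sorted(column for column, n in counts.items() if n > threshold)


# Synthetic example data (hypothetical): 1,000 customers spread over 300 zip codes and 2 plans.
rows = [
    {
        "customer_id": f"C{i:04d}",
        "zip_code": f"{10000 + i % 300:05d}",
        "plan": ["basic", "premium"][i % 2],
    }
    for i in range(1000)
]

counts = cardinality_by_column(rows)
print(counts)  # {'customer_id': 1000, 'zip_code': 300, 'plan': 2}
print(high_cardinality_columns(rows))  # ['customer_id', 'zip_code']

# A naive one-hot encoding adds one indicator column per distinct value,
# which is one reason high cardinality features are tricky to use directly.
print(sum(counts.values()))  # 1302
```

Here `plan` is an ordinary low-cardinality categorical feature, while `customer_id` and `zip_code` would each expand into hundreds of sparse indicator columns under one-hot encoding.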