Factorization Machines for High Cardinality Features (Part 4 of 4)
Most real-world data available for predictive modeling is not purely numeric. There are often columns/features of categorical data (e.g. product or customer identifiers, zip codes). When a categorical feature has many unique values, it is called a high cardinality feature. High cardinality features can carry a lot of strong signal, but they can also be very tricky to work with.
This is the fourth episode in a 4-part series where Anders Larson and Shea Parkes discuss predictive analytics with high cardinality features. The prior episodes focused on approaches to handling individual high cardinality features, but those methods did not explicitly address feature interactions. Factorization Machines can responsibly estimate all pairwise interactions, even when multiple high cardinality features are included. With a healthy selection of high cardinality features, a well-tuned Factorization Machine can produce results that are more accurate than those of any other learning algorithm.
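To make the pairwise interaction idea concrete, here is a minimal sketch of the second-order Factorization Machine prediction equation, assuming NumPy and hypothetical variable names; it is an illustration of the general technique, not the specific tooling discussed in the episode.

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Predict with a second-order Factorization Machine.

        x  : (n,) feature vector, e.g. one-hot encoded high cardinality features
        w0 : scalar global bias
        w  : (n,) linear weights
        V  : (n, k) latent factors; the interaction weight between features
             i and j is the dot product V[i] @ V[j], so every pairwise
             interaction gets an estimate even if that exact pair of values
             rarely (or never) co-occurred in the training data.
        """
        linear = w0 + w @ x
        # O(n*k) reformulation of the sum over all pairwise interactions:
        # sum over i < j of (V[i] @ V[j]) * x[i] * x[j]
        interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
        return linear + interactions

Because the interaction weights are factored through the shared latent vectors in V rather than estimated independently for every pair of feature values, the model stays tractable and regularized even when several high cardinality features are one-hot encoded into the input.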