Robust Data Transformation and Real-Time Feature Engineering
A feature store lets teams organize features into logical groups so that they are ready for training or inference. However, a feature store is much more than a catalog: it is also a data transformation service for creating features.
Data scientists can create features through simple APIs that express complex operations, including aggregations, sliding windows, joins, and custom functions. These abstract APIs let data scientists build complex features with less dependence on data engineers.
Advanced feature stores support both batch processing and real-time feature engineering. As a result, the same abstract API can calculate features in real time from event streams (e.g., streams coming from Kafka or Amazon Kinesis).
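To make the idea of one definition serving both batch and real-time computation concrete, here is a minimal sketch in plain Python. It is not the API of any particular feature store: the `SlidingWindowSum` class, the `batch_feature` helper, and the 60-second window are all hypothetical names and values chosen for illustration. In a real deployment the streaming events would arrive from a consumer reading Kafka or Kinesis rather than from a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingWindowSum:
    """Sum of the values seen in the last `window_seconds` (hypothetical helper)."""
    window_seconds: float
    _events: deque = field(default_factory=deque)
    _total: float = 0.0

    def update(self, timestamp: float, value: float) -> float:
        """Add one event and return the current windowed sum."""
        self._events.append((timestamp, value))
        self._total += value
        # Evict events at or before the cutoff, keeping (timestamp - window, timestamp].
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total


def batch_feature(events, window_seconds):
    """Batch path: replay historical events through the same aggregator."""
    agg = SlidingWindowSum(window_seconds)
    return [agg.update(ts, value) for ts, value in events]


# Historical events as (timestamp_seconds, amount), ordered by time.
history = [(0, 10.0), (30, 5.0), (70, 2.0), (100, 1.0)]
batch_values = batch_feature(history, window_seconds=60)

# Streaming path: the same class is updated one event at a time as events arrive.
stream_agg = SlidingWindowSum(window_seconds=60)
stream_values = [stream_agg.update(ts, value) for ts, value in history]
```

Because both paths run the same aggregation logic, the batch and streaming values agree for the same sequence of events, which is the property the shared API is meant to provide.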
Write Features Once
Data scientists often create features while preparing their models for training. However, when those features need to move to the operational production pipeline, the work is handed over to data engineers, who then have to write code in Spark or Java to make it ready for production.
This process becomes much easier with feature stores. Through the feature store APIs, the same features can be reused for both training and online inference, without rewriting the code for production.
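A rough way to picture "write once" is a single registry of feature definitions that both the training path and the serving path call. The sketch below assumes a hypothetical `FEATURES` registry and made-up record fields (`amount`, `day_of_week`); a real feature store would expose its own API for this.

```python
import math

# Hypothetical registry: feature name -> function of a raw record (a dict).
FEATURES = {
    "amount_log": lambda rec: math.log1p(rec["amount"]),
    "is_weekend": lambda rec: 1 if rec["day_of_week"] in (5, 6) else 0,
}


def compute_features(record):
    """Single definition of the feature logic, shared by every caller."""
    return {name: fn(record) for name, fn in FEATURES.items()}


def build_training_set(records):
    """Offline path: apply the definitions to historical records."""
    return [compute_features(rec) for rec in records]


def build_online_vector(record):
    """Online path: apply the same definitions to one incoming request."""
    features = compute_features(record)
    return [features[name] for name in FEATURES]


training_rows = build_training_set([
    {"amount": 0.0, "day_of_week": 2},
    {"amount": 99.0, "day_of_week": 6},
])
online_vector = build_online_vector({"amount": 99.0, "day_of_week": 6})
```

Since there is only one implementation of each feature, the values served online cannot silently diverge from the values the model was trained on.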
Monitoring and Drift Detection
Alongside the feature definitions and the code that creates them, feature statistics are also captured. Advanced feature stores are even integrated with the production pipeline so that they capture statistics on the live feature vectors as well.
By comparing the statistics of the training features with those of the "live" features sent to the model, the feature store can generate a drift report. Data drift detected this way serves as an early signal of potential model drift.
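As an illustration of how such a report could be built, the sketch below compares training-time and live statistics per feature. The metric is an assumption chosen for simplicity (the shift of the live mean, measured in training standard deviations); production systems often use other measures. The function names, the sample values, and the threshold of 2.0 are all hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Statistics captured for a feature (here only mean and sample std)."""
    return {"mean": mean(values), "std": stdev(values)}


def drift_report(training_stats, live_stats, threshold=2.0):
    """Flag features whose live mean moved more than `threshold` training stds."""
    report = {}
    for name, train in training_stats.items():
        live = live_stats[name]
        # Guard against a zero std (a feature that was constant in training).
        scale = train["std"] or 1.0
        shift = abs(live["mean"] - train["mean"]) / scale
        report[name] = {"shift": shift, "drifted": shift > threshold}
    return report


# Made-up sample values, for illustration only.
training_stats = {
    "amount": summarize([10.0, 12.0, 11.0, 9.0, 13.0]),
    "age": summarize([30.0, 40.0, 35.0, 45.0, 50.0]),
}
live_stats = {
    "amount": summarize([25.0, 27.0, 26.0, 24.0, 28.0]),
    "age": summarize([32.0, 41.0, 36.0, 44.0, 49.0]),
}
report = drift_report(training_stats, live_stats)
```

In this toy data, `amount` is flagged as drifted while `age` is not, so the report points to the feature whose distribution the model is no longer seeing as it did in training.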
A feature store catalog enables engineers to share features, search for them, and collaborate. They can also evaluate features with detailed statistics and analysis to see how features correlate with data sources and models.
To learn more about feature stores, check out our Feature Store page.