What Are Feature Stores and Why Are They Critical for Scaling Data Science?

Adi Hirschtein | April 7, 2020

A feature store provides a single pane of glass for sharing all available features across the organization. When data scientists start a new project, they can go to this catalog and easily find the features they are looking for. But a feature store is not only a data layer; it is also a data transformation service that lets users manipulate raw data and store it as features ready to be used by any machine learning model. These features can then accelerate machine learning use cases by reducing duplicate work.

Some of the largest tech companies that deal extensively with AI, including Uber, Twitter, Google, Netflix, Facebook, and Airbnb, have built their own feature stores. This is a good indication to the rest of the industry of how important a feature store is as part of an efficient ML pipeline.

Calculating and cataloging features for a feature store 

Creating and then calculating offline features can take place over an extended period of time. Calculating online features, however, is much more challenging, requiring fast computation as well as fast access to the data. The data can be stored in memory or in a very fast key-value database. The process itself can be performed on various services in the cloud or on a platform such as the Iguazio Data Science Platform, which has all of these components as part of its core offering.

But first — let’s talk about access. Easy access.

Offline features are built mostly on frameworks such as Spark or SQL, and then stored in a database or as Parquet files. Online features, on the other hand, may require data access through the APIs of streaming engines such as Kafka or Kinesis, or of low-latency databases such as Redis or Cassandra.

Working with a feature store abstracts away the complex data access layer, so a data scientist looking for a feature can retrieve the data they need through a simple API instead of writing data engineering code.
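To make the idea of a simple retrieval API concrete, here is a minimal sketch in Python. It is not the Iguazio API or that of any real feature store; the class `InMemoryFeatureStore`, its `write` and `get_features` methods, and the feature names are all hypothetical, and a plain dictionary stands in for the real storage layer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InMemoryFeatureStore:
    """Hypothetical toy feature store: entity id -> {feature name -> value}."""

    _rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def write(self, entity_id: str, features: Dict[str, Any]) -> None:
        # Merge new feature values into whatever is already stored for the entity.
        self._rows.setdefault(entity_id, {}).update(features)

    def get_features(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        # Return only the requested features; unknown ones come back as None.
        row = self._rows.get(entity_id, {})
        return {name: row.get(name) for name in names}


store = InMemoryFeatureStore()

# A data engineer (or a pipeline) writes computed features once.
store.write("customer_42", {"avg_order_value_30d": 57.5, "orders_7d": 3})

# A data scientist retrieves them by name, without knowing where they live.
features = store.get_features("customer_42", ["orders_7d", "avg_order_value_30d"])
print(features)
```

The caller only names the entity and the features it wants. Whether the values come from Parquet files, a stream, or a key-value database is hidden behind `get_features`, which is the abstraction described above.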

The Benefits of Having a Feature Store:

Faster development

The feature store concept is built to abstract the data engineering layers that consume so much of data scientists’ time and provide easy access for reading and writing the best features for their models.

Smooth model deployment in production

The feature store enables a consistent feature set between the training and serving layer and enables a smoother deployment process, ensuring that the trained model indeed reflects the way things would work in production.
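One common way to keep training and serving consistent is to define each feature transformation once and call it from both paths. The sketch below assumes this approach; the function `order_features` and the sample data are hypothetical and only illustrate the principle.

```python
from typing import Dict, List


def order_features(order_amounts: List[float]) -> Dict[str, float]:
    """Single definition of the features, shared by training and serving."""
    count = len(order_amounts)
    total = sum(order_amounts)
    return {
        "order_count": float(count),
        # Guard against customers with no orders.
        "avg_order_value": total / count if count else 0.0,
    }


# Offline path: build training rows from historical data in batch.
history = {"customer_1": [20.0, 40.0], "customer_2": []}
training_rows = {cid: order_features(amounts) for cid, amounts in history.items()}

# Online path: compute the same features for a single request at serving time.
serving_row = order_features([20.0, 40.0])

# Same inputs give the same features in both paths, so the model sees
# in production exactly what it saw during training.
print(serving_row == training_rows["customer_1"])
```

If the training pipeline and the serving code each reimplemented `avg_order_value` separately, a small difference such as how empty histories are handled could silently skew predictions. Sharing one definition removes that risk.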

Increased model accuracy

The feature store catalogs additional metadata for each feature, which can help data scientists tremendously when selecting features for a new model, allowing them to focus on those that have had the greatest impact on similar existing models.

Better collaboration

Feature stores enable everyone in the company to share their work and avoid duplication. 

The ability to track lineage and address regulatory compliance

In a feature store, we can save the data lineage of a feature. This tracking information captures how the feature was generated and provides the insight and the reports needed for regulatory compliance.
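As an illustration of what a lineage record might contain, here is a minimal Python sketch. The `FeatureLineage` fields, the `register` and `lineage_report` helpers, and the table and owner names are assumptions made for this example, not the schema of any particular feature store.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class FeatureLineage:
    """Hypothetical lineage record: where a feature came from and how it was built."""

    feature_name: str
    version: int
    source_tables: Tuple[str, ...]
    transformation: str
    owner: str


# Catalog keyed by (feature name, version), so older definitions stay auditable.
catalog: Dict[Tuple[str, int], FeatureLineage] = {}


def register(record: FeatureLineage) -> None:
    catalog[(record.feature_name, record.version)] = record


def lineage_report(feature_name: str) -> List[Dict[str, Any]]:
    """Return every recorded version of a feature, oldest first."""
    records = [r for (name, _), r in catalog.items() if name == feature_name]
    return [asdict(r) for r in sorted(records, key=lambda r: r.version)]


register(FeatureLineage("avg_order_value_30d", 1, ("orders",),
                        "mean(amount) over 30 days", "team-a"))
register(FeatureLineage("avg_order_value_30d", 2, ("orders", "refunds"),
                        "mean(amount - refund) over 30 days", "team-a"))

report = lineage_report("avg_order_value_30d")
for entry in report:
    print(entry["version"], entry["source_tables"], entry["transformation"])
```

Keeping each version as an immutable record means an auditor can see which sources and logic produced the feature values a given model was trained on.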

Feature stores and MLOps

Feature stores enable data scientists to reuse features instead of rebuilding them again and again for different models, saving valuable time and effort. Feature stores automate this feature engineering process, which is an important part of the MLOps concept.

Given the growing number of AI projects and the complexity associated with bringing ML projects to production, the industry needs a way to standardize and automate the core of feature engineering. For more information about feature stores, read the full article on Towards Data Science.