
Simplifying Machine Learning with the Qwak Feature Store

In the evolving landscape of machine learning, the management of features is crucial for developing scalable and reproducible models.

Hudson Buzby

Solutions Architect at JFrog ML

August 6, 2024


Feature stores are essential tools for streamlining the development of machine learning models. They act as centralized repositories where you store, manage, and serve features for both training and inference, so the same feature definitions can be reused across models and pipelines.

What is the Qwak Feature Store?

The Qwak Feature Store manages features for many machine learning models in a single, unified platform. It keeps a complete history of the features used in training, and it is particularly useful for online serving use cases where real-time inference is critical. Qwak’s feature store removes training-serving skew by storing the same feature data both in an offline store, used mainly for training, and in a scalable, low-latency online store used for serving. The online store keeps the latest state of each feature set, enabling low-latency feature serving, which is essential for models whose decisions depend heavily on how recent the data is.
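To make the offline/online split concrete, here is a toy, in-memory sketch in plain Python. It is not Qwak's implementation or SDK; the class and method names (`FeatureStoreSketch`, `ingest`, `latest`) are hypothetical. Every ingested row is appended to an offline history, while the online store keeps only the newest row per entity key.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class FeatureStoreSketch:
    """Toy dual-store model (hypothetical, not Qwak's implementation)."""

    # Offline store: append-only history of every feature row, used for training.
    offline: List[Dict[str, Any]] = field(default_factory=list)
    # Online store: only the newest row per entity key, used for low-latency serving.
    online: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def ingest(self, key: str, ts: datetime, features: Dict[str, Any]) -> None:
        row = {"key": key, "ts": ts, **features}
        # Every row is kept in the offline history...
        self.offline.append(row)
        # ...but the online value is replaced only by a row that is at least as recent.
        current = self.online.get(key)
        if current is None or ts >= current["ts"]:
            self.online[key] = row

    def latest(self, key: str) -> Optional[Dict[str, Any]]:
        # Serving reads are a single key lookup against the latest state.
        return self.online.get(key)


store = FeatureStoreSketch()
store.ingest("user_1", datetime(2024, 8, 1), {"purchases_7d": 3})
store.ingest("user_1", datetime(2024, 8, 5), {"purchases_7d": 5})
print(len(store.offline))                      # 2
print(store.latest("user_1")["purchases_7d"])  # 5
```

Because both stores are fed by the same ingestion path, the values a model sees at serving time are computed the same way as the history it was trained on.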


Configuring the Feature Ingestion Pipeline

We'll walk through configuring a batch feature ingestion pipeline on the Qwak platform. Qwak supports a large number of batch and streaming data sources; in this example we’ll ingest data from Snowflake into the Qwak Feature Store. Here’s a step-by-step overview:

Data Source Configuration:
- Data sources are data connectors or APIs that link to your existing data repositories, such as object storage (S3, Google Cloud Storage), relational databases, cloud data warehouses (BigQuery, Redshift, Snowflake), and streaming sources like Kafka.
- For our example, we'll configure a Snowflake data source. This involves adding a name, description, and metadata, as well as specifying a timestamp column and credentials.
Feature Ingestion Pipeline:
- Next, we'll create a feature ingestion pipeline. In Qwak, each feature set maps one-to-one to an ingestion pipeline. We can configure these pipelines manually through the UI or programmatically via Qwak’s SDK/APIs.
- We'll define metadata, a key column (a unique identifier like user ID), scheduling policies, and a backfill start date to manage historical data ingestion.
- After defining the pipeline, we select the columns and transformations needed for our features. Qwak’s UI supports simple aggregations and transformations; more complex logic can be expressed in code.
Data Flow Management:
- Once configured, data flows into both the offline and online stores managed by Qwak. The platform keeps the two stores consistent and serves features from the online store with low latency, which is essential for real-time inference.
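The configuration steps above can be sketched as plain Python data structures. These dataclasses are hypothetical stand-ins for the fields the walkthrough mentions (name, description, timestamp column, credentials, key column, schedule, backfill start date, transformation); they are not Qwak's SDK classes. The helper shows one way a backfill start date and a fixed schedule interval could be turned into historical batch windows, with a simple interval standing in for a real scheduling policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class SnowflakeSourceConfig:
    # Hypothetical fields mirroring the UI form; not Qwak's SDK classes.
    name: str
    description: str
    table: str
    timestamp_column: str    # column used to order and window incoming rows
    credentials_secret: str  # name of a stored secret, never the raw password


@dataclass(frozen=True)
class FeatureSetConfig:
    # One feature set corresponds to exactly one ingestion pipeline.
    name: str
    source: SnowflakeSourceConfig
    key: str                   # entity key column, e.g. user_id
    schedule_every: timedelta  # simplified stand-in for a scheduling policy
    backfill_start: datetime   # earliest event time to ingest historically
    transform_sql: str         # column selection and simple transformations


def backfill_windows(fs: FeatureSetConfig, now: datetime) -> List[Tuple[datetime, datetime]]:
    """Split [backfill_start, now) into consecutive schedule-sized batch windows."""
    if fs.schedule_every <= timedelta(0):
        raise ValueError("schedule_every must be positive")
    windows = []
    start = fs.backfill_start
    while start < now:
        end = min(start + fs.schedule_every, now)
        windows.append((start, end))
        start = end
    return windows


source = SnowflakeSourceConfig(
    name="snowflake_transactions",
    description="Card transactions replicated to Snowflake",
    table="TRANSACTIONS",
    timestamp_column="UPDATED_AT",
    credentials_secret="snowflake-credentials",
)
user_features = FeatureSetConfig(
    name="user-transaction-features",
    source=source,
    key="user_id",
    schedule_every=timedelta(days=1),
    backfill_start=datetime(2024, 8, 1),
    transform_sql="SELECT user_id, amount, amount > 1000 AS is_large_txn, updated_at FROM transactions",
)
# Three windows: two full days plus a partial half-day window ending at `now`.
for window in backfill_windows(user_features, now=datetime(2024, 8, 3, 12)):
    print(window)
```

Keeping credentials as a reference to a stored secret, rather than an inline value, is a design choice of this sketch; the walkthrough only says credentials are specified.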

Using Features for Training and Inference

With the data now in the Qwak Feature Store, we can use it to train and serve our machine learning models:

Training with Offline Data:
- Using Qwak’s offline client, we fetch feature data for training. This involves specifying the feature set and columns, then issuing a query to retrieve the data frame for model training.
- Multiple feature sets can be joined as long as they share a logical key, allowing flexibility in combining data sources.
Real-time Inference with Online Data:
- Qwak provides two methods for online feature retrieval: native integration within Qwak’s inference deployment service and manual retrieval using the online client.
- With the native integration, Qwak automatically fetches the latest feature values at request time, based on logical identifiers such as user IDs.
- Manual retrieval queries the online feature store much like offline retrieval does, but returns the most up-to-date feature values for real-time inference.
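The two retrieval paths can be illustrated with a small, self-contained sketch. It does not use Qwak's offline or online clients; the functions (`as_of`, `build_training_frame`, `predict_with_online_features`) and the in-memory data layout are hypothetical. For training, it joins several feature sets on their shared key using a point-in-time lookup, a common technique for avoiding label leakage that is assumed here for illustration. For serving, the request carries only the entity key and the latest values come from the online store.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout: feature set = {entity key: [(event time, feature values), ...]},
# with each key's rows sorted by event time.
FeatureSet = Dict[str, List[Tuple[datetime, Dict[str, Any]]]]


def as_of(rows: List[Tuple[datetime, Dict[str, Any]]], ts: datetime) -> Dict[str, Any]:
    """Return the newest feature values recorded at or before ts (point-in-time lookup)."""
    times = [t for t, _ in rows]
    i = bisect_right(times, ts)
    return rows[i - 1][1] if i else {}


def build_training_frame(
    labels: List[Tuple[str, datetime, int]], *feature_sets: FeatureSet
) -> List[Dict[str, Any]]:
    """Join several feature sets on their shared key, as of each label's timestamp."""
    frame = []
    for key, ts, label in labels:
        row: Dict[str, Any] = {"key": key, "label": label}
        for fs in feature_sets:
            # Columns with the same name in a later feature set overwrite earlier ones.
            row.update(as_of(fs.get(key, []), ts))
        frame.append(row)
    return frame


def predict_with_online_features(
    request: Dict[str, Any],
    online_store: Dict[str, Dict[str, Any]],
    model: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Serving path: the request carries only the entity key; features come from the online store."""
    features = online_store.get(request["user_id"], {})
    return model(features)


txn_features: FeatureSet = {
    "u1": [(datetime(2024, 8, 1), {"total_amount": 100.0}),
           (datetime(2024, 8, 3), {"total_amount": 250.0})],
}
profile_features: FeatureSet = {"u1": [(datetime(2024, 7, 1), {"account_age_days": 400})]}
labels = [("u1", datetime(2024, 8, 2), 0), ("u1", datetime(2024, 8, 4), 1)]

frame = build_training_frame(labels, txn_features, profile_features)
print(frame[0]["total_amount"], frame[1]["total_amount"])  # 100.0 250.0

online_store = {"u1": {"total_amount": 250.0, "account_age_days": 400}}
print(predict_with_online_features({"user_id": "u1"}, online_store,
                                   lambda f: f["total_amount"] > 200))  # True
```

The first training row gets 100.0 rather than 250.0 because the 250.0 value was only recorded after that label's timestamp.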

Advanced Features and Monitoring

Qwak also offers several advanced capabilities:

Distribution Views:
- Get a quick view of how each feature's values are distributed, so you can understand what the feature looks like and what it is composed of.
Feature Monitoring:
- Manage drift monitoring by defining calculation functions, baselines, thresholds, and alert mechanisms (e.g., Slack, PagerDuty).
Feature Lineage and Glossary:
- Track the lineage of features from data sources to models and explore an organizational-level glossary of available features.
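A drift monitor built from the four pieces listed under Feature Monitoring (calculation function, baseline, threshold, alert mechanism) might look like the following sketch. The `DriftMonitor` class is hypothetical and uses relative change in a summary statistic as its drift measure; Qwak's monitoring may compute drift differently, and the `alert` callback only stands in for a Slack or PagerDuty integration.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class DriftMonitor:
    """Hypothetical drift check: compare a summary statistic against a baseline."""

    feature: str
    calculation: Callable[[Sequence[float]], float]  # e.g. statistics.mean or median
    baseline: float   # the calculation's value on reference (e.g. training) data
    threshold: float  # maximum allowed relative change, e.g. 0.2 for 20%
    alert: Callable[[str], None]  # stand-in for a Slack or PagerDuty notifier

    def check(self, values: Sequence[float]) -> bool:
        """Fire the alert and return True if the current window exceeds the threshold."""
        current = self.calculation(values)
        # Relative change; falls back to absolute change when the baseline is zero.
        scale = abs(self.baseline) or 1.0
        change = abs(current - self.baseline) / scale
        if change > self.threshold:
            name = getattr(self.calculation, "__name__", "statistic")
            self.alert(
                f"{self.feature}: {name}={current:.2f} vs baseline "
                f"{self.baseline:.2f} ({change:.0%} change)"
            )
            return True
        return False


sent: List[str] = []
monitor = DriftMonitor("total_amount", mean, baseline=100.0, threshold=0.2, alert=sent.append)
print(monitor.check([95.0, 105.0, 110.0]))   # False: mean is about 103.3, a 3% change
print(monitor.check([150.0, 170.0, 160.0]))  # True: mean is 160, a 60% change
print(sent[0])  # total_amount: mean=160.00 vs baseline 100.00 (60% change)
```

A shift in the mean catches changes in location but not in shape, so distribution-based measures are a common complement to a check like this.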

Conclusion

The Qwak Feature Store is a powerful tool for managing features, ensuring efficient and accurate machine learning workflows. By centralizing feature management and providing robust tools for both offline and online use cases, Qwak helps streamline the development, deployment, and monitoring of machine learning models.

If you're interested in learning more about the Qwak Feature Store, check out our documentation or request a live demo to explore how it can fit your use cases.