Building a Full Blown ML App with Snowflake Data

In today's data-driven environment, the need for streamlined ML engineering solutions is paramount. The integration of platforms such as Qwak, specifically designed to reduce the complexities associated with ML model lifecycle management, and Snowflake, a leading cloud-based data warehouse platform, allows businesses to effectively leverage their data assets for machine learning endeavors.
Ran Romano
Co-founder & CPO at Qwak
August 14, 2023

This blog dives deep into the nuances of building an ML model managed by Qwak, utilizing data housed in a customer's Snowflake environment. Readers will gain insights into best practices, seamless integration techniques, and the benefits of combining these two powerhouse platforms.

In the first example, we will detail a full-fledged batch execution pipeline which utilizes a few powerful features Qwak offers, such as model versioning and tagging, conditional execution, and more.

Qwak model build

Qwak is an ML engineering platform that simplifies the process of building, deploying, and monitoring machine learning models, bridging the gap between data scientists and engineers.

For the sake of this example, we will use the churn model detailed in this Qwak example.

Simply run the following command in the CLI to create a trained model instance (a.k.a. a Qwak build) of the churn model:
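A minimal sketch of that command, assuming the churn model from the linked example lives in a local `main` directory and is registered under the model ID `churn_model` (both names are illustrative, and the exact CLI flags may differ between SDK versions):

```bash
# Trigger a remote Qwak build of the churn model.
# The model ID and the path are assumptions; use the ones from the linked example.
qwak models build --model-id churn_model ./main
```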

For an in-depth explanation of what a Qwak model build is and how it helps ML teams streamline their path from code to production, see our docs explaining the process or watch this video.

Batch-based execution using Snowflake data

The example script consists of three steps:

  1. Fetching the inference data from a Snowflake table
  2. Programmatically querying Qwak builds for the latest and best model version to run a batch prediction against
  3. Executing a parallel batch execution job against that model instance (the build ID) with the Snowflake data

First, let's look at the full example of a working pipeline, and then we'll break it down step by step with a detailed explanation of each section:
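Since the original code listing isn't reproduced here, the following is a minimal sketch of what such a pipeline can look like. It assumes the `snowflake-connector-python` package (with the pandas extra) plus the Qwak SDK and `qwak-inference` client; the model ID `churn_model`, the table and column names, and the exact Qwak client method names are assumptions to adapt to your own environment:

```python
import snowflake.connector
from qwak import QwakClient                      # assumed import path for the Qwak SDK client
from qwak_inference import BatchInferenceClient  # assumed import path for the batch inference client

# --- Step 1: fetch the inference data from a Snowflake table ---
conn = snowflake.connector.connect(
    user="<USER>",
    password="<PASSWORD>",
    account="<ACCOUNT>",
    warehouse="<WAREHOUSE>",
    database="<DATABASE>",
    schema="<SCHEMA>",
)
query = """
    SELECT *
    FROM customers                      -- illustrative table name
    WHERE batch_date = '2023-08-14'     -- usually parameterized by the orchestrator
"""
df = conn.cursor().execute(query).fetch_pandas_all()

# --- Step 2: pick the build to run the batch prediction against ---
# The method and attribute names below are assumptions; the idea is to list the
# churn model's builds and select the one with the best logged metric (e.g. f1).
qwak_client = QwakClient()
builds = qwak_client.get_builds_by_model_id("churn_model")
best_build = max(builds, key=lambda b: b.metrics.get("f1", 0.0))

# --- Step 3: run a parallel batch execution job against that build ---
batch_client = BatchInferenceClient(model_id="churn_model")
predictions = batch_client.run(
    df,
    batch_size=1000,               # rows per task
    executors=5,                   # tasks running in parallel
    build_id=best_build.build_id,
)
print(predictions.head())
```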

Breakdown

Querying Snowflake

The first phase is not Qwak-specific at all. Here we connect to Snowflake and query it for the inference data relevant to the current batch execution. Notice that in most cases at least the WHERE clause will be parameterized, especially if the script is scheduled and managed by an orchestration tool.
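As a sketch of just this step (same assumed connection details and illustrative `customers` table as above), with the date passed in as a bind parameter so an orchestrator can supply it at run time:

```python
import snowflake.connector

def fetch_inference_data(batch_date: str):
    """Pull the rows that belong to the current batch run from Snowflake."""
    conn = snowflake.connector.connect(
        user="<USER>", password="<PASSWORD>", account="<ACCOUNT>",
        warehouse="<WAREHOUSE>", database="<DATABASE>", schema="<SCHEMA>",
    )
    # Bind the date instead of formatting it into the string, so the value can
    # come from the scheduler (Airflow, Prefect, etc.) at execution time.
    query = "SELECT * FROM customers WHERE batch_date = %s"  # illustrative table/column
    return conn.cursor().execute(query, (batch_date,)).fetch_pandas_all()
```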

Fetching the relevant build ID

Every Qwak build can log its own list of parameters and metrics. The interface is as simple as calling `qwak.log_metric("f1", 0.9)`.

These metrics can then be viewed in a list for comparison, or individually for each build.

In our case, we fetch the build ID we wish to run a batch execution against programmatically using the QwakClient utility.
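As a sketch of that lookup (the same assumed QwakClient methods as in the full example above, not confirmed SDK signatures):

```python
from qwak import QwakClient  # assumed import path

client = QwakClient()

# List the churn model's builds and pick the one with the highest logged "f1" metric.
# Method and attribute names are assumptions; adapt them to the build-query
# calls exposed by your version of the Qwak SDK.
builds = client.get_builds_by_model_id("churn_model")
best_build = max(builds, key=lambda b: b.metrics.get("f1", 0.0))
build_id = best_build.build_id
```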

There are many useful patterns for this approach. For example, some of our customers do a "Model Per Dimension" type of execution, using this mechanism to train a model per customer (same model, different datasets) and tag it with the customer name. During batch inference, they then programmatically fetch the build relevant for each customer and perform inference against it:
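A sketch of what that per-customer lookup could look like; the `get_builds_by_tags` method name and the tag format are assumptions standing in for however your Qwak SDK version exposes tag-based build queries:

```python
from qwak import QwakClient  # assumed import path

client = QwakClient()

def build_id_for_customer(customer_name: str) -> str:
    # Each customer's build was tagged with the customer name at training time.
    builds = client.get_builds_by_tags(model_id="churn_model", tags=[customer_name])
    # Take the most recent build for that customer (attribute names are assumptions).
    return sorted(builds, key=lambda b: b.created_at)[-1].build_id
```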

Batch execution

The parallelism, that is, how many tasks run in parallel, is controlled by the executors parameter. For example, if the input data is split into 10 tasks and the executors parameter is set to 5, Qwak runs 5 of those tasks in parallel at any given time.
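As a sketch (same assumed `qwak-inference` client as above), those knobs map to arguments on the run call:

```python
from qwak_inference import BatchInferenceClient  # assumed import path

batch_client = BatchInferenceClient(model_id="churn_model")
predictions = batch_client.run(
    df,                 # the DataFrame fetched from Snowflake
    batch_size=1000,    # rows per task; e.g. 10,000 input rows yield 10 tasks
    executors=5,        # at most 5 of those tasks run in parallel
    build_id=build_id,  # the build selected in the previous step
)
```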

The batch execution mechanism is powerful, with many options and toggles for running clean and efficient inference pipelines. You can learn more about it here.

Summing up

In this post, we showed how to create an ML application that runs a batch execution on top of your Snowflake data. Note that the example above can be orchestrated by any orchestration tool - Airflow, Prefect, and many others. Check out the second blog post in this series to see a more advanced pattern: using Qwak's feature store (connected to the same Snowflake tables) to fetch data for batch executions.

Chat with us to see the platform live and discover how we can help simplify your journey deploying AI in production.
