Building a Full Blown ML App with Snowflake Data
In today's data-driven environment, the need for streamlined ML engineering solutions is paramount. Integrating Qwak, a platform designed specifically to reduce the complexities of ML model lifecycle management, with the Snowflake Data Cloud, a global network where organizations mobilize their data and apps, put AI to work, and collaborate across teams, lets businesses effectively leverage their data assets for machine learning.
This blog dives deep into the nuances of building an ML model managed by Qwak, utilizing data housed in a customer's Snowflake environment. Readers will gain insights into best practices, seamless integration techniques, and the benefits of combining these two powerhouse platforms.
In the first example, we will detail a full-fledged batch execution pipeline which utilizes a few powerful features Qwak offers, such as model versioning and tagging, conditional execution, and more.
Qwak model build
Qwak is an ML engineering platform that simplifies the process of building, deploying, and monitoring machine learning models, bridging the gap between data scientists and engineers.
For the sake of this example we will use the churn model detailed in this Qwak example.
Just run the following command in the CLI to create a trained model instance (aka a Qwak build) of the churn model:
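As a rough sketch, the invocation looks like the following; the model ID and project path follow the churn example's conventions and may differ slightly in the linked repository:

```bash
# Build (train) the churn model from the example project's directory
qwak models build --model-id churn_model .
```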
Batch based execution using Snowflake data
The example script consists of three steps:
- Fetching data for inference from a Snowflake table
- Programmatically querying Qwak builds for the latest and best model version to execute a batch prediction against.
- Executing a parallel batch execution job against that model instance (the build ID) with the Snowflake data
First, let's look at the full example of a working pipeline, and then we'll break it down step by step with a detailed explanation of each section:
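The following is a minimal sketch of such a pipeline. The Snowflake connection details, table and column names, the model ID ("churn_model"), and the exact QwakClient and BatchInferenceClient call signatures are illustrative assumptions based on the example churn model, and may need adapting to your environment and SDK version:

```python
import os

import snowflake.connector
from qwak import QwakClient
from qwak_inference import BatchInferenceClient

MODEL_ID = "churn_model"  # the example churn model's ID (adjust to your own)

# Step 1: fetch the inference data from a Snowflake table.
# Credentials come from environment variables; the table and WHERE clause
# are placeholders for your own inference dataset.
connection = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM CHURN_FEATURES WHERE BATCH_DATE = CURRENT_DATE()")
inference_df = cursor.fetch_pandas_all()

# Step 2: programmatically resolve which build to run against.
# Here we simply take the latest build of the model; the exact QwakClient
# method may differ between SDK versions.
qwak_client = QwakClient()
build_id = qwak_client.get_latest_build(model_id=MODEL_ID)

# Step 3: run a parallel batch execution job against that build.
batch_client = BatchInferenceClient(model_id=MODEL_ID)
predictions_df = batch_client.run(
    inference_df,
    batch_size=1000,   # rows handled by each task
    executors=5,       # tasks running concurrently
    build_id=build_id,
)
print(predictions_df.head())
```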
Breakdown
Querying Snowflake
The first phase is not Qwak specific at all. Here we connect to Snowflake and query for the inference data relevant to the current batch execution. Notice that in most cases at least the WHERE clause will be parameterized, especially if the script is scheduled and managed by an orchestration tool.
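As a sketch, the same query with a parameterized WHERE clause might look like this (connection details and the `run_date` value are placeholders that an orchestration tool would typically supply):

```python
import snowflake.connector

connection = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

# The run date would normally be injected by the orchestration tool
query = "SELECT * FROM CHURN_FEATURES WHERE BATCH_DATE = %(run_date)s"
cursor = connection.cursor()
cursor.execute(query, {"run_date": "2024-01-01"})  # placeholder run date
inference_df = cursor.fetch_pandas_all()
```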
Fetching the relevant build ID
Every Qwak build can log its own list of parameters and metrics. The interface is as simple as calling `qwak.log_metric("f1", 0.9)`, and the logged values can be viewed side by side across builds for comparison, or per individual build.
In our case, we fetch the build ID we wish to run a batch execution against programmatically using the QwakClient utility.
There are many useful patterns for this approach. For example, some of our customers do a “Model Per Dimension” type of execution, using this mechanism to train a model per customer (same model, different datasets), tag it with the customer name, and then, during batch inference, programmatically fetch the build relevant to each customer and perform inference against it:
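A sketch of resolving a build ID with QwakClient is shown below. Taking the latest build is the simplest case; the tag-filtering call for the per-customer pattern is an assumption about the client's method names, so check the QwakClient reference for the exact API:

```python
from qwak import QwakClient

client = QwakClient()

# Simplest case: take the latest build of the churn model
build_id = client.get_latest_build(model_id="churn_model")

# "Model per dimension" pattern (method and tag names below are assumptions):
# pick the build that was tagged with a given customer's name at training time.
# customer_builds = client.get_builds_by_tags(model_id="churn_model",
#                                             tags=["customer_acme"])
# build_id = customer_builds[0].build_id
```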
Batch execution
The parallelism, that is, how many tasks run concurrently, is controlled by the executors parameter. For example, if the input data is split into 10 tasks (based on the batch size) and executors is set to 5, Qwak runs 5 tasks at a time until all 10 have completed.
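As a sketch (parameter names follow the qwak_inference client used above and may vary by SDK version), the call controlling this behavior looks roughly like:

```python
from qwak_inference import BatchInferenceClient

batch_client = BatchInferenceClient(model_id="churn_model")
predictions_df = batch_client.run(
    inference_df,       # the DataFrame fetched from Snowflake
    batch_size=1000,    # how many rows each task processes
    executors=5,        # how many tasks run in parallel at any given moment
    build_id=build_id,  # the build resolved via QwakClient
)
```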
The batch execution mechanism is powerful, with many options and toggles for running clean and efficient inference pipelines. You can learn more about it here.
Summing up
In this post, we showed how to create an ML application that runs a batch execution on top of your Snowflake data. Note that the example above can be orchestrated by any orchestration tool, such as Airflow, Prefect, and many others. Check out the second blog post in this series for a more advanced pattern that uses Qwak’s feature store (connected to the same Snowflake tables) to fetch data for batch executions.