Supporting streaming deployments with Qwak

Qwak now supports deploying machine learning (ML) models with an event-driven streaming architecture, using Apache Kafka for high-throughput predictions.
Pavel Klushin
Head of Solution Architecture at Qwak
November 22, 2021

This new capability allows data scientists to deploy ML models with a single click as an endpoint that consumes data as a stream and outputs predictions as a stream.

Real-time ML inference at scale has become an essential part of modern applications. Although we started our deployment service with support for real-time predictions served by a web server, we see high demand among our customers for streaming-based ML predictions.
Streaming inference is useful in the following cases:

  • When inference requests should be triggered by an existing stream of messages
  • When you would like to decouple the caller from the model
  • When you need to handle prediction service failures caused by high prediction traffic

How it works

Once a model is deployed to Qwak using the Streaming option, the deployed model is triggered whenever the consumer topic receives features/inference requests, and it then pushes the prediction to the producer topic.
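Conceptually, the streaming deployment behaves like the consume → predict → produce loop sketched below. This is only an illustration of the pattern, not Qwak's internal implementation; the topic names and bootstrap server match the CLI example further down, while the JSON payload format and the placeholder predict() function are assumptions.

# Illustrative sketch of the consume -> predict -> produce loop (not Qwak internals).
# Topic names, bootstrap server, JSON payloads, and predict() are assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer


def predict(features):
    # Stand-in for your trained model's prediction logic (assumption).
    return {"score": sum(features.get("values", []))}


consumer = KafkaConsumer(
    "in-topic",                                    # topic carrying inference requests
    bootstrap_servers="kafka-bootstrap-server.svc.cluster.local",
    group_id="consumer-group-example",
    auto_offset_reset="latest",
    value_deserializer=lambda v: json.loads(v),
)

producer = KafkaProducer(
    bootstrap_servers="kafka-bootstrap-server.svc.cluster.local",
    compression_type="gzip",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:                           # blocks until a request arrives
    prediction = predict(message.value)            # run the model on the features
    producer.send("out-topic", value=prediction)   # publish the prediction downstream

Because the caller only writes to one topic and reads from another, it stays fully decoupled from the model, and Kafka buffers traffic spikes instead of overloading the prediction service.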



Once the model is deployed, you can track its service health metrics and be alerted when they cross defined thresholds, such as error percentage, average throughput, consumed messages, consumer lag, processing lag, and errors over time.


Getting started with Qwak Streaming deployment

Using the Qwak Management Console


Choose the number of pods and the CPU/memory size of each pod, then add the bootstrap server address and the consumer/producer topic names.



Qwak CLI command


qwak models deploy stream \
  --model-id "demo_stream" \
  --build-id "my_build_id" \
  --consumer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
  --consumer-topic "in-topic" \
  --consumer-group "consumer-group-example" \
  --consumer-auto-offset-reset latest \
  --consumer-timeout 60000 \
  --producer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
  --producer-topic "out-topic" \
  --producer-compression-type gzip \
  --workers 2
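With the deployment above live, any Kafka client can act as the caller: it publishes feature payloads to the consumer topic and reads predictions back from the producer topic. The snippet below is a rough sketch of such a caller, assuming JSON messages, the kafka-python client, and a made-up feature payload; adapt it to your own schema.

# Rough sketch of a caller, assuming JSON messages and the kafka-python client.
# The feature payload below is hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer

bootstrap = "kafka-bootstrap-server.svc.cluster.local"

# Send one inference request to the topic the deployed model consumes from.
producer = KafkaProducer(
    bootstrap_servers=bootstrap,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("in-topic", value={"user_id": 123, "values": [0.4, 1.7, 2.2]})
producer.flush()

# Read the prediction from the topic the deployed model produces to.
consumer = KafkaConsumer(
    "out-topic",
    bootstrap_servers=bootstrap,
    auto_offset_reset="latest",
    value_deserializer=lambda v: json.loads(v),
)
for message in consumer:
    print("prediction:", message.value)
    break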

Qwak streaming deployment is perfect for event-based predictions that require high throughput, low latency, and fault-tolerant environments.

Get started for free today

Chat with us to see the platform live and discover how we can help simplify your journey deploying AI in production.
