Data streaming company Confluent just hosted the first Kafka Summit in Asia in Bengaluru, India. The event saw strong participation from the Kafka community – over 30% of the global community is from the region – and included several customer and partner sessions.
In his keynote, Jay Kreps, CEO and co-founder of the company, shared his vision of using Confluent to build universal data products that support both the operational and analytical sides of data. To that end, he and his teammates introduced several innovations to the Confluent ecosystem, including a new feature that makes it easier to run AI workloads in real time.
The offering, Kreps said, will spare developers the complexity of dealing with a variety of tools and languages as they try to train and run inference on AI models with real-time data. In a conversation with VentureBeat, the company's CPO Shaun Clowes delved further into these offerings and the company's approach to the age of modern AI.
Confluent's Kafka story
Over a decade ago, companies relied heavily on batch data for analytical workloads. The approach worked, but it meant understanding and adding value only to information up to a certain point – not the most current information.
To address this gap, a number of open-source technologies were developed to enable the movement, management and processing of data in real time, including Apache Kafka.
Today, Apache Kafka is the preferred choice for streaming data feeds across thousands of organizations.
Confluent, under the leadership of Kreps, one of the original creators of the open-source platform, has built commercial offerings (both self-managed and fully managed) on top of it.
However, that is only part of the puzzle. Last year, the data streaming player also acquired Immerok, a leading contributor to the Apache Flink project, to process (filter, merge and enrich) data streams in transit for downstream applications.
Now, at Kafka Summit, the company introduced AI model inference in its cloud-native offering for Apache Flink, simplifying one of the most targeted applications of streaming data: real-time AI and machine learning.
“Kafka was designed to enable all of these different systems to work together in real time and enable really amazing experiences,” Clowes explained. “AI has just added fuel to that fire. For example, if you are using an LLM, it will just make something up and answer if it has to. So, basically, it will keep talking, whether or not what it says is true. When you call the AI, the quality of its response almost always depends on the accuracy and freshness of the data. This has always been true for traditional machine learning, and it is especially true for modern ML.”
Previously, to invoke AI with streaming data, teams using Flink had to write code and wire together multiple tools to handle customization across models and data processing pipelines. With AI model inference, Confluent makes this “very interchangeable and composable,” allowing them to use simple SQL statements within the platform to make calls to AI engines, including those from OpenAI, AWS SageMaker, GCP Vertex and Microsoft Azure.
“You could already use Flink to build the RAG stack, but you would have to do it using code. You would have to write SQL statements, but then you would have to use a custom function to call out to some model and get the embeddings or inference back. This, on the other hand, just makes it super pluggable. So, without changing the code, you can just call out to any embeddings or generation model,” the CPO said.
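As an illustration, here is a minimal sketch of what that pluggability can look like in Flink SQL, assuming the CREATE MODEL statement and ML_PREDICT table function Confluent exposes in its cloud offering; the model name, column names and connection option here are hypothetical:

```sql
-- Register an externally hosted model once. The option keys follow the
-- pattern Confluent documents for CREATE MODEL, but the model name, schema
-- and connection name are hypothetical.
CREATE MODEL review_sentiment
INPUT (review_text STRING)
OUTPUT (sentiment STRING)
WITH (
  'provider' = 'openai',
  'task' = 'classification',
  'openai.connection' = 'my-openai-connection'
);

-- Invoke it over a stream with plain SQL: no custom function required.
SELECT id, sentiment
FROM product_reviews,
     LATERAL TABLE(ML_PREDICT('review_sentiment', review_text));
```

Because the model is a named, registered object, swapping one provider for another becomes a matter of re-registering it against a different connection; the query itself stays unchanged, which is the composability Clowes describes.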
Flexibility and strength
The company has chosen this plug-and-play approach because it wants to give users the flexibility to choose the option they need, depending on their use case. Not to mention, the performance of these models keeps evolving over time, with no single model being the “winner or loser.” This means a user can start with Model A and then move to Model B as it improves, without changing the underlying data pipeline.
“In this case, you essentially have two Flink jobs. One Flink job is listening to data about customer data, and that model generates an embedding from the document fragment and stores it in a vector database. Now you have a vector database with the latest contextual information. Then, on the other side, you have a request for inference, like a customer asking a question. So you take the question from the Flink job and attach it to the documents retrieved using the embeddings. And that’s it. You call the chosen LLM and pass the data in the response,” Clowes noted.
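Concretely, that two-job pattern might look something like the following sketch in Flink SQL. It assumes the ML_PREDICT function shown earlier plus two registered models (one for embeddings, one for generation); the table names and the FETCH_CONTEXT retrieval function are hypothetical stand-ins for pieces a real pipeline would supply:

```sql
-- Job 1: keep the vector store fresh. As document fragments arrive, generate
-- an embedding for each and sink it to a table backed by a vector database.
-- All table and model names here are hypothetical.
INSERT INTO vector_store
SELECT doc_id, chunk_text, embedding
FROM document_chunks,
     LATERAL TABLE(ML_PREDICT('embedding_model', chunk_text));

-- Job 2: serve inference requests. Embed the incoming question, retrieve
-- matching fragments (FETCH_CONTEXT is a hypothetical stand-in for the
-- vector-store lookup), and pass question plus context to the chosen LLM.
INSERT INTO answers
SELECT q.question_id, g.response
FROM customer_questions AS q,
     LATERAL TABLE(ML_PREDICT('embedding_model', q.question_text)) AS e(question_embedding),
     LATERAL TABLE(FETCH_CONTEXT(e.question_embedding)) AS c(context_text),
     LATERAL TABLE(ML_PREDICT('generation_model',
       CONCAT(c.context_text, ' ', q.question_text))) AS g(response);
```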
The company currently offers access to AI model inference to select customers building real-time AI apps with Flink. It plans to expand access and introduce additional features in the coming months to make running AI apps on streaming data easier, cheaper and faster. Clowes said part of this effort will also include improvements to the cloud-native offering, which will feature a gen AI assistant to help users with coding and other tasks in their respective workflows.
“With the AI assistant, you can say, 'Tell me where this topic comes from, tell me where it's going, or tell me what the infrastructure looks like,' and it gives all the answers and executes tasks. This will help our customers build really good infrastructure,” he said.
A new way to save money
In addition to these approaches for simplifying AI efforts with real-time data, Confluent also introduced Freight Clusters, a new serverless cluster type for its customers.
Clowes explained that these auto-scaling Freight clusters take advantage of cheaper but slower replication across data centers. This introduces some latency but reduces costs by up to 90%. He said this approach works in many use cases, such as processing logging/telemetry data fed into indexing or batch aggregation engines.
“With standard Kafka, you can go about as fast as electrons. Some customers expect extremely low latency of 10-20 milliseconds. With Freight clusters, however, we are talking about one to two seconds of latency. That is still fairly fast and can be a cost-effective way to collect data,” the CPO noted.
As a next step in this work, both Clowes and Kreps said Confluent wants to “make itself known” to expand its presence in the APAC region. In India alone, which already hosts the company's second-largest workforce outside the US, it plans to increase headcount by 25%.
On the product side, Clowes emphasized that the company is researching and investing in ways to improve data governance, essentially shifting it left and cataloging data to drive data self-service. These elements are still very immature in the streaming world compared to the data lake world, he said.
“We hope that over time the whole ecosystem will also invest more in governance and data products in the streaming space. I'm very confident that is going to happen. We as an industry have made greater progress on connectivity and streaming, and even stream processing, than we have on the governance side,” he said.