Over the last 10 years, the world of data tools and infrastructure has exploded. As the founder of a cloud data infrastructure company in the early days of cloud computing in 2009, and the founder of a Meetup community for the emerging data engineering community in 2013, I found myself at the center of this community even before “Data Engineer” was a job title. From that seat, I can reflect on the lessons we have learned from our recent data-tooling history and how they should guide the development of a new era of AI tools.
In tech anthropology terms, 2013 sat between the “big data” era and the “modern data stack” era. In the big data era, as the name suggests, more data was better. The claim was that data held the analytical secrets to unlock new value in an organization.
As a strategy consultant for a large internet company, I was once tasked with developing a plan to cut through a data deluge of billions of DNS queries per day and identify a magical insight that could become a new $100 million line of business for the company. Did we find that insight? Not in the relatively short time (months) we had to spend on the project. It turns out that storing large amounts of data is relatively easy, but generating big insights from it requires significant effort.
But not everyone recognized this. All anyone knew was that if your data house wasn’t in order, you couldn’t play the insights game. As a result, companies of all shapes and sizes rushed to upgrade their data stacks, resulting in an explosion in the number of data tools offered by vendors, each claiming its product was the missing piece of the truly holistic data stack that would deliver the magical insights an organization was looking for.
Note that I don’t use the term “explosion” lightly. In this year’s MAD (Machine Learning, AI and Data) Landscape of 2024, author Matt Turck notes that the number of companies selling data infrastructure tools and products in 2012 (the year he started creating his market map) was a meager 139. In this year’s edition there are 2,011, a 14.5-fold increase!
A few things happened that helped shape the current data landscape. Enterprises began moving more of their on-premises workloads to the cloud. Modern data stack (MDS) providers offered managed services as composable cloud offerings that could give customers greater reliability, greater system flexibility, and the convenience of scaling on demand.
But as companies rode the zero interest rate policy (ZIRP) wave and expanded their rosters of data tool providers, cracks started to appear in the MDS facade. Issues with system complexity (caused by many different tools), integration challenges (numerous point solutions that all need to talk to one another), and underutilized cloud services led some to question whether the promised MDS panacea would ever be delivered.
Many Fortune 500 companies had invested heavily in data infrastructure with no clear strategy for extracting value from that data (remember, insights are hard!), resulting in inflated costs without proportional value. But the trend was to collect tools anyway; you will often hear reports of multiple overlapping tools used by different teams in the same company. Many companies did this in the area of business intelligence (BI), for example installing Tableau, a second BI product, and perhaps even a third tool that essentially served the same business purpose while racking up the invoices three times as fast.
Of course, this sort of excess would ultimately end with the bursting of the ZIRP bubble. Nevertheless, the MAD landscape has not become smaller; it continues to grow. Why?
What is the new “AI stack”?
Apparently, many of the data tooling companies were so well capitalized during ZIRP that they can continue to operate despite tight corporate budgets and declining market demand for their services. That is one reason there has still not been much churn in the vendor count from startup failure or consolidation.
But the main reason is the rise of the next wave of data tools, driven by booming interest in AI. What is unique is that this new AI wave gained momentum before any real shakeout or consolidation of the last wave (MDS) was complete, spawning even more new data tooling companies.
However, if you believe, as I do, that the “AI stack” is a fundamentally new paradigm, then this is somewhat understandable. At a high level, AI relies on massive amounts of unstructured data (think internet-sized mountains of text, images and videos), while the MDS is designed for smaller amounts of structured data (think tabular data in spreadsheets or databases).
Furthermore, the non-deterministic or “generative” nature of AI models is completely different from the deterministic approach of more traditional machine learning (ML) models. Those older models were typically designed to predict outcomes based on a limited set of training data. The new generative AI models are designed to synthesize summaries or generate insights, which means their output can differ every time the model is run, even when the inputs haven’t changed. To see this for yourself, ask ChatGPT the same question two or more times and note the difference.
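To make that non-determinism concrete, here is a minimal sketch that sends the same prompt twice and prints both completions. It assumes the OpenAI Python SDK (v1+) with an OPENAI_API_KEY set in the environment, and the model name is purely illustrative; with a sampling temperature above zero, the two answers will usually differ.

```python
# Minimal sketch of generative non-determinism.
# Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY in the environment;
# the model name is an illustrative choice, not a recommendation.
from openai import OpenAI

client = OpenAI()
prompt = "Summarize why data tooling exploded after 2012 in two sentences."

for attempt in range(2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,       # sampling on: identical input, likely different output
    )
    print(f"--- attempt {attempt + 1} ---")
    print(response.choices[0].message.content)
```

Lowering the temperature toward zero makes the output more repeatable, but even then identical runs are not strictly guaranteed, which is exactly why this class of models needs different testing habits.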
Because the architecture and output of AI models are fundamentally different, developers must adopt new paradigms to test and evaluate model responses against the original intent of the user or application, not to mention ensuring the ethical safety, governance and monitoring of AI systems. Some of the additional areas around the new AI stack that warrant further exploration include agent orchestration (AI models communicating with other models); opportunities around smaller, purpose-built models for vertical use cases, disrupting traditional industries that were previously too expensive and complex to automate; and workflow tools for collecting and curating fine-tuning datasets, which let companies “inject” their own private data to build custom models.
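What those new testing paradigms look like in practice is still being worked out, but a common starting point is to run the same prompt several times and score each response against checks derived from the user’s original intent. The sketch below is a hypothetical, self-contained illustration: the generate function stands in for whatever model call you actually use, and the intent checks are toy examples.

```python
# Hypothetical sketch of intent-based evaluation for non-deterministic outputs.
# `generate` is a stand-in for a real model call; the checks are toy examples.
import random
from typing import Callable

def generate(prompt: str) -> str:
    """Stand-in for a real model call; returns a slightly different answer each run."""
    fillers = ["In short,", "Broadly speaking,", "Put simply,"]
    return f"{random.choice(fillers)} the modern data stack is a set of composable cloud services."

def evaluate(prompt: str, checks: dict[str, Callable[[str], bool]], runs: int = 5) -> dict[str, float]:
    """Run the prompt several times and report the pass rate of each intent check."""
    passes = {name: 0 for name in checks}
    for _ in range(runs):
        answer = generate(prompt)
        for name, check in checks.items():
            passes[name] += check(answer)
    return {name: count / runs for name, count in passes.items()}

if __name__ == "__main__":
    scores = evaluate(
        prompt="Explain the modern data stack in one sentence.",
        checks={
            "mentions_cloud": lambda a: "cloud" in a.lower(),
            "single_sentence": lambda a: a.count(".") <= 1,
            "not_too_long": lambda a: len(a.split()) <= 40,
        },
    )
    for name, rate in scores.items():
        print(f"{name}: {rate:.0%} of runs passed")
```

In a real system the toy checks would be replaced by task-specific rubrics, embedding similarity or model-graded evaluation, but the shape of the loop (sample repeatedly, score against intent) stays the same.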
All of these possibilities and more will be addressed as part of the new AI stack as new developer platforms emerge. Hundreds of startups are already working on these challenges by developing, you guessed it, a new batch of cutting-edge tools.
How can we build better and smarter this time?
As we enter this new “AI era,” I think it is important that we acknowledge where we have come from. After all, data is the mother of AI, and the countless data tools of recent history have at least set companies on a firm path toward treating their data as a first-class citizen. But I wonder: can we build better and smarter this time?
One suggestion is for companies to get clear about the specific value they expect particular data or AI tools to deliver for their business. Over-investing in technology trends for the wrong reasons is not a business strategy, and while AI is currently sucking all the air out of the room, and the money out of companies’ IT and software budgets, it is important to focus on the tools that can prove clear value and actual ROI.
Another appeal would be for founders to stop building “me-too” data and AI tools. If there are already several tools on the market in the segment you want to enter, take the time to ask yourself: “Are we the best founding team, with unique and differentiated experience, to bring key insights to how this problem should be approached?” If the answer is not a strong “yes,” you should not continue to build that tool, no matter how much money VCs are willing to throw at you.
Finally, investors are advised to think carefully about where value is likely to emerge at different layers of the data and AI tooling stack before investing in early-stage companies. Too often I see VCs with a single box to check: if the toolmaking founder has a certain pedigree or comes from a certain tech company, they immediately write them a check. This is lazy, and it also results in too many undifferentiated data tools flooding the market. No wonder we need a magnifying glass to read the 2024 MAD landscape.
A speaker at a recent conference recommended that companies ask themselves, “How much will it cost your business if a single row of your data is inaccurate?” In other words, can you articulate a clear framework for quantifying the value of data, or of a data tool, for your organization?
If we can’t even get that far, no amount of budget or venture capital poured into data and AI tools will resolve our confusion.