
10 essential topics at the Data Engineering Summit 2024

Conferences are not just speeches delivered from podiums at venues; they reflect trends, topics, and problems relevant to everyday life. At the Data Engineering Summit, happening April 23-24 alongside ODSC East, we will explore several key topics that can help your data engineering team succeed. Here are ten key topics that will be covered at the Data Engineering Summit in April.

Is Gen AI a data engineering or software engineering problem?

Generative AI is not solely a data engineering or a software engineering problem, but rather a collaborative effort that requires both. Data engineers prepare the training data while software engineers design and build the models, making generative AI a two-pronged effort. Teams need to decide which part of the generative AI pipeline they will tackle so that it doesn't become a problem for everyone!

Related Session: Is Gen AI a data engineering problem or a software engineering problem?: Barr Moses, co-founder and CEO of Monte Carlo

Data infrastructure

Data engineering teams are challenged with transforming data from disparate sources into a usable format, scaling systems to handle growing volumes of data, and ensuring data security and compliance. They also contend with technical debt left behind by earlier shortcuts and must maintain data quality to avoid unreliable results.

Related Session: Data Infrastructure for Scale, Performance, and Ease of Use: Ryan Boyd, Co-founder of MotherDuck

Foundation models

Foundation models are a breakthrough in AI. Trained on huge, diverse datasets, they act as powerful, adaptable AI tools. Unlike single-purpose models, they can be fine-tuned for many tasks, from language tasks to image generation. Their power pushes the boundaries of what AI can do.

Related Session: From Research to Enterprise: Leveraging Foundation Models for Improved ETL, Analysis and Delivery: Ines Chami, Co-Founder and Chief Scientist at Numbers Station AI

Data contracts

A data contract is like a handshake for data exchange. It spells out, between provider and consumer, what the data looks like (format), what it means (definitions), how good it is (quality), and how it is delivered (frequency, access). It ensures that everyone speaks the same data language.
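To make the idea concrete, here is a minimal sketch of a contract expressed in code. It is an illustration only, not the approach from the session: the `FieldSpec` class, the `ORDERS_CONTRACT` feed, and every field name are hypothetical, and the sketch covers only format and definitions, not quality thresholds or delivery terms.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field in the contract: its type, its meaning, and whether nulls are allowed."""
    dtype: type
    description: str
    nullable: bool = False


# Hypothetical contract for an "orders" feed; all names are illustrative.
ORDERS_CONTRACT = {
    "order_id": FieldSpec(int, "Unique identifier of the order"),
    "amount_usd": FieldSpec(float, "Order total in US dollars"),
    "coupon_code": FieldSpec(str, "Promotion code, if one was applied", nullable=True),
}


def validate(record: dict[str, Any], contract: dict[str, FieldSpec]) -> list[str]:
    """Return the contract violations for one record (an empty list means valid)."""
    errors = []
    for name, spec in contract.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if value is None:
            if not spec.nullable:
                errors.append(f"null not allowed: {name}")
        elif not isinstance(value, spec.dtype):
            errors.append(f"wrong type for {name}: expected {spec.dtype.__name__}")
    # Fields the consumer was never promised are reported as violations too.
    for name in sorted(record.keys() - contract.keys()):
        errors.append(f"unexpected field: {name}")
    return errors
```

A provider could run `validate` on each record before publishing, and a consumer could run the same check on arrival, so both sides test against one shared definition.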

Related Session: Building data contracts with open source tools: Jean-Georges Perrin, CIO at AbeaData

Semantic layers

A semantic layer simplifies data analysis by translating complex data structures into business terms and presenting a unified view across various sources. This empowers users and promotes data-driven decisions.
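As a rough sketch of that translation step, the snippet below maps business terms to SQL expressions and assembles a query from them. The table names, column names, and the `build_query` helper are all assumptions made for illustration; a real semantic layer product does considerably more (access control, caching, dialect handling).

```python
# Hypothetical semantic model: business terms mapped to physical SQL expressions.
METRICS = {
    "revenue": "SUM(o.amount_usd)",
    "order_count": "COUNT(DISTINCT o.order_id)",
}
DIMENSIONS = {
    "region": "c.region",
    "order_month": "DATE_TRUNC('month', o.created_at)",
}
FROM_CLAUSE = "orders AS o JOIN customers AS c ON c.customer_id = o.customer_id"


def build_query(metrics: list[str], dimensions: list[str]) -> str:
    """Translate business terms into one SQL statement over the physical tables."""
    unknown = [m for m in metrics if m not in METRICS]
    unknown += [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"not defined in the semantic layer: {unknown}")
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {FROM_CLAUSE}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return sql
```

A business user asks for `build_query(["revenue"], ["region"])` and never needs to know which tables or joins sit behind the word "revenue".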

Related Session: The Value of a Semantic Layer for GenAI: Jeff Curran, Senior Data Scientist at AtScale

Unstructured data

Unstructured data is information that doesn't fit neatly into a predefined format like a spreadsheet. Think of it as a giant stack of documents, emails, videos, and social media posts. Although this data is valuable, it can be messy and difficult for computers to analyze directly.

Related Session: Unlocking the Unstructured with Generative AI: Trends, Models and Future Directions: Jay Mishra, Chief Operating Officer at Astera

Monolithic architecture

In software development, a monolithic architecture is a traditional approach in which the entire application is built as a single, self-contained unit. Imagine one enormous rock: every part is tightly connected and inseparable. This includes the user interface (what you see and interact with), business logic (the core functions), and data storage (where information is stored).

Related Session: Data Pipeline Architecture – Stop Building Monoliths: Elliott Cordo, Founder, Architect and Builder at Datafutures

Experimentation platforms

An experimentation platform is a tool for conducting A/B tests on websites, apps, or marketing campaigns. You create variations of what you want to test (e.g., a new layout or pricing), and the platform shows them to different users, analyzes the results, and tells you which variation works best. It helps teams make data-driven decisions and improve product performance.
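The "analyzes the results" step can be sketched with a standard two-proportion z-test on conversion counts. This is one common textbook method, shown here as an assumption about what such an analysis might look like; it is not a description of how any particular platform (including the one in the session below) works, and the function name is made up for the example.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the difference in conversion rates.

    conv_* are conversion counts and n_* are the users shown each variation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (everyone or no one converted): nothing to detect.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, written via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Variation A: 100 conversions out of 1000 users; variation B: 130 out of 1000.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between variations is unlikely to be noise, though real platforms also have to handle issues this sketch ignores, such as peeking at results early and running many tests at once.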

Related Session: Experimentation Platform at DoorDash: Yixin Tang, Engineering Manager at DoorDash

Open data lakes

An open data lake is a data lake that focuses on openness and flexibility. It stores data in vendor-neutral formats and uses open standards for easier access and collaboration, avoiding lock-in to specific vendors. Think of it as a public park for your data rather than a private walled garden.

Related Session: Dive into the Data: The Future of the Single Source of Truth is an Open Data Lake: Christina Taylor, Senior Staff Engineer at Catalyst Software

Data-centric AI

Data-centric AI flips the traditional approach. Instead of prioritizing models, it focuses on the high-quality data (labeling, cleaning, augmentation) used to train them. This iterative cycle continually improves the data to produce better AI results. Imagine building a house: using the best tools with poor materials won't work. Data-centric AI ensures that strong data is the foundation for reliable AI.
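One small, concrete instance of improving the data rather than the model is finding examples that appear more than once with different labels. The sketch below is a deliberately simple illustration under that assumption; the `find_label_conflicts` helper and the sample data are hypothetical, and dedicated tools use far more sophisticated techniques.

```python
from collections import defaultdict


def find_label_conflicts(examples: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each normalized text that carries more than one label to its labels."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        # Normalize lightly so trivial differences don't hide duplicates.
        labels_by_text[text.strip().lower()].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical sentiment dataset with one inconsistently labeled review.
reviews = [
    ("Great product", "positive"),
    ("great product ", "negative"),
    ("Arrived late", "negative"),
]
print(find_label_conflicts(reviews))
```

Conflicts surfaced this way go back to a reviewer for relabeling, and the model is retrained on the corrected set, which is the iterative loop described above.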

Related Session: How to Practice Data-Centric AI and Let AI Improve Your Own Dataset: Jonas Mueller, Chief Scientist and Co-Founder of Cleanlab

Sign me up!

As every data engineering professional knows, the best way to stay ahead of the curve is to stay up to date on everything related to data and data engineering. The best way to do that is to attend the ODSC Data Engineering Summit and ODSC East.

At the Data Engineering Summit on April 24, co-located with ODSC East 2024, you'll be at the forefront of all the big changes before they happen. Get your pass today and stay one step ahead.

