
Announcement of the primary speakers for the Data Engineering Summit 2024

We couldn't be more excited to announce the first sessions for our second annual Data Engineering Summit, co-located with ODSC East this April. Join us for two days of talks and panels from leading experts and data engineering pioneers. In the meantime, check out our first group of sessions.

How to practice data-centric AI and let AI improve its own dataset

Jonas Müller | Chief Scientist and Co-Founder | Cleanlab

Data-centric AI is poised to play a critical role in machine learning projects. Manual work is no longer the only approach to improving data. Instead, data-centric AI introduces systematic techniques to find and fix dataset problems using the model itself, allowing you to improve your model's performance without changing the code.

In this session, you'll learn how to apply fundamental data-centric AI ideas to a wide range of datasets. By exploring real data, this session gives you the knowledge to immediately retrain better models.
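One core data-centric idea is using a trained model's own predicted probabilities to flag likely label errors. The sketch below (not Müller's exact method; toy data and the `rank_label_issues` helper are illustrative) shows the simple self-confidence heuristic that tools like Cleanlab build on: an example whose model-assigned probability for its given label is low is a candidate mislabel.

```python
import numpy as np

def rank_label_issues(labels, pred_probs):
    """Rank examples by self-confidence: the model's predicted
    probability for each example's given label. Low self-confidence
    suggests a possible label error."""
    self_confidence = pred_probs[np.arange(len(labels)), labels]
    return np.argsort(self_confidence)  # most suspicious first

# Toy example: 4 examples, 2 classes. Example 2 is labeled class 0,
# but the model is 90% confident it belongs to class 1.
labels = np.array([0, 1, 0, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.9],  # likely mislabeled
    [0.4, 0.6],
])
print(rank_label_issues(labels, pred_probs)[0])  # → 2
```

In practice you would feed out-of-sample predicted probabilities (e.g. from cross-validation) rather than in-sample ones, so the model's own overfitting doesn't hide the errors.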

Tutorial: Introduction to Apache Arrow and Apache Parquet using Python and PyArrow

Andrew Lamb | Chair of the Apache Arrow Program Management Committee | Staff Software Engineer | InfluxData

Dive deep into the fundamentals of Apache Arrow and Apache Parquet with Andrew Lamb. You will learn how to load data to and from PyArrow arrays, CSV, and Parquet files, and how to quickly perform analytic operations such as filtering, aggregation, joining, and sorting with PyArrow.

As you complete these tasks, you'll experience first-hand the advantages of the open Arrow ecosystem and how Arrow enables fast and efficient interoperability with Pandas, Polars, DataFusion, DuckDB, and other technologies that support the in-memory Arrow format.

Data engineering in the age of data regulations

Alex Gorelik | Distinguished Engineer | LinkedIn

As AI evolves, so do data regulations such as GDPR, CCPA, DMA, and many others. These regulations give users control over their data and limit how companies can use that data. In many cases, the ability to operate in a country depends on compliance with these restrictions.

This talk will provide a real-world example of how to translate these regulations into policy and then integrate policy enforcement into data engineering practices.

The 12-factor app for data

James Bowkett | Technical Delivery Director | Open Credo

To cope with an increasingly data-centric world, the 12-Factor App helps define how to think about and design cloud-native applications. This session will walk you through the twelve principles of designing data-centric applications that have proven useful, grouped into four categories: Architecture and Design, Quality and Validation (Observability), Audit and Explainability, and Consumption.

Engineering knowledge graph data for a semantic recommendation AI system

Ethan Hamilton | Data Engineer | Enterprise Knowledge

In this in-depth session, you'll learn how to design a semantic recommendation system. These systems represent data as knowledge graphs and implement graph traversal algorithms to help users find content in large datasets. Not only are these systems useful for a wide range of industries, they are also fun for data engineers to work on.

Data Pipeline Architecture – Stop building monoliths

Elliott Cordo | Founder, Architect, Builder | Data Futures

Although data monoliths are widely used, they present real challenges, particularly for larger teams and for organizations trying to enable federated data product development.

In this session, you'll explore possible solutions drawn from microservices and event-based architecture, with a focus on multi-Airflow infrastructure, micro-DAG packaging and deployment, dbt multi-project implementation, rational use of containers, and data sharing/publishing strategies.

Is Gen AI a data engineering or software engineering problem?

Barr Moses | Co-Founder and CEO | Monte Carlo

At first, Gen AI seemed to be a software engineering and API integration project. However, as production capability and talent become more accessible, the teams that get a head start on finding ways to apply generative AI will come out on top. Join this session with Barr Moses to hear her perspective on whether Gen AI is a data engineering or software engineering problem.

Dive into the data: The future of the single source of truth is an open data lake

Christina Taylor | Senior Engineer | Catalyst Software

Join this session to learn about building a centralized data repository that ingests data from a wide range of sources, including service databases, SaaS applications, unstructured files, and conversational data. Using real-world examples, you'll learn how you can reduce costs and vendor lock-in by migrating from proprietary data warehouses to an open data lake.

With the insights gained in this session, you will be better able to choose the most appropriate technology for different analytics, machine learning, and product use cases.

The story of how Apache Parquet became data engineers' best friend

Gokul Prabagaren | Technical Manager | Capital One

Join this session to learn how a 100% cloud-based company runs its business-critical data processing pipelines, and how Apache Parquet plays a central role in every step of that processing. You'll explore a variety of design patterns implemented with Parquet and Spark and learn how using Apache Parquet has increased business resiliency.


At the Data Engineering Summit on April 24, co-located with ODSC East 2024, you'll be at the forefront of all the big changes coming before they happen. Get your pass today and stay one step ahead.

