More speakers and sessions announced for the Data Engineering Summit 2024

We couldn't be more excited to announce that the schedule for the Data Engineering Summit, held together with ODSC East from April 23 to 24, is now live! We have an impressive roster of experts, thought leaders, and practitioners. Below is a taste of what to expect.

Experimentation platform at DoorDash

Yixin Tang │Engineering Manager │DoorDash

DoorDash's experimentation platform is an integral part of the company, leveraging big data tools to help make thousands of decisions every day. Discover how DoorDash uses the platform to make decisions about business strategies, machine learning models, optimization algorithms, and infrastructure changes.

Data infrastructure through the lens of scale, performance, and value

Elliott Cordo │Founder, Architect, Contractor │Datafutures

Despite their apparent benefits (time savings, increased productivity), monoliths present greater challenges, especially as complexity increases and teams grow larger. This session will discuss strategies and technologies for avoiding monoliths and their pitfalls.

From Research to Enterprise: Leveraging Foundation Models for Improved ETL, Analytics, and Delivery

Ines Chami │Co-founder and Chief Scientist │Numbers Station AI

Join this session to explore current research from Stanford University and Numbers Station AI on applying foundation models to structured data, and on their applications in the modern data stack.

Creating data contracts with open source tools

Jean-Georges Perrin │CIO │AbeaData

In this session, you'll explore data contracts, starting with an introduction that covers:

  • What is a data contract?
  • What is its purpose?
  • Why does it make life easier for data engineers?

You'll then put it into practice, using open source tools to create a skeleton data contract that will help you learn more about its lifecycle.

Why the hype about dbt is justified

Dustin Dorsey │Sr. Cloud Data Architect │Onix

In just half an hour, you'll learn what dbt really is, what makes it unique, and why it's so much more than just SQL. The talk covers what makes it so popular (and unpopular) as a data transformation tool and the driving factors behind these opinions, clearing up some myths along the way.

Clean as you go: Basic hygiene in the modern data stack

Eric Callahan │Principal, Data Solutions │Pickaxe Foundry

Join this session for an overview of the challenges that arise from the “I’ll clean it up later” mentality, in particular:

  • A growing pile of small cleanup tasks deferred until later
  • Confusion among colleagues trying to use incomplete data sets
  • Missing activation metadata across the Modern Data Stack

You'll also learn about solutions that can deliver long-term benefits.

Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions

Jay Mishra │Chief Operating Officer │Astera

Join this session to explore modern applications of generative AI in natural language processing and computer vision, with a focus on the technologies driving this progress, including transformer architectures, attention mechanisms, and OCR integration for processing scanned documents.

Decoding data architectures

James Serra │Data and AI Architect │Microsoft

Take a tour of Data Fabric, Data Lakehouse, and Data Mesh, and learn the benefits and drawbacks of each. You'll also examine common data architecture concepts, including data warehouses and data lakes, to determine which data architecture best suits your needs.

Designing ETL Pipelines with Delta Lake and Structured Streaming – How to Plan Things Properly

Tathagata Das │Staff Software Engineer │Databricks

Structured Streaming has proven to be the best framework for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. Delta Lake, on the other hand, is the best way to store structured data because it is an open source storage layer that brings ACID transactions to Apache Spark and big data workloads. Together, they make building pipelines very easy in many common scenarios.

In a complex ecosystem of storage systems and workloads, it is important for developers to understand the problem being solved. By understanding the requirements of the problem, you can design your pipeline to be as resource-efficient as possible. Join this session to explore a variety of common streaming design patterns that can be used.

Data technology in the age of generative AI

Ryan Boyd │Co-Founder │MotherDuck

This talk explores the changes in hardware and mindset that are enabling a new kind of software optimized for the 95% of us who don't need to process petabytes every day. Instead of working on consensus algorithms for large-scale distributed computing, can our engineers focus on making data more accessible and usable, and on reducing the time between “problem” and “answer”?

The value of a semantic layer for GenAI

Jeff Curran │Senior Data Scientist │AtScale

Krishna Srihasam │Senior Data Scientist │AtScale

In this session, you’ll learn how to integrate business terminology and logic into an LLM, enabling users to query the database in natural language (instead of SQL). You’ll also explore the results of coupling this LLM with AtScale's query engine through a chatbot powered by the LLM and a semantic layer.

Create security and savings: Master a secure, cost-effective cloud data lake

Ori Nakar │Chief Engineer, Threat Research │Imperva

Johnathan Azaria │Data Science Tech Lead │Imperva

Discover two novel data lake monitoring techniques that leverage both object storage protocols and query engine protocols. Dive deep into our aggregation strategies and discover how anomaly detection can be applied to this consolidated data. You'll see how improved access control mechanisms can increase the security of your data lake and reduce the risk of data leaks and data corruption. We also highlight how these insights can be used to reduce the attack surface and to detect and resolve cost anomalies and system disruptions.

Sign me up!

Get your pass and attend these and other sessions at the Data Engineering Summit in April. But you'd better act quickly. Prices will rise soon.
