Reassessing data management in the age of generative AI

Generative AI has changed the technology industry by introducing new data risks, such as sensitive data leakage through large language models (LLMs), and by driving increased demands from regulators and governments. To navigate this environment successfully, organizations need to revisit core data management principles and ensure they use a sound approach to augment large language models with enterprise/non-public data.

A good place to start is rethinking the way organizations manage data, especially with regard to its use in generative AI solutions. For example:

  • Validating and creating data privacy capabilities: Data platforms must be prepared for higher levels of protection and oversight. This requires traditional capabilities such as encryption, anonymization and tokenization, but also new capabilities to automatically classify data (confidentiality, taxonomy alignment) using machine learning. Data discovery and cataloging tools can help, but they must be extended so that classification aligns with the organization's understanding of its own data. This enables organizations to apply new policies effectively and bridge the gap between the conceptual understanding of data and the reality of how data solutions are implemented.
  • Improving controls, auditability and oversight: Data access, usage, and third-party interaction with enterprise data require new designs alongside existing solutions. Existing solutions capture only some of the requirements needed to ensure authorized use of the data. Enterprises also need complete audit trails and monitoring systems to trace how data is used, when data is modified, and whether data is shared through third-party interactions, for both generative AI solutions and other types of AI solutions. It is no longer enough to control data by restricting access to it; we should also track the use cases for which data is accessed and applied in analytical and operational solutions. Automated alerts and reports on improper access and use (measured by query analysis, data exfiltration, and network movement) should be developed by infrastructure and data management teams and reviewed regularly to proactively ensure compliance.
  • Preparing data for generative AI: Generative AI marks a shift from traditional data management patterns and skills, and it requires new discipline to ensure the quality, accuracy and relevance of data used for training and augmenting language models. As vector databases become more common in the generative AI space, data governance must be extended to cover non-traditional data management platforms so that the same governance practices are applied to these new architectural components. Data lineage becomes even more important as regulators mandate “explainability” in models.
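
The idea of tracking the use case behind each data access, not just whether access was granted, can be sketched in a few lines of Python. This is a minimal illustration, not part of any IBM offering: the `APPROVED_USES` policy table, the dataset names, and the `AuditLog` class are all hypothetical, and a real system would persist events and integrate with existing access controls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy table: the use cases each dataset is approved for.
APPROVED_USES = {
    "customer_tickets": {"support_chatbot_rag"},
    "hr_records": set(),  # not approved for any AI use case
}


@dataclass
class AccessEvent:
    user: str
    dataset: str
    use_case: str
    allowed: bool
    timestamp: str


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record_access(self, user: str, dataset: str, use_case: str) -> bool:
        """Log every access attempt, allowed or not, and return the decision."""
        allowed = use_case in APPROVED_USES.get(dataset, set())
        self.events.append(
            AccessEvent(
                user=user,
                dataset=dataset,
                use_case=use_case,
                allowed=allowed,
                timestamp=datetime.now(timezone.utc).isoformat(),
            )
        )
        return allowed

    def violations(self) -> list:
        """Events that should raise an alert or appear in a compliance report."""
        return [e for e in self.events if not e.allowed]


log = AuditLog()
log.record_access("alice", "customer_tickets", "support_chatbot_rag")
log.record_access("bob", "hr_records", "support_chatbot_rag")

for event in log.violations():
    print(f"ALERT: {event.user} used {event.dataset} for {event.use_case}")
```

Because denied attempts are logged alongside allowed ones, the same log can feed both the audit trail and the automated alerts described above.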

Enterprise data is often complex, diverse and scattered across different repositories, making it difficult to integrate with AI solutions. This complexity is compounded by the need to comply with regulatory requirements, mitigate risks and close skills gaps in data integration and retrieval-augmented generation (RAG) patterns. In addition, data is often treated as an afterthought when developing and deploying AI solutions, which leads to inefficiencies and inconsistencies.

Unlocking the full potential of enterprise data for generative AI

At IBM, we have developed an approach to solving these data challenges: the IBM Gen AI Data Ingestion Factory, a managed service designed to solve the AI “data problem” and unlock the full potential of enterprise data for generative AI. Our predefined architecture and code blueprints, which can be deployed as a managed service, simplify and accelerate the process of integrating enterprise data into generative AI solutions. We approach this problem with a data management mindset, preparing data for governance, risk and compliance from the start.

Our core capabilities include:

  • Scalable data ingestion: Reusable services to scale data ingestion and RAG across generative AI use cases and solutions, with optimized chunking and embedding patterns.
  • Regulatory and compliance: Data is prepared for use with generative AI in a way that meets current and future regulations, helping companies satisfy the compliance requirements of market regulations focused on generative AI.
  • Data privacy management: Long-form text can be anonymized when it is discovered, which reduces risk and helps ensure data privacy.
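
To make "chunking" and "anonymize on discovery" concrete, here is a minimal Python sketch of a reusable preparation step. It is an assumption-laden illustration rather than the Data Ingestion Factory's actual implementation: the function names, the email-only masking rule, and the word-based chunk sizes are all hypothetical. The embedding step is left out because it depends on the model and vector database chosen.

```python
import re

# Simplified pattern for email addresses; real PII detection covers far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize(text: str) -> str:
    """Mask email addresses before the text leaves the ingestion step."""
    return EMAIL_RE.sub("[EMAIL]", text)


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of `size`, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def prepare(doc_id: str, text: str, size: int = 50, overlap: int = 10) -> list[dict]:
    """Anonymize first, then chunk, keeping the source id for lineage."""
    clean = anonymize(text)
    return [
        {"doc_id": doc_id, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_words(clean, size, overlap))
    ]


sample = "Contact jane.doe@example.com about the renewal of the service contract today"
records = prepare("doc-001", sample, size=4, overlap=1)
for record in records:
    print(record)
```

Anonymizing before chunking means no unmasked value can reach a downstream embedding model, and carrying `doc_id` and `chunk_index` on every record keeps each chunk traceable to its source.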

The service is AI and data platform agnostic, meaning it can be deployed anywhere, and it offers customization options for customer environments and use cases. By using the IBM® Gen AI Data Ingestion Factory, organizations can achieve several key outcomes, including:

  • Reduced data integration time: A managed service reduces the time and effort required to solve the AI “data problem.” For example, a repeatable process to “chunk” and “embed” data means that development effort is not required for every new generative AI use case.
  • Compliant data usage: Helping organizations comply with data usage regulations for the AI applications they deploy. For example, ensuring that data used in RAG patterns is approved for enterprise use in AI solutions.
  • Risk mitigation: Reducing the risk associated with data used in AI solutions. For example, providing transparency about which data a model's results come from reduces model risk and the time required to demonstrate to regulators where the information came from.
  • Consistent and reproducible results: Delivering consistent and reproducible results from LLMs and generative AI solutions. For example, capturing provenance and comparing results (i.e., generated data) over time to report on consistency using standard metrics such as ROUGE and BLEU.
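
As a rough illustration of reporting consistency over time, the sketch below compares later answers against a stored baseline answer using a simplified ROUGE-1 F1 score (unigram overlap, no stemming or stopword handling). The helper names, the sample answers, and the 0.7 threshold are hypothetical; production reporting would more likely rely on an established evaluation library.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def consistency_report(
    baseline: str, runs: dict[str, str], threshold: float = 0.7
) -> list[dict]:
    """Score each run's answer against the baseline and flag drift."""
    report = []
    for run_id, answer in runs.items():
        score = rouge1_f1(baseline, answer)
        report.append(
            {
                "run": run_id,
                "rouge1_f1": round(score, 3),
                "consistent": score >= threshold,
            }
        )
    return report


# Hypothetical answers to the same prompt, captured on three separate runs.
baseline = "the policy covers water damage"
runs = {
    "run_1": "the policy covers water damage",
    "run_2": "the policy covers fire damage",
    "run_3": "claims must be filed within thirty days",
}

report = consistency_report(baseline, runs)
for row in report:
    print(row)
```

Word-overlap metrics only measure surface similarity: in the sample data, the second run changes a single word, which changes the meaning of the answer, yet it still clears the threshold. Scores like these work best as a drift signal that prompts human review.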

Managing the complexity of data risk requires cross-functional expertise. Our team of former regulators, industry leaders and technology experts at IBM Consulting® is uniquely positioned to deliver consulting services and solutions that address these challenges.

Please learn more about our capabilities below. If you have any further questions, please contact me at gsbaird@us.ibm.com.

Learn more about how AI governance can help combat data risks
