Thursday 11:35
in Helium3
In many enterprises relying on SAP ERP systems, a wealth of valuable master data remains trapped within a closed ecosystem. This creates significant obstacles when striving for a comprehensive, 360° view, especially when integrating with modern, open data lakes built on platforms like Azure and designed around data mesh principles. This talk presents a practical PoC that tackles this challenge head-on, using SAP Datasphere as the key integration point.
Outline:
- The challenge: navigating SAP's data silos and the pursuit of a unified view
  - This section outlines the enterprise data landscape of RATIONAL, where valuable master data resides within SAP's traditionally closed ecosystem, hindering data democratization and the creation of a comprehensive, 360° operational view. This situation is common, particularly among German manufacturers.
  - The inherent conflict between the open, distributed nature of data lakes (especially those built on data mesh principles) and the centralized, closed nature of traditional SAP BI environments is discussed.
- Solution overview: leveraging SAP Datasphere as the integration layer
  - An introduction to SAP Datasphere and its capabilities is provided, with a focus on its ability to connect with non-SAP systems.
  - This part explains how Datasphere was chosen as the central integration layer for the proof of concept and its role in enabling bidirectional data flow between SAP and the open data lake.
- Architecture of SAP Datasphere
  - Introduction to the architecture of SAP Datasphere and the role of the underlying SAP HANA database
  - Explanation of the Open SQL schema as the key integration option
- Security first: exploring network integration, authentication, and authorization options
  - This section details the evaluation of network connectivity options between SAP Datasphere and Azure services such as Azure Databricks, PostgreSQL, and ADLS
  - The methods used to authenticate Python and PySpark clients against SAP Datasphere are explained
  - The implementation and evaluation of data authorization mechanisms within SAP Datasphere are described
- Python and PySpark integration
  - Available interfaces for Python integration (ODBC/JDBC, OData), their features and limitations
  - Explanation of the practical data integration patterns implemented within the PoC for extracting data from SAP and loading it into the data lake, covering full and delta load scenarios
- Reflecting on the PoC: summary and key learnings
  - This section summarizes the core findings and lessons learned from the PoC, particularly regarding security and software quality best practices
  - A brief pointer to the SAP open data alliance launched in 2023
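To make the integration patterns in the outline concrete, here is a minimal Python sketch of what a Spark JDBC read against a Datasphere Open SQL schema might look like, including a watermark-based delta query. It relies on the fact that an Open SQL schema is a regular schema on the SAP HANA Cloud database underlying Datasphere; all host, schema, table, and column names are hypothetical placeholders, not taken from the talk.

```python
from typing import Dict


def jdbc_options(host: str, user: str, password: str) -> Dict[str, str]:
    """Spark JDBC options for the SAP HANA Cloud instance underneath a
    Datasphere tenant. Port 443 with TLS is the HANA Cloud default;
    this assumes a database user with access to the Open SQL schema
    already exists."""
    return {
        "url": f"jdbc:sap://{host}:443/?encrypt=true&validateCertificate=true",
        "driver": "com.sap.db.jdbc.Driver",  # HANA JDBC driver (ngdbc.jar) on the Spark classpath
        "user": user,
        "password": password,
    }


def delta_query(schema: str, table: str, ts_column: str, watermark: str) -> str:
    """Watermark-based delta extraction: select only rows changed since the
    last successful load. Spark pushes this down to HANA as a subquery."""
    return (
        f'SELECT * FROM "{schema}"."{table}" '
        f"WHERE \"{ts_column}\" > TIMESTAMP '{watermark}'"
    )


# Usage from PySpark (hypothetical names; not executed here):
#   opts = jdbc_options("my-tenant.hanacloud.ondemand.com", "TECH_USER", "***")
#   df = (spark.read.format("jdbc")
#         .options(**opts)
#         .option("query", delta_query("OPENSQL_POC", "MATERIAL_MASTER",
#                                      "CHANGED_AT", "2024-01-01 00:00:00"))
#         .load())
#
# Plain Python can instead use SAP's hdbcli DB-API driver, e.g.:
#   from hdbcli import dbapi
#   conn = dbapi.connect(address=host, port=443, user=..., password=..., encrypt=True)
```

A full load is simply the same read without the watermark predicate; persisting the maximum `CHANGED_AT` value after each run yields the next watermark.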
Main takeaways:
- An understanding of SAP Datasphere's architecture and its potential for integrating non-SAP, open-source technologies like Python and PySpark
- Knowledge of the current features and limitations of SAP Datasphere for data integration with the open-source world
Rostislaw Krassow
Rostislaw, a data architect at RATIONAL AG, specializes in distributed databases, the Apache Hadoop ecosystem and Azure cloud. He leverages his expertise to oversee the company's Data & Analytics platform, where his daily work involves reconciling diverse stakeholder perspectives to deliver optimal solutions.