Problem statement
- Schreiber has plants across the US, Europe, India & Mexico
- SFI uses legacy DWH technologies.
- The ask is to migrate to a cloud-based data lake for easy scaling, a uniform solution, self-service for business users, and adoption of big data and AI to get better insights from data
Domains & Stakeholders
- Domains: Supply Chain, Customer Experience, Operations, Finance, Project Lifecycle Management (PLM), Global IT
- Stakeholders: Product Owner, Scrum Master, BRM, Data Engineers, Power BI Developers, Business Analysts
Solution Provided
- Set up a cloud-based Data Lakehouse with Medallion architecture to organize data into progressively refined layers
- Migrated the legacy Oracle-based data warehouse to the cloud
- Migrated the legacy reports to Power BI
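The Medallion layering above can be sketched as a simple path convention. A minimal sketch in Python, using the raw/enriched/curated layer names mentioned later in these notes; the storage root and folder scheme are illustrative assumptions, not the actual lake layout:

```python
# Illustrative Medallion path convention; LAKE_ROOT and the folder
# scheme are assumptions, not the project's actual layout.
LAKE_ROOT = "abfss://lake@storage"          # hypothetical storage root
LAYERS = ("raw", "enriched", "curated")     # least to most refined

def layer_path(layer: str, domain: str, dataset: str) -> str:
    """Build a consistent folder path for a dataset in a given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{LAKE_ROOT}/{layer}/{domain}/{dataset}"

print(layer_path("enriched", "supply_chain", "orders"))
```

Keeping the path logic in one helper like this is what lets downstream dataflows stay agnostic of where each layer physically lives.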
Methodology
- Project delivery methodology: Agile (moved from Waterfall to Agile)
- Number of squads: scaled up from 1 to 6 (each squad has 5 to 6 people)
Current Status
- Diver: out of 125 Diver models, 90 were converted to Power BI semantic models and the rest were decommissioned
- Tableau: out of 410 reports, 200 were converted to Power BI semantic models/reports and the rest were decommissioned
- Dimensions are built in the Curated layer for reuse across the project
- The Enriched layer holds massaged raw data for reporting and AI/ML use cases
- Data is sourced from Oracle, MS SQL, web APIs, manual Excel files, SharePoint, Parquet files, JSON, XML, etc.
- Reusable scripts are written to smooth the movement of data between layers and to be plugged into project-specific dataflows
- PoC done for streaming data to analyze power consumption in one of the plants
- AI/ML: demand-forecast models built with GCP Vertex AI and Databricks AutoML
- Gen AI: integrating with Palantir to build a pilot project in the Supply Chain domain
- Data Science: building out data science capability
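Sourcing from that many formats implies normalizing each one into a common record shape before it lands in the raw layer. A stdlib-only sketch for two of the formats (JSON and XML); the field names and payload shapes are illustrative assumptions:

```python
# Illustrative normalization of two source formats into a common
# list-of-dicts record shape. Field names here are assumptions.
import json
import xml.etree.ElementTree as ET

def from_json(payload: str) -> list[dict]:
    """Parse a JSON array of objects into records."""
    return json.loads(payload)

def from_xml(payload: str) -> list[dict]:
    """Parse <row> elements into records keyed by child tag."""
    root = ET.fromstring(payload)
    return [{child.tag: child.text for child in row} for row in root.iter("row")]

records = from_json('[{"plant": "US", "qty": "5"}]')
records += from_xml("<rows><row><plant>IN</plant><qty>3</qty></row></rows>")
```

Once every source funnels into the same record shape, the layer-movement scripts can treat all of them uniformly.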
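The reusable layer-movement scripts mentioned above can be sketched as a generic "promote" step that a project-specific dataflow plugs a transform into. This is a minimal in-memory stand-in, assuming a dict for lake storage; the real scripts would run on Spark/Databricks against actual layer paths:

```python
# Minimal sketch of a reusable layer-to-layer movement step. The in-memory
# `lake` dict is a stand-in for real lake storage (an assumption).
from typing import Callable

lake: dict[str, list[dict]] = {
    "raw/orders": [{"qty": " 5 ", "plant": "US"}, {"qty": "3", "plant": "IN"}],
}

def promote(src: str, dst: str, transform: Callable[[dict], dict]) -> int:
    """Apply a project-specific transform to every record in `src`
    and write the result to `dst`; return the number of records moved."""
    rows = [transform(r) for r in lake.get(src, [])]
    lake[dst] = rows
    return len(rows)

# A project-specific dataflow supplies only the transform:
moved = promote("raw/orders", "enriched/orders",
                lambda r: {**r, "qty": int(r["qty"].strip())})
```

Separating the movement mechanics from the per-project transform is what makes the same script reusable across squads and domains.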