Modern data platforms can be built and deployed entirely with open source Python packages. In this talk, I’ll cover what constitutes a modern data stack and which open source Python packages can be used to build a stack suitable for the needs of most developers and companies. Rather than taking a one-size-fits-all approach, I’ll first demonstrate the rich ecosystem of available technologies and the pros and cons of each technology choice.
To make this concrete, we will demo a self-contained, deployable platform built from a specific technology choice for each of the key components: data pipelines, transformation engine, data warehouse, presentation layer, and orchestration. This implementation will use Docker, Python and, yes, even some SQL.
Structure
Outcomes
The aim of this talk is to equip attendees with an understanding of the available technology choices and the knowledge to build their own data platforms. It should be especially useful for software or backend engineers who are called upon to own the data stack that supports business and analyst use cases. It may also help engineers looking to re-platform expensive legacy data platforms onto a more modern data stack. For research and personal projects, spinning up a modern platform can be useful for compute-heavy analytics that have outgrown local development.