Machine Learning on Big Data Workshop

By Yury Zhuk on February 25, 2026 · 1 min read

Materials for the Machine Learning on Big Data workshop for Lumos Student DS Consulting

#AI #machine learning

A hands-on workshop covering machine learning fundamentals applied to large-scale datasets, highlighting practical strategies like reducing data size, chunking, lazy loading, and using efficient file formats such as Parquet.
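To make the chunking idea concrete, here is a minimal sketch that is not taken from the workshop notebook. It uses only the Python standard library and computes a column mean while holding just one chunk of rows in memory. The helper names (`iter_chunks`, `chunked_mean`), the column names, and the in-memory sample data are hypothetical placeholders for a real file that is too large to load at once.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(csv_file, column, chunk_size=2):
    """Compute the mean of one numeric column, one chunk at a time."""
    reader = csv.DictReader(csv_file)  # reads rows lazily from the file object
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        # Only the current chunk is materialized; earlier chunks are discarded.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical sample data standing in for a large CSV file on disk.
sample = io.StringIO("user_id,amount\n1,10.0\n2,20.0\n3,30.0\n4,40.0\n5,50.0\n")
mean_amount = chunked_mean(sample, "amount", chunk_size=2)
print(mean_amount)
```

Pandas offers the same pattern through the `chunksize` parameter of `read_csv`, which returns an iterator of DataFrames instead of loading the whole file.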

I compare tools like Pandas, DuckDB, Polars, and Spark, and provide guidance on choosing between local machines, cloud VMs, or distributed systems depending on data size, frequency, and team needs.

Finally, I cover real-world pipeline considerations such as batch vs streaming workflows, orchestration tools, and managed platforms, showing how ML projects typically scale from local prototypes to production data pipelines. 
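The batch vs streaming distinction can be illustrated with a small sketch, assuming a simple running-mean metric; the function and class names here are hypothetical and not from the workshop materials. A batch job sees the whole dataset at once, while a streaming job updates its state one event at a time and never stores the history.

```python
def batch_mean(values):
    """Batch style: load every value, then compute the result once."""
    values = list(values)
    return sum(values) / len(values)


class StreamingMean:
    """Streaming style: keep only a count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean update; no past values are retained.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical sensor readings used purely for illustration.
readings = [4.0, 8.0, 6.0, 2.0]

batch_result = batch_mean(readings)

stream = StreamingMean()
running = [stream.update(v) for v in readings]

print(batch_result)
print(running)
```

Both approaches arrive at the same final value, but the streaming version produces an up-to-date answer after every event and uses constant memory, which is the usual trade-off to weigh when a pipeline moves from scheduled batch runs to continuous processing.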

Workshop

Open in Google Colab or Download Notebook (.ipynb)
Download Slides PDF

https://www.linkedin.com/feed/update/urn:li:activity:7432726202823610369/
