A step-by-step guide that helps data scientists with no shell experience set up a Python development environment for sharing reproducible notebooks.


Recently, I treated myself to a new MacBook Pro and started setting up my Python development environment. This step is essential because if you do it the wrong way, plenty of…

A step-by-step guide to automating data extraction jobs from EC2 in 5 minutes

Photo by Debby Hudson on Unsplash


According to Wikipedia, cron is a software utility that provides time-based job scheduling on Unix-like operating systems. Cron runs in the background and executes scheduled tasks, known as “cron jobs”.

This guide provides a high-level overview of how to schedule jobs using cron’s proper syntax. It further…
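As a rough illustration of the syntax mentioned above, a standard cron schedule is five whitespace-separated fields: minute, hour, day of month, month, and day of week. The sketch below only splits an expression into those fields. The helper name `describe_cron` and the sample expression are hypothetical and are not taken from the article.

```python
# Names of the five standard cron schedule fields, in order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name.

    Hypothetical helper for illustration; it does not validate field values.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(CRON_FIELDS, parts))


# Sample expression (an assumption): 06:00 every Monday.
schedule = describe_cron("0 6 * * 1")
print(schedule)
```

In a real crontab entry, the command to run follows these five fields on the same line.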

How I built a great UI for a complicated sketch recognition model

Sketch prediction application — image by the author
The final deployed application — video by the author


Data scientists excel at creating models that represent and predict real-world data, but actually putting machine learning models into use is more of an art than a science. Deployment requires skills more commonly found in software engineering and DevOps. VentureBeat states that 87%…

Lessons from Kaggle: Comparing ensemble algorithms in terms of model accuracy, robustness, and generalization. Implementation included!

Ensemble methods — image by the author


We previously discussed some of the common ways to leverage the predictive power of Machine Learning (ML) models. These methods are mainly used to improve model generalizability by splitting the data according to particular schemes.

How I built a great UI for my PDF-to-audiobook converter

PDF to Audiobook
Final interface hosted at Gradio hub!


One significant challenge for data scientists, data analysts, and machine learning engineers is showcasing and demoing their models to non-technical audiences. That often demands additional skills, including frontend development, backend development, and sometimes even DevOps. Even if you are skilled in these areas…

Lessons I learned from Kaggle to enhance final model performance, including the pros and cons of each evaluation strategy and when to use it.

My performance compared to other Kagglers 🤔 — image by the author


It is an excellent trait to learn from others and start from where they stopped. I also believe that the way to solve any problem is to understand it well and then evaluate different solutions until you arrive at a stable final one.

Since I started my account on Kaggle two years…

This article aims to automate ML projects hosted on GitHub and reduce breaking changes introduced through pull requests.



In my previous article, I walked you through a step-by-step tutorial on correctly setting up a Python 3 environment using pyenv on your local machine. …

This article discusses how to serialize objects properly for multiprocessing tasks, and how the more robust dill package improves on the pickle package.



As a data scientist, you may sometimes need to send complex object hierarchies over a network or save your objects’ internal state to a disk or database for later use. To achieve this, you can use a process called serialization, which is fully supported by the standard library. …
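A minimal sketch of standard-library serialization, assuming the built-in `pickle` module is what the excerpt refers to. It round-trips a nested object and shows one known limitation that matters for multiprocessing: `pickle` cannot serialize a lambda. The helper names `roundtrip` and `can_pickle` and the sample data are hypothetical.

```python
import pickle


def roundtrip(obj):
    """Serialize obj to bytes and rebuild an equal copy from those bytes."""
    return pickle.loads(pickle.dumps(obj))


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical nested state, standing in for an object hierarchy.
state = {"model": "demo", "weights": [0.1, 0.2], "meta": {"epochs": 3}}
restored = roundtrip(state)

# Lambdas have no importable name, so pickle refuses them.
square = lambda x: x * x

print(restored == state)
print(can_pickle(state), can_pickle(square))
```

Functions that `pickle` rejects are the kind of case where a more permissive serializer such as dill is typically suggested.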

A comparison of categorical encoding strategies commonly used in machine learning preprocessing pipelines. This guide aims to help you select the right strategy for your application.

Photo by Dan Gold on Unsplash


This post discusses different strategies for encoding categorical variables, a preprocessing step necessary for developing reliable machine learning models. Encoding categorical variables is one of several steps that can improve model performance when applied appropriately. There are different encoding types; in this article, I…
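To make the idea concrete, here is a small standard-library sketch of two widely used encodings, ordinal and one-hot. The excerpt does not say which strategies the article covers, so these two are an assumption chosen for illustration, and the function names and sample data are hypothetical.

```python
def ordinal_encode(values):
    """Replace each category with its index in the sorted category list."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Replace each category with a 0/1 vector, one slot per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Hypothetical sample column.
colors = ["red", "green", "blue", "green"]

codes, categories = ordinal_encode(colors)
onehot, _ = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(onehot)      # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Ordinal codes imply an order between categories, while one-hot vectors do not but add one column per category.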

Deploy the Machine Learning API to Heroku using the CircleCI pipeline with fewer commands

Photo by timJ on Unsplash

In the previous post, I gave a high-level overview of the steps required to develop a machine learning pipeline according to industry best practices. In this part, I continue by building accessible REST API endpoints and deploying them to the Heroku platform using a CI/CD pipeline.

Salma El Shahawy

Data Engineer at Fortune Magazine. https://www.linkedin.com/in/salma-elshahawy/
