Update 12/12/2020: Using pyenv virtualenv to automatically activate/deactivate the virtual environment (suggested by Joe Klemmer).
Recently, I treated myself to a new MacBook Pro and started setting up my Python development environment. This step is essential: if you do it the wrong way, plenty of issues will arise and distract you from the actual development work. So I decided to write a comprehensive guide listing the steps to help others set up their Python development environment from scratch and spend less time handling errors. Let’s get started!
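Once an environment is activated (manually or automatically), it helps to confirm which interpreter is actually running. The sketch below uses only the standard library; the helper name `in_virtualenv` is hypothetical, and it assumes the environment was created with venv or virtualenv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv (and virtualenv >= 20) point sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    # Legacy virtualenv (< 20) instead defined sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


# Show the interpreter version and path so you can tell which Python is active.
print(f"Python {sys.version.split()[0]} at {sys.executable}")
print("virtual environment active:", in_virtualenv())
```

If the path printed points outside the environment you expected, the automatic activation did not take effect in that shell.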
According to Wikipedia, cron is a software utility and time-based job scheduler found in Unix-like operating systems. Cron runs in the background and executes scheduled jobs, known as “cron jobs”.
This guide provides a high-level overview of how to schedule jobs using cron’s syntax. It also covers some ways to make job schedules easier to write and understand.
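To make the five-field syntax (minute, hour, day of month, month, day of week) concrete, here is a small matcher that checks whether an expression fires at a given minute. `expand_field` and `cron_matches` are hypothetical names for illustration, not part of cron. The sketch assumes classic Vixie-cron semantics: lists, ranges, and steps applied to `*` or a range; day of week 0-7 with both 0 and 7 meaning Sunday; and the rule that when both day fields are restricted, either one may match. Named values (`MON`, `JAN`) and shortcuts such as `@daily` are not handled.

```python
from datetime import datetime

# Allowed values per field, in order: minute, hour, day of month, month, day of week.
# Day of week accepts 0-7, where both 0 and 7 mean Sunday.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as '*', '*/15', '1-5' or '0,30' into the values it allows."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if start < low or end > high or start > end:
            raise ValueError(f"invalid value {part!r} for range {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    weekdays = {0 if d == 7 else d for d in weekdays}  # fold 7 onto Sunday (0)
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    day_ok = moment.day in days
    weekday_ok = cron_weekday in weekdays
    # Classic cron rule: when neither day field starts with '*',
    # the job runs if EITHER matches; otherwise both must match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        date_ok = day_ok or weekday_ok
    else:
        date_ok = day_ok and weekday_ok
    return (moment.minute in minutes and moment.hour in hours
            and moment.month in months and date_ok)


# Every 15 minutes between 09:00 and 17:59, Monday to Friday.
print(cron_matches("*/15 9-17 * * 1-5", datetime(2021, 3, 1, 9, 30)))  # → True (a Monday)
print(cron_matches("*/15 9-17 * * 1-5", datetime(2021, 3, 6, 9, 30)))  # → False (a Saturday)
```

Reading an expression field by field like this is also a handy way to double-check a schedule before adding it to a crontab.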
To complete this guide, you’ll need:
1. Access to an AWS EC2 instance (this tutorial is based on Ubuntu 18.04)
2. EC2 credentials, stored in a
3. Secure file transfer applications such as WinSCP for…
Data scientists excel at creating models that represent and predict real-world data, but actually putting machine learning models into use is more art than science. Deployment requires skills more commonly found in software engineering and DevOps. VentureBeat states that 87% of data science projects never make it to production, while Redapt puts the figure at 90%. Both highlight team collaboration and iteration as a significant factor in whether a project succeeds or fails.
A machine learning model’s goal is to solve a problem, and it can only do that when consumed through…
We previously discussed some of the common ways to leverage the predictive power of Machine Learning (ML) models. These methods mainly aim to improve model generalizability by splitting the data according to particular schemes.
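The schemes are not named here; assuming k-fold cross-validation is among them, the sketch below shows how it splits row indices so that every sample is used for validation exactly once. `k_fold_indices` is a hypothetical stdlib-only helper; in practice scikit-learn's `KFold` does the same job.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return k (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds don't follow row order
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, val_idx))
        start += size
    return folds


for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(val_idx))
```

Each model is trained on `train_idx` and scored on `val_idx`; averaging the k scores gives a less optimistic estimate than a single train/test split.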
One significant challenge for data scientists, data analysts, and machine learning engineers is showcasing and demoing their models to non-technical personnel. That often demands additional skills, including frontend development, backend development, and sometimes even DevOps. Even if you are skilled in these areas, it takes a tremendous amount of time to get the job done. Some libraries can handle this job for you and let you focus on model development. Gradio is a great example: it’s a Python library that generates an easy-to-use UI for any machine learning model.
It is an excellent habit to learn from others and pick up where they left off. I also believe that solving any problem starts with understanding it well, then evaluating different solutions until you arrive at a stable final one.
Since I opened my Kaggle account two years ago, my data science and machine learning skills have advanced dramatically. That improvement came from both competing and reading other Kagglers’ notebooks. However, whatever I submitted, I never climbed the public leaderboard. I started to question myself. Why is that? …
In my previous article, I walked you through a step-by-step tutorial on correctly setting up a Python 3 environment using pyenv on your local machine. However, issues may still arise in some circumstances, such as conflicts from pull requests that use different Python versions, or missing modules.
This tutorial demonstrates the steps to build a standard workflow for any project using GitHub Actions. Configuring GitHub Actions in the project’s CI folder protects the codebase from pull requests that don’t meet the code standards for formatting, syntax errors, and version conflicts. …
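A GitHub Actions workflow itself is written in YAML, but the syntax-error gate it enforces can be sketched in Python. The helper names below are hypothetical, not taken from the tutorial: `syntax_errors_in` parses one source string without running it, and `check_tree` applies it to every `.py` file under a directory.

```python
import ast
from pathlib import Path


def syntax_errors_in(source: str, filename: str = "<string>") -> list[str]:
    """Return a 'file:line: message' entry if the source fails to parse, else an empty list."""
    try:
        ast.parse(source, filename=filename)  # parses only; the code is never executed
    except SyntaxError as exc:
        return [f"{filename}:{exc.lineno}: {exc.msg}"]
    return []


def check_tree(root: str = ".") -> list[str]:
    """Parse every .py file under root and collect the syntax errors found."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        errors += syntax_errors_in(path.read_text(encoding="utf-8"), str(path))
    return errors
```

A workflow step could call `check_tree()` and exit with a non-zero status when the list is non-empty; that non-zero exit is what marks the check as failed on the pull request.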
As a data scientist, you may sometimes need to send complex object hierarchies over a network or save your objects’ internal state to disk or a database for later use. To achieve this, you can use a process called serialization, which the standard library fully supports. This article will discuss the following:
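As a minimal sketch of serialization with the standard library's `pickle` module: a small object hierarchy (the `Node` class is hypothetical) is turned into bytes and restored as an equal but independent copy.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Node:
    """A small object hierarchy: each node holds a value and a list of children."""
    value: str
    children: list["Node"] = field(default_factory=list)


def serialize(obj) -> bytes:
    # HIGHEST_PROTOCOL picks the most compact binary format this Python version supports.
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize(payload: bytes):
    # Only unpickle data from trusted sources: loading a pickle can execute arbitrary code.
    return pickle.loads(payload)


root = Node("root", [Node("left"), Node("right", [Node("leaf")])])
payload = serialize(root)        # bytes you could send over a socket or store in a database
restored = deserialize(payload)  # a new object graph equal to the original
print(restored == root, restored is root)  # → True False
```

Because unpickling can run arbitrary code, a text format such as JSON is the safer choice when the data comes from an untrusted source, at the cost of handling custom classes yourself.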
This post discusses different strategies for encoding categorical variables, a preprocessing step necessary for developing reliable machine learning models. Encoding categorical variables is one of several steps that can improve model performance when applied appropriately. There are many encoding types; in this article, I picked the most widely used encoders and discuss their pros and cons and, as far as I can, when each one is appropriate to use. So, let’s get started.
Disclaimer: for each encoding strategy, I attached an executable notebook hosted on Kaggle. …
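To illustrate two of the most common encoders, here is a stdlib-only sketch of ordinal (label) encoding and one-hot encoding. The function names are hypothetical; libraries such as pandas and scikit-learn provide production versions.

```python
def ordinal_encode(values, categories=None):
    """Map each category to an integer code, ordered by `categories` or by first appearance."""
    if categories is None:
        categories = list(dict.fromkeys(values))  # unique values in first-seen order
    mapping = {cat: code for code, cat in enumerate(categories)}
    # Raises KeyError for a value missing from `categories`.
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return (column_names, rows), where each row has a single 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows


codes, mapping = ordinal_encode(["small", "large", "medium", "small"],
                                categories=["small", "medium", "large"])
print(codes)  # → [0, 2, 1, 0]

columns, rows = one_hot_encode(["red", "green", "red"])
print(columns, rows)  # → ['green', 'red'] [[0, 1], [1, 0], [0, 1]]
```

The trade-off shows up directly: ordinal codes imply an order, which suits a feature like size but misleads a model on a nominal feature like color, while one-hot encoding avoids a false order at the cost of one new column per category.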
In the previous post, I gave a high-level overview of the steps required to develop a machine learning pipeline according to industry best practices. In this part, I continue by building accessible REST API endpoints and deploying them to the Heroku platform using a CI/CD pipeline.
Most of the code in this post was already explained in Part 1.
The project repo is on GitHub in case you want to follow along.
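To show the shape of a prediction endpoint, here is a stdlib-only sketch; it is not the framework or code the post uses. The linear "model" and its feature names are hypothetical placeholders, and the request handling is kept in `handle_predict` so it can be exercised without starting a server. Heroku assigns the port through the `PORT` environment variable, which `serve()` reads.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = {"age": 0.1, "income": 0.0001}


def predict(features: dict) -> float:
    return sum(w * float(features.get(name, 0)) for name, w in WEIGHTS.items())


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, JSON-serializable response)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "request body must be valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "expected a JSON object of feature values"}
    try:
        return 200, {"prediction": predict(payload)}
    except (TypeError, ValueError):
        return 422, {"error": "feature values must be numeric"}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            status, response = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve() -> None:
    # Heroku tells the app which port to bind via $PORT; default to 8000 locally.
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a POST to `/predict` with a body such as `{"age": 30, "income": 50000}` returns a JSON prediction. Keeping validation in a plain function also makes it easy to unit-test in the CI pipeline.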
Let’s break it down one by one to understand the structure.