Update 12/12/2020: Using pyenv virtualenv to automatically activate and deactivate the virtual environment (suggested by Joe Klemmer).
Recently, I treated myself to a new MacBook Pro and started setting up my Python development environment. This step is essential because a wrong setup causes plenty of issues that distract you from the actual development work. So I decided to write a comprehensive guide listing the steps to help others set up their Python development environment from scratch, so that you spend less time handling errors. Let’s get started!
We previously discussed some of the common ways to leverage the predictive power of Machine Learning (ML) models. These methods mainly improve model generalizability by splitting the data according to particular schemes.
One significant challenge for data scientists, data analysts, and machine learning engineers is showcasing and demoing their models to non-technical personnel. That often demands additional skills, including frontend development, backend development, and sometimes even DevOps. Even if you are skilled in these areas, it takes a tremendous amount of time to get the job done. Some libraries can handle this job for you and let you focus more on model development tasks. Gradio is a great example; it’s a Python library that generates an easy-to-use UI for a machine learning model.
This tutorial…
It is an excellent trait to learn from others and start from where they stopped. I also believe that solving any problem starts with understanding it well and then evaluating different solutions until you reach a stable final one.
Since I started my account on Kaggle two years ago, my data science and machine learning skills have advanced dramatically. That improvement came from either competing or reading other Kagglers’ notebooks. However, whenever I made a submission, I never climbed the public leaderboard. I started to question myself. Why is that? …
In my previous article, I walked you through a step-by-step tutorial on correctly setting up the Python 3 environment using pyenv on your local machine. However, issues may still arise from conflicts between pull requests that use different Python versions, or from missing modules.
This tutorial will demonstrate the steps to build a standard workflow for any project using GitHub Actions plugins. Configuring GitHub Actions in the project CI folder protects the codebase from pull requests that don’t meet the code standards for formatting, syntax errors, and version conflicts. …
As a data scientist, you may sometimes need to send complex object hierarchies over a network or save your objects’ internal state to a disk or database for later use. To achieve this, you can use a process called serialization, which is fully supported by the standard library. This article will discuss the following:
This post discusses different strategies for encoding categorical variables, a preprocessing step necessary to develop reliable machine learning models. Encoding categorical variables is one of several steps that can enhance model performance when applied appropriately. There are different encoding types; in this article, I picked the most widely used encoders and discuss their pros and cons, and when each is appropriate to use, as best I can. So, let’s get started.
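To make the idea concrete, here is a minimal sketch of two common encodings, ordinal and one-hot, written in plain Python. The choice of these two encoders, the helper names `ordinal_encode` and `one_hot_encode`, and the `sizes` sample are my own assumptions for illustration; they are not taken from the article's notebooks.

```python
def ordinal_encode(values, order):
    """Map each category to its position in an explicit, caller-supplied order."""
    mapping = {category: index for index, category in enumerate(order)}
    return [mapping[value] for value in values]


def one_hot_encode(values):
    """Return the sorted category names and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical sample column, for illustration only.
sizes = ["small", "large", "medium", "small"]

ordinal = ordinal_encode(sizes, order=["small", "medium", "large"])
categories, one_hot = one_hot_encode(sizes)

print(ordinal)     # [0, 2, 1, 0]
print(categories)  # ['large', 'medium', 'small']
print(one_hot)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Ordinal encoding suits categories with a natural order, while one-hot encoding avoids implying an order at the cost of one column per category.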
Disclaimer: I attached an executable notebook, hosted on Kaggle, for each encoding strategy. …
In the previous post, I gave a high-level overview of the steps required to develop a machine learning pipeline according to industry best practices. In this part, I will continue by building accessible REST API endpoints and deploying them to the Heroku platform using a CI/CD pipeline.
Most of the code in this post is already explained in part 1.
The project repo is on GitHub in case you want to follow along.
Let’s break it down one by one to understand the structure.
REST…
Machine learning models deliver tremendous value only when they reach end users. The delivered model could be, for example, a recommender system on a real-estate platform such as Zillow that suggests properties to renters or investors.
In other words, a machine learning project succeeds only when its model is deployed and its predictions are being served.
I was surprised that machine learning deployment is rarely discussed online; it is a skill you end up learning in practice.
I tried to google this particular topic, but I mostly found blog posts about setting up Flask APIs for machine learning models. …
Picking the right machine learning algorithm is decisive because it determines the performance of the model. The dominant factor in choosing a model is its performance, which is typically estimated with k-fold cross-validation so that each score is computed on data held out from training.
The chosen model is usually the one with the higher mean performance. Nevertheless, that advantage sometimes arises from a statistical fluke. To address this concern, there are many statistical hypothesis tests for evaluating the difference in mean performance produced by cross-validation. If the `p-value` is below the significance level, we can reject the null hypothesis that the two algorithms perform the same and conclude that the difference is significant.
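As a rough illustration of that decision rule, the sketch below runs an exact paired sign-flip permutation test on per-fold scores using only the standard library. This test is my own stand-in and not necessarily one the article covers; the function name `paired_permutation_p_value`, the two score lists, and `alpha = 0.05` are hypothetical values chosen for the example.

```python
from itertools import product
from statistics import mean


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired per-fold scores.

    Under the null hypothesis that both algorithms perform the same, the sign
    of each per-fold difference is arbitrary, so every sign pattern is
    enumerated and compared with the observed mean difference.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance guards against floating-point ties.
        if flipped >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical accuracy per fold from 10-fold cross-validation (illustration only).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.81, 0.80]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.78]

alpha = 0.05  # assumed significance level
p_value = paired_permutation_p_value(model_a, model_b)
reject_null = p_value < alpha

print(p_value)      # 0.001953125
print(reject_null)  # True
```

Because model A wins on every fold here, only the two all-same-sign patterns out of 1024 are as extreme as the observed difference. Keep in mind that cross-validation folds share training data, so the scores are not fully independent and the result should be read as approximate.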
Data science graduate student at CUNY with a software engineering background. I build things that live on the internet. http://linkedin.com/in/salma-elshahawy