Machine Learning

Predict home price with XGBoost using real-time sales data

Photo by Tierra Mallorca on Unsplash

Check out the Web App and the code on my GitHub and feel free to let me know if you have any questions!

Project Background

Whether you’re buying or selling a house, changes in home prices will affect your housing plans. Keeping an eye on home prices can give you an idea of what to expect if you plan to buy or sell a house any time soon.

Therefore, in this project, I would like to use my data science skills to create an app that keeps track of home prices and sales volume and makes price predictions using Machine Learning…


Natural Language Processing

Find your desired product based on review data

Photo by Sharon McCutcheon on Unsplash

Check out the Web App and the code on my GitHub and feel free to let me know if you have any questions!

Project Background

Do you have the same experience as I do when you shop online? You search for a product name on an e-commerce website, and hundreds of products might show up. You sort the products by rating or price, go through similar products one by one, and scan other customers' reviews to find out whether anyone has commented on the product features you are interested in. This process is very time-consuming and can quickly get overwhelming…


Any statistical analysis or machine learning model can only be as good as the quality of the data you feed into it

Photo by kazuend on Unsplash

This article is the 3rd part of a series of Pandas tutorials for beginners. Please stay tuned for more articles on this topic.

Introduction to Pandas — Part 1: Import and Create DataFrame

Introduction to Pandas — Part 2: Data Preview and Subsetting

Introduction to Pandas — Part 3: Data Wrangling

Introduction

In this article, I’m going to cover Pandas functions for cleaning and transforming DataFrames. I would say this is one of the most important steps in any data science project.


Useful Pandas functions and tricks to preview and subset a DataFrame

Photo by Jay Mantri on Unsplash

This article is the 2nd part of a series of Pandas tutorials for beginners. Please stay tuned for more articles on this topic.

Introduction to Pandas — Part 1: Import and Create DataFrame

Introduction to Pandas — Part 2: Data Preview and Subsetting

Introduction to Pandas — Part 3: Data Wrangling

Introduction

At the beginning of any data science project, we always want to get familiar with the data as quickly as possible. Reviewing the first n rows and computing basic information such as column names, data types, distributions, and summary statistics helps us understand the data…
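As a minimal sketch of that kind of first look, the snippet below previews a small DataFrame. The data and column names are invented for illustration and are not from the article; the calls themselves (`head`, `columns`, `dtypes`, `describe`, `value_counts`) are standard Pandas.

```python
import pandas as pd

# Hypothetical data, made up purely for this illustration.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago", "Denver"],
    "price": [350000, 720000, 310000, 540000],
    "bedrooms": [3, 2, 3, 4],
})

first_rows = df.head(2)                       # first n rows (here n = 2)
column_names = list(df.columns)               # column names
column_types = df.dtypes                      # data type of each column
summary = df.describe()                       # summary statistics for numeric columns
city_counts = df["city"].value_counts()       # distribution of a categorical column

print(first_rows)
print(column_types)
print(summary)
```

Note that `describe()` only covers numeric columns by default, so `city` is left out of `summary` here.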


Import various data files and create a DataFrame from scratch using lists and dictionaries

Photo by Geran de Klerk on Unsplash

This article is the first part of a series of Pandas tutorials for beginners. Please stay tuned for more articles on this topic.

Introduction to Pandas — Part 1: Import and Create DataFrame

Introduction to Pandas — Part 2: Data Preview and Subsetting

Introduction to Pandas — Part 3: Data Wrangling

Introduction

Undoubtedly, Pandas is one of the most popular Python libraries for data science. Its versatility and functionality make it a powerful tool for data transformation and exploration. Getting familiar with Pandas has become an essential skill for data science professionals.

Prerequisite

# Install pandas library
!pip install pandas
#…
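To give a rough idea of what the title describes, here is a small sketch that builds the same DataFrame from a list, from a dictionary, and from CSV text. The sample values and variable names are hypothetical, and an in-memory string stands in for a real data file.

```python
import io

import pandas as pd

# From a list of rows: column names are passed separately.
rows = [["Alice", 30], ["Bob", 25]]
df_from_list = pd.DataFrame(rows, columns=["name", "age"])

# From a dictionary: keys become column names, values become columns.
df_from_dict = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# From a data file: an in-memory CSV string is used here in place of a file path.
csv_text = "name,age\nAlice,30\nBob,25\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict)
```

All three approaches should produce the same two-row table; which one is most convenient depends on how the data arrives.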


You want to make friends with tabula-py and Pandas

Background

Data science professionals deal with data in all shapes and forms. Data could be stored in a popular SQL database, such as PostgreSQL or MySQL, or in an old-fashioned Excel spreadsheet. Sometimes, data might also be saved in an unconventional format, such as PDF. In this article, I am going to talk about how to scrape data from PDF files using Python libraries.

Required Libraries

  • tabula-py: to scrape text from PDF files
  • re: to extract data using regular expressions
  • pandas: to construct and manipulate our panel data

Install Libraries

pip install tabula-py
pip install pandas

Import Libraries

import tabula as tb
import pandas as pd
import re

Scrape PDF Data in Structured Form

First…


SQL is more than just querying a database

Photo by Mike Benna on Unsplash

Background

SQL is an essential skill for data science professionals. Many companies store their data in a relational database system, such as MySQL, PostgreSQL, MS SQL Server, or SQLite. SQL is the programming language we need to interact with them. We can write a SQL query to select, filter, transform, insert, update, and delete the underlying data in the database.

Not only can SQL be used to query a database, it can also be used for data analysis. Both Subquery and CTE (Common Table Expression) are useful tools we can use to write a complex SQL query to implement data analysis, in…
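To make the subquery and CTE idea concrete, here is a minimal sketch that runs the same analysis both ways. It assumes an in-memory SQLite database (via Python's built-in `sqlite3` module) and a hypothetical `sales` table invented for this example; the question asked is which regions have total sales above the average regional total.

```python
import sqlite3

# Hypothetical table and values, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 100), ('East', 300),
        ('West', 50),  ('West', 150),
        ('North', 400);
""")

# Subquery version: the per-region aggregation is written out twice.
subquery_sql = """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) AS t
    WHERE total > (
        SELECT AVG(total)
        FROM (SELECT SUM(amount) AS total FROM sales GROUP BY region)
    )
    ORDER BY region;
"""

# CTE version: the aggregation is named once and then reused.
cte_sql = """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > (SELECT AVG(total) FROM region_totals)
    ORDER BY region;
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
conn.close()

print(subquery_rows)
print(cte_rows)
```

Both queries return the same rows. The CTE version tends to be easier to read once the intermediate result is needed more than once.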


Important Skills for Data Engineer and Data Scientist

Photo by Jack Anstey on Unsplash

Background

When working on a data science project, data scientists are most excited about training a fancy machine learning model to solve a business problem. But before we dive into model development, you might ask: “What data are we using to train our ML models? Where is the data stored?”

That’s where the ETL process comes into the picture. ETL stands for Extract-Transform-Load. It is a set of procedures for collecting data from various sources, transforming the data, and then storing it in a single new data warehouse, which is accessible to data analysts and data scientists to perform data…
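The three ETL steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the CSV text, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; in practice this could be a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,
"""


def extract(text):
    """Extract: read raw records from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records without an amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: store the cleaned records in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each step in its own function makes it easier to swap the source or the target later without touching the cleaning logic.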


Which tool would you like to use in your next data science project?

Photo by Pascal Müller on Unsplash

Background

Data cleaning and manipulation are essential steps in any data science project. Both SQL and Pandas are popular tools used by Data Analysts and Data Scientists nowadays.

Which tool to use depends on where the data is stored, what format it is in, and how we want to use it.

Things to consider:

  • If the data you are working with is not in panel format yet and you will need to piece together data from various sources, Pandas might work better. …

Aaron Zhu

Data Science | Machine Learning | Economics Consulting https://www.linkedin.com/in/aaron-zhu-53105765/
