Use Linear Regression To Estimate Continuous Values with Python and Scikit-learn

Linear regression is a linear model that is used for regression problems, or problems where the goal is to predict a value on a continuous spectrum (as opposed to a discrete category).
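
To make the idea concrete, here is a minimal sketch of what a linear model does, written with plain NumPy rather than scikit-learn and assuming Python 3: it finds the weights and intercept that minimize the squared error on the training data, then uses them to predict a continuous value. The data is synthetic and all names are illustrative.

```python
import numpy as np

# Synthetic data (illustrative): y = 3*x1 - 2*x2 + 5 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(0, 0.5, size=100)

# Append a column of ones so the intercept is learned as one more weight
X_design = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: choose w to minimize ||X_design @ w - y||^2
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = w[:2], w[2]
print("coefficients:", coef, "intercept:", intercept)

# The prediction is a real number on a continuous scale, not a class label
x_new = np.array([4.0, 1.0])
prediction = x_new @ coef + intercept
print("prediction:", prediction)
```

The recovered coefficients should land close to the 3 and -2 used to generate the data, and the prediction for (4, 1) close to 3*4 - 2*1 + 5 = 15.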

We’ll use linear regression to estimate continuous values. In this case, we’ll predict house prices in Boston. We'll also look at how to visualize our results with matplotlib, and how to evaluate our models with different metrics for regression problems.
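
The lesson uses the Boston housing data, but `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this sketch substitutes a synthetic dataset from `make_regression` to show the same workflow: split the data, fit `LinearRegression`, predict, and score the predictions with common regression metrics. It assumes Python 3 and a recent scikit-learn; the dataset sizes and parameter values are illustrative, not the lesson's.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing dataset: 500 samples, 5 numeric features
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

# Hold out 25% of the data to evaluate on examples the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Regression metrics: lower MAE/MSE is better; an R^2 of 1.0 is a perfect fit
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  R^2: {r2:.3f}")
```

MAE and MSE are in the target's units (MSE in squared units), so they are easiest to read relative to the typical size of the values being predicted; R^2 is unitless.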

Andrew
~ 7 years ago

Looks awesome. I'm a newcomer to Python. Is there a quick setup explanation for this tutorial?

Hannah Davis (instructor)
~ 7 years ago

Scikit-learn and matplotlib need to be installed. If you have pip, typing pip install scikit-learn and pip install matplotlib into the terminal generally works (the same goes for conda, e.g. conda install scikit-learn matplotlib). If you don't have pip, you can install it by typing sudo easy_install pip. For lesson 5, pandas_ml also needs to be installed if you want to visualize its confusion matrix. For lesson 6, graphviz needs to be installed if you want to visualize the decision tree.
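
Once the packages are installed, a quick way to confirm that Python can see them is to look them up by their import names; note that scikit-learn is imported as `sklearn`. This is a small standard-library sketch assuming Python 3; the package list is taken from the setup above, with pandas_ml and graphviz left out because only later lessons need them.

```python
import importlib.util

# Map install names to import names, which are not always the same
REQUIRED = {"scikit-learn": "sklearn", "matplotlib": "matplotlib"}

def missing_packages(required):
    """Return the install names of packages whose import name cannot be found."""
    return [
        install_name
        for install_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required packages are importable.")
```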

Hannah Davis (instructor)
~ 7 years ago

I'm using Python 2.7 for this course. You can see your version number by typing python --version into the terminal. Please let me know about any obstacles you come across so I can make it clearer how to get up and running! :)
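
The same check can be done from inside a script, which helps when an IDE or virtual environment runs a different interpreter than the one on your terminal's PATH. A minimal sketch using the standard `sys` module; it avoids f-strings so it runs under Python 2.7 as well as Python 3, and the message text is illustrative.

```python
import sys

# sys.version_info is tuple-like: (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info[:2]
print("Running Python {}.{} from {}".format(major, minor, sys.executable))

# The course targets Python 2.7; Python 3-only syntax (such as f-strings) fails there
if major < 3:
    print("This interpreter is Python 2; Python 3-only examples will not run here.")
```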

Andrew
~ 7 years ago

Thanks Hannah,

I downloaded Python 3.6 through Anaconda at one point. Is there a way to install multiple Python versions and switch between them, like with Node and nvm use?

Andrew
~ 7 years ago

Since pip and conda seem to be somewhat interchangeable (like npm and yarn?), I was able to do this:

conda install python=2.7

Hannah Davis (instructor)
~ 7 years ago

Pip and conda can both install Python packages, yup! But you generally want to stick to one or the other (they can install packages in different places, and it can get messy...).

The best thing to do is create a virtual environment. You can do that with conda by typing:

conda create -n yourenvname python=2.7

and then

source activate yourenvname

Then install your packages, and do the course from there!
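
After activating the environment, you can confirm from Python that it is actually in use. This sketch assumes conda sets the `CONDA_DEFAULT_ENV` environment variable on activation (current versions do); `sys.prefix` is standard and points at the root of whichever environment the running interpreter belongs to. The function name `active_environment` is illustrative.

```python
import os
import sys

def active_environment():
    """Return (conda env name or None, interpreter prefix) for the running Python."""
    env_name = os.environ.get("CONDA_DEFAULT_ENV")  # assumed: set by conda on activation
    return env_name, sys.prefix

env_name, prefix = active_environment()
print("conda env:", env_name or "(none active)")
print("interpreter prefix:", prefix)
```

If the prefix does not point inside your conda installation's envs folder, your editor or terminal is running a different Python than the one you installed packages into.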

Alan O'Donnell
~ 7 years ago

One small mistake: R-squared scores don't have to be positive; they can be any value <= 1.
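
This follows from the definition R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. Always predicting the mean gives exactly 0, and any model that does worse than that scores below 0. A small pure-Python sketch (the function name `r_squared` is illustrative; scikit-learn's `sklearn.metrics.r2_score` returns the same values for these inputs):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [1.0, 2.0, 3.0]))  # perfect predictions -> 1.0
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # always predicting the mean -> 0.0
print(r_squared(y_true, [3.0, 2.0, 1.0]))  # worse than the mean -> -3.0
```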

Elias Moreno
~ 7 years ago

I keep getting this warning, and then everything hangs: "UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment. 'Matplotlib is building the font cache using fc-list.' " I have looked in various places and can't find a fix. Please help.

Hannah Davis (instructor)
~ 7 years ago

Elias - that sounds like a problem on your system. Have you followed the instructions here? https://stackoverflow.com/questions/34771191/matplotlib-taking-time-when-being-imported
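
A common fix for this warning is to delete Matplotlib's font cache so it is rebuilt once, cleanly. To find where that cache lives on your machine, you can ask Matplotlib with `matplotlib.get_cachedir()`. This sketch only lists candidate files; it assumes the font cache file names start with "fontlist" (true of recent releases, but names vary between versions), so inspect the listing before deleting anything by hand.

```python
import os

import matplotlib

# Directory where Matplotlib stores its caches, including the font list
cache_dir = matplotlib.get_cachedir()
print("Matplotlib cache directory:", cache_dir)

# List candidate font-cache files; nothing is deleted automatically
if os.path.isdir(cache_dir):
    for name in sorted(os.listdir(cache_dir)):
        if name.lower().startswith("fontlist"):
            print("  font cache file:", os.path.join(cache_dir, name))
```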

Hannah Davis (instructor)
~ 7 years ago

Alan - thank you! I'll update that soon.

Elias Moreno
~ 7 years ago

I fixed it. Thank you!!

Jerry
~ 7 years ago

I decided to use JetBrains' PyCharm Community Edition as my development environment for these lessons, and I ran into problems right out of the box. You can tell me Python is easy, and maybe once, long ago, it was, but no more. I'm a Python newbie but not new to programming, so I can safely say Python is no cakewalk.

I installed Python 3 using Homebrew on my Mac. When I ran lesson 1, I kept getting a "Python framework not installed" error. The error occurred when it tried to process the import statement:

import matplotlib.pyplot as plt

When I checked for the framework, it was there in the Library/Frameworks folder. A Google search told me that my matplotlib "backend" (really a display driver: the part that displays the plot) was not defined, and that creating a matplotlibrc file (with a line such as backend: TkAgg) in the .matplotlib folder in my home directory would make the problem go away. It didn't, because PyCharm defines its own environment (i.e. a venv). I then re-read the matplotlib docs, which say the backend can be set in several ways, but setting it in your script overrides all the others. So I defined the backend in the script file itself as follows:

import matplotlib

matplotlib.use('TkAgg')  # TkAgg is the backend; other backends are available

However, you need to do this before you execute the statement:

import matplotlib.pyplot as plt

Otherwise, it will not work.
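
The fix above can be sketched as a complete script that selects the backend with `matplotlib.use()` before `matplotlib.pyplot` is imported, then draws a predicted-versus-actual scatter plot of the kind used to visualize regression results. So that it runs on a machine without a display, this sketch uses the non-interactive 'Agg' backend and saves the figure to a file; swap in 'TkAgg' and call `plt.show()` instead of `savefig` to get a window, as Jerry did. The data values and output file name are illustrative.

```python
import os
import tempfile

import matplotlib

# Choose the backend BEFORE importing pyplot. 'Agg' renders to files only;
# 'TkAgg' would open an interactive window on a desktop instead.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  (must come after matplotlib.use)

# Illustrative values standing in for a model's test-set predictions
y_true = [10.0, 12.5, 15.0, 20.0, 22.5]
y_pred = [11.0, 12.0, 16.0, 19.0, 23.5]

fig, ax = plt.subplots()
ax.scatter(y_true, y_pred)
# Diagonal reference line: perfect predictions would fall exactly on it
ax.plot([min(y_true), max(y_true)], [min(y_true), max(y_true)], linestyle="--")
ax.set_xlabel("Actual value")
ax.set_ylabel("Predicted value")
ax.set_title("Predicted vs. actual")

out_path = os.path.join(tempfile.gettempdir(), "predicted_vs_actual.png")
fig.savefig(out_path)
plt.close(fig)
print("backend:", matplotlib.get_backend(), "| saved plot to", out_path)
```

Setting the MPLBACKEND environment variable is another way to choose a backend without editing the script, which can be convenient inside an IDE's run configuration.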

Once I figured that out, it got past the framework error, and then it told me I was missing scipy. I did a pip install for that, and the plot rendered in a separate window.