From f5c480ad8eb043e644b6db1d775b968883db6ac8 Mon Sep 17 00:00:00 2001
From: Josh Mudge
Date: Sat, 2 Mar 2019 17:47:52 -0700
Subject: [PATCH] Add linear regression info.

---
 Codecademy.md | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++
 PHW.md        |  3 ++
 2 files changed, 90 insertions(+)
 create mode 100644 PHW.md

diff --git a/Codecademy.md b/Codecademy.md
index d103034..3cfd4b5 100644
--- a/Codecademy.md
+++ b/Codecademy.md
@@ -18,6 +18,93 @@ https://discuss.codecademy.com/t/what-does-it-mean-that-python-is-an-object-orie
 
 SyntaxError example: `SyntaxError: EOL while scanning string literal`
 
+# Linear Regression
+
+One of the projects in my Python 3 course on Codecademy was to fit a line to a set of data points using linear regression. These are my notes from working through it, following their instructions. The goal is to predict the bounciness of different balls with the least error possible.
+
+Starting with `y = m*x + b`
+
+We can determine the y of a point pretty easily if we have the slope of the line (m) and the intercept (b).
Thus, we can write a basic function to calculate the y:
+
+```
+def get_y(m, b, x):
+    y = m*x + b
+    return y
+
+print(get_y(1, 0, 7) == 7)
+print(get_y(5, 10, 3) == 25)
+```
+
+We can then use that to calculate the error of a line at a single point, which is the vertical distance between the point and the line:
+
+```
+def calculate_error(m, b, point):
+    x_point = point[0]
+    y_point = point[1]
+
+    y2 = get_y(m, b, x_point)  # y value of the line at this x
+
+    y_diff = y_point - y2
+    y_diff = abs(y_diff)  # a distance is never negative
+    return y_diff
+
+```
+
+To measure how well a line fits a whole dataset, we need a function that adds up the error over several points:
+
+```
+def calculate_all_error(m, b, points):
+    totalerror = 0
+    for point in points:
+        totalerror += calculate_error(m, b, point)
+
+    return totalerror
+```
+
+We can test it using their examples:
+
+```
+#every point in this dataset lies upon y=x, so the total error should be zero:
+datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
+print(calculate_all_error(1, 0, datapoints))
+
+#every point in this dataset is 1 unit away from y = x + 1, so the total error should be 4:
+datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
+print(calculate_all_error(1, 1, datapoints))
+
+#every point in this dataset is 1 unit away from y = x - 1, so the total error should be 4:
+datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
+print(calculate_all_error(1, -1, datapoints))
+
+
+#the points in this dataset are 1, 5, 9, and 3 units away from y = -x + 1, respectively, so total error should be
+# 1 + 5 + 9 + 3 = 18
+datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
+print(calculate_all_error(-1, 1, datapoints))
+```
+
+We can build lists of candidate m values between -10 and 10 and candidate b values between -20 and 20, in steps of 0.1, using:
+
+```
+possible_ms = [mv * 0.1 for mv in range(-100, 101)]  # stop at 101 so that 10.0 is included
+possible_bs = [bv * 0.1 for bv in range(-200, 201)]  # stop at 201 so that 20.0 is included
+```
+
+We can then try every combination of m and b and keep the one that produces the least error, which is:
+
+```
+m = 0.3
+b = 1.7
+x = 6  # size of the ball we want to predict for
+```
+
+The goal was to calculate the bounciness of different
balls with the least error possible. With this line, we can predict how far a given ball would bounce. For example, a 6 cm ball would bounce about 3.5 cm, because 0.3 * 6 + 1.7 = 3.5. We can check this by plugging in the numbers like this:
+
+```
+get_y(0.3, 1.7, 6)
+```
+
+
 # Math (ex6)
 
 ```
diff --git a/PHW.md b/PHW.md
new file mode 100644
index 0000000..3ab16e6
--- /dev/null
+++ b/PHW.md
@@ -0,0 +1,3 @@
+#
+
+ex19.