Log-Log Regression

Linear regression can fit many non-linear relationships if the data is transformed appropriately first. Let's look at a few kinds of datasets we can regress after applying a logarithmic transformation.

Regular Linear Regression

In [48]:
import numpy as np
import matplotlib.pyplot as plt

First, let's analyze a typical linear dataset.

In [64]:
x = np.arange(-10,10,0.1)
y = 3*x + 5 + np.random.normal(0,10,len(x))

m,b = np.polyfit(x,y,1)
y_fit = np.poly1d((m,b))(x)
plt.scatter(x,y, color="red", alpha=0.5, s=0.5)
plt.plot(x,y_fit); # note the semicolon here. what does it do?
plt.title("Linear y vs x");

Now find the correlation coefficient $r$ and the slope $m$.

In [50]:
print(m)
np.corrcoef(x,y)
3.287642692247747
Out[50]:
array([[1.        , 0.88075115],
       [0.88075115, 1.        ]])
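
The slope and the correlation coefficient are linked: for a least-squares line, $m = r \cdot s_y / s_x$, where $s_x$ and $s_y$ are the standard deviations of $x$ and $y$. Here is a minimal sketch that checks this on the same kind of synthetic data. The seeded `np.random.default_rng` generator is an assumption added for reproducibility and is not used in the original cells.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded generator, not in the original cells
x = np.arange(-10, 10, 0.1)
y = 3 * x + 5 + rng.normal(0, 10, len(x))

m, b = np.polyfit(x, y, 1)       # least-squares slope and intercept
r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient

# For a least-squares line: slope = r * (std of y) / (std of x)
m_from_r = r * np.std(y) / np.std(x)
print(m, m_from_r)
```

The two printed values should agree up to floating-point error, which is why a noisier dataset with the same underlying slope shows a smaller $r$ but a similar $m$.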

Exponential Regression

If we believe $y = Ca^x$ with $C > 0$ and $a > 0$, then regressing $\ln y$ on $x$ lets us determine both $a$ and $C$.

$$ \begin{array}{rcl} y &=& Ca^x \\ \ln y &=& \ln C + x \ln a \end{array} $$

This is a line in $x$ with slope $\ln a$ and intercept $\ln C$.

In [65]:
import random  # needed for random.gauss below

x = np.arange(3,5,0.01)
y = 0.25*3**x

# add noise proportional to y, redrawing until y stays > 0 (ln y needs y > 0)
for i in range(len(y)):
    while True:
        noise = random.gauss(0,y[i]/10)
        if (y[i]+noise > 0):
            break
    y[i] += noise

plt.scatter(x,y,s=0.5);
plt.title("Exponential correlation");
In [68]:
# transform y
y_t = np.log(y) ## this is ln

m,b = np.polyfit(x,y_t,1)
y_fit = np.poly1d((m,b))(x)
plt.scatter(x,y_t, color="red", alpha=0.5, s=0.5)
plt.plot(x,y_fit);
plt.title("ln(y) vs x");

Now find $r$, the base $a$, and the constant $C$.

In [53]:
print("base = " , np.exp(m))
print("C = ", np.exp(b))
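# note: this is r between x and the untransformed y; np.corrcoef(x,y_t) gives r for the fitted line
print(f"r = {np.corrcoef(x,y)[1,0]}")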
base =  2.9975765382313955
C =  0.24812105741218246
r = 0.95069612414204
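
The $r$ printed above is computed between $x$ and the untransformed $y$. The sketch below compares it with the correlation between $x$ and $\ln y$, which is the one that describes the fitted line, and then maps the fit back to the original scale. It makes two assumptions that are not in the original cells: a seeded `np.random.default_rng` generator, and multiplicative log-normal noise (instead of the rejection loop) so that $y$ is guaranteed to stay positive.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded generator for reproducibility
x = np.arange(3, 5, 0.01)
# assumption: multiplicative noise, which keeps y strictly positive
y = 0.25 * 3**x * np.exp(rng.normal(0, 0.1, len(x)))

y_t = np.log(y)                  # natural log of y
m, b = np.polyfit(x, y_t, 1)     # slope = ln(a), intercept = ln(C)
a, C = np.exp(m), np.exp(b)

r_raw = np.corrcoef(x, y)[0, 1]    # x vs untransformed y
r_log = np.corrcoef(x, y_t)[0, 1]  # x vs ln(y), matches the fitted line

y_pred = C * a**x                # fitted curve on the original scale
print(a, C, r_raw, r_log)
```

On data like this, `r_log` is typically closer to 1 than `r_raw`, because the relationship is linear only after the transformation.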

Log-Log Regression

If we believe $y = Cx^k$ with $C > 0$ and $x > 0$, then regressing $\ln y$ on $\ln x$ lets us determine both $k$ and $C$.

$$ \begin{array}{rcl} y &=& Cx^k \\ \ln y &=& \ln C + k \ln x \end{array} $$

This is a line in $\ln x$ with slope $k$ and intercept $\ln C$.
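
The derivation does not depend on the base of the logarithm: with $\log_{10}$ the slope is still $k$ and only the intercept changes, to $\log_{10} C$. Here is a small sketch on noiseless data, using the same power law as the cells that follow, to illustrate this:

```python
import numpy as np

x = np.arange(2, 10, 0.01)
y = 10 * x**3.14                 # noiseless power law, C = 10 and k = 3.14

# fit in natural-log space and in base-10 log space
k_ln, b_ln = np.polyfit(np.log(x), np.log(y), 1)
k_10, b_10 = np.polyfit(np.log10(x), np.log10(y), 1)

print(k_ln, k_10)                # both slopes estimate k
print(np.exp(b_ln), 10**b_10)    # both intercepts recover C after undoing the log
```

The only thing to keep consistent is the inverse: undo a natural-log intercept with `np.exp` and a base-10 intercept with `10**b`.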

In [54]:
import random
In [69]:
x = np.arange(2,10,0.01)
y = 10*x**3.14
for i in range(len(y)):
    while True:
        noise  = random.gauss(0,y[i]/10)
        if (y[i]+noise > 0):
            break
    y[i] += noise
plt.scatter(x,y,s=0.5);
plt.title("Power-law correlation");
In [70]:
# check that min(y) > 0, since ln(y) is undefined for y <= 0
print(np.min(y))

# transform y and x
x_t = np.log(x) 
y_t = np.log(y) 

m,b = np.polyfit(x_t,y_t,1)
y_fit = np.poly1d((m,b))(x_t)
plt.scatter(x_t,y_t, color="red", alpha=0.5, s=0.5)
plt.plot(x_t,y_fit); 
plt.title("ln(y) vs ln(x)");
73.2177132494846

Now find $r$, the exponent $k$, and the constant $C$.

In [71]:
print("degree = " , m)
print("C = ", np.exp(b))
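# note: this is r between x and the untransformed y; np.corrcoef(x_t,y_t) gives r for the fitted line
print(f"r = {np.corrcoef(x,y)[1,0]}")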
degree =  3.133547086987462
C =  10.03662528625072
r = 0.9376281240390777
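
The log-log steps above can be wrapped into one function. This is a minimal sketch, not part of the original notebook: `fit_power_law` is a hypothetical helper name, the generator is seeded for reproducibility, and the noise is multiplicative (an assumption) so that $y$ stays positive. Unlike the cell above, it reports $r$ between $\ln x$ and $\ln y$.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = C * x**k by least squares on (ln x, ln y).

    Hypothetical helper for illustration. Returns (k, C, r), where r is
    the correlation between ln x and ln y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("log-log regression needs strictly positive x and y")
    x_t, y_t = np.log(x), np.log(y)
    k, b = np.polyfit(x_t, y_t, 1)     # slope = k, intercept = ln(C)
    r = np.corrcoef(x_t, y_t)[0, 1]
    return k, np.exp(b), r

rng = np.random.default_rng(0)         # assumption: seeded generator
x = np.arange(2, 10, 0.01)
# assumption: multiplicative noise, which keeps y strictly positive
y = 10 * x**3.14 * np.exp(rng.normal(0, 0.1, len(x)))

k, C, r = fit_power_law(x, y)
print(k, C, r)
```

Raising an error on non-positive values makes the "check y for 0" step explicit instead of leaving it to a manual `print`.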