Practical Machine Learning
Tuesday, 6 September 2016
Normal Equation for Linear Regression model
Normal Equation for Linear Regression:
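Using the same dataset format as in the multivariate post below (X is the m x (n+1) design matrix with a leading column of ones, y is the m x 1 vector of targets), the closed-form normal-equation solution is

$$ \theta = \left( X^{T} X \right)^{-1} X^{T} y $$

This gives the optimal $ \theta $ directly, without iterating gradient descent, provided $ X^{T} X $ is invertible.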
Labels:
normal equation
Tuesday, 19 July 2016
Multivariate Linear Regression
Multivariate Linear Regression:-
When more than one feature is given in linear regression, the problem comes under multivariate linear regression. All the techniques are the same as in univariate linear regression, except that a vectorized form is used for better optimization.
Dataset:
To find the best-fit equation of the line for a linearly correlated dataset, we first assume the hypothesis
$ H \left( x \right) = \theta^{T} \times x $
where
$ x= \left[\begin{matrix} x_{0} \\ x_{1} \\ \vdots \\ x_{n} \end{matrix}\right] , \theta = \left[\begin{matrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{matrix}\right] $
Here n is the number of features and $ x_{0}=1 $, so both vectors have n + 1 entries.
Now we define the cost function for multivariate linear regression, which is the same as for univariate linear regression:
$ J \left( \theta \right) = \frac{1}{ 2m}\sum_{i=1}^{m} \left( \theta^T\times x^{i}-y^{i} \right)^2 $
where m is the number of training examples. The values of $ \theta $ that minimize J give the equation that best fits the given dataset, and we can find them with the gradient descent algorithm.
Repeat until convergence
{
$ \theta_{j}=\theta_{j}- \alpha \times \frac{\partial J }{\partial\theta_{j}} = \theta_{j}- \frac{\alpha}{m}\sum_{i=1}^{m} \left( \theta^T\times x^{i}-y^{i} \right) \times x_{j}^{i} $ (for every j simultaneously)
}
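In matrix form these quantities are easy to compute. A minimal Octave sketch, assuming X is the m x (n+1) design matrix, y the m x 1 target vector, theta the (n+1) x 1 parameter vector and alpha the learning rate described above:

% vectorized cost and one gradient descent step
J     = 1 / (2 * m) * sum((X * theta - y) .^ 2);  % cost J(theta)
grad  = X' * (X * theta - y) / m;                 % (n+1) x 1 gradient of J
theta = theta - alpha * grad;                     % one update step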
Matlab / Octave approach to Linear Regression:-
Dataset format is given as:
$ X= \left[\begin{matrix} 1 & a_{1,2} & \cdots & a_{1,n+1} \\ 1 & a_{2,2} & \cdots & a_{2,n+1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & a_{m,2} & \cdots & a_{m,n+1} \end{matrix}\right] , \theta = \left[\begin{matrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{matrix}\right] , y= \left[\begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{matrix}\right] $
where $ a_{i,j} $ is the value of the $ (j-1)^{th} $ feature for the $ i^{th} $ training example, m is the number of training examples and n is the number of features.
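As a minimal sketch of getting data into this format (the file name 'data.txt' and its column layout are assumptions, not from the original post), a multi-feature CSV whose last column is y can be read like this:

% read the dataset; every column except the last is a feature, the last is y
data = csvread('data.txt');     % hypothetical file name
x = data(:, 1:end-1);           % m x n feature matrix
y = data(:, end);               % m x 1 target vector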
% X, y and theta follow the format given above (x is the m x n matrix of raw features)
m = length(y);
X = [ones(m, 1), x];            % prepend the column of ones (x_0 = 1)
theta = zeros(size(X, 2), 1);   % one parameter per column of X

% plotting the first feature against y
plot(x(:, 1), y);

% gradient descent function
% computing gradient descent (vectorized implementation)
function theta = grad_descent(X, y, theta, alpha, iterations)
  m = length(y);
  for iter = 1:iterations
    S = X' * (X * theta - y);   % (n+1) x 1 gradient (times m)
    theta = theta - alpha / m * S;
  end
end

theta = grad_descent(X, y, theta, 0.04, 2500);

% plotting the hypothesis
hold on;                        % keep previous plot visible
plot(X(:, 2), X * theta, '-');
hold off;

% Now we can predict the value of y for any value of x by
% (x_val is a new example as an (n+1) x 1 column vector with a leading 1)
predict_y = theta' * x_val;
Wednesday, 13 July 2016
practical linear regression
Intro to Linear Regression:-
Linear regression is used where the variables or features in the dataset are linearly correlated. We find the hypothesis, i.e. the best-fit equation of a line for the data.
Example of an LR dataset:-
e.g. 1. Suppose we are given the everyday net data usage of a user from the 1st day to the Nth day, and we want to predict that user's data usage for the (N+1)th day in advance.
For that, we first preprocess the given dataset to make it linearly correlated, by giving the Nth day's value as the sum of the data usage from the 1st day to the (N-1)th day (a running cumulative total). Then we simply use the linear regression technique, provided there is no discontinuity in data usage versus number of days; a small sketch of this preprocessing follows.
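A minimal Octave sketch of that preprocessing, assuming daily_usage holds the per-day figures (the variable name and values here are made up purely for illustration):

daily_usage = [1.2; 0.8; 1.5; 1.1; 0.9];  % made-up per-day usage (e.g. in GB)
days = (1:length(daily_usage))';          % day numbers 1..N
cumulative = cumsum(daily_usage);         % usage from day 1 up to day i
plot(days, cumulative, 'rx');             % roughly a line if daily usage is steady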
To find the best-fit equation of the line for a linearly correlated dataset, we first assume the hypothesis
$$ H \left( x^{i} \right) = \theta _{0}+\theta _{1} \times x^{i} $$
This is the equation of a line: $ \theta_{0} $ is the intercept on the y axis, $ \theta_{1} $ is the gradient (slope) of the line, and $ x^{i} $ denotes the $ i^{th} $ data example in the dataset.
Here $ \theta_{0} $ and $ \theta_{1} $ are the variables we must learn.
To find the value of $ \theta_{0} $ and $ \theta_{1} $ we can use the least square method.
First we suppose the Cost function J .
$$ J \left( \theta_{0} , \theta_{1} \right) = \frac{1}{2 n}\sum_{i=1}^{n} \left( H \left( x^i \right)-y^i \right)^2 $$
where n is the number of data examples in the dataset and $ y^i $ is the value of y for the $ i^{th} $ data example. Intuitively, J measures (half) the mean squared vertical distance between the line H(x) and the data points, so to find the values of $ \theta_{0} $ and $ \theta_{1} $ for the best-fit equation of the line we need to minimize the cost function.
Objective
Minimize $ J \left( \theta_{0} , \theta_{1} \right) $ with respect to $ \theta_{0} $ and $ \theta_{1} $
To minimize the cost function we use gradient descent algorithm.
Gradient descent algorithm:
Repeat until Convergence
{
$$ \theta_{0}=\theta_{0}- \alpha \times \frac{\partial J\left( \theta_{0} , \theta_{1} \right) }{\partial\theta_{0}} $$
$$ \theta_{1}=\theta_{1}- \alpha \times \frac{\partial J\left( \theta_{0} , \theta_{1} \right) }{\partial\theta_{1}} $$
}
Where the partial derivatives of the cost function J are given as:
$$ \frac{\partial J\left( \theta_{0} , \theta_{1} \right) }{\partial\theta_{0}} = \frac{1}{ n}\sum_{i=1}^{n} \left( H \left( x^i \right)-y^i \right) , \qquad \frac{\partial J\left( \theta_{0} , \theta_{1} \right) }{\partial\theta_{1}} = \frac{1}{ n}\sum_{i=1}^{n} \left( H \left( x^i \right)-y^i \right) \times x^i $$
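Substituting these derivatives into the update rule gives the concrete form of each gradient descent step:
$$ \theta_{0}=\theta_{0}- \frac{\alpha}{n}\sum_{i=1}^{n} \left( H \left( x^i \right)-y^i \right) , \qquad \theta_{1}=\theta_{1}- \frac{\alpha}{n}\sum_{i=1}^{n} \left( H \left( x^i \right)-y^i \right) \times x^i $$
Both updates are applied simultaneously, using the old values of $ \theta_{0} $ and $ \theta_{1} $ on the right-hand side.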
Once gradient descent has converged, we substitute the resulting values of $ \theta_{0} $ and $ \theta_{1} $ into the hypothesis to get the best-fit equation of the line.
Matlab / Octave approach to Linear Regression:-
Dataset:
% reading the csv dataset file
data = csvread('ex1data1.txt');

% extracting the values of x and y
x = data(:, 1);
y = data(:, 2);
m = length(y);
X = [ones(m, 1), x];            % prepend the column of ones (x_0 = 1)
theta = zeros(2, 1);

% plotting the values of x and y
plot(x, y);

% cost function
function J = cost_J(X, y, theta)
  m = length(y);
  S = 0;
  for i = 1:m
    S = S + (theta(1) + theta(2) * X(i, 2) - y(i)) ^ 2;
  end
  J = 1 / (2 * m) * S;
end

% gradient descent function
% computing gradient descent
function theta = grad_descent(X, y, theta, alpha, iterations)
  m = length(y);
  for iter = 1:iterations
    S1 = 0;
    S2 = 0;
    for i = 1:m
      S1 = S1 + (theta(1) + theta(2) * X(i, 2) - y(i)) * X(i, 1);
      S2 = S2 + (theta(1) + theta(2) * X(i, 2) - y(i)) * X(i, 2);
    end
    S = [S1; S2];
    theta = theta - alpha / m * S;
  end
end

theta = grad_descent(X, y, theta, 0.04, 2500);

% plotting the hypothesis
hold on;                        % keep previous plot visible
plot(X(:, 2), X * theta, '-');
hold off;

% Now we can predict the value of y for any value of x by
predict_y = theta(1) + theta(2) * x_val;
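As a quick usage note (a sketch, assuming the script above has just been run), cost_J can be used to check that gradient descent actually converged; if the cost grows instead of shrinking, the learning rate alpha is too large:

J_before = cost_J(X, y, zeros(2, 1));   % cost at the initial theta
J_after  = cost_J(X, y, theta);         % cost at the learned theta
fprintf('cost before: %f, after: %f\n', J_before, J_after);  % "after" should be much smaller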
Vectorized implementation of linear regression:-
% reading the csv dataset file
data = csvread('ex1data1.txt');

% extracting the values of x and y
x = data(:, 1);
y = data(:, 2);
m = length(y);
X = [ones(m, 1), x];
theta = zeros(2, 1);

% plotting the values of x and y
plot(x, y);

% cost function (vectorized)
function J = cost_J(X, y, theta)
  m = length(y);
  J = 1 / (2 * m) * sum((X * theta - y) .^ 2);
end

% gradient descent function
% computing gradient descent
function theta = grad_descent(X, y, theta, alpha, iterations)
  m = length(y);
  for iter = 1:iterations
    % vectorized implementation: S is the 2 x 1 gradient (times m)
    S = X' * (X * theta - y);
    theta = theta - alpha / m * S;
  end
end

theta = grad_descent(X, y, theta, 0.04, 2500);

% plotting the hypothesis
hold on;                        % keep previous plot visible
plot(X(:, 2), X * theta, '-');
hold off;

% Now we can predict the value of y for any value of x by
predict_y = theta(1) + theta(2) * x_val;
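As a quick sanity check (a sketch, assuming X, y and m from the script above are in the workspace), the vectorized gradient X' * (X * theta - y) should match the loop-based sums S1 and S2 from the previous version:

theta0 = zeros(2, 1);                      % any fixed theta will do for the comparison
g_loop = zeros(2, 1);
for i = 1:m
  err = theta0(1) + theta0(2) * X(i, 2) - y(i);
  g_loop = g_loop + err * X(i, :)';        % accumulate [S1; S2]
end
g_vec = X' * (X * theta0 - y);             % vectorized gradient (times m)
disp(max(abs(g_loop - g_vec)));            % should be (numerically) zero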
Useful dataset links for implementing linear regression models:-
For this purpose, the UCI Machine Learning Repository is the best resource I've found so far on the internet.