This blog will explain the inner workings of gradient descent with an example.
Objects thrown upwards come back down; water held at a height flows down to the lowest point it can reach. These everyday phenomena are common ways in which objects seek stability (the lower the height, the higher the stability). Built along the same lines, we have a famous mathematical algorithm called Gradient Descent, which is used in all sorts of neural networks and machine learning problems.
This blog will help you understand the working of the algorithm and give you an intuition for it, using our own life as an example.

Machine Learning: "A field of study that gives computers the ability to learn without being explicitly programmed" – Arthur Samuel (1959). We can categorize machine learning problems into two categories.
Types of machine learning problems:
- Supervised Learning
- Unsupervised learning
Supervised learning is a machine learning approach defined by its use of labeled datasets: every input in the dataset is mapped to its corresponding output. Intuitively, the dataset can be thought of as a textbook containing all the questions (inputs) along with their answers (outputs). Some examples:
1. Predict gender based on name, given a dataset that maps names -> gender.
2. Predict a house's price, given a historical dataset of house prices.
Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns in data without the need for human intervention (hence, they are "unsupervised"). Clustering is one of the best-known algorithms in this category.
Now, let's analyse a problem statement and, in the process of solving it, learn the underlying working principle of gradient descent. We are given a dataset that consists of the prices of houses ($) and their sizes (feet^2). Using this dataset, we are to build a basic machine-learning model that can predict the price of a house, given its size. There are many ways to solve this problem, but let us start with the simplest: the linear regression approach.
As shown in the above image, we make use of the training set to create a hypothesis. This hypothesis is nothing but a function that returns the predicted price, given the size of the house.
Input = size of house -> hypothesis -> output = estimated price.
Now to describe this hypothesis,
y = mx + c;
Sounds familiar? This is the famous equation of a straight line; tweaking the values of m and c generates every possible straight line in a 2D plane. Now we proceed to find values for 'm' and 'c' such that our predictions are not far from the actual values.
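As a quick sketch of the hypothesis in code (the parameter values here are made-up placeholders, not learned ones):

```python
def hypothesis(x, m, c):
    """Predicted house price ($) for a house of `x` square feet."""
    return m * x + c

# With assumed (not yet learned) values m = 0.2 and c = 50:
print(hypothesis(1000, 0.2, 50))  # -> 250.0
```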
The cost function determines how well our hypothesis performs at predicting the price of houses. A higher value returned by the cost function indicates a model that is not performing well; a lower value indicates a good-fitting model.
Choose values of 'm' and 'c' such that h(x) is close to y for our training examples (x, y).
Minimize ∑ (h(x) – y)^2 over all data points. This is our cost function: it takes the square of the difference between the actual value (y in the dataset) and the value predicted by the hypothesis (h(x)). Our goal is to minimize the cost function, and that in turn gives us the best-fitting hypothesis.
cost function J(m, c) = 1/2n * ∑ ((m*x + c) – y ) ^2;
Where ’n’ denotes the number of items in the dataset.
This cost function is called the mean squared error function. Different values of 'm' and 'c' generate different straight lines, and each of those lines yields a different value of the cost function. We use the values of 'm' and 'c' that generate the minimum value of the cost function in our hypothesis.
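The mean squared error can be sketched directly from the formula above; the toy dataset here is purely illustrative:

```python
def cost(m, c, xs, ys):
    """J(m, c) = 1/(2n) * sum(((m*x + c) - y)^2) over the dataset."""
    n = len(xs)
    return sum(((m * x + c) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Toy dataset generated from y = 2x: the line m = 2, c = 0 fits perfectly.
xs = [1, 2, 3]
ys = [2, 4, 6]
print(cost(2, 0, xs, ys))  # -> 0.0 (perfect fit)
print(cost(1, 0, xs, ys))  # a worse line yields a higher cost
```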
Gradient descent is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function. This method is commonly used in Machine Learning (ML) and Deep Learning (DL) to minimize a cost/loss function (e.g. in linear regression).
A graph plotted for various values of θ0 and θ1 (the standard names for our 'c' and 'm') against the cost function J results in a bowl-shaped surface.
Before jumping into code, one more thing has to be explained: what is a gradient? Intuitively, it is the slope of a curve at a given point in a specified direction. Because we are interested only in the slope along one axis and ignore the others, these derivatives are called partial derivatives.
Simply put, the gradient tells us about the slope of the function at a given point: a positive value indicates a positive (upward) slope and a negative value a negative (downward) slope.
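To make the sign of the slope concrete, here is a small finite-difference sketch (this is for intuition only; gradient descent itself uses the analytic derivative):

```python
def numerical_slope(f, x, h=1e-6):
    """Estimate the slope of f at x using a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return (x - 3) ** 2  # a simple bowl with its minimum at x = 3

print(numerical_slope(f, 5))  # positive: we are to the right of the minimum
print(numerical_slope(f, 1))  # negative: we are to the left of the minimum
```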
The idea of gradient descent in linear regression is to initialise the values of θ0 and θ1 and, at each iteration, compute the gradient (slope) and tweak the parameters based on it. This takes us towards a local minimum of our cost function, which corresponds to the best fit in our linear regression approach.
A lower value of J (the cost function) means a better model. Caution: even a slight change in the initialisation values might take us to a completely different local minimum.
Pseudo code (iterating over the dataset once is called an epoch):

for i in range(0, iterations):
    for j in range(0, 2):
        θj = θj – ⍶ * 𝛅/𝛅θj (J(θ0, θ1));
⍶ denotes the learning rate.
𝛅/𝛅θj (J(θ0, θ1)) denotes the slope of the cost function with respect to a particular parameter (a partial derivative).
Both parameters should be updated simultaneously: compute both gradients first, then update both θ values in the same step.
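Putting the pseudocode together into a runnable sketch (the dataset, learning rate, and iteration count below are illustrative choices, not part of the original post):

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    """Fit h(x) = theta1*x + theta0 by minimising the mean squared error."""
    theta0, theta1 = 0.0, 0.0  # initialisation
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors h(x) - y for every data point.
        errors = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Simultaneous update: both gradients are computed before either
        # parameter is changed.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data generated from y = 2x + 1; the loop should recover those values.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta1, 2), round(theta0, 2))  # approximately 2.0 and 1.0
```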
Gradient Descent Intuition
Since our goal is to reduce the value of the cost function, gradient descent should indicate that, from point 't', we should reduce the value of θ1; as per the graph, that reduces the cost function.
θ1 = θ1 – ⍶ * (𝛅/𝛅θ1 (J(θ0, θ1)));
(𝛅/𝛅θ1 (J(θ0, θ1))) would return a positive value at 't', because the slope there is positive (upwards).
And the learning rate ⍶ is always a positive number.
Therefore, θ1 = θ1 – ⍶ (+ve number);
The value of θ1 would obviously reduce, and this in turn, as per the graph, reduces the cost function. The partial gradients of the cost function can easily be found using basic derivatives knowledge.
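Worked out explicitly (a step the post leaves to the reader), applying the chain rule to J(θ0, θ1) = 1/2n * ∑ ((θ1*x + θ0) – y)^2 gives:

𝛅/𝛅θ0 (J(θ0, θ1)) = 1/n * ∑ ((θ1*x + θ0) – y)
𝛅/𝛅θ1 (J(θ0, θ1)) = 1/n * ∑ ((θ1*x + θ0) – y) * x

These are exactly the slopes plugged into the update rule above.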
This gradient descent algorithm is so powerful that it powers most deep learning models in finding the correct hypothesis. With this basic information, you can build a simple linear regression model that predicts prices given a dataset.
Now, our life is like the gradient descent algorithm: all of us are initialised randomly; some are initialised with advantages (born with a silver spoon), others are not that lucky. But at every step in our life, we try to reach a local minimum (a better version of ourselves), and it is our actions, positive or negative, that determine whether we reach the global optimum (the best version of ourselves) or a local optimum (average).