Logistic Regression: Detailed Introduction

Introduction

Logistic regression is a statistical analysis technique that uses prior observations from a data set to predict a binary outcome, such as yes or no.

A logistic regression model predicts a dependent variable by analyzing its relationship with one or more independent variables. For example, logistic regression could be used to predict whether a candidate for office will win or lose, or whether a high school student will be admitted to a specific college. Each of these is a straightforward choice between two possibilities, so the result is binary.

A logistic regression model can take several input criteria into consideration. In the college admission example, the logistic function can take into account the student's grade point average, SAT score, and number of extracurricular activities. Based on historical data about earlier outcomes involving the same input criteria, it then scores new cases according to how likely they are to fall into one of the two outcome groups.

Logistic regression has become increasingly important in machine learning. It enables machine learning algorithms to classify incoming data based on historical data. As more relevant data is provided, the algorithms get better at predicting classes within data sets.

Logistic Function

Logistic regression is named after the logistic function, which serves as the method's main building block.

The logistic function, also known as the sigmoid function, was originally developed to describe population growth in ecology, where a population rises quickly and then levels off at the ecosystem's carrying capacity. This S-shaped curve maps any real-valued number to a value between 0 and 1, without ever reaching exactly 0 or 1.

1 / (1 + e^-value)

Here, value is the number you want to transform and e is the base of the natural logarithm (Euler's number, available as the EXP() function in a spreadsheet). Plotting the function for inputs between -5 and 5 shows them transformed into the range between 0 and 1.
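The formula above can be tried out directly. The following is a minimal sketch using NumPy; the function name sigmoid and the sample inputs are chosen here for illustration only.

```python
import numpy as np

def sigmoid(value):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-value))

# Evaluate the function at a few illustrative points between -5 and 5.
inputs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
outputs = sigmoid(inputs)

for v, s in zip(inputs, outputs):
    print(f"sigmoid({v:+.1f}) = {s:.4f}")
```

An input of 0 gives exactly 0.5, large negative inputs approach 0, and large positive inputs approach 1.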

Like linear regression, logistic regression represents data using an equation.

To predict an output value (y), input values (x) are combined linearly using weights or coefficient values (referred to by the Greek capital letter Beta). A key difference from linear regression is that the output value being modelled is a binary value (0 or 1) rather than a numeric value.

Here is an illustration of a logistic regression formula:

y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))

Here y is the predicted output, b0 is the bias or intercept term, and b1 is the coefficient for the single input value (x). Each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.
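To make the formula concrete, here is a small worked sketch. The coefficient values b0 = -4.0 and b1 = 1.0 are hypothetical, picked only for illustration and not learned from any data, and the 0.5 cutoff for turning y into a class is a common convention assumed here rather than something the formula requires.

```python
import math

def predict_probability(x, b0, b1):
    """Evaluate y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients, chosen only for illustration (not learned from data).
b0 = -4.0
b1 = 1.0

for x in (2.0, 4.0, 6.0):
    y = predict_probability(x, b0, b1)
    label = 1 if y >= 0.5 else 0  # assumed cutoff of 0.5
    print(f"x = {x}: y = {y:.3f}, class = {label}")
```

Dividing the numerator and denominator by e^(b0 + b1*x) gives 1 / (1 + e^-(b0 + b1*x)), so this is the logistic function applied to the linear combination b0 + b1*x.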

Example

Python code for logistic regression

import numpy
from sklearn import linear_model

# X holds the tumor sizes. It is reshaped into a single column because
# scikit-learn expects a 2D array of shape (n_samples, n_features).
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
# y marks whether each tumor is cancerous (0 = no, 1 = yes).
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Predict whether a tumor with a size of 9.46mm is cancerous:
predicted = logr.predict(numpy.array([9.46]).reshape(-1, 1))
print(predicted)

Output:

[1]

We have predicted that a tumor with a size of 9.46mm will be cancerous.
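The fitted model also exposes the learned coefficients and the predicted probability behind the class label. The sketch below refits the same data so that it runs on its own, then reads intercept_, coef_, and predict_proba from scikit-learn's LogisticRegression. The variable names b0, b1, proba, and manual are chosen here for illustration.

```python
import numpy
from sklearn import linear_model

# Same data as in the example above.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
                 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

b0 = logr.intercept_[0]  # learned intercept (bias) term
b1 = logr.coef_[0][0]    # learned coefficient for the single input

new_size = numpy.array([9.46]).reshape(-1, 1)
# predict_proba returns one row per sample: [P(class 0), P(class 1)].
proba = logr.predict_proba(new_size)[0, 1]

# Recompute the same probability by hand with the logistic function.
manual = 1.0 / (1.0 + numpy.exp(-(b0 + b1 * 9.46)))

print("b0:", b0, "b1:", b1)
print("P(cancerous):", proba, "by hand:", manual)
```

The two probabilities should agree, which shows that the model's prediction is the logistic function applied to b0 + b1*x with the learned coefficients.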

Application

  • Logistic regression can be used in the medical field to determine whether a tumor is likely to be benign or malignant.
  • Logistic regression can be used in the financial sector to determine if a transaction is fraudulent or not.
  • Logistic regression can be used in marketing to determine if a target audience will respond or not.

Summary

Logistic regression, in a nutshell, is applied to classification problems in which the output or dependent variable is binary or categorical. When using logistic regression, there are a few considerations to be aware of, including the different forms of logistic regression, the kinds of independent variables involved, and the available training data.