Curve Fitting: Linear, Cubic, Polynomial (1-5), Piecewise, Goodness of Fit and Regression Analysis In Python.

Curve Fitting

Curve fitting is the process of determining a curve that best describes a given set of values. This is useful for estimating values that are not part of the data set; in other words, it can be used to interpolate or extrapolate data. A good curve fit is one that can predict and explain the trend as precisely as possible. 

Curve fitting can be of various types. Depending upon the collected data, we can fit a linear, polynomial, exponential or any other known curve which is easy to analyse. As such, it is very important to analyse the data first, conceptualise the distribution and then try to fit a curve that would elegantly and precisely explain its behaviour in and out of the data range. 

In order to determine the goodness of a fit, we need techniques from regression analysis that quantify and minimise the errors (deviations) between the actual data and the estimated curve.

NOTE: In this project, I will start with the definitions of various functions and modules that we need in order to fit a curve to the data. The 'data' file has two columns which contain the temperatures and specific heat capacities in the first and the second column respectively. There are a total of 3200 rows. Throughout the project, the list of temperatures from the data file will be known as (and stored as a list in Python) T and the specific heat list as CP. The data file can be downloaded here: https://drive.google.com/file/d/1vWi9Vj3AdQf0gqUK1u9jwS8zg92BHGR0/view 

1) Definitions and Functionalities:

Modules used: matplotlib, scipy, numpy, imageio.  

Functions:

1.1 curve_fit()

curve_fit() is a function in the scipy.optimize (Optimisation and Root Finding) sub-package of the SciPy module. It is essentially a non-linear least-squares fitting tool. 

In the challenge, the curve_fit function takes the form:

popt, pcov = curve_fit(func, temperature, cp)

Where func is the model function we wish to fit the data to; temperature and cp are the lists of first and second values from each line of the 'data' file.

popt and pcov: The curve_fit function returns two arrays. The first one, popt, is a one-dimensional array containing the coefficients of the best-fit function determined by curve_fit. The second, pcov, is a two-dimensional array, or matrix, containing the estimated covariance of popt.

Covariance measures how two variables vary together. Thus, the pcov 2D array contains the covariances between each pair of coefficients in popt, as estimated by the least-squares algorithm in scipy.optimize. The diagonal of this matrix gives the covariance of a coefficient with itself, which is simply the variance of that coefficient; the (i,j) and (j,i) entries are equal and give the covariance between the i-th and j-th coefficients. 
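For instance, a one-standard-deviation uncertainty for each fitted coefficient can be read off the diagonal of pcov. A minimal sketch, assuming func, temperature and cp are defined as in the call above:

import numpy as np
from scipy.optimize import curve_fit

popt, pcov = curve_fit(func, temperature, cp)

#The diagonal of pcov holds the variance of each coefficient, so its square
#root gives a one-standard-deviation error estimate for each entry of popt.
perr = np.sqrt(np.diag(pcov))
print(perr)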

1.2 np.array()

np.array() is a function from the NumPy library that creates an array from the object passed to it (here, a Python list). In this project, we need to pass the temperature list to the fitted function as an array. 

In the code example in section 1.3 below, 'func' is the function to which we want the curve fitted, and 'fit' holds the values of that function evaluated, with the fitted coefficients (popt), at every temperature. A plain Python list such as 'temp' does not support the element-wise arithmetic used inside the function, so we cannot simply pass func(temp,*popt); np.array(temp) converts the list into a NumPy array so that the function is evaluated for all temperature values at once. 
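As a minimal illustration (with a made-up three-element temperature list) of why the conversion matters:

import numpy as np

temp = [300, 301, 302]        #a plain Python list
#2*temp would repeat the list rather than double each value, and pow(temp,2)
#raises an error, so we convert to a NumPy array for element-wise arithmetic:
temp_arr = np.array(temp)
print(2*temp_arr + 1)         #element-wise result: [601 603 605]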

1.3 *popt

In Python, the star (*) operator is used for multiplication and, when applied to a list or tuple, for repetition. When placed before a list or tuple in a function call, however, it unpacks the sequence and passes its items as separate positional arguments. 

For example, if we fit a degree-2 polynomial to the data, popt will contain three coefficients, represented by num1, num2 and num3 in the pseudo-output of the code below.  

import numpy as np
from scipy.optimize import curve_fit

def func(t,a,b,c):
    return a*pow(t,2) + b*t + c

#temp and cp are the temperature and specific-heat lists read from the 'data' file.
popt,pcov = curve_fit(func,temp,cp)
fit = func(np.array(temp),*popt)

print(popt)

>>>(pseudo-output)
[num1,num2,num3]

Thus, the asterisk operator is used to unpack a list of arguments into a function call.
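A tiny stand-alone sketch of the same idea outside the curve-fitting context (the function add and the list coeffs are made up for illustration):

def add(a, b, c):
	return a + b + c

coeffs = [1, 2, 3]
print(add(*coeffs))   #equivalent to add(1, 2, 3); prints 6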

1.4 polyfit()

The polyfit() function from the NumPy module is another curve-fitting tool; it performs a least-squares polynomial fit. Unlike curve_fit(), polyfit() does not require a definition of the function we wish to fit; it simply asks for the degree of the polynomial to fit to the data. Thus, curve_fit() can be applied when we have a definition of the function at hand (which may be exponential, logarithmic, linear or even a polynomial), while polyfit() is used exclusively to fit a polynomial to the data.

1.5 poly1d()

The poly1d([list of coefficients]) function from NumPy is a function that takes a list containing real numbers and returns a polynomial which has the first number from the list as the coefficient of the highest power (n) term, the second as the coefficient of the (n-1)th term and so on till the last (the constant) term. For example:

import numpy as np

print(np.poly1d([1,-2,3]))
>>>    2
    1 x - 2 x + 3

That is, `1x^2 - 2x + 3`. The default symbol for the variable is 'x'. We can change it with the variable='...' keyword argument, as shown below.
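A one-line sketch showing the same polynomial printed with 't' as the variable symbol:

import numpy as np

print(np.poly1d([1,-2,3], variable='t'))   #same coefficients, variable symbol changed to 't'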

An example of utilising polyfit() and poly1d() functions:

import numpy

#Assume T and CP are the temperature and cp lists respectively.

degree = 3 #desired polynomial degree
p = numpy.polyfit(T,CP,degree)

'''
p will be a list containing the 4 coefficients of a 3-degree polynomial that will fit the data.
'''

print(p)

>>>
[ 1.90608267e-08 -1.38124291e-04  3.87891016e-01  8.38896565e+02]


print(numpy.poly1d(p, variable='t')) #the default variable is x

>>>        3             2
1.906e-08 t - 0.0001381 t + 0.3879 t + 838.9

1.6 linspace()

The linspace(start_number,end_number,number_of_elements) function from the NumPy module creates an array of number_of_elements equally spaced values from start_number to end_number (both endpoints included by default).

We fit a curve to data in order to interpolate missing values within the data set and to extrapolate values beyond it based on the trend of the curve. Once the fitted equation is determined, we can plot it over any range, including regions outside the given data. The 'data' file contains 3200 entries, i.e., 3200 x-points and the corresponding 3200 y-points; however, when plotting the fitted curve we choose the x-values ourselves (using linspace) and evaluate the fitted equation at those points.

import numpy

example = numpy.linspace(1,10,5)
print(example)
#example becomes an array (a numpy.ndarray)
>>>
[ 1.  3.25  5.5  7.75  10. ]


Tany = numpy.linspace(200,3599,3400)
print('There are',len(Tany),'elements in T_any') #len() calculates the length
print(Tany)

>>>
There are 3400 elements in T_any
[ 200.  201.  202. ... 3597. 3598. 3599.]

 

2) Linear and Cubic polynomial Fitting to the 'data' file Using curve_fit(). 

Using the curve_fit() function, we can easily determine a linear and a cubic curve fit for the given data. 

2.1 Main Code:

 

#Linear and Polynomial Curve Fitting.

#1)Importing Libraries
import matplotlib.pyplot as plt					#for plotting. Aliasing matplotlib.pyplot as 'plt'.
import numpy as np       					    #for creating array. Aliasing numpy as 'np'.
from scipy.optimize import curve_fit as cf 		#for the curve fit function. Aliasing curve_fit as 'cf'.

# 2) Defining Functions:

#	2a) Defining Linear Polynomial:
def linear(t,a,b):  #t,a and b are the input arguments. a and b are the coefficients and t the variable.
	return a*t + b

#	2b) Defining Cubic Polynomial:
def cubic(t,a,b,c,d):
	return a*pow(t,3) + b*pow(t,2) +c*t + d 	#returns the cubic function in equation form.


#	2c) Extracting Temperature and cp From 'data' File:
def file_reader():

	temp=[] #defining the type of the temp variable, i.e., creating an empty list.
	cp=[]	#creating an empty list named cp.

	#Data Extracting Loop:
	for line in open('data','r'): 	  #Opens the 'data' file for reading ('r') and iterates over it line by line, storing each line in 'line'.
		temp_cp = line.split(',')     #splits each line by the comma (',') and stores in a 2 element list 'temp_cp'.
		temp.append(int(temp_cp[0]))  #select the first entry from each line and append to 'temp'. First entries are temperatures.
		cp.append(float(temp_cp[1]))  #select the second entry from each line and append to 'cp'. Second entries are cps.
	return temp,cp 			     	  #return list containing temperatures and cp respectively. 

# 3) Calling the file reader function:
T, CP = file_reader()    #the returned values from 'file_reader' are stored in 'T' and 'CP' respectively.

T_any = np.linspace(200,3599,3400) #extends the T range (300-3499) by 100 values on each side

# 4)Linear Curve Fitting:
#	4a) Getting Coefficients for the linear curve:

popt1, pcov = cf(linear,T,CP)


''' 
cf is SciPy's curve_fit function, which takes the function we want to fit, the x entries and the y entries respectively.
cf returns the estimated coefficients for the linear equation we provide and stores them in 'popt1'. The second variable, 'pcov', stores the
covariance matrix that the curve_fit function returns as its second output.
'''
#	4b) Getting y values from the estimated function:
fit_linear = linear(np.array(T_any),*popt1)

'''
fit_linear stores the array of values returned by the 'linear' function, which receives the temperature values converted by 'np.array'.
The remaining input arguments to 'linear', i.e., the coefficients, are passed using the '*' operator. Here, popt1 holds the two coefficients
required for the linear equation and *popt1 unpacks them into the call. Thus, the '*' operator is used to pass a list in this example.
'''

# 5)Cubic Curve Fitting:
	
popt2,pcov = cf(cubic, T, CP)
fit_cubic = cubic(np.array(T_any),*popt2)


'''We do not require the values in pcov in this program, so it is better to overwrite the previous value from the linear fit and save some 
memory. However, we do require the coefficients of both the linear and the cubic polynomial later in the program to print the equation of each 
curve, so naming them differently (popt1, popt2) is a must.'''

# 6) Creating Legends:

linear_legend = 'Linear Fit'+ str(np.poly1d(popt1,variable='t')) #creates a string for symbolic linear equation.  
cubic_legend = 'Cubic Fit'  + str(np.poly1d(popt2,variable='t')) #creates a string for symbolic cubic equation.

# 7) Plotting the Linear and Cubic Curve_fits Onto The Actual Curve:

plt.plot(T,CP,'r', linewidth=4)	
#plots the original curve with T on the x-axis and CP on the y-axis. The curve is red ('r') with a linewidth of 4 for better visibility.

plt.plot(T_any,fit_linear,'g') 	#plots T_any on the x-axis and fit_linear on the y-axis in green.
plt.plot(T_any,fit_cubic,'b')	#plots the cubic curve in blue.
plt.xlabel('Temperature')	#labels the x-axis.
plt.ylabel('Specific Heat Capacity (cp)')				#labels the y-axis.
plt.title('Linear and Cubic Curve Fitting')				#creates the title of the graph.
plt.legend(['Actual Curve',linear_legend,cubic_legend])	#adds the legend, i.e., the names of the curves in the order of plotting.
plt.grid()
plt.show()

 

 

Output Figure:

Fig. 2.1: Linear (green) and Cubic (blue) Polynomial Fits Against the Data Set (red). 

2.2 More on Importing, Aliasing and Methods:

Often we import modules or packages that have a long name (e.g., matplotlib.pyplot). Typing the full module name every time we need it makes our code unnecessarily long and tedious. This is where aliasing comes to the rescue. While importing a library, package or module, we can rename (alias) it with a shorter name using Python's 'as' keyword. In the above code, I have aliased matplotlib.pyplot as 'plt'. One can even alias with a single character, as long as it doesn't clash with an existing name used in the script. 

Methods

In Python, objects and modules expose their own functions. Functions attached to an object (such as a list) are called methods and are invoked after the variable name using a dot; for example, in the above code I used the 'append' method to add entries to the temperature and cp lists. Similarly, when we need a function from an imported library, we write the library name (or its alias) followed by a dot and the function name, e.g., 'np.array', 'plt.plot'. We cannot write 'plt.array' because matplotlib.pyplot has no attribute named 'array'.
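A tiny sketch of the difference between a method called on an object and a function reached through a module alias (the list cp here is made up):

import numpy as np

cp = [1000.0, 1005.5]
cp.append(1010.2)          #'append' is a method of the list object 'cp'
cp_array = np.array(cp)    #'array' is a function reached through the numpy alias 'np'
print(cp_array.mean())     #NumPy arrays have methods of their own, e.g. 'mean'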

Further Shortening 

In the above code, you can see two ways of importing libraries or functions. When we write, e.g., "import matplotlib.pyplot as plt", we must prefix every call with the module (or aliased) name, e.g. plt.plot(...) or plt.title(). However, if we only need one or a few attributes/methods from a package, we can use the 'from ... import' form. For example, to use the curve_fit() function from the SciPy package, we can write 'from scipy.optimize import curve_fit'. The advantage is that whenever we need curve_fit(), we do not have to write scipy.optimize.curve_fit(...) but can call the function directly as curve_fit(...). The disadvantage of importing only one function (or a few, separated by commas) is that only the imported names are available to work with, not the rest of the package. 
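A short sketch contrasting the two import styles (both calls compute the same mean):

#Style 1: import the whole module under an alias; every call is prefixed.
import numpy as np
print(np.mean([1, 2, 3]))

#Style 2: import only the names we need; they can then be called directly.
from numpy import mean
print(mean([1, 2, 3]))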

 

3) Goodness of Fit

In order to find the best fit, it may seem that the higher the order of the polynomial, the better the fit. Taken to the extreme, a polynomial of degree n can pass exactly through n+1 data points, so a 'perfect' fit is always possible in principle. However, such a curve merely reproduces the data, noise included, rather than describing the underlying trend. Simply increasing the degree of the polynomial is therefore not a practical way of fitting a curve.

Using statistical techniques like regression analysis, we can determine whether a fitted curve is good enough. A curve returned by the curve_fit function is determined by the non-linear least-squares method: least squares finds the best-fit parameters by minimising the sum of the squared deviations between the actual data and the model. However, there are more explicit tools for deciding whether a curve fit should be accepted or not. 

3.1 RMS ERROR

This is perhaps one of the simplest ways of determining the goodness of a fit. In this method, we take the square root of the mean of the squared differences between the estimated y-values (Y) and the actual y-values (y). Mathematically,

`E_(RMS) = sqrt((sum(Y - y)^2)/n)`

where n is the number of data points.

RMS ERROR CODE:
#RMS ERROR

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit as cf


def linear(t,a,b):
	return a*t + b


def cubic(t,a,b,c,d):
	return a*pow(t,3) + b*pow(t,2) +c*t + d


def file_reader():

	temp=[]
	cp=[]
	for line in open('data','r'):
		temp_cp = line.split(',')
		temp.append(int(temp_cp[0]))
		cp.append(float(temp_cp[1]))
	return temp,cp


T, CP = file_reader()

T_any = np.linspace(300,3499,3200) 

'''During regression we can compare only when len(T_any) == len(T), because we need
to compare the known y-values (CP) with the predicted y-values point by point. Even though
T_any appears to equal T at each point, plugging T in directly in place of T_any behaves
differently. If you know the reason for this behaviour, kindly post it in the comments.'''


popt, pcov = cf(linear,T,CP)
fit_linear = linear(np.array(T_any),*popt)
	
popt,pcov = cf(cubic, T, CP)
fit_cubic = cubic(np.array(T_any),*popt)
'''Note that this time we do not require the popt values, so we can overwrite them after computing the fit_ arrays.'''

# 8) Root Mean Square Error:
n=len(T)	   #length of the T array
rmsL = np.sqrt(np.sum((pow((np.array(fit_linear)-CP),2)))/n) #_L for linear curve
rmsC = np.sqrt(np.sum((pow((np.array(fit_cubic)-CP),2)))/n)  #_C for cubic curve
RMS = [rmsL,rmsC] #makes a list of two errors

x=[1,3] #x contains the polynomial degrees. Here, 1 and 3.
plt.plot(x,RMS,'--or') #plot dashes, points as 'o' and red colour
plt.title("RMS ERROR")
plt.xlabel('Polynomial Degree')
plt.ylabel('RMS Error')
plt.grid()		#makes grid lines visible.
plt.show()

print('RMS Error for linear fit = %3.5f and for cubic fit = %3.5f'%(tuple(RMS)))


>>> 
RMS Error for linear fit = 25.99909 and for cubic fit = 29.42028
 
Figure Output: 
Fig. 3.1: RMS Error for the Linear and Cubic Fits.

From the above plot, the RMS error is actually higher for the cubic fit than for the linear fit. However, the RMS error alone does not tell us whether the linear fit is acceptable or not. 

 

3.2 Chi-Squared (R-Squared)

In the Chi-Square method, we have to determine the following things from the data and the curve fit.

1) Mean `y_(mean)`: the mean of the actual dependent-variable values (here, the CP data). Throughout this section, `y` denotes the actual values and `Y` the values estimated by the fitted curve.

2) TSS, Total Sum of Squares. Gives the sum of the squared differences between the actual values and their mean. Mathematically, 

`TSS = sum(y - y_(mean))^2`

3) RSS, Residual Sum of Squares. Gives the sum of the squared differences between the actual and the estimated values of the dependent variable. This is the part of the variation that the model (curve fit) cannot explain, and the aim of fitting is to minimise it. Mathematically,

`RSS = sum(y - Y)^2`

4) ESS, Explained Sum of Squares. Gives the sum of the squared differences between the estimated values and the mean. Mathematically,

`ESS = sum(Y - y_(mean))^2`

Also, `ESS + RSS = TSS`. 

5) `R^2`, R-Squared. The R-squared value is the fraction of the variation in the data that the fitted curve can explain; in this project it is expressed as a percentage and used as the goodness-of-fit measure (labelled Chi^2 in the plots). The higher the `R^2` value, the better the fit: an `R^2` of 'A'% means that the curve can explain (or predict) 'A'% of the behaviour of the data. As a rule of thumb, the `R^2` value should not be less than 95% for the fit to be accepted. Mathematically,

`R^2 = 1 - (RSS)/(TSS)`

In other words, the R-squared value tells us how much of the data's variation our model can explain.
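As a compact illustration of these definitions, here is a small helper (the name r_squared and the toy input lists are made up for this sketch) that computes TSS, RSS and `R^2` exactly as defined above:

import numpy as np

def r_squared(y_actual, y_fitted):
	#y_actual: measured values; y_fitted: values predicted by the curve fit.
	y_actual = np.asarray(y_actual)
	y_fitted = np.asarray(y_fitted)
	tss = np.sum((y_actual - np.mean(y_actual))**2)  #total sum of squares
	rss = np.sum((y_actual - y_fitted)**2)           #residual sum of squares
	return 1 - rss/tss

#A perfect fit explains all of the variation, so R^2 is 1 (i.e., 100%).
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   #-> 1.0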

Code: After we estimate our linear_fit and cubic_fit function by the main code, the code to determine `R^2` values is:

#...continued from Main Code
T, CP = file_reader()
T_any = np.linspace(300,3499,3200)

popt, pcov = cf(linear,T,CP)
fit_linear = linear(np.array(T_any),*popt)
	
popt,pcov = cf(cubic, T, CP)
fit_cubic = cubic(np.array(T_any),*popt)

# 9) R-Squared Test
n=len(T)	   #length of the T array


tss = np.sum(pow((CP - np.mean(CP)),2)) 
'''Total sum of squares. Unique to the data; it does not depend on the curve fit.'''

rss_l = np.sum(pow((CP - np.array(fit_linear)),2)) #residual squared sums for linear fit
r2_l = (1-(rss_l/tss)) #r-Squared value for linear fit

rss_c = np.sum(pow((CP - np.array(fit_cubic)),2)) #c for cubic
r2_c = 	(1-(rss_c/tss))


R2 = [100*r2_l,100*r2_c] #making a list of the two r-squared values as percentages
 
x=[1,3] #x contains the polynomial degrees. Here, 1 and 3.
plt.plot(x,R2,'--om') #dashed magenta lines with circles at data points
plt.title("R^2 ~ Chi^2 Value")
plt.xlabel('Polynomial Degree')
plt.ylabel('R^2 ~ Chi^2 (Percentage)')
plt.grid()
plt.show()

print('The R^2 value for linear fit = %2.2f%%  and for cubic curve = %2.2f%%' %tuple(R2))

'''To print a literal special character in a formatted string we usually escape it.
To escape the % symbol in a %-formatted string, we write the % symbol twice.'''


>>>
The R^2 value for linear fit = 92.49%  and for cubic curve = 90.38%
 
Figure Output:
Fig. 3.2: R-Squared Values for Linear and Cubic Fits.

From the above results, it is clear that the linear curve fit has an `R^2` value below 95%, while the cubic fit has an even lower `R^2` value. Comparatively, therefore, the linear fit fares better than the cubic fit returned by the curve_fit() function of the SciPy module, but neither curve passes the 95% R-squared threshold.

 

4) Curve Fitting Using polyfit() Function 

Just like the curve_fit() function from the SciPy module, there is a function called polyfit() in the NumPy module. The primary difference between the two is that curve_fit() needs the definition of a mathematical function to which we want the data to fit; the function can be polynomial, exponential, logarithmic or any other suitable equation. The polyfit() function, on the other hand, is only used to fit a polynomial to the data. The other difference is that curve_fit() uses non-linear least squares, while polyfit() performs a least-squares polynomial fit. Thus, polyfit() should be used wherever there is an explicit indication that the curve must be a polynomial. Syntax examples are given in sections 1.4 and 1.5.
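As a quick side-by-side sketch (with made-up noisy quadratic data rather than the 'data' file), the two tools recover essentially the same coefficients when asked for the same polynomial:

import numpy as np
from scipy.optimize import curve_fit

#Hypothetical noisy quadratic data, for illustration only.
x = np.linspace(0, 10, 50)
y = 2.0*x**2 - 3.0*x + 1.0 + np.random.normal(0, 0.5, x.size)

def quadratic(t, a, b, c):     #curve_fit needs an explicit model function
	return a*pow(t, 2) + b*t + c

popt, pcov = curve_fit(quadratic, x, y)   #non-linear least squares
p = np.polyfit(x, y, 2)                   #least-squares polynomial fit; only the degree is given

print(popt)   #both should be close to [2, -3, 1]
print(p)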

For polyfit() curve fitting, I have selected polynomials of degrees 1-5, created a GIF animating all 5 plots with the help of the imageio library, and finally used regression analysis to plot the RMS error, standard deviation and R-squared values.

Imageio: Imageio is a Python library that provides an easy interface to read and write a wide range of image data, including animated images, volumetric data, and scientific formats. [http://imageio.github.io]

 

Program Code:

#Animating Curve Fitting Using Polyfit() and Goodness of fit

# 1) Importing Libraries/Modules

import matplotlib.pyplot as M 		#aliased as M
import numpy as N 					#aliased as N
import imageio as img 				#the animation module 

# 2) Extracting Temperature and CP. (file parsing)

def file_reader():
	
	temp=[]
	cp=[]
	for line in open('data','r'):
		temp_cp = line.split(',')
		temp.append(int(temp_cp[0]))
		cp.append(float(temp_cp[1]))
	return temp,cp

T, CP = file_reader()
n=len(T)
Tany = N.linspace(300,3499,3200)  #line space for X-axis


# 3) Plotting Curve Fits

#	3.1 Initialisation/Preallocation

ERROR=[] 	#error list
SD=[]		#standard deviation list
R2=[]		#R-squared list
images=[]	#curve-plot list



for i in range(1,6):	
	
    # 3.2) Calling polyfit and making equation

	p1 = N.polyfit(T,CP,i)  #calling the function to obtain coefficients
	fit_eq= N.poly1d(p1)	#making an equation from p1 with x as variable
	
	'''
	example: for the first iteration, 
	print(fit_eq) 
	would result in:
	>>> 0.09875 x + 995.2
	'''
	# 3.3) Evaluating the equation stored in 'fit_eq' at each Tany value
	Fit=[]
	for j in range(n):
		value=fit_eq(Tany[j])
		Fit.append(value)
	
	# 3.4) Calculating RMS, R^2 and SD

	err = N.sqrt(N.sum((pow((N.array(Fit)-CP),2)))/n)
	ERROR.append(err)

	tss = N.sum(pow((CP - N.mean(CP)),2))
	rss = N.sum(pow((CP - N.array(Fit)),2))
	
	r2 = 100*(1-(rss/tss))
	R2.append(r2)

	sd = pow((rss/(n-2)),0.5)
	SD.append(sd)
	
	# 3.5) Plotting Actual Curve and Curve Fits

	M.plot(T,CP,Tany,Fit,linewidth=3)
	M.xlabel('Temperature')
	M.ylabel('Specific Heat Capacity (cp)')
	M.title('Polyfit Curvefitting at degree = %s' %i)
	M.legend(['Actual Curve', 'Polyfit Curve'],loc=2)
	M.axis([0,4000,900,1400]) #fixes the axes for better visualisation
	M.grid()
	

	filename ='C:/Users/Adnan/Desktop/Learn/Lync/Python/Programs/Polyfit/' +str(i) + '.png' #creates file names
	M.savefig(filename) 	#saves the figure
	images.append(img.imread(filename)) #appends the curve plots in list
	M.show()
	

# 3.6) Creating the GIF animation

duration = {'duration':2} #setting the duration
img.mimsave('C:/Users/Adnan/Desktop/Learn/Lync/Python/Programs/Polyfit/polyfit.gif',images,**duration) #saves the gif

# 4) Plotting RMS, R^2 and SD

#	4.1) RMS
x=N.linspace(1,i,i)   #polynomial degrees 1 to 5 (i is 5 after the loop ends)
M.plot(x,ERROR,'-om')
M.title('RMS Error Vs Polynomial Degree')
M.xlabel('Polynomial Degree')
M.ylabel('RMS Error')
M.grid()
M.show()

#	4.2) SD
M.plot(x,SD,'-ob')
M.title('Standard Deviation Vs Polynomial Degree')
M.xlabel('Polynomial Degree')
M.ylabel('Standard Deviation')
M.grid()
M.show()

#  4.3) R2 
M.plot(x,R2,'-ok')
M.title('R2 Value Vs Polynomial Degree')
M.xlabel('Polynomial Degree')
M.ylabel('R2')
M.grid()
M.show()

 

Result Outputs:

Curve Fits:

Fig. 4.1: GIF Showing Polynomial Fits at Degrees 1-5. 

RMS Error:

Fig. 4.2: RMS Error vs Polynomial Degree.

Standard Deviation:

Fig. 4.3: Standard Deviation vs Polynomial Degree.

R-Squared Test: 

Fig. 4.4: R-Squared Values vs Polynomial Degree.

 

From the above plots, it is clear that the polynomial of degree 2 returned by the polyfit() function has an `R^2` value (98.12%) well above the 95% threshold, whereas the degree-1 polynomial falls short (92.49%). As per the R-squared test, the polynomial of degree 2 is therefore acceptable.

Equation:

The equation of the 2nd-degree polynomial can be obtained by the poly1d() function as:

'''
Followed from the program in section 4
'''

degree = 2                         #the degree accepted by the R-squared test
p = N.polyfit(T,CP,degree)
print(N.poly1d(p, variable='t'))   #setting the variable (temperature) to 't'


>>>
            2
-2.951e-05 t + 0.2108 t + 913.9

 

NOTE: For the above polynomial, the standard deviation is about 13. This means the predicted values may be in error by roughly `+-13`. It is up to the analyst to decide whether this deviation is acceptable; if it is, then this curve is the best fit with the lowest polynomial degree. 

5) Piecewise Curve Fitting

Many times, it is not possible to fit a single curve to the whole data set. The data may behave like a polynomial in a certain range while showing, say, exponential behaviour in another range. In that case, it is better to fit different curves to different regions of the data. 

The first step is to identify the ranges into which the data should be divided, and then to fit a suitable curve to each part. For example, for the data in the 'data' file, I found that breaking the temperature range into four parts and fitting a polynomial of degree 2 to each part gave a very close fit. The actual curve is shown in Fig. 5.1.

Fig. 5.1: Actual Data Curve (Cp vs Temperature).

 

Code:

#Piece-Wise Polynomial Curve Fitting Degree (1-5)

# 1) Importing Libraries/Modules
import matplotlib.pyplot as plt  					#for plotting
import numpy as np      					        #for creating array
from scipy.optimize import curve_fit as cf 			#for the curve fit function


# 2) Extracting Data

def file_reader():

	temp=[]
	cp=[]
	for line in open('data','r'):
		temp_cp = line.split(',')
		temp.append(int(temp_cp[0]))
		cp.append(float(temp_cp[1]))
	return temp,cp

T,CP = file_reader()

# 3) Creating Functions for the Fit 

def poly1(t,A,B):
	return A*t + B

def poly2(t,A,B,C):
	return A*pow(t,2) + B*t + C

''' 
If there is a part of the curve that behaves exponentially, the definition could be, for example:
def expo(t,A,B,C):
	return A*np.exp(-B*t) + C
Similarly, any equation that seems to fit a region can be passed to the curve_fit function as needed.
'''


# 4) Splitting the data into ranges 


# 	4.1) Part1 T = 300-600

function1=poly2 	 #selects the desired fitting function for this range
T300_600=T[:300]	 #uses only the first 300 values
CP300_600=CP[:300]

T1 = np.linspace(300,600,300) 	           #creates a line space equal to the length of T300_600
p1, p2 = cf(function1,T300_600,CP300_600)  #p1= popt and p2 = pcov
fit300_600 = function1(T1,*p1)			   #evaluates the fitted polynomial over T1


# 	4.2) Part2 T = 600-1000
function2 = poly2
T600_1000=T[300:700]
CP600_1000=CP[300:700]

T2 = np.linspace(600,1000,400)
p1, p2 = cf(function2,T600_1000,CP600_1000)
fit600_1000 = function2(T2,*p1)


# 	4.3) Part3 T = 1000-1500
function3 = poly2
T1000_1500=T[700:1200]
CP1000_1500=CP[700:1200]

T3 = np.linspace(1000,1500,500)
p1, p2 = cf(function3,T1000_1500,CP1000_1500)
fit1000_1500 = function3(T3,*p1)


# 	4.4) Part4 T= 1500-3499
function4 = poly2
T1500_=T[1200:]
CP1500_=CP[1200:]

T4 = np.linspace(1500,3500,2000)
p1, p2 = cf(function4,T1500_,CP1500_)
fit1500_ = function4(T4,*p1)


# 5) Plotting the Actual Curve and the 'Piecefits'

plt.plot(T,CP,'k',linewidth=10)           #Linewidth exaggerated for better visualisation
plt.plot(T1,fit300_600,'g', linewidth=3)
plt.plot(T2,fit600_1000,'m', linewidth=3)
plt.plot(T3,fit1000_1500,'y', linewidth=3)
plt.plot(T4,fit1500_,color='r', linewidth=3)
plt.legend(['Actual Curve', 'Part1','Part2','Part3','Part4'])
plt.xlabel('Temperature')
plt.ylabel('Cp')
plt.title('Piecewise Polynomial Curve Fitting')
plt.grid()
plt.show()

 

Output:

Fig. 5.2: Piecewise Curve Fitting.

 

Of course, the piecewise curve-fitting technique is not ideal in the above case, as the selected ranges are somewhat arbitrary and imprecise. However, there are occasions where erratic data leaves no better option than this technique. 

 

*** END***

