Curve fitting for specific heat data of Oxygen

Problem Statement:

  • Write code to fit a linear and a cubic polynomial to specific heat data for oxygen
  • Estimate the goodness of fit for each degree of polynomial
  • Improve the fit for the cubic polynomial

 

Theory:

Curve fitting - 

Curve fitting is a method that uses numerical techniques to construct a polynomial from a given set of observed data. The objective of curve fitting is to find the parameters of a mathematical model that describes a set of (usually noisy) data in a way that minimizes the difference between the model and the data. The most common approach is the "linear least squares" method, also called "polynomial least squares", a well-known mathematical procedure for finding the coefficients of polynomial equations that are a "best fit" to a set of X,Y data. A polynomial equation expresses the dependent variable Y as a polynomial in the independent variable X, for example as a straight line (Y = a + bX, where a is the intercept and b is the slope), a quadratic (Y = a + bX + cX^2), a cubic (Y = a + bX + cX^2 + dX^3), or a higher-order polynomial. Those coefficients (a, b, c, etc.) can be used to predict values of Y for each X. In all these cases, Y is a linear function of the parameters a, b, c, and/or d, which is why this is called a "linear" least-squares fit.
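The straight-line case above can be sketched in a few lines. This is a minimal illustration, using Python/NumPy in place of MATLAB's polyfit/polyval, and the small X,Y data set is invented for the example:

```python
# Fit a straight line Y = a + bX by linear least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly Y = 2X plus noise

# np.polyfit returns coefficients with the highest power first,
# so a degree-1 fit yields [slope, intercept].
b, a = np.polyfit(x, y, 1)
y_pred = np.polyval([b, a], x)            # predicted Y for each X

print(b, a)                               # slope and intercept of the best-fit line
```

The fitted slope and intercept minimize the summed squared difference between y and y_pred, exactly the "best fit" criterion described above.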

Goodness of fit-

Data from the created polynomial can be compared to the experimental data by numerical and graphical methods. Plotting residuals and prediction bounds are graphical methods that aid visual interpretation, while computing goodness-of-fit statistics and coefficient confidence bounds yield numerical measures that aid statistical reasoning. The numerical methods to compute goodness of fit are as follows - 

 

  • The sum of squares due to error (SSE)

  • R-square

  • Adjusted R-square

  • Root mean squared error (RMSE)

 

Sum of squares due to error

This statistic measures the total deviation of the response values from the fitted response values. It is also called the summed square of residuals and is usually labelled SSE:

        SSE = Σ (yi − ŷi)²

where yi are the observed response values and ŷi the fitted (predicted) values.

A value closer to 0 indicates that the model has a smaller random error component, and that the fit will be more useful for prediction.

R-square

This statistic measures how successful the fit is in explaining the variation of the data. Put another way, R-square is the square of the correlation between the response values and the predicted response values. It is also called the square of the multiple correlation coefficient and the coefficient of multiple determination.

R-square is defined as the ratio of the sum of squares of the regression (SSR) to the total sum of squares (SST). SSR is defined as

        SSR = Σ (ŷi − ȳ)²

SST is also called the sum of squares about the mean, and is defined as

        SST = Σ (yi − ȳ)²

where SST = SSR + SSE and ȳ is the mean of the observed response values. Given these definitions, R-square is expressed as

        R² = SSR / SST = 1 − SSE / SST

R-square can take on any value between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model. For example, an R-square value of 0.8234 means that the fit explains 82.34% of the total variation in the data about the average.
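These definitions can be checked numerically. The following sketch (Python/NumPy stand-in for the MATLAB workflow, with invented data) computes SSE, SSR and SST by hand and confirms both the identity SST = SSR + SSE and the resulting R-square:

```python
# Compute the goodness-of-fit sums of squares for a linear fit by hand.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
y_pred = np.polyval(np.polyfit(x, y, 1), x)

ybar = y.mean()
SSE = np.sum((y - y_pred) ** 2)       # summed square of residuals
SSR = np.sum((y_pred - ybar) ** 2)    # sum of squares of the regression
SST = np.sum((y - ybar) ** 2)         # sum of squares about the mean

# For a least-squares fit that includes an intercept, SST = SSR + SSE.
R_square = SSR / SST
print(R_square)
```

Because the intercept is included in the fit, the cross term between residuals and regression deviations vanishes, which is what makes the SST = SSR + SSE decomposition hold.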

Adjusted R-square

This statistic uses the R-square statistic defined above, and adjusts it based on the residual degrees of freedom. The residual degrees of freedom is defined as the number of response values n minus the number of fitted coefficients m estimated from the response values.

       v = n – m

v indicates the number of independent pieces of information involving the n data points that are required to calculate the sum of squares. Note that if parameters are bounded and one or more of the estimates are at their bounds, then those estimates are regarded as fixed. The degrees of freedom are then increased by the number of such parameters.

The adjusted R-square statistic is generally the best indicator of the fit quality when you compare two models that are nested, that is, a series of models each of which adds additional coefficients to the previous model. It is defined as

        adjusted R² = 1 − (SSE (n − 1)) / (SST · v)

 

The adjusted R-square statistic can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit. Negative values can occur when the model contains terms that do not help to predict the response.

Root Mean Square Error

This statistic is also known as the fit standard error and the standard error of the regression. It is an estimate of the standard deviation of the random component in the data, and is defined as

        RMSE = √MSE

where MSE is the mean square error, or residual mean square:

        MSE = SSE / v

Just as with SSE, an MSE value closer to 0 indicates a fit that is more useful for prediction.
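The adjusted R-square and RMSE definitions above can be tied together in one short sketch (Python/NumPy stand-in, invented data; for a straight line the number of fitted coefficients is m = 2):

```python
# Compute residual degrees of freedom, MSE, RMSE and adjusted R-square
# for a linear fit, following the formulas in the text.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.2, 5.8, 8.1, 10.2, 11.9])
y_pred = np.polyval(np.polyfit(x, y, 1), x)

n, m = len(y), 2                      # n response values, m fitted coefficients
v = n - m                             # residual degrees of freedom, v = n - m
SSE = np.sum((y - y_pred) ** 2)
SST = np.sum((y - y.mean()) ** 2)

MSE = SSE / v                         # residual mean square
RMSE = np.sqrt(MSE)                   # fit standard error
adj_R_square = 1 - (SSE / SST) * (n - 1) / v

print(RMSE, adj_R_square)
```

Note that the adjustment factor (n − 1)/v is greater than 1, so adjusted R-square is always at or below the plain R-square, penalizing extra coefficients.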

 

"Best fit" simply means that the differences between the actual measured Y values and the Y values predicted by the model equation are minimized. It does not mean a "perfect fit". For a perfect fit, the data set generated from the created polynomial would match the experimental data exactly, which statistically corresponds to an R-square value of 1. Techniques such as the 'splitwise' method can be used to drive the fit towards this limit.

 

About the program:

->Curve fitting a linear polynomial to data without splitwise

%curve fitting a linear polynomial for cp data 

clear all;
close all;
clc;

%Preparing the data
cp_data = load('data');
temp = cp_data(:,1);
cp = cp_data(:,2);

%curve fit
coeffs = polyfit(temp,cp,1);
predicted_cp = polyval(coeffs,temp);

%plotting
plot(temp,cp,'LineWidth',3,'Color','b');
hold on
plot(temp,predicted_cp,'LineWidth',3,'Color','r');
hold off
xlabel('Temperature [K]');
ylabel('Specific heat [kJ/kmol-K]');
legend('original data','curve-fit data');

%measuring the goodness of fit (R-square method)
mean_cp_data = mean(cp);

%sum of squares of the regression
error_data_1 = predicted_cp - mean_cp_data;
error_data_1_squared = error_data_1.^2;
SSR = sum(error_data_1_squared);

%sum of square of errors
error_data_2 = cp - predicted_cp;
error_data_2_squared = error_data_2.^2;
SSE = sum(error_data_2_squared);

SST = SSR + SSE;

R_square = SSR/SST;
  • Data is loaded from a file in the same directory using the 'load' command and the matrix is split into two column vectors - temperature and specific heat.
  • 'polyfit' function is used to get output coefficients using input as (X-temp, Y-Specific_heat, N-degree of polynomial). The degree of polynomial is set to 1 to get a linear polynomial.
  • Using the 'polyval' function to which the inputs are output array of coefficients and the temperature values, we receive an output of new specific heat values obtained by solving the linear equation.
  • These new values and the actual specific heat values are plotted against the temperature values.
  • To measure the goodness of fit, the R-square statistical method is employed.
  • The obtained R-square value is 0.9249, indicating a good fit but not a perfect one.

Output-

Workspace(Showing the value of R-square)

-----------------------------------------------------------------------------------------------------------------

->Curve fitting a cubic polynomial to data without splitwise (warning issue)

%curve fitting a cubic polynomial for cp data 

clear all;
close all;
clc;

%Preparing the data
cp_data = load('data');
temp = cp_data(:,1);
cp = cp_data(:,2);

%curve fit
coeffs = polyfit(temp,cp,3);
predicted_cp = polyval(coeffs,temp);

%plotting
plot(temp,cp,'LineWidth',3,'Color','b');
hold on
plot(temp,predicted_cp,'LineWidth',3,'Color','r');
hold off
xlabel('Temperature [K]');
ylabel('Specific heat [kJ/kmol-K]');
legend('original data','curve-fit data');

%measuring the goodness of fit (R-square method)
mean_cp_data = mean(cp);

%sum of squares of the regression
error_data_1 = predicted_cp - mean_cp_data;
error_data_1_squared = error_data_1.^2;
SSR = sum(error_data_1_squared);

%sum of square of errors
error_data_2 = cp - predicted_cp;
error_data_2_squared = error_data_2.^2;
SSE = sum(error_data_2_squared);

SST = SSR + SSE;

R_square = SSR/SST;
  • Data is loaded from a file in the same directory using the 'load' command and the matrix is split into two column vectors - temperature and specific heat.
  • 'polyfit' function is used to get output coefficients using input as (X-temp, Y-Specific_heat, N-degree of polynomial). The degree of polynomial is set to 3 to get a cubic polynomial.
  • Using the 'polyval' function, to which the inputs are the output array of coefficients and the temperature values, we receive an output of new specific heat values obtained by evaluating the cubic equation.
  • These new values and the actual specific heat values are plotted against the temperature values.
  • To measure the goodness of fit, the R-square statistical method is employed.
  • When run, the program shows a warning: 'Polynomial is badly conditioned'. "Badly conditioned" means that the solution of the system of linear equations depends critically on rounding errors due to the limited precision.
  • Centering and scaling the data can be used to improve the numerical properties of the function as well as the fitting algorithm.
  • The obtained R-square value is 0.9967, indicating a better fit than the linear polynomial, though still not a perfect one.
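The centering-and-scaling remedy mentioned above can be illustrated in a short sketch. Here Python/NumPy stands in for MATLAB, the Kelvin-scale data are invented, and the fit is done in the scaled variable z = (T − mean(T))/std(T), which plays the same role as the 'mu' output of polyfit:

```python
# Improve conditioning by fitting in a centered-and-scaled variable.
import numpy as np

T = np.linspace(300.0, 3500.0, 200)                 # Kelvin-scale abscissa
cp = 25.0 + 1e-2 * T - 2e-6 * T**2 + 1e-10 * T**3   # an exact cubic, for the demo

mu, sigma = T.mean(), T.std()
z = (T - mu) / sigma                 # zero mean, unit standard deviation

coeffs = np.polyfit(z, cp, 3)        # well-conditioned cubic fit in z
cp_pred = np.polyval(coeffs, z)      # evaluate on the same scaled variable

print(np.max(np.abs(cp - cp_pred))) # residual is near machine precision
```

Raw Kelvin values raised to the third power span many orders of magnitude, which is what degrades the conditioning of the normal equations; the scaled variable z keeps all powers of comparable size.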

Output-

Workspace(showing the R-square value)-

Command Window(Showing the warning)-

-------------------------------------------------------------------------------------------------------------

->Curve fitting a linear polynomial to data with splitwise technique

%Preparing the data
cp_data = load('data');
temp = cp_data(:,1);
cp = cp_data(:,2);

%obtaining input from user for number of splits
x = input('Enter the number of splits:');
for i = 1:x
    if i==1
        n = 1;
        m = 3200/x;
        m = round(m);
        cp_split = cp(n:m);
        temp_split = temp(n:m);
    else
        n = (3200/x)*(i-1) + 1;
        n=round(n);
        m = n + (3200/x) - 1;
        m=round(m);
        cp_split = cp(n:m);
        temp_split = temp(n:m);
        
    end
  
  coeffs = polyfit(temp_split,cp_split,1);
  predicted_cp(n:m) = polyval(coeffs,temp_split);
end
predicted_cp = predicted_cp' ;

%to ensure no void values enter the obtained data array
b = find(~predicted_cp);
for i = 1:length(b)
    predicted_cp(b(i)) = predicted_cp(b(i)-1);
end 

%plotting temp V/s actual Cp and predicted/obtained Cp
plot(temp,cp,'LineWidth',3,'Color','b');
hold on
plot(temp,predicted_cp,'LineWidth',3,'Color','r');
hold off
xlabel('Temperature [K]');
ylabel('Specific heat [kJ/kmol-K]');
legend('original data','curve-fit data');

%measuring the goodness of fit (R-square method)
mean_cp_data = mean(cp);

%sum of squares of the regression
error_data_1 = predicted_cp - mean_cp_data;
error_data_1_squared = error_data_1.^2;
SSR = sum(error_data_1_squared);

%sum of square of errors
error_data_2 = cp - predicted_cp;
error_data_2_squared = error_data_2.^2;
SSE = sum(error_data_2_squared);

%sum of square about the mean
SST = SSR + SSE;
%numerical statistic to assess goodness of fit 
R_square = SSR/SST;
  • Data is loaded from a file in the same directory using the 'load' command and the matrix is split into two column vectors - temperature and specific heat.
  • Each of these data sets is a column vector of 3200 elements. To obtain a better fit, the 'splitwise' technique is employed this time.
  • The user is asked for the number of splits, the value of which is utilised by the for loop to divide the data set.
  • For each split of the data, polyfit creates a new set of coefficients, which are used by polyval to output new values of specific heat that are stored in the predicted_cp array.
  • These new values and the actual specific heat values are plotted against the temperature values.
  • To measure the goodness of fit, the R-square statistical method is employed.
  • Taking the number of splits as 15, a perfect fit is achieved, i.e., R-square = 1.
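The splitwise procedure described in the bullets above can be sketched as follows (Python/NumPy stand-in; the cp curve is an invented stand-in for the real oxygen data): the index range is divided into chunks, each chunk gets its own linear fit, and R-square is computed exactly as in the MATLAB script, as SSR/(SSR + SSE):

```python
# Splitwise fitting: one linear fit per chunk of the data.
import numpy as np

T = np.linspace(300.0, 3500.0, 3200)
cp = 25.0 + 0.01 * T + 5.0 * np.sin(T / 400.0)   # invented cp-like curve

n_splits = 15
predicted = np.empty_like(cp)
for idx in np.array_split(np.arange(T.size), n_splits):
    c = np.polyfit(T[idx], cp[idx], 1)           # linear fit on this chunk only
    predicted[idx] = np.polyval(c, T[idx])       # fill that chunk's predictions

SSE = np.sum((cp - predicted) ** 2)
SSR = np.sum((predicted - cp.mean()) ** 2)
R_square = SSR / (SSR + SSE)                     # same statistic as the script
print(R_square)
```

Each chunk spans only a small fraction of the curve, so even a straight line per chunk tracks the data closely; R-square approaches 1 as the number of splits grows.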

Output-

Command Window(Asking for input for number of splits from user)-

Workspace(showing the R-square value)-

-----------------------------------------------------------------------------------------------------------------

->Curve fitting a cubic polynomial to data with splitwise technique (warning issue solved)

clear all;
close all;
clc;

%Preparing the data
cp_data = load('data');
temp = cp_data(:,1);
cp = cp_data(:,2);

%obtaining input from user for number of splits
x = input('Enter the number of splits:');
for i = 1:x
    if i==1
        n = 1;
        m = 3200/x;
        m = round(m);
        cp_split = cp(n:m);
        temp_split = temp(n:m);
    else
        n = (3200/x)*(i-1) + 1;
        n=round(n);
        m = n + (3200/x) - 1;
        m=round(m);
        cp_split = cp(n:m);
        temp_split = temp(n:m);
    end
  [coeffs,~,mu] = polyfit(temp_split,cp_split,3);
  predicted_cp(n:m) = polyval(coeffs,temp_split,[],mu);
end
predicted_cp = predicted_cp' ;

%to ensure no void values enter the obtained data array
b = find(~predicted_cp);
for i = 1:length(b)
    predicted_cp(b(i)) = predicted_cp(b(i)-1);
end 

%plotting temp V/s actual Cp and predicted/obtained Cp
plot(temp,cp,'LineWidth',3,'Color','b');
hold on
plot(temp,predicted_cp,'LineWidth',3,'Color','r');
hold off
xlabel('Temperature [K]');
ylabel('Specific heat [kJ/kmol-K]');
legend('original data','curve-fit data');

%measuring the goodness of fit (R-square method)
mean_cp_data = mean(cp);

%sum of squares of the regression
error_data_1 = predicted_cp - mean_cp_data;
error_data_1_squared = error_data_1.^2;
SSR = sum(error_data_1_squared);

%sum of square of errors
error_data_2 = cp - predicted_cp;
error_data_2_squared = error_data_2.^2;
SSE = sum(error_data_2_squared);

%sum of square about the mean
SST = SSR + SSE;

%numerical statistic to assess goodness of fit 
R_square = SSR/SST;
  • Data is loaded from a file in the same directory using the 'load' command and the matrix is split into two column vectors - temperature and specific heat.
  • Each of these data sets is a column vector of 3200 elements. To obtain a better fit, the 'splitwise' technique is employed this time.
  • The user is asked for the number of splits, the value of which is utilised by the for loop to divide the data set.
  • For each split of the data, polyfit creates a new set of coefficients, which are used by polyval to output new values of specific heat that are stored in the predicted_cp array.
  • These new values and the actual specific heat values are plotted against the temperature values.
  • To measure the goodness of fit, the R-square statistical method is employed.
  • Taking the number of splits as only 5, compared to the 15 needed for the linear case, a perfect fit is achieved, i.e., R-square = 1.
  • Also, it can be observed that the warning has disappeared. This is achieved by requesting an additional output from the polyfit function called 'mu'. 'mu' is a two-element vector of centering and scaling values: mu(1) is mean(x) and mu(2) is std(x). Using these values, polyfit centers x at zero and scales it to have unit standard deviation. 'mu' is then passed to the polyval function as an input to improve the numerical properties of the function as well as the fitting algorithm.

Output-

Command Window(Asking for input for number of splits from user & shows no warning signs)-

Workspace(showing the R-square value)-

-----------------------------------------------------------------------------------------------------------------

Conclusion:

  • Curve fitting can be used to analyse 'noisy' data that has no given physical relation.
  • Methods such as the R-square method can be employed to measure how closely the fit matches the actual data.
  • Increasing the degree of the output polynomial will increase the goodness of fit, but if your model does not reflect the physics of the data, the resulting coefficients are useless. In this case, understanding what your data represents and how it was measured is just as important as evaluating the goodness of fit.
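The last point can be demonstrated with a small invented-data experiment: a degree-9 polynomial through 10 noisy points scores a near-perfect R-square, yet its coefficients say nothing about the underlying linear law:

```python
# High R-square is not the same as a physically meaningful model.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)   # true law is linear, plus noise

r2 = {}
for deg in (1, 9):
    y_pred = np.polyval(np.polyfit(x, y, deg), x)
    SSE = np.sum((y - y_pred) ** 2)
    SST = np.sum((y - y.mean()) ** 2)
    r2[deg] = 1 - SSE / SST

print(r2)   # the degree-9 fit interpolates the noise, so its R-square is near 1
```

The degree-9 fit "explains" essentially all of the variation, noise included, which is exactly the overfitting trap the conclusion warns about.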

 

References:

https://in.mathworks.com/help/curvefit/evaluating-goodness-of-fit.html

https://terpconnect.umd.edu/~toh/models/CurveFitting.html




The End