Linear Regression Based Lead Seven Day Maximum and Minimum Air Temperature Prediction in Chennai, India

Surface temperature is a key determinant for vegetation, animals and human livelihood at any location on earth. Timely prediction of minimum and maximum temperature helps in planning for and managing very hot and very cold weather. In this study, lead seven day minimum and maximum temperature prediction models based on numerical weather parameters are developed using multiple linear regression for Chennai, India. The analysis shows that the regression based minimum temperature prediction models are more accurate than the maximum temperature forecast models, with the highest $R^2$ and the lowest MAE and RMSE on the independent test dataset. The analysis also shows that prediction performance is good at small lead days and decreases gradually at higher lead days for both minimum and maximum temperature.


INTRODUCTION
Weather and climate influence the lives of humans, animals and vegetation. Temperature in particular has a major impact on our lives. Heat exhaustion and heat stroke take thousands of lives every summer all over the world. Over 500 thousand chickens perished in Georgia alone during a two-day period at the peak of the summer heat (Donald, 2011).
Accurate and timely prediction of temperature helps in taking precautionary measures. Accurately calculating what the atmosphere will do in the future is challenging because of the dynamic nature of the atmospheric environment that influences the temperature observed at the earth's surface. The objective of this study is to develop single point minimum and maximum temperature prediction models using Multiple Linear Regression (MLR). The scientific community has recommended many linear and nonlinear temperature prediction techniques, but MLR is selected because linear models often produce better forecasts than nonlinear models even when the data are nonlinear (Chatfield, 2009), and because statistical schemes require little computation time to make a forecast.
Among statistical forecasting techniques, MOS is a powerful tool that generates models based on linear regression and has given significant results in forecasting maximum and minimum temperature (Taylor and Leslie, 2005). A stepwise linear regression approach is used for estimating daily maximum and minimum air temperature (Shengpan et al., 2012); that study correlates surface temperature with air temperature to predict the minimum and maximum air temperature. In short-term (6 to 24 h) single-station forecasts using a multiple regression model for predicting temperature anomalies, the RMSE of the temperature forecasts is 1.78°C for the 6-h forecast and 2.28°C for the 12-h forecast (Christoph et al., 1999). In this study, regression models are created using the predictors listed in Table 1 to forecast the maximum and minimum temperature.

Data:
The multiple linear regression model is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$$

where $y_i$ is the predicted variable. The data set comprises nine atmospheric parameters (Table 1): one dependent and eight independent. Fourteen regression models are formulated, seven each for minimum temperature (T min) and maximum temperature (T max) prediction. The fitness of the formulated models is measured by the coefficient of determination ($R^2$), which shows how well the independent variables explain the variation in the dependent variable (Wilks, 2006).
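As an illustration of the fitting step, the following is a minimal sketch of ordinary least squares for a multiple linear regression, together with the coefficient of determination. It is not the code used in the study: the function names (`solve_linear_system`, `fit_mlr`, `predict`, `r_squared`) and the toy data with two predictors are assumptions made for the example, and the coefficients are obtained from the normal equations with plain Python so that the block has no dependencies.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, targets):
    """Return [b0, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bp*xp for one row of predictors."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: two predictors, targets generated exactly from
# y = 1 + 2*x1 - 0.5*x2 (no noise), so the fit should recover these values.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
targets = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in rows]
coeffs = fit_mlr(rows, targets)
fitted = [predict(coeffs, r) for r in rows]
print(coeffs, r_squared(targets, fitted))
```

In this setting each of the fourteen models would be one call to `fit_mlr` with eight predictors per row. For real data a numerical library routine for least squares is generally preferable to the normal equations, which can be ill-conditioned when predictors are strongly correlated.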
The performance of the prediction models is validated by applying them to an independent verification dataset and calculating the MAE, the RMSE and the correlation coefficient between observed and predicted values. The MAE is the arithmetic average of the absolute values of the differences between the members of each pair, and the RMSE is the square root of the average squared difference between the forecast and observation pairs. The forecast is perfect if MAE and RMSE are equal to zero. The correlation coefficient between observed and predicted values is another accuracy measure for validating the models. The error or residual $e_i$ is obtained by:

$$e_i = f_i - y_i$$

where $f_i$ is the observed value and $y_i$ is the predicted value. The Mean Absolute Error (MAE) measures how close forecasts are to the eventual outcomes. It is given by:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i|$$

The Root Mean Square Error (RMSE) measures the differences between the predicted values and the values actually observed. It is given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$$

where $n$ is the number of forecast and observation pairs.

Daily minimum temperature is used as the eighth predictor of the minimum temperature prediction models and daily maximum temperature is used as the eighth predictor of the maximum temperature prediction models.
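The three verification measures described above can be sketched in a few lines. The function names and the four sample temperature values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error over forecast and observation pairs."""
    return sum(abs(f - y) for f, y in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Square root of the mean squared difference between the pairs."""
    return math.sqrt(
        sum((f - y) ** 2 for f, y in zip(observed, predicted)) / len(observed)
    )


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_f = sum(observed) / n
    mean_y = sum(predicted) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(observed, predicted))
    var_f = sum((f - mean_f) ** 2 for f in observed)
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    return cov / math.sqrt(var_f * var_y)


# Hypothetical sample: four days of observed and predicted temperature (°C).
observed = [30.0, 32.0, 31.0, 29.0]
predicted = [31.0, 31.0, 33.0, 29.0]

print(mae(observed, predicted))        # residuals are -1, 1, -2, 0
print(rmse(observed, predicted))
print(pearson_r(observed, predicted))
```

Because the RMSE squares each residual before averaging, it is never smaller than the MAE and is more sensitive to occasional large errors, which is why the two are usually reported together.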

RESULTS AND DISCUSSION
The accuracy of the devised forecast models is assessed by applying them to one year of independent data (1995).

Maximum temperature forecast models assessment:
The performance of the maximum temperature forecast models for lead day one and lead day seven is summarized in Table 2 and Fig. 1. The forecast given by the lead day one model is 90% correlated with the observations. The lead day two and three predictions are 85% correlated with the observations. Figure 4 shows that the predictions are highly correlated with the station observations. Figure 1b compares the MAE and RMSE of the models. The MAE shows that performance degrades as the lead increases, and the RMSE calculated for these models supports the same point.
The coefficient of determination also indicates that the MLR models fit better at lead day one, with an 81% fit, than on the seventh day, with a 59% fit (Fig. 5).

CONCLUSION
The main intention of the study is to propose short term minimum and maximum temperature estimation models for a densely populated urban area using multiple linear regression. Forecasting temperature will help the public as well as governing authorities to take the necessary precautions against very hot and very cold weather. Among the models formulated, those for shorter lead days produce better accuracy, with the lowest MAE and RMSE and the highest correlation between observed and predicted values. The coefficient of determination of the models indicates the same. The study also suggests that forecast accuracy is higher for minimum temperature than for maximum temperature prediction, and that the correlation between the temperature and the atmospheric parameters selected for analysis decreases as the lead increases. Although the methodology employed in this study performs well at short lead days, the accuracy at longer lead days needs further refinement.
In the multiple linear regression model, $\beta_0$ is the intercept, $\beta_1$ measures the change in $y_i$ with respect to $x_{i1}$, $\beta_p$ measures the change in $y_i$ with respect to $x_{ip}$, $x_{i1}, \ldots, x_{ip}$ are the predictor variables and $\varepsilon_i$ is the error.

Models:
Minimum and maximum forecast models for predicting the minimum temperature (T min) and maximum temperature (T max) expected over the next 7 days are devised using the dataset from 1996 to 2003.
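One plausible way to prepare the data for the seven lead days is to pair the predictors observed on a given day with the temperature observed a fixed number of days later. The text does not spell out how the pairs were built, so the sketch below is an assumed reading: the function `make_lead_pairs`, the record layout and the toy ten-day series are all hypothetical, and the series is assumed to be in chronological order with no missing days.

```python
def make_lead_pairs(records, lead, target_key):
    """Pair predictors from day t with the target observed on day t + lead.

    records: chronologically ordered list of dicts, one per day, no gaps
             (an assumption of this sketch).
    """
    pairs = []
    for t in range(len(records) - lead):
        predictors = records[t]["predictors"]
        target = records[t + lead][target_key]
        pairs.append((predictors, target))
    return pairs


# Hypothetical toy series of ten consecutive days with a single predictor.
records = [
    {"predictors": (float(d),), "tmin": 20.0 + d, "tmax": 30.0 + d}
    for d in range(10)
]

# One dataset per target and lead day: 2 targets x 7 leads = 14 datasets,
# matching the fourteen regression models.
datasets = {
    (key, lead): make_lead_pairs(records, lead, key)
    for key in ("tmin", "tmax")
    for lead in range(1, 8)
}

print(len(datasets))
print(datasets[("tmin", 1)][0])
print(len(datasets[("tmax", 7)]))
```

Each additional lead day shortens the usable series by one pair, which is negligible for several years of daily records.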

Fig. 5: Association between observed and predicted maximum temperature (°C) (T max), (a-c) lead day one through day three, (d-f) lead day five to seven

The atmospheric parameters (Table 1) recorded daily in Chennai, India (Latitude: 13°4'7.3"N, Longitude: 80°14'48.33"E) are used as predictors to forecast the next seven days' minimum and maximum temperature. The observed predictor dataset for analysis is obtained from the National Data Centre of the National Centre for Environmental Prediction (NCEP), USA (http://www.ncdc.noaa.gov/oa/ncdc.html). For the present study, the overall period used for analysis covers nine years (1995-2003). The data from 1996 through 2003 are used for training and the dataset of one year (1995) is used to validate the performance of the derived models.

Table 1: Predictors used to formulate prediction models

Table 2: Performance summary of all regression models formulated