Statistical Analysis of Temperature Distributions in Vietnam

Summary

Research Overview

This presentation introduces an approach for analyzing temperature distributions in Vietnam using the centered log-ratio (CLR) transformation and Distributional ANOVA, an adaptation of functional ANOVA (FANOVA) to density data. The research uses 30 years of daily maximum temperature data across different regions of Vietnam to study global warming patterns, treating the data as distributions rather than time series.

Methodology

Uses the CLR transformation to map densities from the Bayes space to functional data in L2 space
Applies B-spline basis functions to transform discrete data points into functional data
Implements Distributional ANOVA to test differences between mean densities across groups
Employs simple linear regression to analyze temporal evolution of temperature distributions
Utilizes five test statistics (L2P, FD, CF, and others) for hypothesis testing

Data Analysis

Dataset contains 1,890 observations (63 provinces over 30 years), each holding one year of daily temperatures
The analyzed temperature range was restricted to 12-40°C
Data was transformed from raw measurements to density functions
Rather than analyzing time series, the approach treats data as distributions of temperature values
The research examines six different regions in Vietnam

Key Findings

Temperature distributions have changed significantly over the 30-year period in most regions
Statistical tests confirm significant differences in temperature patterns between regions
The Mekong Delta shows notably different temperature patterns compared to other regions
Only the Central Highlands region didn't show significant temperature evolution
Results indicate a decrease in frequency of lower temperatures and increase in higher temperatures

Research Context

This study is part of a larger research project resulting in three papers:
- First paper: Analysis of temperature impact on rice production in Vietnam
- Second paper: Application of ICS to detect outliers among densities
- Third paper: The current presentation on regional temperature differences
The research team acknowledges support from VIASM (Vietnam Institute for Advanced Study in Mathematics)

Notes

Transcript

And then we get the norm. So from all of these definitions, we have a Hilbert space structure, which van den Boogaart has called the Bayes space. So, how do we work with densities in Bayes space? Actually, the idea is quite simple. We cannot work conveniently in Bayes space directly, and I would like to take advantage of functional data analysis in the L2 space. So we introduce what we call the CLR transformation. Through formula 6, we can transform a density into a function.

The image space is L2 zero. L2 zero is a subspace of L2 with one more constraint (the function integrates to zero), so it means that the dimension is reduced by one. The CLR is quite interesting because it is an isometry, which means that you can go from B2 to the L2 space and back again.
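The CLR step described here can be sketched on a discretized density. This is a minimal illustration, assuming a density sampled at bin midpoints on a uniform grid; the function names `clr` and `clr_inverse` and the bell-shaped example density are made up for illustration and are not taken from the talk.

```python
import numpy as np

def clr(density):
    """Centered log-ratio of a strictly positive density on a uniform grid."""
    log_f = np.log(density)
    # On a uniform grid, the mean of log f over the interval is the plain average.
    return log_f - log_f.mean()

def clr_inverse(g, dx):
    """Map a zero-mean function back to a density that integrates to one."""
    w = np.exp(g)
    return w / (w.sum() * dx)

# Hypothetical example: a bell-shaped density on [12, 40] with 0.5-degree bins.
edges = np.linspace(12.0, 40.0, 57)
mid = (edges[:-1] + edges[1:]) / 2
dx = edges[1] - edges[0]
f = np.exp(-0.5 * ((mid - 30.0) / 4.0) ** 2)
f /= f.sum() * dx            # normalize so the density integrates to one

g = clr(f)                   # function in L2 zero: its values sum to zero
f_back = clr_inverse(g, dx)  # going back recovers the original density
```

The zero-sum property of `g` is the discrete counterpart of the extra constraint that defines L2 zero.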

So from here, given one density, by applying the CLR we obtain all of the good properties of functional data analysis.

One more step comes from the data. We say that we have functional data or density data, but in fact, in the beginning, it is not a function. It is discrete, in terms of data points or in terms of a histogram. So we need one more step, called the preprocessing step, to transform the data into functional data.

So that is quite standard, and there are different kinds of bases to approximate the data and obtain the function. In our approach, we start from the B-splines, with the order and also the knots. Based on the B-splines, we follow the work of Machalová: they take the derivative and obtain what they call the ZB-splines. The ZB-splines are a basis of L2 zero.

With the ZB-splines, by taking the inverse of the CLR, we can obtain a basis of the Bayes space. It means that we have the same structure in the two spaces. Now we can express one of our functions either in the Bayes space or in L2 zero, with two different kinds of bases.

It is an expression in terms of a linear combination. So that is the basic background on how to work with densities. Let's start with the Distributional ANOVA. Sometimes in our team, we call it DANOVA because it is a short name for the Distributional ANOVA.

So we start with the sample in Bayes space, and we can write it as in equation 9, in terms of the mean density in group j and a stochastic error process. The expectation and the covariance in Bayes space can be rewritten in terms of the CLR inverse.

So, when we have this kind of sample, we can rewrite the problem of FANOVA in the context of densities. First we need to define the overall sample mean. Similarly, we get the expressions of the between-group and within-group mean squares.
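For reference, the group model and the mean squares mentioned here can be written out as follows. The slides were not captured in the transcript, so the notation is an assumption: f_{ij} is density i in group j, \mu_j the group mean density, \varepsilon_{ij} the error, \oplus the Bayes-space addition (perturbation), and g_{ij} the CLR image. The mean squares are the usual pointwise functional ANOVA quantities, not necessarily the exact form shown in the talk.

```latex
\begin{align*}
f_{ij} &= \mu_j \oplus \varepsilon_{ij}, \qquad i = 1,\dots,n_j,\; j = 1,\dots,J, \\
g_{ij} &= \operatorname{clr}(f_{ij}) = \operatorname{clr}(\mu_j) + \operatorname{clr}(\varepsilon_{ij}), \\
\bar g_j &= \frac{1}{n_j}\sum_{i=1}^{n_j} g_{ij}, \qquad
\bar g = \frac{1}{n}\sum_{j=1}^{J} n_j\,\bar g_j, \qquad n = \sum_{j=1}^{J} n_j, \\
\mathrm{MSB}(t) &= \frac{1}{J-1}\sum_{j=1}^{J} n_j\,\bigl(\bar g_j(t) - \bar g(t)\bigr)^2, \\
\mathrm{MSW}(t) &= \frac{1}{n-J}\sum_{j=1}^{J}\sum_{i=1}^{n_j} \bigl(g_{ij}(t) - \bar g_j(t)\bigr)^2.
\end{align*}
```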

Yeah, so now, when we have the sample with groups, we will go through three kinds of hypotheses. In the first problem, we consider that we have only one sample. This means that the number of groups is equal to 1. And you would like to test whether the mean density... Equation 10 can be rewritten as equation 11, so it means that we obtain an equivalent hypothesis in L2 zero.

So for this kind of test, we can then apply the tests that have been built for functional data, for FANOVA. They are called the norm approach and the BC approach. Both of them can be derived in terms of a distance and are based on the eigenvalues. In the paper of Kukaska, they say that the BC approach will be good in several cases, while in the others, you should use the other approach.

The second problem is that we have several groups. Later on, these will be the densities in the different regions in Vietnam. Now we would like to test whether the mean densities of all groups are equal. The alternative is that at least two of them are different. With the same strategy, once we have the hypothesis in Bayes space, we can transform it into L2 zero, and then we apply the FANOVA context.
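To make the multi-group test concrete, here is a minimal sketch of a global F-type statistic on CLR-transformed curves with a permutation p-value. It is a generic illustration and not the implementation used in the talk or in any particular package; the function names and the synthetic three-group data are assumptions.

```python
import numpy as np

def f_type_statistic(curves, labels, dx):
    """Integrated between-group over within-group mean square."""
    groups = np.unique(labels)
    n, n_groups = len(labels), len(groups)
    grand_mean = curves.mean(axis=0)
    ssb = np.zeros(curves.shape[1])
    ssw = np.zeros(curves.shape[1])
    for g in groups:
        block = curves[labels == g]
        group_mean = block.mean(axis=0)
        ssb += len(block) * (group_mean - grand_mean) ** 2
        ssw += ((block - group_mean) ** 2).sum(axis=0)
    # Riemann sums approximate the integrals over the temperature interval.
    return (ssb.sum() * dx / (n_groups - 1)) / (ssw.sum() * dx / (n - n_groups))

def permutation_p_value(curves, labels, dx, n_perm=999, seed=0):
    """Permutation p-value for equality of the group mean curves."""
    rng = np.random.default_rng(seed)
    observed = f_type_statistic(curves, labels, dx)
    exceed = sum(
        f_type_statistic(curves, rng.permutation(labels), dx) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic example: three groups of 20 curves whose means clearly differ.
rng = np.random.default_rng(42)
grid = np.linspace(0.0, 1.0, 50)
dx = grid[1] - grid[0]
labels = np.repeat([0, 1, 2], 20)
curves = labels[:, None] * np.sin(2 * np.pi * grid) + rng.normal(0.0, 0.1, (60, 50))
stat, p_value = permutation_p_value(curves, labels, dx)
```

Rejecting the null hypothesis here only says that at least two group means differ, which is why pairwise comparisons are needed afterwards.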

So with the help of an R package, we can use the package called fdANOVA, which has a function called fanova.tests. We rely on the software to do the computation for us. In our paper, we focus on five tests, called the L2P, FD, CF.

So, the five tests are summarized in this table. Three of them can be written in terms of a norm in the Bayes space, while for the others we need to write the detailed formula. And people usually use the bootstrap to calculate them.

So for this, we only apply the software. We do not develop anything new for this application.

And then we apply the software. The first rule of testing is that when we reject the null hypothesis, it means that there are at least two groups whose mean densities differ. So now we go to the pairwise tests. We would like to find which pairs are different. And here, we again apply the FANOVA tests.

I have worked with temperature data. We have the daily minimum and daily maximum for each year and each province in Vietnam over 30 years. So for one observation, we have 365 or 366 data values, and the total number of observations is 1,890.

We have six regions in Vietnam. However, in this paper, we only focus on the maximum temperature because we would like to focus on global warming. Applying the approach to the minimum temperature is quite straightforward because we just apply the same strategy.

This slide is about how to work with the temperature data. When I received the raw data, I explored it. The temperature ranged from minus 5 to 35 during the 30 years in Vietnam. However, because of the extremes, very cold and very hot in Quang Ngai, we restricted the interval to only from 12 to 40.

So you may ask, from high to low is quite fast.

People say that in Vietnam, I mean in the North, there are low temperatures that will never happen in Saigon.

When the temperature is down to 12°C, only children at kindergarten will stay at home. And when the temperature is down to 10°C, all the primary schools will close. So 10 or 12 degrees in Vietnam is extremely cold, due to the humidity. I need to wear a jacket. And at 40 or 45, it is already hot.

Yes, so that is the way we deal with the data. For one observation, we have the data for one year in one province, then we obtain the density from the histogram, and then we transform it by the CLR. And we do this for all the observations in our data.
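The per-observation pipeline described here (one year of daily values, then a histogram density on 12 to 40, then the CLR) can be sketched as follows. This is a rough illustration only: the talk obtains the density with spline smoothing, whereas this sketch uses a plain histogram, and the pseudo-count used to avoid taking the log of empty bins, the bin count, and the function name `year_to_clr` are all assumptions.

```python
import numpy as np

def year_to_clr(daily_max, low=12.0, high=40.0, n_bins=28, pseudo_count=0.5):
    """Turn one year of daily maxima into a histogram density and its CLR."""
    edges = np.linspace(low, high, n_bins + 1)
    width = edges[1] - edges[0]
    # Values outside [low, high] are ignored by np.histogram (an assumption
    # about how out-of-range days are handled).
    counts, _ = np.histogram(daily_max, bins=edges)
    # Assumed zero handling: a small pseudo-count keeps every bin positive.
    smoothed = counts + pseudo_count
    density = smoothed / (smoothed.sum() * width)
    log_density = np.log(density)
    return density, log_density - log_density.mean()

# Hypothetical observation: 365 simulated daily maxima for one province-year.
rng = np.random.default_rng(1)
daily_max = rng.normal(30.0, 4.0, size=365)
density, clr_curve = year_to_clr(daily_max)
bin_width = (40.0 - 12.0) / 28
```

Repeating this for every province and year would give one CLR curve per observation, which is the input the later tests work on.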

So, thanks to the work of Machalová and her coauthors, who have put some effort into how to control the smoothing of the density to avoid overfitting or underfitting. That part is not the focus of the paper, so we use their method to get the density.

One point about the temperature data is that usually, in the literature, people talk about temperature as time series data. If we consider it as a time series of temperature, this means that we focus on this kind of horizontal line.

And then we focus on the true line here. But now we ignore the date. We put all the data together. And we focus on the data, the value. It means that we focus on the temperature here, the distribution here. Instead of the function here, we focus on the density.

And with many densities, because we have one density for each year, you can view our data as a time series of densities. So that is another way to think about the temperature. It's not a time series of values, it's a time series of densities.

I would like to emphasize that, because it was so difficult for us to publish the first paper. When we submitted that paper to the journal, I mean not a mathematical journal but an applied one, because I mainly publish applications, all of the referees and all of the researchers asked:

Why did you not use the date? But the view is quite different. For us, considering the density here is very nice. Actually, we have huge data, I mean 30 years and 63 provinces in Vietnam. So that is our point of view. The next question is how we can take the time into account.

So, to do that, we propose a simple linear model, as in Formula 14. Here, we can see that the form of the formula is a simple linear regression, where alpha is the intercept and beta is the slope, but it's not a single value, it's a slope density.

And we can fit it like a regression. In order to fit it, we can follow the work of Pascal. Actually the idea is quite simple: the model in B2 is transformed into L2 zero, we fit the model in L2 zero, and then go back to the Bayes space.

Yeah, so we fit this kind of model for each region, I mean for each group. If we ignore the groups, we can drop the group index and fit one model for the pooled data. Once we obtain the beta, the slope density, if the slope density is the uniform distribution, this means that there is no temporal evolution in the temperature. So the reference density is uniform.
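The fitting strategy described here (move to L2 zero, fit, then map back) can be sketched with pointwise least squares on CLR curves. This is a simplified illustration on a grid, not the spline-based estimation from the talk; the function name, the year index, and the synthetic curves are assumptions.

```python
import numpy as np

def fit_density_trend(clr_curves, years, dx):
    """Pointwise least squares of CLR curves on time, mapped back to densities."""
    t = np.asarray(years, dtype=float)
    t = t - t.mean()  # centering changes the intercept's meaning, not the slope
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, clr_curves, rcond=None)
    intercept_clr, slope_clr = coef

    def to_density(g):
        w = np.exp(g)
        return w / (w.sum() * dx)

    return to_density(intercept_clr), to_density(slope_clr), slope_clr

# Synthetic example on [12, 40] with 1-degree bins and a made-up year index.
mid = np.arange(12.5, 40.0, 1.0)
dx = 1.0
years = np.arange(30.0)
base = -0.5 * ((mid - 30.0) / 4.0) ** 2
base -= base.mean()
trend = 0.02 * (mid - mid.mean())  # zero-sum slope: mass shifts to warm bins

curves = base + np.outer(years, trend)
_, slope_density, slope_clr = fit_density_trend(curves, years, dx)

# With no temporal change, the slope density is uniform on the interval.
flat = np.tile(base, (len(years), 1))
_, flat_slope_density, _ = fit_density_trend(flat, years, dx)
```

A slope density above the uniform level in the warm bins and below it in the cool bins would correspond to the shift toward higher temperatures reported in the findings.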

So these are the results we obtain. We have the map of Vietnam, where the color corresponds to each region. Then we obtain the densities of the temperature, we estimate the linear model, and we obtain the slope densities in the figure here.

And after that, when you apply for...

For the distributional ANOVA, we work with the slope densities. We will not put it here.

We work on the slope density because that's the way we take the time into account already. The question asked is: is there any temporal evolution of temperature in Vietnam? The answer is yes, because after 30 years the temperature has changed, and we see from the figure here that, if we take the dotted line as the uniform distribution, the curves are quite different from the...

It is also supported by two tests for punctuation. We repeat this kind of test here for...

And we also obtained that in all regions there is some kind of evolution in temperature, except for the Central Highlands. This result is not quite satisfying, because I have read a lot of papers about the Central Highlands around the year 2016. But our tests and data do not support that. It may come from the data.

For example, our data only goes until 2016. So maybe it happened around or after 2016 and we do not have that data.

Let's come to the global test. We would like to see whether the temperature differs across the regions in Vietnam. This is supported by all of the tests, with very high significance.

After we have this, we can now go to the pairwise tests to see which pairs are different. With this kind of table, we cannot see anything, so we came up with this kind of diagram. Here there are two kinds of lines. A solid line means that no test rejects the null hypothesis. A dotted line means that some tests reject and some do not.

Where the action of the mice meets their own tests, results in a new hypothesis.

And here, that is a citizen in Vietnam and we see that the true reason why this thing is an RRD, the reason for the capital Hanoi. And NBR, that may be the place where you enjoy for the teaching. One thing is only collect by one dotted line and the other you collect with three dotted lines.

Yes, so we see that the temperature in Mekong Delta level is quite different from the other region, while the red line is the same. I think that idea came to me three months ago when my co-author was quizzing someone

She was at a conference and she heard something about the work of Mayo and the idea that we can apply the odds ratio in functional terms. We can have a result like this. For example, we can explain that in Vietnam, the frequency of temperatures within the interval, I mean the low temperatures, has decreased, and this is compensated by the higher temperatures. In order to get this kind of interval, we have run several kinds of optimizations.

We also have a result for one region here, but I will not present it here. So that is the summary. At the beginning, I said to you that I am working on my second PhD, in mathematics. And actually it comes with three papers, like a normal thesis.

We start with the first paper, where we apply the same context, the densities and the Bayes space, as I explained to you, and I explain the impact of temperature on rice in Vietnam. The second is that, with the same basic tools, we apply ICS to detect outliers in terms of densities.

This is the third study. I hope all three papers here will find a home and I will finish the second PhD. In conclusion, for this kind of work, we have adapted the tests for densities. From functional data, we adapt them to densities and we apply them to Vietnam. We detect the true value. The true result is that there is some gap.

The temperature in Vietnam is quite different across regions. For this paper, you can find the code and everything in the GitHub repository.

We gathered last year at VIASM to start working on this paper. Our team really appreciates the support from VIASM. Thank you very much.

Summary

Research Approach

The research focuses on analyzing the entire temperature distribution in Vietnam over a one-year period
Current methodology follows Machalová's work, requiring a preprocessing step to get the density from the data
The team aims to eliminate this intermediary step but hasn't yet found a solution
The analysis examines data in five-year intervals to consider climate trends
The approach uses scalar-on-density regression or density-on-density regression methods

Climate Change Concerns

Vietnam is considered the fourth most impacted country by climate change globally
The Mekong Delta faces significant challenges from upstream hydroelectric power construction
Issues include water scarcity, soil erosion, seawater pollution, and limited fishing capabilities
BCF (Biochar-Compost Fertilizer) was mentioned as important for reducing carbon emissions in rice production

Methodology Discussion

Questions raised about choosing density as the metric for analysis:
The presenter clarified this is a distributional ANOVA test, not regression analysis
A suggestion was made to consider hierarchical modeling for grouped data analysis

Action Items

[ ] Continue work on eliminating the preprocessing step in the density calculation process
[ ] Peter to explore hierarchical modeling as suggested for the grouped data approach
[ ] Consider taking detailed discussions about regression vs. ANOVA approaches offline

Notes

Transcript