The purpose of this article is to
prove the accuracy of the simulation model made by the combination of Pascal’s
Triangular method and Monte Carlo Simulation model included in the article of
“Application of Pascal’s Triangular plus Monte Carlo Analysis to Appraise the
Wisdom of Crowds” and posted on link: http://emfps.blogspot.com/2012/05/application-of-pascals-triangular-plus_08.html
in which I am willing to prove this simulation model as well as works. The most
important findings are to find the least squares fitting and linear function
for a limited area (XY) and so to obtain the point which has the deviation
approximately equal to zero where we have the infinity points on the plane of
XY. I think that we can also generate the best numerical solution for 3D
Laplace Equation and so we will be able to find
the best fitting plane given a set of N points in a 3D space by using of this
simulation model.
Nowadays, all fields of engineering
and science such as Social science, medicine, finance, strategic management and
so on are using the method of the Last Squares Fitting. One of the most
important applications of two variables function’s minimizing is to utilize the
least squares method for fitting of direct line (y = mx + b).
Assume you have infinity points (x1, y1), (x2, y2),……(xn, yn) that you have found them by your observation or survey
questionnaire in which all points are limited into the area of x = a and y = b
and “n” reach to infinity. There are two amounts of “y” for each observed
amount of “x” (xobs) which are the observed amount of “y” (yobs) and the amount
of “y” that it is predicted by linear function of y = mxobs + b where we
have:
Predicted amount – observed amount =
(mxobs + b) - yobs
We call this amount a deviation (d)
in which there are the infinity deviations of “d1, d2,…. dn” as follows:
Set (A) = [(d1= (mx1 + b) – y1), (d2= (mx2 + b) – y2)……… (dn= (mxn + b) – yn)] and “n” reach to infinity.
SUM (d^2) = d1^2 + d2^2+…..+ dn^2 (1)
Or
SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+…+ (mxn + b) – yn]^2 (2)
Methodology
The methodology is the same with my
previous article.
Note: “Since I want to prove my simulation model in this article,
I do not explain more about it.
Therefore, please read carefully my article of “Application of Pascal’s
Triangular plus Monte Carlo Analysis to Appraise the Wisdom of Crowds” and posted
on link: http://emfps.blogspot.com/2012/05/application-of-pascals-triangular-plus_08.html
before the continuation of this article.”
Finding
and discussion
Let me start my discussion by using
an example as follows:
Assume we want to get the best
fitting line among infinity points into a rectangular area where:
x = 0 to x = 4 and y = 1 to y = 5
x = 0 to x = 4 and y = 1 to y = 5
I apply all steps mentioned in my
previous article in which I want to compare the final result between 200 trials
(n = 200) and 1000 trials (n =1000) as follows:
1) For n = 200, we will have below
algorithm to find all outcomes:
n
|
200
|
Number of outcomes of "x"
|
201
|
Range of "X"
|
201
|
Amount of "X"(min)
|
0
|
Amount of "X" (max)
|
4
|
max - min
|
4
|
delta
|
0.02
|
n
|
200
|
Number of outcomes of "y"
|
201
|
Range of "Y"
|
201
|
Amount of "Y"(min)
|
1
|
Amount of "Y" (max)
|
5
|
max - min
|
4
|
delta
|
0.02
|
Therefore, the outcomes for “x” will
be started by 0 then 0.02 and will be finished by 4 and so the outcomes for “y”
will be started by 1 then 1.02 and will be finished by 5.
After all steps, we will have below
Cut – offs (accumulative probability) for “x” and “y” as follows:
Cut Offs
|
x
|
0
|
0.02
|
7.72E-45
|
1.68
|
0.01
|
1.82
|
0.1
|
1.96
|
0.4
|
2.18
|
0.9
|
3.08
|
1
|
4
|
Cut Offs
|
y
|
0
|
1.02
|
7.72E-45
|
2.68
|
0.01
|
2.82
|
0.1
|
2.96
|
0.4
|
3.18
|
0.9
|
4.08
|
1
|
5
|
For Least Squares Fitting, I used
RAND formula accompanied by VLOOKUP for “x” and “y” simultaneously where we
will have 201 cells on row filled by them.
Then, I applied below algorithm to
calculate “SUM (d^2):
(xi)^2
|
SUM (xi)
|
SUM(xi)^2
|
SUM (yi)
|
xi * yi
|
SUM(xi*yi)
|
m
|
b
|
d0
|
d0^2
|
SUM(d^2)
|
Where I calculated “m” and “b” by
using of below formulas:
[(SUM(xi)^2) *m ] + [(SUM (xi)) *b =
SUM(xi*yi)
[(SUM (xi)) * m] + (n*b) = SUM (yi)
To calculate “d0”, I utilized from below formula:
d0 = ((m*x0) + b) – y0
The final results of two – ways
sensitivity analysis were as follows:
b
|
|
-
|
3.167558
|
Ave
|
3.166511
|
STDEV
|
0.159445
|
AVE
|
0.158023
|
CV
|
0.049904
|
m
|
|
-
|
0.002501
|
AVE
|
0.000201
|
STDEV
|
0.073112
|
AVE
|
0.07178
|
SUM
(d^2)
|
||
-
|
22.8858892
|
|
AVE
|
22.88723508
|
|
STDEV
|
1.721733542
|
|
AVE
|
1.665178887
|
|
CV
|
0.072755791
|
|
Finally, I obtained below point
which generated the least SUM (d^2) among all points:
“x” for min[SUM (d^2)] = 1.96
“y” for min[SUM (d^2)] = 3.18
2) For n = 1000, we will have below
algorithm to find all outcomes:
n
|
1000
|
Number of outcomes of "X"
|
1001
|
Range of "X"
|
1001
|
Amount of "X"(min)
|
0
|
Amount of "X" (max)
|
4
|
max - min
|
4
|
delta
|
0.004
|
n
|
1000
|
Number of outcomes of "Y"
|
1001
|
Range of "Y"
|
1001
|
Amount of "Y"(min)
|
1
|
Amount of "Y" (max)
|
5
|
max - min
|
4
|
delta
|
0.004
|
Therefore, the outcomes for “x” will
be started by 0 then 0.004 and will be finished by 4 and so the outcomes for
“y” will be started by 1 then 1.004 and will be finished by 5.
After all steps, we will have below
Cut – offs (accumulative probability) for “x” and “y” as follows:
Cut Offs
|
x
|
0
|
0.004
|
6.6E-219
|
1.852
|
0.01
|
1.92
|
0.1
|
1.984
|
0.4
|
2.08
|
0.9
|
2.496
|
1
|
1
|
Cut Offs
|
y
|
0
|
1.004
|
6.6E-219
|
2.852
|
0.01
|
2.92
|
0.1
|
2.984
|
0.4
|
3.08
|
0.9
|
3.496
|
1
|
5
|
For Least Squares Fitting, I used
RAND formula accompanied by VLOOKUP for “x” and “y” simultaneously where we
will have 1001 cells on row filled by them.
Then, I applied the same algorithm
above mentioned to calculate “SUM (d^2) where the final results were as
follows:
b
|
|
-
|
3.074636
|
AVE
|
3.076022
|
STDEV
|
0.06411
|
AVE
|
0.066586
|
m
|
|
-
|
0.002325
|
AVE
|
0.000143
|
STDEV
|
0.031485
|
AVE
|
0.03185
|
SUM(d^2)
|
|
ave
|
21.87308
|
AVE
|
21.74568
|
STDEV
|
3.527124
|
AVE
|
3.45824
|
CV
|
0.159031
|
Finally, I obtained below point
which generated the least SUM (d^2) among all points:
“x” for min[SUM (d^2)] = 2.08
“y” for min[SUM (d^2)] = 3.08
How can we
prove above results?
There are two ways to prove above
results and finally to be proved this simulation model as follows:
A) First way: Referring to formula (2), we have:
SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+……+ (mxn + b) – yn]^2
Let me consider below function:
f (x, y) = [(mx + b) – y]^2
Total sum of this curve on below area
will be equal double integral f (x,y) as follows:
x = 0 to x = 4 and y = 1 to y = 5
x = 0 to x = 4 and y = 1 to y = 5
I made an algorithm on my excel
spreadsheet to solve this double integral as follows:
1) For n = 200, we will have:
p1
|
1
|
p2
|
5
|
q1
|
0
|
q2
|
4
|
m
|
0.002501
|
b
|
3.167558
|
Delta (p)
|
4
|
Delta (q)
|
4
|
(p1)^3
|
1
|
(p2)^3
|
125
|
(q1)^3
|
0
|
(q2)^3
|
64
|
(p1)^2
|
1
|
(p2)^2
|
25
|
(q1)^2
|
0
|
(q2)^2
|
16
|
m^2
|
6.25E-06
|
b^2
|
10.03342
|
f1
|
0.000534
|
f2
|
160.5347
|
f3
|
165.3333
|
f4
|
0.507006
|
f5
|
-0.48019
|
f6
|
-304.086
|
SUM(d^2)
|
21.8099
|
2) For n = 1000, we will have:
p1
|
1
|
p2
|
5
|
q1
|
0
|
q2
|
4
|
m
|
0.000143
|
b
|
3.076022
|
Delta (p)
|
4
|
Delta (q)
|
4
|
(p1)^3
|
1
|
(p2)^3
|
125
|
(q1)^3
|
0
|
(q2)^3
|
64
|
(p1)^2
|
1
|
(p2)^2
|
25
|
(q1)^2
|
0
|
(q2)^2
|
16
|
m^2
|
2.04E-08
|
b^2
|
9.461912
|
f1
|
1.74E-06
|
f2
|
151.3906
|
f3
|
165.3333
|
f4
|
0.028133
|
f5
|
-0.02744
|
f6
|
-295.298
|
SUM(d^2)
|
21.4265
|
By using of try and error around m =
0.000143 and b = 3.076022, we can find the least squares fitting by calculating double
integral where we will have:
m = 0
b = 3
p1
|
1
|
p2
|
5
|
q1
|
0
|
q2
|
4
|
m
|
0
|
b
|
3
|
Delta (p)
|
4
|
Delta (q)
|
4
|
(p1)^3
|
1
|
(p2)^3
|
125
|
(q1)^3
|
0
|
(q2)^3
|
64
|
(p1)^2
|
1
|
(p2)^2
|
25
|
(q1)^2
|
0
|
(q2)^2
|
16
|
m^2
|
0
|
b^2
|
9
|
f1
|
0
|
f2
|
144
|
f3
|
165.3333
|
f4
|
0
|
f5
|
0
|
f6
|
-288
|
SUM(d^2)
|
21.33333
|
You can see that
1000 trials (n = 1000) is closer to the result of double integral. (Please
check red colour)
B) Second way: Referring to formula (2), we have:
SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+……+ (mxn + b) – yn]^2
Let me consider below function for
deviation square of each random point:
f (x, y) = [(mx + b) – y]^2
If we want to have the least amount of f (x, y), we should simultaneously solve below equations:
I replaced “m”, “b”, and “x” for
min[SUM (d^2)] and “y” for min[SUM
(d^2)] in each trial into above formulas
and I calculated the final result by using of excel spreadsheet as follows:
v n = 200 trials:
n
|
200
|
|
x
|
1.96
|
1.96
|
y
|
3.18
|
3.18
|
d1
|
1.59E-07
|
6.36
|
d2
|
0.001275
|
-0.00079
|
d3
|
-0.00128
|
-6.33302
|
SUM
|
-5.3E-06
|
0.026189
|
v n = 1000 trials
n
|
1000
|
|
x
|
2.08
|
2.08
|
y
|
3.08
|
3.08
|
d1
|
8.5E-08
|
6.16
|
d2
|
0.000879
|
-0.00059
|
d3
|
-0.00088
|
-6.15204
|
SUM
|
-1.1E-06
|
0.007361
|
You
can see that the final results for n = 1000 trials are very closer to zero.
Note:
“All spreadsheets and calculation notes are available. The people, who are
interested in having my spreadsheets of this method as a template for further
practice, do not hesitate to ask me by sending an email to: soleimani_gh@hotmail.com or call
me on my cellphone: +989109250225. Please be informed these spreadsheets are
not free of charge.”