mmcdara

MaplePrimes Activity


These are Posts that have been published by mmcdara

Recently I came back to the general problem of drawing the syntactic graph of a mathematical expression.
Probably some of you have already done this as students, for it is a classic when you learn recursive procedures, chained lists or graphs.

I wasn't worried about doing this with Maple, because Maple has already done part of the job thanks to the procedure ToInert.
What's more, the GraphTheory package seemed to possess all the features required to obtain this syntactic graph quickly.
Nevertheless it took me a lot of time to fix (almost all) the problems.
The issues are mainly of two orders:

  1. ToInert is very verbose: a necessary feature when you want an unambiguous description of an expression, but partly useless for simple visualization.
    Here is an example
    ToInert(f(x))
    _Inert_FUNCTION(_Inert_NAME("f"), _Inert_EXPSEQ(_Inert_NAME("x")))

     

  2. GraphTheory 
    Once the inert form of the expression is known, it is necessary to put it in a form that can be manipulated by the procedures of the GraphTheory package.
    More precisely one needs to transform this inert form into a set of lists [a, b], where a and b are two neighboring vertices of the syntactic graph and [a, b] is the directed arc from a to b (a minimal sketch of this transformation is given right after the example below).
    As the syntactic graph is a tree, this implies using edges {a, b} instead of arcs [a, b].
    The problem is that some operators are commutative while others are not: for the latter this means that the edges and vertices of the syntactic graph must appear in an order that respects the non-commutativity.
    Here is a toy example where I manually build the syntactic graphs of a/b and b/a: the two graphs are identical, and this comes from the fact that the edges passed to Graph( edges ) must form a set, that is an automatically sorted structure whose order knows nothing about non-commutativity.

    restart:
    with(GraphTheory):
    # The first is aimed to represent the expression a/b
    # while the second is aimed to represent the expression b/a
    Gdiv := Graph({{"/", "a"}, {"/", "b"}}):
    g1 := DrawGraph(Gdiv, style=tree, root="/", title=a/b):
    
    Gdiv := Graph({{"/", "b"}, {"/", "a"}}):
    g2 :=DrawGraph(Gdiv, style=tree, root="/", title=b/a):
    
    plots:-display(<g1 | g2>)
    

    Download ab_ba.mw
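    Below is a minimal sketch of the transformation mentioned above (the procedure name InertArcs and the labelling scheme are mine, they are not the ones used in Syntactic_Graph.mw). It walks the inert form returned by ToInert and returns the ordered list of directed arcs [parent, child]; a path suffix is appended to each label so that repeated operators or names remain distinct vertices, and the list order preserves the argument order of non-commutative operators, which is precisely what a set of edges cannot do.

    restart:

    InertArcs := proc(t, path::string := "1")
      # t    : an expression in inert form (output of ToInert) or one of its operands
      # path : string locating t in the tree, used to build unique vertex labels
      local lbl, children, res, arcs, k;
      if type(t, function) then
        # an inert node such as _Inert_PROD(...) or _Inert_FUNCTION(...)
        lbl      := cat(convert(op(0, t), string), " [", path, "]");
        children := [op(t)];
        arcs     := [];
        for k to nops(children) do
          res  := InertArcs(children[k], cat(path, ".", k));
          arcs := [op(arcs), [lbl, res[1]], op(res[2])];
        end do;
        return [lbl, arcs];
      else
        # a leaf: a string (inert name) or a number
        return [cat(convert(t, string), " [", path, "]"), []];
      end if;
    end proc:

    InertArcs(ToInert(a/b))[2];
    InertArcs(ToInert(b/a))[2];

    Applied to a/b and b/a, the two arc lists differ in the order of their leaves, which is exactly the information lost when the arcs are turned into a set of edges.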

     

After several attempts, I decided to discard the GraphTheory package, that is, to deprive myself of all the convenient features it offers to manipulate graphs.

The result is given in the attached file (whose content can't be displayed here as usual).

Download Syntactic_Graph.mw

Here is an example


Twelve test cases are given; all the corresponding syntactic graphs are correct, but the drawing of one of them (test case iexpr=1) looks wrong because the right child of a parent P is located to the right of the left child of a parent P', even though P' is to the right of P.
This could be corrected by modifying the way the positions are computed in procedure Place.


PS: Maple doesn't seem to have a built-in procedure to construct the syntactic graph of a mathematical expression.
But maybe I'm wrong?


 

 

Hi,
This post is inspired by a recent question maple least square fit error... where the OP was simulating what appeared to be a stochastic process known as the Drunkard's walk (see for instance The_Drunkard's_Walk).


In the OP's case, the drunk takes a step forward or a step backward (say along a long, narrow corridor) with equal probabilities. In addition one assumes that each step the drunkard takes is independent of all the steps he took before.
His move is what is called a (1D) Random_walk

This little application is based on Maplets (OK, I know that some people see them as old-fashioned technology).

It draws a sequence of drunkard's walks, all with the same number of steps, and interactively plots the current histogram of the arrival points (the point where the drunkard stands at the end of his walk, which should be the door of the same pub he started from if he is an inveterate drunkard, or if he knows a little about statistics).

The code contains 2 procedures:

  • f_step_by_step(n, Discrete=false/true)
    n: number of steps
    Discrete=false (default value) plots the histogram of the arrival points as if these points were realizations of a continuous random variable.
    In this simple model the arrivals can only take integer values between -n and +n inclusive; the Discrete=true option is recommended, but it then takes more walks for the histogram to converge to the asymptotic distribution (see below).

    Once launched, f_step_by_step opens a maplet containing a Plotter and 2 buttons (a minimal sketch of such a maplet is given a few lines below). A first walk is displayed; click the "Plot" button to draw another one and repeat the operation as many times as you want.
     
  • f_automatic(n, m, Discrete=false/true)
    n and Discrete have the same meaning as for f_step_by_step.
    m is the number of random walks you want to draw.
    The code is set to draw 1000 walks of 1000 steps; this corresponds roughly to 250 MB of memory used.

    f_automatic contains a call to Threads:-Sleep to delay the display; the argument of Sleep is set to 0.25 second and must be modified within the procedure (its value could be passed to f_automatic as an argument).

    In my opinion it is the more interesting of the two procedures.
     

The values of the current mean and standard deviation are displayed in the title.

The purpose is above all educational; the application can be seen as an illustration of (one of) the Central Limit Theorem(s) (CLT).
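To make the description of the Plotter and the two buttons more concrete, here is a minimal, self-contained sketch (the names walkplot, PL1 and WalkMaplet are mine, and this is not the code of f_step_by_step itself, which also maintains the histogram of the arrivals):

restart:
with(Maplets[Elements]):

# draw one drunkard's walk of n steps of +/-1
walkplot := proc(n::posint)
  local r, pos, s, k;
  r   := rand(0..1);
  pos := Vector(n);
  s   := 0;
  for k to n do
    s      := s + 2*r() - 1;     # one step forward or backward
    pos[k] := s;
  end do;
  plots:-listplot(pos, title = "One drunkard's walk");
end proc:

# a Plotter initialized with a first walk, a button to draw another one, a button to quit
WalkMaplet := Maplet([
  [Plotter['PL1'](walkplot(1000))],
  [Button("Plot", Evaluate('PL1' = 'walkplot(1000)')), Button("Quit", Shutdown())]
]):

Maplets[Display](WalkMaplet);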

A little bit of theory:
Let X[n] be the position of the drunkard after n steps; his position X[n+1] is either X[n]-1 or X[n]+1 with equal probabilities.
The displacement X[n+1]-X[n] is a discrete random variable S with outcomes -1 and +1, and it is easy to see that its variance equals 1.
The position of the drunkard after n steps is just a realization of the sum S1 + ... + Sn of n independent and identically distributed random variables whose distribution is that of S.
Thus :

  • Expectation (S1 + ... +Sn)  = 0 
  • Variance (S1 + ... +Sn)  = n 
    For n=1000 steps, the standard deviation of the arrivals is about 31.6.

CLT says that the distribution of S1 + ... +Sn  tends to a Gaussian distribution as n tends to infinity.

What is the exact distribution of the arrivals?
Another way to represent S is to write S = 2*B-1, where B is a Bernoulli random variable with parameter 1/2. The random variable "Arrival" is the sum of N independent random variables distributed like S,
and thus its distribution is that of 2*Binomial(N, 1/2) - N.
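This claim is easy to check numerically. Here is a minimal sketch (independent of the Maplet code in Drunkard_walk.mw) that simulates the arrival points directly and compares their empirical mean and standard deviation with those of 2*Binomial(n, 1/2) - n:

restart:
with(Statistics):

n := 1000:    # number of steps per walk
M := 2000:    # number of walks

# simulate the M arrival points
r := rand(0..1):
arrivals := Vector(M, i -> add(2*r() - 1, j = 1..n)):

# empirical mean and standard deviation of the arrivals
evalf(Mean(arrivals)), evalf(StandardDeviation(arrivals));

# exact values given by the distribution 2*Binomial(n, 1/2) - n
A := RandomVariable(Binomial(n, 1/2)):
Mean(2*A - n), evalf(StandardDeviation(2*A - n));    # 0 and sqrt(n), about 31.62

# empirical distribution of the arrivals (only values with the parity of n are reachable)
Histogram(arrivals, discrete = true);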

What is the rate of convergence of the histogram to the true probability function?
For a sample of size M drawn from a continuous random variable, its histogram has:

  • a bias error of order 1/K (K being the number of bins)
  • a Linfinity error of order K*sqrt( log(K) / M )  (M being the number of drunkard's walks)
    see for instance Lec2_density.pdf

Using the option Discrete=true corresponds to the choice K=2*N+1, which is the choice with the highest Linfinity error (the larger K, the smaller the bias but the larger the Linfinity error).
This is the reason why I introduced the possibility to draw both histograms and bar (column) graphs: for the same value of M, the Linfinity error of the histogram (for instance with the default number of bins Maple uses) is much lower than that of the column graph.
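To fix orders of magnitude, here is a quick evaluation of the stated Linfinity order K*sqrt(log(K)/M) for the two binning choices discussed above (these are only orders, not actual error values; 2*sqrt(M) is the minbins value mentioned below and 2*N+1 the Discrete=true choice):

M := 1000:    # number of walks (size of the sample of arrivals)
N := 1000:    # number of steps, hence K = 2*N+1 bins when Discrete=true

Linf_order := (K, M) -> evalf(K*sqrt(log(K)/M)):

Linf_order(round(2*sqrt(M)), M);    # histogram with minbins = 2*sqrt(M)
Linf_order(2*N + 1, M);             # column graph, one bin per possible arrival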



A few "internal" parameters.
I already spoke about the delay to display txo succesive walks.
Other parameters could be:

  • The value of minbins in the case discrete=false (default)
    This value is fixed to 2*sqrt(M).
     
  • The width in the "view" option of the plot: its left part displays the drunkard's walk and its right part the histogram of the arrivals (after a rotation of -Pi/2).
    This value is fixed to 5/4*M.
    Note that the histogram is dynamically rescaled so that its height is always 1/4*number_of_steps.
     
  • The height of the view option is set to -Q..Q where Q is equal to 4 standard deviations of the theoretical distribution of the arrivals.
    The continuous envelope of this distribution is plotted in red (its height is normalized to M/4, see above).
    One can show this standard deviation is equal to sqrt(n), n being the number of steps (about 31.6 for n = 1000).
    In my opinion using the full vertical scale (-n..n) doesn't give pretty drunkard's walks because they seem too concentrated around the value 0.
     



WATCHOUT
Starting from 0, any walk with an odd number of steps will give an odd arrival, and any walk with an even number of steps will give an even arrival. Thus the exact number of outcomes for the arrivals is:

  • 2*n if n is odd
  • 2*n-1 if n is even
     


Other application
This maplet can be used as an illustration of the Galton Board (also known as the Bean_machine).


Why use Maplets?
Another solution could have been to use animate. But to draw the M drunkard's walks, you would have had to use M frames: an excessive task that I'm not even sure Maple would have been able to handle.
I guess that embedded components could do the job too, but I'm not as comfortable with them as I am with Maplets.

To illustrate what the code does, an image of the final result is given below.

Drunkard_walk.mw

A fascinating race is currently under way (even if the latest results seem to have put an end to it).
I'm talking of course about the US presidential elections.

My purpose is not to do politics but to discuss a point of detail that really left me puzzled: the possibility of an electoral college tie.
I guess this possibility looks like an aberration to a lot of people living in democratic countries, just because almost everywhere in the world electoral colleges contain an odd number of members precisely to avoid such a situation!

So strange a situation that I did a few things to pass the time (of course with the earphones on my head so I don't miss a thing).
This was done with Maple 2015, and I believe that the amazing Iterator package (which I can't use because of teleworking :-( ) could be used to do much more interesting things.

 

restart:

with(Statistics):

ElectoralCollege := Matrix(51, 2, [
  Alabama,               9,    Kentucky,        8,    North_Dakota,      3,
  Alaska,                3,    Louisiana,       8,    Ohio,             18,
  Arizona,              11,    Maine,           4,    Oklahoma,          7,
  Arkansas,              6,    Maryland,       10,    Oregon,            7,
  California,           55,    Massachusetts,  11,    Pennsylvania,     20,
  Colorado,              9,    Michigan,       16,    Rhode_Island,      4,
  Connecticut,           7,    Minnesota,      10,    South_Carolina,    9,
  Delaware,              3,    Mississippi,     6,    South_Dakota,      3,
  District_of_Columbia,  3,    Missouri,       10,    Tennessee,        11,
  Florida,              29,    Montana,         3,    Texas,            38,
  Georgia,              16,    Nebraska,        5,    Utah,              6,
  Hawaii,                4,    Nevada,          6,    Vermont,           3,
  Idaho,                 4,    New_Hampshire,   4,    Virginia,         13,
  Illinois,             20,    New_Jersey,     14,    Washington,       12,
  Indiana,              11,    New_Mexico,      5,    West_Virginia,     5,
  Iowa,                  6,    New_York,       29,    Wisconsin,        10,
  Kansas,                6,    North_Carolina, 15,    Wyoming,           3
]):
 


add(ElectoralCollege[..,2]):
tie := %/2;

269


ec := convert(ElectoralCollege, listlist):

# Sets of states that form an electoral college tie
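# Monte Carlo approach: draw a random ordering of the 51 states, attribute the
# leading states to candidate 1, and check whether the cumulative number of
# electors hits exactly 269 at some point; if it does, this prefix of states
# is one tie configuration.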

R      := 10^5:
nbties := 0:
states := NULL:
for r from 1 to R do
  poll  := combinat:-randperm(ec):
  cpoll := CumulativeSum(op~(2, poll)):
  if tie in cpoll then
    nbties := nbties+1;
    place  := ListTools:-Search(tie, cpoll);
    states := states, op~(1, poll)[1..place]:   # see below
  end if:
end do:

# electoral college tie is not so rare an event
# (prob of occurrence about 9.4 %).
#
# Why the hell did the US constitution not decide to have an odd
# number of electors to avoid this kind of situation, instead of
# introducing a complex mechanism when a tie appears????

nbties;
evalf(nbties/R);

states := [states]:

9397

 

0.9397000000e-1


# What states participate in the ties?

names := sort(ElectoralCollege[..,1]):

all_states_in_ties := [op(op~(states))]:

howoften := Vector(
                    51,
                    i -> ListTools:-Occurrences(names[i], all_states_in_ties)
            ):

ScatterPlot(Vector(51, i->i), howoften);

 

# All the states seem to appear equally likely in an electoral college tie.
# Why? Does someone have a guess?
#
# The reason is obvious: each state must appear in the basket of one of the candidates,
# so in case of a tie each state is either in op~(1, poll)[1..place] (candidate 1)
# or in op~(1, poll)[place+1..51] (candidate 2).
# So, as we obtained 9397 ties, each state appears exactly 9397 times (with
# different occurrences in the baskets of candidates 1 and 2).

 

# Lengths of the configurations that lead to a tie.
#
# Please refer to the answer above to understand why Histogram(lengths) should be
# symmetric.
lengths := map(i -> numelems(states[i]), [$1..nbties]):
sort(Tally(lengths))

[14 = 1, 15 = 2, 16 = 7, 17 = 36, 18 = 78, 19 = 179, 20 = 341, 21 = 507, 22 = 652, 23 = 849, 24 = 1015, 25 = 1041, 26 = 1056, 27 = 997, 28 = 862, 29 = 657, 30 = 515, 31 = 300, 32 = 158, 33 = 95, 34 = 41, 35 = 6, 36 = 2]


Histogram(lengths, range=min(lengths)..max(lengths), discrete=true)

 

ShortestConfigurations := map(i -> if lengths[i]=min(lengths) then states[i] end if, [$1..nbties]):
print~(ShortestConfigurations):

[New_York, Wisconsin, Illinois, Kentucky, Florida, New_Jersey, Mississippi, Indiana, Virginia, Maryland, California, Massachusetts, North_Carolina, Texas]


LargestConfigurations := map(i -> if lengths[i]=max(lengths) then states[i] end if, [$1..nbties]):
print~(LargestConfigurations):

[Alaska, Tennessee, North_Carolina, South_Carolina, District_of_Columbia, Colorado, Minnesota, Georgia, South_Dakota, New_Hampshire, Wyoming, Ohio, Rhode_Island, Arizona, Delaware, Montana, West_Virginia, Vermont, Michigan, Kentucky, Louisiana, Arkansas, Maine, Missouri, New_Mexico, Virginia, Maryland, Oregon, Wisconsin, Iowa, Kansas, Connecticut, North_Dakota, Nevada, Hawaii, Oklahoma]

 

[West_Virginia, Maryland, Massachusetts, Colorado, South_Dakota, Kentucky, Kansas, Wyoming, North_Dakota, Indiana, Michigan, Utah, Louisiana, Ohio, Alabama, Nebraska, Connecticut, Illinois, Oklahoma, Alaska, New_Jersey, District_of_Columbia, Oregon, Nevada, Missouri, Delaware, Washington, New_Hampshire, Arizona, Maine, South_Carolina, Hawaii, Vermont, Montana, Rhode_Island, Idaho]


# What could be the largest composition of a basket in case of a tie?
# (the shortest composition is the complement of the largest one)

ecs   := sort(ec, key=(x-> x[2]));
csecs := CumulativeSum(op~(2, ecs)):

# Where would the break be located?

tieloc := ListTools:-BinaryPlace(csecs, tie);

csecs[tieloc..tieloc+1]

[[North_Dakota, 3], [Alaska, 3], [Delaware, 3], [South_Dakota, 3], [District_of_Columbia, 3], [Montana, 3], [Vermont, 3], [Wyoming, 3], [Maine, 4], [Rhode_Island, 4], [Hawaii, 4], [Idaho, 4], [New_Hampshire, 4], [Nebraska, 5], [New_Mexico, 5], [West_Virginia, 5], [Arkansas, 6], [Mississippi, 6], [Utah, 6], [Nevada, 6], [Iowa, 6], [Kansas, 6], [Oklahoma, 7], [Oregon, 7], [Connecticut, 7], [Kentucky, 8], [Louisiana, 8], [Alabama, 9], [Colorado, 9], [South_Carolina, 9], [Maryland, 10], [Minnesota, 10], [Missouri, 10], [Wisconsin, 10], [Arizona, 11], [Massachusetts, 11], [Tennessee, 11], [Indiana, 11], [Washington, 12], [Virginia, 13], [New_Jersey, 14], [North_Carolina, 15], [Michigan, 16], [Georgia, 16], [Ohio, 18], [Pennsylvania, 20], [Illinois, 20], [Florida, 29], [New_York, 29], [Texas, 38], [California, 55]]

 

40

 


# This 40-state configuration is not a tie.
#
# But list all the states in the basket of candidate 1 and look at the 41st state (which is
# in the basket of candidate 2)

ecs[1..tieloc];
print():
ecs[tieloc+1]

[[North_Dakota, 3], [Alaska, 3], [Delaware, 3], [South_Dakota, 3], [District_of_Columbia, 3], [Montana, 3], [Vermont, 3], [Wyoming, 3], [Maine, 4], [Rhode_Island, 4], [Hawaii, 4], [Idaho, 4], [New_Hampshire, 4], [Nebraska, 5], [New_Mexico, 5], [West_Virginia, 5], [Arkansas, 6], [Mississippi, 6], [Utah, 6], [Nevada, 6], [Iowa, 6], [Kansas, 6], [Oklahoma, 7], [Oregon, 7], [Connecticut, 7], [Kentucky, 8], [Louisiana, 8], [Alabama, 9], [Colorado, 9], [South_Carolina, 9], [Maryland, 10], [Minnesota, 10], [Missouri, 10], [Wisconsin, 10], [Arizona, 11], [Massachusetts, 11], [Tennessee, 11], [Indiana, 11], [Washington, 12], [Virginia, 13]]

 

 

[New_Jersey, 14]


# It appears that exchanging Virginia for New_Jersey increases candidate 1's college by 1 elector
# and produces a tie.

LargestBasketEver := [ ecs[1..tieloc-1][], ecs[tieloc+1] ];

add(op~(2, LargestBasketEver))

[[North_Dakota, 3], [Alaska, 3], [Delaware, 3], [South_Dakota, 3], [District_of_Columbia, 3], [Montana, 3], [Vermont, 3], [Wyoming, 3], [Maine, 4], [Rhode_Island, 4], [Hawaii, 4], [Idaho, 4], [New_Hampshire, 4], [Nebraska, 5], [New_Mexico, 5], [West_Virginia, 5], [Arkansas, 6], [Mississippi, 6], [Utah, 6], [Nevada, 6], [Iowa, 6], [Kansas, 6], [Oklahoma, 7], [Oregon, 7], [Connecticut, 7], [Kentucky, 8], [Louisiana, 8], [Alabama, 9], [Colorado, 9], [South_Carolina, 9], [Maryland, 10], [Minnesota, 10], [Missouri, 10], [Wisconsin, 10], [Arizona, 11], [Massachusetts, 11], [Tennessee, 11], [Indiana, 11], [Washington, 12], [New_Jersey, 14]]

 

269


# The largest electoral college tie contains 40 states (the shortest 11)


 

Download ElectoralCollegeTie.mw

Hi,
 

This post concerns the simulation of a physical system whose behavior is governed by ODEs.
It is likely that some people will think that all that follows is nothing but embellishment, or a waste of time.
And in some sense they will be right.
I was thinking the same until I received some sharp remarks on the occasion of a few presentations of my work.
Experience has shown me that giving a presentation in front of project managers with only 2D curves often leads to smiles, not to speak of those who raise their eyes to heaven at the poverty of the slides.
Tired of this attitude, I decided to replace these 2D curves with a short film, which of course does not reveal more than what the 2D curves already revealed, but which is pretty enough for the funding to keep flowing.

For those of you who might regret this situation, just consider this work as a demonstration of the capabilities of Maple in 3D rendering.


PS: all the display outputs have been removed to avoid loading an unnecessarily huge file.
      The last two commands must be uncommented to play the animation.

 

Download ODE_Movie.mw

 

I found in the Application Center a rather old work (2010) titled Generation of correlated random numbers (see here view.aspx).
This work contains a few errors that I thought were worth correcting.

Basically, the work I refer to concerns the sampling of linearly correlated random variables (or correlation in the Pearson sense). Classical textbooks about the subject generally discuss this topic by considering only Gaussian random variables and present two methods to generate linearly correlated samples: one based on the Cholesky decomposition of the correlation matrix, the other based on its SVD.

Now the question is: can we apply either of these two procedures to generate linearly correlated samples of arbitrary random variables?
The answer is NO, and the reason why is strongly related to a fundamental property of Gaussian random variables (GRVs): any linear combination of GRVs is still a GRV.
But things are not that simple, because even the multi-Gaussian case handled with the Cholesky decomposition or the SVD can lead to undesired results if no precautions are taken.

The aim of this post is to show the wrong results we obtain by thoughtlessly applying these decompositions and, of course, to show how we must proceed to avoid them.

Let's start with a very simple point of common sense: suppose U1 and U2 are two independent identically distributed (iid) random variables and that we have some "function" F which, when applied to the couple (U1, U2), generates a couple (A1, A2) of linearly correlated random variables. Thus F(U1, U2) = (A1, A2).
Let's suppose this same relation holds if we replace U1 and U2 by "a sample of U1" and "a sample of U2", and thus (A1, A2) by "a sample of the bivariate (A1, A2) whose components are linearly correlated". Let's call S the cloud one could obtain by using, for instance, the ScatterPlot(A1, A2) procedure of Maple.

Let's suppose now that instead of computing F(U1, U2) I decide to compute F(U2, U1). Let's call (A1', A2') the corresponding joint sample and write S' := ScatterPlot(A2', A1').
It seems natural (and it is!) to think that S and S' will be the same up to sampling artifacts.

Any correct method to generate samples of (linearly or not) correlated random variables must exhibit this similarity of patterns between S and S'. But this is not the case in the work view.aspx.

The safest way to correlate random variables, even in the Pearson sense, is to use the concept of COPULAS (there is a work on copulas in the Application Center, but for a quick overview see here Copula_(probability_theory)).
For this special case of linear correlation one can use copulas without even knowing it, and it is very simple: as soon as our procedure F introduced above gives correct results when U1 and U2 are standard GRVs,

  • take any couple (R1, R2) of arbitrary random variables,
  • build a map M(R1, R2) --> (U1, U2),
  • generate (A1, A2) = F(U1, U2),
  • compute M^(-1)(A1, A2).
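Here is a compact sketch of these four steps when R1 and R2 are LogNormal (the names X, U1, A1, etc. are mine; the full step-by-step version, which uses Probability instead of CDF, is developed in the LogNormal section of the attached worksheet):

with(Statistics): with(LinearAlgebra):

X   := RandomVariable(LogNormal(0.5, 0.25)):   # the arbitrary target random variable
G   := RandomVariable(Normal(0, 1)):
rho := 0.9:
Q   := 1000:

X1 := Sample(X, Q):  X2 := Sample(X, Q):

# step 2: map each sample to a standard Gaussian sample through the CDFs
U1 := map(x -> Quantile(G, CDF(X, x, numeric), numeric), X1):
U2 := map(x -> Quantile(G, CDF(X, x, numeric), numeric), X2):

# step 3: F = Cholesky-based correlation of the standard Gaussian samples
CorM := Matrix([[1, rho], [rho, 1]]):
Cd   := LUDecomposition(CorM, method = 'Cholesky', output = ['U']):
Rs   := Transpose(<U1; U2>) . Cd:

# step 4: map the correlated Gaussian samples back to the target distribution
A1 := map(u -> Quantile(X, CDF(G, u, numeric), numeric), Column(Rs, 1)):
A2 := map(u -> Quantile(X, CDF(G, u, numeric), numeric), Column(Rs, 2)):

ScatterPlot(A1, A2, title = "Correlated LogNormal RV");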


What is the point of correcting a work that is 10 years old?
A very simple answer is that the Cholesky decomposition (or the SVD) is still the emblematic method used to linearly correlate random variables. It is the only one presented in scholarly textbooks, the only one a lot of students have been taught (unless they have had an extensive background in probability or statistics), and thus a systematic source of wrong results users are not even aware of.


Next point: it is well known that the Pearson correlation cannot be lower than -1 or higher than +1, but it is a common mistake to think that any value between -1 and +1 can be reached.
This is guaranteed for GRVs, but not for some other random variables.
For a classical counter-example see  04_correlation_2016_cost_symposium_fkuo_tagged.pdf 

The notations used in the attached file are mainly those used in the initial work view.aspx.

restart:


This work is aimed at correcting the procedure used in https://fr.maplesoft.com/applications/view.aspx?SID=99806
to correlate arbitrary random variables in the (common) Pearson sense.

with(LinearAlgebra):
with(plots):
with(Statistics):

 

GAUSSIAN RANDOM VARIABLES

 

# First example: both A1 and A2 are centered gaussian random variables
#                The order we use (A1 next A2 or A2 next A1) to define Ma doesn't matter

Y   := RandomVariable(Normal(0, .25)):
rho := .9:
Q   := 10^4:
A1  := Sample(Y, Q):
A2  := Sample(Y, Q):
Ma  := `<,>`(`<,>`(A1), `<,>`(A2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:
A1A2 := ScatterPlot(Column(Rs2, 1), Column(Rs2, 2), title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(A2), `<,>`(A1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2), Column(Rs2, 1), title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

# Second example : both A1 and A2 are non-centered gaussian random variables with equal standard deviations.
#                  The order we use to define Ma does matter

Y   := RandomVariable(Normal(1, .25)):
rho := .9:
Q   := 10^4:
A1  := Sample(Y, Q):
A2  := Sample(Y, Q):
Ma  := `<,>`(`<,>`(A1), `<,>`(A2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:


A1A2 := ScatterPlot(Column(Rs2, 1), Column(Rs2, 2), title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(A2), `<,>`(A1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2), Column(Rs2, 1), title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

 

# Second example corrected: to avoid the order dependency, proceed this way
#    1/ center A1 and A2
#    2/ correlate the now centered rvs
#    3/ uncenter the couple of correlated rvs


C1  := convert(Scale(A1, scale=Mean), Vector[row]):
C2  := convert(Scale(A2, scale=Mean), Vector[row]):

Ma  := `<,>`(`<,>`(C1), `<,>`(C2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:


A1A2 := ScatterPlot(Column(Rs2, 1)+~Mean(A1), Column(Rs2, 2)+~Mean(A2), title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(C2), `<,>`(C1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2)+~Mean(A1), Column(Rs2, 1)+~Mean(A2), title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

# Third example : both A1 and A2 are centered gaussian random variables with unequal standard deviations.
#                 The order we use to define Ma does matter

rho := .9:
Q   := 10^4:
A1  := Sample(Normal(0, 1), Q):
A2  := Sample(Normal(0, 2), Q):
Ma  := `<,>`(`<,>`(A1), `<,>`(A2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:


A1A2 := ScatterPlot(Column(Rs2, 1), Column(Rs2, 2), title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(A2), `<,>`(A1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2), Column(Rs2, 1), title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

# Third example corrected: to avoid the order dependency, proceed this way
#    1/ scale A1 and A2
#    2/ correlate the now scaled rvs
#    3/ unscale the couple of correlated rvs


C1  := A1 /~ StandardDeviation(A1):
C2  := A2 /~ StandardDeviation(A2):

Ma  := `<,>`(`<,>`(C1), `<,>`(C2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:


A1A2 := ScatterPlot(Column(Rs2, 1)*~StandardDeviation(A1), Column(Rs2, 2)*~StandardDeviation(A2), title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(C2), `<,>`(C1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2)*~StandardDeviation(A1), Column(Rs2, 1)*~StandardDeviation(A2), title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

# More generally: to avoid the order dependency, proceed this way
#    1/ transform A1 and A2 into standard gaussian random variables (mean and standard deviation scalings)
#    2/ correlate the now scaled rvs
#    3/ unscale the couple of correlated rvs

 

A MORE COMPLEX EXAMPLE:

NON GAUSSIAN RANDOM VARIABLES
(here two LogNormal rvs)

 

 

# Preliminary
#   the expectation (mean) of a LogNormal rv cannot be 0;
#   as a consequence it is expected that the order used to build Ma will matter
#
# Proceed as Igor Hlivka did

 

Y   := RandomVariable(LogNormal(.5, .25)):
rho := .9:
Q   := 1000:
A1  := Sample(Y, Q):
A2  := Sample(Y, Q):
Ma  := `<,>`(`<,>`(A1), `<,>`(A2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

ScatterPlot(A1, A2, color = red, title = ["Raw LogNormal RV", font = [TIMES, BOLD, 12]]):
A1A2 := ScatterPlot(Column(Rs2, 1), Column(Rs2, 2), title = "Correlated LogNormal RV", opts, color=blue):

# And now change, as usual, the order in Ma

Ma  := `<,>`(`<,>`(A2), `<,>`(A1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

A2A1 := ScatterPlot(Column(Rs2, 2), Column(Rs2, 1), title = "Correlated LogNormal RV", opts, color=red):

display(A1A2, A2A1);

 

# How can we make sure the order used to assemble Ma does not matter?
#
# A close examination of what was done with Gaussian rvs shows that in all cases we
# went back to standard Gaussian rvs before correlating them.
# So let's just do the same thing here.
#
# Of course it's not as immediate as previously...
# (please do not focus on the slowness of the code, it is written to clearly explain 
# what is done, not to be fast)



#-------------------------------------- from Y to standard gaussian
G  := RandomVariable(Normal(0, 1)):
G1 := Vector[row](Q, q -> Quantile(G, Probability(Y > A1[q], numeric), numeric)):
G2 := Vector[row](Q, q -> Quantile(G, Probability(Y > A2[q], numeric), numeric)):
# could be replaced by this faster code
#   cdf_Y := unapply(CDF(Y, z), z) assuming z > 0;
#   cdf_G := unapply(CDF(G, z), z);
#   S1    := sort(A1):
#   ini   := -10:
#   V     := Vector[row](Q):
#   for q from 1 to Q do
#     V[q] := fsolve(cdf_G(z)=cdf_Y(S1[q]), z=ini);
#     ini  := V[q]:
#   end do:
#------------------------------------------------------------------

Ma  := `<,>`(`<,>`(G1), `<,>`(G2)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:


opts := titlefont = [TIMES, BOLD, 12], symbol=point, transparency=0.5:


#-------------------------------------- from standard gaussian to Y
C1 := Vector[row](Q, q -> Quantile(Y, Probability(G > Rs2[q, 1], numeric), numeric)):
C2 := Vector[row](Q, q -> Quantile(Y, Probability(G > Rs2[q, 2], numeric), numeric)):
#------------------------------------------------------------------
A1A2 := ScatterPlot(C1, C2, title = "Correlated Normal RV", opts, color=blue):



Ma  := `<,>`(`<,>`(G2), `<,>`(G1)):
MA  := Transpose(Ma):
Cor := Matrix([[1, rho], [rho, 1]]):
Cd2 := LUDecomposition(Cor, method = 'Cholesky', output = ['U']):
Rs2 := MA . Cd2:

#-------------------------------------- from standard gaussian to Y
C1 := Vector[row](Q, q -> Quantile(Y, Probability(G > Rs2[q, 1], numeric), numeric)):
C2 := Vector[row](Q, q -> Quantile(Y, Probability(G > Rs2[q, 2], numeric), numeric)):
#------------------------------------------------------------------

A2A1 := ScatterPlot(C2, C1, title = "Correlated Normal RV", opts, color=red):

display(A1A2, A2A1);

 

 

CONCLUSION: Be extremely careful when correlating non-standard Gaussian random variables,
                             and more generally non-Gaussian random variables.


Correlating rvs the way Igor Hlivka did can be recast in the more general framework of COPULA THEORY.

Mathematically, a bidimensional copula C is a function from [0, 1] x [0, 1] to [0, 1] which is the joint CDF of a bivariate random variable
with uniform marginals on [0, 1].
See for instance https://en.wikipedia.org/wiki/Copula_(probability_theory)

What I did here to "correlate" A1 and A2 was nothing but to apply, in a step-by-step way, a GAUSSIAN COPULA to the bivariate
(A1, A2) random variable.
In Quantile(G, Probability(Y > A1[q], numeric), numeric), the inner expression Probability(Y > A1[q], numeric) maps A1 onto [0, 1] (as required
in the definition of a copula), while the outer Quantile(G, ..., numeric) call is the copula transformation itself (the same operation being applied to A2).

 

 


 

Download LInear_Correlated_Random_Variables.mw
