Question: A remark about Statistics:-Sampling(..., method=[envelope...])

Hi, 

Recently a few questions concerning the sampling of the Cauchy distribution and the sampling of a truncated Normal distribution have been posted (mainly by  @jalale).
This post is concerned by the sampling of a truncated (standard) Cauchy distribution.

In a first part the efficiency fo two methods is adressed in the case of a non-truncated Cauchy distribution:

  • The "standard" Maple's command Statistics:-Sample(Cauchy(0, 1), N)
  • And a general method a priori very efficient if one knows the ICDF (Inverse Cumulative Function Distribution). It happens that this ICDF is just cot(U*Pi)  where U is a Uniform RV over [0, 1].
     

The second part adresses the sampling of a truncatedCauchy distribution with two methods:

  • The "standard" Maple's command Statistics:-Sample(Cauchy(0, 1), N, method=[envelope, range=...])
  • The method based on the use of the ICDF

 

Results:

Test1 (non-truncated Cauchy distribution) 

  • "Standard" Maples sampling outperforms the ICDF based method in terms of :
    • memory occupation: ICDF is twice more demanding
    • cpu time: ICDF is ten times slower
       

Test2 (truncated Cauchy distribution) 

  • ICDF based method i outperforms "Standard" Maples sampling oin terms of :
    • memory occupation: Maples "envelope sampling" method is twice more demanding
    • cpu time: Maples "envelope sampling" method is two times slower


But, beyond these simple observations, a disturbing problem is: the "envelope sampling" method seems to not return the correct distribution (at least when used this waymethod=[envelope, range=a..b]  with a < b)
This is confirmed by the two last plot where histogram and PDF are uperimposed.

Do you think this problem can be avoided by another parameterization of the "envelope sampling" method or that it reveals some underlying problem with it?

PS: I did not investigate further for other distributions .

 


 

 

Sampling the Cauchy distribution

Maple's default sampling method outperformes the adhoc method

restart

with(Statistics):

C := RandomVariable(Cauchy(0, 1))

_R

(1)

f := unapply(CDF(C, t), t);

proc (t) options operator, arrow; 1/2+arctan(t)/Pi end proc

(2)

finv := unapply(-solve(f(t)=u, t), u)

proc (u) options operator, arrow; cot(u*Pi) end proc

(3)

# "natural" way to proceed

N  := 10^6:

S1 := CodeTools:-Usage(Sample(C, N)):

memory used=7.71MiB, alloc change=39.63MiB, cpu time=69.00ms, real time=69.00ms, gc time=8.72ms

 

# Let's try the sampling strategy based on the inverse of the CDF
# Usually it starts from sampling a Uniform RV on [0, 1] and
# next applies finv to the result.
#
# Smart but inefficient

U  := RandomVariable(Uniform(0., 1)):
S2 := CodeTools:-Usage(finv~(Sample(U, N))):

memory used=145.02MiB, alloc change=7.63MiB, cpu time=4.94s, real time=3.04s, gc time=2.61s

 

# Much more efficient
#
# Given the special form of finv it's simpler to sample a Unirorm RV
# on [0, Pi] and apply "cot" to the result

pi := evalf(Pi):
U  := RandomVariable(Uniform(0., pi)):
S2 := CodeTools:-Usage(cot~(Sample(U, N))):

memory used=15.28MiB, alloc change=0 bytes, cpu time=652.00ms, real time=237.00ms, gc time=577.78ms

 

Sampling a truncated Cauchy distribution

Example 1:
with(Statistics) + method=[envelope, range=-10..10]

The adhoc method outperforms Maple's default sampling method

S1 := CodeTools:-Usage(Sample(C, N, method=[envelope, range=-10..10])):

Histogram(S1);

memory used=8.88MiB, alloc change=-7.63MiB, cpu time=322.00ms, real time=260.00ms, gc time=93.77ms

 

 

p  := Probability(C < -10, numeric);
q  := 1-Probability(C > +10, numeric);
U  := RandomVariable(Uniform(p*pi, q*pi)):
S2 := CodeTools:-Usage(cot~(Sample(U, N))):

Histogram(S2);

HFloat(0.03172551743055352)

 

HFloat(0.9682744825694465)

 

memory used=15.28MiB, alloc change=7.63MiB, cpu time=170.00ms, real time=113.00ms, gc time=92.11ms

 

 

Sampling a truncated Cauchy distribution

Example 2:
with(Statistics) + method=[envelope, range=-1..1]

The adhoc method outperformes Maple's default sampling method

S1 := CodeTools:-Usage(Sample(C, N, method=[envelope, range=-1..1])):


scaling := Probability(C < +1, numeric) - Probability(C < -1, numeric);
plots:-display( Histogram(S1), plot(PDF(C, t)/scaling, t=-1..1, thickness=3, color=red) );

memory used=8.08MiB, alloc change=0 bytes, cpu time=246.00ms, real time=215.00ms, gc time=46.62ms

 

HFloat(0.5)

 

 

p  := Probability(C < -1, numeric);
q  := 1-Probability(C > +1, numeric);
U  := RandomVariable(Uniform(p*pi, q*pi)):
S2 := CodeTools:-Usage(cot~(Sample(U, N))):

plots:-display( Histogram(S2), plot(PDF(C, t)/scaling, t=-1..1, thickness=3, color=red) );

HFloat(0.25)

 

HFloat(0.75)

 

memory used=15.28MiB, alloc change=7.63MiB, cpu time=163.00ms, real time=106.00ms, gc time=85.93ms

 

 

 


 

Download CAUCHY_adhoc-sampling.mw

Please Wait...