
Mr. Roman Pearce

1678 Reputation

19 Badges

20 years, 215 days
CECM/SFU
Research Associate
Abbotsford, British Columbia, Canada

I am a research associate at Simon Fraser University and a member of the Computer Algebra Group at the CECM.

MaplePrimes Activity


These are replies submitted by roman_pearce

I've got a Windows 7 machine running right now. I recommend you use the 32-bit version of Maple unless you need to do computations in more than 4 GB of RAM. 64-bit Maple doesn't have very good performance on Windows, mostly because of limitations and incompatibilities in the GMP library. The 32-bit version of Maple 11 should work fine.
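If you are not sure which build you are running, a quick check from within Maple (my own sanity check, not part of the original advice):

kernelopts(wordsize);     # 32 or 64, depending on which version of Maple is installed
kernelopts(bytesalloc);   # bytes currently allocated by the kernel, to see how close you are to the 4 GB limit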
Another vote for higher performance.
Lists aren't limited to 100 elements, but if you modify an element of a list with more than 100 elements you will get an error. The reason is that you shouldn't be modifying elements of a list at all: Maple has to make a new list each time and copy all the data. The 100-element limit is there to stop people from coding very inefficient algorithms without realizing it. Use an Array or a hash table instead, or modify all the elements of the list at once using the map command, for example.
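A minimal illustration of the alternatives (the names here are just for the example):

L := [seq(i, i=1..200)]:
# L[5] := 0;                 # error: Maple refuses to assign to an entry of a long list
A := Array(L):               # an Array supports in-place modification
A[5] := 0:
T := table():                # a hash table also supports in-place assignment
T[5] := 0:
L2 := map(x -> x^2, L):      # or build a new list all at once with map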
This could only cannibalize sales of the current version of Maple. They would do better to focus available resources on major improvements to the core product, and as they gain market share, start lowering the price.

You need "killer features" though. These are features that are not easily matched by the competition, so that in the time it takes them to catch up (years, ideally) you can cement your lead and hopefully bring even more killer features out of the pipeline. There are always opportunities to do this, but you need to go way beyond what people think is "reasonable". You can't get there by refining existing stuff. You have to start fresh with extreme goals, and then claw your way up to them.

For example, say I want to visualize large data sets. If I bought the fastest specialized software today, how large could I get up to? OK, I want something 10 times better than that. Then you go and make it.

It's fairly easy to go bust with such an approach, so don't bet the farm or overpromise up front, but ideally 10-30% of your bets should be long. Massive improvements often pay incremental benefits as well. Since learning is involved you will never get everything on the first try, but the intermediate versions are still likely to be big improvements that customers will like.
Old versions of Maple are perfectly capable, and of course if you're mostly writing your own programs you can get by for a long time before upgrading. However, I am constantly surprised by how the improvements pile up over time. You go back five years and all kinds of stuff is missing or much less powerful. I think it's easy to dismiss the newest features because users don't see their full benefit right away.
I want the interface to be about 100 times faster. See http://www.mapleprimes.com/forum/cpuusage

I did a test on a MacBook 2.0 GHz Core 2. Bashing keys in worksheet mode, I used 100% of one core for 10 seconds entering 374 characters. That works out to roughly 53 million machine cycles per character (2.0 GHz × 10 s ÷ 374 characters) just to accept a keystroke and display it on the screen. It's got to be mostly redrawing the screen. Please cut this number down, it's outrageous. I can type faster than Maple can display sometimes.

I did a few more tests on the interface. Here are two polynomials with 10,000 terms; the input is about 380 KB. Copy and paste the input into a Maple worksheet: it takes about a second. Now hit enter: it takes 8 seconds to parse and display the output. It's all way too slow, although it has improved from earlier versions of the standard interface.

Here's a worse problem: a polynomial with 874,000 terms, about 17 MB. Copy and paste it into a worksheet and Maple chokes and crashes. It just completely blows up. Pasting it into a text editor takes 0.1 seconds, about the speed of strcpy. Maple 14 computes this thing in about 0.14 seconds; it's the product of two polynomials with 1000 terms.

I'm not sure how you're going to fix this. If you design for efficiency as you go then you don't get these kinds of problems. Now it looks intractable and you'll probably have to rip the guts out of something to fix it.
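For reference, here is a rough way to reproduce the 1000-term timing test in a worksheet. The randpoly parameters are my own guess, not the exact polynomials from the post, so the term count and timing will differ by machine and version:

f := randpoly([x,y,z], terms=1000, degree=30):
g := randpoly([x,y,z], terms=1000, degree=30):
st := time():
p := expand(f*g):            # the multiplication itself is fast
time() - st;
nops(p);                     # number of terms in the product
length(p);                   # rough size of the result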
I want a factor of 10 speedup in polynomial algorithms over Maple 14, with parallelization for large problems that achieves linear speedup on multicore CPUs.
When I am typing my code into Maple, what difference does it make if the CPU is at 30% or at 10%? It matters because it should be 0%: it suggests an inefficiency in the underlying design. How large can a worksheet get before the CPU is pegged at 100% for one second per character you type? I can't think of a plausible reason why typing anything should consume 30% of a GHz CPU; that's at least a thousand times too slow. I like my software to be responsive.
Thanks for this useful reference. I find myself wishing this could all be simplified somehow :)

Thanks for the great article.  A fairer solution would be to say "if you're all too picky then nobody gets any gifts".  Alternatively, all gifts could be given to the person who determined that there was no perfect matching, thereby saving everyone time.

When polynomials are first entered, Maple will create the old dag structure and simplify it to the new one. After that, Maple will try to keep things in the array format. For example, let f(x,y,z) and g(x,y,z) be two polynomials stored as packed arrays. If you create f*g it will be a Maple PROD with f^1*g^1; nothing new there, since the packed array only replaces Maple's SUM structure. Calling expand(f*g) will multiply in the packed array format, and other routines like coeff(f,x,i) will similarly preserve the structure. There are quite a lot of gory details in handling polynomials in different variables and things like subs(x=y+1, f), which is why this is not shipping soon. Those details have been worked out with an emphasis on speed. More generality can be added later once we get the thing working.
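From the user's side the calls look the same whether or not the packed representation is active; roughly the workflow described above looks like this (dismantle prints the internal dag, so you can see which structure you actually have):

f := randpoly([x,y,z], terms=20):
g := randpoly([x,y,z], terms=20):
h := f*g:                    # stored as a PROD of the two polynomials, f^1*g^1
p := expand(f*g):            # the multiplication is carried out in the polynomial data structure
c := coeff(p, x, 2):         # coefficient extraction preserves the structure
dismantle(f);                # inspect the internal representation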
It will be limited to names initially. It might be nice to support functions, but that will greatly complicate the coding of algorithms. For example, expand(f) where one of the variables is sin(2*x). I don't think it makes sense to support more general Maple structures like sums. One other restriction that I forgot to mention is that the coefficients must be integers. Rational arithmetic is hopelessly slow so there's no reason to support it. Programmers will be encouraged to clear denominators, while ordinary Maple users who don't know or care will be well served by routines which clear denominators, compute, and then restore them. Double precision coefficients may be worth supporting if there are industrial applications.
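A small sketch of clearing denominators by hand with content/primpart (my own example, not the planned library routine):

f := 3/2*x^2*y + 5/4*x - 7/6:
c := content(f, [x,y]);      # rational content, here 1/12
g := primpart(f, [x,y]);     # integer-coefficient part: 18*x^2*y + 15*x - 14
expand(c*g - f);             # 0, so f = c*g and we can compute with g instead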
Thanks for the interesting post. I have a few random thoughts:

On the lack of caches: this is basically a GPU. It makes a lot of sense for intensive branch-free computations, e.g. linear algebra, simulations, plotting, signal processing; basically dense computations requiring high throughput. It uses die area for arithmetic units instead of cache. For general purpose computing, however, caches are never going away. Branches = caches.

What you can expect to see is more of the following: shared resources across multiple execution cores. We're going to get *a lot* of cores. Thousands. But the caches will be shared, so high performance design will become even more important. The decode unit and L1 instruction cache on Bulldozer are shared. That means you want to run the same program on each core if at all possible.

As for smart compilers, I think this is basically wrong. Smart compilers have never really worked. They were supposed to make RISC more efficient. They were also supposed to make functional languages faster than C. I think people underestimate compilers' ability to choose good instructions, and massively overestimate their ability to make a good program. You cannot automate design.

I think the future does not belong to any one solution. You'll see old low-level techniques dusted off and used for fine-grained parallelization of operations. Programmer tools, like the task model, will be used to parallelize non-uniform algorithms. And high-level languages like Maple will wield this power automatically, driven by everything underneath. Nobody wants to write parallel software. The future is automatic.
The multiplications are broken up recursively until it gets down to fairly small polynomials. You can watch what it does with this bit of code:
_bigprod := eval(`expand/bigprod`):    # save the original routine
`expand/bigprod` := proc(a,b)          # instrumented version: print the size of each subproblem,
    printf("%d x %d terms\n", nops(a), nops(b));
    _bigprod(a,b)                      # then call the saved original
end proc:
f,g := seq(randpoly([x,y,z,t],degree=10,dense),i=1..2):   # two random dense polynomials in x,y,z,t
p := expand(f*g):                      # watch the subproblem sizes printed as the product is split up
Increase the degree or the number of variables to get larger problems. It's actually inefficient to add lots of polynomials together using Maple's current algorithms, so using four products is probably better than nine.
Maple has to multiply two polynomials f and g, each having n terms. The algorithm shown above uses O(n^2) memory, which is wasteful because the product probably won't have n^2 terms. So Maple 13 splits the polynomials in half, i.e. it writes f = f1 + f2 and g = g1 + g2, where each fi and gj has n/2 terms, and computes (f1*g1) + (f1*g2) + (f2*g1) + (f2*g2). Note that there is no saving in operations; Maple does have Karatsuba's algorithm, but that trick doesn't apply here because these are sparse multivariate polynomials. The splitting does save a lot of memory, though, which helps Maple run faster.
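A hypothetical Maple-level sketch of that splitting, just to illustrate the recursion (this is not Maple's internal code; the 50-term cutoff and the name split_expand are my own, and it assumes f and g are already expanded sums of terms):

split_expand := proc(f, g)
    local ft, gt, n, m, i, f1, f2, g1, g2;
    ft := `if`(type(f, `+`), [op(f)], [f]);    # the terms of f
    gt := `if`(type(g, `+`), [op(g)], [g]);    # the terms of g
    n := nops(ft);  m := nops(gt);
    if n <= 50 or m <= 50 then return expand(f*g) end if;   # small case: multiply directly
    f1 := add(ft[i], i=1..iquo(n,2));  f2 := f - f1;         # f = f1 + f2
    g1 := add(gt[i], i=1..iquo(m,2));  g2 := g - g1;         # g = g1 + g2
    split_expand(f1,g1) + split_expand(f1,g2) + split_expand(f2,g1) + split_expand(f2,g2)
end proc:

The result equals expand(f*g); the point of the splitting is that each piece is small enough to multiply without building a huge intermediate object.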