2019-02-21, 00:01  #1
∂^{2}ω=0
Sep 2002
República de California
2^{2}·5·11·53 Posts 
Mlucas v18 available
Mlucas v18 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.
Last fiddled with by ewmayer on 2019-03-06 at 21:55
2019-02-21, 08:09  #2
Jul 2009
Germany
2×313 Posts 
I always wanted to try it out, but unfortunately I cannot compile a multithreaded build because I still use Windows 7 Professional. It would be great if someone would upload an exe file for the AMD K10 architecture.

2019-02-21, 08:14  #3
"Composite as Heck"
Oct 2017
863_{10} Posts 
Must be my birthday :)

2019-02-21, 12:03  #4
"Composite as Heck"
Oct 2017
1101011111_{2} Posts 
It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.

2019-02-21, 17:45  #5
Jul 2009
Germany
2·313 Posts 

2019-02-21, 19:29  #6
∂^{2}ω=0
Sep 2002
República de California
11660_{10} Posts 

2019-02-21, 20:24  #7
"Composite as Heck"
Oct 2017
863_{10} Posts 
Quote:
Sounds like your phone has a Snapdragon 415, which is a 28nm 4xA53 4xA53. It should work, but unfortunately it doesn't come close in efficiency to an S7's 14nm 4xM1 4xA53. It should handily beat a Raspberry Pi 3's 40nm in efficiency and throughput and slot somewhere behind the 20nm 10-core Helio X25 ( https://www.mersenneforum.org/showpo...8&postcount=83 ). Attached is the v18 ARM asimd binary from the S7 on the off chance you find it useful. AFAIK you need a rooted phone to run it, and if you have a rooted phone you could easily build Mlucas from source yourself, but there it is.
Quote:
I'll try and create an APK tomorrow. There's a chance it works where the v17.1 failed, as there were clobber-related error messages like this:
Code:
/home/u18/AndroidStudioProjects/MlucasAPK/app/src/main/cpp/mi64.c:813:19: error: unknown register name 'rax' in asm
: "cc","memory","rax","rbx","rcx","rsi","r10","r11" /* Clobbered registers */\

2019-02-21, 22:00  #8
∂^{2}ω=0
Sep 2002
República de California
26614_{8} Posts 
Quote:
Quote:


2019-02-22, 07:16  #9
Jul 2009
Germany
2·313 Posts 

2019-02-22, 12:03  #10
Einyen
Dec 2003
Denmark
2×7×227 Posts 
Compiled it on the usual c5d.9xlarge with 18 cores and 36 threads:
gcc -c -O3 -march=skylake-avx512 -DUSE_AVX512 -DUSE_THREADS ../src/*.c >& build.log
grep -i error build.log
[Assuming above grep comes up empty]
gcc -o Mlucas *.o -lm -lpthread -lrt

-DCARRY_16_WAY is not needed in v18, right?

This time using all 18 cores was fastest for some reason.
Code:
18.0
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
4608 msec/iter = 3.24 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 34
4608 msec/iter = 3.18 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 32
4608 msec/iter = 3.15 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 30
4608 msec/iter = 3.07 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 28
4608 msec/iter = 3.03 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 26
4608 msec/iter = 3.08 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
4608 msec/iter = 2.96 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:16
4608 msec/iter = 3.12 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:15
4608 msec/iter = 3.09 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:14
4608 msec/iter = 4.05 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:13
4608 msec/iter = 4.18 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 18:35
4608 msec/iter = 3.00 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
4608 msec/iter = 4.27 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107

From the README.html: should this be -cpu 0:n-1 ?
Quote:
Last fiddled with by ATH on 2019-02-22 at 12:07
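
To compare the runs in the post above at a glance, here is a minimal Python sketch that ranks them by msec/iter. The timings are copied by hand from the posted output; the dictionary layout and the names `timings` and `rank` are only illustrative and are not part of Mlucas.

```python
# msec/iter at FFT length 4608, copied from the runs posted above.
# Keys are the thread/core options used for each run.
timings = {
    "-nthread 36": 3.24,
    "-nthread 34": 3.18,
    "-nthread 32": 3.15,
    "-nthread 30": 3.07,
    "-nthread 28": 3.03,
    "-nthread 26": 3.08,
    "-cpu 0:17": 2.96,
    "-cpu 0:16": 3.12,
    "-cpu 0:15": 3.09,
    "-cpu 0:14": 4.05,
    "-cpu 0:13": 4.18,
    "-cpu 18:35": 3.00,
    "-cpu 0:34:2": 4.27,
}


def rank(results):
    """Return (options, msec_per_iter) pairs sorted fastest first."""
    return sorted(results.items(), key=lambda kv: kv[1])


ranked = rank(timings)
best_opts, best_ms = ranked[0]

# Print each run with its slowdown relative to the fastest one.
for opts, ms in ranked:
    print(f"{opts:<14} {ms:5.2f} msec/iter  ({ms / best_ms:.2f}x best)")
```

On these numbers the fastest run is -cpu 0:17 and the slowest is -cpu 0:34:2, which matches the observation that using all 18 cores came out ahead.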

2019-02-22, 23:44  #11
∂^{2}ω=0
Sep 2002
República de California
11660_{10} Posts 
Correct - if you open platform.h and search for CARRY_16_WAY you'll see it's now on by default for avx512 builds.
Quote:
Quote:
From a job-management perspective it's of course easier to just run 1 job using all the physical cores, and as long as n <= 4 one won't sacrifice much total throughput by doing so. So on both my non-HT Intel quad Haswell and my quad-ARM64-core Odroid C2 I use -cpu 0:3, as I do on my HT-enabled dual-core Intel Broadwell NUC, because there I want to use 2 threads per physical core and a single 4-thread job gives me nearly the same throughput as separate jobs using -cpu 0,2 and -cpu 1,3. I need to carefully reread the README.html page to try to catch remaining such ,-versus-: mixups, because they are easy to overlook.
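
Since the comma and colon forms are easy to confuse, here is a small Python sketch of how core specs like the ones used in this thread (0:3, 0:34:2, 0,2) could be expanded into explicit core indices. It assumes lo:hi is an inclusive range, an optional third field is a stride, and commas separate individual entries; that reading is inferred from the examples in this thread, and the function `expand_cpu_spec` is hypothetical, not code from Mlucas.

```python
def expand_cpu_spec(spec):
    """Expand a core spec such as '0:3', '0:34:2' or '0,2' into a list of core indices.

    Assumed semantics (inferred from the thread, not taken from Mlucas source):
      'a,b'            -> the listed entries, in order
      'lo:hi'          -> lo, lo+1, ..., hi   (inclusive)
      'lo:hi:stride'   -> lo, lo+stride, ...  up to and including hi
    """
    cores = []
    for part in spec.split(","):
        fields = [int(f) for f in part.split(":")]
        if len(fields) == 1:
            cores.append(fields[0])
        elif len(fields) in (2, 3):
            lo, hi = fields[0], fields[1]
            stride = fields[2] if len(fields) == 3 else 1
            if stride < 1 or hi < lo:
                raise ValueError(f"bad range in core spec: {part!r}")
            cores.extend(range(lo, hi + 1, stride))
        else:
            raise ValueError(f"too many ':' fields in core spec: {part!r}")
    return cores


print(expand_cpu_spec("0:3"))     # four consecutive cores
print(expand_cpu_spec("0,2"))     # two explicitly listed cores
print(expand_cpu_spec("0:34:2"))  # every second core from 0 to 34
```

Under these assumptions 0:3 and 0,3 mean quite different things (four cores versus two), which is exactly the kind of mixup that is easy to overlook in documentation.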
