IRC logs for #baserock for Friday, 2015-07-03

*** franred has joined #baserock  06:51
*** bjdooks_ has joined #baserock  07:10
*** ratmice___ has quit IRC  07:11
*** ratmice___ has joined #baserock  07:11
*** bjdooks has quit IRC  07:11
*** paulw has quit IRC  07:11
*** Kinnison has quit IRC  07:11
*** Kinnison has joined #baserock  07:11
*** Kinnison has joined #baserock  07:11
*** paulw has joined #baserock  07:11
*** zoli__ has quit IRC  07:29
*** zoli__ has joined #baserock  07:29
*** zoli__ has quit IRC  07:31
*** zoli__ has joined #baserock  07:31
*** zoli__ has quit IRC  07:33
*** tiagogomes_ has joined #baserock  07:47
*** ratmice___ has quit IRC  07:49
*** ratmice___ has joined #baserock  07:49
*** mariaderidder has joined #baserock  07:50
*** zoli__ has joined #baserock  07:53
*** bashrc_ has joined #baserock  08:02
*** franred has quit IRC  08:28
*** gary_perkins has joined #baserock  08:30
*** zoli__ has quit IRC  08:32
*** zoli__ has joined #baserock  08:38
<paulsherwood> hi folks... i'm wondering about atomicity in python....  08:38
*** jonathanmaw has joined #baserock  08:39
<paulsherwood> this being to avoid races when multiple things offer something into a cache  08:39
<Kinnison> Your cache protocol should offer some way to ensure that  08:39
<paulsherwood> iirc Kinnison suggested having each cache entry be a directory  08:39
<paulsherwood> this is pre-protocol, i'm afraid  08:40
<Kinnison> Or if you're talking about *being* the cache then you need atomicity to be guaranteed by your storage  08:40
<Kinnison> A good atomic operation on filesystem is mkdir()  08:40
<Kinnison> It'll succeed or fail  08:40
<Kinnison> and on NFS it's sync()d with the server AFAICT  08:40
<paulsherwood> ah, ok. i was being confused by os.rename, which happily re-writes over stuff  08:41
<Kinnison> rename is an atomic replacement operation (either you read the old or the new) but it doesn't tell you if someone else renamed first  08:41
<Kinnison> You can't rename a dir on top of another dir though  08:41
<paulsherwood> yup... so mkdir should be better  08:41
<Kinnison> so renaming a dir into place ought to be atomic  08:42
* Kinnison would need to check wrt. NFS but I imagine it's atomic there too  08:42
<paulsherwood> i think you can, Kinnison? from my experiments os.rename can rename one onto another, losing info  08:42
*** ssam2 has joined #baserock  08:42
*** ChanServ sets mode: +v ssam2  08:42
<rjek> Store cache items in a Maildir? :)  08:42
<paulsherwood> rjek: that's not a terrible idea. i was really hoping to find something that *does* this already  08:43
<paulsherwood> but so far my googling has failed  08:43
<rjek> Use IMAP as your cache protocol, and then use Dovecot! >:D  08:43
<paulsherwood> heh. at this point i need something simple, in python  08:44
<Kinnison> paulsherwood: You can't os.rename() one dir on top of another if both are non-empty  08:44
<Kinnison> paulsherwood: so rename() of a dir should be fine as your atomic "put the artifact in place"  08:44
<rjek> mkdir("cachekey"), open("cachekey/data")  08:44
<paulsherwood> Kinnison: ah, ok... i did the wrong experiment, then :) thanks!  08:45
<Kinnison> write-to tmp/blahblah  08:45
<Kinnison> rename("tmp", "cachekey")  08:45
<Kinnison> if someone else got to the rename() first, yours will fail with EEXIST  08:45
<rjek> Ah, yes, also that protects you against somebody /requesting/ the object before it has been fully written, as the rename is atomic  08:45
<Kinnison> also, your 'tmp' name should be created atomically  08:45
<Kinnison> which most tempfile implementations will do  08:45
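[Editor's note: the tempfile-plus-rename pattern discussed above could be sketched in Python roughly as follows. This is a minimal illustration; the function and layout names are invented, not ybd's actual code.]

```python
import os
import shutil
import tempfile

def commit_artifact(cache_dir, cache_key, populate):
    """Atomically publish an artifact directory as cache_dir/cache_key."""
    # mkdtemp() creates a uniquely-named directory atomically, so
    # concurrent writers never share a staging area.
    staging = tempfile.mkdtemp(dir=cache_dir)
    try:
        populate(staging)  # write the artifact's files into staging
        # Atomic commit: on Linux, renaming a directory onto an existing
        # non-empty directory fails, so only one racer can win.
        os.rename(staging, os.path.join(cache_dir, cache_key))
        return True
    except OSError:
        # Lost the race (or the write failed): discard our copy.
        shutil.rmtree(staging, ignore_errors=True)
        return False
```

[Readers either see no entry or a complete one, which is the property rjek points out: the artifact is never visible half-written.]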
*** franred has joined #baserock  08:48
*** mdunford has joined #baserock  08:52
*** lachlanmackenzie has joined #baserock  09:03
*** zoli__ has quit IRC  09:14
*** mdunford has quit IRC  09:46
*** zoli__ has joined #baserock  09:47
*** mdunford has joined #baserock  10:02
*** mdunford has quit IRC  10:07
* paulsherwood chuckles maniacally as 5 ybd instances in a sack fight to be the first to build gcc  10:11
*** rdale has joined #baserock  10:12
*** richard_maw has left #baserock  10:12
*** mdunford has joined #baserock  10:18
*** zoli__ has quit IRC  10:19
*** zoli__ has joined #baserock  10:22
*** franred has quit IRC  10:29
*** mdunford has quit IRC  10:31
*** zoli__ has quit IRC  10:40
*** zoli__ has joined #baserock  10:45
*** franred has joined #baserock  10:46
*** mdunford has joined #baserock  10:47
*** zoli__ has quit IRC  10:52
*** zoli__ has joined #baserock  11:15
*** zoli__ has quit IRC  11:29
*** zoli__ has joined #baserock  11:29
*** zoli__ has joined #baserock  11:30
*** zoli__ has quit IRC  11:33
*** franred has quit IRC  12:21
*** franred has joined #baserock  12:22
*** zoli__ has joined #baserock  12:47
*** zoli__ has quit IRC  13:01
*** zoli__ has joined #baserock  13:30
*** zoli__ has quit IRC  13:59
*** zoli__ has joined #baserock  14:03
*** zoli__ has quit IRC  14:08
*** mariaderidder has quit IRC  14:09
*** franred has quit IRC  14:10
*** zoli__ has joined #baserock  14:14
*** zoli__ has quit IRC  14:18
*** mariaderidder has joined #baserock  14:22
*** zoli__ has joined #baserock  14:46
*** zoli__ has quit IRC  14:50
*** zoli___ has joined #baserock  14:50
*** zoli___ has quit IRC  14:57
<paulsherwood> so, the ybd instances in the sack interfered with each other.  15:07
<SotK> what did they do?  15:07
<paulsherwood> any ideas why parallel runs of building stage1-gcc would trip each other up?  15:07
<paulsherwood> one succeeded, others failed  15:08
<SotK> can we see the logs?  15:08
<ssam2> there's no reason they should interfere  15:08
<ssam2> sounds like something is wrong in ybd  15:08
<paulsherwood> ssam2: you're almost certainly correct :)  15:09
<paulsherwood> ssam2: do you know if anyone has tried running multiple morphs simultaneously on one machine?  15:09
* paulsherwood needs to re-run, to get tidier logs  15:09
<SotK> paulsherwood: multiple morphs on one machine doesn't reliably work  15:10
<paulsherwood> SotK: oh? are there known problems? maybe that's relevant  15:11
<ssam2> paulsherwood: lots of times, using scripts/run-distbuild  15:11
<ssam2> I forget why they break :)  15:11
<paulsherwood> but they do break?  15:11
<ssam2> something to do with tmpdirs, I think  15:11
<SotK> paulsherwood: they tend to get tangled by renaming temporary directories that others are using IIRC  15:11
<paulsherwood> aha. i think ybd is past that issue, but i may be mistaken  15:12
* SotK has seen them work sometimes, if the race conditions pan out nicely  15:12
<ssam2> heh, it may even be the issue we were discussing the other day of constructing a commandline for linux-user-chroot that is wrong by the time it's actually processed  15:12
<paulsherwood> ssam2: i've settled for chroot in my testing at the moment  15:14
<ssam2> ok, i'm not sure what it could be then  15:15
<paulsherwood> that issue does rule out l-u-c for this use-case presently, i'm afraid  15:15
<paulsherwood> i'll post logs when i have them  15:15
*** mariaderidder has quit IRC  15:31
*** CTtpollard has quit IRC  15:57
<paulsherwood> is that enough, or do i need to paste the whole log from gcc?  16:14
<paulsherwood> too big for paste.baserock.org  16:19
<ssam2> the 'Killed' error from make looks suspicious  16:20
<ssam2> that suggests something actually terminated the process from outside, although I could be wrong  16:21
<ssam2> maybe it ran out of memory ?  16:21
<paulsherwood> ah, that's possible i suppose  16:21
<persia> Running out of memory seems the most likely.  gcc wants a fair bit, and running 5 gccs can be confusing, since the amount of memory requested is a function of HW discovery.  16:22
<ssam2> looking at the source to 'make', I don't think it would have said 'Killed' unless it received a SIGKILL  16:22
*** jonathanmaw has quit IRC  16:24
*** mdunford has quit IRC  16:38
<paulsherwood> i think you folks are right...  16:39
<paulsherwood> i'm considering running a whole herd of ybd on thunderx  16:39
<Kinnison> just because it's called 'thunderx' doesn't mean it's a good idea to thunder a herd on it  16:40
<paulsherwood> any suggestions for instances, and max-jobs per instance?  16:40
<bashrc_> 48 core!  16:40
<paulsherwood> bashrc_: yup. exactly. so how many should i run, and how many cores should each instance get?  16:41
<Kinnison> Typical build systems in the past which we've used have been around 4 cores and 4G of RAM  16:43
<Kinnison> how much RAM does this system have?  16:43
<Kinnison> if > 48 then numcores is the limiter  16:43
<Kinnison> Also, what's the IO like on the unit?  16:44
<persia> To test gcc, I'd start with 8 jobs, to leave headroom.  If that fails, we know it is something else.  16:47
<persia> If 8 works, then it becomes interesting to make it larger.  16:47
<bashrc_> it would be interesting to see how fast a linux system could be compiled on 48 cores, although maybe compute power isn't the major limitation  16:47
<persia> max-jobs per instance is probably sensibly numcores/numjobs  16:47
<paulsherwood> hmmm... if i fork a process, will its subprocess get the same output from random as the parent?  16:49
<Kinnison> depends on what the child does wrt. seeding  16:51
* paulsherwood has done no seeding  16:52
<persia> For most framework RNG implementations, the child will get different results  16:58
<persia> If you have your own implementation, you'll have implementation-dependent results.  16:58
<rjek> The C library's random(), children will get the same sequence  16:58
<paulsherwood> after adding random.seed(instance) i get
<paulsherwood> which is more like the randomness i was hoping for :)  17:00
*** gary_perkins has quit IRC  17:05
*** ssam2 has quit IRC  17:07
<paulsherwood> am i missing something even more fundamental here? on a linux system with many cores (eg thunderx), if i run a python program, and it forks 10 times, will each subprocess get a different core?  17:15
<flatmush> probably, but not guaranteed  17:18
<paulsherwood> ok... but it will broadly do the sensible thing?  17:18
<paulsherwood> ok... not sure that's what i'm seeing, though.  17:19
<persia> Be aware that you need to reserve some cores for the OS, and if your python program calls out to other things, you will have many more jobs than forks of your code, which are likely to be scheduled on even more cores.  17:19
<flatmush> I'm not really sure of the quirks of python, so I'm assuming its fork call is the same as C, which it most likely is.  17:19
<Kinnison> Also, obviously, if the python isn't doing anything then it won't consume any CPU  17:19
<Kinnison> and unless you explicitly bind it, it'll go wherever when it wants CPU time  17:20
<persia> Most environments default to trying to use the same core used before for the same job (in case something was cached), but this is not to be relied upon.  17:20
<persia> I *think* that recent linux schedules things in a hierarchical cache-optimised way by default, but I don't know if someone found better results not doing that and so it changed since I last looked.  17:22
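[Editor's note: on Linux, Python can at least inspect and pin the set of cores a process may run on; by default a forked child inherits the full affinity mask and the scheduler places it wherever it likes, as Kinnison says. A small illustration using the Linux-only `os.sched_getaffinity` API:]

```python
import os

# CPUs the current process is allowed to run on; by default this is
# every online core, and forked children inherit the same mask.
allowed = os.sched_getaffinity(0)
print("may run on", len(allowed), "cores")

# Explicit binding (the "unless you explicitly bind it" case) would be
# e.g. os.sched_setaffinity(0, {0, 1}) to restrict to cores 0 and 1.
```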
<paulsherwood> sorry it's such a mess, but somewhere in minion #3 has failed to build stage1-gcc for some reason. i wonder if 10 instances with max-jobs 10 is too much?  17:24
<persia> I would schedule fewer for 48 cores.  17:24
<persia> The usual rule of thumb is to run a number of jobs equivalent to one's simulthreading count plus one.  17:25
<persia> But that's for simulthreading ranges in the 2-20 range.  It might not scale to large numbers.  17:25
<persia> So, if you have 32 cores, and each handles 16 threads, you want to schedule 513 jobs by this metric.  17:25
<paulsherwood> yup. for some reason i thought this box had 96. anyway i'm happy to let the code and the kit suffer for a while  17:25
<persia> If you've 48 cores and each handles one thread, you want to schedule 49 jobs.  17:26
<persia> That should include all calls to gcc, shell, make, etc.  17:26
<persia> Since you're managing this with Python, I'd count max-jobs + 2 as the number of jobs per child.  17:26
<persia> And then you probably have a few overhead jobs for your system.  17:27
<flatmush> it's black magic picking the core count, it really depends how often you'll have threads stuck waiting on I/O  17:27
<persia> So maybe 5 children with max-jobs=8, or 8 children with max-jobs=5, to be sure of some headroom.  17:27
<flatmush> 5/4 is the ratio I often use for cores  17:27
<persia> flatmush: For cores, or for threads?  17:27
<flatmush> threads, but without SMT that's the same thing  17:28
<flatmush> with SMT it's even more black magic  17:28
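[Editor's note: the rules of thumb traded in this discussion (a total job budget of roughly cores * 1.5 to absorb IO stalls, divided across instances, minus a couple of supervisor processes per instance) might be condensed into the following back-of-envelope helper. The function and its defaults are illustrative, not a recommendation from the channel:]

```python
def plan_max_jobs(num_cores, num_instances, io_factor=1.5, overhead=2):
    # Total worthwhile parallel jobs: cores plus ~50% to cover time
    # jobs spend blocked on IO (the CPUcount*1.5 rule of thumb).
    total_jobs = int(num_cores * io_factor)
    # Share the budget across instances, charging each a couple of
    # supervisor processes (the "max-jobs + 2" accounting above).
    return max(1, total_jobs // num_instances - overhead)

# e.g. 48 cores split 10 ways gives a per-instance max-jobs of 5
```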
<paulsherwood> i could vary instances and max-jobs, re-run the whole thing lots of times and then plot elapsed time  17:28
<paulsherwood> and the winner is.... minion #9 9:15-07-03 18:28:37 [2/232/232] [stage1-gcc] Now cached as stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd  17:29
<paulsherwood> 5:15-07-03 18:28:55 [2/232/232] [stage1-gcc] Bah! I could have cached stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd  17:29
<paulsherwood> 9:15-07-03 18:28:57 [2/232/232] [stage1-gcc] Bah! I raced and rebuilt stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd  17:29
<paulsherwood> something's broken there... two number 9s :)  17:29
<persia> What you'll probably see is that increasing things decreases time (modulo all the parallelisation issues), until you hit some limit, at which point your scheduler will serialise your parallelism, and that will stay until some other point, where time will increase again as the rate of failure for lack of resources increases.  17:30
<paulsherwood> yup... which should give some broad heuristic for best overall throughput i hope  17:31
<persia> Except that SMP topographies, SMT implementation details, IO parallelism (including to main memory), NUMA, and the vagaries of the scheduler will result in limited confidence in results excepting for specific workloads against specific platforms.  17:32
*** petefoth has joined #baserock  17:32
<persia> (so it's only worth having a very rough heuristic vs. deep research.  5/4 sounds about right)  17:33
* Kinnison tends to use CPUcount*1.5  17:34
<Kinnison> to account for IO losses  17:34
<paulsherwood> for max-jobs?  17:34
<Kinnison> so if you have say 48 cores  17:34
<paulsherwood> ok, but in this situation?  17:34
<Kinnison> then I'd do 10 cattle of 6 jobs  17:35
<Kinnison> but if your IO is not great then maybe fewer cattle of more jobs  17:35
<Kinnison> since each cow in your herd will disproportionately consume IO  17:35
<paulsherwood> ok, so 10 instances, max-jobs 6 for each  17:35
<paulsherwood> i'll try it if/when the current herd crosses the prairie :)  17:36
<Kinnison> Did you ever answer my question about what kind of IO your system had?  17:36
<Kinnison> If you did it was lost in the burble above  17:36
<paulsherwood> i don't think so. not sure i know how to answer? it's a thunderx is pretty much all i know  17:37
<Kinnison> how many spindles, are they rust or solidstate, what is the storage layout (e.g. raid?)  17:37
<paulsherwood> are there some commands i could run to spit that out?  17:38
<Kinnison> well, ls /dev/sd?  17:38
<paulsherwood> it's in a cloud somewhere  17:38
<Kinnison> if it's a cloud based system then feck knows what your IO will be like  17:38
<Kinnison> run bonnie++ ?  17:38
<persia> For that matter, if you're running in a VM, your core/memory arrangements may be much smaller, and you might have SMT enabled in the virtual machine, even though the hardware doesn't use that by default.  17:40
<paulsherwood> if i run bonnie, will she interfere with my cattle?  17:40
<Kinnison> your cattle will interfere with a useful bonnie output  17:41
<paulsherwood> ok, i've killed the herd  17:42
<paulsherwood> how long does bonnie take?  17:42
<Kinnison> no idea  17:42
*** lachlanmackenzie has quit IRC  18:22
*** tiagogomes_ has quit IRC  18:45
*** dabukalam has quit IRC  21:26
*** dabukalam has joined #baserock  21:26
*** dabukalam has quit IRC  21:29
*** dabukalam has joined #baserock  21:30

Generated by irclog2html 2.14.0 by Marius Gedminas