*** franred has joined #baserock | 06:51 | |
*** bjdooks_ has joined #baserock | 07:10 | |
*** ratmice___ has quit IRC | 07:11 | |
*** ratmice___ has joined #baserock | 07:11 | |
*** bjdooks has quit IRC | 07:11 | |
*** paulw has quit IRC | 07:11 | |
*** Kinnison has quit IRC | 07:11 | |
*** Kinnison has joined #baserock | 07:11 | |
*** Kinnison has joined #baserock | 07:11 | |
*** paulw has joined #baserock | 07:11 | |
*** zoli__ has quit IRC | 07:29 | |
*** zoli__ has joined #baserock | 07:29 | |
*** zoli__ has quit IRC | 07:31 | |
*** zoli__ has joined #baserock | 07:31 | |
*** zoli__ has quit IRC | 07:33 | |
*** tiagogomes_ has joined #baserock | 07:47 | |
*** ratmice___ has quit IRC | 07:49 | |
*** ratmice___ has joined #baserock | 07:49 | |
*** mariaderidder has joined #baserock | 07:50 | |
*** zoli__ has joined #baserock | 07:53 | |
*** bashrc_ has joined #baserock | 08:02 | |
*** franred has quit IRC | 08:28 | |
*** gary_perkins has joined #baserock | 08:30 | |
*** zoli__ has quit IRC | 08:32 | |
*** zoli__ has joined #baserock | 08:38 | |
paulsherwood | hi folks... i'm wondering about atomicity in python.... | 08:38 |
*** jonathanmaw has joined #baserock | 08:39 | |
paulsherwood | this being to avoid races when multiple things offer something into a cache | 08:39 |
Kinnison | Your cache protocol should offer some way to ensure that | 08:39 |
paulsherwood | iirc Kinnison suggested have each cache entry be a directory | 08:39 |
paulsherwood | this is pre-protocol, i'm afraid | 08:40 |
Kinnison | Or if you're talking about *being* the cache then you need atomicity to be guaranteed by your storage | 08:40 |
Kinnison | A good atomic operation on filesystem is mkdir() | 08:40 |
Kinnison | It'll succeed or fail | 08:40 |
Kinnison | and on NFS it's sync()d with the server AFAICT | 08:40 |
paulsherwood | ah, ok. i was being confused by os.rename, which happily re-writes over stuff | 08:41 |
Kinnison | rename is an atomic replacement operation (either you read the old or the new) but it doesn't tell you if someone else renamed first | 08:41 |
Kinnison | You can't rename a dir on top of another dir though | 08:41 |
paulsherwood | yup... so mkdir should be better | 08:41 |
Kinnison | so renaming a dir into place ought to be atomic | 08:42 |
* Kinnison would need to check wrt. NFS but I imagine it's atomic there too | 08:42 | |
paulsherwood | i think you can, Kinnison? from my experiments os.rename can rename one onto another, losing info | 08:42 |
*** ssam2 has joined #baserock | 08:42 | |
*** ChanServ sets mode: +v ssam2 | 08:42 | |
rjek | Store cache items in a Maildir? :) | 08:42 |
paulsherwood | rjek: that's not a terrible idea. i was really hoping to find something that *does* this already | 08:43 |
paulsherwood | but so far my googling has failed | 08:43 |
rjek | Use IMAP as your cache protocol, and then use Dovecot! >:D | 08:43 |
paulsherwood | heh. at this point i need something simple, in python | 08:44 |
Kinnison | paulsherwood: You can't os.rename() one dir on top of another if both are non-empty | 08:44 |
Kinnison | paulsherwood: so rename() of a dir should be fine as your atomic "put the artifact in place" | 08:44 |
rjek | mkdir("cachekey"), open("cachekey/data") | 08:44 |
Kinnison | naah | 08:45 |
Kinnison | mkdir("tmp") | 08:45 |
paulsherwood | Kinnison: ah, ok... i did the wrong experiment, then :) thanks! | 08:45 |
Kinnison | write-to tmp/blahblah | 08:45 |
Kinnison | rename("tmp", "cachekey") | 08:45 |
rjek | hmm | 08:45 |
Kinnison | if someone else got to the rename() first, yours will fail with EEXIST | 08:45 |
rjek | Ah, yes, also that protects you against somebody /requesting/ the object before it has been fully written, as the rename is atomic | 08:45 |
Kinnison | also, your 'tmp' name should be created atomically | 08:45 |
Kinnison | which most tempfile implementations will do | 08:45 |
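
A minimal Python sketch of the pattern described above: stage the artifact in a uniquely named temporary directory, then os.rename() it onto the cache-key directory so the first writer wins. The function and argument names (put_artifact, cache_root, cache_key) are illustrative, not part of ybd or morph; note that on Linux the losing rename of a directory typically fails with ENOTEMPTY rather than EEXIST.

    import errno
    import os
    import shutil
    import tempfile

    def put_artifact(cache_root, cache_key, filename, data):
        """Atomically publish an artifact as cache_root/cache_key/filename."""
        # mkdtemp() creates the staging directory atomically with a unique name,
        # on the same filesystem as the final location so rename() stays atomic.
        tmpdir = tempfile.mkdtemp(dir=cache_root)
        try:
            with open(os.path.join(tmpdir, filename), 'w') as f:
                f.write(data)
            # Atomic publish: fails if another writer already created a
            # non-empty cache_key directory.
            os.rename(tmpdir, os.path.join(cache_root, cache_key))
            return True
        except OSError as e:
            if e.errno in (errno.EEXIST, errno.ENOTEMPTY):
                # Lost the race: someone else cached this artifact first.
                shutil.rmtree(tmpdir, ignore_errors=True)
                return False
            raise
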
*** franred has joined #baserock | 08:48 | |
*** mdunford has joined #baserock | 08:52 | |
*** lachlanmackenzie has joined #baserock | 09:03 | |
*** zoli__ has quit IRC | 09:14 | |
*** mdunford has quit IRC | 09:46 | |
*** zoli__ has joined #baserock | 09:47 | |
*** mdunford has joined #baserock | 10:02 | |
*** mdunford has quit IRC | 10:07 | |
* paulsherwood chuckles maniacally as 5 ybd instances in a sack fight to be the first to build gcc | 10:11 | |
*** rdale has joined #baserock | 10:12 | |
*** richard_maw has left #baserock | 10:12 | |
*** mdunford has joined #baserock | 10:18 | |
*** zoli__ has quit IRC | 10:19 | |
*** zoli__ has joined #baserock | 10:22 | |
*** franred has quit IRC | 10:29 | |
*** mdunford has quit IRC | 10:31 | |
*** zoli__ has quit IRC | 10:40 | |
*** zoli__ has joined #baserock | 10:45 | |
*** franred has joined #baserock | 10:46 | |
*** mdunford has joined #baserock | 10:47 | |
*** zoli__ has quit IRC | 10:52 | |
*** zoli__ has joined #baserock | 11:15 | |
*** zoli__ has quit IRC | 11:29 | |
*** zoli__ has joined #baserock | 11:29 | |
*** zoli__ has joined #baserock | 11:30 | |
*** zoli__ has quit IRC | 11:33 | |
*** franred has quit IRC | 12:21 | |
*** franred has joined #baserock | 12:22 | |
*** zoli__ has joined #baserock | 12:47 | |
*** zoli__ has quit IRC | 13:01 | |
*** zoli__ has joined #baserock | 13:30 | |
*** zoli__ has quit IRC | 13:59 | |
*** zoli__ has joined #baserock | 14:03 | |
*** zoli__ has quit IRC | 14:08 | |
*** mariaderidder has quit IRC | 14:09 | |
*** franred has quit IRC | 14:10 | |
*** zoli__ has joined #baserock | 14:14 | |
*** zoli__ has quit IRC | 14:18 | |
*** mariaderidder has joined #baserock | 14:22 | |
*** zoli__ has joined #baserock | 14:46 | |
*** zoli__ has quit IRC | 14:50 | |
*** zoli___ has joined #baserock | 14:50 | |
*** zoli___ has quit IRC | 14:57 | |
paulsherwood | so, the ybd instances in the sack interfered with each other. | 15:07 |
pedroalvarez | :) | 15:07 |
SotK | what did they do? | 15:07 |
paulsherwood | any ideas why parallel runs of building stage1-gcc would trip each other up? | 15:07 |
paulsherwood | one succeeded, others failed | 15:08 |
SotK | can we see the logs? | 15:08 |
ssam2 | there's no reason they should interfere | 15:08 |
ssam2 | sounds like something is wrong in ybd | 15:08 |
paulsherwood | ssam2: you're almost certainly correct :) | 15:09 |
paulsherwood | ssam2: do you know if anyone has tried running multiple morphs simultaneously on one machine? | 15:09 |
* paulsherwood needs to re-run, to get tidier logs | 15:09 | |
SotK | paulsherwood: multiple morphs on one machine doesn't reliably work | 15:10 |
paulsherwood | SotK: oh? are there known problems? maybe that's relevant | 15:11 |
ssam2 | paulsherwood: lots of times, using scripts/run-distbuild | 15:11 |
ssam2 | I forget why they break :) | 15:11 |
paulsherwood | but they do break? | 15:11 |
ssam2 | something to do with tmpdirs, I think | 15:11 |
SotK | paulsherwood: they tend to get tangled by renaming temporary directories that others are using IIRC | 15:11 |
paulsherwood | aha. i think ybd is past that issue, but i may be mistaken | 15:12 |
* SotK has seen them work sometimes, if the race conditions pan out nicely | 15:12 | |
ssam2 | heh, it may even be the issue we were discussing the other day of constructing a commandline for linux-user-chroot that is wrong by the time it's actually processed | 15:12 |
paulsherwood | ssam2: i've settled for chroot in my testing at the moment | 15:14 |
ssam2 | ok, i'm not sure what it could be then | 15:15 |
paulsherwood | that issue does rule out l-u-c for this use-case presently, i'm afraid | 15:15 |
paulsherwood | i'll post logs when i have them | 15:15 |
*** mariaderidder has quit IRC | 15:31 | |
*** CTtpollard has quit IRC | 15:57 | |
paulsherwood | http://paste.baserock.org/yeriporiba | 16:13 |
paulsherwood | is that enough, or do i need to paste the whole log from gcc? | 16:14 |
paulsherwood | too big for paste.baserock.org | 16:19 |
ssam2 | the 'Killed' error from make looks suspicious | 16:20 |
ssam2 | that suggests something actually terminated the process from outside, although I could be wrong | 16:21 |
ssam2 | maybe it ran out of memory ? | 16:21 |
paulsherwood | ah, that's possible i suppose | 16:21 |
persia | Running out of memory seems the most likely. gcc wants a fair bit, and running 5 gccs can be confusing, since the amount of memory requested is a function of HW discovery. | 16:22 |
ssam2 | looking at the source to 'make', I don't think it would have said 'Killed' unless it received a SIGKILL | 16:22 |
*** jonathanmaw has quit IRC | 16:24 | |
*** mdunford has quit IRC | 16:38 | |
paulsherwood | i think you folks are right... | 16:39 |
paulsherwood | i'm considering running a whole herd of ybd on thunderx | 16:39 |
Kinnison | just because it's called 'thunderx' doesn't mean it's a good idea to thunder a herd on it | 16:40 |
paulsherwood | any suggestions for instances, and max-jobs per instance? | 16:40 |
bashrc_ | 48 core! | 16:40 |
paulsherwood | bashrc_: yup. exactly. so how many should i run, and how many cores should each instance get? | 16:41 |
Kinnison | Typical build systems in the past which we've used have been around 4 cores and 4G of RAM | 16:43 |
Kinnison | how much RAM does this system have? | 16:43 |
Kinnison | if > 48 then numcores is the limiter | 16:43 |
Kinnison | Also, what's the IO like on the unit? | 16:44 |
persia | To test gcc, I'd start with 8 jobs, to leave headroom. If that fails, we know it is something else. | 16:47 |
persia | If 8 works, then it becomes interesting to make it larger. | 16:47 |
bashrc_ | it would be interesting to see how fast a linux system could be compiled on 48 cores, although maybe compute power isn't the major limitation | 16:47 |
persia | max-jobs per instance is probably sensibly numcores/numjobs | 16:47 |
paulsherwood | hmmm... i if i fork a process, will its subprocess get the same output from random as the parent? | 16:49 |
Kinnison | depends on what the child does wrt. seeding | 16:51 |
paulsherwood | http://paste.baserock.org/foguvaqoza | 16:51 |
* paulsherwood has done no seeding | 16:52 | |
persia | For most framework RNG implementations, the child will get different results | 16:58 |
persia | If you have your own implementation, you'll have implementation-dependent results. | 16:58 |
rjek | With the C library's random(), children will get the same sequence | 16:58 |
paulsherwood | after adding random.seed(instance) i get http://paste.baserock.org/pazujeyiha | 17:00 |
paulsherwood | which is more like the randomness i was hoping for :) | 17:00 |
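
A small sketch of the behaviour being discussed, assuming CPython's random module: os.fork() duplicates the parent's generator state, so every child draws the same numbers unless it re-seeds. The three-instance loop and the seed choice are purely illustrative.

    import os
    import random

    for instance in range(3):
        pid = os.fork()
        if pid == 0:
            # Child: without this re-seed, all children inherit the parent's
            # Mersenne Twister state and print identical numbers.
            random.seed(instance)   # or random.seed(os.getpid())
            print(instance, random.randint(0, 99))
            os._exit(0)

    # Parent: reap the children.
    for _ in range(3):
        os.wait()
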
*** gary_perkins has quit IRC | 17:05 | |
*** ssam2 has quit IRC | 17:07 | |
paulsherwood | am i missing something even more fundamental here? on a linux system with many cores (eg thunderx), if i run a python program, and it forks 10 times, will each subprocess get a different core? | 17:15 |
flatmush | probably, but not guaranteed | 17:18 |
paulsherwood | ok... but it will broadly do the sensible thing? | 17:18 |
flatmush | yup | 17:19 |
paulsherwood | ok... not sure that's what i'm seeing, though. | 17:19 |
persia | Be aware that you need to reserve some cores for the OS, and if your python program calls out to other things, you will have many more jobs than forks of your code, which are likely to be scheduled on even more cores. | 17:19 |
flatmush | I'm not really sure of the quirks of python, so I'm assuming its fork call is the same as C, which it most likely is. | 17:19 |
Kinnison | Also, obviously, if the python isn't doing anything then it won't consume any CPU | 17:19 |
Kinnison | and unless you explicitly bind it, it'll go wherever when it wants CPU time | 17:20 |
persia | Most environments default to trying to use the same core used before for the same job (in case something was cached), but this is not to be relied upon. | 17:20 |
persia | I *think* that recent linux schedules things in a hierarchical cache-optimised way by default, but I don't know if someone found better results not doing that and so it changed since I last looked. | 17:22 |
paulsherwood | sorry it's such a mess, but somewhere in http://paste.baserock.org/akowaloduc minion #3 has failed to build stage1-gcc for some reason. i wonder if 10 instances with max-jobs 10 is too much? | 17:24 |
persia | I would schedule fewer for 48 cores. | 17:24 |
persia | The usual rule of thumb is to run a number of jobs equivalent to one's simulthreading count plus one. | 17:25 |
persia | But that's for simulthreading ranges in the 2-20 range. It might not scale to large numbers. | 17:25 |
persia | So, if you have 32 cores, and each handles 16 threads, you want to schedule 513 jobs by this metric. | 17:25 |
paulsherwood | yup. for some reason i thought this box had 96. anyway i'm happy to let the code and the kit suffer for a while | 17:25 |
persia | If you've 48 cores and each handles one thread, you want to schedule 49 jobs. | 17:26 |
persia | That should include all calls to gcc, shell, make, etc. | 17:26 |
paulsherwood | tvm | 17:26 |
persia | Since you're managing this with Python, I'd count max-jobs + 2 as the number of jobs per child. | 17:26 |
persia | And then you probably have a few overhead jobs for your system. | 17:27 |
paulsherwood | ok | 17:27 |
flatmush | it's black magic picking the core count, it really depends how often you'll have threads stuck waiting on I/O | 17:27 |
persia | So maybe 5 children with max-jobs=8, or 8 children with max-jobs=5, to be sure of some headroom. | 17:27 |
flatmush | 5/4 is the ratio I often use for cores | 17:27 |
persia | flatmush: For cores, or for threads? | 17:27 |
flatmush | threads, but without SMT that's the same thing | 17:28 |
flatmush | with SMT it's even more black magic | 17:28 |
paulsherwood | i could vary instances and max-jobs, re-run the whole thing lots of times and then plot elapsed time | 17:28 |
paulsherwood | and the winner is.... minion #9 9:15-07-03 18:28:37 [2/232/232] [stage1-gcc] Now cached as stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd | 17:29 |
paulsherwood | 5:15-07-03 18:28:55 [2/232/232] [stage1-gcc] Bah! I could have cached stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd | 17:29 |
paulsherwood | 9:15-07-03 18:28:57 [2/232/232] [stage1-gcc] Bah! I raced and rebuilt stage1-gcc.da0a2535b0f62a86abdd1412c3110d21d0cf0760a3b695fe5e53079a7667d8bd | 17:29 |
paulsherwood | something's broken there... two number 9s :) | 17:29 |
persia | What you'll probably see is that increasing things decreases time (modulo all the parallelisation issues), until you hit some limit, at which point your scheduler will serialise your parallelism, and that will stay until some other point, where time will increase again as the rate of failure for lack of resources increases. | 17:30 |
paulsherwood | yup... which should give some broad heuristic for best overall throughput i hope | 17:31 |
persia | Except that SMP topologies, SMT implementation details, IO parallelism (including to main memory), NUMA, and the vagaries of the scheduler will result in limited confidence in results except for specific workloads against specific platforms. | 17:32 |
*** petefoth has joined #baserock | 17:32 | |
persia | (so it's only worth having a very rough heuristic vs. deep research. 5/4 sounds about right) | 17:33 |
paulsherwood | 5/4? | 17:34 |
* Kinnison tends to use CPUcount*1.5 | 17:34 | |
Kinnison | to account for IO losses | 17:34 |
paulsherwood | for max-jobs? | 17:34 |
Kinnison | yes | 17:34 |
Kinnison | so if you have, say, 48 cores | 17:34 |
paulsherwood | ok, but in this situation? | 17:34 |
Kinnison | then I'd do 10 cattle of 6 jobs | 17:35 |
Kinnison | but if your IO is not great then maybe fewer cattle of more jobs | 17:35 |
Kinnison | since each cow in your herd will disproportionately consume IO | 17:35 |
paulsherwood | ok, so 10 instances, max-jobs 6 for each | 17:35 |
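
As a rough sketch of the sizing rule above (total jobs of about 1.5x the core count, split across the parallel instances); the variable names are illustrative, not real ybd settings.

    import multiprocessing

    cores = multiprocessing.cpu_count()          # e.g. 48 on the ThunderX box
    total_jobs = int(cores * 1.5)                # ~72, to soak up IO stalls
    instances = 10                               # how many ybd "cattle" to run
    max_jobs = max(1, total_jobs // instances)   # ~7; the log rounds down to 6

    print("run %d instances with max-jobs=%d each" % (instances, max_jobs))
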
paulsherwood | i'll try it if/when the current herd crosses the prairie :) | 17:36 |
Kinnison | Did you ever answer my question about what kind of IO your system had? | 17:36 |
Kinnison | If you did, it was lost in the burble above | 17:36 |
paulsherwood | i don't think so. not sure i know how to answer? it's a thunderx is pretty much all i know | 17:37 |
Kinnison | how many spindles, are they rust or solidstate, what is the storage layout ( e.g. raid? ) | 17:37 |
paulsherwood | are there some commands i could run to spit that out? | 17:38 |
Kinnison | well ls /dev/sd? | 17:38 |
paulsherwood | it's in a cloud somewhere | 17:38 |
Kinnison | Oh | 17:38 |
Kinnison | if it's a cloud based system then feck knows what your IO will be like | 17:38 |
Kinnison | run bonnie++ ? | 17:38 |
persia | For that matter, if you're running in a VM, your core/memory arrangements may be much smaller, and you might have SMT enabled in the virtual machine, even though the hardware doesn't use that by default. | 17:40 |
paulsherwood | if i run bonnie, will she interfere with my cattle? | 17:40 |
Kinnison | your cattle will interfere with a useful bonnie output | 17:41 |
paulsherwood | ok, i've killed the herd | 17:42 |
paulsherwood | how long does bonnie take? | 17:42 |
Kinnison | no idea | 17:42 |
*** lachlanmackenzie has quit IRC | 18:22 | |
*** tiagogomes_ has quit IRC | 18:45 | |
*** dabukalam has quit IRC | 21:26 | |
*** dabukalam has joined #baserock | 21:26 | |
*** dabukalam has quit IRC | 21:29 | |
*** dabukalam has joined #baserock | 21:30 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!