IRC logs for #baserock for Friday, 2014-10-10

*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]		00:00
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		00:06
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]		00:10
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		00:33
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]		00:49
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		01:36
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]		01:39
*** mSher [~mike@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		06:09
*** franred [~franred@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		07:27
*** dutch [~william@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		08:07
rjek	paulsherwood: It's nice dipped in hummous.	08:18
*** tiagogomes [~tiagogome@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		08:19
Kinnison	I like celery and marmite	08:25
Kinnison	and I hate marmite	08:25
*** jonathanmaw [~jonathanm@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		08:29
pedroalvarez	log of a build taking 10 minutes to resolve artifacts: http://fpaste.org/140722/41287973/	08:44
pedroalvarez	was sent yesterday by Paul	08:44
pedroalvarez	also, I don't know where, but the point was raised that we could move morph out of tools	08:46
pedroalvarez	makes sense to me	08:47
Kinnison	We'd have to move the stuff morph depends on too	08:48
Kinnison	e.g. linux-user-chroot	08:48
Kinnison	Let's assume he's building a devel system: that's roughly 200 items. If we assume he only has a trivial number of those cached locally, then resolution will consist of between 2 and 4 round-trips per item; let's average that at 3 and say 600 RTTs are needed. Assume each query consumes negligible time at each end and that we're at 1 second per query.	08:50
Kinnison	An idealised HTTP session consists of SYN, SYN+ACK, ACK+PSH, ACK+PSH+FIN, ACK+FIN, ACK	08:50
Kinnison	which is 6 packets	08:50
Kinnison	If Paul is 150ms away from his trove, that's a believable time period for resolving things	08:51
Kinnison	Welcome to internets	08:51
Kinnison	Also, in practice, we won't have idealised packet streams which means there's probably between 4 and 6 more packets in that sequence, which would also push out the time	08:54
Kinnison	So, to say something more constructive -- we need to find a way to reduce the number of queries sent to the trove, either by batching things, or by having sufficient cached data locally to not need to	08:56
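As a rough check of the estimate above, the arithmetic can be sketched as follows. The figures (about 200 items, 3 queries per item on average, 6 packets per idealised exchange, 150 ms to the trove) come from the discussion; treating the six packets as strictly sequential 150 ms legs is a simplifying assumption, not a measurement.

```python
# Back-of-envelope estimate of resolution time over a high-latency link.
# The figures come from the discussion; the packet model is a simplification.

ITEMS = 200              # rough size of a devel system
QUERIES_PER_ITEM = 3     # average of the 2 to 4 round-trips per item
PACKETS_PER_QUERY = 6    # idealised HTTP exchange
LATENCY_S = 0.150        # assumed time for each packet to cross the link

queries = ITEMS * QUERIES_PER_ITEM
seconds_per_query = PACKETS_PER_QUERY * LATENCY_S
total_seconds = queries * seconds_per_query

print(f"{queries} queries at about {seconds_per_query:.1f} s each "
      f"is roughly {total_seconds / 60:.0f} minutes")
```

Rounding each query up to a full second gives the 10 minutes mentioned in the discussion.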
*** ssam2 [~ssam2@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		09:03
ratmice_______	makes me think of something like cap'n'proto promise pipelining	09:05
Kinnison	Yeah, I'm not sure our central resolution visitor would support that short-term.	09:05
Kinnison	Makes me wish for promises / lazy evaluation in python	09:05
Kinnison	I wonder if we could use python futures for it?	09:06
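A minimal sketch of the futures idea, assuming Python 3's `concurrent.futures` (Python 2 would need a backport). `resolve_ref` is a hypothetical stand-in for a single query to the trove, not a morph function; the point is only that overlapping the queries hides some of the per-query latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def resolve_ref(name):
    """Hypothetical stand-in for one network query to a remote server."""
    time.sleep(0.01)  # simulated round-trip latency
    return name, "sha1-of-" + name


def resolve_all(names, max_workers=8):
    """Issue the queries concurrently so that their latencies overlap."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and yields (name, sha1) pairs.
        return dict(pool.map(resolve_ref, names))


resolved = resolve_all(["glibc", "gcc", "busybox"])
```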
straycat	I'm not convinced that's the problem.	09:07
Kinnison	I'm all for it being something easier to fix -- what is your theory?	09:08
straycat	We're not querying trove when we resolve artifacts, I think we do that when we construct the source pool.	09:08
Kinnison	Oh	09:08
Kinnison	Code has changed or is not how I understood it then	09:11
straycat	I also might have misunderstood, but I don't see any querying in the artifact resolver, as far as I was aware it takes a source pool and constructs a build graph from it.	09:12
Kinnison	If that's true then it should take milliseconds, not minutes	09:12
straycat	We update caches and resolve refs when we construct the source pool.	09:13
* Kinnison wonders what could cause that graph build to take forever then		09:13
straycat	Kinnison, Unless some recent changes to definitions have exposed some bug in the artifact construction code, there's also the "WARNING No _validate_cross_refs_for_chunk" warnings in the log.	09:13
Kinnison	I think that's harmless	09:14
Kinnison	Looking at the core resolution code, I think we keep resolving things over and over	09:16
Kinnison	We seem to make no effort to memoise the result of resolving something	09:17
richard_maw	paulsherwood: context managers are what you use with the with statement; pedroalvarez: we have already split out a morph stratum, but it looks like I forgot to move linux-user-chroot out	09:18
pedroalvarez	richard_maw: great, I thought I had a dream about this. Good to know that it was real	09:19
ssam2	Kinnison: if you're talking about resolving refs to SHA1s and retrieving morphologies, the results are indeed saved in dicts	09:19
ssam2	http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/app.py#n355	09:19
Kinnison	ssam2: I'm not	09:20
richard_maw	ssam2: that bit happens before the big slow-down	09:20
Kinnison	If someone adds a line to the top of _resolve_system_dependencies and _resolve_stratum_dependencies logging the names of the sources being resolved, I bet you will see sources being resolved over and over	09:21
Kinnison	(artifactresolver.py)	09:21
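The logging experiment suggested above could look something like this sketch. The decorator and the `resolve_dependencies` function are hypothetical placeholders, not the contents of artifactresolver.py; the idea is simply to count calls per source name so that repeats stand out.

```python
import collections
import functools
import logging

resolve_counts = collections.Counter()


def log_resolutions(func):
    """Log and count each call by the name of the source being resolved."""
    @functools.wraps(func)
    def wrapper(source_name, *args, **kwargs):
        resolve_counts[source_name] += 1
        logging.debug("resolving %s (call %d)",
                      source_name, resolve_counts[source_name])
        return func(source_name, *args, **kwargs)
    return wrapper


@log_resolutions
def resolve_dependencies(source_name):
    """Hypothetical placeholder for the real resolution work."""
    return []


for name in ["build-essential", "core", "build-essential"]:
    resolve_dependencies(name)

# Any source resolved more than once shows up here.
repeated = {n: c for n, c in resolve_counts.items() if c > 1}
```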
ssam2	oh, interesting	09:21
ssam2	who's been profiling? well done whoever it was!	09:22
Kinnison	Paul noticed the huge delay	09:22
Kinnison	I foolishly assumed it was the network stage and explained how it could fit the 10min he was seeing	09:23
Kinnison	and then straycat pointed out I was bonkers :-)	09:23
ssam2	right. I've always assumed it was the network stage too	09:23
ssam2	is someone looking at improving this ?	09:23
* Kinnison thinks it'd be a better message if it was "Analysing build graph" rather than "Resolving artifacts"		09:23
ssam2	yes, we overload the term 'resolving' a bit :)	09:23
richard_maw	Kinnison: I think the problem may be that _create_initial_queue starts the queue with every non-chunk, then every system and stratum adds their dependencies to the queue again	09:26
Kinnison	Like I said, I think we're repeatedly scanning things	09:27
Kinnison	richard_maw: even if the initial queue contained only the system sources, I bet we 'resolve' build-essential a gajillion times by the end	09:27
* richard_maw is going to try just putting every source in the iterable to start with, and not queue everything later		09:27
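To illustrate why re-queueing dependencies can blow up, here is a toy comparison. The dependency table and both functions are hypothetical and much simpler than morph's resolver; they only count how many times a source would be visited under each strategy.

```python
from collections import deque

# Hypothetical dependency graph: each source lists the sources it depends on.
DEPS = {
    "system": ["devel", "core", "build-essential"],
    "devel": ["core", "build-essential"],
    "core": ["build-essential"],
    "build-essential": [],
}


def visits_with_requeueing(deps):
    """Seed the queue with every source, then re-queue each one's dependencies."""
    queue = deque(deps)
    visits = 0
    while queue:
        source = queue.popleft()
        visits += 1
        queue.extend(deps[source])  # dependencies get processed again
    return visits


def visits_single_pass(deps):
    """Visit every source exactly once; nothing is queued later."""
    return sum(1 for _ in deps)


print(visits_with_requeueing(DEPS), "visits with re-queueing versus",
      visits_single_pass(DEPS), "in a single pass")
```

With a shared base such as build-essential near the bottom of a few hundred sources, the re-queueing count grows much faster than the number of sources.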
straycat	What was paulsherwood trying to build?	09:32
straycat	Oh he sent an email about this didn't he.	09:33
straycat	Ahh that bug may have been a different issue and doesn't include the system being built in any case.	09:35
straycat	Even though we're not memoising stuff, it's hard to see that it would take minutes.	09:37
Kinnison	straycat: You'd think that, but then cache key computation should have been fast, but was taking minutes until I added memoisation	09:37
Kinnison	Python can be distressingly dynamic sometimes which makes things which look efficient take a long time	09:38
* richard_maw wishes for python3 and its memoisation decorators		09:38
Kinnison	:-)	09:38
* straycat nods		09:38
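For reference, the Python 3 memoisation decorator being wished for is `functools.lru_cache`. The sketch below applies it to a hypothetical dependency table (not morph's data structures) and records which names are actually computed rather than served from the cache.

```python
import functools

# Hypothetical dependency table, for illustration only.
DEPS = {
    "system": ("devel", "core"),
    "devel": ("core", "build-essential"),
    "core": ("build-essential",),
    "build-essential": (),
}

calls = []


@functools.lru_cache(maxsize=None)
def transitive_deps(name):
    """Return every source that `name` depends on, directly or indirectly."""
    calls.append(name)  # records real computations, not cache hits
    result = set(DEPS[name])
    for dep in DEPS[name]:
        result |= transitive_deps(dep)
    return frozenset(result)


all_deps = transitive_deps("system")
```

Each name is computed once, however many other sources depend on it.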
franred	pedroalvarez, http://fpaste.org/140876/12934090/ <-- is this enough for a +1 to move to python 2.7.8?	09:41
pedroalvarez	franred: yeah, I'll communicate my decision on gerrit	09:42
franred	thanks :)	09:42
Kinnison	Hrf, gerrit signed me out	09:44
straycat	Ahh okay, when I tried to build the genivi devel system, resolving artifacts took around 30 seconds, not 10 minutes, but longer than it should I guess.	09:45
*** Krin [~mikesmith@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock		09:57
* ssam2 discovers the keyboard shortcuts in Gerrit, begins to feel positive about it		10:04
Kinnison	straycat: it's calculating a few thousand interlinks between artifacts, if it takes more than a second or so on a mediocre system it's taking too long	10:05
Kinnison	straycat: I think it depends what order things turn up in the queue as to how long it takes	10:05
Kinnison	which is also annoying	10:05
Kinnison	Yay for non-deterministic stuff	10:06
franred	ssam2, you may need to vote once again on my patch; it looks like making modifications to it (rebasing and trying to remove the gerrit ID from the commit message) revokes your +1	10:06
franred	pedroalvarez, ^^	10:06
Kinnison	franred: You can't remove that ID	10:06
Kinnison	franred: It's how gerrit tracks the patch	10:06
franred	Kinnison, I know, but I was trying to remove it to push to our master...	10:06
franred	my fault...	10:07
Kinnison	I don't know that you shoudl bother -- once we hav a bot handling things, it won't try and remove the ID	10:07
Kinnison	Also, please teach me how to type	10:07
pedroalvarez	franred: I still see the +1	10:07
franred	pedroalvarez, yes, but it should have a +2	10:07
pedroalvarez	then it's not because the +1 has been revoked	10:08
franred	Kinnison, I acn't...	10:08
Kinnison	:-S	10:08
pedroalvarez	the proof: "This has been merge! "	10:08
ssam2	so we shall have 'Change-Id' in every single commit message in definitions from now on?	10:11
Kinnison	To a greater or lesser extent, yes	10:12
Kinnison	Unless we write a rebase rule to remove it during zuul's merge action	10:12
Kinnison	Consider https://git.openstack.org/cgit/openstack/horizon/commit/?id=c98a2eb2813331025764f52986a404053cdf6dd7	10:13
jjardon	Hi, what was the command to allow me to compile offline?	10:15
Kinnison	offline?	10:15
jjardon	ni internet connection	10:15
Kinnison	If you have all the relevant git repos locally, try --no-git-update	10:15
jjardon	s/ni/no	10:15
pedroalvarez	I'd say --no-git-update	10:15
jjardon	Thanks!	10:15
Kinnison	But that won't work if you're missing anything	10:15
Kinnison	anything	10:15
Kinnison	Do we still have a way to cache all the gits for a system?	10:16
Kinnison	I guess you could issue a build, overriding the cache server urls so they fail	10:17
Kinnison	that'd cause it to cache the gits no?	10:17
ssam2	yes, but also to build everything	10:17
Kinnison	Not if his local artifact cache already has the chunks etc	10:18
ssam2	in that case, yeah	10:18
Kinnison	It'll cache the gits in order to build the graph, and then discover it already has the binary artifacts	10:18
ssam2	jjardon: I could show you how to do that if you're in Manchester	10:19
pedroalvarez	meh, mosh doesn't support ssh agent forwarding yet...	10:32
pedroalvarez	after 2 years in their roadmap	10:32
Kinnison	It's hard to do	10:33
Kinnison	pedroalvarez: I'm sure they'd welcome patches	10:33
* Kinnison wants them to support multipath with ipv4 and ipv6 in it		10:33
Kinnison	so I can roam between networks with different connectivity types	10:33
pedroalvarez	Kinnison: I think that also is on his Ideas list	10:35
Kinnison	Aye	10:36
pedroalvarez	oh, mosh has been written by the massachusetts institute of technology?	10:36
straycat	massachoosits	10:47
*** Krin [~mikesmith@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Remote host closed the connection]		11:52
*** franred [~franred@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]		11:56
paulsherwood	can anyone tell me what's the state/plan on gerrit?	12:10
pedroalvarez	state: it's just a gerrit instance which is (kind of) mirroring the g.b.o sources of the baserock projects, so we can send patch reviews using it	12:11
pedroalvarez	I've been trying to add zuul and a gearman worker so gerrit can do some tests, but I need more time to get that working	12:12
pedroalvarez	it's not running in baserock	12:12
pedroalvarez	and my objective was: get infrastructure working so we can start using gerrit, and check the workflow with it	12:13
pedroalvarez	the plan for the future: gerrit integrated in trove	12:13
* Kinnison wonders if a clearer plan for the future might be Trove as a swarm		12:26
Kinnison	one Gerrit system (with the caches and resolvers and perhaps lorry controller), N lorry worker systems, N worker systems for things like zuul	12:27
Kinnison	And then Mason as a swarm of systems mirrors and enhances the shape	12:27
* paulsherwood would like to see less complexity :)		12:30
Kinnison	I guess it depends on what you consider complexity	12:31
Kinnison	we have a lot of services	12:31
Kinnison	we can either have several simple but cooperating systems	12:31
Kinnison	or one ubersystem	12:31
Kinnison	One ubersystem is probably easier for a single human to grok	12:31
Kinnison	a swarm is more horizontally scalable	12:31
straycat	paulsherwood, What system were you building when you had to wait for 10 minutes for artifacts to resolve?	12:36
richard_maw	on a related note: removing the queueing from ArtifactResolver didn't break tests	12:38
richard_maw	http://pastebin.com/0uaa04fA	12:38
pedroalvarez	but does that break morph?	12:38
richard_maw	I ran a build and it got as far as noticing that I'd already built that	12:39
richard_maw	so resolving still produced the same result	12:39
pedroalvarez	richard_maw: then +1	12:40
Kinnison	SotK: I think you posted to baserock-dev with a badly set up client	12:40
Kinnison	SotK: care to fix your client and resend?	12:40
richard_maw	pedroalvarez: any chance I can persuade you to adopt the patch? I'm neck-deep in nfs and vlans at the moment.	12:40
paulsherwood	straycat: something like this i think... http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git/tree/systems/genivi-plusplus-system-x86.morph?h=baserock/ps/jt-gdp&id=a18d755ef9eb0b321ce904657c8269fff3d16cf6	12:41
pedroalvarez	richard_maw: patch adopted	12:42
SotK	Kinnison: done, sorry about that	13:18
Kinnison	:-)	13:20
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		13:25
* radiofree wonders how easy it would be to install baserock on http://us.acer.com/ac/en/US/content/model/NX.MPRAA.005		13:27
Kinnison	Depends on the bootloader I guess	13:31
jjardon	straycat: i had to wait 10 min as well, check the jjardon/gnome branch if you are curious about the system (around 340 chunks)	13:33
straycat	paulsherwood, that took about 6 minutes to resolve on my machine	13:39
*** thecorconian [~thecorcon@eccvpn1.ford.com] has joined #baserock		13:40
*** thecorconian [~thecorcon@eccvpn1.ford.com] has quit []		13:40
pedroalvarez	straycat: can you check how long it takes with richard_maw's patch?	13:41
radiofree	6 minutes still isn't very good....	13:41
pedroalvarez	morph.git: baserock/pedroalvarez/remove-queueing	13:42
pedroalvarez	(I have no idea how to describe this patch, hence the commit message)	13:42
straycat	radiofree, nod it shouldn't take any noticeable amount of time to resolve artifacts.	13:44
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]		13:52
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		13:53
*** genii [~quassel@ubuntu/member/genii] has joined #baserock		13:53
straycat	pedroalvarez, yeah sure	13:56
pedroalvarez	straycat: thanks! :)	13:57
straycat	Okay well richard_maw's patch seems to fix that issue, though seems a bit strange that we just had some redundant queueing in there, I hope we're not missing some edge case by changing this.	14:38
straycat	I am having some trouble running check: it seems to fill up /tmp. Does anyone else have this problem?	14:38
richard_maw	yes	14:38
richard_maw	TMPDIR=/src/tmp ./check --full	14:39
straycat	Why do I have to do that?	14:39
richard_maw	because yarns don't clean up state until all the yarns have finished	14:39
straycat	I really meant why do I have to do that now, I don't usually need to?	14:40
richard_maw	we were borderline, I occasionally had to clear stuff out of /tmp before I could run ./check	14:40
straycat	Okay	14:41
paulsherwood	yarns should cleanup, either before or after.	14:43
richard_maw	the problem is that it doesn't do cleanup per-scenario, it cleans up its tempdirs after all the scenarios have been run	14:45
richard_maw	if it cleaned up after each scenario, we'd probably fit	14:45
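The per-scenario cleanup described above can be sketched with a context manager. This is not yarn's code; `run_scenario` and `run_all` are hypothetical. Python's `tempfile` module honours the `TMPDIR` environment variable when choosing where to create directories.

```python
import os
import tempfile


def run_scenario(name, workdir):
    """Hypothetical scenario body: writes some state into its work directory."""
    with open(os.path.join(workdir, name + ".log"), "w") as f:
        f.write("ran " + name + "\n")


def run_all(scenarios, base=None):
    """Give each scenario its own directory and remove it as soon as it ends."""
    leftover = []
    for name in scenarios:
        # base=None means the default location, which follows TMPDIR.
        with tempfile.TemporaryDirectory(dir=base) as workdir:
            run_scenario(name, workdir)
        # The directory is already gone here, before the next scenario runs.
        leftover.append(os.path.exists(workdir))
    return leftover


leftover = run_all(["build", "deploy"])
```

Peak disk usage is then bounded by the largest single scenario rather than by the sum of all of them.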
* richard_maw is going to be poking at yarn this weekend		14:45
jonathanmaw	richard_maw: taking up knitting? :P	14:50
straycat	I'm a bit confused by this code now, wasn't the queue there because you're supposed to be resolving recursively?	14:56
richard_maw	we already have all the sources and artifacts created by that point	15:00
richard_maw	resolving is about connecting up the dependencies	15:01
straycat	Right, so why is that recursive?	15:02
richard_maw	I'm not sure why it was, I know it isn't now.	15:02
* straycat nods		15:03
straycat	If we're sure about that we should rename the function, because having _resolve_artifacts_recursively for a function that's not recursive is...	15:04
* ssam2 tries to remember what triggers `morph build` to fail with 'ValueError: need more than 0 values to unpack'		15:48
richard_maw	ssam2: repositories in your workspace that don't have morph.repository in their config?	15:49
ssam2	oh, of course	15:49
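As a general illustration of how this kind of message arises in Python (not a trace of morph's actual code path), unpacking an empty sequence into a fixed number of names raises `ValueError`. Python 2 words it as "need more than 0 values to unpack"; Python 3 uses different wording. The helper below is hypothetical.

```python
def parse_repo_config(values):
    """Hypothetical helper that expects exactly one configured value."""
    (repository,) = values  # raises ValueError if `values` is empty
    return repository


try:
    parse_repo_config([])
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```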
*** jonathanmaw [~jonathanm@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]		16:03
*** dutch [~william@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Quit]		16:17
*** mSher [~mike@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]		16:38
pedroalvarez	a green in http://85.199.252.101/ is expected in around 15 minutes	16:43
pedroalvarez	we removed all the artifacts by mistake, and this mason instance is building all the x86_64 systems on the release.morph cluster	16:43
pedroalvarez	that's why it's taking so long	16:44
*** ssam2 [~ssam2@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Ping timeout: 244 seconds]		16:44
*** CTtpollard [~tom@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Ex-Chat]		16:50
*** tiagogomes [~tiagogome@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]		17:16
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]		17:18
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock		17:19
paulsherwood	can we simply make the current line say 'running' when it is, rather than leaving the red for ages?	17:23
richard_maw	paulsherwood: it's doable, but I'm not sure it's worthwhile if we're dropping it in favour of OpenStack's preferred tools.	17:25
paulsherwood	richard_maw: good point	17:26
straycat	Hrm, I think some of these comments make a little less sense now that artifacts don't have dependencies: we don't add another source's artifacts as dependencies for every artifact in the current source, we just add another source's dependencies to the current source's dependencies.	17:28
richard_maw	I'm afraid I can't suggest anything right now, my brain is about to switch off.	17:28
richard_maw	See you later.	17:28
straycat	o/	17:28
* straycat would follow but had the misfortune of accidentally brewing earl grey for 8 minutes		17:29
straycat	Oh no ignore me the comments are fine >.>	17:36
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit []		17:41
paulsherwood	is there a way to do graceful shutdown of a distbuild network?	18:59
paulsherwood	(for example to update morph on all nodes)	18:59
straycat	you could systemctl stop the relevant services I guess	19:00
paulsherwood	would poweroff do that?	19:04
straycat	yes	19:05
straycat	when I update morph on all nodes I typically update it, then reboot all the nodes.	19:06
paulsherwood	ok, great. if some workers were in mid-build, will they recover?	19:07
straycat	it's not really graceful, any running jobs will just get lost, but presumably users are warned before an update	19:07
paulsherwood	that's ok	19:07
*** jamiehowarth [~jamiehowa@2607:fb90:50a:1f58:419f:4dfa:e61d:4fa3] has joined #baserock		19:56
*** genii [~quassel@ubuntu/member/genii] has quit [Read error: Connection reset by peer]		22:25
*** genii [~quassel@ubuntu/member/genii] has joined #baserock		22:41
*** genii [~quassel@ubuntu/member/genii] has quit [Remote host closed the connection]		22:55
*** jamiehowarth [~jamiehowa@2607:fb90:50a:1f58:419f:4dfa:e61d:4fa3] has quit [Ping timeout: 272 seconds]		23:21
