IRC logs for #baserock for Friday, 2014-10-10

*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]00:00
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock00:06
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]00:10
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock00:33
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]00:49
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock01:36
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Client Quit]01:39
*** mSher [~mike@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock06:09
*** franred [~franred@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock07:27
*** dutch [~william@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock08:07
rjekpaulsherwood: It's nice dipped in hummous.08:18
*** tiagogomes [~tiagogome@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock08:19
KinnisonI like celery and marmite08:25
Kinnisonand I hate marmite08:25
*** jonathanmaw [~jonathanm@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock08:29
pedroalvarezlog of a build taking 10 minutes to resolve artifacts: http://fpaste.org/140722/41287973/08:44
pedroalvarezwas sent yesterday by Paul08:44
pedroalvarezalso, I don't know where, but it was raised the point that we could move morph out of tools08:46
pedroalvarezmakes sense to me08:47
KinnisonWe'd have to move the stuff morph depends on too08:48
Kinnisone.g. linux-user-chroot08:48
KinnisonLet's assume he's building a devel system, that's roughly 200 items. If we assume he only has a trivial number of those cached locally, then resolution will consist of between 2 and 4 round-trips per item; let's average that at 3 and say 600 RTTs needed. Assuming each query consumes negligible time at each end, we're at about 1 second per query.08:50
KinnisonAn idealised HTTP session consists of SYN, SYN+ACK, ACK+PSH, ACK+PSH+FIN, ACK+FIN, ACK08:50
Kinnisonwhich is 6 packets08:50
KinnisonIf Paul is 150ms away from his trove, that's a believable time period for resolving things08:51
KinnisonWelcome to internets08:51
KinnisonAlso, in practice, we won't have idealised packet streams which means there's probably between 4 and 6 more packets in that sequence, which would also push out the time08:54
KinnisonSo, to say something more *constructive* -- we need to find a way to reduce the number of queries sent to the trove, either by batching things, or by having sufficient cached data locally to not need to08:56
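The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the figures (200 items, 3 round-trips per item, 6 packets, 150 ms) are the ones quoted in the discussion, while the assumption that each packet costs one full 150 ms of latency, with nothing overlapping, is one reading of how the "1 second per query" figure arises. The function names are hypothetical.

```python
# Figures quoted in the discussion above.
ITEMS = 200                # roughly the size of a devel system
ROUND_TRIPS_PER_ITEM = 3   # average of the quoted 2 to 4
PACKETS_PER_QUERY = 6      # SYN, SYN+ACK, ACK+PSH, ACK+PSH+FIN, ACK+FIN, ACK
LATENCY_MS = 150           # assumption: each packet costs one 150 ms hop


def total_queries(items, round_trips_per_item):
    """Number of queries sent to the trove."""
    return items * round_trips_per_item


def query_time_ms(packets, latency_ms):
    """Time for one query if packets are strictly sequential."""
    return packets * latency_ms


queries = total_queries(ITEMS, ROUND_TRIPS_PER_ITEM)
per_query_ms = query_time_ms(PACKETS_PER_QUERY, LATENCY_MS)
total_minutes = queries * per_query_ms / 60000

print(queries, per_query_ms, total_minutes)
```

At 900 ms per query the total comes to 9 minutes; rounding up to 1 second per query gives the 10 minutes that was observed.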
*** ssam2 [~ssam2@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock09:03
ratmice_______makes me think of something like cap'n'proto promise pipelining09:05
KinnisonYeah, I'm not sure our central resolution visitor would support that short-term.09:05
KinnisonMakes me wish for promises / lazy evaluation in python09:05
KinnisonI wonder if we could use python futures for it?09:06
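As an illustration of the futures idea, here is a minimal sketch using `concurrent.futures`, which is in the standard library from Python 3.2 (Python 2 needs a backport). `resolve_ref` is a hypothetical stand-in for one round trip to the trove and is not part of morph; the point is only that independent queries can be in flight at the same time instead of being issued one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_ref(ref):
    """Hypothetical stand-in for one network round trip to the trove."""
    return "sha1-of-" + ref


def resolve_all(refs, max_workers=8):
    """Resolve every ref concurrently and return a ref -> result mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input refs.
        return dict(zip(refs, pool.map(resolve_ref, refs)))


print(resolve_all(["master", "baserock/morph"]))
```

This only helps if the queries are independent of each other; it would not reduce the number of queries, which is what batching or local caching would do.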
straycatI'm not convinced that's the problem.09:07
KinnisonI'm all for it being something easier to fix -- what is your theory?09:08
straycatWe're not querying trove when we resolve artifacts, I think we do that when we construct the source pool. 09:08
KinnisonOh09:08
KinnisonCode has changed or is not how I understood it then09:11
straycatI also might have misunderstood, but I don't see any querying in the artifact resolver, as far as I was aware it takes a source pool and constructs a build graph from it.09:12
KinnisonIf that's true then it should take milliseconds, not minutes09:12
straycatWe update caches and resolve refs when we construct the source pool.09:13
* Kinnison wonders what could cause that graph build to take forever then09:13
straycatKinnison, Unless some recent changes to definitions have exposed some bug in the artifact construction code, there's also the "WARNING No _validate_cross_refs_for_chunk" warnings in the log.09:13
KinnisonI think that's harmless09:14
KinnisonLooking at the core resolution code, I think we *keep* resolving things over and over09:16
KinnisonWe seem to make no effort to memoise the result of resolving something09:17
richard_mawpaulsherwood: context managers are what you use with the with statement; pedroalvarez: we have already split out a morph stratum, but it looks like I forgot to move linux-user-chroot out09:18
pedroalvarezrichard_maw: great, I thought I had a dream about this. Good to know that it was real09:19
ssam2Kinnison: if you're talking about resolving refs to SHA1s and retrieving morphologies, the results are indeed saved in dicts09:19
ssam2http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/app.py#n35509:19
Kinnisonssam2: I'm not09:20
richard_mawssam2: that bit happens before the big slow-down09:20
KinnisonIf someone adds a line to the top of _resolve_system_dependencies and _resolve_stratum_dependencies logging the names of the sources being resolved, I bet you will see sources being resolved over and over09:21
Kinnison(artifactresolver.py)09:21
ssam2oh, interesting09:21
ssam2who's been profiling? well done whoever it was!09:22
KinnisonPaul noticed the huge delay09:22
KinnisonI foolishly assumed it was the network stage and explained how it could fit the 10min he was seeing09:23
Kinnisonand then straycat pointed out I was bonkers :-)09:23
ssam2right. I've always assumed it was the network stage too09:23
ssam2is someone looking at improving this ?09:23
* Kinnison thinks it'd be a better message if it was "Analysing build graph" rather than "Resolving artifacts"09:23
ssam2yes, we overload the term 'resolving' a bit :)09:23
richard_mawKinnison: I think the problem may be that _create_initial_queue starts the queue with every non-chunk, then every system and stratum adds their dependencies to the queue again09:26
KinnisonLike I said, I think we're repeatedly scanning things09:27
Kinnisonrichard_maw: even if the initial queue contained only the system sources, I bet we 'resolve' build-essential a gajillion times by the end09:27
* richard_maw is going to try just putting every source in the iterable to start with, and not queue everything later09:27
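The suspected behaviour can be shown with a toy model. This is not morph's code: the graph and function names are hypothetical, and the sketch only assumes what is described above, namely a queue that starts with every source and has each source's dependencies appended again when that source is visited.

```python
from collections import deque

# Hypothetical toy graph: each source maps to the sources it depends on.
DEPS = {
    "system": ["stratum-a", "stratum-b"],
    "stratum-a": ["build-essential"],
    "stratum-b": ["build-essential", "stratum-a"],
    "build-essential": [],
}


def visits_with_requeue(deps):
    """Queue every source up front, then re-queue dependencies on each visit."""
    queue = deque(deps)
    visits = 0
    while queue:
        source = queue.popleft()
        visits += 1
        queue.extend(deps[source])  # dependencies are visited again
    return visits


def visits_single_pass(deps):
    """Visit each source exactly once, with no re-queueing."""
    return sum(1 for _ in deps)


print(visits_with_requeue(DEPS), visits_single_pass(DEPS))
```

Even with four sources the re-queueing variant does 14 visits instead of 4, and the gap grows quickly as shared dependencies such as build-essential appear deeper in the graph.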
straycatWhat was paulsherwood trying to build?09:32
straycatOh he sent an email about this didn't he.09:33
straycatAhh that bug may have been a different issue and doesn't include the system being built in any case.09:35
straycatEven though we're not memoising stuff, it's hard to see that it would take minutes.09:37
Kinnisonstraycat: You'd think that, but then cache key computation should have been fast, but was taking minutes until I added memoisation09:37
KinnisonPython can be distressingly dynamic sometimes which makes things which look efficient take a long time09:38
* richard_maw wishes for python3 and its memoisation decorators09:38
Kinnison:-)09:38
* straycat nods09:38
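The Python 3 decorator being wished for is `functools.lru_cache` (available from Python 3.2). A minimal sketch of memoising a recursive dependency walk with it follows; the graph and the `transitive_deps` function are hypothetical and only illustrate the technique, not how morph resolves artifacts.

```python
from functools import lru_cache

# Hypothetical toy graph: each source maps to the sources it depends on.
DEPS = {
    "system": ("stratum-a", "stratum-b"),
    "stratum-a": ("build-essential",),
    "stratum-b": ("build-essential", "stratum-a"),
    "build-essential": (),
}

CALLS = {"count": 0}  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def transitive_deps(source):
    """Return every source reachable from `source`, computed once per source."""
    CALLS["count"] += 1
    result = set()
    for dep in DEPS[source]:
        result.add(dep)
        result |= transitive_deps(dep)
    return frozenset(result)


print(sorted(transitive_deps("system")), CALLS["count"])
```

The function body runs once per source, however many times a shared dependency is reached. Arguments must be hashable for `lru_cache` to work, which is why the sketch keys on source names.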
franredpedroalvarez, http://fpaste.org/140876/12934090/ <-- is this enough for a +1 to move to python 2.7.8?09:41
pedroalvarezfranred: yeah, I'll communicate my decision on gerrit09:42
franredthanks :)09:42
KinnisonHrf, gerrit signed me out09:44
straycatAhh okay, when I tried to build the genivi devel system, resolving artifacts took around 30 seconds, not 10 minutes, but longer than it should I guess.09:45
*** Krin [~mikesmith@82-70-136-246.dsl.in-addr.zen.co.uk] has joined #baserock09:57
* ssam2 discovers the keyboard shortcuts in Gerrit, begins to feel positive about it10:04
Kinnisonstraycat: it's calculating a few thousand interlinks between artifacts, if it takes more than a second or so on a mediocre system it's taking too long10:05
Kinnisonstraycat: I think it depends what order things turn up in the queue as to how long it takes10:05
Kinnisonwhich is also annoying10:05
KinnisonYay for non-deterministic stuff10:06
franredssam2, you may need to vote once again on my patch, it looks like making modifications to it (rebasing and trying to remove the gerrit ID in the commit message) revokes your +110:06
franredpedroalvarez, ^^10:06
Kinnisonfranred: You can't remove that ID10:06
Kinnisonfranred: It's how gerrit tracks the patch10:06
franredKinnison, I know, but I was trying to remove to push to our master...10:06
franredmy fault...10:07
KinnisonI don't know that you shoudl bother -- once we hav a bot handling things, it won't try and remove the ID10:07
KinnisonAlso, please teach me how to type10:07
pedroalvarezfranred: I still see the +110:07
franredpedroalvarez, yes, but it should have a +210:07
pedroalvarezthen is not because the +1 has been revoked10:08
franredKinnison, I acn't...10:08
Kinnison:-S10:08
pedroalvarezthe proof: "This has been merge! "10:08
ssam2so we shall have 'Change-Id' in every single commit message in definitions from now on?10:11
KinnisonTo a greater or lesser extent, yes10:12
KinnisonUnless we write a rebase rule to remove it during zuul's merge action10:12
KinnisonConsider https://git.openstack.org/cgit/openstack/horizon/commit/?id=c98a2eb2813331025764f52986a404053cdf6dd710:13
jjardonHi, what was the command to allow me to compile offline?10:15
Kinnisonoffline?10:15
jjardonni internet connection10:15
KinnisonIf you have all the relevant git repos locally, try --no-git-update10:15
jjardons/ni/no10:15
pedroalvarezI'd say --no-git-update10:15
jjardonThanks!10:15
KinnisonBut that won't work if you're missing anything10:15
Kinnison*anything*10:15
KinnisonDo we still have a way to cache all the gits for a system?10:16
KinnisonI guess you could issue a build, overriding the cache server urls so they fail10:17
Kinnisonthat'd cause it to cache the gits no?10:17
ssam2yes, but also to build everything10:17
KinnisonNot if his local artifact cache already has the chunks etc10:18
ssam2in that case, yeah10:18
KinnisonIt'll cache the gits in order to build the graph, and then discover it already has the binary artifacts10:18
ssam2jjardon: I could show you how to do that if you're in Manchester10:19
pedroalvarezmeh, mosh doesn't support ssh agent forwarding yet...10:32
pedroalvarezafter 2 years in their roadmap10:32
KinnisonIt's hard to do10:33
Kinnisonpedroalvarez: I'm sure they'd welcome patches10:33
* Kinnison wants them to support multipath with ipv4 *and* ipv6 in it10:33
Kinnisonso I can roam between networks with different connectivity types10:33
pedroalvarezKinnison: I think that also is on his Ideas list10:35
KinnisonAye10:36
pedroalvarezoh, mosh has been written by the massachusetts institute of technology?10:36
straycatmassachoosits10:47
*** Krin [~mikesmith@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Remote host closed the connection]11:52
*** franred [~franred@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]11:56
paulsherwoodcan anyone tell me what's the state/plan on gerrit?12:10
pedroalvarezstate: is just a gerrit instance which is (kind of) mirroring the g.b.o sources of the baserock projects, so we can send patch reviews using it12:11
pedroalvarezI've been trying to add zuul and a gearman worker so gerrit can do some tests, but I need more time to get that working12:12
pedroalvarezis not running in baserock12:12
pedroalvarezand my objective was: get infrastructure working so we can start using gerrit, and check the workflow with it12:13
pedroalvarezthe plan for the future: gerrit integrated in trove12:13
* Kinnison wonders if a clearer plan for the future might be Trove as a swarm12:26
Kinnisonone Gerrit system (with the caches and resolvers and perhaps lorry controller), N lorry worker systems, N worker systems for things like zuul12:27
KinnisonAnd then Mason as a swarm of systems mirrors and enhances the shape12:27
* paulsherwood would like to see *less* complexity :)12:30
KinnisonI guess it depends on what you consider complexity12:31
Kinnisonwe have a lot of services12:31
Kinnisonwe can either have several simple but cooperating systems12:31
Kinnisonor one ubersystem12:31
KinnisonOne ubersystem is probably easier for a single human to grok12:31
Kinnisona swarm is more horizontally scalable12:31
straycatpaulsherwood, What system were you building when you had to wait for 10 minutes for artifacts to resolve?12:36
richard_mawon a related note: removing the queueing from ArtifactResolver didn't break tests12:38
richard_mawhttp://pastebin.com/0uaa04fA12:38
pedroalvarezbut does that break morph?12:38
richard_mawI ran a build and it got as far as noticing that I'd already built that12:39
richard_mawso resolving still produced the same result12:39
pedroalvarezrichard_maw: then +112:40
KinnisonSotK: I think you posted to baserock-dev with a badly set up client12:40
KinnisonSotK: care to fix your client and resend?12:40
richard_mawpedroalvarez: any chance I can persuade you to adopt the patch? I'm neck-deep in nfs and vlans at the moment.12:40
paulsherwoodstraycat: something like this i think... http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git/tree/systems/genivi-plusplus-system-x86.morph?h=baserock/ps/jt-gdp&id=a18d755ef9eb0b321ce904657c8269fff3d16cf612:41
pedroalvarezrichard_maw: patch adopted12:42
SotKKinnison: done, sorry about that13:18
Kinnison:-)13:20
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock13:25
* radiofree wonders how easy it would be to install baserock on http://us.acer.com/ac/en/US/content/model/NX.MPRAA.00513:27
KinnisonDepends on the bootloader I guess13:31
jjardonstraycat: i had to wait 10 min as well, check the jjardon/gnome branch if you are curious about the system (around 340 chunks)13:33
straycatpaulsherwood, that took about 6 minutes to resolve on my machine13:39
*** thecorconian [~thecorcon@eccvpn1.ford.com] has joined #baserock13:40
*** thecorconian [~thecorcon@eccvpn1.ford.com] has quit []13:40
pedroalvarezstraycat: can you check how long it takes with richard_maw's patch?13:41
radiofree6 minutes still isn't very good....13:41
pedroalvarezmorph.git: baserock/pedroalvarez/remove-queueing13:42
pedroalvarez(I have no idea how to describe this patch, hence the commit message)13:42
straycatradiofree, *nod* it shouldn't take any noticeable amount of time to resolve artifacts.13:44
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]13:52
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock13:53
*** genii [~quassel@ubuntu/member/genii] has joined #baserock13:53
straycatpedroalvarez, yeah sure13:56
pedroalvarezstraycat: thanks! :) 13:57
straycatOkay well richard_maw's patch seems to fix that issue, though seems a bit strange that we just had some redundant queueing in there, I hope we're not missing some edge case by changing this.14:38
straycatI am having some trouble running check, it seems to fill up /tmp does anyone else have this problem?14:38
richard_mawyes14:38
richard_mawTMPDIR=/src/tmp ./check --full14:39
straycatWhy do I have to do that?14:39
richard_mawbecause yarns don't clean up state until all the yarns have finished14:39
straycatI really meant why do I have to do that now, I don't usually need to?14:40
richard_mawwe were borderline, I occasionally had to clear stuff out of /tmp before I could run ./check14:40
straycatOkay14:41
paulsherwoodyarns should cleanup, either before or after. 14:43
richard_mawthe problem is that it doesn't do cleanup per-scenario, it cleans up its tempdirs after all the scenarios have been run14:45
richard_mawif it cleaned up after each scenario, we'd probably fit14:45
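Per-scenario cleanup of the kind described could look roughly like the sketch below. It is not how yarn is implemented; `run_scenarios` and the scenario callables are hypothetical, and the only real API used is `tempfile.TemporaryDirectory` (Python 3.2+), which removes the directory when its `with` block exits.

```python
import os
import tempfile


def run_scenarios(scenarios, base_dir=None):
    """Run each scenario in its own temp dir and delete it straight after.

    Returns, for each scenario, whether its directory still exists afterwards.
    """
    leftover = []
    for scenario in scenarios:
        with tempfile.TemporaryDirectory(dir=base_dir) as workdir:
            scenario(workdir)
        # The directory is removed as soon as the with block ends.
        leftover.append(os.path.exists(workdir))
    return leftover


def write_some_state(workdir):
    """Hypothetical scenario that leaves a file behind in its work dir."""
    with open(os.path.join(workdir, "state.txt"), "w") as f:
        f.write("scenario state")


print(run_scenarios([write_some_state, write_some_state]))
```

With this shape, peak disk usage is bounded by the largest single scenario rather than by the sum of all of them; passing `base_dir` plays the same role as setting TMPDIR.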
* richard_maw is going to be poking at yarn this weekend14:45
jonathanmawrichard_maw: taking up knitting? :P14:50
straycatI'm a bit confused by this code now, wasn't the queue there because you're supposed to be resolving recursively?14:56
richard_mawwe already have all the sources and artifacts created by that point15:00
richard_mawresolving is about connecting up the dependencies15:01
straycatRight, so why is that recursive?15:02
richard_mawI'm not sure why it was, I know it isn't now.15:02
* straycat nods15:03
straycatIf we're sure about that we should rename the function, because having _resolve_artifacts_recursively for a function that's not recursive is...15:04
* ssam2 tries to remember what triggers `morph build` to fail with 'ValueError: need more than 0 values to unpack'15:48
richard_mawssam2: repositories in your workspace that don't have morph.repository in their config?15:49
ssam2oh, of course15:49
*** jonathanmaw [~jonathanm@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]16:03
*** dutch [~william@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Quit]16:17
*** mSher [~mike@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]16:38
pedroalvareza green in http://85.199.252.101/ is expected in around 15 minutes16:43
pedroalvarezwe removed all the artifacts by mistake, and this mason instance is building all the x86_64 systems on the release.morph cluster16:43
pedroalvarezthat's why is taking too long16:44
*** ssam2 [~ssam2@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Ping timeout: 244 seconds]16:44
*** CTtpollard [~tom@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Ex-Chat]16:50
*** tiagogomes [~tiagogome@82-70-136-246.dsl.in-addr.zen.co.uk] has quit [Quit: Leaving]17:16
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit [Quit: pwerner9]17:18
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has joined #baserock17:19
paulsherwoodcan we simply make the current line say 'running' when it is, rather than leaving the red for ages?17:23
richard_mawpaulsherwood: it's doable, but I'm not sure it's worthwhile if we're dropping it in favour of OpenStack's preferred tools.17:25
paulsherwoodrichard_maw: good point17:26
straycatHrm, I think some of these comments make a little less sense now that artifacts don't have dependencies: we don't add another source's artifacts as dependencies for every artifact in the current source, we just add another source's dependencies to the current source's dependencies.17:28
richard_mawI'm afraid I can't suggest anything right now, my brain is about to switch off.17:28
richard_mawSee you later.17:28
straycato/17:28
* straycat would follow but had the misfortune of accidentally brewing earl grey for 8 minutes17:29
straycatOh no ignore me the comments are fine >.>17:36
*** pwerner9 [~pwerner9@d14-69-32-220.try.wideopenwest.com] has quit []17:41
paulsherwoodis there a way to do graceful shutdown of a distbuild network?18:59
paulsherwood(for example to update morph on all nodes)18:59
straycatyou could systemctl stop the relevant services I guess19:00
paulsherwoodwould poweroff do that?19:04
straycatyes19:05
straycatwhen I update morph on all nodes I typically update it, then reboot all the nodes.19:06
paulsherwoodok, great. if some workers were in mid-build, will they recover?19:07
straycatit's not really graceful, any running jobs will just get lost, but presumably users are warned before an update19:07
paulsherwoodthat's ok19:07
*** jamiehowarth [~jamiehowa@2607:fb90:50a:1f58:419f:4dfa:e61d:4fa3] has joined #baserock19:56
*** genii [~quassel@ubuntu/member/genii] has quit [Read error: Connection reset by peer]22:25
*** genii [~quassel@ubuntu/member/genii] has joined #baserock22:41
*** genii [~quassel@ubuntu/member/genii] has quit [Remote host closed the connection]22:55
*** jamiehowarth [~jamiehowa@2607:fb90:50a:1f58:419f:4dfa:e61d:4fa3] has quit [Ping timeout: 272 seconds]23:21
