## Top Answers to Machine Learning Interview Questions

Machine Learning and Artificial Intelligence are among the most popular technologies in the world today. This comprehensive blog consists of some of the most frequently asked Machine Learning interview questions that aim to help you revise all the necessary concepts and skills to land your dream job. This blog is specifically designed for you to do a thorough Machine Learning interview preparation before going for the interview. Listed below are some of the most frequently asked questions in an ML job interview. Go through them and succeed in your career!

Q1. Explain Machine Learning, Artificial Intelligence and Deep Learning?

Q2. What are Bias and Variance in Machine Learning?

Q3. What is Clustering in Machine Learning?

Q4. What is a Linear Regression in Machine Learning?

Q5. What is a Decision Tree in Machine Learning?

Q6. What is Overfitting in Machine Learning and how can you avoid it?

Q7. What is the Hypothesis in Machine Learning?

Q8. What are the differences between Deep Learning and Machine Learning?

Q9. What are the differences between Supervised And Unsupervised Machine Learning?

Q10. What is Bayes’ Theorem in Machine Learning?

Q11. What is PCA in Machine Learning?

Q12. What is SVM (Support Vector Machines) in Machine Learning?

Q13. What is Cross-Validation in Machine Learning?

Q14. What is Entropy in Machine Learning?

Q15. What is Epoch in Machine Learning?


**1. What are the types of Machine Learning?**

Among all the ML interview questions we will discuss, this is one of the most basic.

So, basically, there are three types of Machine Learning techniques:

**Supervised Learning:** In this type of Machine Learning technique, machines learn under the supervision of labeled data. There is a training dataset on which the machine is trained, and it gives the output according to its training.

**Unsupervised Learning:** Unlike supervised learning, it uses unlabeled data, so there is no supervision under which it works on the data. Basically, unsupervised learning tries to identify patterns in data and group similar entities into clusters. After that, when new input data is fed into the model, it does not identify the entity; rather, it places the entity in a cluster of similar objects.

**Reinforcement Learning:** Reinforcement learning includes models that learn by exploring to find the best possible move. Its algorithms are constructed so that they try to find the best possible sequence of actions on the basis of rewards and punishments.

**2. Differentiate between classification and regression in Machine Learning.**

In Machine Learning, there are various types of prediction problems based on supervised and unsupervised learning. These are classification, regression, clustering, and association. Here, we will discuss classification and regression.

**Classification: **In classification, we try to create a Machine Learning model that assists us in differentiating data into separate categories. The data is labeled and categorized based on the input parameters.

For example, imagine that we want to predict customer churn for a particular product based on some recorded data. Either the customers will churn or they will not, so the labels for this would be ‘Yes’ and ‘No.’

**Regression:** It is the process of creating a model that predicts continuous real values, instead of classes or discrete values. It can also identify the distribution movement depending on the historical data. It is used for predicting the outcome of an event depending on the degree of association between variables.

For example, the prediction of weather conditions depends on factors such as temperature, air pressure, solar radiation, the elevation of the area, and distance from the sea. The relation between these factors assists us in predicting the weather condition.


**3. What is a Linear Regression in Machine Learning?**

Linear Regression is a supervised Machine Learning algorithm. It is used to find the linear relationship between the dependent and the independent variables for predictive analysis.

The equation for Linear Regression is *Y = a + bX*, where:

- *X* is the input or the independent variable
- *Y* is the output or the dependent variable
- *a* is the intercept and *b* is the coefficient of *X*

Below is the **best fit line** that shows the data of weight (*Y*, the dependent variable) and height (*X*, the independent variable) of 21-year-old candidates scattered over the plot. This straight line shows the best linear relationship, which would help in predicting the weight of candidates according to their height.

To get this **best fit line**, we will try to find the best values of *a* and *b*. By adjusting the values of *a* and *b*, we try to reduce the errors in the prediction of *Y*.

This is how linear regression helps in finding the linear relationship and predicting the output.
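As an illustrative sketch (the heights and weights below are made-up values, not taken from the plot above), fitting such a line with scikit-learn looks like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical heights (cm) and weights (kg) of candidates
X = np.array([[150], [160], [170], [180], [190]])  # independent variable X
Y = np.array([50, 56, 62, 68, 74])                 # dependent variable Y

model = LinearRegression().fit(X, Y)
# model.intercept_ is 'a' and model.coef_[0] is 'b' in Y = a + bX
print(model.intercept_, model.coef_[0])
print(model.predict([[175]]))  # predicted weight for a height of 175 cm
```

The fitted values of *a* and *b* minimize the squared prediction errors on *Y*, which is exactly the adjustment described above.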

**4. How will you determine the Machine Learning algorithm that is suitable for your problem?**

To identify the Machine Learning algorithm for our problem, we should follow the below steps:

**Step 1: Problem Classification: **Classification of the problem depends on the classification of input and output:

- **Classifying the input:** Classification of the input depends on whether we have the data labeled (supervised learning) or unlabeled (unsupervised learning), or whether we have to create a model that interacts with the environment and improves itself (reinforcement learning).
- **Classifying the output:** If we want the output of the model as a class, then we need to use classification techniques. If the output is a number, then we must use regression techniques and, if the output is a different cluster of inputs, then we should use clustering techniques.

**Step 2: Checking the algorithms in hand: **After classifying the problem, we have to look for the available algorithms that can be deployed for solving the classified problem.

**Step 3: Implementing the algorithms: **If there are multiple algorithms available, then we will implement each one of them, one by one. Finally, we would select the algorithm that gives the best performance.

**5. Explain Machine Learning, Artificial Intelligence, and Deep Learning?**

It is common to get confused between the three in-demand technologies: Machine Learning, Artificial Intelligence, and Deep Learning. These three technologies, though a little different from one another, are interrelated. While Deep Learning is a subset of Machine Learning, Machine Learning is a subset of Artificial Intelligence. Since some terms and techniques may overlap with each other while dealing with these technologies, it is easy to get confused between them.

Therefore, let’s learn about these technologies in detail so that you become capable of differentiating between them:

- **Machine Learning:** Machine Learning involves various statistical and Deep Learning techniques that allow machines to use their past experiences and get better at performing specific tasks without having to be monitored.
- **Artificial Intelligence:** Artificial Intelligence uses numerous Machine Learning and Deep Learning techniques that enable computer systems to perform tasks that require human intelligence, using logic and rules.
- **Deep Learning:** Deep Learning comprises several algorithms that enable software to learn from data by itself and perform various business tasks, including image and speech recognition. This is possible when the systems expose their multi-layered neural networks to large volumes of data for learning.

**6. What is clustering in Machine Learning?**

Clustering is a technique used in unsupervised learning that involves grouping data points. If you have a set of data points, you can make use of the clustering algorithm. This technique will allow you to classify all the data points into their particular groups. The data points that are thrown into the same category have similar features and properties, whereas the data points that belong to different groups have distinct features and properties. This method allows you to perform statistical data analysis. Let’s take a look at three of the most popular and useful clustering algorithms.

- **K-means clustering:** This algorithm is commonly used when there is data with no specific group or category. It allows you to find hidden patterns in the data, which can be used to classify the data into various groups. The variable *k* represents the number of groups the data is divided into, and the data points are clustered using the similarity of features. Here, the centroids of the clusters are used for labeling new data.
- **Mean-shift clustering:** The main aim of this algorithm is to update the center-point candidates to be the mean of the points in their region and, thereby, find the center points of all the groups. Unlike in k-means clustering, here you do not need to select the number of clusters, as it is discovered automatically by the mean shift.
- **Density-based spatial clustering of applications with noise (DBSCAN):** This clustering algorithm is based on density and has similarities with mean-shift clustering. There is no need to preset the number of clusters, but unlike mean-shift, DBSCAN identifies outliers and treats them as noise. Moreover, it can identify arbitrarily sized and shaped clusters without much effort.
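A minimal k-means sketch with scikit-learn, using two made-up groups of 2-D points, shows the labeling and centroid behavior described above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of toy 2-D points
points = np.array([[1, 1], [1.5, 2], [1, 1.5],
                   [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)              # cluster assignment for each point
print(kmeans.cluster_centers_)     # centroids, used for labeling new data
print(kmeans.predict([[0.5, 1]]))  # a new point falls into the nearby cluster
```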


**7. What is a hypothesis in Machine Learning?**

Machine Learning allows you to use the available dataset to understand a specific function that maps inputs to outputs in the best possible way. This problem is known as function approximation. Here, you need to use an approximation for the unknown target function that maps, in the best manner, all the plausible observations based on the given problem. A hypothesis in Machine Learning is a model that helps in approximating the target function and performing the necessary input-to-output mappings. The choice and configuration of algorithms allow you to define the space of plausible hypotheses that may be represented by the model.

In hypothesis notation, a lowercase h (*h*) is used for a specific hypothesis, while an uppercase H (*H*) is used for the hypothesis space that is being searched. Let’s briefly understand these notations:

- **Hypothesis (h):** A hypothesis is a specific model that helps in mapping inputs to outputs, which can further be used for evaluation and prediction.
- **Hypothesis set (H):** A hypothesis set consists of the space of hypotheses that can be used to map inputs to outputs and that can be searched. The general constraints include the choice of problem framing, the model, and the model configuration.

**8. What are the differences between Deep Learning and Machine Learning?**

- **Deep Learning:** Deep Learning allows machines to make various business-related decisions using artificial neural networks, which is one of the reasons why it needs a vast amount of data for training. Since a lot of computing power is involved, it requires high-end systems as well. The systems acquire various properties and features with the help of the given data, and the problem is solved in an end-to-end manner.
- **Machine Learning:** Machine Learning gives machines the ability to make business decisions without any external help, using the knowledge gained from past data. Machine Learning systems require relatively small amounts of data to train themselves, and most of the features need to be manually coded and understood in advance. Here, a given business problem is broken into two parts, which are solved individually. Once the solutions to both parts have been acquired, they are combined.

**9. What are the differences between Supervised and Unsupervised Machine Learning?**

- **Supervised learning:** Supervised learning algorithms are trained using labeled data. The models take direct feedback to confirm whether the predicted output is, indeed, correct. Both the input data and the output data are provided to the model, and the main aim is to train the model to predict the output when it receives new data. Supervised learning can largely be divided into two parts: classification and regression. It offers accurate results.
- **Unsupervised learning:** Unsupervised learning algorithms are trained using unlabeled data. These models do not take any feedback and, unlike in supervised learning, they identify hidden data trends. An unsupervised learning model is only provided with the input data, and its main aim is to identify hidden patterns to extract information from unknown sets of data. It can also be classified into two parts: clustering and association. Its results are comparatively less accurate.

**10. What is Bayes’ theorem in Machine Learning?**

Bayes’ theorem gives the probability of an event occurring using prior knowledge. In mathematical terms, for events A and B, P(A|B) = P(B|A) × P(A) / P(B). For a diagnostic test, this means the probability of the condition given a positive result equals the rate of true positives divided by the sum of the rates of true positives and false positives in the entire population.

Two of the most significant applications of Bayes’ theorem in Machine Learning are Bayesian optimization and Bayesian belief networks. The theorem is also the foundation of the branch of Machine Learning that involves the Naive Bayes classifier.
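A small worked example of the theorem, using made-up diagnostic-test numbers (the prior and rates below are illustrative assumptions):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01             # prior probability of the condition, P(A)
p_pos_given_disease = 0.95   # true positive rate, P(B|A)
p_pos_given_healthy = 0.05   # false positive rate

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of the condition given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))
```

Even with a 95% true positive rate, the posterior is small here because the prior is small, which is exactly the effect Bayes' theorem captures.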

**11. What is cross-validation in Machine Learning?**

The cross-validation method in Machine Learning allows you to evaluate the performance of a given Machine Learning algorithm by training and testing it on multiple samples drawn from the dataset. The dataset is broken into smaller parts that have the same number of rows; each part is selected in turn as the test set, while the remaining parts are kept as the training set. Cross-validation includes the following techniques:

- Holdout method
- K-fold cross-validation
- Stratified k-fold cross-validation
- Leave p-out cross-validation
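A minimal sketch of k-fold cross-validation with scikit-learn (using the bundled Iris dataset and logistic regression purely as stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold takes a turn as the test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores)        # one accuracy score per fold
print(scores.mean()) # overall estimate of model performance
```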

**12. What is entropy in Machine Learning?**

Entropy in Machine Learning measures the randomness in the data that needs to be processed. The higher the entropy in the given data, the more difficult it becomes to draw any useful conclusion from it. For example, let’s take the act of flipping a coin. The result is random as it does not favor heads or tails. Here, the result for any number of tosses cannot be predicted easily, as there is no definite relationship between the action of flipping and the possible outcomes.
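The coin example can be made concrete with Shannon's entropy formula, H = −Σ p·log₂(p), sketched below:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: maximum randomness, 1 bit
print(entropy([0.9, 0.1]))  # biased coin: less randomness, lower entropy
print(entropy([1.0]))       # certain outcome: zero entropy
```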


**13. What is epoch in Machine Learning?**

An epoch in Machine Learning indicates one complete pass of the Machine Learning algorithm over the given training dataset. Generally, when there is a huge amount of data, it is grouped into several batches. Each of these batches goes through the model, and each such pass is referred to as an iteration. If the batch size comprises the complete training dataset, then the count of iterations is the same as that of epochs.

In case there is more than one batch, d × e = i × b is the formula used, wherein ‘d’ is the dataset size, ‘e’ is the number of epochs, ‘i’ is the number of iterations, and ‘b’ is the batch size.
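A quick sanity check of the formula with made-up numbers:

```python
dataset_size = 1000  # d: number of training examples
epochs = 5           # e: number of full passes over the data
batch_size = 100     # b: examples per batch

# d * e = i * b  =>  i = (d * e) / b
iterations = (dataset_size * epochs) // batch_size
print(iterations)  # 50 iterations in total, i.e., 10 per epoch
```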

**14. What are Bias and Variance in Machine Learning?**

- **Bias** is the difference between the average prediction of our model and the correct value. If the bias value is high, then the prediction of the model is not accurate. Hence, the bias value should be as low as possible to make the desired predictions.
- **Variance** measures how much the model’s prediction for a given point differs across different training sets. High variance may lead to large fluctuations in the output. Therefore, the model’s output should have low variance.

The below diagram shows the bias–variance trade-off:

Here, the desired result is the blue circle at the center. If we get off from the blue section, then the prediction goes wrong.



**15. What is Variance Inflation Factor?**

Variance Inflation Factor (VIF) estimates the amount of multicollinearity in a collection of regression variables.

VIF = Variance of the coefficient estimate in the full model / Variance of the coefficient estimate in a model with that single independent variable. Equivalently, for the *i*-th variable, VIF = 1 / (1 − R²), where R² is obtained by regressing the *i*-th variable on all the other independent variables.

We have to calculate this ratio for every independent variable. A high VIF indicates high collinearity of that independent variable with the others.
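A self-contained sketch of the 1 / (1 − R²) computation with NumPy, on synthetic data where one pair of variables is deliberately near-collinear (the data and the `vif` helper are illustrative, not a library API):

```python
import numpy as np

def vif(X, i):
    """VIF of column i: regress it on the other columns; VIF = 1 / (1 - R^2)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    # Add an intercept column for the auxiliary regression
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r_squared = 1 - residuals.var() / y.var()
    return 1 / (1 - r_squared)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.1 * rng.normal(size=100)  # nearly collinear with x1
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 1), vif(X, 2))  # high VIF for x1 and x3, low for x2
```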

**16. Explain false negative, false positive, true negative, and true positive with a simple example.**

**True Positive (TP)**: When the Machine Learning model **correctly** predicts the positive condition or class, it is said to produce a True Positive value.

**True Negative (TN)**: When the Machine Learning model **correctly** predicts the negative condition or class, it is said to produce a True Negative value.

**False Positive (FP)**: When the Machine Learning model **incorrectly** predicts the positive class for an actually negative condition, it is said to produce a False Positive value.

**False Negative (FN)**: When the Machine Learning model **incorrectly** predicts the negative class for an actually positive condition, it is said to produce a False Negative value.

**17. What is a Confusion Matrix?**

A confusion matrix is used to describe a model’s performance and gives a summary of predictions on classification problems. It assists in identifying the confusion between classes.

A confusion matrix gives the count of correct and incorrect predictions and also the types of errors made.

**Accuracy of the model:**

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For example, consider a confusion matrix whose True Positive, True Negative, False Positive, and False Negative counts for a classification model are 200, 50, 10, and 60, respectively.

Thus, in our example:

Accuracy = (200 + 50) / (200 + 50 + 10 + 60) = 0.78

This means that the model’s accuracy is 0.78, corresponding to its True Positive, True Negative, False Positive, and False Negative values.
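The same bookkeeping can be sketched with scikit-learn on toy labels (the labels below are made up for illustration):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print((tp + tn) / (tp + tn + fp + fn))
print(accuracy_score(y_true, y_pred))  # same value
```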

**18. What do you understand by Type I and Type II errors?**

**Type I Error**: A Type I error (False Positive) is an error where the outcome of a test rejects a condition that is actually true.

For example, a cricket match is going on and, when a batsman is not out, the umpire declares that he is out. This is a false positive condition. Here, the test does not accept the true condition that the batsman is not out.

**Type II Error**: Type II error (False Negative) is an error where the outcome of a test shows the acceptance of a false condition.

For example, the CT scan of a person shows that he does not have a disease but, in reality, he does have it. Here, the test accepts the false condition that the person does not have the disease.

**19. When should you use classification over regression?**

Both classification and regression are associated with prediction. Classification involves identifying values or entities that lie in a specific group. The regression method, on the other hand, entails predicting a response value from a continuous set of outcomes.

The classification method is chosen over regression when the output of the model needs to indicate which category the data points in a dataset belong to.

For example, we have some names of bikes and cars. We would not be interested in finding how these names are correlated to bikes and cars. Rather, we would check whether each name belongs to the bike category or to the car category.

**20. Explain Logistic Regression.**

Logistic regression is the appropriate regression analysis to use when the dependent variable is categorical or binary. Like all regression analyses, logistic regression is a technique for predictive analysis. Logistic regression is used to explain data and the relationship between one dependent binary variable and one or more independent variables. It is also employed to predict the probability of a categorical dependent variable.

We can use logistic regression in the following scenarios:

- To predict whether a citizen is a Senior Citizen (1) or not (0)
- To check whether a person is having a disease (Yes) or not (No)

There are three types of logistic regression:

**Binary Logistic Regression**: In this, there are only two outcomes possible.

**Example**: To predict whether it will rain (1) or not (0)

**Multinomial Logistic Regression**: In this, the output consists of three or more unordered categories.

**Example**: Prediction on the regional languages (Kannada, Telugu, Marathi, etc.)

**Ordinal Logistic Regression**: In ordinal logistic regression, the output consists of three or more ordered categories.

**Example**: Rating an Android application from 1 to 5 stars.
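A minimal binary logistic regression sketch for the senior-citizen scenario above, using hypothetical ages and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ages and a binary label: Senior Citizen (1) or not (0)
ages = np.array([[20], [25], [35], [45], [61], [65], [70], [80]])
is_senior = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(ages, is_senior)
print(clf.predict([[30], [75]]))        # predicted classes for new ages
print(clf.predict_proba([[75]])[0, 1])  # probability of the positive class
```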


**21. Imagine, you are given a dataset consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 8 variables have missing values, which is higher than 30%. How will you deal with them?**

To deal with the missing values, we will do the following:

- We will specify a separate class for the missing values.
- Then, we will check the distribution of values and retain those missing values that define a pattern.
- Finally, we will group these into yet another class, while eliminating the rest.

**22. How do you handle the missing or corrupted data in a dataset?**

In Python Pandas, there are two methods that are very useful. We can use these two methods to locate the lost or corrupted data and discard those values:

- **isnull()**: For detecting the missing values, we can use the isnull() method.
- **dropna()**: For removing the columns/rows with null values, we can use the dropna() method.

Also, we can use **fillna()** to fill the null values with a placeholder value.
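A small sketch of these three methods on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, np.nan, 32],
                   'salary': [50000, 60000, np.nan]})

print(df.isnull())        # True wherever a value is missing
print(df.isnull().sum())  # missing-value count per column

print(df.dropna())              # drop rows containing any null
print(df.fillna(df.mean()))     # fill nulls with a placeholder (column mean)
```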


**23. What is PCA in Machine Learning?**

Firstly, this is one of the most important Machine Learning Interview Questions.

In the real world, we deal with multi-dimensional data. Thus, data visualization and computation become more challenging with the increase in dimensions. In such a scenario, we might have to reduce the dimensions to analyze and visualize the data easily. We do this by:

- Removing irrelevant dimensions
- Keeping only the most relevant dimensions

This is where we use Principal Component Analysis (PCA).

Finding a fresh collection of uncorrelated (orthogonal) dimensions and ranking them on the basis of variance are the goals of Principal Component Analysis.

**The Mechanism of PCA**:

- Compute the covariance matrix of the data objects
- Compute the eigenvectors and eigenvalues, sorted in descending order of eigenvalue
- Select the first *N* eigenvectors to get the new dimensions
- Finally, transform the initial n-dimensional data objects into *N* dimensions

**Example**: Below are the two graphs showing data points (objects) and two directions: one is ‘green’ and the other is ‘yellow.’ We got the Graph 2 by rotating the Graph 1 so that the x-axis and y-axis represent the ‘green’ and ‘yellow’ directions, respectively.

After the rotation of the data points, we can infer that the green direction (x-axis) gives us the line that best fits the data points.

Here, we are representing 2-dimensional data. But in real-life, the data would be multi-dimensional and complex. So, after recognizing the importance of each direction, we can reduce the area of dimensional analysis by cutting off the less-significant ‘directions.’
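An illustrative PCA sketch on synthetic 2-D data (the data below is generated for the example and stretched along one dominant direction, like the ‘green’ direction above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 2-D toy data stretched along one dominant direction
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + 0.05 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(data)
print(pca.explained_variance_ratio_)  # first component captures most variance

# Keep only the most significant direction
reduced = PCA(n_components=1).fit_transform(data)
print(reduced.shape)
```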

Now, we will look into another important Machine Learning Interview Question on PCA.

**24. Why is rotation required in PCA? What will happen if you don’t rotate the components?**

Rotation is a significant step in PCA as it maximizes the separation within the variance obtained by components. Due to this, the interpretation of components becomes easier.

The motive behind doing PCA is to choose fewer components that can explain the greatest variance in a dataset. When rotation is performed, the original coordinates of the points get changed. However, there is no change in the relative position of the components.

If the components are not rotated, then we need more extended components to describe the variance.

**25. We know that one hot encoding increases the dimensionality of a dataset, but label encoding doesn’t. How?**

When we use **one-hot encoding**, there is an increase in the dimensionality of a dataset. The reason for the increase in dimensionality is that, for every class in the categorical variables, it forms a different variable.

**Example**: Suppose there is a variable ‘Color.’ It has three sub-levels: Yellow, Purple, and Orange. So, one-hot encoding ‘Color’ will create three different variables: Color.Yellow, Color.Purple, and Color.Orange.

In **label encoding**, the sub-classes of a variable get integer values such as **0** and **1** in a single column, so no new columns are created. For this reason, label encoding is typically used for binary variables.

This is the reason that one hot encoding increases the dimensionality of data and label encoding does not.
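Both encodings can be sketched with pandas (note that `get_dummies` separates the prefix with an underscore rather than the dot used above):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Yellow', 'Purple', 'Orange', 'Yellow']})

# One-hot encoding: one new column per class -> dimensionality grows
one_hot = pd.get_dummies(df['Color'], prefix='Color')
print(one_hot.columns.tolist())  # three columns for three classes

# Label encoding: classes mapped to integers in a single column
df['Color_label'] = df['Color'].astype('category').cat.codes
print(df['Color_label'].tolist())
```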


**26. What is Overfitting in Machine Learning and how can you avoid it?**

Overfitting happens when a model tries to learn from an inadequate dataset and ends up capturing its noise along with the signal. Broadly, the risk of overfitting decreases as the amount of data increases.

For small datasets, we can avoid overfitting with the cross-validation method. In this approach, we divide the dataset into two sections: a training set and a testing set. We use the training dataset to train the model and the testing dataset to test the model on new inputs.

This is how we can avoid overfitting.

**27. Why do we need a validation set and a test set?**

We split the data into three different categories while creating a model:

- **Training set**: We use the training set for building the model and adjusting its variables. However, we cannot rely on the correctness of a model built only on top of the training set; the model might give incorrect outputs when fed new inputs.
- **Validation set**: We use a validation set to look into the model’s response to samples that don’t exist in the training dataset. Then, we tune hyperparameters on the basis of the model’s estimated performance on the validation data.

When we evaluate the model’s response using the validation set, we are indirectly training the model with it. This may lead to overfitting of the model to this specific data, and such a model won’t be strong enough to give the desired response to real-world data.

- **Test set**: The test dataset is the subset of the actual dataset that has not yet been used to train the model. The model is unaware of this dataset, so by using it, we can compute the response of the created model to unseen data. We evaluate the model’s performance on the basis of the test dataset.

**Note**: We always expose the model to the test dataset after tuning the hyperparameters on top of the validation set.

As we know, the evaluation of the model on the basis of the validation set would not be enough. Thus, we use a test set for computing the efficiency of the model.
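The three-way split can be sketched with scikit-learn (the 60/20/20 proportions below are a common convention, not a rule):

```python
from sklearn.model_selection import train_test_split

X = list(range(100))
y = [i % 2 for i in range(100)]

# First carve out the test set, then split the rest into train and validation
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)  # 0.25 of 80% = 20%

# 60% train, 20% validation, 20% test
print(len(X_train), len(X_val), len(X_test))
```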

**28. What is a Decision Tree in Machine Learning?**

A decision tree is used to explain the sequence of actions that must be performed to get the desired output. It is a hierarchical diagram that shows the actions.

We can create an algorithm for a decision tree on the basis of the hierarchy of actions that we have set.

In the above decision tree diagram, we have made a sequence of actions for driving a vehicle with/without a license.
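A minimal sketch of such a tree with scikit-learn, using hypothetical `[has_license, age]` features and made-up labels:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [has_license, age]; label: allowed to drive (1) or not (0)
X = [[1, 30], [1, 18], [0, 30], [0, 16], [1, 45], [0, 22]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=['has_license', 'age']))  # the learned hierarchy
print(tree.predict([[1, 25], [0, 25]]))  # with license -> 1, without -> 0
```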

**29. Explain the difference between KNN and K-means Clustering.**

**K-nearest neighbors**: It is a supervised Machine Learning algorithm. In KNN, we give identified (labeled) data to the model. The model then classifies new points based on the labels of the closest labeled points.

**K-means clustering**: It is an unsupervised Machine Learning algorithm. In this, we give unidentified (unlabeled) data to the model. The algorithm then groups the points into clusters, assigning each point to the cluster whose mean (centroid) is nearest to it.

**30. What is Dimensionality Reduction?**

In the real world, we build Machine Learning models on top of features and parameters. These features can be multi-dimensional and large in number. Sometimes, the features may be irrelevant and it becomes a difficult task to visualize them.

Here, we use dimensionality reduction to cut down the irrelevant and redundant features with the help of principal variables. These principal variables are the subgroup of the parent variables that conserve the feature of the parent variables.

**31. Both being tree-based algorithms, how is Random Forest different from Gradient Boosting Algorithm (GBM)?**

The main difference between a random forest and GBM is the use of techniques. Random forest advances predictions using a technique called ‘bagging.’ On the other hand, GBM advances predictions with the help of a technique called ‘boosting.’

- **Bagging**: In bagging, we apply random sampling and divide the dataset into *N* samples. After that, we build a model on each sample by employing a single training algorithm. Then, we combine the final predictions by polling. Bagging helps increase the efficiency of a model by decreasing its variance and thus avoiding overfitting.
- **Boosting**: In boosting, the algorithm tries to review and correct the inadmissible predictions at the initial iteration. After that, the algorithm’s sequence of corrective iterations continues until we get the desired prediction. Boosting assists in reducing both bias and variance, making the weak learners strong.

**32. Suppose, you found that your model is suffering from high variance. Which algorithm do you think could handle this situation and why?**

**Handling High Variance**

- For handling issues of high variance, we should use the bagging algorithm.
- The bagging algorithm would split data into sub-groups with a replicated sampling of random data.
- Once the algorithm splits the data, we use random data to create rules using a particular training algorithm.
- After that, we use polling for combining the predictions of the model.
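The steps above can be sketched with scikit-learn's bagging ensemble on synthetic data (the dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: bootstrap samples of the training data, one model per sample,
# predictions combined by voting (the default base learner is a decision tree)
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print(bag.score(X_test, y_test))
```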

**33. What is ROC curve and what does it represent?**

ROC stands for ‘Receiver Operating Characteristic.’ We use the ROC curve to graphically represent the trade-off between the true positive rate and the false positive rate.

In ROC, AUC (Area Under the Curve) gives us an idea about the accuracy of the model.

The above graph shows an ROC curve. The greater the area under the curve, the better the performance of the model.
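An ROC curve and its AUC can be sketched with scikit-learn on made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)                        # points of the ROC curve
print(roc_auc_score(y_true, y_score))  # area under the curve
```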

Next, we would be looking at Machine Learning Interview Questions on Rescaling, Binarizing, and Standardizing.

**34. What is Rescaling of data and how is it done?**

In real-world scenarios, the attributes present in data vary in scale. Rescaling the attributes to a common scale helps algorithms process the data efficiently.

We can rescale the data using Scikit-learn. The code for rescaling the data using MinMaxScaler is as follows:

```python
# Rescaling data
import pandas
import numpy
from sklearn.preprocessing import MinMaxScaler

names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
# 'url' must point to the CSV dataset to be rescaled
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Splitting the array into input and output
X = array[:, 0:8]
Y = array[:, 8]

scaler = MinMaxScaler(feature_range=(0, 1))
rescaledX = scaler.fit_transform(X)

# Summarizing the modified data
numpy.set_printoptions(precision=3)
print(rescaledX[0:5, :])
```

**35. What is Binarizing of data? How to Binarize?**

In most Machine Learning interviews, apart from theoretical questions, interviewers focus on the implementation part. So, these ML Interview Questions are focused on the implementation of the theoretical concepts.

Converting data into binary values on the basis of threshold values is known as the binarizing of data. The values that are less than the threshold are set to **0** and the values that are greater than the threshold are set to **1**. This process is useful when we have to perform feature engineering, and we can also use it for adding unique features.

We can binarize data using Scikit-learn. The code for binarizing the data using Binarizer is as follows:

```python
# Binarizing data
import pandas
import numpy
from sklearn.preprocessing import Binarizer

# 'url' should point to a CSV dataset with nine columns
names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Splitting the array into input and output
X = array[:, 0:8]
Y = array[:, 8]

binarizer = Binarizer(threshold=0.0).fit(X)
binaryX = binarizer.transform(X)

# Summarizing the modified data
numpy.set_printoptions(precision=3)
print(binaryX[0:5, :])
```

**36. How to Standardize data?**

Standardization is a method used for rescaling data attributes so that they have a mean of **0** and a standard deviation of **1**. The main objective of standardization is to bring all attributes to this common distribution.

We can standardize the data using Scikit-learn. The code for standardizing the data using StandardScaler is as follows:

```python
# Python code to standardize data (0 mean, 1 stdev)
import pandas
import numpy
from sklearn.preprocessing import StandardScaler

# 'url' should point to a CSV dataset with nine columns
names = ['Abhi', 'Piyush', 'Pranay', 'Sourav', 'Sid', 'Mike', 'pedi', 'Jack', 'Tim']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values

# Separate the array into input and output components
X = array[:, 0:8]
Y = array[:, 8]

scaler = StandardScaler().fit(X)
rescaledX = scaler.transform(X)

# Summarize the transformed data
numpy.set_printoptions(precision=3)
print(rescaledX[0:5, :])
```

**37. Executing a binary classification tree algorithm is a simple task. But, how does a tree splitting take place? How does the tree determine which variable to break at the root node and which at its child nodes?**

Gini index and node entropy help the binary classification tree make splitting decisions. Basically, the tree algorithm determines the feature that splits the data into the purest possible child nodes.

According to the Gini index, if we randomly pick two objects from a node, they should belong to the same class; for a perfectly pure node, the probability of this event is **1**.

To compute the Gini index, we should do the following:

- Compute Gini for each sub-node with the formula: the sum of the squares of the probabilities of success and failure (p^2 + q^2)
- Compute Gini for the split as the weighted average of the Gini scores of every node of the split
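The two steps above can be sketched in plain Python (the function names `gini_node` and `gini_split` are illustrative, not from any library):

```python
# Sketch: Gini computation for a binary split, following the two steps above
def gini_node(p):
    """Gini purity score p^2 + q^2 for a node with success probability p."""
    q = 1 - p
    return p ** 2 + q ** 2

def gini_split(left, right):
    """Weighted Gini for a split; `left`/`right` are lists of 0/1 labels."""
    n = len(left) + len(right)
    score = 0.0
    for node in (left, right):
        if node:
            p = sum(node) / len(node)
            score += gini_node(p) * len(node) / n
    return score

# A perfectly pure split scores 1.0; a perfectly mixed node scores 0.5
print(gini_split([1, 1, 1], [0, 0, 0]))  # 1.0
print(gini_node(0.5))                    # 0.5
```

The split with the highest weighted Gini score (the purest children) is the one the tree chooses.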

Now, entropy is the degree of impurity, given by the following:

Entropy = −p log2(p) − q log2(q)

where **p** and **q** are the probabilities of success and failure of the node.

When **Entropy = 0**, the node is homogeneous.

When **Entropy is high**, both classes are present in the node in a 50–50 proportion.

Finally, to determine the suitability of the node as a root node, the entropy should be very low.
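A binary entropy computation can be sketched as follows (assuming p is the success probability and q = 1 − p; the `entropy` helper is illustrative):

```python
# Sketch: binary entropy of a node
import math

def entropy(p):
    """Entropy -p*log2(p) - q*log2(q) for a node with success probability p."""
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0  # a homogeneous node has zero entropy
    return -p * math.log2(p) - q * math.log2(q)

print(entropy(1.0))  # 0.0 -> homogeneous node
print(entropy(0.5))  # 1.0 -> 50-50 split, maximum impurity
```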

**38. What is SVM (Support Vector Machines) in Machine Learning?**

SVM is a Machine Learning algorithm that is mainly used for classification. It performs well even when the feature vector has high dimensionality.

Below is the code for the SVM classifier:

```python
# Importing required libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Loading the Iris dataset
iris = datasets.load_iris()

# A -> features and B -> label
A = iris.data
B = iris.target

# Splitting A and B into train and test data
A_train, A_test, B_train, B_test = train_test_split(A, B, random_state=0)

# Training a linear SVM classifier
svm_model_linear = SVC(kernel='linear', C=1).fit(A_train, B_train)
svm_predictions = svm_model_linear.predict(A_test)

# Model accuracy for A_test
accuracy = svm_model_linear.score(A_test, B_test)

# Creating a confusion matrix
cm = confusion_matrix(B_test, svm_predictions)
```

**39. Implement the KNN classification algorithm.**

We will use the Iris dataset for implementing the KNN classification algorithm.

```python
# KNN classification algorithm
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

iris_dataset = load_iris()
A_train, A_test, B_train, B_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(A_train, B_train)

A_new = np.array([[8, 2.5, 1, 1.2]])
prediction = kn.predict(A_new)

print("Predicted target value: {}\n".format(prediction))
print("Predicted feature name: {}\n".format(iris_dataset["target_names"][prediction]))
print("Test score: {:.2f}".format(kn.score(A_test, B_test)))
```

Output:

```
Predicted target value: [0]
Predicted feature name: ['setosa']
Test score: 0.92
```

*Come to Intellipaat’s **Machine Learning Community** if you have more queries on Machine Learning Interview Questions!*