*** astrophys has quit IRC | 00:14 | |
*** gtristan has quit IRC | 01:51 | |
*** gtristan has joined #baserock | 02:19 | |
*** bashrc has joined #baserock | 08:01 | |
*** edcragg has joined #baserock | 08:04 | |
*** rdale has joined #baserock | 08:09 | |
*** bruce_ has joined #baserock | 08:18 | |
*** anahuelamo has joined #baserock | 08:19 | |
*** jonathanmaw has joined #baserock | 08:27 | |
*** edcragg has quit IRC | 08:30 | |
*** tiagogomes has joined #baserock | 08:38 | |
*** bashrc has left #baserock | 08:39 | |
*** edcragg has joined #baserock | 08:58 | |
*** ssam2 has joined #baserock | 08:58 | |
*** ChanServ sets mode: +v ssam2 | 08:58 | |
*** locallycompact has joined #baserock | 09:39 | |
*** pedroalvarez has left #baserock | 09:56 | |
*** pedroalvarez has joined #baserock | 09:56 | |
*** ChanServ sets mode: +v pedroalvarez | 09:56 | |
paulsherwood | overlaps... | 10:01 |
paulsherwood | /tools/lib/gcc/x86_64-bootstrap-linux-gnu/4.9.2/specs | 10:01 |
pedroalvarez | yup | 10:01 |
pedroalvarez | i remember that one | 10:01 |
paulsherwood | which one should be chosen? | 10:02 |
paulsherwood | linux-api-headers? or what was there before? | 10:03 |
rjek | Which chunks are providing the overlapping files? | 10:03 |
richard_maw | that overlap is required in the bootstrap | 10:03 |
richard_maw | otherwise it can't find the tools | 10:03 |
paulsherwood | ack | 10:04 |
paulsherwood | so rdale's proposal may not work for this one | 10:04 |
pedroalvarez | we create a specs file in stage2-gcc, and then in stage2-reset-specs we overwrite it IIRC | 10:05 |
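This is a rough sketch of how overlaps like this can be detected. It assumes each chunk's installed files are available as a list of paths. Only the chunk names and the specs path come from the discussion; the file lists in any example are hypothetical. If the chunks are passed in build order, the providers for each path are listed in that order. Assuming later chunks overwrite earlier ones, as described above for stage2-reset-specs, the last provider is the one whose copy survives.

```python
from collections import defaultdict


def find_overlaps(chunk_files):
    """Return {path: [chunks...]} for every path installed by more than one chunk.

    chunk_files maps chunk name -> iterable of installed paths. Dict order is
    preserved, so if chunks are given in build order the provider lists are too.
    """
    providers = defaultdict(list)
    for chunk, files in chunk_files.items():
        for path in files:
            providers[path].append(chunk)
    # Keep only the paths that more than one chunk provides.
    return {path: chunks for path, chunks in providers.items() if len(chunks) > 1}
```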
ssam2 | the bootstrap is basically going to be a hack for as long as build-depends are transitive | 10:05 |
rdale | maybe that is a result of transitive build dependencies | 10:05 |
ssam2 | the stage2 tools need the stage1 tools, but the final tools don't need the stage2 tools | 10:05 |
paulsherwood | ack | 10:05 |
ssam2 | the stage2 tools need the stage1 tools, but the final tools don't need the stage1 tools, even | 10:05 |
jjardon | nice, GCC6 has "native" support for musl: https://gcc.gnu.org/gcc-6/changes.html | 10:06 |
paulsherwood | i always thought we should have three strata, not 1, for this | 10:07 |
pedroalvarez | patches welcome | 10:07 |
paulsherwood | heh :) | 10:07 |
paulsherwood | i'd patch, if i understood this well enough | 10:07 |
paulsherwood | i'll keep thinking about it, though | 10:08 |
pedroalvarez | it takes a while to get how all of it works, but it's possible to dig into it | 10:08 |
paulsherwood | maybe gtristan can solve it | 10:08 |
paulsherwood | :) | 10:09 |
pedroalvarez | hehe | 10:10 |
pedroalvarez | not sure, but I think he nuked build-essentials for the aboriginal work | 10:10 |
paulsherwood | sounds good to me :-) | 10:11 |
ssam2 | if only aboriginal were less complex ... | 10:18 |
paulsherwood | yup... but maybe we can shield most folks from the complexity | 10:24 |
rjek | Does anybody know why a trove's lorry controller might be sat consuming 100% of a core? | 10:28 |
rjek | strace says it's doing nothing but select(), accept()=5, fcntl(5,...), futex(), futex(), repeat | 10:29 |
rjek | Interestingly it never seems to close fd 5 | 10:29 |
rjek | (Might be in another thread?) | 10:29 |
richard_maw | rjek: I vaguely recall that happening when the database was full of historical jobs, but I added code to lorry-controller to fix that, don't know if we're running that version of course | 10:30 |
rjek | Anything I can look for that might be diagnostically useful, richard_maw? | 10:30 |
richard_maw | check the lorry controller config, if it's still got the expiry period set to a year, it'll be that problem | 10:31 |
pedroalvarez | maybe easier to check the size of the database first | 10:33 |
pedroalvarez | /home/lorry/webapp.db i think | 10:33 |
rjek | Where does the lorry controller config live? | 10:34 |
richard_maw | /etc/lorry-controller* IIRC | 10:35 |
benbrown_ | richard_maw, rjek: can't see that set anywhere, and the default is set to 3 days in lorry-controller-remove-old-jobs | 11:04 |
rjek | Nothing in /etc/lorry-controller/ contains any expiry info | 11:31 |
rjek | (sorry, got dragged into a meeting) | 11:31 |
richard_maw | rjek: it'll be set to the default then, which assuming we're running a version with the default fixed to be more reasonable, will be 3 days, and there's something else wrong instead | 11:43 |
rjek | Nod | 11:49 |
rjek | Is there any logging? | 11:49 |
rjek | Oh yes. | 11:50 |
rjek | And there's lots of it | 11:50 |
rjek | The log is full of minions asking for work, once a second or so | 11:56 |
ssam2 | how big is the ~lorry/webapp.db file ? | 12:03 |
ssam2 | I think it's normal for minions to ask for work once a second or so | 12:03 |
rjek | ~160MB | 12:03 |
ssam2 | hmm, that should be ok | 12:04 |
ssam2 | when I've seen super high CPU usage, it's normally been because that file was > 1GB | 12:04 |
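ssam2's rule of thumb can be turned into a quick check. This is a minimal sketch. It assumes the database lives at /home/lorry/webapp.db, the path suggested above. The ~1 GB figure is an anecdotal threshold from this conversation, not a documented lorry-controller limit.

```python
import os

# Anecdotal threshold from this discussion: very high CPU usage was usually
# seen once webapp.db had grown past ~1 GB. Not a documented limit.
SUSPICIOUS_DB_BYTES = 1024 ** 3


def check_webapp_db(path="/home/lorry/webapp.db"):
    """Return (size_in_mib, looks_suspicious) for the lorry-controller DB file."""
    size = os.path.getsize(path)
    return size / (1024 ** 2), size > SUSPICIOUS_DB_BYTES
```

For the ~160MB file reported here, the second value would be False, which agrees with ssam2's "that should be ok".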
*** CTtpollard has quit IRC | 12:16 | |
*** gtristan has quit IRC | 12:17 | |
*** CTtpollard has joined #baserock | 12:19 | |
rjek | This is the loop it's sat in: http://www.rjek.com/p/34677567.txt | 12:24 |
jjardon | paulsherwood: do you know if https://github.com/devcurmudgeon/ybd/issues/205 is easily reproducible? I have ybd and kbas running and I'm not getting any error so far | 12:30 |
rjek | lorry 353 86.6 0.2 503036 41944 ? Sl 11:55 35:33 /usr/bin/python /usr/bin/lorry-controller-webapp --config=/etc/lorry-controller/webapp.conf | 12:37 |
rjek | Hmm, so it's the webapp, not the controller itself | 12:37 |
richard_maw | hmm, don't recall anything the webapp would be doing that could peg the CPU like that | 12:38 |
rjek | -sh: tcpdump: not found | 12:38 |
rjek | boo | 12:38 |
rjek | bottle is threaded | 12:39 |
* rjek straces with all the threads | 12:39 | |
ssam2 | if it's using 100% CPU, probably whatever the problem is will be inside Python itself | 12:40 |
ssam2 | and may not be calling any syscalls | 12:40 |
ssam2 | the only way I know of to live-debug a Python process is to attach gdb, which is not exactly nice | 12:41 |
ssam2 | although there may be some helper scripts i don't know about | 12:41 |
*** anahuelamo has quit IRC | 12:41 | |
rjek | http://www.rjek.com/p/79731e3e.txt | 12:41 |
rjek | 661's FD 5 is constantly changing | 12:42 |
rjek | 6 is the db | 12:42 |
rjek | so it's thrashing the DB for some reason | 12:42 |
* rjek wonders if it's burning a hole in his shiny expensive SSDs :( | 12:43 | |
rjek | Oh, it's reading. :) | 12:43 |
richard_maw | so, just the atime metadata updates to worry about then? | 12:45 |
ssam2 | could be an sqlite bug even with the small db. you could move the webapp.db file out of the way, and see if the problem goes away | 12:45 |
rjek | Looks like something is making a great deal of requests for data from it | 12:45 |
rjek | (i.e., every time I ls /proc/X/fd/, the FD the thread is handling is different) | 12:45 |
rjek | ssam2: What are the risks in moving the db out of the way? | 12:46 |
richard_maw | breaking it completely if you haven't stopped it all first | 12:46 |
rjek | ie, will everything explode into a pile, or will it recreate the DB using information stored elsewhere and continue? | 12:46 |
richard_maw | everything will explode in a pile since it will fail to initialise the DB correctly if it goes away at runtime rather than startup | 12:47 |
pedroalvarez | stop webapp service, remove webapp.db, restart services (but there are a few, maybe better reboot) | 12:48 |
rjek | -sh: service: not found | 12:48 |
rjek | Bah | 12:48 |
pedroalvarez | systemctl!!! | 12:49 |
rjek | What ludicrous system-specific syntax do I need for that? | 12:49 |
pedroalvarez | systemctl | grep webapp (to find out the name) | 12:49 |
pedroalvarez | systemctl stop lorry-controller-webapp.service (IIRC) | 12:49 |
ssam2 | if it breaks completely you can just restart the webapp process | 12:50 |
richard_maw | not entirely | 12:51 |
richard_maw | if it breaks the wrong way it will have an uninitialised database and won't notice it needs to initialise it | 12:51 |
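The order pedroalvarez describes matters because of richard_maw's warning: stop the webapp first, then move the database aside, then restart. This is a minimal sketch of the middle step, assuming the same path as above. The timestamped backup name is an illustrative choice, and it keeps the old file for later inspection. It also relies on the webapp initialising a fresh database at startup, which is implied above.

```python
import os
import time


def move_db_aside(path="/home/lorry/webapp.db"):
    """Rename the DB to a timestamped backup and return the backup path.

    Only do this after the webapp service has been stopped: if the file
    disappears at runtime, the webapp may be left with an uninitialised
    database that it never notices it needs to initialise.
    """
    backup = f"{path}.{time.strftime('%Y%m%d-%H%M%S')}.bak"
    os.rename(path, backup)
    return backup
```

On the trove, `systemctl stop lorry-controller-webapp.service` would come first. That unit name is only recalled above, so confirm it with `systemctl | grep webapp`. A restart of the services, or a reboot, would come afterwards.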
rjek | Right, stopping webapp, moving db out of the way, rebooting has resulted in a somewhat quieter system | 12:52 |
rjek | Let's wait and see if it is a /working/ system | 12:52 |
rjek | thanks all | 12:53 |
paulsherwood | locallycompact: do you have any more info about https://github.com/devcurmudgeon/ybd/issues/205 ? | 12:54 |
paulsherwood | i've not seen it anywhere else | 12:54 |
pedroalvarez | rjek: I suggest having a look later today | 12:57 |
rjek | pedroalvarez: I've kept the db files anyway | 12:58 |
*** anahuelamo has joined #baserock | 13:06 | |
*** CTtpollard has quit IRC | 14:02 | |
*** gtristan has joined #baserock | 14:03 | |
*** CTtpollard has joined #baserock | 14:16 | |
*** astrophys has joined #baserock | 14:49 | |
*** ssam2 has quit IRC | 14:53 | |
*** ssam2 has joined #baserock | 15:07 | |
*** ChanServ sets mode: +v ssam2 | 15:07 | |
jjardon | paulsherwood: Hi, what do the 3 numbers mean in "[0/20/291] [xorg-lib-libfontenc]"? I guess the last one is the total number of chunks to build, but what about the other 2? | 15:09 |
richard_maw | I think it's meant to be "Current job/number of jobs to complete/total number of components in system" | 15:28 |
richard_maw | So the middle number doesn't include jobs that are already done | 15:28 |
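This sketch reads the progress prefix using richard_maw's interpretation, which paulsherwood confirms below: current job / jobs still to complete / total components in the system. The regex is based only on the single example line quoted here, not on ybd's source. The `already_done` field is an inference from "the middle number doesn't include jobs that are already done".

```python
import re

# Matches e.g. "[0/20/291] [xorg-lib-libfontenc]"; pattern inferred from that example.
_PROGRESS = re.compile(r"\[(\d+)/(\d+)/(\d+)\]\s+\[([^\]]+)\]")


def parse_progress(line):
    """Split a ybd-style progress prefix into its parts, or return None."""
    m = _PROGRESS.search(line)
    if m is None:
        return None
    current, to_build, total = (int(g) for g in m.group(1, 2, 3))
    return {
        "current": current,          # current job
        "to_build": to_build,        # jobs still to complete
        "total": total,              # total components in the system
        "already_done": total - to_build,  # inferred, see lead-in
        "component": m.group(4),
    }
```

For the line jjardon quotes, this gives 20 jobs still to build out of 291 components, implying 271 already done.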
*** bruce_ has quit IRC | 15:51 | |
*** anahuelamo has quit IRC | 15:51 | |
*** anahuelamo has joined #baserock | 15:52 | |
*** bwh_ has joined #baserock | 15:54 | |
*** fay_ has quit IRC | 16:17 | |
*** jonathanmaw has quit IRC | 16:33 | |
paulsherwood | yup :) | 16:51 |
*** ssam2 has quit IRC | 16:52 | |
*** tiagogomes has quit IRC | 17:04 | |
*** franred has quit IRC | 17:13 | |
*** edcragg has quit IRC | 17:15 | |
*** anahuelamo has quit IRC | 17:27 | |
*** gtristan has quit IRC | 17:30 | |
*** locallycompact has quit IRC | 17:44 | |
*** cosm has quit IRC | 18:43 | |
*** edcragg has joined #baserock | 20:19 | |
*** gtristan has joined #baserock | 20:31 | |
*** edcragg has quit IRC | 20:51 | |
*** edcragg has joined #baserock | 21:45 | |
*** edcragg has quit IRC | 23:11 | |
*** gtristan has quit IRC | 23:53 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!