*** zoli__ has joined #baserock | 01:33 | |
*** zoli__ has quit IRC | 03:03 | |
*** zoli__ has joined #baserock | 03:04 | |
*** zoli__ has quit IRC | 04:37 | |
*** zoli__ has joined #baserock | 04:38 | |
*** zoli__ has quit IRC | 04:52 | |
*** zoli__ has joined #baserock | 04:52 | |
*** zoli__ has quit IRC | 05:33 | |
*** paulw has joined #baserock | 06:16 | |
*** Walkerdine has quit IRC | 06:51 | |
*** zoli__ has joined #baserock | 07:22 | |
*** zoli__ has quit IRC | 07:36 | |
*** zoli__ has joined #baserock | 07:36 | |
*** toscalix has joined #baserock | 07:55 | |
*** bashrc has joined #baserock | 08:00 | |
*** zoli__ has quit IRC | 08:11 | |
*** rdale has joined #baserock | 08:21 | |
*** mariaderidder has joined #baserock | 08:24 | |
*** radiofree has quit IRC | 08:27 | |
*** paulw has quit IRC | 08:29 | |
*** radiofree has joined #baserock | 08:29 | |
*** paulw has joined #baserock | 08:30 | |
*** toscalix has quit IRC | 08:40 | |
*** toscalix__ has joined #baserock | 08:52 | |
*** radiofree has quit IRC | 08:54 | |
*** radiofree has joined #baserock | 08:56 | |
*** zoli__ has joined #baserock | 08:56 | |
*** zoli__ has quit IRC | 09:01 | |
*** lmackenzie75 has joined #baserock | 09:16 | |
*** cyndis has quit IRC | 10:30 | |
*** paulw has quit IRC | 11:03 | |
*** zoli__ has joined #baserock | 11:03 | |
*** zoli__ has quit IRC | 11:08 | |
*** zoli__ has joined #baserock | 11:22 | |
*** zoli___ has joined #baserock | 11:28 | |
*** zoli__ has quit IRC | 11:31 | |
*** zoli___ has quit IRC | 11:38 | |
*** zoli__ has joined #baserock | 11:47 | |
paulsherwood | yay! | 11:49 |
---|---|---|
* rjek wonders what paulsherwood is celebrating. | 11:50 | |
pedroalvarez | <Walkerdine> Now I'm finally building an image for real this time | 11:50 |
pedroalvarez | but he left | 11:50 |
rjek | aha | 11:50 |
*** paulw has joined #baserock | 12:35 | |
*** toscalix__ is now known as toscalix | 12:36 | |
*** cyndis has joined #baserock | 12:37 | |
*** gary_perkins has quit IRC | 12:43 | |
*** gary_perkins_ has joined #baserock | 12:43 | |
*** gary_perkins_ has quit IRC | 12:55 | |
*** gary_perkins has joined #baserock | 12:56 | |
*** gary_perkins has quit IRC | 12:59 | |
*** cyndis has quit IRC | 13:00 | |
*** tiagogomes_ has quit IRC | 13:01 | |
*** gary_perkins has joined #baserock | 13:02 | |
*** zoli__ has quit IRC | 13:03 | |
*** gary_perkins has quit IRC | 13:06 | |
*** gary_perkins has joined #baserock | 13:08 | |
*** gary_perkins has quit IRC | 13:11 | |
*** gary_perkins has joined #baserock | 13:14 | |
*** tiagogomes_ has joined #baserock | 13:15 | |
*** gary_perkins has quit IRC | 13:17 | |
*** paulw has quit IRC | 13:30 | |
*** paulw has joined #baserock | 13:31 | |
*** cyndis has joined #baserock | 13:32 | |
*** toscalix has quit IRC | 13:35 | |
*** toscalix__ has joined #baserock | 13:35 | |
*** Walkerdine has joined #baserock | 13:41 | |
*** paulw has quit IRC | 13:47 | |
*** paulw has joined #baserock | 13:48 | |
*** Walkerdine__ has joined #baserock | 13:49 | |
*** Walkerdine has quit IRC | 13:52 | |
*** zoli__ has joined #baserock | 14:27 | |
*** zoli___ has joined #baserock | 14:33 | |
*** zoli__ has quit IRC | 14:33 | |
*** Walkerdine__ has quit IRC | 14:52 | |
*** Walkerdine__ has joined #baserock | 14:57 | |
*** zoli__ has joined #baserock | 14:59 | |
*** zoli___ has quit IRC | 14:59 | |
*** tiagogomes_ has quit IRC | 15:05 | |
*** fay_ has quit IRC | 15:05 | |
*** fay_ has joined #baserock | 15:06 | |
*** gary_perkins has joined #baserock | 15:10 | |
*** Walkerdine has joined #baserock | 15:10 | |
Walkerdine | Yeah finally. I decided to just use a usb hub to attach the ssd | 15:11 |
radiofree | that's not exactly going to be very performant | 15:12 |
Walkerdine | Why not? | 15:15 |
rjek | USB's slow compared to SATA | 15:16 |
rjek | 480Mbps, polled IO versus 3Gbps interrupt-driven | 15:16 |
Walkerdine | Does it matter if I'm using usb 3.0 or not? | 15:16 |
*** tiagogomes_ has joined #baserock | 15:17 | |
radiofree | i don't think usb 3 works with that kernel (on a jetson) | 15:18 |
Walkerdine | Oh well I could never get it working by just plugging it into the sata port | 15:20 |
rjek | Well, slow is better than not working | 15:21 |
*** tiagogomes_ has quit IRC | 15:21 | |
*** zoli__ has quit IRC | 15:22 | |
*** zoli___ has joined #baserock | 15:22 | |
radiofree | Walkerdine: did you power the drive? | 15:24 |
radiofree | like so https://richardstechnotes.files.wordpress.com/2014/07/tk1_disk.jpg | 15:25 |
Walkerdine | Yeah I powered the drive. I spent over a week trying to get that drive to work on the sata port | 15:26 |
Walkerdine | I've tried with 2 different SSDs, 2 different Jetsons, and a couple of different versions of baserock | 15:26 |
Walkerdine | I reinstalled the original software on the jetson and the drives worked fine | 15:27 |
rjek | When I googled for this, there appeared to be some manufacturing variance related to the SATA port on those boards | 15:31 |
Walkerdine | It connected once but I got errors when I tried to mount it. I rebooted and I never got it working again | 15:31 |
wdutch | paulsherwood: any idea what's going wrong here? http://52.19.1.31:8010/builders/trigger_builders/builds/20/steps/shell_1/logs/stdio (you have to scroll to the bottom for the ybd output) | 16:04 |
paulsherwood | wdutch: is its trove working properly? | 16:08 |
paulsherwood | Cloning into bare repository '/src/gits/git___cu010_trove_codethink_com_delta_binutils_redhat.tmp'... | 16:08 |
paulsherwood | connected. | 16:08 |
paulsherwood | HTTP request sent, awaiting response... 404 Not Found | 16:08 |
paulsherwood | 2015-09-10 15:58:10 ERROR 404: Not Found. | 16:08 |
paulsherwood | tar: git___cu010_trove_codethink_com_delta_binutils_redhat.tar: Cannot open: No such file or directory | 16:08 |
paulsherwood | tar: Error is not recoverable: exiting now | 16:08 |
paulsherwood | (for example) | 16:08 |
wdutch | it's a bit slow | 16:09 |
Kinnison | that's just a name mismatch | 16:09 |
Kinnison | paulsherwood: purely a name mismatch, just slows down fetches | 16:10 |
paulsherwood | i think i've not tested the parallelisation with no gits | 16:10 |
* Kinnison could correct that with a bit of poking but it only slows initial clone | 16:10 | |
paulsherwood | for your use-case are you expecting to download all gits every time or could they be pre-populated? | 16:11 |
Kinnison | We're expecting the git cache to be populated in the general case | 16:11 |
Kinnison | It'd hurt immensely to download every time | 16:11 |
Kinnison | IMO | 16:11 |
paulsherwood | so in this case the fastest hack would be to remove any .tmp directories in base/gits, and set git-server: to something that doesn't work | 16:15 |
paulsherwood | that will force all repos to be cloned at the start, rather than during the builds | 16:15 |
paulsherwood | and thereafter things should work tidily | 16:15 |
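The first half of that hack (clearing out leftover `.tmp` clone directories) could be sketched as below. This is only an illustration: `remove_stale_clones` is a hypothetical helper, not part of ybd, and the gits directory is passed in rather than assumed.

```python
import os
import shutil


def remove_stale_clones(gits_dir):
    """Delete leftover '<name>.tmp' directories from interrupted clones.

    Hypothetical helper: only directories whose name ends in '.tmp' are
    removed; regular files and finished clones are left alone.
    """
    removed = []
    for entry in sorted(os.listdir(gits_dir)):
        path = os.path.join(gits_dir, entry)
        if entry.endswith(".tmp") and os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(entry)
    return removed
```

Calling `remove_stale_clones("/src/gits")` on the builder would return the names it removed; changing the `git-server:` setting is a separate manual step and is not shown here.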
Kinnison | paulsherwood: that sounds icky | 16:16 |
paulsherwood | i can improve it, certainly. and will do so | 16:16 |
Kinnison | so you think that not having all the gits up-front will cause issues? | 16:16 |
richard_maw | I'm stumped as to why the build is failing, AFAICT the last thing attempted was a link command, which should not have anything to do with its build environment, and we've not made any changes beneath it from firehose | 16:16 |
Kinnison | /src/tmp/tmpv_m1YT/stage1-gcc.build/missing: line 81: makeinfo: command not found | 16:17 |
paulsherwood | Kinnison: i'm unsure. as i said, i've not tested the parallel case without gits | 16:17 |
Kinnison | WARNING: 'makeinfo' is missing on your system. | 16:17 |
Kinnison | You should only need it if you modified a '.texi' file, or | 16:17 |
Kinnison | any other file indirectly affecting the aspect of the manual. | 16:17 |
Kinnison | You might want to install the Texinfo package: | 16:17 |
Kinnison | <http://www.gnu.org/software/texinfo/> | 16:17 |
Kinnison | The spurious makeinfo call might also be the consequence of | 16:17 |
Kinnison | using a buggy 'make' (AIX, DU, IRIX), in which case you might | 16:17 |
Kinnison | want to install GNU make: | 16:17 |
Kinnison | <http://www.gnu.org/software/make/> | 16:17 |
Kinnison | make[4]: *** [../../../gmp/doc/gmp.info] Error 127 | 16:17 |
Kinnison | I think that's more likely the issue | 16:17 |
Kinnison | which's odd | 16:17 |
richard_maw | it's a warning and the associated message says "you'll be fine as long as you don't modify .texi files" | 16:18 |
Kinnison | and then proceeds to error 127 | 16:18 |
richard_maw | hmm | 16:18 |
Kinnison | I think it's a case of parallel make confusing the error message | 16:18 |
* Kinnison installs texinfo onto the AWS box | 16:19 | |
Kinnison | richard_maw: can you try causing builders to trigger again? | 16:19 |
wdutch | I can | 16:19 |
Kinnison | wdutch: ta | 16:20 |
wdutch | off it goes | 16:20 |
* Kinnison wonders if we can persuade it to run as though under a pty | 16:21 | |
Kinnison | so we get line-buffered logging rather than 4k or whatever it is | 16:21 |
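One way to get that effect, sketched here under assumptions, is to attach the child's stdout and stderr to a pseudo-terminal so that programs which check `isatty()` choose line buffering. `run_under_pty` is a hypothetical name; how the builder actually launches ybd is not shown in this log.

```python
import os
import pty
import subprocess


def run_under_pty(argv):
    """Run argv with stdout/stderr on a pty; return (exit_code, output_bytes).

    Hypothetical helper. Output arrives with terminal line endings (CR LF).
    """
    master, slave = pty.openpty()
    proc = subprocess.Popen(
        argv, stdin=subprocess.DEVNULL, stdout=slave, stderr=slave
    )
    os.close(slave)  # the child keeps its own copy of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master, 4096)
        except OSError:
            break  # Linux reports EIO once the child has closed the slave
        if not data:
            break  # other platforms signal end of output with an empty read
        chunks.append(data)
    os.close(master)
    return proc.wait(), b"".join(chunks)
```

A real log collector would forward each chunk as it arrives instead of gathering everything in memory.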
*** paulw has quit IRC | 16:22 | |
paulsherwood | is it working, wdutch ? | 16:26 |
wdutch | it died :( | 16:26 |
wdutch | http://52.19.1.31:8010/builders/trigger_builders/builds/23/steps/shell_1/logs/stdio | 16:26 |
Kinnison | ENOSPC | 16:27 |
Kinnison | I wonder what's using up all the room on that 100G partition I set aside for it | 16:27 |
* Kinnison decides to blow away all the old artifacts etc | 16:27 | |
paulsherwood | Kinnison: blow away /src/tmp/* first | 16:27 |
Kinnison | paulsherwood: I'm clearing it down in general | 16:27 |
paulsherwood | ok | 16:28 |
Kinnison | paulsherwood: since we *know* it's going to build from scratch (since it's currently dying on stage1-gcc) | 16:28 |
* Kinnison would rather clear it down | 16:28 | |
* Kinnison is wondering what the right approach to clearing out this stuff will be longer-term | 16:28 | |
paulsherwood | it leaves tmp in case folks need to explore the debris. | 16:28 |
* paulsherwood is wondering too | 16:29 | |
Kinnison | richard_maw: If you have access to the relevant scripts, could you have it clear down /src/tmp before running builds? | 16:29 |
richard_maw | erm, maybe | 16:29 |
* Kinnison is currently clearing out /src/tmp | 16:29 | |
paulsherwood | it would be easy enough to blat all tmp on start, assuming no other ybds are active | 16:29 |
Kinnison | We're limiting ourselves to one top-level builder per slave | 16:30 |
*** mariaderidder has quit IRC | 16:30 | |
paulsherwood | ack | 16:30 |
paulsherwood | so that would be the best bet then, have it blat tmp first of all | 16:30 |
* Kinnison nods | 16:30 | |
paulsherwood | i can add that with a config option | 16:30 |
* Kinnison also clears out all the cached gits from the wrong trove :-) | 16:30 | |
* Kinnison removes an old backup of /src too | 16:31 | |
Kinnison | such cruft accumulated in one week | 16:31 |
paulsherwood | what's the best way to check if a program foo is running, in a python program? | 16:31 |
Kinnison | there, 91G spare | 16:31 |
Kinnison | richard_maw: If you can cause a builder run again, we can see if it gets further | 16:31 |
Kinnison | paulsherwood: Hmm | 16:32 |
Kinnison | paulsherwood: well you could use pgrep | 16:32 |
richard_maw | paulsherwood: that's a deceptively difficult question to solve correctly | 16:32 |
jmacs | paulsherwood: There are libraries, but nothing built in that I know of. Either iterating over /proc with os.listdir or calling pgrep using subprocess | 16:32 |
* paulsherwood finds psutil | 16:33 | |
richard_maw | that will tell you if a process with the name you think your program should have is running | 16:33 |
paulsherwood | ack | 16:33 |
Kinnison | richard_maw: now the ybd stuff seems to be expecting .co.uk | 16:33 |
paulsherwood | which is not necessarily safe :) | 16:33 |
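The `/proc` approach jmacs mentions might look roughly like this. It is a Linux-only sketch with hypothetical function names, and it has exactly the weakness richard_maw points out: it only says whether some process with that name exists right now.

```python
import os


def running_process_names(proc_root="/proc"):
    """Collect the command names of all processes listed under proc_root."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        comm = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm, errors="replace") as f:
                names.add(f.read().strip())
        except OSError:
            continue  # the process exited between listdir() and open()
    return names


def is_running(name, proc_root="/proc"):
    """True if a process with this command name exists (racy by nature)."""
    return name in running_process_names(proc_root)
```

Note that the kernel truncates the `comm` name to 15 characters, so long program names need to be compared in truncated form.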
Kinnison | richard_maw: buh | 16:33 |
Kinnison | richard_maw: where is the ybd config for the builder, do you know? | 16:34 |
richard_maw | Kinnison: in pdar's homedir I think | 16:34 |
paulsherwood | Kinnison: in order of priority . ybd/ ybd/config | 16:34 |
Kinnison | pdar: what is the trove configured to? | 16:35 |
paulsherwood | (and only ybd/config actually contains a ybd.conf file by default) | 16:35 |
Kinnison | pdar: it should be cu010-trove.codethink.com | 16:35 |
*** zoli___ has quit IRC | 16:35 | |
*** zoli__ has joined #baserock | 16:35 | |
pdar | oh, i changed it to .co.uk at about 17:15 | 16:36 |
pdar | I think i misunderstood what was going on | 16:36 |
Kinnison | pdar: doh | 16:36 |
Kinnison | pdar: No, the *trove* was misconfigured, not ybd :-) | 16:36 |
pdar | ahh I'll change it back then | 16:37 |
* Kinnison was perhaps not clear enough | 16:37 | |
*** Walkerdine has quit IRC | 16:37 | |
paulsherwood | note it's probably more predictable to create a new ybd/ybd.conf and leave ybd/config/ybd.conf as default (so git pulls won't get worried) | 16:37 |
richard_maw | paulsherwood: pretty much the only reliable options on linux are to have been the one to spawn it, at which point you can look up its status by PID safely since the PID can't be re-used until you reap it, or for you to stuff the PID into a cgroup that nothing else is allowed to change, or to defer this logic to something else. This is one of the reasons why systemd exists. | 16:38 |
paulsherwood | richard_maw: ack. so i'd better not go down this rabbithole | 16:38 |
richard_maw | I'm lacking context, what is the problem you want to solve? | 16:38 |
paulsherwood | working out whether it's safe to blat tmp dir on starting ybd | 16:39 |
Kinnison | paulsherwood: that's what lockfiles are for | 16:39 |
Kinnison | paulsherwood: and there's lots and lots of good research and implementations for lockfiles | 16:39 |
richard_maw | or take a lock on the directory itself | 16:39 |
* paulsherwood will have to google, then | 16:40 | |
paulsherwood | all of my experience of lock files in former lives was cleaning up messes when locks got left around by accident | 16:41 |
richard_maw | paulsherwood: that's why you use posix lock file descriptors, rather than lock files | 16:41 |
richard_maw | paulsherwood: the lifetime of the lock is maximally bound by the process, so when your process dies, so does the lock | 16:42 |
richard_maw | paulsherwood: and if you crash *hard* you don't leave locks around | 16:42 |
richard_maw | I think Tiago did locking for tempdirs in morph | 16:42 |
* richard_maw saw patches | 16:42 | |
Kinnison | well stage1-gcc built that time | 16:43 |
*** Walkerdine__ has quit IRC | 16:43 | |
Kinnison | so I think we stand a chance \o/ | 16:43 |
paulsherwood | richard_maw: was this recent? | 16:45 |
richard_maw | fairly | 16:45 |
richard_maw | paulsherwood: http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/stagingarea.py#n63 for making sure the staging area can't be removed by gc while in use, and http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/plugins/gc_plugin.py#n94 for removing it | 16:45 |
paulsherwood | richard_maw: tvm! | 16:45 |
richard_maw | paulsherwood: you don't have to go to staging area granularity, just open /src/tmp instead | 16:45 |
paulsherwood | agreed | 16:46 |
* Kinnison thinks we care enough about the result that he'll leave this builder on overnight | 16:48 | |
richard_maw | paulsherwood: you need to know when you want a shared lock and when you want an exclusive lock, and when you want to do a blocking lock, or a non-blocking lock | 16:48 |
richard_maw | when using it you'll want a shared lock, so multiple YBD instances can use it | 16:49 |
richard_maw | and when removing it you want to take an exclusive lock | 16:49 |
paulsherwood | ack | 16:49 |
richard_maw | so only the thing removing it can use it | 16:49 |
* richard_maw departs | 16:50 | |
* paulsherwood too | 16:50 | |
richard_maw | though locks are interesting so I may be around to talk about them later | 16:50 |
* paulsherwood too | 16:50 | |
*** zoli__ has quit IRC | 16:55 | |
*** bashrc has quit IRC | 17:02 | |
richard_maw | Ok, locking! It's better to be explicit and have processes take a lock, than leave it implied that "ybd.py" processes lock removal of the tempdir. | 17:39 |
*** Walkerdine has joined #baserock | 17:40 | |
richard_maw | You could make a lock file, but that causes interesting problems if you might have multiple processes needing the directory | 17:40 |
richard_maw | and problems of processes needing to clean up the lock file correctly after they're done | 17:41 |
richard_maw | and with abnormal termination you can't get rid of them | 17:41 |
richard_maw | so you can use flock() (fcntl.flock() in python) on an open file descriptor to take an advisory lock on a file | 17:42 |
richard_maw | this is advisory locking, since nothing prevents you going behind the program's back and removing the directory anyway | 17:42 |
richard_maw | you need programs that may decide to clean up the directory to ask if they are allowed to | 17:43 |
richard_maw | (there is mandatory locking in Linux, but it's a bad idea) | 17:43 |
richard_maw | (one reason why Windows updates are such a pain is that mandatory locking is the norm, so to apply updates you have to shut everything down) | 17:44 |
richard_maw | by making locks file descriptor based, you can guarantee that it is eventually released | 17:44 |
richard_maw | as if your process dies, you release the lock | 17:44 |
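A minimal sketch of the descriptor-based advisory lock described above, using `fcntl.flock()` on a directory. `try_exclusive_lock` is a hypothetical helper name; the lock lasts only as long as the returned descriptor stays open, so it disappears when the process exits for any reason.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Return an fd holding an exclusive advisory lock on path, or None.

    Hypothetical helper. Closing the fd (or the process dying) releases
    the lock, so no stale lock file is ever left behind.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Because the lock is advisory, it only protects against other programs that also ask for the lock before touching the directory.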
richard_maw | in normal cases you would take an exclusive lock when writing and a shared lock when reading (hence why some literature calls them read locks and write locks) | 17:45 |
richard_maw | morph makes the staging areas gc safe by taking an exclusive lock when creating the staging area, leaving the file descriptor open while it is in use, then removing it and closing the file descriptor when it is no longer needed | 17:47 |
richard_maw | this relies on the linux behaviour of letting you remove a file when you have it open | 17:47 |
richard_maw | when morph wants to gc staging areas it can know that one is in use by attempting to take an exclusive, non-blocking lock | 17:48 |
richard_maw | if the lock fails, it can move on to the next staging area to remove | 17:48 |
richard_maw | if the lock succeeds, it can proceed to clean up the directory | 17:49 |
richard_maw | if you ran another gc during this time they could not confuse each other by both trying to clean up the staging area | 17:49 |
richard_maw | since you don't care about this fine grained locking, you would, on start, attempt to open the directory for reading, and if this fails, skip trying to remove it | 17:51 |
richard_maw | then you'd take an exclusive lock to clean up its contents | 17:51 |
richard_maw | do this non-blocking if you want to make ybd report an error, or carry on without trying to remove it | 17:52 |
richard_maw | if taking the exclusive lock fails you should take a read lock, so nothing else tries to remove it while you are using it | 17:52 |
richard_maw | if you took the exclusive lock, then you can remove all its contents, and convert your exclusive lock to a shared lock | 17:53 |
richard_maw | you should also take a read lock when you first create the directory | 17:56 |
richard_maw | the point of all this locking is that when another process is cleaning it up, that process holds an exclusive lock, at which point you can take a blocking read lock, and wait for it to be tidied up | 17:57 |
*** lmackenzie75 has quit IRC | 17:57 | |
richard_maw | or it can be in use, at which point you can decide whether you want to wait for everything to finish so you can clean it up (blocking exclusive lock), proceed without cleaning it up, or terminate (non-blocking exclusive lock) | 17:58 |
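The start-up sequence outlined above could be sketched as follows. This is not the code from ybd or morph; `claim_tmpdir` is a hypothetical name, and it takes the "carry on without cleaning" choice when the exclusive lock is unavailable.

```python
import fcntl
import os
import shutil


def claim_tmpdir(path):
    """Lock path for use, emptying it first if nobody else holds a lock.

    Returns (fd, cleaned). Keep fd open for as long as the directory is
    in use; closing it drops the shared lock.
    """
    os.makedirs(path, exist_ok=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone is using or cleaning the directory: wait for a shared
        # (read) lock and proceed without removing anything.
        fcntl.flock(fd, fcntl.LOCK_SH)
        return fd, False
    # We hold the only lock, so the contents are safe to remove.
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)
    # Convert exclusive to shared so other instances can join in.
    # flock() does not guarantee that this conversion is atomic.
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd, True
```

Dropping `LOCK_NB` from the first call would give the "wait for everything to finish" behaviour instead, and raising an error in the `except` branch would give the "terminate" behaviour.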
richard_maw | I'm willing to do code review for your locking code, since I was the one to suggest it to tiago and I reviewed his code for morph | 18:00 |
paulsherwood | ok thanks - i'll get to it later today | 18:17 |
*** zoli__ has joined #baserock | 18:45 | |
*** zoli___ has joined #baserock | 19:00 | |
*** zoli__ has quit IRC | 19:00 | |
*** zoli___ has quit IRC | 20:07 | |
*** zoli__ has joined #baserock | 20:07 | |
*** Walkerdine has quit IRC | 20:09 | |
*** toscalix__ has quit IRC | 20:41 | |
*** toscalix_ has joined #baserock | 20:41 | |
*** Walkerdine has joined #baserock | 21:09 | |
*** zoli__ has quit IRC | 21:23 | |
*** Walkerdine has quit IRC | 21:35 | |
*** Walkerdine has joined #baserock | 21:41 | |
*** zoli__ has joined #baserock | 22:04 | |
*** Walkerdine has quit IRC | 22:45 | |
paulsherwood | richard_maw: https://github.com/devcurmudgeon/ybd/compare/clean-tmp-with-locks seems to work | 22:51 |