IRC logs for #buildstream for Monday, 2020-06-29

*** tristan has quit IRC07:18
*** benschubert has joined #buildstream07:54
*** phildawson has joined #buildstream08:21
*** santi has joined #buildstream08:41
*** tomaz has joined #buildstream09:20
*** tristan has joined #buildstream09:33
*** ChanServ sets mode: +o tristan09:33
*** tristan has quit IRC09:41
*** tristan has joined #buildstream10:49
*** ChanServ sets mode: +o tristan10:49
douglaswinshipis there a good way to find out how large a file is, which you're in a "bst shell" shell? I tried running du, and it said everything was 0 Bytes.10:55
douglaswinshipI'm guessing that's got something to do with the "virtual" part of the virtual directory system.10:56
juergbidouglaswinship: have you already tried 'ls -l' or 'du --apparent-size'?10:57
juergbiI assume this is on master with buildbox-fuse10:58
juergbiwe may be able to more closely emulate a block device to fix `du`10:58
douglaswinshipjuergbi: aha, --apparent-size works. Thanks!10:59
tristanbenschubert, any further thoughts on optimizing _variables.pyx after my latest comment ? (A) do you think the added memory is a concern ? (B) do you think it worthwhile to reuse cached `Variables` collections on elements where they would be resolved identically ?11:12
tristanI also thought of hashing Values based on their dependency ValueClasses and reusing there, but that would certainly be more costly than just resolving them11:13
tristanThere may also be something to do with provenance identity comparisons11:13
tristanif two Elements have a collection of loaded Values which are essentially equal, and every named Variable has the same provenances, we can just reuse a fully resolved Variables collection11:14
* tristan has tried using python deque() instead of cython list, but that adds overhead in valentind's test sample data11:15
benschuberttristan:11:17
benschubertA) It is, but I think we can live with it if the code is faster and easier to maintain11:17
benschubert2) I don't think it would be easy to do that nicely11:17
benschubertWhat I'm wondering is, can we do it in two passes? One as efficient as possible, which detects an error state but can't give a nice error, and then another one we fallback on on error?11:17
benschubertI might be able to look at it more in details this afternoon if needed11:17
tristanbenschubert, For the memory overhead, what we really gain is provenance of all values in a form that is easy to reach, so actually it will be very easy to do the whole cyclic variable report as is11:19
benschubertSo it's useless apart for the report?11:20
tristanThere may be a straight forward way to make that go away11:20
benschubertCould we easily do two passes then?11:20
tristanI.e. have it around in temporary objects only there for the duration of the resolution of variables11:20
tristanit's not a whole provenance object, just an address to it11:20
tristanI'm trying to think of how we can get similar performance without complicating the code with a second pass :-S11:21
tristanI mean, we can of course double the code11:21
valentindIs not deque using linked lists? Is not it only better than vectors when doing multi threaded operations?11:22
tristanI was under the impression that deque had efficient pop() due to saving a pointer to the last element11:22
valentindThe O(1) insertion does not really apply anymore on modern hardware.11:22
tristanwhereas I think of a list having similar properties to a single-linked list11:22
valentindI thought python "list" where vectors.11:23
tristanBut I don't know the underlying properties of list11:23
tristanArrays ? I don't think so11:23
tristanOh hi valentind11:23
tristanI see that you are also not benschubert :)11:23
* tristan continues a conversation and notices that the nick of who he is talking to has changed ;-)11:23
valentindThere are been papers showing vectors were much better than linked list.11:24
benschubertlists in python have a hard time changing size (at 4,7, 13,....) as they include a copy of existing data11:24
valentindI do not think linked lists are used much anymore11:24
benschubertso if you are not needing indexing + appending/poping a lot, a deque is what you want11:24
tristanYeah11:24
tristanBut using python deque in cython is a big overhead11:24
benschubertbecause cython doesn't know how to access the C interface directly, and thus calls to python11:25
tristanAnd using deque `from libcpp.deque cimport deque` is undocumented and impossible to achieve :)11:25
benschuberthttps://wiki.python.org/moin/TimeComplexity is an amazing page when needing stuff like that11:25
tristanit complains that it wants the types of the queue elements to be strongly typed11:25
benschuberttristan: are you sure about deque: https://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html#standard-library ?11:26
tristanso it should be something like `cdef deque[Tuple(str, ProvenanceInformation, set)]`11:26
valentindhttps://www.youtube.com/watch?v=TJHgp1ugKGM&feature=youtu.be&t=294811:26
benschubertvalentind: that's C++11:27
valentindThat is the same CPU.11:27
tristanbenschubert, I found something like that (probably the same thing rendered in pdf in ugly googledocs), but not seeing how to use deque() from there really11:27
tristanbenschubert, but looking at the vector for instance, notice `cdef vector[int] vect`11:28
tristanSo if you want a collection of `int`, which can be defined as `cdef deque[int] variable_name`, that works11:28
benschuberttristan: you could deque[Tuple]11:28
benschubertand then when you pop you cast the tuple11:29
tristanHmmm, lets try that, where do I import `Tuple` from ?11:29
benschubertbecause tuple is actually a PyOBject pointer to somewhere11:29
tristanI believe I tried deque[tuple]11:29
tristanAnd the only `Tuple` I've seen, is from the mypy type hinting stuff11:30
tristanSo I felt reluctant to expect that it would be anything meaningful other than useful for doing static type checking11:30
benschubertSeems like we cna either do:11:31
benschubert- a struct11:31
benschubert- a std::pair11:31
benschubertI'd be reluctant on the `std::pair`, as we've managed to avoid C++ until now11:31
tristanbenschubert, `from typing import Tuple` <-- Think this is the right `Tuple` ?11:31
benschubertthat's typing info, cython can't do anything with it11:32
tristanYeah11:32
tristanAny idea what `deque[Tuple]` is ? which Tuple from where ?11:33
benschubertwhere do you see that?11:34
* tristan would be happy to just understand how to use `(void *)`11:34
tristan<benschubert> tristan: you could deque[Tuple]11:34
tristanbenschubert, I see that there ^^^^11:34
tristanAnd wonder... what is Tuple ?11:34
benschubertah deque[tuple] sorry11:34
tristanI don't think that works, I'll try again11:34
tristanI think deque[Value] also didnt work (for a cython defined Value object in the same file)11:35
benschubertbut would `deque` alone not work?11:35
benschubertactually I think you want `deque` only11:35
benschubertand then cast your return11:35
benschubertvalentind: interesting talk, would you have such benchmarks with python?11:36
tristanbenschubert, forgive my unknowing of cython, what might that look like ?11:39
tristanIf I do `cdef deque pending = deque()`, then the first `deque` says: 'deque' is not a type identifier11:39
tristanmaybe it's just a function ?11:39
benschubertoh wait, the page I linked in Cython is about C++11:40
tristanyeah, I did `from libcpp.deque import deque` since that's all I could find about it, ever11:41
tristansorry, s/import/cimport again11:41
benschubertoh well11:41
benschubertand a simple deque without typing was way slower right?11:42
tristanwait a sec, I might have something11:42
tristanI actually again did import instead of cimport, and for some reason the parser didnt explode at that line11:42
benschubertbecause it's valid :)11:42
tristanOddly...it's now telling me: src/buildstream/_variables.pyx:239:66: Object of type 'deque[T]' has no attribute 'pop'11:43
valentindbenschubert, I do not have. But that would be interesting to measure it.11:44
benschuberttristan: can you push your code somewhere I'll look into it after lunch11:44
tristanSo `cdef deque pending = deque()` works, and pending.append((a, b, c)) works11:45
tristanbut pop is not valid11:45
tristanif only I had like a strategy for investigating how this is supposed to work, like knowing what C header files I should be looking at, hmmm11:46
tristanBut seems quite involving for something which only *might* shave off some cycles11:46
*** mohan43u has quit IRC12:07
*** mohan43u has joined #buildstream12:07
benschuberttristan: back, did you find something/ give up?12:23
benschubertI agree that it might not be the most worth optimization12:23
benschubertwe probably can do much better12:24
benschubertactually the split between resolving and getting the error was made especially because of the slowness12:24
benschubertso I think it would be hard to do without the split12:24
tristanDoes that split exist ?12:24
tristanI don't think so, looking at https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_variables.pyx#L17212:25
tristan_check_for_missing and _check_cycles12:26
benschubertoh, they used to exist12:26
tristanThose are called unconditionally anyway12:26
tristanIn the current setup, every variable should be resolved only once12:27
tristanAnd the added explicit check() (only happens on non-junctions) ensures that every variable is resolved12:27
tristanNote: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L26612:28
tristanThis queues dependencies of a value for resolution12:28
tristanand https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L39612:28
tristanThat decides that a value has no dependencies if it has already been resolved12:28
tristanMaybe we could have a quicker exit12:29
tristanAnd avoid queuing the dependency itself if it's already resolved12:29
tristanon this line: https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L26312:30
benschuberthttps://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L171 also seems pretty inefficient :)12:30
benschubertI can probably have a look today/tomorrow if I can understand the hotspot here if you want12:30
benschubertbut with the current state, I don't think we should merge it like that12:31
tristanbenschubert, I have a local change which removes that and buys some cycles indeed12:31
benschubertlet me know if you want me to run the big benchmarks, they take roughly 30-45 minutes to run12:32
tristanbenschubert, My local branch changes that for `for varname in self._values.keys(): ... self._expand(varname, None)`12:32
tristanAnd also removes the unconditional ProvenanceInformation() creation at `_expand()` entry12:32
tristanBut I think the difference is tiny :-S12:33
tristanbenschubert, and you don't like provenance identity based variable pool reuse ?12:34
tristanI think there are probably a lot of elements in a given project which do not override/define variables12:34
benschubertwe can try, I'm not sure I get 100% of that what would entail12:34
benschubertand if we gain a lot by doing that I'm definitely +112:34
tristanFor every Element which essentially has their variables defined exactly the same way, we would reuse the same Variables instance entirely12:35
benschubertthen that seems worth trying12:35
tristanAnd we could identify that sameness by comparing provenances of values (with ProvenanceInformation identity comparisons)12:36
tristanSo we're given a MappingNode at startup, the MappingNode will be a different instance for every element (probably ?) but the provenances of each value should be the same for each variable, if the YAML composition was equal12:37
benschubertthat's true12:37
tristanbenschubert, I think it would be a big win, but it would falsely make the big benchmark you are running very fast12:38
benschubertand we could have a dict inside Variables that's global to the class and contains a mapping provenance -> Variables class or something?12:38
tristansince it will be the same Variables instance for every Element12:38
benschuberttristan: I'm fine with that, most of the examples I care about are like that12:38
benschubertwe should definitely validate for fsdk or something else that is more "organic" that the solution is also good12:39
tristanThe mapping is probably a bit tricky to construct efficiently :-S12:39
benschuberttrue12:39
tristanIt might entail sorting the variable keys12:39
benschubertand might actually be worse if more packages define few variables than none...12:39
tristanI think that often enough we have a lot of reuse of identical variables tables12:40
tristanbut if they are mostly different, then it's more expensive definitely12:40
tristanThen again, if MappingNode is deterministic in some way, we don't need to sort12:41
tristanWe only need a O(n) accumulation of provenance identities to compose a hash key12:41
benschubertMappingNode uses a `dict`, so it would be sorted12:41
tristanHmmm :-/12:41
tristanOnly happens to be under the current version of python12:42
benschubertI think we would need to try, more than anything else12:42
benschubertit's in the language definition now12:42
benschubert(py37+)12:42
tristanDont we still support 3.6 ?12:42
tristanWe dropped 3.5 I think recently12:42
benschubertyes, and cpython 3.6 has sorted but it's not in the language def12:42
benschubertso another interpreter targeting python3.6 might not implement that hte same way12:43
benschubertbut we don't support anything else than CPython anyways12:43
tristanbut we anyway require cpython12:43
benschubertthen we are fine12:43
* tristan reluctantly relinquishes a particle of trust to python12:43
* tristan looks back over his shoulders and hopes it wont come back to haunt him12:44
benschubert(and to be fair, I'm not sure we'll still need python3.6 support before 2.0 :D)12:44
tristanI'm not sure that we will notice when python 3.12 comes out and yet-again redefines dictionary behavior12:44
benschubertah no shoot, python3.6 is until 2012-1212:45
benschuberttristan: before it was undefined12:45
benschubertnow it is defined since 3.712:45
benschubertso before it was implementation dependent12:45
benschubertnow it's part of the language12:45
benschubertand I think the python community is not ready for a breaking change like python2 -> 3 so for breaking "defined" things we should be fine12:45
benschubertasync had some breakages but the APIs where provisional for example12:46
tristanI'll give it a shot, I wonder what is a good way to construct such a dictionary key12:46
benschubertfor MappingNode?12:47
tristanmaybe first just try to use a dictionary of variable name / provenances as a dictionary12:47
tristanNo, a dictionary key for a global dictionary of `Variables` instances12:47
benschubertyeah something like that might work12:47
tristanso just use a compare-by-value, maybe I need to use... id(provenance)12:48
tristan(is that a thing ? ... `id()` ? sounds familiar)12:48
tristanAre the MappingNode keys interned too perhaps ? would make sense for them to be12:49
tristanin that case we could use the identities of strings and provenances12:49
benschuberttristan: they are not12:49
benschubertwe _can_ try such a change, but will require benchmarking again12:50
benschubertI'm happy to check that if you want12:50
benschubertOr maybe intern all strings in Nodes?12:50
benschubertthat seems aweful though12:50
tristanI don't know about values, but certainly MappingNode keys is a good win12:56
benschubertah?12:56
tristanValues maybe also would save memory12:56
benschubertdid you just try?12:56
tristanBut MappingNode keys being interned means we speed up every MappingNode.get_*() call if the provided key is interned12:56
tristanNope, gonna try it now12:57
tristanWhere is the code which constructs Node() from yaml ?13:01
tristanthe yaml representer thing13:02
tristanhmmm13:02
* tristan is trying to find what passes the `dict` to the MappingNode constructor13:03
benschubertone sec13:03
benschuberthttps://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_yaml.pyx#L18713:04
benschubertbut that's no guarantee13:05
benschubertso we might need a wrapper around instead13:05
tristanAha13:05
tristan _handle_doc_MappingStartEvent13:05
tristanbenschubert, As far as I can see, the constructor might as well not take any dict13:06
tristanOnly an empty dict is ever given, and things are only assigned via __setitem__ ?13:06
benschuberttristan: either this or my previous link13:06
benschubertso we could get rid of my previous link by having an optimized fonction to set things correctly13:07
tristanOh wait13:07
benschubertit's kind of a bigger change, and will not be as easy13:07
tristanCould we turn that into: (<MappingNode> self.output[-2])[key] = self.output[-1] ?13:07
tristanAnd always go through __setitem__ ?13:07
benschubertit might be slower13:08
tristanOr that is a cython/python overhead ?13:08
benschubertI am not 100% sure13:08
tristanSo I can just call sys.intern() in that place13:08
benschubertbut we can either do this or have a special c-only funciton that does the sys.intern + set13:08
benschubertand have that used in __setitem__ too13:08
tristanOr, just: https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/_yaml.pyx#L17513:09
tristanbenschubert, That is keys and scalars and all strings ?13:09
benschubertI'm not 100% sure13:09
benschubertthis whole ruamel stuff is hacky13:09
tristanyeah13:10
benschubertbut I'd rather have something done in MappingNode directly13:11
tristanev.value are the strings13:11
benschubertsince we have other places in the code that add keys there13:11
benschubertand we probably want to have everything treated the same there13:11
tristanbenschubert, We can't really do that with a public MappingNode.value13:11
tristanbecause _yaml.py goes and touches it13:11
benschuberttristan: it's "public" as in c-public13:12
benschubertso we can have a wrapper in MappingNode, like MappingNode._set(key, value)13:12
benschubertand we change all our code calling MappingNode.value to use this instead13:12
tristanOk13:12
benschubertif you dont' want to get sidetracked on that and instead work on other optimizations I can do that tonight probably13:13
benschubert(+benchmarks)13:13
tristanbenschubert, I think it's together, and I think I have a good patch for it already13:18
tristansec, lemme push a branch13:18
benschubertawesome :)13:18
tristangrrr AttributeError: 'buildstream.node.MappingNode' object has no attribute '_set'13:26
tristanthis _yaml is done differently than _variables.pyx, and has a .pdx which needs to be understood too13:26
benschubertah you need to update _node.pxd too13:28
benschubertpxd is for exposing a Cython interface13:28
benschubertotherwise cython doesn't expose anything13:28
tristanI did, with the identical signature, I added it right beside _get()13:28
benschubertdid you pip install again?13:28
tristanOh that worked13:29
benschubert:D13:29
tristanbenschubert, I've been testing with `tox -e venv /bin/bash`13:29
tristanBut this one required a `git clean -xdff` to get rebuilt properly13:29
benschubertuh weird13:29
tristaninterned MappingNode strings seems to not effect performance much (in this specific case of the variables test)13:30
tristanPerhaps a small speed gain13:30
tristanI'll push the patch now so you can see it anyway13:30
benschubert{VW}13:31
benschubertwoops13:31
benschubertseems good13:31
benschubertdo you want me to see the difference on the debian mock project?13:31
tristanbenschubert, https://gitlab.com/BuildStream/buildstream/-/merge_requests/1973/diffs?commit_id=b281d937f8cb4f99547e62cf46efe88458f8114413:34
tristanbenschubert, Sure :)13:34
tristanbenschubert, However, I was really doing it mostly so that I could rely on interned keys, and from there construct a speedy key for a Variables lookup table for the other, later optimization13:35
tristanBut benchmarking them separately could certainly be helpful13:35
benschubertdo you want a benchmark of only this commit? Or the branch?13:35
tristanI have a feeling that change might on it's own speed things up a bit, if MappingNode lookups are usually done with strings obtained from MappingNodes in the first place13:36
benschubertOk I'll try this on its own13:36
tristanIt wont speed up Plugin.configure() routines, because those use hard coded strings13:36
benschubertI think we can do even slightly better however, so will modify it a bit :)13:36
tristanSure :)13:36
tristanbenschubert, I don't know what the effect might be of interning the lookup string in _get(), but I suspect it won't improve things (possibly worsen performance)13:37
tristanyou really need to be *using* the interned lookup string from calling code in order to get a speedup (so within Variables code, it should definitely help)13:38
benschubertyeah I'd rather not13:38
benschubertwe could also have every hard-coded string interned when used in nodes13:38
benschubertwould be a manual process but only interned at import time might give some speedup13:38
tristanIf an element interns a string to be used later on in Plugin.configure() for a lookup, it will help... *only* if the lookup string is done classwide13:39
tristanThen you intern the string once and you get speedy lookups for every instance of that plugin13:40
tristanbenschubert, that said, it might accidentally "just work", e.g.: if cython implementors thought "It's a literal string, as such it should be interned automatically by the language"13:42
tristanThe literal string is anyway going to cost memory, so it would make sense to deduplicate them13:43
benschubertyeah let's already see if the initial change helps :)13:43
tristanAsking in #python, the answer is yes: Literal strings are automatically interned13:44
benschubertah awesome13:46
tristanif the benchmark is not intensive in lookups, might not make much difference (although we have a good chance of memory savings)13:48
*** tristan has quit IRC13:53
douglaswinshipjk13:58
douglaswinship(sorry)13:58
benschuberttristan: added https://gitlab.com/BuildStream/buildstream/-/commit/cb34c95009f8a77ea28b7390a5abbbcc428aa171 on top of your commit, am benchmarking just this change.14:02
*** tristan has joined #buildstream14:03
*** ChanServ sets mode: +o tristan14:03
benschuberttristan: there's also other places in the code where we'd need ot change14:19
benschubertif that alone brings some improvements, do you want me to take over and cleanup everywhere for the interning, so you can focus on the rest?14:19
tristanbenschubert, Sure, but I might end up overlapping, at least that part I need14:29
benschuberttristan: I don't expect rebasing to be too hard? I will keep the exact interface from the MappingNode you have set14:30
tristanbenschubert, I have a `frozenset({id(key): id(value.get_provenance()) for key, value in mapping_node.items())` which successfully reuses Variables instances :)14:30
tristanBut it causes some tests to fail so I have to look a bit deeper14:30
benschubertinteresting!14:30
benschubertnot sure what's the frozenset meant to do here?14:31
tristandict on it's own is not hashable14:31
tristanso cannot use a dict as a dict key14:31
tristanneither are lists apparently14:31
tristanwhen running `bst show target.bst` in tests/format/junctions/nested, I get a lot of matches as I would have expected14:33
tristanthat so far is at least promising :)14:33
* tristan goes to obtain some food and may tinker a bit later in the wee hours of the morning before sleeping :)14:34
benschubertah right, I'd actually be surprised if you get some speedup with having to hash all of this :) let's hope14:36
benschubertwouldn't the provenance of 'MappingNode' itself be enough?14:36
tomazhey, I see that we have a bit of snap code within buildstream, is this still open or someone forgot to close? https://gitlab.com/BuildStream/buildstream/-/issues/65014:39
coldtomi'd argue that it's still an issue, as the snap_image plugin lives in freedesktop-sdk only, valentind has more context than me though14:43
benschuberttristan: it's slightly slower with the change on the dummy debian project14:54
tristanbenschubert, the provenance of the MappingNode itself doesnt guarantee me that none of the child values have been composited15:05
tristanwhich I guess explains the failing tests15:06
benschubertah good point15:06
tristanBecause the same is true for nested dicts15:06
tristanI could have the same provenance identity for a child key which is a dict, but that child's child being composited invalidates the mapping node15:06
tristanSurely a recursive collection of provenance identities is still way cheaper than a resolution of Variables, though15:07
tristanbenschubert, how does memory consumption compare btw ?15:14
benschubertno change15:14
tristanHmmm15:14
tristanMakes me wonder if the strings are already interned by ruamel internally15:14
tristanand we're just adding overhead of re-looking them up with sys.intern()15:14
benschubertpossible, could check the ruamel codebase15:14
tristanthen again, the initially loaded yaml may not be all that much in this project, and just reused/recomposited many times15:15
tristanmaybe the keys are not as redundant in memory as one might expect15:15
benschubertyeah15:16
* benschubert really doesn't like browsing code on sourceforge15:17
tristancomposition will not end up duplicating them, the strings are immutable anyway15:17
tristannah15:17
tristanI always browse code in .tox/15:17
benschuberthaha15:17
tristanI don't have grep in sourceforge anyway :)15:17
tristanwhat use is that15:18
benschubertI like gitlab search/github search as it allows also to look if issues are mentioning this :)15:19
*** santi has quit IRC16:04
*** santi has joined #buildstream16:05
tomazdo you guys know about buildstream.co.uk ?16:14
tristanbenschubert, around still ?17:46
benschubertyep!17:46
benschuberttristan: ^17:47
tristanI've been tinkering, found that there is of course no need for recursion to compose this key17:47
tristanBecause variables are a flat dict17:47
tristanBut we cannot use the id() of a Provenance17:47
tristanbecause ProvenanceInformation() has been refactored into something that is constructed on demand !17:47
benschubertoh right17:47
tristanHere I was thinking it was something which could uniquely identify the node17:48
tristanMaybe I can use the ID of the value Nodes ?17:48
tristanlemme check if they remain constant17:48
tristanThe caching is otherwise working fine if I use value.as_str()17:48
tristanbut that's of course undesirably expensive17:48
benschubertyeah their ID might be fine17:51
tristanSo long as composition doesnt clone them, which I suppose it shouldn't anyway17:51
*** santi has quit IRC17:52
tristanNope :-S17:52
tristanScalarNodes like 'prefix' values from ~/buildstream/src/buildstream/data/projectconfig.yaml have different IDs from one element to another, in the same project/session17:53
tristanOh damn17:54
tristanMaybe this is futile after all17:54
benschubertah I think the reason for that is that plugins can access their nodes, so we clone them per plugin17:55
tristanAh of course17:56
tristanBut, there is another disastrous thing17:56
tristanhttps://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/element.py#L28617:56
tristanThat line kind of throws the whole thing off :-S17:56
benschuberthahaha, oh yeah of course -_-'17:57
tristanThat would mean that, any caching/reuse might instead require Variables.clone() with a partial re-resolution of some vars referencing 'element-name' :-S17:57
benschubert yeah no, let's forget that17:57
benschuberttoo many places where it could go wrong17:57
tristanYeah, do we really ever use 'element-name' at all for anything though ?17:57
benschubertfor some plugins it can be nice17:58
benschubertah but that would end up with ".bst" anyways right?17:58
benschubertso that's not idea17:58
benschubertor maybe we should keep it but remove the .bst?17:58
tristanSeems like we really should be able to avoid a lot, and otherwise, people can always name stuff in the yaml17:58
tristanI don't know what use case you're thinking of where .bst is undesirable17:59
tristanI don't really understand the usecase of why it's desirable either, I mean it does 'seem neat'17:59
tristanbut otherwise17:59
benschubertif at a project level you force the 'package' name or 'library' name to match the name of the element, your plugin could infer much more and sometimes require very little setup17:59
benschubertI for sure know a few places where there is such mapping18:00
tristanI suppose if we had composite plugins then the composite plugin implementor would still be able to automatically assign names to the elements which are declared inside the same .bst file18:00
*** tomaz has quit IRC18:01
tristanmyeah so, I guess we should ditch the idea of caching entire Variables instances, although, it was attractive18:01
tristanHmmm18:01
tristanIt could be an exception18:01
tristanwell, a bit complex18:02
tristanugh18:02
benschuberttristan: you might be interested: https://gitlab.com/BuildStream/buildstream/-/compare/master...bschubert%2Fno-multiprocessing is what I've got currently, I've got still problems with lingering threads and must then make sure that everything works really correctly wrt terminate and pause, but otherwise it's looking that we will be able to simplify the codebase a lot18:04
tristanThat's gonna be a big review, I don't think I can context switch very well at this time :-S18:05
tristanLets give ourselves another day or two, and if I can get acceptable performance here, I'll put my full attention on that review ?18:06
benschubertah don't right now18:06
benschubertI just wanted to give a head's up that I'm working on it and think it will be a good thing :)18:06
benschubertI'll still need 1-2 weeks before I can put it to review I think18:06
tristanYeah I figured you'd still have some road to go :)18:06
benschubertIt's really more convoluted than I had oped18:06
tristanhaha ok, overly optimistic here :)18:07
benschubertbut oh well18:07
tristanI'm curious why the current variables code is consistently more performant with `bst show --format '%{deps}'`, and what that tradeoff means18:08
tristanit might be that the overheads are not where we think they are18:09
tristani.e. in the tight resolution loops18:09
benschubertthat's a possibility18:09
benschubertand profiling the cython code is tricky18:09
benschubertunless you like perf/strace18:10
tristanstrace doesnt sound fun18:10
tristancachegrind is nice18:10
tristanbut maybe not possible18:10
tristanmight be able to hone in a bit with hand written micro-benchmarking18:11
tristanmaybe there is just a constant overhead of allocating the Value() objects18:11
tristanand not a (severe) algorithmic problem18:12
benschubertthat's also a possiblity indeed18:13
tristanMaybe inline C is better ?18:13
tristaninline C for struct allocation and then cython functions to operate on them ?18:13
benschubertit's probably worth a shot18:14
tristanI'll have to figure out how all of that works18:14
tristando you have a good link which describes how to use cython this way ? I've basically just been googling as-needed and getting results of varying quality18:15
benschuberttristan: one thing you can try before, is add `annotate=True` in https://gitlab.com/BuildStream/buildstream/-/blob/master/setup.py#L271, then `pip install` again, it should generate a .html for each pyx file18:16
benschubertthat will allow you to see the equivalent code genrated18:16
benschubertmore yellow means more python interaction18:16
benschubertless yellow is better, white means pure C18:16
tristanoh wait, I think I've got a hint18:17
tristanbenschubert, there *might* actually be a difference between __init__ and __cinit__18:19
benschubertah there is!18:19
benschubertsorry did I miss that?18:19
* tristan says with blatent naivety18:19
benschubertbasically, only use __cinit__ if you can18:20
tristanwell, that appears to make a small difference18:20
* tristan thinks he has all of his def/cdef/cpdef correct (with maybe one minor fix here)18:23
benschubertyou can use annotate=True to see that quickly :)18:24
tristanI ended up doing that manually the other day (cython -a _variables.pyx)18:25
tristanlemme try again18:25
tristanor use the setup trick18:25
benschubertah yeah -a also works18:26
tristanSo the yellow lines are the good ones or the bad ones ?18:27
tristanThe bad ones I guess ?18:27
benschubertbad18:27
benschubertthe more yellow the worse it is :)18:27
benschubertwhite is perfect C18:27
benschubert"perfect"18:27
tristanOk we can improve expand()18:28
tristanit calls into itself, recursing through a dictionary where stuff gets expanded18:28
tristanIt needs to be called in python, but it could call into itself in C18:28
benschubertyep that would be good18:28
benschuberteither cpdef or def then a wrapper in C (cdef _expand)18:29
benschubertI usually prefer the second one since it's more explicit18:29
tristanwell wait, if it's already cpdef, then when it calls into itself it should be in C ?18:30
tristanOk I'll change it anyway18:30
tristancause I totally agree, that's less sloppy18:30
benschubertI am not 100% sure18:31
tristanDo you know what is the difference from bright and half yellow ?18:33
benschubertthe brighter the more it uses python APIs18:34
benschubertyou might want to split if you are doing multiple things per line to know exactly what's happening18:34
tristanHmmm18:34
tristanbenschubert, https://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L15718:34
tristanI don't understand why that line would be bright yellow18:34
tristanit looks like cython frowns upon for loops18:35
benschuberthttps://gitlab.com/BuildStream/buildstream/-/blob/tristan/partial-variables/src/buildstream/_variables.pyx#L396 you are returning an object here18:36
benschubertnot a list18:36
benschubertcdef list dependencies()18:36
tristanoh so I need to declare it as a list18:36
benschubertshould help18:36
benschubertyep18:36
benschubertotherwise it returns `PyObject*`18:36
tristanstill bright yellow18:38
benschubertalso: if you click on a yellow line it will show you the code18:38
tristanooooooh18:38
benschubertYeah :D18:39
tristanAnd the red stuff in the code is "bad" I guess ?18:41
tristanPyMethod_Check18:42
tristan__Pyx_PyObject_CallOneArg18:42
benschubertYeah18:43
tristaniter_referee, iter_provenance, iter_name_set = pending.pop() looks expensive18:43
tristanunpacking tuples has a lot of checks18:43
tristanfor some reason strings have a lot of uncertainty about them too18:43
tristanchecking for unicode or which one of the dozens of kinds of strings are in use18:44
benschubert:/18:44
benschubertI wonder if we should go deeper in C18:45
benschubertit's less user friendly, harder to follow18:45
benschubertbut it might be good if used in very targeted places18:46
tristanI wonder if that is better to create a cython class and instantiate it instead of pushing tuples18:46
benschubertoh yeah!18:47
tristanit might mean more consistent memory slices (less thrashing), and less typechecking on unpack18:47
benschubertthat's actually very probable18:47
tristanok well, it's very late now, I think I'll give it a rest and revisit tomorrow18:48
tristanbut now I've got a lot to work with !18:48
tristanreduce yellowness !18:48
benschuberttristan: sure :) I'm looking forward to seeing improvements, and am happy more people are taking on the cython part :)18:50
*** phildawson has quit IRC22:28
*** benschubert has quit IRC23:22

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!