IRC logs for #trustable for Friday, 2019-02-01

09:04 <reiterative> I agree that there is no difference between these different types of testing process at one level - in the sense that they may all be reduced to an act of deployment to a test environment and evaluation against a set of constraints - but I am convinced that the value of tests increases the more closely the test environment corresponds to the target environment.
09:05 <reiterative> But this is a slippery concept, because it depends on (a) what you are testing, (b) what you define as the target environment and (c) the nature of the associated constraints.
09:14 <reiterative> when verifying the behaviour of an individual component, for example, you *might* argue that it doesn't matter where you instantiate it, as long as you use the same interface that would be used in its target environment. But that would only hold true for a certain set of constraints - and might deliberately *ignore* other constraints that might pertain to the target environment itself.
09:21 * paulsherwood currently thinks we could just go with 'tests that are concerned with behaviour of the software in general, not specific to an environment', and 'tests that are concerned with the software's behaviour/properties in a specific environment'
09:21 <paulsherwood> i may be wrong of course
09:22 <reiterative> In my opinion, that is an axis rather than a binary distinction
09:22 <paulsherwood> but this dichotomy would align with the distinction between 'software properties' and 'system properties'
09:22 <paulsherwood> i agree it's not binary
09:23 <reiterative> Software is frequently systems within systems within systems
09:23 *** sambishop has joined #trustable
09:24 <reiterative> The increased value of testing in close proximity to the target environment is actually about the range of constraints that may be considered.
09:24 * paulsherwood is expressly shorthanding 'software' to mean source, bits+bytes, 'systems' to mean hardware running software
09:25 <paulsherwood> again i may be wrong
09:25 <reiterative> OK, I will rephrase
09:26 <reiterative> It's a balancing act between testing software in its final (and complete) form in its final (and complete) environment versus testing progressively smaller parts of that software in an environment progressively removed from the final target.
09:27 <reiterative> There's more value at one end of the axis because you can apply all the relevant constraints
09:27 <reiterative> But there's more value at the other end because it costs you much less to remedy the defects that you find
09:28 <paulsherwood> so you're saying there's more value at both ends than in the middle?
09:29 <reiterative> No, I'm saying that the value of testing at any point on the axis needs to be set against the cost of undertaking it
09:30 <reiterative> There will be sweet spots, but they are not always easy to identify
09:32 <reiterative> Unit testing may be immensely valuable in one context, and a complete waste of effort in another
09:34 <reiterative> But if you can perform meaningful testing early, then it will help to reduce the number of defects that you find in later testing
09:36 <reiterative> But the main issue with this is the availability of properly defined constraints in the earlier stages of development
09:36 * paulsherwood previously sketched this on various whiteboards
09:36 <paulsherwood> not just for testing...
09:37 <paulsherwood> in many cases there's a cost curve where doing too little or too much are both suboptimal
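The cost curve paulsherwood describes can be sketched numerically. Everything below is illustrative: the two cost functions and all constants are invented, and only the U shape of their sum matters.

```python
# Illustrative sketch of the "too little or too much" cost curve.
# All numbers are invented for demonstration; only the shape matters.

def total_cost(test_effort: float) -> float:
    """Total cost = cost of running tests + cost of escaped defects.

    test_effort is in arbitrary units (0 = no testing).
    Execution cost grows linearly with effort, while the cost of
    defects escaping to later stages decays exponentially, so the
    sum is U-shaped: both extremes are suboptimal.
    """
    execution_cost = 10.0 * test_effort
    escaped_defect_cost = 500.0 * (0.5 ** test_effort)
    return execution_cost + escaped_defect_cost

# Scan a range of efforts and pick the cheapest (a "sweet spot").
efforts = [e / 2 for e in range(0, 31)]  # 0.0 .. 15.0
best = min(efforts, key=total_cost)
print(f"sweet spot around effort {best}: cost {total_cost(best):.1f}")
```

Under these made-up constants the minimum sits in the middle of the range, which is the point being made: no testing and maximal testing are both more expensive than the sweet spot.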
09:38 <reiterative> Agreed. And with testing, it's not enough to consider the cost of executing the tests - you have to factor in the cost of fixing the defects - including defects that may be just 'noise'
09:39 <persia> Not just classic cost, but also costing in terms of response time, context loss, etc.
09:39 * reiterative nods
09:40 <persia> In terms of calendric delivery, if a potential defect can be scheduled for remediation within a short time of candidate proposal, the relevant folk are likely to respond "oh, right" and be able to immediately address it. The longer the delay, the more (calendar) time required to regain sufficient context, the longer the project runs as a whole.
09:42 <reiterative> Yes. And if a defect is discovered late in a project, the people who implemented the relevant bit of code might not even be available any more.
09:42 <persia> But, if I'm summarising the above correctly, we consider there to be only one class of test/validation/etc., and there are likely to be metrics associated with any given stage in a process related to a) the signal/noise ratio of FAIL results, b) the time between scheduling and response, and c) the costings associated with preparation of the execution environment.
09:42 <reiterative> My conclusion is that *all* types of testing are potentially valuable.
09:43 <persia> Organisations benefit the most where (a) is relatively small, and (b)/(c) is either relatively static as one progresses a pipeline in a process or slowly increases as the pipeline progresses.
09:43 <reiterative> and (d) the completeness of the constraints covered
09:44 <persia> I consider (d) to be a different class of property than (a), (b), (c). (a), (b), (c) can be usefully measured for a single result to be evidenced by a single vote. (d) can only be measured in terms of a system totality.
09:46 <persia> Note that I do agree it is important to measure completeness, both in terms of whether all constraints are satisfied under all reasonable conditions and in terms of how much of the total functionality existing within the system is exercised by the procedure of validation. I just think they are different.
09:47 <reiterative> The question is how do you measure / account for that when comparing the testing strategies used by two candidate trustable processes?
09:48 <reiterative> There are answers, but they involve a lot of overhead in collecting metrics
09:48 <reiterative> So I distrust them
09:49 <persia> For (a), (b), (c), one presumably wants to create some (f) that represents a collective over the entire process.
09:49 <persia> For (d), (e), (f), one generates an abstract quantitative metric, and then compares them. This permits tradeoffs.
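persia's collective metric (f) over the per-stage measurements (a), (b), (c) might look something like the sketch below. The field names, weights, and sample numbers are all invented for illustration; the point is only that two processes become comparable once each is reduced to a number on a shared scale.

```python
# Hypothetical sketch of a collective metric (f): a weighted aggregate
# of per-stage measurements, so two processes can be compared.
# Field names, weights, and all sample numbers are invented.

from dataclasses import dataclass

@dataclass
class StageMetrics:
    noise_ratio: float     # (a) fraction of FAIL results that are noise (lower is better)
    response_hours: float  # (b) time between scheduling and response (lower is better)
    env_prep_cost: float   # (c) cost of preparing the execution environment (lower is better)

def collective_score(stages, weights=(0.5, 0.3, 0.2)) -> float:
    """A crude (f): weighted sum per stage, averaged over the whole
    process. Lower score = better, since all inputs are costs."""
    wa, wb, wc = weights
    per_stage = [wa * s.noise_ratio + wb * s.response_hours + wc * s.env_prep_cost
                 for s in stages]
    return sum(per_stage) / len(per_stage)

# Compare two candidate processes on the same scale.
process_x = [StageMetrics(0.1, 2.0, 1.0), StageMetrics(0.2, 8.0, 3.0)]
process_y = [StageMetrics(0.4, 1.0, 1.0), StageMetrics(0.5, 4.0, 2.0)]
print("X:", collective_score(process_x))
print("Y:", collective_score(process_y))
```

The weights encode the tradeoffs persia mentions: an organisation that cares more about response time than environment cost simply shifts the weight vector.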
09:50 <persia> In practice, the vast majority of comparisons are going to be performed within an organisation, so one can probably reuse most of the data and/or execution units when performing the comparison.
09:51 <persia> In an abstract "Is this trustable" way, it doesn't matter as much, as that bar will be set externally, and is likely to be based on provided collateral about processes and arguments for compliance of a given process, rather than in terms of comparison of two processes.
09:57 <reiterative> Are we saying that the details of a process are irrelevant, so long as we have evidence that it has been applied?
09:58 <reiterative> I'd want evidence for the effectiveness of a process if I was deciding whether it was trustworthy
09:59 <reiterative> (but I appreciate that's not the same as trustable)
09:59 <persia> I think trustability and effectiveness are independent. I expect to be able to create an untrustable efficient process or a trustable inefficient process.
10:00 <persia> But, yes, I assert that the details of process are unimportant as long as there exists sufficient collateral to cause a meaningful chain of argument between the base expectations of "trustable" and whatever process is being evaluated.
10:01 <reiterative> I'm not convinced that is enough
10:02 <persia> If an arbitrary process can answer questions about provenance, construction, reproducibility, functionality/reliability, consistency between system and intent, ability to update, and safety, I don't see any reason not to call it "trustable".
10:03 <reiterative> But some of those characteristics require an evaluation of the process - functionality/reliability most obviously.
10:03 <persia> Mind you, I might disagree with a given argument, and so might not personally wish to call some process for which I thought the argument was weak by that term, but that's about each of our own sense of logic and our ability to argue cases.
10:06 <persia> In practice, any process will be evaluated against some set of stipulations (e.g. "it does what it is supposed to do"). The collateral produced during this evaluation can be considered in terms of whether it provides sufficient justification for a claim about that stipulation. So long as there is a supportable claim of conformance to each stipulation, how isn't the process "trustable"?
10:06 <persia> Mind you, it may be that the set of stipulations will end up being revised, but that is independent of the evaluation of processes.
10:06 <reiterative> I guess that would depend on what a process is claiming to achieve
10:06 <reiterative> And whether that holds up to scrutiny
10:07 <persia> Right, and we can only judge a process against the claims.
10:07 <persia> Now, if a process claims to do a variety of particularly interesting things (e.g. never link any non-GPL application against a GPL library), there needs to be support for those claims, which can be evaluated in terms of frameworks.
10:08 <persia> And I suspect that constructing metrics to allow one to appreciate confidence in evidence quantitatively will massively improve the ability of various parties to evaluate such claims.
10:08 <reiterative> Agreed. So the available evidence should support the claims that are made for a process?
10:09 <persia> But it is important not to be distracted by the potential universe of claims to be supported to ensure the metrics for evaluating argument are sufficiently general.
10:10 <persia> That restatement feels like it might miss something, but basically, yes. The potential missing bit is the process of argument. While it is true that if parallel lines never intersect, the sum of the inner angles of a triangle will be 180°, demonstrating this from the evidence available requires presentation of additional collateral information (the proof)
10:14 <reiterative> Yes, we probably don't want to get into the proof, but I think we do want to define some minimum characteristics for our trustable principles (provenance, construction, reproducibility, etc) and factor the availability of supporting evidence for these into any trustability metrics that we may devise.
10:23 <persia> Right, and it becomes the responsibility of someone claiming a given process is "trustable" to provide sufficient collateral (argument/proof) to satisfy anyone to whom they wish to make such a claim.
11:10 *** sambishop has quit IRC
11:13 *** sambishop has joined #trustable
11:20 *** traveltissues has joined #trustable
11:27 <reiterative> I would be tempted to go further. As I've just said in another conversation:
11:28 <reiterative> In my opinion, it suggests that we need to distinguish between two (related) types of evidence:
11:28 <reiterative> (1) Evidence that a policy exists and has been applied
11:28 <reiterative> (2) Evidence that the application of the policy enhances trustability
11:28 <reiterative> We have been focussing on (1) thus far, but I think it is important not to lose sight of (2), since (1) is arguably meaningless without it.
11:28 <reiterative> We have already identified a set of factors that we believe must be considered when assessing the trustability of software: its provenance, construction, reproducibility, clarity of purpose, reliability, resilience and safety. We have also made some attempt to examine 'what good looks like' in some of these areas.
11:28 <reiterative> I believe that the next challenge should be to express these ideas as a set of 'trustability intents' (or constraints, if possible at this stage) that can be used to evaluate a set of available evidence.
11:37 <reiterative> Perhaps a way for us to determine the overall goal for our 'trustability intents' would be to consider how they contribute to the identification and management of risk as part of a software engineering process?
12:46 *** sambishop has quit IRC
13:00 *** sambishop has joined #trustable
14:12 <reiterative> I've pushed a new commit to pa-nomenclature, incorporating Edmund's review feedback into core-concepts.md
14:13 * persia sets aside time for rebuttal
14:13 * reiterative expected that
14:13 <persia> On (1) vs (2), that matches what I have been calling comparatives.
14:15 <persia> I am uncertain if we can make argument that a given process enhances trustability, but I am confident that if we have a standardised mechanism to describe two processes, we can probably suggest which of them is able to provide greater confidence in the validity of a specific claim.
14:16 <persia> If these evaluations are quantitative, it ought be possible to suggest weighted models for a collective score for a set of claims (such as those listed for trustable), which I suspect is close to what you describe.
14:20 <reiterative> Yes, but I think we need to define a set of qualitative metrics (i.e. establish what constitutes evidence of good practice) that can be used as part of a quantitative evaluation.
14:24 <persia> I believe one can go from quantitative to qualitative, where one can assign value to a metric.
14:24 <persia> I do not know of any way to go in the other direction.
14:25 <persia> As such, I think it important to capture evidence of practice and evidence of result, allowing one to make assertions about “good”.
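persia's one-way mapping can be shown in a few lines: a measured value maps cleanly onto a qualitative label, but the label alone cannot recover the measurement. The metric and thresholds below are invented for illustration.

```python
# Sketch of the quantitative -> qualitative direction: assign value
# bands to a measured metric. The reverse direction is lossy, because
# a band cannot recover the original number. Thresholds are invented.

def label(noise_ratio: float) -> str:
    """Map a measured FAIL-noise ratio onto a qualitative judgement."""
    if noise_ratio < 0.05:
        return "good"
    if noise_ratio < 0.25:
        return "acceptable"
    return "poor"

print(label(0.02))  # a measurement maps cleanly to a label...
# ...but "acceptable" alone cannot distinguish 0.1 from 0.2:
# the mapping is one-way, which is persia's point.
```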
14:27 <persia> For example, if it is interesting to assert that “test before release” is “good”, it makes sense to show that the number of defects experienced by release consumers differs. If there is no difference in the experience of release customers, then there is no reason to suggest prerelease testing is “good”.
14:29 <persia> If someone believes that it is not worth the experiment, that is fine, but in such a case, the argument rests on the assumption, and it becomes useful to document and attribute that assumption.
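The “test before release” experiment persia describes could be evaluated as simply as this; the defect and install counts below are fabricated purely to show the shape of the comparison.

```python
# Sketch of the "is prerelease testing good?" experiment: compare the
# defect rate experienced by consumers of tested vs untested releases.
# All counts are fabricated for illustration.

def defect_rate(defects_reported: int, installs: int) -> float:
    """Defects experienced per install, for a set of releases."""
    return defects_reported / installs

tested = defect_rate(defects_reported=12, installs=4000)
untested = defect_rate(defects_reported=90, installs=3000)

if untested > tested:
    print("evidence that prerelease testing helped")
else:
    print("no observed difference; the 'good' claim rests on assumption")
```

A real experiment would need comparable release populations and some test of statistical significance, but the structure is the same: evidence of result, not just evidence of practice.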
14:31 <reiterative> If you *can* use evidence of positive outcome to distinguish what constitutes good practice in this way, then I agree that it's a good approach. I'm just not convinced that it will always be possible - and I think we will need to base initial evaluations on more subjective value judgements about what constitutes good practice.
14:34 <persia> (On qualitative->quantitative, it is also important to understand impact: do the effects of “good” scale exponentially, linearly, logarithmically, or differently to, e.g. lines of source code?)
14:42 <persia> I think we may be saying more similar things than might be apparent. I believe that early evaluations of processes will be based on an arbitrary unproven set of assumptions. I just very strongly believe that while performing such assessments depends on an agreed language for assessment, the language for assessment ought be unaffected by the assumptions of early assessments (a one-way dependency). As such, I find it unuseful to attempt to discuss particular assessment criteria or expected assumptions about best practice until there is a common semantic mapping to use for such discussion.
17:16 *** sambishop has quit IRC
20:21 *** traveltissues has quit IRC

Generated by irclog2html.py 2.15.3 by Marius Gedminas