A couple of months ago I ran a few tests like this on the testing server. I tried testing with a single gum on a boon type that has 2 boons, two gums on a boon type that has 2 boons, and a single gum on a boon type that has 1 boon. My results were consistent with the following:
Each boon gets a weight of 1 + n*x, where n is the number of gums modifying it and x is a weighting factor. The probability of a boon being chosen is then its own weight divided by the sum of all weights. For instance, if you use a special gum, both AoE and magnetic collector receive the extra +x to their weight. My results were inconsistent with the gum's weight being added per boon type rather than per boon. I also believe that gums only ever add weight and that there is no way to subtract weight. E.g. a trait reduction gum adds a weight, x, to existing boons. The quantity x is then a measure of the strength of the gum. I measured x to be 18 ± 6 in one trial and 12 ± 4 in another. Considering that a human wrote the code, I guessed that x had a round value of 10.
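To make the weighting rule concrete, here is a minimal Python sketch of it. The function name, the pool of 10 boons, and x = 10 are all assumptions for illustration; this is not the game's actual code, only the model described above.

```python
def boon_probabilities(gum_counts, x):
    """Return the chance of rolling each boon under the 1 + n*x weighting.

    gum_counts[i] is n for boon i: how many gums currently boost that boon.
    x is the per-gum weight bonus (10 is only the guess from the post).
    """
    weights = [1 + n * x for n in gum_counts]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical pool of 10 boons: one gum boosts the first two boons
# (like AoE and magnetic collector sharing a special gum), and the
# other eight boons are unboosted.
probs = boon_probabilities([1, 1] + [0] * 8, x=10)
print(probs[0], probs[2])
```

In this made-up pool each boosted boon has weight 11 and each unboosted boon has weight 1, so a boosted boon is picked 11 times as often as an unboosted one.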
Now, I have heard people talk about forging becoming more difficult since the venerable update. I was skeptical for a while. After all, for other tools (besides hammer, axe, shovel) there is a real reduction in probability due to the existence of venerable boons, and I thought this could be what people were observing. But anecdotally, I started to see it myself. It seemed that x had decreased, but I did not record data well enough to draw a solid conclusion. I have been using extra gums under the assumption that the strength of the gum was reduced.
For your data I find that x = 9.72 ± 5.4. This is lower than in my pre-venerable tests, but considering the error bars, it is consistent with them.
Would you be willing to do another test or collect more data to get better precision? If you only use a single gum for a boon type which only has a single boon, it will reduce the error in the measurement substantially. (You can measure the strength of the gum better in a regime where there is nearly an equal contribution from the gum and the rest of the system.) More measurements could improve things as well, but unfortunately the error follows a square-root dependence: in order to double the precision you would have to multiply the number of trials by 4.
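As a rough illustration of that square-root behaviour, here is a sketch of how x could be estimated from a single-gum, single-boon test. It assumes the 1 + n*x model, a known pool size, and a plain binomial error propagated to x; the function name, the pool size of 10, and the hit counts are made up for the example, and this is not necessarily the exact analysis used for the numbers quoted in this post.

```python
import math


def estimate_x(hits, trials, pool_size):
    """Estimate the gum strength x and its standard error.

    Assumes one gum boosting exactly one boon in a pool of pool_size
    boons, so the boosted boon is chosen with p = (1 + x) / (pool_size + x).
    """
    p = hits / trials
    # Invert p = (1 + x) / (pool_size + x) for x.
    x = (p * pool_size - 1) / (1 - p)
    # Binomial standard error on the observed frequency p.
    sigma_p = math.sqrt(p * (1 - p) / trials)
    # Propagate to x using dx/dp = (pool_size - 1) / (1 - p)^2.
    sigma_x = (pool_size - 1) / (1 - p) ** 2 * sigma_p
    return x, sigma_x


# Made-up counts: same hit rate, but four times as many trials.
x1, err1 = estimate_x(hits=55, trials=100, pool_size=10)
x4, err4 = estimate_x(hits=220, trials=400, pool_size=10)
print(x1, err1)
print(x4, err4)
```

With the same hit rate, the second call uses 4 times the trials and returns half the error on x, which is the square-root dependence described above.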
Also, I hid the math for simplicity; let me know if anyone wants to know how this works. I used standard error analysis for stochastic processes.