#1

Reborn

We have had quite a few threads about loot and loot analysis in the last weeks; Jimmy's insane loot test is one of them. Moreover, thanks to Starfinder, we now have enough data to do some deeper analysis.
To analyze loot, some statistical background is needed, or at least enough to understand the main results; hence this “How to”. The standard in-game question about loot is “How is loot today?”, so let's see how to answer it.

Basics

All we have is observed data. In a first step we can describe it, and in a second step we can try to model it. The second step is only needed if one is interested in deeper insights.

There are two ways to describe a dataset: first, you can calculate and plot the distribution of the data; second, you can use summary measures such as the mean. A frequency distribution describes the frequencies of single observations, while a probability distribution gives the probabilities of random events. The two are closely related, but explaining how would lead too far here.

Let's denote a single observation as xi and the random variable to which it belongs as X (capital x). When we observe loot, we observe single xi's. If we don't know the loot function (i.e. how loot is calculated; I certainly do not), those xi's are random, and therefore we enter the field of statistics. “How is loot today?” can mean “What is the probability of getting some nice loot over x PED?”. Mathematically this is S(x) = P(X > x), where S(x) is a function of x giving the probability that a realization of X (a single xi) is greater than x. S can be estimated from the empirical cumulative distribution function (ECDF) as S = 1 - ECDF; the next chapter shows how.

One further note: we are going to model probabilities of observed data. This has nothing to do with a loot function as implemented by MA, which would be something like L(p) = x, where L is a loot function that depends on a parameter vector p and returns a loot value.

As already mentioned, data can also be described by the mean, but there are other measures as well, the median for example.
The median is the value in the middle, i.e. 50% of the observations are higher and 50% are lower. It is better suited for skewed data; if the distribution is symmetric, mean and median coincide. Both measures describe the location of the underlying distribution and hence only one aspect of it.

What about variability? To describe the variability of a sample, measures such as the variance, the standard deviation or the quartiles are used. They indicate how the data are spread around the center. “How is loot today?” can therefore also mean “What is the mean loot today, or how variable is loot today?”

Survivor function

As mentioned above, the survivor function of a random variable X is defined as S(x) = P(X > x). Let's see how to estimate it. In our example we observe 5 events, denoted x(1), .., x(5), so n = 5 (the number of cases in the sample). To get S, we first have to sort them; let's denote the sorted sample as x1, .., x5. Then S(x1) = (n-1)/n, S(x2) = (n-2)/n, .., S(x5) = (n-5)/n = 0, so in general S(xi) = (n-i)/n.

Remark: this procedure does not account for ties and will interpolate them. A better estimator is the Kaplan-Meier estimator, but I won't explain it here. There is also the possibility of a continuity correction, S(xi) = (n-i+.5)/n. For our purposes, however, the method above is sufficiently precise.

Example 1

Code:
x     rank  S
51    1     0.8
52    2     0.6
67    3     0.4
80    4     0.2
120   5     0

Example 2

Now let's use some real data: a dataset I copied and pasted from Starfinder's site over the last few days. Since this dataset consists of captured globals, I use x - 50 instead of the observed values x. You can use the original data as well, but since we would like to fit some models, it is better to transform them first.

Fig. 1

Quite impressive, I would say, but where is the loot? Let's zoom in.

Fig. 2

Now things get clearer. We observe a high probability at the beginning that drops quite fast up to 500 PED and seems to vanish thereafter, so the distribution is rather skewed. The median can be read from the graph at 50% survival: it is 27, and hence 27 + 50 = 77 PED for the original global data. The mean is 65.22, i.e. 115.22 PED for the original data; quite a difference. From the median we know that 50% of all globals are below 77 PED and 50% above. Using the survivor function we can find the percentage above the mean as well, and I get about 22%: only 22% of all globals are higher than 115 PED. So anyone who goes by the mean gets a wrong expectation about loot; it is reached in only 22% of all cases.

The models

As we have seen, the survivor function describes the loot distribution. Is it possible to find a formula for this distribution? As a statistician I can tell you: maybe. So let's try.

In the second figure we saw that loot seems to follow a specific curve with little variability around it. This is due to the rather large sample size and to the way MA calculates loot, which makes modeling easier. The curve looks quite like an exponential distribution, which is defined by S(x) = exp(-a*x), where exp is the exponential function and a is a rate parameter (the inverse of the scale).
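The survivor estimate S(xi) = (n-i)/n from Example 1, and the “share above the mean” reading used in Example 2, take only a few lines. A minimal sketch; the function names are illustrative and ties get no special treatment, as in the simple estimator described above.

```python
def empirical_survivor(values):
    """Return (x, S) pairs for the sorted sample, with S(x_i) = (n - i) / n.

    Ties are not treated specially, exactly as in the simple estimator above.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs, start=1)]


def median(values):
    """Middle value of the sorted sample (average of the two middle values if n is even)."""
    xs = sorted(values)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2


def share_above(values, threshold):
    """Fraction of observations strictly above threshold, i.e. an estimate of S(threshold)."""
    return sum(1 for v in values if v > threshold) / len(values)


# The five values of Example 1 (PED), deliberately unsorted.
# For recorded globals one would first shift each value: x - 50.
example1 = [80, 51, 120, 67, 52]
for x, s in empirical_survivor(example1):
    print(x, round(s, 2))

mean = sum(example1) / len(example1)
print("mean:", mean, "median:", median(example1))
print("share above the mean:", share_above(example1, mean))
```

The first loop reproduces the S column of the Example 1 table; applied to the shifted globals, share_above(data, mean) is the “about 22%” figure.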
We further know that for an exponentially distributed random variable the mean is 1/a, the variance is 1/a^2 and the median is ln(2)/a. A good estimator for 1/a is the observed mean. Moreover, one can show that for an exponential distribution the share of observations above the mean is exp(-1) = 36.8%. From the previous analysis we know that only 22% of all globals exceed the mean, so the two do not fit. Nevertheless, let's try the fit.

Fig. 3

As expected, we don't get a perfect fit. The exponential curve looks similar but overestimates loot up to 200 PED and underestimates it thereafter. Alternatively, one can try a different model; there are plenty of them. To make a long story short, the best model for these data is a Pareto distribution.

Fig. 4

That looks much better. The Pareto distribution is a power function defined as S(x) = (x/s)^(-k), where s is a scale and k a shape parameter. Its mean is k*s/(k-1) and its median s*2^(1/k). From the data I get k = 2.59 and s = 34.76, so the estimated mean is 56.62 and the median 45.43. Those values are closer to the observed ones, but not perfect yet.

So why can't I find a perfectly fitting model? Maybe MA invented a new system that leads to a yet unknown distribution? Let's take a step back. We have global data, and all values are greater than 50 PED. It might be that globals are only shown above 50 PED while the distribution starts below that; this is what we call left truncation. Furthermore, our data are quite heterogeneous: many different mob types, possibly with different loot, and different days. It is known that a random variable Y = b*exp(X), where X is exponentially distributed, follows a Pareto distribution. Distributions of this kind typically arise in statistics when a mixture of several distributions is involved (for example, an exponential whose rate itself varies from case to case according to a gamma distribution yields a Pareto-type tail). So it is rather plausible to assume that loot follows a mixture distribution.
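The exponential and Pareto formulas above are easy to evaluate directly. A minimal sketch; the inputs are the values quoted in the text (observed mean 65.22 on the x - 50 scale, k = 2.59, s = 34.76), and the function names are illustrative.

```python
import math


def exp_survivor(x, a):
    """Exponential survivor function S(x) = exp(-a*x); a is the rate, 1/a the mean."""
    return math.exp(-a * x)


def pareto_survivor(x, s, k):
    """Pareto survivor function S(x) = (x/s)^(-k) for x >= s; S(x) = 1 below s."""
    return 1.0 if x < s else (x / s) ** (-k)


def pareto_mean(s, k):
    """k*s/(k-1), finite only for k > 1."""
    return k * s / (k - 1)


def pareto_median(s, k):
    """s * 2^(1/k)."""
    return s * 2 ** (1 / k)


# Exponential: estimate 1/a by the observed mean of the shifted globals (x - 50).
mean_obs = 65.22
a = 1 / mean_obs
print("exponential median:", round(math.log(2) / a, 2))
print("exponential S(mean):", round(exp_survivor(mean_obs, a), 3))  # exp(-1) for any a

# Pareto with the parameters reported in the text.
s, k = 34.76, 2.59
m = pareto_mean(s, k)
print("Pareto mean:", round(m, 2), "median:", round(pareto_median(s, k), 2))
print("Pareto S(mean):", round(pareto_survivor(m, s, k), 3))
```

The last line shows that this Pareto fit puts about 28% of its mass above its own mean, which is closer to the observed 22% than the 36.8% of any exponential.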
What we need now is to identify those mixtures.

Some more models: the effect of health

From Jimmy's thread we know that loot depends on mob health. Finding a model with health as an additional variable might be rather tricky, since we don't know the model yet. Fortunately, there is a non-parametric solution: as with the ECDF, we can model the data empirically with health as an additional variable. I'm going to use a Cox proportional hazards model.

Fig. 5

Quite a good fit, with some minor underestimation (to make the differences visible, the x axis is now limited to 500 PED). The effect of health is highly significant (p < 0.001), but we already knew that: the observed distribution depends on mob health. In Fig. 5 the distribution is plotted for the overall mean health. Let's check what it looks like for 500, 1000, 1500 and 2000 HP.

Fig. 6

Loot increases with health. The mean health in the data is 1146 HP, and the ECDF lies between the plotted 1000 and 1500 HP curves, so this fits quite well again. (By the way, Ambus are the most hunted mobs, so a mean above 1000 HP is no surprise.) If I find the time, I will show how to fit models for constant HP and what else I have been able to identify so far. There is still a lot to explain, but I hope this intro was helpful.

By the way, loot didn't change over days.

Figure: boxplot of loot by day

The graph is called a boxplot. It shows the median in the middle and the 25th and 75th percentiles as the edges of the box. To test for a day effect one needs a statistical test; the appropriate one for these data is the Kruskal-Wallis test. It gives p = .275. Since this value is higher than 0.05, the day effect is not statistically significant.

Section 2: Mixtures

We saw in Fig. 6 that loot depends on the total health of the mob. Since we used a semi-parametric model, we have to verify this finding. So let's take the most hunted mobs: Ambu Young (1010 HP), Argo Y (300 HP) and Aurli Ravager (2800 HP).

Fig. 7

Since Ambu is the most hunted mob (10%), its distribution is quite close to that of total loot. As expected, Argo Y lies below Ambu's distribution and Aurli above it. The samples are now much smaller due to lack of data (copy and paste is quite time consuming), so precision is lower. The means (after subtracting 50 PED) are 39.9, 80.7 and 116.1 PED. Adding the 50 PED back and relating Ambu to Argo and Aurli to Ambu gives ratios of 1.46 and 1.27, which confirms that loot increases with health.

Now let's try to fit a model for Ambus:

Fig. 8

Once again, the Pareto distribution fits better than the exponential, indicating once more that a mixture distribution might be involved. So where does the mixture come from? Ambus are fast-regenerating mobs, so some hunters kill them faster and others slower, doing more damage as a result. This might explain part of the mixture. Furthermore, globals may come from more than one distribution. Right, we have HoFs as well.

Now some formal things again. A mixture distribution arises when one dataset contains data coming from two or more distributions. Mathematically this is written as f(x) = p*f1(x) + (1-p)*f2(x), where f(x) is the density of the overall sample, p is the proportion of the first distribution in the sample and (1-p) that of the second. I'm using the density here for simplicity; the survivor function mixes with the same weights, S(x) = p*S1(x) + (1-p)*S2(x). Estimating a mixture from a single dataset is quite tricky, and there are several approaches; I'm using an MLE (maximum likelihood) approach. Let's see what this gives with two exponential distributions.

Fig. 9

Using MLE I get p = .994 for the first distribution, with a mean of 45.35 + 50 PED, and (1-p) = .006 for the second one, with a mean of 5958.8 + 50 PED. There is still some over- and underestimation, but it clearly shows that the data come from a mixture.
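The two-exponential mixture above is fitted by maximum likelihood. One standard way to compute that MLE for a finite mixture is the EM (expectation maximization) algorithm, which is mentioned again in the loot model post further down; this is a sketch of it, not necessarily the exact procedure behind Fig. 9. The data are synthetic: the 90/10 split and the means of 40 and 400 are made-up illustration values, not estimates from the loot data, and the start values are a crude assumption (EM can stall in a poor local optimum if they are bad).

```python
import math
import random


def exp_pdf(x, mean):
    """Density of an exponential distribution parametrized by its mean."""
    return math.exp(-x / mean) / mean


def em_two_exponentials(data, iters=200, tol=1e-6):
    """Fit f(x) = p*f1(x) + (1-p)*f2(x) with two exponential components by EM.

    Each EM step does not decrease the likelihood, so the result is a (local) MLE.
    Component 1 starts as the small-mean component. Returns (p, m1, m2).
    """
    n = len(data)
    xbar = sum(data) / n
    p, m1, m2 = 0.5, xbar / 2, xbar * 4  # crude start values
    prev = -math.inf
    for _ in range(iters):
        # E-step: probability that each observation belongs to component 1.
        w, loglik = [], 0.0
        for x in data:
            a = p * exp_pdf(x, m1)
            b = (1 - p) * exp_pdf(x, m2)
            w.append(a / (a + b))
            loglik += math.log(a + b)
        # M-step: new mixing weight and weighted component means.
        sw = sum(w)
        p = sw / n
        m1 = sum(wi * x for wi, x in zip(w, data)) / sw
        m2 = sum((1 - wi) * x for wi, x in zip(w, data)) / (n - sw)
        if loglik - prev < tol:
            break
        prev = loglik
    return p, m1, m2


# Synthetic data (illustration only): 90% from a mean-40 exponential, 10% from mean 400.
rng = random.Random(42)
data = [rng.expovariate(1 / 40) if rng.random() < 0.9 else rng.expovariate(1 / 400)
        for _ in range(3000)]
p, m1, m2 = em_two_exponentials(data)
print(f"p = {p:.3f}, mean1 = {m1:.1f}, mean2 = {m2:.1f}")
```

For recorded globals one would pass the shifted values x - 50. Because each exponential is memoryless, a mixture of exponentials truncated at 50 PED and shifted by 50 is again a mixture of exponentials with the same means; only the weights p then refer to the recorded (truncated) data rather than to all loot.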
Since the second mean is quite large, it is rather obvious that we have two distributions, one for globals and one for HoFs. The estimated means are not very precise due to the small sample size. Moreover, there are two very large observations above 15000 PED in the data. Let's exclude them: p is now .98 with a mean of 43.2 + 50 PED, and (1-p) = .02 with a mean of 422.5 + 50 PED for the second distribution. This seems more reasonable and corresponds quite well to the HoF data. There is still the same over- and underestimation as in Fig. 9, which might be related to the variability in HP that we can't observe.

To conclude: recorded global data are composed of two distributions, one for globals and one for HoFs. There is still room for a third distribution (ATHs). HoFs have a frequency of 2% within the global data. To know how to break even, we need to know the frequency of globals. There is not much data to estimate it from, but I'll show what we can do until we have it.

Some experimental stuff

As we have seen, global loot is a mixture of several distributions. Maybe we can find some parameters to describe it. What I'm going to show now is very experimental, and I don't have enough data for reliable estimates, so you can ignore the estimated values themselves. What's interesting about this experiment is that the relation between loot and health is quite linear.

Code:
hp    p1       p2       m1      m2      m3
300   0.23311  0.74469  13.753  34.667  485.73
500   0.2911   0.69868  13.656  41.443  662.81
1010  0.36811  0.62644  20.078  59.993  966.42
2000  0.55995  0.43653  27.231  101.84  2038.6

Fig. 10

For a 2000 HP mob, the model with 3 exponentials does quite a good job. So let's see whether we find a relation between HP and the estimated means m1, .., m3.

Fig. 11

Fig. 12

The means of the exponential distributions seem to increase linearly with HP. So if I'm right with my assumption of 3 exponential distributions and the relation to HP is linear, then it makes no difference which mob you hunt to get the same return. However, this does not imply that one mob will global more often than another.

Added 2008/04/22: As the mean increases, the probability of getting it decreases (Fig. 12). The overall loot expectation is 40, 40, 50 and 67 PED, so it does not follow the cost of killing a mob: if a 500 HP mob brings a mean of 40 PED, then a 2000 HP mob should bring at least 4 times as much (160 PED), but I get 67 PED. So what is the problem here?

As explained above, a global is only recorded when loot is above 50 PED. As explained in Starfinder's post, two possibilities could apply to the observed data: loot is either shifted by 50 PED or truncated at 50 PED. Unfortunately there is no way to tell the two apart with global data only. From Buckaroo (thanks, Buck) I know that in his data 1/3 of the looted value comes from globals. This indicates that the observed globals are truncated at 50 PED and not shifted. If that is the case, the estimated means from the 50-PED-corrected data would be the real means, while the estimated weights (probabilities) per mean from the 3-exponential mixture model might be wrong and should be revised. I have to check this with some simulations.

Edit 08/05/05: As a result of the experimental chapter above, I was able to derive a loot model that is posted here; see the post below for the model proposal.

Last edited by falkao; Yesterday at 19:55. Reason: sec 2 added, typos, fig 12 added
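The “overall loot expectation” of 40, 40, 50 and 67 PED in the experimental section is the weighted sum of the component means from the 3-exponential table, and the linear trend of m1..m3 in HP can be checked with a straight-line fit. A minimal sketch; p3 is not listed in the table and is assumed here to be 1 - p1 - p2, and the lines are ordinary least squares on just four points, so the slopes are rough at best.

```python
# Rows of the 3-exponential mixture table: hp, p1, p2, m1, m2, m3 (means in PED, 50-PED-corrected).
# Assumption: the third weight is p3 = 1 - p1 - p2.
TABLE = [
    (300, 0.23311, 0.74469, 13.753, 34.667, 485.73),
    (500, 0.2911, 0.69868, 13.656, 41.443, 662.81),
    (1010, 0.36811, 0.62644, 20.078, 59.993, 966.42),
    (2000, 0.55995, 0.43653, 27.231, 101.84, 2038.6),
]


def mixture_mean(p1, p2, m1, m2, m3):
    """Expectation of a mixture = sum of weight * component mean."""
    p3 = 1.0 - p1 - p2
    return p1 * m1 + p2 * m2 + p3 * m3


def least_squares(xs, ys):
    """Ordinary least-squares line y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


for hp, *params in TABLE:
    print(hp, round(mixture_mean(*params), 1))

hps = [row[0] for row in TABLE]
for j, name in enumerate(["m1", "m2", "m3"], start=3):  # columns 3..5 hold m1..m3
    b0, b1 = least_squares(hps, [row[j] for row in TABLE])
    print(f"{name} ~ {b0:.2f} + {b1:.4f} * hp")
```

The four expectations round to 40, 40, 50 and 67, the numbers quoted in the text.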

EFD Awarded to falkao for this Post

| Date | User | Comment | Amount |
| 04-20-2008 | Einstein | Great work! | 500.00 |
| 04-20-2008 | safara | WOW - more Beer required | 500.00 |
| 04-20-2008 | sahel | Keep up the research | 500.00 |
| 04-19-2008 | Doer | I might have paid more attention in class if it was about something virtually real. | 500.00 |
| 04-19-2008 | jdegre | outstanding post; thx for sharing your analysis | 500.00 |

#3

Reborn

The Loot Model
In this section I will describe the loot model I have so far. As explained in the previous chapters, global loot turns out to be a mixture of several exponential distributions. Estimation can be done numerically via MLE (maximum likelihood) or EM (expectation maximization). Unfortunately there are some pitfalls. With 3 exponential distributions one has to estimate 5 parameters (3 means and 2 mixing weights), and one needs a good initial guess to avoid ending up in a local optimum. Moreover, the mixture probabilities are correlated with the other parameter estimates, so we have a suboptimal situation in which estimation becomes a nightmare.

To model a mixture of exponentials, a generalized Pareto distribution (GPD) can be used. It has 3 parameters and looks like this: S(x) = (1 + k*(x-t)/s)^(-1/k), with location parameter t, scale s and shape k, where x >= t and k, s > 0. (There is also a form of the GPD with k < 0, but that case is not of interest here.) The mean of the GPD is m = t + s/(1-k), which is finite for k < 1. The advantage of the GPD over the mixture is that I have only 2 parameters to estimate instead of 5, since t can and should be chosen a priori, so the situation is more stable. Now let's try it out:

Fig. 1: Ambu Y global loot in PED. Source: Starfinder's loot tracker.

As you can see in Fig. 1, the model seems to do a great job; before, I needed 3 exponential distributions to get a similar fit. However, I had to limit loot to 1000 PED to get a good fit; I'll explain why later. We have now found a model that handles global loot rather easily, but what exactly did we calculate? Let's go one step back. Global loot is truncated at 50 PED, so the parameters we got are those of a truncated distribution. Readers who are not interested in the hard stuff can jump directly to the next figures. A left-truncated distribution is nothing else than ST(x) = S(x)/S(T) for x >= T, where T is the left truncation point.
If we resolve this term, we get another GPD (a property similar to the memorylessness of the exponential distribution): for x >= T, ST(x) = (1 + k*(x-T)/(s + k*(T-t)))^(-1/k), i.e. a GPD with location T, scale s + k*(T-t) and the same shape k. Hence the mean of ST(x) is mT = T + (s + k*(T-t))/(1-k). Having estimates for the parameters of ST(x), we already have an estimate of k, and from mT we get an estimate of s. So far so good. Now let's apply this to real data:

Fig. 2: Parameter estimates for s and t according to HP

(Please note: Fig. 2 is the result of several fits, first using suitable mobs (HP > 2000) to derive k. Having k, we can calculate s as described above.)

As one can see, the relation is linear: the parameters of the GPDs for the different mobs depend on HP, and we get a formula to calculate them from HP. Parameter k itself does not depend on HP and is estimated to be .25. s can be estimated as 2/100 * HP (PED). The shift parameter t seems to be identical to s, but with some constant added. This is what I used in my first model; however, it turns out that the constant term might only be sample related, so I removed it.

We now have a model for global loot below 1000 PED. Loot over 1000 PED is very rare. For Ambu we observe values from 1100 to 1600 PED (probability within global loot p = .002) and from 15000 to 20000 PED (p = .0005). So it seems there are two further loot distributions that can't be estimated at the moment due to lack of data; however, we can use their means and probabilities. This is also why I did the first fit for loot below 1000 PED.

We now know that global loot depends on HP. From Jimmy's post we know that this also holds for normal loot: the best approximation is an exponential distribution with mean HP/10 PEC (i.e. HP/1000 in PED), and there is a minimum loot corresponding to HP/2000 PED.

One of the most important things to know about loot is how much is returned. Mathematically this is the expectation of the loot distribution.
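The truncation result and the mean mT stated above can be checked numerically. A sketch, assuming the GPD parametrization S(x) = (1 + k*(x-t)/s)^(-1/k); the values k = 0.25, s = t = 0.02 * 1010 PED (the HP relation above applied to an Ambu Y) and T = 50 PED are used only for illustration. The first check compares S(x)/S(T) with a GPD of location T and scale s + k*(T-t); the second draws GPD samples by inverse transform and compares the mean of the draws above T with mT.

```python
import random


def gpd_survivor(x, t, s, k):
    """GPD survivor S(x) = (1 + k*(x - t)/s)^(-1/k) for x >= t (k, s > 0); S = 1 below t."""
    if x <= t:
        return 1.0
    return (1.0 + k * (x - t) / s) ** (-1.0 / k)


def gpd_mean(t, s, k):
    """t + s/(1 - k), finite only for k < 1."""
    return t + s / (1.0 - k)


def truncated_mean(T, t, s, k):
    """mT = T + (s + k*(T - t))/(1 - k): mean of the GPD left-truncated at T >= t."""
    return T + (s + k * (T - t)) / (1.0 - k)


def gpd_sample(t, s, k, rng):
    """Inverse-transform draw: solve S(x) = u for x, with u uniform on (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is never 0
    return t + s * (u ** (-k) - 1.0) / k


# Illustration values (assumed): k = 0.25, s = t = 0.02 * HP PED for HP = 1010, truncation at 50 PED.
k = 0.25
s = t = 0.02 * 1010
T = 50.0

# Check 1: S(x)/S(T) equals a GPD with location T, scale s + k*(T - t) and the same k.
for x in (60.0, 100.0, 500.0):
    ratio = gpd_survivor(x, t, s, k) / gpd_survivor(T, t, s, k)
    gpd_T = gpd_survivor(x, T, s + k * (T - t), k)
    print(x, round(ratio, 6), round(gpd_T, 6))

# Check 2: the mean of simulated draws above T should be close to mT.
rng = random.Random(1)
above = [x for x in (gpd_sample(t, s, k, rng) for _ in range(200_000)) if x > T]
print("untruncated mean:", round(gpd_mean(t, s, k), 2))
print("mT formula:  ", round(truncated_mean(T, t, s, k), 2))
print("mT simulated:", round(sum(above) / len(above), 2))
```

Because the truncated distribution is again a GPD with the same k, fitting a GPD to the recorded globals estimates k directly, which is why only s has to be recovered from mT.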
The loot distribution can be modeled as L(x) = c + p1*S1(x) + p2*S2(x) + … + pn*Sn(x). This is a mixture distribution with n terms. The expectation of L is nothing else than E(L) = c + p1*E1 + … + pn*En, i.e. the sum of the individual weighted expectations.

To make a long story short: from global loot I was further able to derive a 4th distribution that lies between normal and global loot. All in all, loot for a mob with HP > 1000 is a mixture of 6 distributions, as shown in the following table:

Table 1: Loot model for HP > 1000 (all values in PEC)

Code:
Class  p        lower  upper    mean (untruncated)  truncation
C0     1.0000                   HP/20
C1     .97500   HP/20  3*HP/10  HP/10               .05
C3     .00375   HP     12*HP    HP+HP/(1-k)         .005
C4     .02119   2*HP   28*HP    2*HP+2*HP/(1-k)     .003
C5     .00005                   HP*1.5*100
C6     .00001                   HP*15*100

Next I will show what to do with it.

The Means

To get an estimate of the expectation of L(x), we need the means per class. It was no surprise to me to detect upper limits; I would have done it the same way, because otherwise extremely large loot would be possible. Since C1 to C4 have an upper limit, we need some math again. An upper-truncated survivor function SU(x) has the form SU(x) = (S(x) - S(U))/(1 - S(U)), where U is the upper limit. Its expectation is EU = Int[x * fU(x)]0..U, where fU(x) is the density of the truncated distribution and Int[..]a..b denotes the integral from a to b. In the formulas below, pu = S(U) is the truncation probability.

Solution for C1: EC1 = HP/10 * [1 - pu*(1 - ln(pu))]
Solution for C3 (rather tricky, but I found it to be): EC3 = 1/(1-pu) * [HP + HP/(1-k) - pu*[HP + HP*(pu^-k + k - 1)/(k*(1-k))]]
Solution for C4 (analogous to C3): EC4 = 1/(1-pu) * [2*HP + 2*HP/(1-k) - pu*[2*HP + 2*HP*(pu^-k + k - 1)/(k*(1-k))]]

Example: Ambu Y, assuming 1.6 * HP effective damage done:

E(Ambu) = HP/20 + .975 * HP/10 * .8 + .00375 * HP * 2.26 + .02119 * 2*HP * 2.28 + .00005 * HP*1.5*100 + .00001 * HP*15*100 = 419.1 PEC

So mean loot for an Ambu is about 420 PEC. 1010 * 1.6 HP will cost 539 PEC, so the expected loss per Ambu is 119.5 PEC (22.2%), i.e. the expected return rate is 77.8%.

In the coming chapter I will show the validation of the model.

tbc ...

Last edited by falkao; Yesterday at 22:48. Reason: truncation corrected
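Table 1 and the truncated means can be turned into a small calculator. A sketch under stated assumptions: all values are in PEC, HP is the effective HP (1.6 * 1010 for the Ambu example), the truncation probabilities are computed from the bounds in Table 1 rather than taken from the rounded column, and killing costs HP/3 PEC, which is what “1010 * 1.6 HP will cost 539 PEC” implies. One deliberate difference from the worked example: the C1 term is normalized by 1 - pu, like the C3 and C4 formulas, which gives HP/10 * 0.84 instead of HP/10 * 0.8; with the unnormalized 0.8 the total would drop to roughly 413 PEC. The helper names are illustrative.

```python
import math

K = 0.25  # GPD shape k from the text


def gpd_survivor(x, t, s, k):
    """GPD survivor S(x) = (1 + k*(x - t)/s)^(-1/k) for x >= t."""
    return 1.0 if x <= t else (1.0 + k * (x - t) / s) ** (-1.0 / k)


def trunc_exp_mean(mean, pu):
    """Mean of an exponential cut at U with pu = S(U), normalized by 1 - pu."""
    return mean * (1.0 + pu * math.log(pu) / (1.0 - pu))


def trunc_gpd_mean(t, s, k, pu):
    """Mean of a GPD(t, s, k) cut at U with pu = S(U): the C3/C4 formula for general t, s."""
    full = t + s / (1.0 - k)
    tail = pu * (t + s * (pu ** (-k) + k - 1.0) / (k * (1.0 - k)))  # pu * E[X | X > U]
    return (full - tail) / (1.0 - pu)


def expected_loot(hp, k=K):
    """E(L) = c + sum of p_i * E_i with the weights of Table 1; hp is effective HP, result in PEC."""
    pu1 = math.exp(-3.0)                             # C1: Exp(mean HP/10) cut at 3*HP/10
    pu3 = gpd_survivor(12 * hp, hp, hp, k)           # C3: t = s = HP, cut at 12*HP
    pu4 = gpd_survivor(28 * hp, 2 * hp, 2 * hp, k)   # C4: t = s = 2*HP, cut at 28*HP
    return (
        hp / 20                                              # C0: minimum loot, p = 1
        + 0.975 * trunc_exp_mean(hp / 10, pu1)               # C1
        + 0.00375 * trunc_gpd_mean(hp, hp, k, pu3)           # C3
        + 0.02119 * trunc_gpd_mean(2 * hp, 2 * hp, k, pu4)   # C4
        + 0.00005 * hp * 1.5 * 100                           # C5: mean only
        + 0.00001 * hp * 15 * 100                            # C6: mean only
    )


hp_eff = 1.6 * 1010     # Ambu Y with 1.6 x HP effective damage, as in the example
cost = hp_eff / 3       # assumption: 3 HP of damage per PEC (1616 HP -> ~539 PEC)
loot = expected_loot(hp_eff)
print(f"expected loot {loot:.1f} PEC, cost {cost:.1f} PEC, return {loot / cost:.1%}")
```

For the Ambu example this gives an expected loot of about 420 PEC and a return of roughly 78%, in line with the figures above; the C3 and C4 factors come out at about 2.26 and 2.28, matching the worked example.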

#5

Stalker

Great research, but one question:
Does your research help you with hunting and breaking even? Do you profit from this knowledge? If the answer is "No", then it's only a waste of time and unnecessary nonsense. Why do I post this? Yesterday I realised that my knowledge about Entropia Universe is bullshit, information about nothing; I wasted my memory and my time. But maybe it's only me, I hope so.

#6

Prowler

Quote:
If you want to profit, it is YOU who must put together all of these pieces and draw your own conclusions. Yes, it's you.
The Chipping Optimizer Tool
http://jdegre.net/pe/unlocker.php
Try the new Skill Scanner: automatically extract your skills from in-game screenshots

#7

Elite

Quote:
Fortunately for you, the majority of careers are much more mundane.
Champion of reason, unraveler of MA's mysteries...forever n00ber
Myth busters: Evade/Defense Skills, Weapon damage, Armor decay, Unlocking Skills, Weapon attachments
Other esoterica: My Story, Luck, Project Entropia: what's in a name?, More bang

#8

Dominant

Ehum, wtf was this? The only thing one needs to know about EU is that your loot depends on what you do, and only you, nothing more, nothing less!
We don't need anyone trying to break the system, because there is no way to do it: you will win some, and you will be fucked over and lose a lot! That's it. EU is a dream that will crash and burn in the near future, no doubt about it.

#9

Old

OK.
My English is not very good, and some of the maths you use I don't understand on first reading... but am I right that what you analyze is loot vs. HP, and the chance of a global over a certain amount? What I think is most important is the time and place of loot. To collect data you'll have to loot, check the time and note both, over and over again; only globals won't help, I think, because loot varies quite a lot these days. It's not very hard to find some data from the past and compare it to nowadays. Somehow I have the feeling that EU is tracking your run... I don't know right now how to try to prove it, but this could also be a side effect of randomness.

#10

Reborn