Social Complexity

Tag: psychometrics

When (and why) you can actually compare measurements from the social sciences

Can I compare my measurement of happiness with the one in another study? Is it just like comparing meters to miles?
Why does my measurement show that polarization is increasing while another study shows the opposite?

Today we will clarify one crucial aspect of the social sciences: measurements. In particular, we will see (1) how these measurements differ from measurements in physics, (2) what we can do with them, and (3) when we have to be careful.

What is this all about?

Let me start by telling you that this is not about the difference between interval and ordinal scales. They are important too, and they can mess everything up as well; that is why we will have an entire series just dedicated to them. But not today.

Today we speak about something even more basic: how quantities are defined. Let us start by looking at the world of physics.

Concepts in physics

I am pretty sure you are familiar with the concepts of length and distance. And you can discuss these with anyone without worrying that the other person may interpret distance in a completely different way. Besides relying on common knowledge, we can also check how the units of distance are defined, reading for example that:

The metre is currently defined as the length of the path travelled by light in vacuum in 1/299 792 458 of a second.

(from Wikipedia)

Something you may be wondering now is: why does this sound so ugly and boring at the same time? Why do we have a damn fraction in the definition?

The short answer is: because physics heavily relies on operative definitions. These definitions are not aimed at explaining a general concept, but rather at telling you how to measure something in practice.

Have you ever heard that science is “reproducible”?

Well, these definitions are aimed exactly at that. They make sure that everyone measures exactly the same thing. They make sure that we all know exactly what 100 meters are, with no room for interpretation.

Concepts in the social sciences

As you may expect, the social sciences do not rely much on operative definitions. This is not because the social sciences are bad, or worse, or anything along these lines, but mostly because they focus on general concepts.

For example, if you look up happiness in the APA Dictionary of Psychology, you read:

an emotion of joy, gladness, satisfaction, and well-being.

As you can tell, this does not contain any information about the measuring process. But is this a problem?

As we will see in the next sections, this definitely becomes a problem if we do not understand how the measuring process works.

Let’s explore it better

Ok, let’s suppose we want to measure how many potato chips are in a bag of chips. Sounds like an easy task, right? Well, actually it is quite a complex one. Indeed, while we have no problem with “full chips,” we do not really know what to do with a broken potato chip.

Take as an example the image on the left. Should we count this as 1 chip? In a way, it makes sense, as you could reassemble it into a “full” chip. But it also makes sense to count it as 0, as it is clearly not a full chip.

Someone else may also claim that we should count each fragment separately, as every piece in the mouth is indistinguishable from a small chip. Therefore, we should count this as 6.

Notice that this debate could go on forever, getting progressively more complex, with questions such as:

  • How big should a fragment be to still be counted?
  • When does a “full chip” become a fragment? (consider a chip with a very small missing piece)
  • etc.

The main problem is that we do not have an operative definition of potato chips. Nor do we have a unit of measurement for chips.

This means that different people may measure a different number of chips in the same bag.
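To make the three counting rules concrete, here is a minimal Python sketch. The bag (three intact chips plus the broken chip in six pieces) is hypothetical, and so is the `chip_id` field, which assumes we know which original chip each piece came from.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """One piece found in the bag (hypothetical data model)."""
    size: float   # fraction of a full chip: 1.0 means intact
    chip_id: int  # which original chip this piece came from (assumed known)


def count_full_only(bag):
    # Only intact chips count, so the broken chip contributes 0.
    return sum(1 for p in bag if p.size >= 1.0)


def count_recomposed(bag):
    # Fragments could be reassembled, so count original chips: the broken chip contributes 1.
    return len({p.chip_id for p in bag})


def count_every_piece(bag):
    # Every piece counts, so the broken chip contributes 6.
    return len(bag)


# Hypothetical bag: chips 0-2 are intact, chip 3 is broken into six fragments.
bag = [Piece(1.0, 0), Piece(1.0, 1), Piece(1.0, 2)]
bag += [Piece(s, 3) for s in (0.3, 0.2, 0.2, 0.1, 0.1, 0.1)]

counts = (count_full_only(bag), count_recomposed(bag), count_every_piece(bag))
print(counts)  # (3, 4, 9)
```

Same bag, three defensible operative definitions, three different numbers.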

Can we convert them?

Let us suppose that the measurement that counts both full chips and fragments tells us that we have 100 chips. How many full chips do we have? That is: how do we convert this number into another measurement?

As you may expect, you cannot do this precisely, because the first measurement simply merged everything together (i.e. full chips and fragments). So the only thing you know is that the number you are looking for is between 0 and 100, which is not very precise…

Similarly, if you know that another bag has 50 full chips, you still have no idea how many fragments + full chips it contains. Maybe it is 50, maybe it is 10,000; who knows?

And this is quite a big range of uncertainty!
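A small sketch of why no conversion exists: the same reading on one measurement is compatible with very different readings on the other. Bags are represented as lists of piece sizes (1.0 = intact), a representation chosen here purely for illustration; all four bags are hypothetical.

```python
def count_full(bag):
    # Full-chips measurement: only intact chips (size 1.0) count.
    return sum(1 for size in bag if size >= 1.0)


def count_all(bag):
    # Full + fragments measurement: every piece counts as 1.
    return len(bag)


# Two hypothetical bags with the same "full + fragments" reading of 100.
bag_a = [1.0] * 100               # 100 intact chips
bag_b = [1.0] * 10 + [0.1] * 90   # 10 intact chips and 90 fragments

# Two hypothetical bags with the same "full chips" reading of 50.
bag_c = [1.0] * 50                  # nothing else in the bag
bag_d = [1.0] * 50 + [0.01] * 9950  # plus 9,950 tiny fragments

print(count_all(bag_a), count_full(bag_a))  # 100 100
print(count_all(bag_b), count_full(bag_b))  # 100 10
print(count_full(bag_c), count_all(bag_c))  # 50 50
print(count_full(bag_d), count_all(bag_d))  # 50 10000
```

Because `bag_a` and `bag_b` give the same `count_all` but different `count_full`, no function can turn one reading into the other; the same holds for `bag_c` and `bag_d` in the opposite direction.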

This is not a statistical problem

Many people here may feel like this is the same old problem of sampling: you may get a bag with 20 chips or a bag with 30 chips; so what’s new here?

The fact is that this is not a sampling problem but a measurement one. Indeed, the bag is always the same: we did not resample it or replace it with anything else. What we changed is how we measure, but the object is still the same.

Is this an artifact?

An argument that I often hear is that “this is an artefact.” This can also be rephrased as “one of these measurements is the correct one and the other is simply wrong.” In a way, this argument is correct; but it is also quite wrong. Let’s see why.

Let us suppose we want to predict the number of times a certain child (let’s call her “child X”) will put her hand in the bag to eat a chip. We know that this child picks fragments and full chips one by one, as long as they are above a certain size S.

In this case, we want to count each piece above size S as 1 and ignore smaller pieces. Every other measurement would generate artefacts… in this context.

Suppose, instead, we are dealing with child Y. This child eats full chips one by one, but does not eat fragments. So, in this case, the correct measurement counts each full chip as 1 and each fragment as 0.

This means that the right measurement is determined by what we want to measure. And all the other measurements will introduce some artifacts.

Therefore, we cannot have a measurement which is good in every situation.
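The two children correspond to two different operative definitions. In the sketch below, the bag contents and the value of S are hypothetical; each function is the "right" measurement for one child and an artefact-generating one for the other.

```python
def hand_dips_child_x(bag, min_size):
    # Child X picks fragments and full chips one by one, as long as they reach size S.
    return sum(1 for size in bag if size >= min_size)


def hand_dips_child_y(bag):
    # Child Y only eats full chips, so every fragment counts as 0.
    return sum(1 for size in bag if size >= 1.0)


# Hypothetical bag: three intact chips plus one chip broken into six pieces.
bag = [1.0, 1.0, 1.0, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
S = 0.15  # hypothetical minimum size for child X

print(hand_dips_child_x(bag, S))  # 6
print(hand_dips_child_y(bag))     # 3
```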

Just use different names

Another interesting argument that I hear sometimes is that we need better classification. For example, instead of using the general concept of “chips” we may distinguish them into “full chips” and “fragments.”

While this approach is helpful, as it limits the possibilities we have, it still does not completely solve the problem. Indeed, as we discussed before, when does a full chip become a fragment?

You can observe something similar in this article, where the authors notice that the concept of “polarization” is too vague and identify 4 main sub-types of polarization. However, the same article then highlights how the same sub-type can still be measured in different ways.

Indeed, at the end of the day, what specifies exactly how to measure something is the measurement process itself (i.e. the operative definition). This is why better (non-operative) definitions may help but cannot solve the problem.

Can correlation save us?

An important ally of every scientist in the social sciences is our friend correlation. Indeed, as we will see, it can greatly help us in solving some of these problems. However, we should not trust it blindly, as we may still end up with some bad surprises.

When to trust it

Consider a chips brand whose bags usually contain 90% full chips and 10% fragments. In this case, you can easily convert one measurement into the other. For example, if the measurement that also counts fragments gives 100, the full-chips measurement should give a number very close to 90.

If this relationship (i.e. 90-10) is not given to you as initial data, you can still explore it using tools such as linear modelling or simple correlation. You just need the process to be reliable. In this case, you will be able to know:

  • How to transform one measurement into the other
  • How precise your estimate of the second measurement will be
    (i.e. how uncertain your prediction is going to be)
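A minimal sketch of this idea, assuming a hypothetical brand where each piece is a full chip with probability 0.9 and bags hold between 80 and 120 pieces (both figures are illustrative). We simulate both measurements on the same bags, fit a straight line with ordinary least squares, and use the residual spread as a rough measure of uncertainty.

```python
import random
import statistics


def simulate_bags(n_bags, p_full, rng):
    """Simulate (full+fragments, full-only) readings for bags of one brand."""
    totals, fulls = [], []
    for _ in range(n_bags):
        total = rng.randint(80, 120)  # pieces per bag (hypothetical range)
        full = sum(1 for _ in range(total) if rng.random() < p_full)
        totals.append(total)
        fulls.append(full)
    return totals, fulls


def fit_line(xs, ys):
    """Ordinary least squares: returns slope, intercept and residual std. deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, statistics.stdev(residuals)


rng = random.Random(42)
totals, fulls = simulate_bags(500, p_full=0.9, rng=rng)
slope, intercept, resid_sd = fit_line(totals, fulls)

# How to transform one measurement into the other...
predicted_full = intercept + slope * 100
# ...and how uncertain that prediction is (roughly +/- 2 residual SDs).
print(f"slope={slope:.3f}, predicted full chips for 100 pieces: "
      f"{predicted_full:.1f} +/- {2 * resid_sd:.1f}")
```

With a stable 90/10 composition, the fitted slope lands close to 0.9, and the residual spread tells you how far individual bags can stray from the prediction.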

If it is so simple, why even bother with the first part of this post? The problem is that things are not always so simple…

When you should not trust it

Suppose you figured out that, for brand X, the full/fragment split is 90/10. So now you can use either the full-chips measurement or the full+fragment one, since you can convert one into the other; very well!

What happens now if you apply this relationship (90/10) to another brand? Or if the same brand changes something in their production chain altering this ratio?

The problem here is that you can convert the two measurements as long as they have a stable relationship. But this relationship may change in time or not be universal at all (i.e. it works only for a specific brand).

For example, two measurements of polarization may be perfectly equivalent in France but not in Germany. If you know this phenomenon, you will not be surprised to see the two methods diverging. However, many scientists are unaware of this and they may get totally puzzled by these results.
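A sketch of what goes wrong when the relationship is not universal, assuming the 90/10 ratio from brand X is reused on a hypothetical brand Z whose bags are only 60% full chips (a made-up figure).

```python
import random
import statistics

RATIO_FROM_BRAND_X = 0.9  # full chips / all pieces, estimated on brand X


def estimate_full(total_pieces, ratio=RATIO_FROM_BRAND_X):
    # Convert a "full + fragments" reading into a "full chips" estimate.
    return ratio * total_pieces


def count_full_in_bag(total_pieces, p_full, rng):
    # Simulate the true number of full chips in a bag with the given composition.
    return sum(1 for _ in range(total_pieces) if rng.random() < p_full)


rng = random.Random(7)
# Brand X (where the ratio was estimated) vs hypothetical brand Z with 60% full chips.
error_x = statistics.fmean(
    [estimate_full(100) - count_full_in_bag(100, 0.9, rng) for _ in range(1000)])
error_z = statistics.fmean(
    [estimate_full(100) - count_full_in_bag(100, 0.6, rng) for _ in range(1000)])

print(f"average error on brand X: {error_x:+.1f} chips")
print(f"average error on brand Z: {error_z:+.1f} chips")
```

The conversion is nearly unbiased where it was estimated and overestimates by roughly 30 chips per bag on brand Z; nothing in the readings themselves warns you about it.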

Summing up…

Some people may reach this point and ask: if we are always measuring the same thing, why do we end up having different results?

And the answer is: because we are actually measuring different things!

Yes, we started from the same macro-definition (chips, polarization, happiness, …). But then, we ended up using different operative definitions. This means that, in practice, we measured different things (e.g. full chips vs fragments). This generates the following situations/problems:

  1. We cannot directly compare results.
  2. We can estimate one measurement from the other by using correlation/linear modelling, making sure that nothing important changes between the two measurements (fingers crossed🤞).
  3. The measurement which is the best for us may actually be bad for other people/studies.
  4. Different measurements may actually produce different dynamic behaviors (e.g. one measurement shows increasing polarization and the other shows decreasing polarization).

While we explored points 1 to 3, we did not really discuss point number 4. This is because it deserves a lot of attention and we will have a post just on that (coming up in 1 or 2 weeks).

If you are interested in measurements and how they may affect modelling (I am especially interested in agent-based modelling), check out this blog or my social media, as I will keep exploring this topic.

See you soon!

How measurements change your data’s shape


The way we measure effects in the social sciences may be way more important than you think…



This post is for a broad academic readership


The mystery of the top earners

Just yesterday I came across this post from the NeuroNeurotic blog. The idea is very interesting as it discusses how some “psychological effects” may actually not be psychological at all. Instead, the effect may appear just from some data manipulation (aka an artefact).

The blog post looks at this other article from the Guardian, which reports a study showing that top earners in Germany believe their earnings are close to average. This claim is, in a way, supported by this pretty cool visualization:

On the left, people are divided into deciles. For example, the top decile (i.e. 10) contains the top 10% of earners. On the right, we have some kind of perceived income. (More details later!)

The problem we now face is: are we sure this picture is telling the truth? This can be reformulated as: “do we really need some psychological effect to obtain this graph? Or can it be obtained just from data manipulation?”

NeuroNeurotic’s solution

According to the NeuroNeurotic blog, the previous image does not really support the claim. Indeed, it may just be an artefact due to binning.

For those who have not met this binning guy yet, he is just the cousin of rounding. Indeed, when we round, we take a lot of numbers and collapse them into fewer groups. For example, all the numbers from 1.5 to 2.499 will be grouped into the number 2.

Similarly, we may take people 1 to 1,000 and put all of them into the same bin/group. In the same way, deciles group people into 10 bins.

Representation of how we may bring an entire population into 2 bins.

The main idea behind the blog’s argument is that binning puts in the same group people who, maybe, should not be together. For example, the top decile will contain people who may have gigantic differences in earnings. Thus, averaging these values together will bring them closer to the mean value.
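A minimal sketch of this averaging effect (our own reconstruction, not the NeuroNeurotic code). It assumes normally distributed earnings with hypothetical parameters (mean 3,000, standard deviation 1,000), ranks people into deciles, and averages what each decile reports when everyone reports their true earnings.

```python
import random
import statistics


def assign_deciles(values):
    """Return the decile (1 = bottom 10%, 10 = top 10%) of each value, by rank."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def mean_per_decile(deciles, answers):
    """Average the answers given by the people in each decile."""
    groups = {d: [] for d in range(1, 11)}
    for d, a in zip(deciles, answers):
        groups[d].append(a)
    return [statistics.fmean(groups[d]) for d in range(1, 11)]


rng = random.Random(1)
# Assumption: normally distributed earnings; mean and SD are hypothetical.
earnings = [rng.gauss(3000, 1000) for _ in range(10_000)]
deciles = assign_deciles(earnings)

# Everyone reports their real earnings: no psychological effect at all.
means = mean_per_decile(deciles, earnings)
print([round(m) for m in means])
print("richest person:", round(max(earnings)), "| top-decile average:", round(means[-1]))
```

The top-decile average ends up far below the richest person's earnings: averaging within the bin pulls the extremes towards the middle.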

For a more detailed explanation, you may look at the original post. However, what I found extremely interesting is how the author was able to reproduce a similar image in simulations even without any psychological effect!

Indeed, he assumed that earnings follow a normal (aka Gaussian) distribution. Then, he assumed that every person simply reports their real earnings, and he collected the average value per decile. The striking result is the image below.

My question this time is: is the simulation really reproducing the results from the article? This can also be restated as: “is it just a matter of binning?”

The surprising effect of binning

Let us try to simulate something slightly different now. Earnings are still normally distributed like before, and people are still divided into deciles (i.e. binned). However, this time we ask people: “in which decile of the population do you think you are?”

In other words, in the previous simulation everyone reported her own earnings; now, everyone reports her own decile. As in the previous simulation, everyone answers correctly (i.e. no errors or effects).

Interestingly, if we run this simulation, we obtain the following image. Why? We still have binning, but the effect disappeared!

The short answer is that everyone is just answering her own decile. So all the people in the 10th decile are answering 10 and the mean value would still be 10.
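The second simulation can be sketched in the same way, again with hypothetical earnings parameters: the only change is that each person's answer is their own decile instead of their earnings.

```python
import random
import statistics


def assign_deciles(values):
    """Return the decile (1 = bottom 10%, 10 = top 10%) of each value, by rank."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


rng = random.Random(1)
earnings = [rng.gauss(3000, 1000) for _ in range(10_000)]  # hypothetical parameters
deciles = assign_deciles(earnings)

# Everyone answers "in which decile do you think you are?" correctly,
# so each person's answer is simply their own decile.
answers = deciles

means = [statistics.fmean([a for d, a in zip(deciles, answers) if d == k])
         for k in range(1, 11)]
print(means)  # [1.0, 2.0, ..., 10.0]: binning alone leaves the lines horizontal
```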

The longer answer is that we are actually facing a problem of measurement…

A problem of measurement

What was not really clear here is that we are currently dealing with two different scales of income. The first scale is just the earnings and it is measured in dollars. Meaning that if I earn 1,000 $ and you earn 5,000 $, the difference between us would be 4,000 $.

However, there is also a second hidden scale: ranking. In this scale, each person receives a score (aka number) according to how they place. For example, the poorest person would be number 1, the second-poorest would be number 2, etc.

To understand why this difference is important let us take the two poorest people in the simulation. Let us say one has 1 cent and the other has 5 cents. Thus, their difference in dollars would be 4 cents. However, their difference on the ranking scale would be 1.

This difference of 1 would also be the difference between the two richest people. However, their difference in dollars may be millions or even billions.
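The sketch below computes the same two gaps on both scales, using made-up earnings. Ranks are assigned by sorting, so neighbours are always exactly 1 apart, whatever their gap in dollars.

```python
# Hypothetical earnings (in dollars), from poorest to richest.
earnings = [0.01, 0.05, 1_200.0, 3_500.0, 2_000_000.0, 5_000_000.0]

# Ranking scale: the poorest person gets 1, the next one 2, and so on.
order = sorted(range(len(earnings)), key=earnings.__getitem__)
ranks = [0] * len(earnings)
for position, idx in enumerate(order, start=1):
    ranks[idx] = position

# Differences between the two poorest and between the two richest, on each scale.
poorest_dollars = earnings[1] - earnings[0]
richest_dollars = earnings[-1] - earnings[-2]
poorest_ranks = ranks[1] - ranks[0]
richest_ranks = ranks[-1] - ranks[-2]

print(f"two poorest: {poorest_dollars:.2f} dollars apart, {poorest_ranks} rank apart")
print(f"two richest: {richest_dollars:,.0f} dollars apart, {richest_ranks} rank apart")
```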

This tells us that the relationship between the two scales is somewhat weird. This “weirdness” is called “non-linearity” in mathematical terms, but let us stay away from obscure mathematical concepts.

Instead, let us plot the relationship between the ranking and the dollar scale. Does it look similar to something we have already seen? Notice how most of the lines are again tilted towards the center!

What we just observed is the fact that when we change scale we produce some distortions on the graph. This may result in compression (e.g. all the lines going towards the centre) or expansion depending on the two scales.

Furthermore, if we bring back our old friend Mr. binning, we will be back to our initial effect. As you see, for example, the top line is not horizontal anymore as it has been averaged with the other top 10% lines.
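A minimal sketch of "scale transformation + binning", under loudly stated assumptions: earnings are normal with hypothetical parameters, and the dollar scale is mapped linearly onto 1 to 10 (minimum to maximum) so that it can sit next to the decile scale. This mapping is our own choice for illustration, not something taken from the study.

```python
import random
import statistics


def assign_deciles(values):
    """Return the decile (1 = bottom 10%, 10 = top 10%) of each value, by rank."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def to_dollar_scale_1_10(values):
    """Hypothetical mapping: rescale dollars linearly so min -> 1 and max -> 10."""
    lo, hi = min(values), max(values)
    return [1 + 9 * (v - lo) / (hi - lo) for v in values]


rng = random.Random(3)
earnings = [rng.gauss(3000, 1000) for _ in range(10_000)]  # hypothetical parameters
deciles = assign_deciles(earnings)              # ranking scale, binned
dollar_scores = to_dollar_scale_1_10(earnings)  # dollar scale, honest answers

# Scale change + binning: average the dollar-scale score within each decile.
means = [statistics.fmean([s for d, s in zip(deciles, dollar_scores) if d == k])
         for k in range(1, 11)]
print([round(m, 1) for m in means])
```

Nobody misreports anything, yet the decile averages cover a much narrower range than 1 to 10: the extremes are pulled towards the centre.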

So what?

Our analysis shows us a few interesting facts:

  1. Binning alone is not sufficient to produce the effect in the article. Indeed, it would result in straight horizontal lines.
  2. Scale transformation is a beautiful way to create a mess. Indeed, the relationship between the two scales would look like a mishmash of tilted lines.
  3. Scale transformation + binning is the ultimate key for a disaster. One creates a mess while the other averages it out partially. This creates a cool relationship between the two scales which may be confused for an effect.

Then, is the study wrong?

The short answer is: “we don’t know.” Actually, everything depends on the question that was asked to participants.

If the authors asked “what decile do you think you belong to?” then everything is fine. Indeed, the two scales would be decile VS perceived decile. Here we have no scale change and binning alone cannot do anything to explain this.

For example, the study showed that the top decile answered an average of 6.5. This means that, roughly, people in the top 10% think they are only in the top 40% and that there are still 30% of people richer than them. This bias is definitely an interesting psychological effect!

However, what if they asked something that made people think in terms of earnings instead of ranking? In that case, the plot would be affected by scale transformation. Indeed, the first column would be a ranking while the second would be an earning scale! Thus, we would have all the ugly effects we discussed before.

For example, we may ask: “on a scale from 1 to 100, how do your earnings compare to those of the richest person? With 100 being the same earnings as the richest person, 50 being half of them and 1 being 1/100 of them.”

Let us suppose the richest person earns 1 million and the second richest earns 0.7 million. Even without any psychological effect, the richest person will answer 100 and the second richest will answer 70. Thus, the line of the top earners would not be horizontal but tilted towards 50!
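A sketch of this hypothetical question with perfectly honest answers. Apart from the 1 million and 0.7 million from the example, the top-decile earnings below are made up.

```python
def relative_to_richest(earnings):
    """Answer on a 1-100 scale: 100 = same earnings as the richest, 1 = 1/100 of it."""
    top = max(earnings)
    return [min(100, max(1, round(100 * e / top))) for e in earnings]


# Hypothetical top decile: the first two values come from the example above,
# the others are invented for illustration.
top_decile = [1_000_000, 700_000, 400_000, 250_000, 150_000]
answers = relative_to_richest(top_decile)

print(answers)                      # [100, 70, 40, 25, 15]
print(sum(answers) / len(answers))  # 50.0
```

Everyone in this made-up top decile answers truthfully, yet their average lands at 50, right in the middle of the scale.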

In conclusion

Always be careful about how you measure things, especially in the social sciences. Indeed, changes of measurement have the potential to mess things up pretty badly.

Next time we will discuss another effect that may be present in this study!

Until then, let’s stay rational!

Related topics:

Increasing the degrees of freedom in agent-based models

A new preprint came out today from a work I developed together with Alejandro Dinkelberg and Prof. Mike Quayle at UL, Limerick.

Opinion dynamics is a field focused on understanding the evolution of people’s opinions. Within this field, opinions are usually represented as numbers without any other requirement. So we started wondering: can this have some implications on the models?

For example, just as we convert meters to feet, we can also convert between different scales for measuring people’s opinions. How do these models change when we change their “units”?

From our study, it appears that while physics equations are unaffected by a change of units (you just need to rescale), this is not true for opinion dynamics models.

Indeed, we found that scale conversion (even in the case of perfect measurement) can totally change the model’s dynamics. This can change the final outcome by up to 100%. Furthermore, by changing scale, we were able to convert one model into another.
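To give a feel for this claim (and only that: this is not one of the models from the preprint), here is a deliberately simple averaging rule run on the same opinions expressed on two scales related by a monotone conversion, y = x^3. The model, the opinions and the conversion are all hypothetical choices.

```python
import statistics


def run_averaging(opinions, steps=50, weight=0.5):
    """Toy model: each step, every agent moves `weight` of the way to the group mean."""
    ops = list(opinions)
    for _ in range(steps):
        m = statistics.fmean(ops)
        ops = [o + weight * (m - o) for o in ops]
    return ops


opinions_x = [0.1, 0.2, 0.9]  # hypothetical opinions on scale x (0 to 1)

# Run the same rule on scale x...
final_x = run_averaging(opinions_x)[0]

# ...and on scale y = x**3, an order-preserving re-expression of the same opinions,
# then convert the result back to scale x.
final_y = run_averaging([o ** 3 for o in opinions_x])[0]
final_y_in_x = final_y ** (1 / 3)

print(f"consensus when modelled on x: {final_x:.3f}")
print(f"consensus when modelled on y, read on x: {final_y_in_x:.3f}")
```

On scale x the group settles at the mean, 0.4; the same rule on scale y settles at a point that, read back on x, is about 0.63. A linear change of units, like meters to feet, would leave the outcome of this averaging rule unchanged; the non-linear conversion does not.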

If you are interested in this research, you can find it here:

http://user102493.psee.ly/x4j2e

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 891347.
