Research News South Africa

When is data in cross-tabs unstable?

A short exposé of a practice that has caused so much confusion in reading cross-tabs; a practice that is misunderstood and wrongly applied in so many instances that it just has to be put right once and for all.

Do an AMPS run and two things are immediately apparent, irrespective of which software programme you are using to access the data:

  1. The weighted population figures are shown as the first line in any row cell, and the sample [the most important number in assessing the validity and stability of the data] is shown second.
  2. Where the sample size is under 100 but greater than 50, a star [*] appears next to the number; where the sample is under 50, two stars [**] appear next to the number.
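
For readers who like to see a rule written out, the star convention above can be sketched in a few lines of Python. This is only an illustration, not how any AMPS software actually implements it: the function name `star_flag` is made up, and because the rule as stated does not say what happens at a sample of exactly 50, the sketch assumes such a cell gets one star.

```python
def star_flag(sample_size: int) -> str:
    """Return the star marker shown next to a cell's sample size.

    Hypothetical helper: mirrors the convention described in the article,
    not the code of any particular analysis package.
    """
    if sample_size < 0:
        raise ValueError("sample size cannot be negative")
    if sample_size >= 100:
        return ""        # 100 or more: no warning
    if sample_size >= 50:
        return "*"       # under 100 (assumption: exactly 50 counts here)
    return "**"          # under 50


# A few sample sizes and the marker each would carry
for n in (250, 68, 12):
    print(n, star_flag(n))
```

Note that the function only looks at the sample size of one cell. It says nothing yet about whether the result in that cell can be used, which is the point of the rest of this article.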

    What does all this mean? Most people see stars and immediately conclude that the data is unstable and must not be used.

    Here is a simple cross-tab with the sample size shown as the first element because it is the most important number in the cell:

    At first glance, one certainly sees stars! Unfortunately, many people jump to the conclusion that the data can't be used because “the samples are too small”, and they reject the table out of hand. This is incorrect, and it shows that these people do not know how to read a cross-tab correctly.

    The fact is that the sample sizes in the primary or originating cells (the row and column totals that the cross-tab is built from) are the key: if the sample size in each of these primary cells is 100 or more, then EVERY RESULT in the interlocking cells is statistically valid and can be used.

    So while it is true that 10% of respondents who live in LSM 10 households use Brand A, it would not be right to drill down any further to find out who these LSM 10 Brand A users are because there are only 68 respondents who fulfil both criteria in the sample, and this sample is too small to become a primary number.

    So - what the stars are warning is that these numbers cannot be analysed further - they cannot become primary numbers in their own right; they are only valid while they are protected by the sample sizes in the two primary cells that created them. Any further analysis would indeed be unstable; but the primary analysis is good, valid, often insightful, information.
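
The rule described above can be written as a short check. What follows is a minimal sketch in Python, under stated assumptions: the names `assess_cell` and `CellAssessment` are invented for illustration, the threshold of 100 is the one given in this article, and in the usage example only the 68 respondents come from the text. The two primary sample sizes (680 and 1500) are hypothetical stand-ins.

```python
from dataclasses import dataclass

MIN_PRIMARY_SAMPLE = 100  # threshold stated in the article


@dataclass(frozen=True)
class CellAssessment:
    readable: bool   # the result can be used as it stands
    drillable: bool  # the cell may become a primary number for further analysis


def assess_cell(row_sample: int, column_sample: int, cell_sample: int) -> CellAssessment:
    """Apply the article's rule to one interlocking cell (hypothetical helper)."""
    # Readable: both primary cells that create the result have 100+ respondents.
    readable = (row_sample >= MIN_PRIMARY_SAMPLE
                and column_sample >= MIN_PRIMARY_SAMPLE)
    # Drillable: the interlocking cell itself must also reach 100 respondents.
    drillable = readable and cell_sample >= MIN_PRIMARY_SAMPLE
    return CellAssessment(readable=readable, drillable=drillable)


# LSM 10 x Brand A: 68 respondents in the interlocking cell (from the article);
# the primary sample sizes 680 and 1500 are hypothetical.
result = assess_cell(row_sample=680, column_sample=1500, cell_sample=68)
print(result)
```

Under these assumptions the cell is readable but not drillable, which is exactly what a single star is warning about.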

    Here's proof - there are 5099 males and 9961 females in the sample in the example below. The question is “Are you pregnant?” The answer, in the case of the males, is a universal “No”. And that answer in the table below will have two stars next to it - but it is perfectly true, valid and reflects reality!

    It is a true answer statistically speaking because there were over 100 respondents in the two primary cells that created that answer. And intuitively, you know that this is correct.

    Where your intuition cannot help you, don't let the stars confuse you - remember the rule: provided the sample sizes in the primary cells in both the vertical as well as the horizontal axes are 100 or more, the result in the interlocking cell will reflect statistically valid, usable data. Just don't try to analyse this data further if the resulting sample size in that interlocking cell is under 100.

    By the way - do a run using Choices to access the TGI data, and you won't be seeing stars! That is because we know that our users understand the basic rules that apply to cross-tabs and we apply this rule right from the start when the dataset is selected for the rows and columns in the first place.

    So - don't let the stars get in your eyes...

About Barbara Cooke

Barbara Cooke has been involved in marketing research since 1958. She is currently a partner in TGI South Africa (www.tgi.co.za), which she helped found with Tim Bester in 2002. Contact Barbara on tel +27 (0)11 234 0656.