Frequently Asked Questions

What is the Healthcare Numeracy Assessment (HNA)?

The Healthcare Numeracy Assessment (HNA) is an online assessment of 20 numeracy skills, intended primarily for healthcare students and healthcare professionals. The total assessment consists of 60 numeracy problems: 3 items for each of the 20 numeracy skills.

The HNA is underpinned by more than three decades of research into the numeracy skills that are necessary for healthcare professionals.

The HNA is also the foundation assessment tool in a comprehensive learning and assessment package that includes the assessment itself (the HNA), a personalised profile (the Healthcare Numeracy Performance Fingerprint) and a learning support environment (the Mathematics and Healthcare Numeracy Learning Support Environment).

What levels are there in the HNA?

There are two levels of the HNA:

The Point of Entry (POE) level HNA is intended for 1st and 2nd year undergraduate healthcare students, or for pre-undergraduate students who wish to assess their numeracy skills as part of a university admissions application. Although intended primarily for healthcare students, the more general context of the POE test means that it is also suitable as an assessment of general numeracy.

The Point of Registration (POR) level HNA is intended for students further along in their university journey (generally, 3rd or 4th year students), and assesses numeracy specifically in a healthcare context (with a focus on nursing and midwifery). Data presented by other researchers from Australia and Canada suggest that assessing numeracy among other healthcare providers is important as well.

The HNA POR can also be used by qualified healthcare professionals to assess their numeracy skills.

Is there just one version of the HNA?

For summative assessment purposes, there are 10 versions of the HNA POE and 10 versions of the HNA POR, all of similar difficulty.

When a student or professional signs up to take a summative assessment, they will be randomly assigned one of the 10 test versions.

There are also exemplar versions of both assessments, which are used as formative assessments within the learning support package.

Does the entire 60-item HNA have to be taken in a single test sitting?

No. It is possible to take subsets of items contained within a standard 60-item HNA.

How many opportunities do students get to pass the test?

Any limit on the number and timing of attempts is decided by the university the student is attending.

Are there international versions of the test?

The HNA was originally designed for use in the United Kingdom, but we are in the process of designing and beta testing international versions of the test. For example, we are currently beta testing a US version of the POR-level HNA.

Is the HNA available in other languages?

Currently, the HNA is available only in English.

Can students practise HNA problems?

Yes. Currently, there are exemplar items embedded within the learning platform linked to the HNA.

What is the “bank or retake” system?

In some circumstances, students may be required by their university to pass the HNA at POE level before progressing to POR-level learning.

Universities may also require students to pass the HNA at POR level before graduating or progressing to registration (these decisions are the prerogative of the university). In these situations, students may take subsets of items from within the HNA, rather than taking a complete, 60-item test. As they pass a given subscale (achieving 3 correct out of 3), that skill is “banked” at whichever level the student passed (POE or POR).

In these situations, a banked Pass allows the student to progress towards either (a) learning that same skill at the POR level (if the Pass was banked at POE level), or (b) registration or graduation (if the Pass was banked at POR level), where this is a requirement of their university.
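For illustration, the banking rule described above can be expressed as a short sketch. This is a hypothetical illustration of the rule (3 correct out of 3 banks a skill at the level taken), not the HNA platform's actual implementation; all names and data structures are invented.

```python
# Hypothetical sketch of the "bank or retake" rule described above.
# Names and data structures are invented; this is not the HNA platform's code.

ITEMS_PER_SUBSCALE = 3  # each numeracy skill is assessed by 3 items

def update_bank(bank: dict, level: str, subscale: str, item_scores: list) -> dict:
    """Bank a subscale at the given level ('POE' or 'POR') if all 3 items are correct."""
    if len(item_scores) == ITEMS_PER_SUBSCALE and all(s == 1 for s in item_scores):
        bank[(subscale, level)] = True  # 3/3 correct: the skill is banked at this level
    return bank

bank = {}
bank = update_bank(bank, "POE", "Ratios", [1, 1, 1])  # banked at POE level
bank = update_bank(bank, "POR", "Ratios", [1, 0, 1])  # 2/3: not banked, retake needed
print(bank)  # {('Ratios', 'POE'): True}
```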

How difficult is the HNA?

The items and subscales of the HNA vary in difficulty, reflecting the variation in difficulty of numeracy problems in the real world of healthcare. The POR-level test is more difficult than the POE-level test because POR-level numeric problems are embedded within more complex real-world healthcare contexts.

For example, in large samples of UK-based undergraduate nursing students, the average first-attempt score on the 60-item POE-level test was 53.2%, and the average score on the 60-item POR-level test was 50.1%. Within both test levels, some subscales are more difficult than others. This reflects the relative ease of basic arithmetical problems (e.g., the Addition, Subtraction, Multiplication and Division subscales) compared with more complex types of numeracy (e.g., the Ratios, Formulas, Indices and Logs, and Percentages subscales).

Also, within each subscale, item difficulty varies, because items were written to represent the full range of difficulty of number problems encountered in healthcare practice.

What data support the use of the HNA?

So far, we have collected over 30,000 complete, unique (first attempt) data sets for the HNA POE-level test, and almost 4,000 complete, unique (first attempt) data sets for the HNA POR-level test, in the United Kingdom. We are in the process of collecting additional data in the US, and have plans to collect data in other countries. The large UK database has allowed us to statistically analyse the properties of data collected with the HNA, including the determination of reliability and validity.

Importantly, data from all versions of the 60-item HNA are normally distributed, meeting the distributional assumptions required for running parametric, hypothesis-testing statistical analyses on data collected using our assessment instruments. This supports the suitability of the HNA for use in research studies investigating numeracy in healthcare students and professionals. Further information on the empirical evidence supporting the use of the HNA is available below.
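For illustration, a distributional check of this kind might look like the following sketch, which uses simulated scores rather than HNA data; the specific test shown (D'Agostino-Pearson) is one common choice, and we are not asserting it is the one used in our analyses.

```python
import numpy as np
from scipy import stats

# Simulated total test scores for illustration only; these are not HNA data.
rng = np.random.default_rng(0)
scores = rng.normal(loc=53.2, scale=6.0, size=5000)

# D'Agostino-Pearson omnibus test of normality:
# a large p-value gives no evidence against a normal distribution.
statistic, p_value = stats.normaltest(scores)
print(f"K^2 = {statistic:.2f}, p = {p_value:.3f}")
```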

How reliable and valid is the HNA?

Reliability and validity of both levels (POE and POR) of the HNA have been rigorously established via systematic research and analysis within healthcare training institutions. The underlying evidence is summarised below.

The content-related validity of the HNA has been established via a research programme extending over more than three decades, using inductive reasoning to establish which numeracy skills underpin the professional practice of nursing and midwifery. This line of research is constantly evolving and will lead to an understanding of how different numeracy skills are used in combination when carrying out a range of professional healthcare skills. In addition to this UK-based programme of research, we are currently conducting empirically driven content expert ratings on a US version of the HNA, and results of this study have confirmed the content relevance of a US version of the HNA POR to US healthcare settings. Overall, the Content Validity Index from this research was .98, which constitutes excellent content-related validity evidence. Similarly high results were found at the item and subscale level.
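For readers unfamiliar with the statistic, the sketch below shows the conventional calculation of a Content Validity Index: the item-level CVI is the proportion of experts rating an item relevant, and averaging these gives a scale-level CVI. The ratings are invented for illustration and do not come from the US study.

```python
import numpy as np

# Invented expert relevance ratings (rows = items, columns = experts),
# on the conventional 4-point scale (1 = not relevant ... 4 = highly relevant).
ratings = np.array([
    [4, 4, 3, 4, 4],
    [3, 4, 4, 4, 3],
    [4, 3, 4, 2, 4],
])

# Item-level CVI: the proportion of experts rating the item 3 or 4.
i_cvi = (ratings >= 3).mean(axis=1)

# Scale-level CVI (averaging method): the mean of the item-level CVIs.
s_cvi_ave = i_cvi.mean()
print(i_cvi, round(s_cvi_ave, 2))  # [1.0 1.0 0.8] 0.93
```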

Convergent validity has been established within each version of the HNA via item discrimination indices (item-total correlations), correlations among the subscale scores, and correlations between each subscale score and the total test score. This has been analysed for each of the two test levels (Point of Entry and Point of Registration), and for each of the 2 × 10 test versions.

Item discrimination indices. POE: Across all 10 test versions, the average item discrimination index was 0.27. Within each of the 10 test versions of the POE, a very similar pattern was found, with only one item having a trivial (close to zero) negative item discrimination index. POR: The average item discrimination index across all 10 test versions was 0.36. Within each of the 10 test versions of the POR, the same pattern was found. Taken together, these results show that, at the item level, for all 10 test versions of the two test levels (POE and POR), items distinguished between students with higher and lower levels of overall numeracy.
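For illustration, an item discrimination index of this kind can be computed as an item-total correlation. The sketch below uses the corrected form (each item correlated with the total of the remaining items) on fabricated data; we are not asserting that this exact variant was used for the HNA.

```python
import numpy as np

def item_discrimination(responses: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation for each item.

    responses: 0/1 matrix, rows = students, columns = items. Each item is
    correlated with the total of the remaining items, so an item does not
    inflate its own discrimination index.
    """
    totals = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])

# Fabricated responses: 6 students by 4 items (not HNA data).
resp = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(item_discrimination(resp).round(2))
```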

Subscale intercorrelations. POE: Across the 20 skill domains (as measured by the 20 test subscales), the mean of the 190 bivariate correlations was r = .15. All subscale intercorrelations were positive except for two, and these were trivial (close to zero). POR: Data on the POR showed similar patterns. All of the 190 bivariate correlations were positive (mean r = .25). The positive correlations for both POE and POR confirm that the subscales measure a similar underlying construct (numeracy), and the low values indicate that each subscale measures a distinct aspect (skill) of numeracy.

Subscale-total correlations. POE: Across the total sample, all subscales correlated positively with the total test score, with a mean correlation of r = .35. POR: Similar to the POE, all POR subscales correlated positively with the total test score (mean r = .47). The results for both the POE and POR followed a similar pattern, with all of the 400 subscale-total correlations being positive and low to moderate. As with the subscale intercorrelations, these results demonstrate that each subscale measures a common underlying general construct (numeracy), but within general numeracy, each subscale measures a discrete numeracy skill.
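The following sketch illustrates, on simulated data, how subscale intercorrelations and subscale-total correlations of this kind are computed; the numbers it prints are illustrative, not HNA results.

```python
import numpy as np

# Simulated subscale scores (0-3 items correct) for 20 subscales and 1,000
# students. A shared latent ability induces positive intercorrelations.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 1))
subscales = np.clip(np.rint(1.5 + latent + rng.normal(size=(1000, 20))), 0, 3)

corr = np.corrcoef(subscales, rowvar=False)   # 20 x 20 intercorrelation matrix
pairs = corr[np.triu_indices(20, k=1)]        # the 190 unique subscale pairs

total = subscales.sum(axis=1)
sub_total = [np.corrcoef(subscales[:, j], total)[0, 1] for j in range(20)]

print(f"mean subscale intercorrelation r = {pairs.mean():.2f}")
print(f"mean subscale-total correlation r = {np.mean(sub_total):.2f}")
```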

Test equivalence is critical when more than one version of a test is used. Generally, when a student sits the HNA, the test version will be randomly assigned from the 10 test versions available (this is true for both the POE-level and POR-level HNA tests, and is also true if the student is sitting a subset of items under the “bank or retake” system). It is critical in these situations that student performance is not influenced by which version of the HNA they have been randomly assigned. We have established test equivalence at the whole-test level (60 items) and at the subscale level (3 items), via analysis of data on several thousand UK-based students. These analyses are summarised below:

Whole test level. Equivalence at the whole-test level was established primarily by comparing the mean scores on each of the 10 versions of each test. For the POE level, mean total test scores (expressed as % correct across the 60 items) ranged from 52.6% to 53.9%, which constitutes a clinically non-significant difference between the highest and lowest mean (Cohen’s d effect size = 0.22). For the POR level, mean total test scores ranged from 49.7% to 50.1%, which also constitutes a trivial, clinically non-significant difference between the highest and lowest mean (Cohen’s d effect size = 0.06).
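For illustration, Cohen's d for the difference between two test versions can be computed with a pooled standard deviation, as in the sketch below. The scores are fabricated: the means mirror the POE range above, but the standard deviation is invented and chosen only to produce an effect size of roughly the magnitude reported.

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Fabricated scores for two hypothetical test versions (not HNA data).
rng = np.random.default_rng(2)
version_a = rng.normal(53.9, 6.0, size=500)
version_b = rng.normal(52.6, 6.0, size=500)
print(round(cohens_d(version_a, version_b), 2))  # roughly 0.22 for these inputs
```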

Subscale level. Equivalence at the subscale level was established primarily by comparing the Pass rates (the percentage of participants who answered all 3 items correctly) for each subscale across all 10 test versions. This comparison is critical in situations where sub-sections of the HNA are taken under the “bank or retake” system. These comparisons comprise a much larger set of analyses (20 subscales × 10 versions for each test level).
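For illustration, a subscale Pass rate of this kind (the percentage of participants answering all 3 items correctly) can be computed as in the following sketch, using fabricated responses:

```python
import numpy as np

def pass_rate(item_scores: np.ndarray) -> float:
    """Percentage of participants answering every item in the subscale correctly."""
    return 100.0 * (item_scores.sum(axis=1) == item_scores.shape[1]).mean()

# Fabricated 0/1 responses: 8 students by 3 items on one subscale (not HNA data).
scores = np.array([
    [1, 1, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 1],
])
print(pass_rate(scores))  # 62.5, i.e. 62.5% of participants passed this subscale
```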

For the POE, the average Pass rates per subscale were: 88% (Addition); 92% (Subtraction); 88% (Multiplication); 90% (Division); 87% (Negatives); 90% (Fractions); 90% (Decimals); 78% (Percentages); 76% (Ratios); 73% (Fractions to Decimals); 73% (SI Conversions); 64% (Problem-Solving); 74% (Formulas); 65% (Rounding); 56% (Estimation); 37% (Indices and Logs); 86% (Calculator Use); 75% (Measurement); 91% (Charts and Graphs); 68% (Statistics).

For the POR, the average Pass rates per subscale were: 78% (Addition); 93% (Subtraction); 93% (Multiplication); 89% (Division); 91% (Negatives); 82% (Fractions); 79% (Decimals); 56% (Percentages); 25% (Ratios); 87% (Fractions to Decimals); 78% (SI Conversions); 75% (Problem-Solving); 52% (Formulas); 65% (Rounding); 78% (Estimation); 23% (Indices and Logs); 56% (Calculator Use); 62% (Measurement); 66% (Charts and Graphs); 67% (Statistics).

Internal consistency reliability was established via Cronbach’s alpha (α), at both the whole test level and the subscale level.
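For readers unfamiliar with the statistic, Cronbach's α is k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. The sketch below computes it on fabricated dichotomous data, for illustration only.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance).

    items: matrix of item scores, rows = students, columns = items.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Fabricated dichotomous (0/1) responses driven by a latent ability;
# for illustration only, not HNA data.
rng = np.random.default_rng(3)
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(size=(200, 60)) > 0).astype(int)
print(round(cronbach_alpha(items), 2))
```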

Whole test level. At the whole-test level, reliability for the POE-level HNA was excellent (average Cronbach’s α across the 10 test versions = .86). Reliability was similarly high for the POR-level HNA (average Cronbach’s α across the 10 test versions = .91).

Subscale level. Achieving internal consistency reliability in short (3-item) subscales, with items restricted to dichotomous scoring (correct/incorrect), is very difficult. The subscale results for the HNA should therefore be interpreted with this in mind.

In terms of average Cronbach’s α across the 10 test versions for each subscale, reliability for the POE-level HNA was as follows: Addition (average α = .15); Subtraction (α = .23); Multiplication (α = .43); Division (α = .35); Negatives (α = .35); Fractions (α = .39); Decimals (α = .60); Percentages (α = .44); Ratios (α = .62); Fractions to Decimals (α = .68); SI Conversions (α = .69); Problem-Solving (α = .27); Formulas (α = .63); Rounding (α = .30); Estimation (α = .67); Indices and Logs (α = .56); Calculator Use (α = .78); Measurement (α = .64); Charts and Graphs (α = .32); Statistics (α = .44). Importantly, Cronbach’s α was positive for all subscales, across all 10 test versions.

For the POR-level HNA, the average Cronbach’s α values were as follows: Addition (average α = .34); Subtraction (α = .47); Multiplication (α = .36); Division (α = .28); Negatives (α = .68); Fractions (α = .65); Decimals (α = .47); Percentages (α = .57); Ratios (α = .56); Fractions to Decimals (α = .65); SI Conversions (α = .60); Problem-Solving (α = .47); Formulas (α = .69); Rounding (α = .65); Estimation (α = .68); Indices and Logs (α = .68); Calculator Use (α = .74); Measurement (α = .60); Charts and Graphs (α = .50); Statistics (α = .57). As with the POE, Cronbach’s α was positive for all subscales, across all 10 test versions.

Other reliability and validity evidence is being gathered in partnership with our UK and international partners, and will be added as it becomes available. Preliminary results from two North American cohorts demonstrate reliability similar to that found in the UK sample.
